Available models

Long-horizon agentic reasoning, 1M context

Text

$12.00/1M tokens

High-fidelity images with near-perfect text rendering

Image

Commercial use

Reference-guided video with audio

Video

$0.10/sec

Commercial use

Precise image generation and editing

Image

$0.05/image

Commercial use

Low-cost 1K image generation

Image

$0.04/image

Commercial use

Text-to-speech with voice cloning and prosody control

Audio

Commercial use

Text-to-video up to 1080p

Video

$0.20/sec

Coding and agents, 1M-token context

Text

$1.40/1M tokens

Image-to-video up to 1080p

Video

$0.20/sec

Reference-to-video, up to 9 images

Video

$0.20/sec

Adaptive thinking coding and agents

Text

$3.50/1M tokens

Image-to-video with prompt control, up to 1080p

Video

$0.10/sec

Commercial use

Aspect-ratio reframe with outpainting, up to 1080p

Video

$0.10/sec

Commercial use

Text-to-video with prompt control, up to 1080p

Video

$0.10/sec

Commercial use

Prompt-guided video editing, up to 1080p

Video

$0.10/sec

Commercial use

High-quality audio-driven video

Video

$0.12/sec

Commercial use

Lift SDR video into HDR

Video

$0.12/sec

Commercial use

High-quality image-to-video with audio

Video

$0.12/sec

Commercial use

Structure-guided video-to-video generation

Video

$0.12/sec

Commercial use

High-quality text-to-video with audio

Video

$0.12/sec

Commercial use

Image-to-video with audio (Grok 1.5)

Video

$0.07/sec

Commercial use

Agentic coding, 1M-token context

Text

$5.00/1M tokens

HDR video conversion

Video

Omni-modal short video generation and editing

Video

$1.95/1M tokens

Commercial use

Image-to-video up to 1080p

Video

$0.14/sec

Reference-to-video, up to 9 images

Video

$0.14/sec

Text-to-video up to 1080p

Video

$0.14/sec

Prompt-based video editing up to 1080p

Video

$0.18/sec

Image-to-video, 1080p, frame control

Video

$0.10/sec

Multi-reference video, up to 1080p

Video

$0.10/sec

Text-to-video, 1080p, multi-shot

Video

$0.10/sec

Efficient MoE model, 1M context

Text

$0.18/1M tokens

Large MoE for reasoning and coding

Text

$0.57/1M tokens

Reasoning and coding model, 1M context

Text

$6.50/1M tokens

Deep reasoning, 1M-token context

Text

$39.00/1M tokens

Agentic coding, 1M-token context

Text

$5.00/1M tokens

Prompt-based video editing

Video

$0.10/sec

Commercial use

Fast, affordable image-to-video

Video

$0.08/sec

Commercial use

Fast reference-guided video generation

Video

$0.09/sec

Commercial use

Fast, affordable text-to-video

Video

$0.08/sec

Commercial use

Animate images into 4K video

Video

$0.10/sec

Commercial use

Cinematic text-to-video, up to 4K

Video

$0.10/sec

Commercial use

Open multimodal model, 256K context

Text

$0.14/1M tokens

Lightweight on-device multimodal model

Text

$0.00/1M tokens

Fine-tunable

Compact on-device multimodal model

Text

$0.00/1M tokens

Fine-tunable

Closed model for reasoning, coding

Text

$0.50/1M tokens

Detail-preserving video upscaling

Video

Coding and agentic engineering, 200K context

Text

$1.40/1M tokens

Fast model for coding, computer use

Text

$0.75/1M tokens

Consistent depth maps from video

Video

$0.05/sec

Commercial use

Controllable text/image-to-video generation

Video

$0.0028/sec

Agentic coding model, 1M context

Text

$2.50/1M tokens

Fast image-to-video with native audio

Video

$0.04/sec

Commercial use

Fast text-to-video with native audio

Video

$0.04/sec

Commercial use

Image-to-video with native audio

Video

$0.12/sec

Fine-tunable

Audio-driven video with lip-sync

Video

$0.12/sec

Commercial use

Extend existing video clips

Video

$0.10/sec

Commercial use

Regenerate video segments via prompts

Video

$0.12/sec

Text-to-video with native audio

Video

$0.12/sec

GPT-5.3 instant chat model

Text

$1.75/1M tokens

Agentic reasoning MoE, 1M context

Text

$0.30/1M tokens

Pro quality at Flash speed

Image

$0.08/image

Commercial use

Multimodal model with vision, 256K context

Text

$0.0014/sec

Fine-tunable

Multimodal model with vision, 256K context

Text

$0.0014/sec

Fine-tunable

Multimodal model with vision, 256K context

Text

$0.0014/sec

Fine-tunable

Multimodal model with vision, 256K context

Text

$0.0014/sec

Fine-tunable

Intelligent visual reasoning image model

Image

$0.04/image

Commercial use

Multimodal reasoning over text, audio, video

Text

$2.00/1M tokens

Balanced coding and agents, 1M context

Text

$3.00/1M tokens

Text-and-image to image generation

Image

$0.08/image

Coding and agents, 1M context

Text

$5.00/1M tokens

4K cinematic image-to-video

Video

$0.55/sec

Commercial use

4K reference-driven video

Video

$0.55/sec

Commercial use

4K cinematic text-to-video

Video

$0.55/sec

Commercial use

Pro cinematic image-to-video

Video

$0.22/sec

Commercial use

Pro reference-driven cinematic video

Video

$0.22/sec

Commercial use

Pro video-to-video editing

Video

$0.34/sec

Commercial use

Transfer motion from video to image

Video

$0.17/sec

Commercial use

Versatile image styles by xAI

Image

$0.02/image

Commercial use

Edit images with Grok Imagine

Image

$0.02/image

Commercial use

Edit videos with Grok Imagine

Video

$0.08/sec

Commercial use

Image-to-video with audio by xAI

Video

$0.07/sec

Commercial use

Typography-focused image and design generation

Image

$0.06/image

Commercial use

High-fidelity text-to-image with style references

Image

$0.06/image

Commercial use

Compact, fast text-to-image model

Image

$0.01/image

Fine-tunable

Compact, fast text-to-image model

Image

$0.02/image

Fine-tunable

Targeted video segment editing

Video

$0.10/sec

Fast HD image-to-video generation

Video

$0.12/sec

Fine-tunable

Open MoE multimodal agentic model

Text

$0.60/1M tokens

Coding, reasoning, and agentic tasks

Text

$0.95/1M tokens

Photorealistic portraits and natural scenes

Image

$0.20/image

Fine-tunable

Improved natural-language image editing

Image

$0.03/image

Fine-tunable

Low-latency multimodal model, 1M context

Text

$0.05/1M tokens

Cinematic image-to-video with audio

Video

$0.07/sec

Commercial use

Cinematic text-to-video with native audio

Video

$0.07/sec

Commercial use

True-color precision rendering

Image

$8.00/1M tokens

Image-to-video with audio and lip-sync

Video

$0.10/sec

Commercial use

Reference video with character consistency

Video

$0.10/sec

Commercial use

Reasoning model, 400K context

Text

$1.75/1M tokens

GPT-5.2 chat model for ChatGPT

Text

$1.75/1M tokens

Cinematic image-to-video generation

Video

$0.12/sec

Commercial use

Reference-driven cinematic video

Video

$0.11/sec

Commercial use

Advanced video editing

Video

$0.18/sec

Commercial use

Text-to-image and editing, up to 4K

Image

$0.04/image

Commercial use

Fast photorealistic bilingual image generation

Image

$0.01/image

Fine-tunable

Open text-to-image with multi-reference

Image

$0.01/image

Fine-tunable

Tunable open text-to-image model

Image

$0.12/image

Professional FLUX.2 image generation

Image

$0.10/image

Software engineering and agentic workflows

Text

$5.00/1M tokens

Studio-quality 4K image generation

Image

$0.15/image

Commercial use

Zero-shot image segmentation

Image

$0.01/image

Commercial use

Zero-shot video object segmentation

Video

$0.02/image

Commercial use

Adaptive reasoning and instruction following

Text

$1.25/1M tokens

Professional AI image upscaling

Image

$0.05/image

Professional AI video upscaling

Video

$0.04/sec

Text/image-to-video, audio, narrative control

Video

$0.52/sec

Fast text/image-to-video with audio, 4K

Video

$0.13/sec

Commercial use

Cost-efficient text/image-to-video by Google

Video

$0.07/sec

Commercial use

Realistic video generation with audio

Video

$0.30/sec

Fast cinematic image-to-video

Video

$0.07/sec

Commercial use

Multi-image natural-language editing

Image

$0.03/image

Fine-tunable

On-device photorealistic image generation

Image

$0.04/image

Balanced coding, agents, computer use

Text

$3.00/1M tokens

Photorealistic 4K image generation

Image

$0.04/image

Commercial use

OpenAI audio-input chat model

Text

$32.00/1M tokens

Natural-language image editing

Image

$0.03/image

Fine-tunable

Vision-language model, 256K context

Text

$0.0014/sec

Fine-tunable

Vision-language model, long context

Text

$0.0014/sec

Fine-tunable

Vision-language model for text and images

Text

$0.0014/sec

Fine-tunable

Open text-to-image with editing tools

Image

$0.03/image

Fine-tunable

Multi-model routing, 400K context

Text

$1.25/1M tokens

Low-cost GPT-5 for real-time apps

Text

$0.25/1M tokens

Ultra-low-latency GPT-5 model

Text

$0.05/1M tokens

Refined reasoning and coding model

Text

$15.00/1M tokens

Open MoE reasoning model

Text

$0.15/1M tokens

Open MoE, runs on consumer hardware

Text

$0.07/1M tokens

Fine-tunable

Text-to-image with strong text rendering

Image

$0.03/image

Fine-tunable

High-fidelity open text-to-video, 720p

Video

$0.0014/sec

Fine-tunable

Efficient 720p video on consumer GPUs

Video

$0.0014/sec

Fine-tunable

Fast multimodal reasoning, 1M context

Text

$0.30/1M tokens

In-context image editing model

Image

$0.03/image

Fine-tunable

Text/image-to-video with native audio

Video

$0.20/sec

Multimodal reasoning and coding, 1M context

Text

$1.25/1M tokens

Open model, 256K context

Text

$0.0014/sec

Fine-tunable

Model for edge and offline use

Text

$0.00046/sec

Fine-tunable

Reasoning for coding, math, science

Text

$2.00/1M tokens

Fast, low-cost multimodal reasoning

Text

$1.10/1M tokens

Coding and chat, 1M-token context

Text

$2.00/1M tokens

Lower-cost GPT-4.1, 1M context

Text

$0.40/1M tokens

Low-latency GPT-4.1 for classification

Text

$0.10/1M tokens

Multimodal model, 128K context

Text

$0.10/1M tokens

Multi-step web research and synthesis

Text

$5.00/1M tokens

Multi-step reasoning over web search

Text

$3.00/1M tokens

Multilingual on-device chat model

Text

$0.00046/sec

Fine-tunable

Text-to-video with bilingual text

Video

$0.0014/sec

Fine-tunable

Fast web-grounded search, built on Llama

Text

$2.00/1M tokens

Advanced web-grounded search, 200K context

Text

$5.00/1M tokens

Efficient model for edge deployment

Text

$0.10/1M tokens

Vision-language model, 128K context

Text

$0.15/1M tokens

Fast, low-cost multimodal model

Text

$0.15/1M tokens

Fine-detail video upscaling

Video

Text embeddings for search, RAG

Embeddings

$0.02/1M tokens

Multimodal model for text, audio, vision

Text

$2.50/1M tokens

MoE model for code and multilingual

Text

$2.00/1M tokens

Long-context reasoning model, 128K

Text

$2.00/1M tokens

High-quality text embeddings

Embeddings

$0.13/1M tokens

Efficient text embeddings

Embeddings

$0.02/1M tokens

Versatile video upscaling

Video

Efficient open MoE model

Text

$0.70/1M tokens

Face-focused video upscaling

Video

General-purpose open model

Text

$0.25/1M tokens

Model for on-device and edge

Text

$0.04/1M tokens

Reasoning for science, math, code

Text

$15.00/1M tokens

Fast reasoning for STEM tasks

Text

$1.10/1M tokens

Efficient multilingual model

Text

$0.00046/sec

Fine-tunable

Creative video upscaling for AI video

Video

Creative generative image upscaling with prompts

Image

SDR-to-HDR video, 10-bit HDR10 output

Video

All-in-one image enhancement and upscaling

Image

Lightweight text-to-video on consumer GPUs

Video

$0.0014/sec

Fine-tunable

Model library

All models

Claude Fable 5

GPT Image 2

Seedance 2.0 - Reference to Video

Seedream 5.0 Pro

Nano Banana 2 Lite

Seed Audio 1.0

Happy Horse 1.1 - Text to Video

GLM 5.2

Happy Horse 1.1 - Image to Video

Happy Horse 1.1 - Reference to Video

Claude Sonnet 5

Luma Ray 3.2 - Image to Video

Luma Ray 3.2 - Reframe

Luma Ray 3.2 - Text to Video

Luma Ray 3.2 - Video to Video

LTX 2.3 Quality: Audio to Video

LTX 2.3 Quality: Video to HDR

LTX 2.3 Quality: Image to Video

LTX 2.3 Quality: Reference Video to Video

LTX 2.3 Quality: Text to Video

Grok Imagine Video 1.5 - Image to Video

Claude Opus 4.8

Topaz Hyperion HDR

Gemini Omni Flash

Happy Horse - Image to Video

Happy Horse - Reference to Video

Happy Horse - Text to Video

Happy Horse - Video Edit

WAN 2.7 - Image to Video

WAN 2.7 - Reference to Video

WAN 2.7 - Text to Video

DeepSeek V4 Flash

DeepSeek V4 Pro

GPT 5.5

GPT 5.5 Pro

Claude Opus 4.7

WAN 2.7 - Edit Video

Seedance 2.0 Fast - Image to Video

Seedance 2.0 Fast - Reference to Video

Seedance 2.0 Fast - Text to Video

Seedance 2.0 - Image to Video

Seedance 2.0 - Text to Video

Gemma 4 31B

Gemma 4 E2B

Gemma 4 E4B

Qwen3.6 Plus

Topaz Starlight Precise 2.5

GLM 5.1

GPT 5.4 Mini

Depth Anything Video

LTX-2.3 Pro 22B IC-LoRA Union Control

GPT 5.4

LTX 2.3 Fast: Image to Video

LTX 2.3 Fast: Text to Video

LTX-2.3 Pro

LTX 2.3 Pro: Audio to Video

LTX 2.3 Pro: Extend Video

LTX 2.3 Pro: Retake

LTX 2.3 Pro: Text to Video

GPT 5.3 Chat

Nemotron 3 Super

Nano Banana 2

Qwen3.5 0.8B

Qwen3.5 2B

Qwen3.5 4B

Qwen3.5 9B

Seedream 5.0 Lite

Gemini 3.1 Pro Preview

Claude Sonnet 4.6

Qwen Image 2.0 Pro

Claude Opus 4.6

Kling O3 4K: Image-to-Video