GPT Image 2
OpenAI
Text-to-image generation with photorealistic output, accurate text rendering, and strong prompt adherence.
Claude Opus 4.7
Anthropic
Anthropic's most capable model, with a step-change jump in agentic coding over Opus 4.6 and a native 1M-token context window.
Seedance 2.0 - Reference to Video
ByteDance
Reference-guided video from prompt plus optional images, videos, and audio references.
Happy Horse - Image to Video
Alibaba Wan
Image-to-video generation up to 1080P from a single reference image.
Happy Horse - Reference to Video
Alibaba Wan
Reference-to-video generation up to 1080P with up to 9 reference images.
Happy Horse - Text to Video
Alibaba Wan
Text-to-video generation up to 1080P with configurable aspect ratio and duration.
WAN 2.7 - Image to Video
Alibaba Wan
Animates images into video up to 15s at 1080P with first/last-frame guidance, video continuation, and optional driving audio.
WAN 2.7 - Reference to Video
Alibaba Wan
Reference-guided video generation with character consistency, multi-character support, optional reference voices, and up to 1080P output.
WAN 2.7 - Text to Video
Alibaba Wan
Text-to-video with multi-shot generation, up to 1080P, 2-15s duration, and optional driving audio.
GPT Image 2 Edit
OpenAI
Image-to-image editing with prompt-guided transformations and multi-reference composition.
Seedance 2.0 Fast - Image to Video
ByteDance
Fast-tier image-to-video with optional start-to-end frame transitions, flexible duration and aspect ratio, resolution up to 720p, and optional synchronized audio.
Seedance 2.0 Fast - Reference to Video
ByteDance
Fast-tier reference-guided video from prompt plus optional images, videos, and audio references.
WAN 2.7 - Edit Video
Alibaba Wan
Edit videos via text instructions or reference images with style transfer, up to 1080p, and flexible audio handling.
Seedance 2.0 - Image to Video
ByteDance
Image-to-video with optional start-to-end frame transitions, flexible duration and aspect ratio, resolution up to 1080p, and optional synchronized audio.
Seedance 2.0 - Text to Video
ByteDance
Text-to-video with flexible duration and aspect ratio, resolution up to 720p, and optional synchronized audio.
Gemma 4 31B
Google
Flagship 31B dense multimodal model supporting text, image, and video input with a 256K context window, achieving performance competitive with much larger models.
Gemma 4 E2B
Google
Lightweight 2.3B multimodal model supporting text, image, video, and audio input with 128K context window and 140+ language support.
Gemma 4 E4B
Google
Efficient 4.5B multimodal model supporting text, image, video, and audio input with 128K context window and 140+ language support.
Qwen3.6 Plus
Qwen
Alibaba's latest flagship closed model for advanced reasoning, coding, and complex text generation.
Topaz Starlight Precise 2.5
Topaz Labs
Video restoration and upscaling model from Topaz for detail-preserving 1080p/4k outputs.
GPT 5.4 Mini
OpenAI
Strongest OpenAI mini model for coding and agentic workloads, with 400K context, 128K max output, multimodal input, and broad tool support.
Depth Anything Video
ByteDance
Video-to-depth estimation with temporal consistency, selectable model size, colormaps, and optional raw depth export.
LTX-2.3 Pro 22B IC-LoRA Union Control
Lightricks
Baseten-configured LTX 2.3 Pro 22B model with IC/Union-Control support for text-to-video and image-conditioned video generation.
GPT 5.4
OpenAI
Frontier model for complex professional work with 1.05M context, configurable reasoning, and extensive tool support including computer use and MCP.
LTX-2.3 Pro
Lightricks
Generates high-res 4K@25FPS videos from image+text input, with camera control and synced audio.
LTX 2.3 Pro: Audio to Video
Lightricks
Audio-to-video generation from image + audio input, 1080p output with synchronized visuals.
LTX 2.3 Pro: Retake
Lightricks
Targeted video segment editing: replace video, audio, or both via prompts.
LTX 2.3 Pro: Text to Video
Lightricks
Text-to-video generation up to 4K@50FPS with optional audio and camera motion.
Gemini 3.1 Flash-Lite
Google
Fast, low-cost Gemini 3.1 model for high-throughput multimodal workloads, with configurable reasoning and a 1M-token context window.
GPT 5.3 Chat
OpenAI
GPT-5.3 Instant model for ChatGPT with 128K context, text and image inputs, and optimized conversational performance.
Nemotron 3 Super
NVIDIA
Hybrid Mamba-Transformer MoE with 1M context, optimized for agentic reasoning; 120B total, 12B active parameters.
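The "120B total, 12B active" split above is what makes sparse MoE economical: per-token compute scales with the *active* parameter count, not the total. A rough sketch using the standard ~2 FLOPs-per-active-parameter-per-token rule of thumb (the parameter figures come from the entry above; everything else is illustrative):

```python
def flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params

TOTAL = 120e9   # total parameters (from the model card above)
ACTIVE = 12e9   # parameters actually routed per token

dense_cost = flops_per_token(TOTAL)   # hypothetical dense model of the same size
moe_cost = flops_per_token(ACTIVE)    # sparse MoE: only the routed experts run

print(f"MoE per-token compute is {dense_cost / moe_cost:.0f}x cheaper "
      f"than an equally sized dense model")
```

Memory, by contrast, still scales with the total count: all 120B weights must be resident even though only 12B fire per token.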
Nano Banana 2
Google
Nano Banana 2 is a text-to-image model that generates images from text descriptions.
Nano Banana 2 - Image Edit
Google
Nano Banana 2 Edit is an image editing model that enables blending multiple images, maintaining character consistency, targeted transformations using natural language, and leveraging world knowledge for precise edits.
Qwen3.5 0.8B
Qwen
Compact multimodal model with dual reasoning modes, native vision capabilities, support for over 200 languages, and long-context processing up to 262,144 tokens.
Qwen3.5 2B
Qwen
Multimodal LLM with native vision, image and video understanding, tool calling, optional thinking mode, support for 201 languages, and long-context processing up to 262,144 tokens.
Qwen3.5 4B
Qwen
Multimodal LLM with thinking mode by default, native vision, image and video understanding, tool calling, support for 201 languages, and long-context up to 262K tokens (extensible to 1M with YaRN).
Qwen3.5 9B
Qwen
Multimodal LLM with thinking mode by default, native vision, image and video understanding, tool calling, support for 201 languages, and long-context up to 262K tokens (extensible to 1M with YaRN).
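Several Qwen entries above note a 262K context "extensible to 1M with YaRN". In Hugging Face Transformers this kind of extension is typically enabled via a `rope_scaling` override, where the factor is the target length divided by the native length. A minimal sketch (the field names follow the common Transformers/Qwen pattern; the model id in the comment is hypothetical and exact support depends on your library version):

```python
# YaRN extends RoPE-based context by rescaling rotary frequencies.
native_ctx = 262_144      # 262K native context, per the entries above
target_ctx = 1_000_000    # desired 1M-token window

rope_scaling = {
    "rope_type": "yarn",
    "factor": target_ctx / native_ctx,               # ~3.81
    "original_max_position_embeddings": native_ctx,
}

# Passed at load time, e.g. (hypothetical model id):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen3.5-9B", rope_scaling=rope_scaling)
print(f"YaRN factor: {rope_scaling['factor']:.2f}")
```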
Seedream 5.0 Lite
ByteDance
Image generation with built-in reasoning, example-based editing, multi-reference control (up to 14 images), and 3K resolution support.
Gemini 3.1 Pro Preview
Google
Flagship Gemini 3 reasoning model for complex multimodal and agentic workflows with a 1M-token context window.
Claude Sonnet 4.6
Anthropic
Anthropic's latest Sonnet model with strong coding and agent performance, fast latency, and improved long-context reasoning.
Qwen Image 2.0 Pro
Qwen
Pro version of Qwen Image 2 with enhanced text rendering, realism, and semantic adherence for high-quality image generation and editing.
Claude Opus 4.6
Anthropic
Anthropic's most advanced model, excelling in coding, agentic workflows, computer use, reasoning, math, and domain expertise in finance, law, and STEM.
Kling O3 Pro: Image-to-Video
Kling
Kling Video O3 Pro is an advanced image-to-video generation model that animates static images into high-quality videos based on text prompts.
Kling O3 Pro - Reference to Video
Kling
Kling o3 Pro reference-to-video model generates videos from a reference image and text prompt describing motion and cinematic intent.
Kling O3 Edit - Video to Video
Kling
Edit videos using text prompts and reference images for character consistency or object replacement.
Kling 3.0 Pro: Motion Control
Kling
Motion transfer from reference video to character image. Cost-effective for portraits and simple animations.
Grok Imagine - Text to Image
xAI
Grok Imagine text-to-image is a high-quality image generation model from xAI that produces cinematic, stylistically consistent images from text prompts.
Grok Imagine - Image Edit
xAI
Grok Imagine - Image Edit is a high-quality image editing model from xAI that applies cinematic, stylistically consistent edits to existing images from text prompts.
Grok Imagine - Video Edit
xAI
Video editing model for prompt-driven modifications like object swapping, scene restyling, and character animation with synced native audio.
Grok Imagine - Image to Video
xAI
Generate videos from images with audio using xAI's Grok Imagine Video model.
FLUX.2 Klein 4B
Black Forest Labs
FLUX.2 Klein 4B is a compact 4 billion parameter text-to-image diffusion model optimized for fast inference and high-quality image generation.
FLUX.2 Klein 9B
Black Forest Labs
FLUX.2 Klein 9B is a compact 9 billion parameter text-to-image diffusion model optimized for fast inference and high-quality image generation.
LTX-2 Retake
Lightricks
Multimodal video editing model: regenerate 2-16s segments (video/audio/both) via prompts, preserving motion, lighting, and continuity.
LTX-2 Pro
Lightricks
Generates high-res 4K@25FPS videos from image+text input, with camera control and synced audio.
Kimi K2.5
Moonshot AI
Native multimodal agentic model with vision, Agent Swarm (up to 100 sub-agents, 1,500 tool calls), coding from visual specs, and 256K context.
GLM 5
Z AI
Frontier open LLM with advanced coding, agentic, and reasoning capabilities; 744B MoE with DSA for efficient 200K context.
Qwen Image - 2512
Qwen
Generative image model that improves photorealistic human portraits, renders finer detail in natural scenes (landscapes, animal fur, and other natural elements), and delivers better text rendering overall.
Qwen Image Edit - 2511
Qwen
Delivers high-fidelity, controllable image editing with dual semantic and appearance modes, precise on-image text, multi-image composition, and robust identity preservation.
Gemini 3 Flash
Google
Fast multimodal model with configurable reasoning, strong agentic workflows, long context, and tool use for interactive chat, coding, and complex tasks.
Kling 2.6 Pro - Image to Video
Kling
Transforms static images into cinematic videos with synchronized audio, dialogue, and sound effects in 1080p.
Kling 2.6 Pro - Text to Video
Kling
Generates 1080p videos from text with native synchronized audio, including dialogue, sound effects, and lip-sync.
GPT Image 1.5
OpenAI
Diffusion model for high-fidelity image generation and editing, with strong prompt adherence, preserved composition and lighting, and adjustable quality controls.
WAN 2.6 - Image to Video
Alibaba Wan
Animates images into 15s, 1080p videos with preserved identity, native audio, lip-sync, and multi-shot sequences guided by reference videos.
WAN 2.6 - Video to Video
Alibaba Wan
Generates videos from reference videos, maintaining character consistency, with multi-shot narratives, up to 15s duration, and native audio sync.
GPT 5.2
OpenAI
Frontier model for professional work with configurable reasoning effort, 400K context, structured outputs, and distillation support.
GPT 5.2 Chat
OpenAI
GPT-5.2 model optimized for ChatGPT with 128K context, text and image input support, streaming, and structured outputs.
Seedream 4.5
ByteDance
High-fidelity text-to-image and image-to-image generation with multi-reference control (up to 10 images), 4K support, and batch output.
Kling O1 - Image to Video
Kling
Transforms images (with text and up to 7 references) into cinematic video clips with stable characters, controlled motion, and consistent environments.
Kling O1 - Reference to Video
Kling
Multimodal video model for reference-guided generation, preserving characters and styles from reference images.
Kling O1 Edit - Video to Video
Kling
Text-guided video-to-video editing that preserves motion and continuity while enabling character swaps, style changes, motion transfer, and scene transformations.
Z-Image-Turbo
Tongyi-MAI
Fast photorealistic text-to-image model with accurate English and Chinese on-image text, ideal for interactive design, marketing visuals, and UI/UX workflows.
FLUX.2 [dev]
Black Forest Labs
Generates photorealistic images with precise multi-reference editing, excels at legible text and infographics, and supports rapid LoRA fine-tuning workflows.
FLUX.2 [flex]
Black Forest Labs
Delivers high-quality image generation and editing with advanced text rendering, multi-image reference for style consistency, and precise, JSON-based prompt control.
FLUX.2 [pro]
Black Forest Labs
Delivers photorealistic, high-resolution images with advanced multi-reference editing, precise pose and color control, and reliable prompt and text adherence for professionals.
Claude Opus 4.5
Anthropic
Excels at long-horizon reasoning, advanced coding, dynamic effort control, robust multimodal tasks, and detailed computer interface inspection for complex workflows.
Nano Banana Pro
Google
Delivers high-fidelity images with advanced text rendering, consistent character identities, and precise prompt following for professional visual design and branding.
Segment Anything 3 - Image
Meta
Zero-shot image segmentation with text/visual prompts; exhaustive instance detection and presence head reduce false positives.
Segment Anything 3 - Video
Meta
Detects, segments, and tracks objects across video frames using text, exemplars, points, or masks, with memory for occlusions and real-time streaming.
GPT 5.1
OpenAI
Automatically routes prompts to fast or deep reasoning modes, with adaptive effort, enhanced tone and style controls, and improved coding and math.
Veo 3.1
Google
Generates high-fidelity videos with native synced audio, offering strong narrative control, scene consistency, image-to-video animation, and multi-shot support.
Veo 3.1 Fast - Image to Video
Google
Animates an input image into short videos with controllable motion, duration, aspect ratio, resolution, and optional audio.
Veo 3.1 Lite - Image to Video
Google
Animates a single image into short videos with controllable motion, duration, aspect ratio, and cost-efficient quality settings.
Sora 2 Pro
OpenAI
Generates high-quality 1080p videos up to 12s with synced native audio, multi-scene reasoning, timeline prompting, and realistic physics.
Gemini 2.5 Flash Lite Preview
Google
Optimized for rapid, high-volume multimodal tasks with a 1M-token context window, delivering strong reasoning and cost efficiency for enterprise workflows.
Kling v2.5 - Image to Video
Kling
Transforms single images into smooth, cinematic videos with natural motion, realistic camera work like dolly zooms, and preserved style.
Qwen Image Edit - 2509
Qwen
Delivers high-fidelity, controllable image editing with dual semantic and appearance modes, precise on-image text, multi-image composition, and robust identity preservation.
nano-banana
Google
Generates photorealistic images with precise prompt and text rendering, mask-free editing, and layout-aware outpainting, ideal for creative and multilingual content.
Seedream 4.0
ByteDance
Delivers ultra-fast, high-resolution image generation, precise natural-language editing, and consistent multi-image output—ideal for creative, batch, or professional workflows.
Claude Sonnet 4.5
Anthropic
Anthropic's most advanced AI model, excelling in coding, agent-based tasks, and computer usage. It delivers high performance in reasoning, math, and domain-specific knowledge across fields like finance, law, and STEM.
Qwen Image Edit
Qwen
Enables precise bilingual text and semantic edits with strong consistency, advanced multi-image editing, and native pose/control support for creative compositions.
Qwen3 VL 2B - Instruct
Qwen
Lightweight multimodal model for visual Q&A, multilingual OCR, document and UI understanding, and agentic screen interpretation in constrained environments.
Qwen3 VL 4B - Instruct
Qwen
Multimodal LLM for text and images, excelling in visual QA, document/UI understanding, spatial reasoning, image captioning, and multimodal coding.
Qwen3 VL 8B - Instruct
Qwen
Versatile multimodal large language model capable of understanding and generating both text and images. Built on the Qwen3 architecture, it provides strong general reasoning, detailed image interpretation, and instruction-following performance in a compact 8B-parameter size.
FLUX.1 [dev]
Black Forest Labs
Open-weight text-to-image model with advanced prompt adherence, anatomically accurate details, and powerful tools for inpainting, outpainting, and structural edits.
GPT 5
OpenAI
Handles complex reasoning, code generation, and multimodal inputs with improved accuracy, long context retention, and robust multilingual and personalization features.
GPT 5 Mini
OpenAI
Optimized for cost and speed, handles long contexts, supports text and image input, and excels at structured outputs and tool integration for precise tasks.
GPT 5 Nano
OpenAI
Multimodal model optimized for ultra-fast, cost-efficient summarization and classification, supporting both text and image inputs with real-time streaming output.
Claude Opus 4.1
Anthropic
Excels at complex coding, autonomous research, and agent workflows, with advanced reasoning and a 200,000-token context for deep analysis and synthesis.
OpenAI/GPT-OSS-120B
OpenAI
Built with a Mixture-of-Experts design, it delivers efficient, transparent reasoning, tool use, and agentic capabilities across a 128K-token context window.
OpenAI/GPT-OSS-20B
OpenAI
Delivers strong reasoning and chain-of-thought, agentic features, and multilingual support, optimized for local deployment and efficient use on modest hardware.
Qwen Image
Qwen
An image generation foundation model in the Qwen series with significant advances in complex text rendering and support for a wide range of artistic styles, from photorealistic scenes to impressionist paintings and from anime aesthetics to minimalist design.
Wan2.2 A14B - Text to Video
Alibaba Wan
Delivers high-fidelity text-to-video synthesis at 480p/720p using dual expert models for scene layout and fine motion detail, ideal for creative production.
Wan2.2 5B - Text to Video
Alibaba Wan
Unified text-to-video and image-to-video model generates high-definition 720p, 24fps video clips efficiently on consumer GPUs, with advanced compression for speed.
Gemini 2.5 Flash
Google
Fast, cost-efficient multimodal reasoning model with million-token context for high-volume applications requiring speed and versatility.
FLUX.1-Kontext [dev]
Black Forest Labs
Delivers precise, iterative image editing and generation with consistent character, style, and text changes—using multimodal input for seamless scene transformations.
Claude Opus 4
Anthropic
Excels at deep reasoning, complex coding, and autonomous agent workflows with sustained performance, extended thinking, tool use, and memory across tasks.
Claude Sonnet 4
Anthropic
Balances intelligence with efficiency for coding, research, and automation tasks; excels in reasoning, content generation, and nuanced instruction following.
Veo 3.0
Google
Generates realistic text- and image-conditioned videos with native synchronized audio, including dialogue, ambient sound, and effects.
Gemini 2.5 Pro
Google
Excels at building interactive web apps, advanced code editing and agentic workflows, with native multimodality and strong video-to-code capabilities.
Qwen/Qwen3-4B
Qwen
Dual reasoning modes enable rapid or step-by-step responses, with robust support for over 100 languages and long-context processing up to 262,144 tokens.
Qwen/Qwen3-0.6B
Qwen
Efficient conversational AI for resource-limited devices with multilingual support, document summarization, translation, code generation, and simple information retrieval.
o3
OpenAI
Excels at advanced reasoning, coding, math, and visual tasks with simulated reasoning, tool use, web browsing, and image understanding integration.
o4 mini
OpenAI
Optimized for fast, affordable reasoning with strong coding and visual skills, large 200k-token context, and efficient handling of complex tasks.
GPT 4.1
OpenAI
Excels in coding and instruction following with million-token context window, enabling superior performance on complex, multi-step tasks.
GPT 4.1 mini
OpenAI
Powerful mid-sized model with GPT-4o-level performance at lower cost and latency, featuring a 1 million token context window for complex tasks.
GPT 4.1 nano
OpenAI
OpenAI's fastest, cost-effective model with full 1 million token context, optimized for classification, autocompletion, and real-time AI agent tasks.
Mistral Small 3.1
Mistral AI
A lightweight, versatile 24B multimodal model handling text and images with extensive multilingual support and 128k token context window.
Gemma 3 27B
Google
Gemma 3 offers a large 128K context window, multilingual support in over 140 languages, and more sizes than previous versions. Gemma 3 models are well-suited to a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
Perplexity Sonar Deep Research
Perplexity
Performs exhaustive, multi-step research by autonomously searching and synthesizing hundreds of sources into detailed, expert-level reports across domains.
Perplexity Sonar Reasoning Pro
Perplexity
Premium reasoning model for complex, multi-step analysis. Delivers detailed explanations, real-time web search, and twice as many citations for thorough answers.
Llama 3.2 1B Instruct
Meta
Efficient, multilingual instruction-tuned model designed for privacy-focused, on-device dialogue, summarization, and agentic retrieval across mobile and edge platforms.
Wan2.1 14B - Text to Video
Alibaba Wan
Generates high-fidelity, temporally consistent videos from text or images, with readable English and Chinese text, sound effects, and customizable aspect ratios.
Perplexity Sonar
Perplexity
Optimized for search-augmented tasks, delivering fast, accurate answers with real-time web data and detailed citations. Excels in research and fact-checking.
Perplexity Sonar Pro
Perplexity
Excels at complex, multi-step queries with real-time web search, detailed answers, extensive citations, and customizable information retrieval.
Ministral 8B
Mistral AI
Efficient edge model with native function calling and interleaved sliding-window attention for fast, memory-efficient processing in resource-constrained environments.
Pixtral 12B
Mistral AI
Multimodal model handling text and images at native resolution with 128K context window, excelling in visual reasoning tasks like document analysis and image captioning.
GPT 4o mini
OpenAI
Cost-efficient, fast model with 128K context window, supporting text/vision inputs and improved multilingual performance.
Topaz Rhea - Fine Detail Video Upscaler
Topaz Labs
Next-generation general-purpose Topaz video upscaler with tunable detail, noise, blur, and grain controls.
Text Embedding 004
Google
Generates vector representations capturing semantic meaning/context for tasks like semantic search, text classification, and clustering. Multilingual support with versatile applications.
GPT 4o
OpenAI
Multimodal LLM for real-time text, audio, and visual processing with multilingual support, emotional audio responses, and image generation.
Mixtral 8x22B
Mistral AI
Efficient Sparse MoE architecture with 39B active parameters, excels in multilingual tasks, math, coding, and handles 64K token contexts.
Mistral Large 2
Mistral AI
Powerful LLM with 123B parameters, excelling in multilingual tasks, coding, and reasoning, optimized for single-node inference and long-context applications.
Text Embedding 3 - Large
OpenAI
Generates high-quality embeddings for complex text analysis and multilingual applications with 8,191 token context.
Text Embedding 3 - Small
OpenAI
Generates compact, efficient embeddings for NLP tasks with multilingual support, balancing performance and low latency.
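The embedding models above are typically consumed via vector similarity, e.g. for semantic search. A minimal sketch using the OpenAI Python client with the published `text-embedding-3-small` model id (the API call is gated on an API key, so the local similarity math runs standalone; input strings are illustrative):

```python
import math
import os

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=["semantic search", "vector retrieval"],
    )
    vecs = [d.embedding for d in resp.data]
    print(cosine_similarity(vecs[0], vecs[1]))
```

Scores near 1.0 indicate semantically similar texts; near 0.0, unrelated ones.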
Topaz Proteus - Versatile Video Upscaler
Topaz Labs
General-purpose Topaz video upscaling and enhancement with tunable detail, noise, blur, and grain controls.
Mixtral 8x7B
Mistral AI
Efficient Mixture of Experts (8 experts) with 13B active parameters, optimized for multilingual tasks and cost-performance balance.
Topaz Iris - Face Detail Video Upscaler
Topaz Labs
Topaz video upscaler focused on face restoration for medium-quality sources.
Mistral 7B
Mistral AI
Balanced performance in natural language and code tasks, efficiently handling longer sequences with innovative attention mechanisms.
Gemini 2.0 Flash
Google
Multimodal LLM for agentic applications, handling real-time data integration and multi-step tasks with enhanced reasoning via Thinking Mode, integrating Google tools and third-party functions.
Ministral 3B
Mistral AI
Optimized for edge computing with function-calling capabilities, excelling in knowledge retrieval and commonsense reasoning with 128k token context.
o1
OpenAI
Specializes in complex reasoning through chain-of-thought processing, excelling in STEM tasks like coding, math, and scientific analysis.
o3 mini
OpenAI
Optimized for STEM reasoning and problem-solving, excelling in complex tasks like advanced math and coding with improved cost efficiency.
Qwen/Qwen3-1.7B
Qwen
Efficiently generates multilingual text and code, with dual modes for rapid chat or detailed reasoning; ideal for lightweight AI, agents, and education.