Model Library
Try the latest models and share the outputs with your team - all in one place.
New models added every week.
Copyright © 2026 Oxen Labs, Inc., All Rights Reserved
100 models
Model Types
Image
Text
Video
Embeddings
Fine-tuning
Fine-tunable
24
Modalities
Image to Image
15
Image to Text
21
Image to Video
12
Text to Embeddings
3
Text to Image
15
Text to Text
48
Text to Video
14
Video to Text
1
Video to Video
7
Developers
OpenAI
19
Google
14
Qwen
11
Kling
9
Mistral
8
Black Forest Labs
7
Anthropic
6
Alibaba
6
xAI
4
Perplexity
4
ByteDance
3
Meta
3
Lightricks
2
Topaz Labs
2
Tongyi-MAI
1
Moonshot AI
1
Nano Banana 2 - Image Edit
Google
Nano Banana 2 Edit is an image editing model that enables blending multiple images, maintaining character consistency, targeted transformations using natural language, and leveraging world knowledge for precise edits.
$0.08/image
image-to-image
Claude Opus 4.6
Anthropic
Anthropic's most advanced model, excelling in coding, agentic workflows, computer use, reasoning, math, and domain expertise in finance, law, STEM.
$5.00/1M tokens
multi-to-text
Kling O3 Pro: Image-to-Video
Kling
Kling Video O3 Pro is an advanced image-to-video generation model that animates static images into high-quality videos based on text prompts.
$0.22/sec
multi-to-video
Nano Banana 2
Google
Nano Banana 2 is a text-to-image model that generates images from text descriptions.
$0.08/image
text-to-image
Seedream 5.0 Lite
ByteDance
Image generation with built-in reasoning, example-based editing, multi-reference control (up to 14 images), and 3K resolution support.
$0.04/image
text-to-image
Kling O3 Pro - Reference to Video
Kling
The Kling O3 Pro reference-to-video model generates videos from a reference image and a text prompt describing motion and cinematic intent.
$0.22/sec
multi-to-video
Kling O3 Edit - Video to Video
Kling
Edit videos using text prompts and reference images for character consistency or object replacement.
$0.34/sec
video-to-video
Grok Imagine - Text to Image
xAI
Grok Imagine text-to-image is a high-quality image generation model from xAI that produces cinematic, stylistically consistent images from text prompts.
$0.02/image
text-to-image
Grok Imagine - Image Edit
xAI
Grok Imagine - Image Edit is a high-quality image generation model from xAI that produces cinematic, stylistically consistent images from text prompts.
$0.02/image
image-to-image
Grok Imagine - Video Edit
xAI
Video editing model for prompt-driven modifications like object swapping, scene restyling, and character animation with synced native audio.
$0.08/sec
video-to-video
Grok Imagine - Image to Video
xAI
Generate videos from images with audio using xAI's Grok Imagine Video model.
$0.07/sec
image-to-video
FLUX.2 Klein 4B
Black Forest Labs
Fine-tunable
FLUX.2 Klein 4B is a compact 4 billion parameter text-to-image diffusion model optimized for fast inference and high-quality image generation.
$0.01/image
multi-to-image
FLUX.2 Klein 9B
Black Forest Labs
Fine-tunable
FLUX.2 Klein 9B is a compact 9 billion parameter text-to-image diffusion model optimized for fast inference and high-quality image generation.
$0.02/image
multi-to-image
LTX-2 Retake
Lightricks
Multimodal LLM for targeted video editing: regenerate 2-16s segments (video/audio/both) via prompts, preserving motion, lighting, and continuity.
$0.10/sec
video-to-video
LTX-2 Pro
Lightricks
Fine-tunable
Generates high-resolution 4K videos at 25 FPS from image and text inputs, with camera control and synced audio.
$0.12/sec
multi-to-video
Qwen Image - 2512
Qwen
Fine-tunable
Generative image model with improved photorealistic human portraits, finer natural scenes (landscapes, animal fur, and other natural elements), and better text rendering overall.
$0.20/image
text-to-image
Qwen Image Edit - 2511
Qwen
Fine-tunable
Delivers high-fidelity, controllable image editing with dual semantic and appearance modes, precise on-image text, multi-image composition, and robust identity preservation.
$0.03/image
image-to-image
Gemini 3 Flash
Google
Fast multimodal model with configurable reasoning, strong agentic workflows, long context, and tool use for interactive chat, coding, and complex tasks.
$0.05/1M tokens
multi-to-text
Kling 2.6 Pro - Image to Video
Kling
Transforms static images into cinematic videos with synchronized audio, dialogue, and sound effects in 1080p.
$0.07/sec
image-to-video
Kling 2.6 Pro - Text to Video
Kling
Generates 1080p videos from text with native synchronized audio, including dialogue, sound effects, and lip-sync.
$0.07/sec
text-to-video
GPT Image 1.5
OpenAI
Diffusion model for high‑fidelity image generation and editing, with strong prompt adherence, preserved composition and lighting, and adjustable quality controls.
$8.00/1M tokens
text-to-image
WAN 2.6 - Image to Video
Alibaba Wan
Animates images into 15s, 1080p videos with preserved identity, native audio, lip-sync, and multi-shot sequences guided by reference videos.
$0.10/sec
multi-to-video
WAN 2.6 - Video to Video
Alibaba Wan
Generates videos from reference videos, maintaining character consistency, with multi-shot narratives, up to 15s duration, and native audio sync.
$0.10/sec
video-to-video
Seedream 4.5
ByteDance
High-fidelity text-to-image and image-to-image generation with multi-reference control (up to 10 images), 4K support, and batch output.
$0.04/image
text-to-image
Kling O1 - Image to Video
Kling
Transforms images (with text and up to 7 references) into cinematic video clips with stable characters, controlled motion, and consistent environments.
$0.12/sec
multi-to-video
Kling O1 - Reference to Video
Kling
Multimodal video model for reference-guided generation, preserving characters and styles from reference images.
$0.11/sec
image-to-video
Kling O1 Edit - Video to Video
Kling
Text-guided video-to-video editing that preserves motion and continuity while enabling character swaps, style changes, motion transfer, and scene transformations.
$0.18/sec
video-to-video
Z-Image-Turbo
Tongyi-MAI
Fine-tunable
Fast photorealistic text-to-image model with accurate English and Chinese on-image text, ideal for interactive design, marketing visuals, and UI/UX workflows.
$0.01/image
text-to-image
FLUX.2 [dev]
Black Forest Labs
Fine-tunable
Generates photorealistic images with precise multi-reference editing, excels at legible text and infographics, and supports rapid LoRA fine-tuning workflows.
$0.01/image
multi-to-image
FLUX.2 [flex]
Black Forest Labs
Delivers high-quality image generation and editing with advanced text rendering, multi-image reference for style consistency, and precise, JSON-based prompt control.
$0.12/image
multi-to-image
FLUX.2 [pro]
Black Forest Labs
Delivers photorealistic, high-resolution images with advanced multi-reference editing, precise pose and color control, and reliable prompt and text adherence for professionals.
$0.10/image
multi-to-image
Claude Opus 4.5
Anthropic
Excels at long-horizon reasoning, advanced coding, dynamic effort control, robust multimodal tasks, and detailed computer interface inspection for complex workflows.
$5.00/1M tokens
multi-to-text
Nano Banana Pro
Google
Delivers high-fidelity images with advanced text rendering, consistent character identities, and precise prompt following for professional visual design and branding.
$0.15/image
image-to-image
Segment Anything 3 - Image
Meta
Zero-shot image segmentation with text/visual prompts; exhaustive instance detection and presence head reduce false positives.
$0.01/image
image-to-image
Segment Anything 3 - Video
Meta
Detects, segments, and tracks objects across video frames using text, exemplars, points, or masks, with memory for occlusions and real-time streaming.
$0.02/image
video-to-video
Gemini 3 Pro Preview
Google
Excels at deep multimodal reasoning, complex coding, and advanced tool use across text, images, audio, and video with a vast 1M token context.
$2.00/1M tokens
multi-to-text
GPT 5.1
OpenAI
Automatically routes prompts to fast or deep reasoning modes, with adaptive effort, enhanced tone and style controls, and improved coding and math.
$1.25/1M tokens
multi-to-text
Kimi K2 Thinking
Moonshot AI
Enables autonomous, step-by-step reasoning and orchestration of 200–300 tool calls for complex research, coding, and web tasks across long contexts.
$0.60/1M tokens
text-to-text
Topaz Image Upscaler
Topaz Labs
Professional-grade image upscaling powered by AI, from Topaz Labs.
$0.05/image
image-to-image
Topaz Video Upscaler
Topaz Labs
Professional-grade video upscaling powered by AI, from Topaz Labs.
$0.04/sec
video-to-video
Veo 3.1
Google
Generates high-fidelity videos with native synced audio, offering strong narrative control, scene consistency, image-to-video animation, and multi-shot support.
$0.20/sec
multi-to-video
Sora 2 Pro
OpenAI
Generates high-quality 1080p videos up to 12s with synced native audio, multi-scene reasoning, timeline prompting, and realistic physics.
$0.30/sec
multi-to-video
Gemini 2.5 Flash Lite Preview
Google
Optimized for rapid, high-volume multimodal tasks with a 1M-token context window, delivering strong reasoning and cost efficiency for enterprise workflows.
$0.10/1M tokens
multi-to-text
Kling v2.5 - Image to Video
Kling
Transforms single images into smooth, cinematic videos with natural motion, realistic camera work like dolly zooms, and preserved style.
$0.07/sec
multi-to-video
Qwen Image Edit - 2509
Qwen
Fine-tunable
Delivers high-fidelity, controllable image editing with dual semantic and appearance modes, precise on-image text, multi-image composition, and robust identity preservation.
$0.03/image
image-to-image
nano-banana
Google
Generates photorealistic images with precise prompt and text rendering, mask-free editing, and layout-aware outpainting, ideal for creative and multilingual content.
$0.04/image
image-to-image
Seedream 4.0
ByteDance
Delivers ultra-fast, high-resolution image generation, precise natural-language editing, and consistent multi-image output—ideal for creative, batch, or professional workflows.
$0.03/image
text-to-image
Claude Sonnet 4.5
Anthropic
Anthropic's most advanced AI model, excelling in coding, agent-based tasks, and computer usage. It delivers high performance in reasoning, math, and domain-specific knowledge across fields like finance, law, and STEM.
$3.00/1M tokens
multi-to-text
Qwen Image Edit
Qwen
Fine-tunable
Enables precise bilingual text and semantic edits with strong consistency, advanced multi-image editing, and native pose/control support for creative compositions.
$0.03/image
image-to-image
Qwen3 VL 2B - Instruct
Qwen
Fine-tunable
Lightweight multimodal model for visual Q&A, multilingual OCR, document and UI understanding, and agentic screen interpretation in constrained environments.
$0.00/sec
multi-to-text
Qwen3 VL 4B - Instruct
Qwen
Fine-tunable
Multimodal LLM for text and images, excelling in visual QA, document/UI understanding, spatial reasoning, image captioning, and multimodal coding.
$0.00/sec
multi-to-text
Qwen3 VL 8B - Instruct
Qwen
Fine-tunable
A versatile multimodal large language model capable of understanding and generating both text and images. Built on the Qwen3 architecture, it provides strong general reasoning, detailed image interpretation, and instruction-following performance in a compact 8B parameter size.
$0.00/sec
multi-to-text
FLUX.1 [dev]
Black Forest Labs
Fine-tunable
Open-weight text-to-image model with advanced prompt adherence, anatomically accurate details, and powerful tools for inpainting, outpainting, and structural edits.
$0.03/image
text-to-image
GPT 5
OpenAI
Handles complex reasoning, code generation, and multimodal inputs with improved accuracy, long context retention, and robust multilingual and personalization features.
$1.25/1M tokens
text-to-text
GPT 5 Mini
OpenAI
Optimized for cost and speed, handles long contexts, supports text and image input, and excels at structured outputs and tool integration for precise tasks.
$0.25/1M tokens
text-to-text
GPT 5 Nano
OpenAI
Multimodal model optimized for ultra-fast, cost-efficient summarization and classification, supporting both text and image inputs with real-time streaming output.
$0.05/1M tokens
text-to-text
Claude Opus 4.1
Anthropic
Excels at complex coding, autonomous research, and agent workflows, with advanced reasoning and a 200,000-token context for deep analysis and synthesis.
$15.00/1M tokens
text-to-text
OpenAI GPT OSS 120B
OpenAI
Built with a Mixture-of-Experts design, delivers efficient, transparent reasoning, tool use, and agentic capabilities, even with 128K token context windows.
$0.15/1M tokens
text-to-text
OpenAI/GPT-OSS-20B
OpenAI
Fine-tunable
Delivers strong reasoning and chain-of-thought, agentic features, and multilingual support, optimized for local deployment and efficient use on modest hardware.
$0.07/1M tokens
text-to-text
Qwen Image
Qwen
Fine-tunable
An image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and supports a wide range of artistic styles, from photorealistic scenes to impressionist paintings and from anime aesthetics to minimalist design.
$0.03/image
text-to-image
Wan2.2 A14B - Text to Video
Alibaba Wan
Fine-tunable
Delivers high-fidelity text-to-video synthesis at 480p/720p using dual expert models for scene layout and fine motion detail, ideal for creative production.
$0.00/sec
text-to-video
Wan2.2 5B - Text to Video
Alibaba Wan
Fine-tunable
Unified text-to-video and image-to-video model generates high-definition 720p, 24fps video clips efficiently on consumer GPUs, with advanced compression for speed.
$0.00/sec
text-to-video
Gemini 2.5 Flash
Google
Fast, cost-efficient multimodal reasoning model with million-token context for high-volume applications requiring speed and versatility.
$0.30/1M tokens
multi-to-text
FLUX.1-Kontext [dev]
Black Forest Labs
Fine-tunable
Delivers precise, iterative image editing and generation with consistent character, style, and text changes—using multimodal input for seamless scene transformations.
$0.03/image
image-to-image
Claude Opus 4
Anthropic
Excels at deep reasoning, complex coding, and autonomous agent workflows with sustained performance, extended thinking, tool use, and memory across tasks.
$15.00/1M tokens
text-to-text
Claude Sonnet 4
Anthropic
Balances intelligence with efficiency for coding, research, and automation tasks; excels in reasoning, content generation, and nuanced instruction following.
$3.00/1M tokens
multi-to-text
Veo 3.0
Google
Generates realistic text- and image-conditioned videos with native synchronized audio, including dialogue, ambient sound, and effects.
$0.20/sec
multi-to-video
Gemini 2.5 Pro
Google
Excels at building interactive web apps, advanced code editing and agentic workflows, with native multimodality and strong video-to-code capabilities.
$1.25/1M tokens
multi-to-text
Qwen/Qwen3-4B
Qwen
Fine-tunable
Dual reasoning modes enable rapid or step-by-step responses, with robust support for over 100 languages and long-context processing up to 262,144 tokens.
$0.00/sec
text-to-text
Qwen/Qwen3-0.6B
Qwen
Fine-tunable
Efficient conversational AI for resource-limited devices with multilingual support, document summarization, translation, code generation, and simple information retrieval.
$0.00/sec
text-to-text
o3
OpenAI
Excels at advanced reasoning, coding, math, and visual tasks with simulated reasoning, tool use, web browsing, and image understanding integration.
$2.00/1M tokens
multi-to-text
o4 mini
OpenAI
Optimized for fast, affordable reasoning with strong coding and visual skills, large 200k-token context, and efficient handling of complex tasks.
$1.10/1M tokens
multi-to-text
GPT 4.1
OpenAI
Excels in coding and instruction following with million-token context window, enabling superior performance on complex, multi-step tasks.
$2.00/1M tokens
multi-to-text
GPT 4.1 mini
OpenAI
Powerful mid-sized model with GPT-4o-level performance at lower cost and latency, featuring a 1 million token context window for complex tasks.
$0.40/1M tokens
multi-to-text
GPT 4.1 nano
OpenAI
OpenAI's fastest, cost-effective model with full 1 million token context, optimized for classification, autocompletion, and real-time AI agent tasks.
$0.10/1M tokens
multi-to-text
Mistral Small 3.1
Mistral AI
A lightweight, versatile 24B multimodal model handling text and images with extensive multilingual support and 128k token context window.
$0.10/1M tokens
text-to-text
Gemma 3 27B
Google
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
$0.20/1M tokens
text-to-text
Perplexity Sonar Deep Research
Perplexity
Performs exhaustive, multi-step research by autonomously searching and synthesizing hundreds of sources into detailed, expert-level reports across domains.
$5.00/1M tokens
text-to-text
Perplexity Sonar Reasoning Pro
Perplexity
Premium reasoning model for complex, multi-step analysis. Delivers detailed explanations, real-time web search, and double citations for thorough answers.
$3.00/1M tokens
text-to-text
Llama 3.2 1B Instruct
Meta
Fine-tunable
Efficient, multilingual instruction-tuned model designed for privacy-focused, on-device dialogue, summarization, and agentic retrieval across mobile and edge platforms.
$0.00/sec
text-to-text
Wan2.1 14B - Text to Video
Alibaba Wan
Fine-tunable
Generates high-fidelity, temporally consistent videos from text or images, with readable English and Chinese text, sound effects, and customizable aspect ratios.
$0.00/sec
text-to-video
Perplexity Sonar
Perplexity
Optimized for search-augmented tasks, delivering fast, accurate answers with real-time web data and detailed citations. Excels in research and fact-checking.
$2.00/1M tokens
text-to-text
Perplexity Sonar Pro
Perplexity
Excels at complex, multi-step queries with real-time web search, detailed answers, extensive citations, and customizable information retrieval.
$5.00/1M tokens
text-to-text
Ministral 8B
Mistral AI
Efficient edge model with native function calling and interleaved sliding-window attention for fast, memory-efficient processing in resource-constrained environments.
$0.10/1M tokens
text-to-text
Pixtral 12B
Mistral AI
Multimodal model handling text and images at native resolution with 128K context window, excelling in visual reasoning tasks like document analysis and image captioning.
$0.15/1M tokens
text-to-text
GPT 4o mini
OpenAI
Cost-efficient, fast model with 128K context window, supporting text/vision inputs and improved multilingual performance.
$0.15/1M tokens
multi-to-text
Text Embedding 004
Google
Generates vector representations capturing semantic meaning/context for tasks like semantic search, text classification, and clustering. Multilingual support with versatile applications.
$0.02/1M tokens
text-to-embeddings
GPT 4o
OpenAI
Multimodal LLM for real-time text, audio, and visual processing with multilingual support, emotional audio responses, and image generation.
$2.50/1M tokens
multi-to-text
Mixtral 8x22B
Mistral AI
Efficient Sparse MoE architecture with 39B active parameters, excels in multilingual tasks, math, coding, and handles 64K token contexts.
$2.00/1M tokens
text-to-text
Mistral Large 2
Mistral AI
Powerful LLM with 123B parameters, excelling in multilingual tasks, coding, and reasoning, optimized for single-node inference and long-context applications.
$2.00/1M tokens
text-to-text
Text Embedding 3 - Large
OpenAI
Generates high-quality embeddings for complex text analysis and multilingual applications with 8,191 token context.
$0.13/1M tokens
text-to-embeddings
Text Embedding 3 - Small
OpenAI
Generates compact, efficient embeddings for NLP tasks with multilingual support, balancing performance and low latency.
$0.02/1M tokens
text-to-embeddings
Mixtral 8x7B
Mistral AI
Efficient Mixture of Experts (8 experts) with 13B active parameters, optimized for multilingual tasks and cost-performance balance.
$0.70/1M tokens
text-to-text
Mistral 7B
Mistral AI
Balanced performance in natural language and code tasks, efficiently handling longer sequences with innovative attention mechanisms.
$0.25/1M tokens
text-to-text
Gemini 2.0 Flash
Google
Multimodal LLM for agentic applications, handling real-time data integration and multi-step tasks with enhanced reasoning via Thinking Mode, integrating Google tools and third-party functions.
$0.10/1M tokens
multi-to-text
Ministral 3B
Mistral AI
Optimized for edge computing with function-calling capabilities, excelling in knowledge retrieval and commonsense reasoning with 128k token context.
$0.04/1M tokens
text-to-text
o1
OpenAI
Specializes in complex reasoning through chain-of-thought processing, excelling in STEM tasks like coding, math, and scientific analysis.
$15.00/1M tokens
text-to-text
o3 mini
OpenAI
Optimized for STEM reasoning and problem-solving, excelling in complex tasks like advanced math and coding with improved cost efficiency.
$1.10/1M tokens
text-to-text
Qwen/Qwen3-1.7B
Qwen
Fine-tunable
Efficiently generates multilingual text and code, with dual modes for rapid chat or detailed reasoning; ideal for lightweight AI, agents, and education.
$0.00/sec
text-to-text
Wan2.1 1.3B - Text to Video
Alibaba Wan
Fine-tunable
Generates 480P videos from text prompts on consumer GPUs, with multilingual support, image-to-video, aspect ratio control, and audio integration features.
$0.00/sec
text-to-video