Model Library
Choose the right model, get to the perfect prompt.
95 models. New models added every week.
Model Types: Text, Image, Video, Embeddings
Fine-tuning
Fine-tunable: 18
Modalities
Image to Image: 9
Image to Text: 22
Text to Embeddings: 3
Text to Image: 7
Text to Text: 75
Text to Video: 3
Video to Text: 1
Developers
OpenAI: 21
Qwen: 16
Google: 11
Mistral: 9
Anthropic: 8
Meta: 8
Black Forest Labs: 5
DeepSeek: 5
Perplexity: 5
Alibaba: 3
Moonshot AI: 2
ByteDance: 1
CMU: 1
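The filter counts above can be cross-checked with a little arithmetic. The sketch below is a minimal illustration, assuming (the page does not say so) that every model is attributed to exactly one developer and may carry more than one modality tag. Under that assumption the developer counts should add up to the 95 models advertised, while the modality counts can exceed it. The variable names are hypothetical.

```python
# Counts copied from the Modalities and Developers filters above.
modality_counts = {
    "Image to Image": 9,
    "Image to Text": 22,
    "Text to Embeddings": 3,
    "Text to Image": 7,
    "Text to Text": 75,
    "Text to Video": 3,
    "Video to Text": 1,
}
developer_counts = {
    "OpenAI": 21, "Qwen": 16, "Google": 11, "Mistral": 9,
    "Anthropic": 8, "Meta": 8, "Black Forest Labs": 5,
    "DeepSeek": 5, "Perplexity": 5, "Alibaba": 3,
    "Moonshot AI": 2, "ByteDance": 1, "CMU": 1,
}
TOTAL_MODELS = 95  # headline figure from the page

# Assumption: one developer per model, so these should sum to the headline.
developer_total = sum(developer_counts.values())

# Assumption: a model may have several modality tags, so this can be larger.
modality_total = sum(modality_counts.values())
extra_tags = modality_total - TOTAL_MODELS

print(developer_total, modality_total, extra_tags)  # prints: 95 120 25
```

The developer counts match the headline exactly, and the 25 surplus modality tags are consistent with entries such as Claude Opus 4.5, which is listed as both text-to-text and image-to-text.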
FLUX.2 [dev]
Generates photorealistic images with precise multi-reference editing, excels at legible text and infographics, and supports rapid LoRA fine-tuning workflows.
text-to-image
image-to-image
FLUX.2 [pro]
Delivers photorealistic, high-resolution images with advanced multi-reference editing, precise pose and color control, and reliable prompt and text adherence for professionals.
text-to-image
image-to-image
FLUX.2 [flex]
Delivers high-quality image generation and editing with advanced text rendering, multi-image reference for style consistency, and precise, JSON-based prompt control.
text-to-image
image-to-image
Claude Opus 4.5
Excels at long-horizon reasoning, advanced coding, dynamic effort control, robust multimodal tasks, and detailed computer interface inspection for complex workflows.
text-to-text
image-to-text
Nano Banana Pro
Delivers high-fidelity images with advanced text rendering, consistent character identities, and precise prompt following for professional visual design and branding.
image-to-image
Gemini 3 Pro Preview
Excels at deep multimodal reasoning, complex coding, and advanced tool use across text, images, audio, and video with a vast 1M token context.
text-to-text
image-to-text
GPT 5.1
Automatically routes prompts to fast or deep reasoning modes, with adaptive effort, enhanced tone and style controls, and improved coding and math.
text-to-text
image-to-text
moonshotai/Kimi-K2-Thinking
Enables autonomous, step-by-step reasoning and orchestration of 200–300 tool calls for complex research, coding, and web tasks across long contexts.
text-to-text
Gemini 2.5 Flash Preview
Delivers rapid, cost-effective multimodal responses with flexible reasoning, supports text, image, and native audio output, and adapts style or tone via prompts.
text-to-text
image-to-text
Gemini 2.5 Flash Lite Preview
Optimized for rapid, high-volume multimodal tasks with a 1M-token context window, delivering strong reasoning and cost efficiency for enterprise workflows.
text-to-text
image-to-text
Qwen/Qwen-Image-Edit-Plus
Delivers high-fidelity, controllable image editing with dual semantic and appearance modes, precise on-image text, multi-image composition, and robust identity preservation.
image-to-image
nano-banana
Generates photorealistic images with precise prompt and text rendering, mask-free editing, and layout-aware outpainting, ideal for creative and multilingual content.
image-to-image
Seedream 4.0
Delivers ultra-fast, high-resolution image generation, precise natural-language editing, and consistent multi-image output—ideal for creative, batch, or professional workflows.
text-to-image
Claude Sonnet 4.5
Anthropic's most advanced AI model, excelling in coding, agent-based tasks, and computer usage. It delivers high performance in reasoning, math, and domain-specific knowledge across fields like finance, law, and STEM.
text-to-text
image-to-text
Qwen/Qwen-Image-Edit
Enables precise bilingual text and semantic edits with strong consistency, advanced multi-image editing, and native pose/control support for creative compositions.
image-to-image
Qwen/Qwen3-VL-8B-Instruct
Versatile multimodal large language model that understands text, images, and video and generates text. Built on the Qwen3 architecture, it provides strong general reasoning, detailed image interpretation, and instruction-following performance in a compact 8B parameter size.
text-to-text
image-to-text
video-to-text
black-forest-labs/FLUX.1-dev
Open-weight text-to-image model with advanced prompt adherence, anatomically accurate details, and powerful tools for inpainting, outpainting, and structural edits.
text-to-image
GPT 5 Mini
Optimized for cost and speed, handles long contexts, supports text and image input, and excels at structured outputs and tool integration for precise tasks.
text-to-text
GPT 5 Nano
Multimodal model optimized for ultra-fast, cost-efficient summarization and classification, supporting both text and image inputs with real-time streaming output.
text-to-text
GPT 5
Handles complex reasoning, code generation, and multimodal inputs with improved accuracy, long context retention, and robust multilingual and personalization features.
text-to-text
OpenAI/GPT-OSS-20B
Delivers strong reasoning and chain-of-thought, agentic features, and multilingual support, optimized for local deployment and efficient use on modest hardware.
text-to-text
OpenAI GPT OSS 120B
Built with a Mixture-of-Experts design, delivers efficient, transparent reasoning, tool use, and agentic capabilities with a 128K-token context window.
text-to-text
Claude Opus 4.1
Excels at complex coding, autonomous research, and agent workflows, with advanced reasoning and a 200,000-token context for deep analysis and synthesis.
text-to-text
Qwen/Qwen-Image
Excels at multilingual text rendering and precise image editing, supporting ControlNet guidance and diverse styles from photorealistic to anime and minimalist.
text-to-image
Wan-AI/Wan2.2-T2V-A14B-Diffusers
Delivers high-fidelity text-to-video synthesis at 480p/720p using dual expert models for scene layout and fine motion detail, ideal for creative production.
text-to-video
Qwen 3 Coder 480B
Designed for complex coding and agentic workflows, it excels at understanding large codebases and supports 262K-token context for advanced software engineering.
text-to-text
Qwen3 Coder 480B (A35B) Instruct
Enables advanced agentic coding, large-scale code analysis, and tool integration with up to 256K native context for repository-level tasks and automation.
text-to-text
Kimi K2 Instruct
Handles complex reasoning, coding, and tool use with 128K token context, enabling autonomous problem-solving and workflow automation.
text-to-text
Gemini 2.5 Flash
Fast, cost-efficient multimodal reasoning model with a million-token context for high-volume applications requiring speed and versatility.
text-to-text
image-to-text
black-forest-labs/FLUX.1-Kontext-dev
Delivers precise, iterative image editing and generation with consistent character, style, and text changes—using multimodal input for seamless scene transformations.
image-to-image
Claude Opus 4
Excels at deep reasoning, complex coding, and autonomous agent workflows with sustained performance, extended thinking, tool use, and memory across tasks.
text-to-text
Claude Sonnet 4
Balances intelligence with efficiency for coding, research, and automation tasks; excels in reasoning, content generation, and nuanced instruction following.
text-to-text
image-to-text
Gemini 2.5 Pro
Excels at building interactive web apps, advanced code editing and agentic workflows, with native multimodality and strong video-to-code capabilities.
text-to-text
image-to-text
Qwen/Qwen3-4B
Dual reasoning modes enable rapid or step-by-step responses, with robust support for over 100 languages and long-context processing up to 262,144 tokens.
text-to-text
Qwen 3 235B-A22B
MoE model with 22B active parameters featuring dual thinking modes for complex reasoning and efficient conversation across 100+ languages.
text-to-text
Qwen 3 30B-A3B
MoE architecture with 3.3B active parameters, balancing efficiency with strong reasoning, multilingual capabilities, and specialized thinking mode.
text-to-text
Qwen/Qwen3-0.6B
Efficient conversational AI for resource-limited devices with multilingual support, document summarization, translation, code generation, and simple information retrieval.
text-to-text
o4 mini
Optimized for fast, affordable reasoning with strong coding and visual skills, large 200k-token context, and efficient handling of complex tasks.
text-to-text
image-to-text
o3
Excels at advanced reasoning, coding, math, and visual tasks with simulated reasoning, tool use, web browsing, and image understanding integration.
text-to-text
image-to-text
GPT 4.1 mini
Powerful mid-sized model with GPT-4o-level performance at lower cost and latency, featuring a 1 million token context window for complex tasks.
text-to-text
image-to-text
GPT 4.1
Excels in coding and instruction following with a million-token context window, enabling superior performance on complex, multi-step tasks.
text-to-text
image-to-text
GPT 4.1 nano
OpenAI's fastest, cost-effective model with full 1 million token context, optimized for classification, autocompletion, and real-time AI agent tasks.
text-to-text
image-to-text
Llama 4 Scout
Multimodal model specializing in multilingual text and image analysis, multi-document summarization, codebase reasoning, and highly personalized task automation.
text-to-text
Llama 4 Maverick
Multimodal model with 17B active parameters, excelling at text, image, code, and multilingual tasks. Supports 1M-token context for advanced enterprise use.
text-to-text
image-to-text
Qwen2.5-VL 32B Instruct
Handles advanced visual recognition, complex analysis of images and videos, structured data extraction, agentic tool use, and robust multilingual reasoning.
image-to-text
Deepseek V3
Delivers advanced reasoning, code generation, and mathematical skills, processes long inputs efficiently, and accelerates results with innovative Mixture-of-Experts design.
text-to-text
Mistral Small 3.1
A lightweight, versatile 24B multimodal model handling text and images with extensive multilingual support and 128k token context window.
text-to-text
Gemma 3 27B
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.
text-to-text
Perplexity Sonar Deep Research
Performs exhaustive, multi-step research by autonomously searching and synthesizing hundreds of sources into detailed, expert-level reports across domains.
text-to-text
Perplexity Sonar Reasoning Pro
Premium reasoning model for complex, multi-step analysis. Delivers detailed explanations, real-time web search, and double citations for thorough answers.
text-to-text
Llama 3.2 1B Instruct
Efficient, multilingual instruction-tuned model designed for privacy-focused, on-device dialogue, summarization, and agentic retrieval across mobile and edge platforms.
text-to-text
Wan-AI/Wan2.1-T2V-14B-Diffusers
Generates high-fidelity, temporally consistent videos from text or images, with readable English and Chinese text, sound effects, and customizable aspect ratios.
text-to-video
Claude 3.7 Sonnet
A hybrid reasoning model with standard and extended thinking modes, delivering twice the speed and exceptional performance in coding and problem-solving tasks.
text-to-text
image-to-text
Perplexity Sonar Reasoning
Fast reasoning model with real-time web search, chain-of-thought capabilities, and citation support. Excels at complex queries with quick, accurate responses.
text-to-text
Perplexity Sonar
Optimized for search-augmented tasks, delivering fast, accurate answers with real-time web data and detailed citations. Excels in research and fact-checking.
text-to-text
Perplexity Sonar Pro
Excels at complex, multi-step queries with real-time web search, detailed answers, extensive citations, and customizable information retrieval.
text-to-text
Deepseek R1
Employs a massive Mixture-of-Experts architecture and Multi-head Latent Attention to deliver advanced, polished reasoning and problem-solving across math, code, and more.
text-to-text
Deepseek R1
An open-source reasoning model using Mixture-of-Experts architecture, delivering powerful math and code capabilities comparable to OpenAI's o1.
text-to-text
Deepseek V3
Powers advanced reasoning, code generation, and multilingual tasks with efficient MoE architecture and enhanced multi-token prediction for faster, optimized results.
text-to-text
Llama 3.3 70B Instruct
Optimized for dialogue with strong reasoning, multilingual support, and efficient performance approaching larger models.
text-to-text
QwQ 32B Preview
Specialized in advanced reasoning and problem-solving, excelling in mathematics and programming with a 32B parameter transformer architecture.
text-to-text
Qwen 2.5 Coder 32B Instruct
Specializes in code generation, reasoning, and fixing with 128K token context, open-source licensing, and local deployment capabilities.
text-to-text
Llama 3.2 3B Instruct
Optimized for multilingual dialogue, agentic tasks, and efficient on-device use, supporting eight languages and a 128K context for privacy-focused applications.
text-to-text
Claude 3.5 Sonnet
Powerful AI with exceptional coding abilities, twice the speed of Claude 3 Opus, and advanced reasoning for complex software development tasks.
text-to-text
Claude 3.5 Haiku
Anthropic's fastest model offering advanced coding, tool use, and reasoning capabilities with rapid response times for real-time applications and personalized experiences.
text-to-text
Ministral 8B
Efficient edge model with native function calling and interleaved sliding-window attention for fast, memory-efficient processing in resource-constrained environments.
text-to-text
Qwen2.5 1.5B Instruct
Qwen2.5 is the latest series of Qwen large language models, released as base and instruction-tuned models ranging from 0.5 to 72 billion parameters.
text-to-text
Qwen2.5 72B Instruct
Instruction-tuned LLM excelling in long-context processing (131K tokens), multilingual support (29+ languages), and structured data handling.
text-to-text
o1 preview
Reasoning-focused LLM for complex science, math, and coding tasks, generating detailed thought processes before responses.
text-to-text
o1 mini
Optimized for coding and math with Chain-of-Thought reasoning, offering fast, cost-efficient responses for complex problem-solving.
text-to-text
Pixtral 12B
Multimodal model handling text and images at native resolution with 128K context window, excelling in visual reasoning tasks like document analysis and image captioning.
text-to-text
Llama 3.1 70B Instruct
Multilingual LLM excelling in question answering, reasoning, code generation, and synthetic data generation.
text-to-text
Llama 3.1 8B Instruct
Optimized for multilingual dialogue and instruction following, supporting eight major languages for chat, text generation, and translation tasks.
text-to-text
Llama 3.1 405B Instruct
Optimized for multilingual dialogue with 128k context, instruction-tuned via SFT/RLHF, and enhanced with synthetic data for safety and performance.
text-to-text
Mistral Nemo
Handles long-form content with 128k token context, excels in multilingual tasks, coding, and function calling via natural language.
text-to-text
GPT 4o mini
Cost-efficient, fast model with 128K context window, supporting text/vision inputs and improved multilingual performance.
text-to-text
image-to-text
Text Embedding 004
Generates vector representations that capture semantic meaning and context for tasks such as semantic search, text classification, and clustering, with multilingual support.
text-to-embeddings
GPT 4o
Multimodal LLM for real-time text, audio, and visual processing with multilingual support, emotional audio responses, and image generation.
text-to-text
image-to-text
Mixtral 8x22B
Efficient Sparse MoE architecture with 39B active parameters, excels in multilingual tasks, math, coding, and handles 64K token contexts.
text-to-text
Mistral Large 2
Powerful LLM with 123B parameters, excelling in multilingual tasks, coding, and reasoning, optimized for single-node inference and long-context applications.
text-to-text
Text Embedding 3 - Large
Generates high-quality embeddings for complex text analysis and multilingual applications with 8,191 token context.
text-to-embeddings
Text Embedding 3 - Small
Generates compact, efficient embeddings for NLP tasks with multilingual support, balancing performance and low latency.
text-to-embeddings
Mixtral 8x7B
Efficient Mixture of Experts (8 experts) with 13B active parameters, optimized for multilingual tasks and cost-performance balance.
text-to-text
DALL-E 3
Translates nuanced text prompts into detailed, accurate images with automatic prompt rewriting, multiple aspect ratios, and ChatGPT integration for creative workflows.
text-to-image
Mistral 7B
Balanced performance in natural language and code tasks, efficiently handling longer sequences with innovative attention mechanisms.
text-to-text
Open Pose
OpenPose is a real-time computer vision model that detects and maps body, hand, and facial keypoints for multiple people simultaneously using convolutional neural networks.
image-to-image
Wan-AI/Wan2.1-T2V-1.3B-Diffusers
Generates 480P videos from text prompts on consumer GPUs, with multilingual support, image-to-video, aspect ratio control, and audio integration features.
text-to-video
Qwen/Qwen3-1.7B
Efficiently generates multilingual text and code, with dual modes for rapid chat or detailed reasoning; ideal for lightweight AI, agents, and education.
text-to-text
Gemini 2.0 Flash
Multimodal LLM for agentic applications, handling real-time data integration and multi-step tasks with enhanced reasoning via Thinking Mode, integrating Google tools and third-party functions.
text-to-text
image-to-text
Ministral 3B
Optimized for edge computing with function-calling capabilities, excelling in knowledge retrieval and commonsense reasoning with 128k token context.
text-to-text
o1
Specializes in complex reasoning through chain-of-thought processing, excelling in STEM tasks like coding, math, and scientific analysis.
text-to-text
Deepseek R1 Distill Llama 70B
Delivers strong mathematical and coding abilities, matching the performance of larger models while using efficient distillation and multilingual support.
text-to-text
GPT 4.5
Excels in natural conversation and creative tasks with improved emotional intelligence and multilingual support, prioritizing intuitive interactions over structured reasoning.
text-to-text
Gemini 2.0 Flash Lite
Cost-efficient multimodal LLM for real-time tasks with a 1M-token input context.
text-to-text
image-to-text
o3 mini
Optimized for STEM reasoning and problem-solving, excelling in complex tasks like advanced math and coding with improved cost efficiency.
text-to-text
Copyright © 2025 Oxen Labs, Inc., All Rights Reserved