Model Inference
Choose the right model, get to the perfect prompt.
88 models on 11 inference providers. New models added every week.

Claude Opus 4
May 2025
Excels at deep reasoning, complex coding, and autonomous agent workflows with sustained performance, extended thinking, tool use, and memory across tasks.
Input: $15.00 / Output: $75.00
Claude Sonnet 4
May 2025
Balances intelligence with efficiency for coding, research, and automation tasks; excels in reasoning, content generation, and nuanced instruction following.
Input: $3.00 / Output: $15.00
Gemini 2.5 Pro Preview
May 2025
Excels at building interactive web apps, advanced code editing and agentic workflows, with native multimodality and strong video-to-code capabilities.

Input: $1.25 / Output: $10.00
Qwen 3 235B-A22B
Apr 2025
MoE model with 22B active parameters featuring dual thinking modes for complex reasoning and efficient conversation across 100+ languages.

Input: $0.22 / Output: $0.88
Qwen 3 30B-A3B
Apr 2025
MoE architecture with 3.3B active parameters, balancing efficiency with strong reasoning, multilingual capabilities, and specialized thinking mode.

Input: $0.90 / Output: $0.90
Gemini 2.5 Flash Preview
Apr 2025
A thinking model offering enhanced reasoning with controllable "thinking" capabilities, balancing speed, cost, and performance for developers.

Input: $0.15 / Output: $3.50
o4 mini
Apr 2025
Optimized for fast, affordable reasoning with strong coding and visual skills, large 200k-token context, and efficient handling of complex tasks.
Input: $1.10 / Output: $4.40
o3
Apr 2025
Excels at advanced reasoning, coding, math, and visual tasks with simulated reasoning, tool use, web browsing, and image understanding integration.
Input: $2.00 / Output: $8.00
GPT 4.1 nano
Apr 2025
OpenAI's fastest, cost-effective model with full 1 million token context, optimized for classification, autocompletion, and real-time AI agent tasks.
Input: $0.10 / Output: $0.40
GPT 4.1 mini
Apr 2025
Powerful mid-sized model with GPT-4o-level performance at lower cost and latency, featuring a 1 million token context window for complex tasks.
Input: $0.40 / Output: $1.60
GPT 4.1
Apr 2025
Excels in coding and instruction following with million-token context window, enabling superior performance on complex, multi-step tasks.
Input: $2.00 / Output: $8.00
Llama 4 Maverick
Apr 2025
Multimodal model with 17B active parameters, excelling at text, image, code, and multilingual tasks. Supports 1M-token context for advanced enterprise use.

Input: $0.22 / Output: $0.88
Llama 4 Scout
Apr 2025
Efficient multimodal MoE model with 10M-token context, excelling at multi-document analysis, codebase reasoning, and image understanding across 12 languages.
Input: $0.18 / Output: $0.59
Llama 4 Scout
Apr 2025
Multimodal model with 10M-token context, efficient MoE design, and strong performance in text, image, code, and multilingual reasoning tasks.

Input: $0.15 / Output: $0.60
Llama 4 Maverick
Apr 2025
Multimodal model with 400B parameters, 128 experts, and 1M context; excels at multilingual text/image reasoning, coding, and enterprise-scale applications.
Input: $0.27 / Output: $0.85
Deepseek V3
Mar 2025
Delivers advanced reasoning, code generation, and mathematical skills, processes long inputs efficiently, and accelerates results with innovative Mixture-of-Experts design.
Input: $0.27 / Output: $1.10
Mistral Small 3.1
Mar 2025
A lightweight, versatile 24B multimodal model handling text and images with extensive multilingual support and 128k token context window.
Input: $0.10 / Output: $0.30
Gemma 3 27B
Mar 2025
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning.

Input: $0.20 / Output: $0.40
Gemini 2.0 Pro
Mar 2025
Input: $1.25 / Output: $5.00
Perplexity Sonar Deep Research
Mar 2025
Performs exhaustive, multi-step research by autonomously searching and synthesizing hundreds of sources into detailed, expert-level reports across domains.
Input: $5.00 / Output: $15.00
Perplexity Sonar Reasoning Pro
Mar 2025
Premium reasoning model for complex, multi-step analysis. Delivers detailed explanations, real-time web search, and double citations for thorough answers.
Input: $3.00 / Output: $10.00
QwQ 32B
Mar 2025
Reasoning-focused LLM excelling in complex tasks like math and coding. Matches larger models' performance while running efficiently on consumer hardware.
Input: $1.20 / Output: $1.20
Claude 3.7 Sonnet
Feb 2025
A hybrid reasoning model with standard and extended thinking modes, delivering twice the speed and exceptional performance in coding and problem-solving tasks.
Input: $3.00 / Output: $15.00
Perplexity Sonar Reasoning
Jan 2025
Fast reasoning model with real-time web search, chain-of-thought capabilities, and citation support. Excels at complex queries with quick, accurate responses.
Input: $2.00 / Output: $6.00
Perplexity Sonar
Jan 2025
Optimized for search-augmented tasks, delivering fast, accurate answers with real-time web data and detailed citations. Excels in research and fact-checking.
Input: $2.00 / Output: $2.00
Perplexity Sonar Pro
Jan 2025
Excels at complex, multi-step queries with real-time web search, detailed answers, extensive citations, and customizable information retrieval.
Input: $5.00 / Output: $20.00
Deepseek R1
Jan 2025
Employs a massive Mixture-of-Experts architecture and Multi-head Latent Attention to deliver advanced, polished reasoning and problem-solving across math, code, and more.

Input: $3.00 / Output: $8.00
Deepseek R1 (FP8)
Jan 2025
Excels at step-by-step reasoning and code generation, delivering transparent, structured answers through reinforcement learning and Mixture of Experts.
Input: $7.00 / Output: $7.00
Deepseek R1
Jan 2025
An open-source reasoning model using Mixture-of-Experts architecture, delivering powerful math and code capabilities comparable to OpenAI's o1.
Input: $0.55 / Output: $2.19
Deepseek V3 (FP8)
Dec 2024
Excels at efficient reasoning and code generation, leveraging large-scale mixture-of-experts architecture with advanced multi-token prediction and training innovations.
Input: $1.25 / Output: $1.25
Deepseek V3
Dec 2024
Powers advanced reasoning, code generation, and multilingual tasks with efficient MoE architecture and enhanced multi-token prediction for faster, optimized results.

Input: $0.75 / Output: $3.00
Hermes 3 8B
Dec 2024
Advanced agentic capabilities, roleplaying, reasoning, multi-turn conversation, long context coherence, and code generation with structured outputs.
Input: $0.03 / Output: $0.03
Hermes 3 70B
Dec 2024
Advanced agentic capabilities with strong roleplaying, reasoning, and structured output generation for technical tasks.
Input: $0.20 / Output: $0.20
Llama 3.3 70B Instruct
Dec 2024
Optimized for dialogue with strong reasoning, multilingual support, and efficient performance approaching larger models.

Input: $0.90 / Output: $0.90
Llama 3.3 70B Speculative Decoding
Dec 2024
Optimized for speed via speculative decoding; excels in reasoning, coding, and complex tasks while maintaining high efficiency.

Input: $0.59 / Output: $0.59
Llama 3.3 70B Instruct Turbo
Dec 2024
Excels in question answering, reasoning, and code generation, with use cases including synthetic data creation and evaluating smaller model outputs.
Input: $0.88 / Output: $0.88
QwQ 32B Preview
Nov 2024
This experimental model excels in math, coding, and scientific reasoning, developed by Alibaba's Qwen Team to advance AI analytical capabilities.
Input: $1.20 / Output: $1.20
QwQ 32B Preview
Nov 2024
Specialized in advanced reasoning and problem-solving, excelling in mathematics and programming with a 32B parameter transformer architecture.

Input: $0.90 / Output: $0.90
Qwen 2.5 Coder 32B Instruct
Nov 2024
Specializes in code generation, reasoning, and fixing across 40+ languages, matching GPT-4o's coding capabilities with 128k token context.
Input: $0.80 / Output: $0.80
Qwen 2.5 Coder 32B Instruct
Nov 2024
Specializes in code generation, reasoning, and fixing with 128K token context, open-source licensing, and local deployment capabilities.

Input: $0.90 / Output: $0.90
Llama 3.2 1B
Oct 2024
Lightweight model optimized for edge/mobile devices; excels in multilingual retrieval and summarization tasks with real-time processing and enhanced privacy.

Input: $0.04 / Output: $0.04
Llama 3.1 8B
Oct 2024
Multilingual dialogue model optimized for tool integration and safety, with 128K context length for extended interactions.

Input: $0.05 / Output: $0.08
Claude 3.5 Sonnet
Oct 2024
Powerful AI with exceptional coding abilities, twice the speed of previous versions, and advanced reasoning for complex software development tasks.
Input: $3.00 / Output: $15.00
Claude 3.5 Haiku
Oct 2024
Anthropic's fastest model offering advanced coding, tool use, and reasoning capabilities with rapid response times for real-time applications and personalized experiences.
Input: $0.80 / Output: $4.00
Ministral 8B
Oct 2024
Efficient edge model with native function calling and interleaved sliding-window attention for fast, memory-efficient processing in resource-constrained environments.
Input: $0.10 / Output: $0.10
Llama 3.1 Nemotron 70B Instruct
Oct 2024
Customized by NVIDIA to enhance helpfulness, this model excels in instruction-following tasks through human preference alignment and improved response relevance.
Input: $0.90 / Output: $0.90
Qwen2.5 1.5B Instruct
Sep 2024
text → text
Part of the Qwen2.5 series of large language models, which spans base and instruction-tuned models ranging from 0.5 to 72 billion parameters.

$0.0005844
Llama 3.2 11B Vision
Sep 2024
image → text
Multimodal model processing text and images for visual reasoning, captioning, and document analysis with cross-attention architecture.

Input: $0.18 / Output: $0.18
Llama 3.2 3B
Sep 2024
Efficient for mobile/edge devices; excels in text summarization, classification, and translation. Ideal for AI writing assistants and customer service applications.

Input: $0.06 / Output: $0.06
Llama 3.2 3B Instruct Turbo
Sep 2024
Optimized for multilingual instruction-following tasks, balancing efficiency and performance in dialogue, summarization, and agentic applications with 3B parameters and scalable architecture.
Input: $0.06 / Output: $0.06
Llama 3.2 90B Vision (Preview)
Sep 2024
image → text
Multimodal model for visual reasoning and image analysis; excels in coding, math, and multilingual tasks with 128k token context.

Input: $0.90 / Output: $0.90
Gemini 1.5 Flash - 8B
Sep 2024
Optimized for high-volume, cost-effective tasks with multimodal input support, excelling in transcription and long-context processing.

Input: $0.038 / Output: $0.15
Qwen2.5 72B Instruct
Sep 2024
Instruction-tuned LLM excelling in long-context processing (131K tokens), multilingual support (29+ languages), and structured data handling.

Input: $0.90 / Output: $0.90
o1 mini
Sep 2024
Optimized for coding and math with Chain-of-Thought reasoning, offering fast, cost-efficient responses for complex problem-solving.
Input: $3.00 / Output: $12.00
o1 preview
Sep 2024
Reasoning-focused LLM for complex science, math, and coding tasks, generating detailed thought processes before responses.
Input: $15.00 / Output: $60.00
Pixtral 12B
Sep 2024
Multimodal model handling text and images at native resolution with 128K context window, excelling in visual reasoning tasks like document analysis and image captioning.
Input: $0.15 / Output: $0.15
Hermes 3 405B
Aug 2024
Advanced agentic capabilities with enhanced reasoning, roleplaying, and multi-turn conversation handling. Excels in structured output and long-context coherence.
Input: $0.90 / Output: $0.90
Llama 3.1 70B Instruct
Jul 2024
Multilingual LLM excelling in question answering, reasoning, code generation, and synthetic data generation.

Input: $0.90 / Output: $0.90
Llama 3.3 70B Versatile 128k
Jul 2024
Excels in multilingual tasks, tool use, coding, and reasoning with improved accuracy and efficient performance.

Input: $0.59 / Output: $0.79
Llama 3.1 8B Instruct Turbo
Jul 2024
Excels in multilingual dialogue and long-form text processing with strong reasoning for conversational agents and coding assistance.
Input: $0.18 / Output: $0.18
Llama 3.1 8B Instruct
Jul 2024
Optimized for multilingual dialogue with 128k context length; excels in chat, text generation, and language translation.

Input: $0.20 / Output: $0.20
Llama 3.1 405B Instruct
Jul 2024
Optimized for multilingual dialogue with 128k context, instruction-tuned via SFT/RLHF, and enhanced with synthetic data for safety and performance.

Input: $3.00 / Output: $3.00
Llama 3.1 405B Instruct Turbo
Jul 2024
Instruction-tuned LLM excelling in multilingual dialogue, synthetic data generation, and model distillation with 131k token context for complex tasks.
Input: $3.50 / Output: $3.50
Llama 3.1 70B Instruct Turbo
Jul 2024
Optimized for multilingual dialogue and long-context tasks, this model excels in production-scale applications with advanced inference capabilities and a 128k token context window.
Input: $0.88 / Output: $0.88
Mistral Nemo
Jul 2024
Handles long-form content with 128k token context; excels in multilingual tasks, coding, and function calling via natural language.
Input: $0.15 / Output: $0.15
GPT 4o mini
Jul 2024
Cost-efficient, fast model with 128K context window, supporting text/vision inputs and improved multilingual performance.
Input: $0.15 / Output: $0.60
Gemma 2 9B Instruct
Jun 2024
Efficient 9B parameter model trained on diverse web, code, and math data, excelling in coding and mathematical tasks.

Input: $0.20 / Output: $0.20
Codestral 2405
May 2024
Specializes in code generation with 32k token context, excelling in completion, debugging, and optimization across 80+ languages.
Input: $0.20 / Output: $0.60
Gemini 1.5 Pro
May 2024
Multimodal LLM with 2M token context; excels in complex reasoning, coding, and multimodal Q&A across text, images, audio, and video.

Input: $1.25 / Output: $5.00
Text Embedding 004
May 2024
text → embeddings
Generates vector representations capturing semantic meaning/context for tasks like semantic search, text classification, and clustering. Multilingual support with versatile applications.

Input: $0.02 / Output: $0.02
Gemini 1.5 Flash
May 2024
Optimized for speed and efficiency; handles high-volume tasks with multimodal processing (text, images, video, audio) for summarization, chat, and data extraction.

Input: $0.075 / Output: $0.30
GPT 4o
May 2024
Multimodal LLM for real-time text, audio, and visual processing with multilingual support, emotional audio responses, and image generation.
Input: $2.50 / Output: $10.00
Mixtral 8x22B
Apr 2024
Efficient sparse MoE architecture with 39B active parameters; excels in multilingual tasks, math, and coding, and handles 64K token contexts.
Input: $2.00 / Output: $6.00
Mistral Large 2
Feb 2024
Powerful LLM with 123B parameters, excelling in multilingual tasks, coding, and reasoning, optimized for single-node inference and long-context applications.
Input: $2.00 / Output: $6.00
Text Embedding 3 - Small
Jan 2024
text → embeddings
Generates compact, efficient embeddings for NLP tasks with multilingual support, balancing performance and low latency.
Input: $0.02 / Output: $0.02
Text Embedding 3 - Large
Jan 2024
text → embeddings
Generates high-quality embeddings for complex text analysis and multilingual applications with 8,191 token context.
Input: $0.13 / Output: $0.13
Mixtral 8x7B
Dec 2023
Efficient Mixture of Experts (8 experts) with 13B active parameters, optimized for multilingual tasks and cost-performance balance.
Input: $0.70 / Output: $0.70
DALL-E 3
Oct 2023
text → image
Translates nuanced text prompts into detailed, accurate images with automatic prompt rewriting, multiple aspect ratios, and ChatGPT integration for creative workflows.
Mistral 7B
Sep 2023
Balanced performance in natural language and code tasks, efficiently handling longer sequences with innovative attention mechanisms.
Input: $0.25 / Output: $0.25
o3 mini
Feb 2025
Optimized for STEM reasoning and problem-solving, excelling in complex tasks like advanced math and coding with improved cost efficiency.
Input: $1.10 / Output: $4.40
Deepseek R1 Distill Llama 70B
Feb 2025
Delivers strong mathematical and coding abilities, matching the performance of larger models while using efficient distillation and multilingual support.

Input: $0.59 / Output: $0.79
o1
Feb 2025
Specializes in complex reasoning through chain-of-thought processing, excelling in STEM tasks like coding, math, and scientific analysis.
Input: $15.00 / Output: $60.00
GPT 4.5
Feb 2025
Excels in natural conversation and creative tasks with improved emotional intelligence and multilingual support, prioritizing intuitive interactions over structured reasoning.
Input: $75.00 / Output: $150.00
Ministral 3B
Oct 2024
Optimized for edge computing with function-calling capabilities, excelling in knowledge retrieval and commonsense reasoning with 128k token context.
Input: $0.04 / Output: $0.04
Gemini 2.0 Flash Lite
Feb 2025
Cost-efficient multimodal LLM for real-time tasks with 1M token input context and enhanced performance.

Input: $0.075 / Output: $0.30
Gemini 2.0 Flash
Feb 2025
Multimodal LLM for agentic applications, handling real-time data integration and multi-step tasks with enhanced reasoning via Thinking Mode, integrating Google tools and third-party functions.

Input: $0.10 / Output: $0.40
Codestral Latest
Feb 2025
Specializes in coding tasks with multilingual support for 80+ languages, excelling in code generation, fill-in-the-middle, and test creation with a 256K token context.
Input: $0.30 / Output: $0.90
Gemini 2.5 Pro Experimental
Mar 2025
Handles complex reasoning and coding tasks, generates and interprets multimodal content, and supports interactive visualizations with an extensive 1M token context.

Input: $2.50 / Output: $5.00
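
The input and output prices in this listing can be combined into a rough per-request cost estimate. The sketch below is illustrative only: it assumes the listed figures are USD per one million tokens, a unit this page does not state, and the function names and the three-model price table are hypothetical helpers rather than part of any provider's API.

```python
# Prices copied from the listing above as (input, output) pairs.
# ASSUMPTION: figures are USD per 1,000,000 tokens; the page does not state the unit.
PRICES = {
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT 4.1 mini": (0.40, 1.60),
    "Gemini 2.0 Flash": (0.10, 0.40),
}

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed pricing unit


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given model."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


def cheapest_model(input_tokens: int, output_tokens: int) -> str:
    """Return the model in PRICES with the lowest estimated cost for this workload."""
    return min(PRICES, key=lambda m: estimate_cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    # Example workload: a 2,000-token prompt with a 500-token reply.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000, 500):.6f}")
    print("Cheapest:", cheapest_model(2_000, 500))
```

Because output tokens are priced higher than input tokens for most models listed here, the ranking can change with the ratio of prompt length to reply length, so it is worth running the estimate against a representative workload.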