Models/Qwen2.5-VL 32B Instruct
QwenQwen / Qwen2.5-VL 32B Instruct
Released: 3/25/2025
imagetext
Input: $0.90 / Output: $0.90

Qwen2.5-VL 32B Instruct is a multimodal LLM designed for understanding and generating both visual and textual content.

It excels in advanced visual recognition, analysis of complex visuals (such as charts, tables, and layouts), precise object localisation in images, and structured data extraction from visual documents. The model also demonstrates improved mathematical reasoning, problem-solving, and instruction following, with support for lengthy context windows and robust multilingual capabilities.

Some other noteworthy features of Qwen2.5-VL 32B Instruct include long-video comprehension (processing and summarising videos exceeding one hour) and agentic visual reasoning for dynamic tool use across digital environments.

MetricValue
Parameter Count32 billion
Mixture of ExpertsNo
Context Length128,000 tokens
MultilingualYes
Quantized*No

*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.

Qwen models available on Oxen.ai
ModalityPrice (1M tokens)
ModelInference providerInputOutputInputOutput
Fireworks AIFireworks AI
texttext$0.90$0.90
Fireworks AIFireworks AI
texttext$0.22$0.88
Fireworks AIFireworks AI
texttext$0.90$0.90
CerebrasCerebras
texttext$2.00$2.00
BytezBytez
texttextN/AN/A
Fireworks AIFireworks AI
texttext$0.90$0.90
Fireworks AIFireworks AI
imagetext$0.90$0.90
Fireworks AIFireworks AI
texttext$0.45$1.80
Fireworks AIFireworks AI
texttext$0.90$0.90
See all models available on Oxen.ai