Qwen / Qwen2.5-VL 32B Instruct
Released: 3/25/2025
Qwen2.5-VL 32B Instruct is a multimodal LLM designed to understand visual and textual input and generate text in response.
It excels in advanced visual recognition, analysis of complex visuals (such as charts, tables, and layouts), precise object localisation in images, and structured data extraction from visual documents. The model also demonstrates improved mathematical reasoning, problem-solving, and instruction following, with support for lengthy context windows and robust multilingual capabilities.
Some other noteworthy features of Qwen2.5-VL 32B Instruct include long-video comprehension (processing and summarising videos exceeding one hour) and agentic visual reasoning for dynamic tool use across digital environments.
Metric | Value |
---|---|
Parameter Count | 32 billion |
Mixture of Experts | No |
Context Length | 128,000 tokens |
Multilingual | Yes |
Quantized* | No |
*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.
Qwen models available on Oxen.ai
Model / Inference provider | Input modality | Output modality | Input price (per 1M tokens) | Output price (per 1M tokens) | |
---|---|---|---|---|---|
![]() | text | text | $0.90 | $0.90 | |
![]() | text | text | $0.22 | $0.88 | |
![]() | text | text | $0.90 | $0.90 | |
![]() | text | text | $2.00 | $2.00 | |
![]() | text | text | N/A | N/A | |
![]() | text | text | $0.90 | $0.90 | |
![]() | image | text | $0.90 | $0.90 | |
![]() | text | text | $0.45 | $1.80 | |
![]() | text | text | $0.90 | $0.90 |
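
Because prices are quoted per 1M tokens, the cost of a single request follows from simple proportional arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the token counts are hypothetical, and the prices are copied from one row of the table above, not fetched from any Oxen.ai API.

```python
from decimal import Decimal

# Prices in the table are quoted per one million tokens.
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the USD cost of one request (hypothetical helper).

    Prices are passed as strings such as "0.22" so that Decimal can
    represent them exactly, avoiding binary floating-point rounding.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    input_cost = Decimal(input_tokens) * Decimal(input_price_per_million)
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_million)
    return (input_cost + output_cost) / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Assumed workload: 200,000 input tokens and 50,000 output tokens,
    # priced with the $0.22 / $0.88 row from the table.
    cost = estimate_cost(200_000, 50_000, "0.22", "0.88")
    print(f"Estimated cost: ${cost:.4f}")
```

Input and output tokens are priced separately, so a row with asymmetric prices (such as $0.22 in, $0.88 out) can cost more or less than a flat-rate row depending on how output-heavy the workload is.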