qwen

Qwen2.5-VL 32B Instruct

image-to-text
Handles advanced visual recognition, complex analysis of images and videos, structured data extraction, agentic tool use, and robust multilingual reasoning.
About
Released: 3/25/2025

Qwen2.5-VL 32B Instruct is a multimodal LLM that takes images, videos, and text as input and generates text.

It excels in advanced visual recognition, analysis of complex visuals (such as charts, tables, and layouts), precise object localisation in images, and structured data extraction from visual documents. The model also demonstrates improved mathematical reasoning, problem-solving, and instruction following, with support for long context windows and robust multilingual capabilities.
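Localisation and structured extraction results come back as text, so client code usually has to parse them. The sketch below assumes the model was prompted to answer with a JSON list of objects, each carrying a `label` and a `bbox_2d` box in pixel coordinates, optionally wrapped in a Markdown code fence. That schema, the `Detection` type, and `parse_detections` are illustrative assumptions, not a documented output format of this model or of any inference provider.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical output schema (an assumption, not a documented format):
# [{"label": "<text>", "bbox_2d": [x1, y1, x2, y2]}, ...] in pixel coordinates.


@dataclass
class Detection:
    label: str
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2)


# Matches an optional Markdown code fence (three backticks, optional "json" tag)
# that chat models often wrap around JSON answers.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_detections(response: str) -> list[Detection]:
    """Parse labelled boxes from a model response, fenced or bare JSON."""
    match = _FENCE.search(response)
    payload = match.group(1) if match else response
    detections = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = (float(v) for v in item["bbox_2d"])
        # Reject boxes whose corners are out of order rather than silently fixing them.
        if x2 < x1 or y2 < y1:
            raise ValueError(f"malformed box for {item.get('label')!r}: {item['bbox_2d']}")
        detections.append(Detection(str(item["label"]), (x1, y1, x2, y2)))
    return detections


if __name__ == "__main__":
    fence = "`" * 3
    reply = fence + 'json\n[{"label": "invoice total", "bbox_2d": [120, 340, 410, 372]}]\n' + fence
    for det in parse_detections(reply):
        print(det.label, det.bbox)
```

Validating the boxes at parse time, rather than trusting the text, keeps malformed generations from propagating into downstream cropping or annotation code.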

Other noteworthy features of Qwen2.5-VL 32B Instruct include long-video comprehension (processing and summarising videos longer than one hour) and agentic visual reasoning for dynamic tool use across digital environments.

Metric               Value
Parameter Count      32 billion
Mixture of Experts   No
Context Length       128,000 tokens
Multilingual         Yes
Quantized*           No

*Quantization is specific to the inference provider; other providers may offer the model at different quantization levels.
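Because the model is dense (no Mixture of Experts), all 32 billion parameters must be held in memory, and the quantization level largely determines how much space they take. As a back-of-the-envelope sketch, assuming unquantized weights are stored in a 16-bit format such as BF16 and counting weights only (no KV cache, activations, or framework overhead), the estimate is parameters multiplied by bytes per parameter. The int8 and int4 rows are illustrative quantization levels, not ones any particular provider is known to offer.

```python
# Rough weight-memory estimate for a dense model: parameters x bytes per parameter.
# Assumption: weights only; the KV cache for long contexts, activations, and
# runtime overhead are excluded, so real deployments need more memory.
PARAMS = 32e9  # "32 billion" from the table above

# Assumed storage sizes per parameter for common precisions.
BYTES_PER_PARAM = {
    "bf16/fp16 (unquantized)": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:<24} ~{weight_memory_gb(PARAMS, nbytes):.0f} GB")
```

At 16-bit precision this comes to roughly 64 GB for the weights alone, which is one reason a provider may choose to serve a quantized variant of a dense model this size.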