Run Qwen2.5-VL 32B Instruct on your data

Qwen2.5-VL 32B Instruct is a multimodal vision-language model that takes images, video, and text as input and generates text as output.

It excels at visual recognition, analysis of complex visuals (such as charts, tables, and layouts), precise object localisation in images, and structured data extraction from visual documents. The model also shows improved mathematical reasoning, problem-solving, and instruction following, and it supports long context windows and multiple languages.
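Object localisation results usually come back as text that your own code has to parse. The sketch below is a minimal illustration, assuming the model was prompted to answer with a JSON list of objects carrying a `bbox_2d` field (`[x1, y1, x2, y2]` in pixels) and a `label` field. That reply shape, the `parse_boxes` helper, and the sample reply are assumptions made for illustration; check the actual output of your prompt and provider.

```python
import json
import re

FENCE = "`" * 3  # replies are often wrapped in a Markdown code fence


def parse_boxes(reply: str) -> list[dict]:
    """Pull a JSON list of boxes out of a reply, with or without a code fence."""
    match = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", reply, re.DOTALL)
    payload = match.group(1) if match else reply
    boxes = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = item["bbox_2d"]  # assumed field name
        if x2 <= x1 or y2 <= y1:
            continue  # skip degenerate boxes instead of failing
        boxes.append({"label": item.get("label", ""), "box": (x1, y1, x2, y2)})
    return boxes


# Hypothetical reply text, not real model output.
sample_reply = (
    FENCE + "json\n"
    '[{"bbox_2d": [10, 20, 110, 220], "label": "invoice total"},\n'
    ' {"bbox_2d": [50, 50, 40, 60], "label": "bad box"}]\n'
    + FENCE
)

boxes = parse_boxes(sample_reply)
for b in boxes:
    print(b["label"], b["box"])
```

Validating coordinates before use matters here because a malformed box would otherwise propagate into cropping or drawing code downstream.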

Other notable features include long-video comprehension (processing and summarising videos longer than one hour) and agentic visual reasoning for dynamic tool use across digital environments.

Metric	Value
Parameter Count	32 billion
Mixture of Experts	No
Context Length	128,000 tokens
Multilingual	Yes
Quantized*	No

*Quantization is specific to the inference provider, so other providers may offer the model at different quantization levels.
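To see why the quantization level matters, a rough back-of-the-envelope estimate of weight storage can help. The sketch below multiplies the 32 billion parameter count from the table by an assumed number of bytes per parameter. The precision names and byte widths are illustrative assumptions, and the figures cover weights only, not activations, the key-value cache, or any runtime overhead.

```python
PARAMS = 32_000_000_000  # parameter count from the table above

# Assumed storage width per parameter at common precisions (illustrative).
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {name: weight_gb(PARAMS, width) for name, width in BYTES_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name}: about {gb:.0f} GB of weights")
```

Under these assumptions, unquantized 16-bit weights come to roughly 64 GB, which is why providers that quantize can serve the same model on smaller hardware.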

About