Visual LLMs
This collection gathers datasets for image understanding with large language models.
85.4 MB · 1K rows · public
OCRBench is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, Scene Text-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1,000 question-answer pairs, and every answer has been manually verified and corrected to ensure a more precise evaluation.
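As a rough illustration of how a question-answer benchmark like this is typically consumed, here is a minimal Python sketch using the Hugging Face datasets library. The dataset path ("your-org/OCRBench"), the field names (image, question, answers), the model_answer stub, and the substring-match scoring rule are all assumptions for illustration, not OCRBench's confirmed schema or metric.

```python
from datasets import load_dataset  # pip install datasets

def model_answer(image, question):
    """Placeholder for an actual multimodal model call."""
    return ""

def is_correct(prediction, answers):
    # Case-insensitive substring match: the prediction counts as correct
    # if any reference answer appears inside it. This is a common rule
    # for open-ended OCR QA, assumed here rather than taken from OCRBench.
    pred = prediction.strip().lower()
    return any(ans.strip().lower() in pred for ans in answers)

# Hypothetical dataset path and field names; check the real dataset
# card for the actual schema before running.
ds = load_dataset("your-org/OCRBench", split="test")

correct = sum(
    is_correct(model_answer(s["image"], s["question"]), s["answers"])
    for s in ds
)
print(f"Accuracy: {correct / len(ds):.3f}")
```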
4.5 GB · public
This repo contains the data for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive".
466.3 MB · public
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models