
Visual LLMs

This collection contains datasets for evaluating image understanding in large language models.

Size: 85.4 MB · Updated: 2 years ago

OCRBench is a comprehensive benchmark for evaluating the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, SceneText-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark contains 1,000 question-answer pairs, and every answer has been manually verified and corrected to make the evaluation more precise.

Size: 4.5 GB · Updated: 1 year ago

This repository contains the data for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive".

Size: 466.3 MB · Updated: 2 years ago

Size: 405.9 kB · Updated: 2 years ago