Explore Dataset Repositories

Featured datasets

lmsys/chatbot_arena_conversations

public

This dataset contains 33K cleaned conversations with pairwise human preferences. It was collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp.

41.6 MB

Updated: 2 years ago

lmms-lab/OCRBench-v2

public

OCRBench is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, SceneText-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1000 question-answer pairs, and all the answers undergo manual verification and correction to ensure a more precise evaluation.

4.5 GB

10K 23

Updated: 1 year ago

BLINK-Benchmark/BLINK

public

This repo contains data for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive"

466.3 MB

24K 30

Updated: 2 years ago

mlabonne/harmless_alpaca

public

1.2 MB

Updated: 2 years ago

OpenCoder-LLM/opc-sft-stage1

public

Dataset for the paper "OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models"

1 GB

Updated: 2 years ago

models/llama-3-8b-instruct

public

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in the 8B size. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, Meta took great care to optimize helpfulness and safety.

16.1 GB

410

Updated: 2 years ago

JoPmt/hf_community_images

public

1.7 GB

3.9K 1

Updated: 2 years ago

bdsaglam/musique-raw

public

91.1 MB

Updated: 2 years ago

MBZUAI/Bactrian-X

public

30 B

Updated: 2 years ago

Pejman47/Pj001016655

public

202.6 kB

Updated: 1 year ago

Featured collections

Some of the Oxen team's favorite collections.

LLM-SFT

Visual LLMs

LLM-Feedback

LLM-Eval

Multimodal