Featured Datasets

Evaluating Large Multimodal Models for Integrated Capabilities

70.5 MB
11218
Updated: 2 months ago

Pipeline to answer questions about papers from Arxiv Dives

452.7 MB
402096319
Updated: 1 week ago

Pipeline to answer questions about papers from Arxiv Dives

450.4 MB
189634020
Updated: 4 weeks ago

Pipeline to answer questions about papers from Arxiv Dives

450.4 MB
963182040
Updated: 4 weeks ago

195.7 MB
222K
Updated: 1 week ago

4.3 GB
1258K
Updated: 1 week ago

25 GB
1152K1
Updated: 1 week ago

139.8 MB
21
Updated: 1 week ago

526.9 MB
21
Updated: 1 week ago

590.2 MB
22
Updated: 1 week ago
Featured Collections

Some of the Oxen team's favorite collections.

LLM-SFT

Interesting datasets for supervised fine-tuning (SFT) of language models.

a collection by ox

Visual LLMs

Datasets for image understanding with large language models.

a collection by datasets

LLM-Feedback

Datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO.

a collection by ox

LLM-Eval

A list of standard benchmarks for LLM evaluation.

a collection by ox

Multimodal

Datasets that span modalities: combinations of text, image, audio, video, and so on.

a collection by ox
