Featured Datasets

50.5 mb
1
Updated: 2 days ago

276.9 mb
21
Updated: 2 days ago

3 gb
1345K1
Updated: 3 days ago

Test of a very large graph from an energy-based social network simulation, from https://synthasaizer.com/

618.3 mb
11
Updated: 4 days ago

4.5 gb
10K21
Updated: 4 days ago

2.7 gb
5.2K12
Updated: 4 days ago

59.5 mb
1
Updated: 2 weeks ago

Example dataset constructed from the steps and prompts in the "Thinking LLMs: General Instruction Following With Thought Generation" paper.

189.3 mb
11
Updated: 4 months ago

137 mb
633
Updated: 2 weeks ago

Materials for building with LLMs and LLM agents

92.2 mb
14
Updated: 3 days ago
Featured Collections

Some of the Oxen team's favorite collections.

LLM-SFT

Interesting datasets for supervised fine-tuning (SFT) of language models.

a collection by ox

Visual LLMs

Datasets for image understanding with large language models.

a collection by datasets

LLM-Feedback

Datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO.

a collection by ox

LLM-Eval

A list of standard benchmarks for LLM evaluation.

a collection by ox

Multimodal

Datasets that span modalities: combinations of text, image, audio, video, etc.

a collection by ox
