datasets
Organization Account
datasets's Repositories
Displaying Page 3 of 18 (180 total Repositories)

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images cover large pose variations and background clutter. CelebA is notable for its diversity, its scale, and its rich annotations.

1.5 GB
46203K
Updated: 8 months ago

This is a dataset for the Game of 24, a mathematical reasoning challenge in which the goal is to combine four numbers, each used exactly once, with basic arithmetic operations (+, -, *, /) to obtain 24. For example, given the input "4 9 10 13", a valid solution is "(10 - 4) * (13 - 9) = 24".

2.9 kB
1
Updated: 9 months ago
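A Game of 24 instance can be solved by brute force: pick any two of the remaining numbers, replace them with the result of one arithmetic operation, and recurse until a single number remains. The Python sketch below is an illustrative solver, not code from this repository; the function name `solve24` is hypothetical. It uses exact fractions so that division-based solutions such as 8 / (3 - 8 / 3) are not lost to floating-point error.

```python
from fractions import Fraction
from itertools import combinations


def solve24(numbers, target=24):
    """Hypothetical helper: return one expression string equal to target, or None."""
    # Pair each exact value with the expression text that produced it
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left, so check whether it hits the target
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Combine every unordered pair of remaining values with each operation
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),  # subtraction and division are not
            (b - a, f"({eb} - {ea})"),  # commutative, so try both orders
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        # Recurse with the two operands replaced by their combined result
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


print(solve24([4, 9, 10, 13]))
```

For four inputs the exhaustive search is small enough that no pruning is needed; exact `Fraction` arithmetic is the main design choice.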

A sample of examples from the Hugging Face dataset gretelai/synthetic-gsm8k-reflection-405b.

56.1 kB
11
Updated: 9 months ago

Repository for Decomposed Prompting

2.7 kB
1
Updated: 9 months ago

SCAN is a grounded-navigation dataset consisting of simple compositional navigation commands, each paired with its corresponding action sequence.

2.9 kB
1
Updated: 9 months ago
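To make the command-to-action pairing concrete, here is a minimal Python interpreter, assuming the standard SCAN grammar and its I_WALK / I_TURN_LEFT style action tokens: "x twice" repeats x, "x1 and x2" runs x1 then x2, "x1 after x2" runs x2 first, "opposite" turns twice before acting, and "around" turns and acts four times. It is a sketch, not code from this repository; the function names are hypothetical and malformed commands are not validated.

```python
# Assumed SCAN output vocabulary for primitive verbs and turn directions
PRIMITIVES = {"walk": "I_WALK", "look": "I_LOOK", "run": "I_RUN", "jump": "I_JUMP"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}


def _verb_phrase(tokens):
    """Interpret e.g. 'jump', 'walk left', 'turn opposite right', 'run around left'."""
    if len(tokens) == 1:
        return [PRIMITIVES[tokens[0]]]
    verb, turn = tokens[0], TURNS[tokens[-1]]
    # 'turn' has no action of its own; other verbs act after turning
    base = [] if verb == "turn" else [PRIMITIVES[verb]]
    if len(tokens) == 2:  # 'walk left': turn once, then act
        return [turn] + base
    if tokens[1] == "opposite":  # turn twice to face backwards, then act
        return [turn, turn] + base
    if tokens[1] == "around":  # turn-and-act four times (a full circle)
        return ([turn] + base) * 4
    raise ValueError(f"unknown modifier: {tokens[1]}")


def _phrase(tokens):
    # Repetition modifiers apply to the whole verb phrase before them
    if tokens[-1] == "twice":
        return _verb_phrase(tokens[:-1]) * 2
    if tokens[-1] == "thrice":
        return _verb_phrase(tokens[:-1]) * 3
    return _verb_phrase(tokens)


def interpret(command):
    """Hypothetical helper: map a SCAN command string to its action sequence."""
    tokens = command.split()
    if "and" in tokens:  # 'x1 and x2' -> x1 then x2
        i = tokens.index("and")
        return _phrase(tokens[:i]) + _phrase(tokens[i + 1:])
    if "after" in tokens:  # 'x1 after x2' -> x2 then x1
        i = tokens.index("after")
        return _phrase(tokens[i + 1:]) + _phrase(tokens[:i])
    return _phrase(tokens)


print(interpret("walk left after run"))  # ['I_RUN', 'I_TURN_LEFT', 'I_WALK']
```

Because each rule composes the meanings of its parts, a model that learns these few rules can in principle generalize to unseen combinations, which is what SCAN is designed to probe.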

This dataset contains a set of math word problems, each paired with its final answer.

2.7 kB
1
Updated: 9 months ago

Goal: Make an LLM's little brain go poof 🤯 Evaluate LLM vs. LLM with help from my buddies at Oxen.ai. This is a dataset of unanswerable questions for testing whether an LLM "knows when it does not know," with the aim of minimizing hallucinations.

21.2 kB
211
Updated: 10 months ago

This is a set of example datasets to help you go from LLM data zero 🤔 to LLM data hero 🦸.

480.7 kB
11
Updated: 1 year ago

A sampled version of cerebras/SlimPajama-627B. Because the original data was shuffled before chunking, this version takes only train/chunk1 (of 10 total) and further samples 10% of it. This should yield roughly 6B tokens, hence the name SlimPajama-6B.

14 GB
150
Updated: 1 year ago
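The 6B figure follows from simple arithmetic, assuming the 627B token count refers to the training split and that its 10 chunks are of roughly equal size:

```latex
627\,\text{B tokens} \times \tfrac{1}{10}\ (\text{train/chunk1}) \times 0.10\ (\text{10\% sample}) \approx 6.27\,\text{B} \approx 6\,\text{B tokens}
```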

508.5 MB
60K62
Updated: 1 year ago