datasets Repositories

datasets

Organization Account

CelebA

public

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations.

Generative AI, Computer Vision, Image Classification

1.5 GB

46 203K

Updated: 2 years ago

Game-of-24

public

A dataset for the Game of 24, a mathematical reasoning challenge in which the goal is to combine 4 numbers with basic arithmetic operations (+, -, *, /) to obtain 24. For example, given the input “4 9 10 13”, one solution is “(10 - 4) * (13 - 9) = 24”.

2.9 kB

Updated: 2 years ago
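
To make the Game-of-24 task above concrete, here is a minimal brute-force sketch in Python. It is not taken from the repository: the function names `solve24` and `_search` are hypothetical, and the approach (repeatedly combining two numbers with one operation, using exact fractions) is just one simple way to check whether a puzzle is solvable.

```python
from fractions import Fraction
from itertools import combinations


def solve24(numbers, target=24):
    """Return one expression string that evaluates to target, or None."""
    # Pair each exact value with the expression text that produced it.
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value remains; it either hits the target or not.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick any two items, combine them, and recurse on the smaller list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        # Division is only tried when the divisor is non-zero.
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    # The sample input from the dataset description.
    print(solve24([4, 9, 10, 13]))
```

Using `Fraction` keeps intermediate divisions exact, so puzzles whose solutions pass through non-integer values are not lost to floating-point rounding. The solver returns the first expression it finds, which may differ from the example solution in the description.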

synthetic-gsm8k-reflection-405b

public

Some examples from the Hugging Face dataset gretelai/synthetic-gsm8k-reflection-405b.

56.1 kB

Updated: 2 years ago

CommaQA

public

Repository for Decomposed Prompting

2.7 kB

Updated: 2 years ago

SCAN-prompting

public

SCAN is a dataset for grounded navigation which consists of a set of simple compositional navigation commands paired with the corresponding action sequences.

2.9 kB

Updated: 2 years ago

MultiArith

public

This dataset contains a set of math word problems and a final answer.

2.7 kB

Updated: 2 years ago

ImpossibleQuestions

public

Goal: Make an LLM's little brain go poof 🤯 Evaluate LLM vs LLM with help from my buddies at Oxen.ai. This is a dataset of unanswerable questions to test whether an LLM "knows when it does not know" and to minimize hallucinations.

21.2 kB

211

Updated: 2 years ago

LLM-Zero-Hero

public

This is a set of example datasets to help you go from an LLM data zero 🤔 to an LLM data hero 🦸.

480.7 kB

Updated: 2 years ago

SlimPajama-6B

public

A sampled version of cerebras/SlimPajama-627B. Because the original data was shuffled before chunking, this version takes only train/chunk1 (of 10 total) and samples a further 10% of it. This should yield roughly 6B tokens (627B × 1/10 × 10% ≈ 6.3B), hence SlimPajama-6B.

14 GB

150

Updated: 2 years ago

cifar10

public

Computer Vision, Image Classification

508.5 MB

60K 62

Updated: 2 years ago