datasets
datasets's Repositories
This is a dataset of the Game of 24 which is a mathematical reasoning challenge, where the goal is to use 4 numbers and basic arithmetic operations (+-*/) to obtain 24. For example, given input “4 9 10 13”, a solution output could be “(10 - 4) * (13 - 9) = 24”.
Some examples from hf gretelai/synthetic-gsm8k-reflection-405b
SCAN is a dataset for grounded navigation which consists of a set of simple compositional navigation commands paired with the corresponding action sequences.
Goal: Make a LLM's little brain go poof 🤯 Evaluate LLM vs LLM with help from my buddies at Oxen.ai.This is a dataset of unanswerable questions to test whether an LLM "knows when it does not know" and minimize hallucinations.
This is a list set of example datasets to help you go from an LLM data zero 🤔 to a LLM data hero 🦸.
Sampled version of cerebras/SlimPajama-627B. Since the original data was shuffled before chunking, it contains train/chunk1 (of 10 total) and further sampled 10%. This should result in roughly 6B tokens, hence SlimPajama-6B.