datasets
Organization Account
datasets's Repositories
Displaying page 3 of 18 (178 repositories in total)
Public
0

Repository for Decomposed Prompting

2.7 kB
1
Updated: 5 months ago

SCAN is a dataset for grounded navigation which consists of a set of simple compositional navigation commands paired with the corresponding action sequences.

2.9 kB
1
Updated: 5 months ago
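The SCAN entry above describes commands "paired with the corresponding action sequences". To make that pairing concrete, here is a minimal sketch of an interpreter for the SCAN command grammar as defined in the original SCAN paper (Lake & Baroni, 2018). It is not code from this repository. The function names (`interpret`, `_phrase`, `_primitive`) are hypothetical, and the output tokens (`I_WALK`, `I_TURN_LEFT`, ...) assume the spelling used in the public SCAN release, which may differ from the files stored here.

```python
# Hypothetical sketch of the SCAN grammar: maps a command string to its
# action sequence. Token spellings assume the public SCAN release format.
ACTIONS = {"walk": "I_WALK", "run": "I_RUN", "jump": "I_JUMP", "look": "I_LOOK"}
TURNS = {"left": "I_TURN_LEFT", "right": "I_TURN_RIGHT"}


def _primitive(words: list[str]) -> list[str]:
    """Interpret a verb plus optional direction, e.g. 'walk', 'turn left', 'jump around right'."""
    verb, mods = words[0], words[1:]
    act = [] if verb == "turn" else [ACTIONS[verb]]  # 'turn' emits only turn tokens
    if not mods and act:
        return act                                   # "walk" -> I_WALK
    if len(mods) == 1 and mods[0] in TURNS:
        return [TURNS[mods[0]]] + act                # turn first, then act
    if len(mods) == 2 and mods[1] in TURNS:
        turn = TURNS[mods[1]]
        if mods[0] == "opposite":
            return [turn, turn] + act                # turn 180 degrees, then act
        if mods[0] == "around":
            return ([turn] + act) * 4                # act after each of four quarter turns
    raise ValueError(f"not a SCAN primitive phrase: {' '.join(words)!r}")


def _phrase(words: list[str]) -> list[str]:
    """Apply an optional trailing repetition modifier ('twice' or 'thrice')."""
    if words[-1] == "twice":
        return _primitive(words[:-1]) * 2
    if words[-1] == "thrice":
        return _primitive(words[:-1]) * 3
    return _primitive(words)


def interpret(command: str) -> list[str]:
    """Interpret a full command with at most one 'and' / 'after' conjunction."""
    words = command.split()
    if "and" in words:                               # "x and y" -> x then y
        i = words.index("and")
        return _phrase(words[:i]) + _phrase(words[i + 1:])
    if "after" in words:                             # "x after y" -> y then x
        i = words.index("after")
        return _phrase(words[i + 1:]) + _phrase(words[:i])
    return _phrase(words)


print(interpret("walk left after run"))  # prints ['I_RUN', 'I_TURN_LEFT', 'I_WALK']
```

The recursive split (conjunction, then repetition, then primitive) mirrors how SCAN composes meanings: every command is built from a handful of primitives, which is what makes the dataset useful for testing compositional generalization.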

This dataset contains a set of math word problems, each paired with its final answer.

2.7 kB
1
Updated: 5 months ago

Goal: Make an LLM's little brain go poof 🤯 Evaluate LLM vs. LLM with help from my buddies at Oxen.ai. This is a dataset of unanswerable questions to test whether an LLM "knows when it does not know" and to minimize hallucinations.

21.2 kB
112
Updated: 6 months ago
Public
2

The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains a single set of 5,574 English SMS messages, each tagged as either ham (legitimate) or spam. The original data can be found here: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

478.3 kB
11
Updated: 9 months ago

This is a set of example datasets to help you go from an LLM data zero 🤔 to an LLM data hero 🦸.

480.7 kB
11
Updated: 9 months ago

Sampled version of cerebras/SlimPajama-627B. Since the original data was shuffled before chunking, this version takes only train/chunk1 (1 of 10 chunks in total) and further samples 10% of it. This should result in roughly 6B tokens, hence the name SlimPajama-6B.

14 GB
501
Updated: 9 months ago
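The "roughly 6B tokens" figure for SlimPajama-6B follows from simple arithmetic: one chunk out of ten, then a 10% sample of that chunk. The sketch below works through it, assuming (this is not stated in the listing) that the ~627B tokens implied by the source name are spread evenly across the 10 training chunks and that the 10% sample is uniform. The function name `estimated_tokens` is hypothetical.

```python
# Back-of-the-envelope token estimate for SlimPajama-6B.
# Assumption: the ~627B tokens of cerebras/SlimPajama-627B are spread evenly
# across its 10 training chunks, and the 10% sample is uniform.
SOURCE_TOKENS = 627e9   # implied by the name SlimPajama-627B
NUM_CHUNKS = 10         # train/chunk1 is 1 of 10 chunks
SAMPLE_RATE = 0.10      # chunk1 is further sampled to 10%


def estimated_tokens(source_tokens: float, num_chunks: int, sample_rate: float) -> float:
    """Tokens kept when taking one chunk and then sampling a fraction of it."""
    return source_tokens / num_chunks * sample_rate


est = estimated_tokens(SOURCE_TOKENS, NUM_CHUNKS, SAMPLE_RATE)
print(f"~{est / 1e9:.2f}B tokens")  # prints ~6.27B tokens
```

Under these assumptions the result is about 6.27B tokens, which rounds to the "6B" in the dataset name.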
Public
1

508.5 MB
6260K
Updated: 9 months ago
Public
0

161.7 MB
221K
Updated: 9 months ago
Public
0

134.8 MB
2260K
Updated: 9 months ago