datasets
Organization Account
datasets's Repositories
Displaying Page 1 of 17 (166 total Repositories)
Public
2

The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains one set of 5,574 English SMS messages, each tagged as ham (legitimate) or spam. The original data can be found here: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

478.3 kB
11
Updated: 4 weeks ago

This is a set of example datasets to help you go from an LLM data zero 🤔 to an LLM data hero 🦸.

480.7 kB
11
Updated: 1 month ago

Sampled version of cerebras/SlimPajama-627B. Since the original data was shuffled before chunking, it contains train/chunk1 (of 10 total), further sampled down to 10%. This should result in roughly 6B tokens, hence SlimPajama-6B.
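
The "roughly 6B tokens" figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it takes the 627B total from the source dataset's name and assumes the 10 train chunks are roughly equal in size, which the description implies but does not state. The variable names are illustrative.

```python
# Rough token estimate for the sampled dataset.
# Assumption: the 10 train chunks hold roughly equal shares of the tokens.
TOTAL_TOKENS = 627e9      # taken from the name SlimPajama-627B
CHUNK_FRACTION = 1 / 10   # train/chunk1, one of 10 chunks
SAMPLE_FRACTION = 0.10    # further 10% sample of that chunk

estimated_tokens = TOTAL_TOKENS * CHUNK_FRACTION * SAMPLE_FRACTION
print(f"{estimated_tokens / 1e9:.2f}B tokens")  # prints "6.27B tokens"
```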

14 GB
150
Updated: 1 month ago
Public
0

508.5 MB
2660K
Updated: 1 month ago
Public
0

20.2 MB
70K22
Updated: 1 month ago
Public
0

161.7 MB
221K
Updated: 1 month ago
Public
0

134.8 MB
60K22
Updated: 1 month ago
Public
0

142.9 MB
5282
Updated: 1 month ago