datasets
Organization Account
datasets's Repositories
Displaying Page 1 of 17 (166 total Repositories)
Public
10

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations.

1.5 GB
42203K
Updated: 1 month ago
Public
2

The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains 5,574 English SMS messages, each tagged as either ham (legitimate) or spam. The original data can be found here: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection

478.3 kB
11
Updated: 2 months ago

This is a set of example datasets to help you go from an LLM data zero 🤔 to an LLM data hero 🦸.

480.7 kB
11
Updated: 2 months ago

A sampled version of cerebras/SlimPajama-627B. Because the original data was shuffled before chunking, this version keeps only train/chunk1 (1 of 10 chunks) and then samples 10% of it. This should result in roughly 6B tokens, hence SlimPajama-6B.
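
As a rough check of the "roughly 6B tokens" figure, here is a minimal sketch of the arithmetic. It assumes the source corpus holds about 627B tokens (inferred from its name) and that tokens are spread evenly across the 10 chunks. The function and constant names are illustrative only.

```python
# Back-of-the-envelope estimate; names and the even-split assumption are illustrative.
SOURCE_TOKENS = 627e9    # assumed from the "627B" in the source dataset name
NUM_CHUNKS = 10          # train/chunk1 is 1 of 10 chunks
SAMPLE_FRACTION = 0.10   # a further 10% sample of that chunk


def estimate_tokens(source_tokens: float, num_chunks: int, sample_fraction: float) -> float:
    """Expected token count after keeping one chunk and subsampling it."""
    return source_tokens / num_chunks * sample_fraction


estimated = estimate_tokens(SOURCE_TOKENS, NUM_CHUNKS, SAMPLE_FRACTION)
print(f"~{estimated / 1e9:.2f}B tokens")  # ~6.27B tokens
```

Under these assumptions the estimate comes to about 6.27B tokens, which is consistent with the "6B" in the name.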

14 GB
501
Updated: 3 months ago
Public
0

508.5 MB
60K62
Updated: 3 months ago
Public
0

20.2 MB
70K22
Updated: 3 months ago
Public
0

161.7 MB
221K
Updated: 3 months ago
Public
0

134.8 MB
2260K
Updated: 3 months ago
Public
0

142.9 MB
5282
Updated: 3 months ago