Featured Repositories

A dataset from the Allen Institute for AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. This dataset contains the Challenge Set of questions.

859 kB
1
Updated: 7 months ago

A Wikipedia dataset containing cleaned articles. Its 6.4 million articles can be streamed via Apache Arrow files.

20.4 GB
651
Updated: 7 months ago
Public
3

A benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events. … The images were chosen from six different Flickr groups, and tend not to contain any well-known people or locations, but were manually selected to depict a variety of scenes and situations.

1.1 GB
938.1K
Updated: 10 months ago
Public
26

An image classification dataset containing 3,670 images of flowers across 5 classes: daisy, dandelion, roses, sunflowers, and tulips. The images are of nonstandard sizes and aspect ratios, ranging from 500 x 442 px to 143 x 240 px.

233.7 MB
143.7K
Updated: 8 months ago

A subset of speech commands for testing audio recognition systems.

252.5 MB
38K1
Updated: 1 year ago

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations.

1.5 GB
25203K
Updated: 2 months ago