Featured Repositories
Updated: 5 months ago

A dataset from the Allen Institute for AI consisting of genuine grade-school-level, multiple-choice science questions, assembled to encourage research in advanced question answering. This dataset contains the Challenge Set of questions.

Updated: 5 months ago

A Wikipedia dataset containing cleaned articles. Its 6.4 million articles can be streamed via Apache Arrow files.

20.4 GB
651
11
ox/Flickr8k
Public
Updated: 8 months ago

A benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events. … The images were chosen from six different Flickr groups, and tend not to contain any well-known people or locations, but were manually selected to depict a variety of scenes and situations.

ox/Flowers
Public
Updated: 6 months ago

An image classification dataset containing 3670 images of flowers across 5 classes: daisy, dandelion, roses, sunflowers, tulips. The images are of nonstandard sizes and aspect ratios, ranging from 500 x 442 px to 143 x 240 px.

233.7 MB
3.7K
14
Updated: 1 year ago

A subset of speech commands for testing audio recognition systems.

252.5 MB
138K
ox/CelebA
Public
Updated: 1 year ago

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations.