Featured Repositories
Updated: 5 months ago

A dataset from the Allen Institute for AI consisting of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. This dataset contains the Challenge Set of questions.

Updated: 5 months ago

A Wikipedia dataset containing 6.4 million cleaned articles, which can be streamed via Apache Arrow files.

20.4 GB
Updated: 8 months ago

A benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events. … The images were chosen from six different Flickr groups, and tend not to contain any well-known people or locations, but were manually selected to depict a variety of scenes and situations.

Updated: 6 months ago

An image classification dataset containing 3,670 images of flowers across 5 classes: daisy, dandelion, roses, sunflowers, and tulips. The images have nonstandard sizes and aspect ratios, ranging from 500 x 442 px to 143 x 240 px.
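
Because the image sizes and aspect ratios vary, a loader usually has to bring every image to a common shape before batching. The sketch below shows one possible approach, not something this repository prescribes: scale each image to fit inside a square and pad the remainder. The function name `fit_within` and the 224 px target are assumptions chosen for illustration.

```python
def fit_within(width, height, target=224):
    """Scale (width, height) to fit inside a target x target square.

    Hypothetical helper: the aspect ratio is preserved, and the returned
    padding values say how many pixels are missing on each axis.
    """
    # The longer side is scaled to exactly `target` pixels.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Total padding needed per axis to fill the square.
    pad_w = target - new_w
    pad_h = target - new_h
    return new_w, new_h, pad_w, pad_h


# The two extreme sizes quoted in the description above.
for size in [(500, 442), (143, 240)]:
    print(size, fit_within(*size))
```

Padding keeps the flowers undistorted; a plain resize to a square would be simpler but stretches the narrow images.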

233.7 MB
Updated: 1 year ago

A subset of speech commands for testing audio recognition systems.

252.5 MB
Updated: 1 year ago

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images cover large pose variations and background clutter. The dataset is diverse, large, and richly annotated.