datasets Repositories

datasets

Organization Account

datasets's Repositories

alpaca

public

This is a cleaned version of the original Alpaca Dataset released by Stanford.

42.7 MB

Updated: 2 years ago

ultrafeedback

public

This is a cleaned version of the HuggingFaceH4/ultrafeedback_binarized dataset that contains only the chosen and rejected samples.

204.5 MB

Updated: 2 years ago

DBPedia-Short-Abstracts

public

Short abstracts from Wikipedia pages

806.6 MB

Updated: 2 years ago

Not-In-Context

public

Question-context-answer triples, each labeled as one of three cases: the answer is in the context, the answer is not in the context, or the question does not make sense to ask.

310 MB

Updated: 2 years ago

ThePasture

A growing and diverse dataset of text for AI to graze on and learn new information. Just like a pasture in the wild, it is a combination of sources. All the data is in Arrow format, so it is easy to access randomly and to stream.

43.8 GB

Updated: 2 years ago

LLaVA-Instruct

public

LLaVA Visual Instruct 150K is a set of GPT-generated multimodal instruction-following data. It is constructed for visual instruction tuning and for building large multimodal models towards GPT-4 vision/language capability.

13.3 GB

Updated: 2 years ago

Wikipedia

public

Wikipedia dataset containing cleaned articles. There are 6.4 million articles that can be streamed via Apache Arrow files.

20.4 GB

Updated: 2 years ago

WikiText

public

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Natural Language Processing, Language Modeling

316.2 MB

Updated: 2 years ago

NQ-Open

public

The NQ-Open task, introduced by Lee et al. (2019), is an open-domain question answering benchmark derived from Natural Questions. The goal is to predict an English answer string for an input English question. All questions can be answered using the contents of English Wikipedia.

Natural Language Processing, Question Answering

9.5 MB

Updated: 2 years ago
