This is the dataset for pretraining the Large Language and Vision Assistant (LLaVA), an end-to-end trained large multimodal model that connects a vision encoder to a large language model (LLM) for general-purpose visual and language understanding.
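The toy sketch below shows one way a vision encoder can be "connected" to an LLM. It assumes the design described in the original LLaVA paper: a trainable linear projection maps the vision encoder's patch features into the LLM's token-embedding space, and the resulting visual tokens share one input sequence with the text tokens (prepended here for simplicity). The function names, dimensions, and numbers are hypothetical and purely illustrative; this is not LLaVA's actual code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix using plain lists."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)] for row in a]


def project_visual_features(patch_features: Matrix, w: Matrix) -> Matrix:
    """Hypothetical projector: map vision-encoder patch features
    (n_patches x d_vision) into the LLM embedding space (n_patches x d_text)
    with a single linear layer (no bias, for brevity)."""
    return matmul(patch_features, w)


def build_multimodal_input(patch_features: Matrix, w: Matrix,
                           text_embeddings: Matrix) -> Matrix:
    """Assumed layout: projected visual tokens first, then text tokens."""
    visual_tokens = project_visual_features(patch_features, w)
    return visual_tokens + text_embeddings


# Toy example: 2 image patches with d_vision = 3, an LLM with d_text = 2.
patches = [[1, 0, 2], [0, 1, 1]]
w = [[1, 0], [0, 1], [1, 1]]
text = [[0.5, 0.5]]  # one already-embedded text token
sequence = build_multimodal_input(patches, w, text)
print(sequence)  # [[3, 2], [1, 2], [0.5, 0.5]]
```

During pretraining the projection weights `w` are the part being learned to align image features with the LLM's embeddings; in this sketch they are fixed toy values.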