Repository evaluations - ox/SQuAD

Evaluations

Label datasets and evaluate model performance

Oxen.ai lets you run a model row by row over a dataset, either to label data or to evaluate how well the model performs. Once the model has run over your dataset, you can save its output to a new file or branch and compare it with the original dataset.
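Oxen.ai runs this loop for you. The sketch below only illustrates the idea of running a model row by row, assuming each row is a JSON object whose keys match the prompt placeholders used on this page (`{prompt}`, `{context}`). The names `run_over_rows`, `write_jsonl`, the `model` callable, and the `response` output column are hypothetical and are not part of Oxen.ai's API.

```python
import json
from typing import Callable, Dict, List


def run_over_rows(
    rows: List[Dict[str, str]],
    template: str,
    model: Callable[[str], str],
    output_column: str = "response",  # hypothetical column name for the model output
) -> List[Dict[str, str]]:
    """Fill the prompt template for each row, call the model, and keep the original columns."""
    results = []
    for row in rows:
        # str.format fills {prompt} and {context}; extra keys in the row are ignored.
        prompt_text = template.format(**row)
        labeled = dict(row)  # copy so the original rows are left untouched
        labeled[output_column] = model(prompt_text)
        results.append(labeled)
    return results


def write_jsonl(path: str, rows: List[Dict[str, str]]) -> None:
    """Save the labeled rows as JSON Lines, one object per line, for comparison with the source file."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

Because each output row keeps every original column, the saved file can be compared row for row with the dataset it came from.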

Answer Extraction

7ab590a2-3ea9-42dd-8f72-9a64d2d5c5c2

Unknown/gemini-1-5-flash-8b

text → text

1 year ago

Prompt

Extract the answer from the question and the context. Only respond with answer strings that are contained in the context.

Question:
{prompt}

Context:
{context}

main

5_shot.jsonl

63779706a272

5_shot.jsonl

completed: 5 rows, 1042 tokens, $0.0000, 3 iterations
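The Answer Extraction prompt tells the model to respond only with strings contained in the context. A minimal sketch of checking that constraint on the saved output, assuming each row has a `context` column and the model's answer is stored in a hypothetical `response` column; `is_extractive` and `extractive_rate` are illustrative helpers, not Oxen.ai features.

```python
from typing import Dict, List


def is_extractive(answer: str, context: str) -> bool:
    """Return True if the answer, ignoring surrounding whitespace, appears verbatim in the context."""
    span = answer.strip()
    # An empty answer is not counted as a valid span.
    return bool(span) and span in context


def extractive_rate(rows: List[Dict[str, str]], answer_column: str = "response") -> float:
    """Fraction of rows whose answer is a verbatim substring of that row's context."""
    if not rows:
        return 0.0
    hits = sum(is_extractive(row[answer_column], row["context"]) for row in rows)
    return hits / len(rows)
```

This check is case-sensitive and exact; normalizing case or punctuation first would be a looser, equally reasonable design choice.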

3d567cdd-1bd8-4a13-a847-60bf60b7c219

Unknown/gemini-1-5-flash

text → text

1 year ago

Prompt

What is the answer, given the question and context?

Question:
{prompt}

Context:
{context}

Answer:

main

dev.jsonl

completed: 5-row sample, 1168 tokens, $0.0002, 1 iteration
