Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your own datasets.
Whether you're fine-tuning models or measuring their performance, Oxen Evaluations lets you run a prompt through every row of a dataset quickly.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.

e508cca2-6df9-41b7-bf95-06c19d2b0e56
Model: Anthropic AI/Claude Sonnet 4 (text to text)
Run by: Bessie (ox), 1 week ago
Prompt:
    Translate the prompt to french

    {prompt}
Status: completed, 5 row sample, 1308 tokens, $0.0130, 1 iteration

5f487e18-7667-4257-b809-6385f0f5d605
Model: Anthropic AI/Claude Sonnet 4 (text to text)
Run by: Bessie (ox), 1 week ago
Prompt:
    Describe the emotion of this image in one word.

    {image}
Status: completed, 5 row sample, 295 tokens, $0.0013, 3 iterations

9af30542-0281-4428-a7d9-92cf5143d53c
Model: Google/Gemini 2.0 Flash (image to text)
Run by: Bessie (ox), 2 weeks ago
Prompt:
    Describe what the character is doing

    {image}
Status: completed, 5 row sample, 0 tokens, $0.0000, 1 iteration