Repository evaluations - ox/Ox-Character

Evaluations/LLM As A Judge - Gemini 2.0 Flash - Model Epoch 50

model-epoch-50

results.parquet

Type: image → text

Model: Qwen/Qwen2 VL 72B Instruct

Provider: Fireworks AI

Target field: prediction

Prompt

Judge the following image on a few different dimensions. 

Return the judgements in xml format. Each judgement should be either "poor", "good", or "great". The xml should contain the following fields:

<reasoning>
  Step by step reasoning of why the image is good or not
</reasoning>
<description>
  Does the character match the description?
</description>
<task>
  Is the character doing the task specified?
</task>

Prompt:
{prompt}

Image:
{image}

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
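The prompt above asks the judge for three tagged fields without a single root element, so a strict XML parser may reject the response. A minimal sketch of a more tolerant parser follows, assuming the model's reply lands as a plain string in the `prediction` field; the function name `parse_judgement` and the choice to map unexpected labels to `None` are illustrative assumptions, not part of the evaluation itself.

```python
import re

# Labels and field names taken from the prompt text.
VALID_LABELS = {"poor", "good", "great"}
FIELDS = ("reasoning", "description", "task")


def parse_judgement(response: str) -> dict:
    """Extract the tagged fields from a judge response (hypothetical helper).

    Regular expressions are used instead of an XML parser because the
    response has no root element and may contain stray text around the tags.
    """
    parsed = {}
    for field in FIELDS:
        match = re.search(rf"<{field}>(.*?)</{field}>", response, re.DOTALL)
        parsed[field] = match.group(1).strip() if match else None

    # Normalise the two graded fields; anything outside the allowed
    # labels becomes None so it can be flagged for review.
    for field in ("description", "task"):
        value = parsed[field]
        if value is not None:
            value = value.lower().strip('"')
            parsed[field] = value if value in VALID_LABELS else None
    return parsed
```

Applied to each row of the `prediction` column, this would yield one `description` label and one `task` label per image, which can then be counted or filtered.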

Queued: May 29, 2025, 6:00 PM UTC

Completed: May 29, 2025, 6:01 PM UTC

5 row sample


5 rows processed, 2912 tokens used ($0.0026)

Estimated cost for all 50 rows: $0.0262
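The full-run estimate appears to be a linear extrapolation from the 5-row sample. A minimal sketch of that arithmetic, assuming cost scales in proportion to row count (the function name `estimate_total_cost` is hypothetical):

```python
def estimate_total_cost(sample_cost: float, sample_rows: int, total_rows: int) -> float:
    """Scale the sample cost linearly to the full dataset (an assumption:
    rows in the sample are taken to be representative of the rest)."""
    return sample_cost / sample_rows * total_rows


# Figures from the sample run: $0.0026 for 5 rows, 50 rows in total.
estimate = estimate_total_cost(0.0026, 5, 50)
print(f"${estimate:.4f}")
```

With the rounded sample cost this gives roughly $0.0260. The reported $0.0262 would be consistent with the estimate being computed from an unrounded sample cost of about $0.00262.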

Sample Results completed

3 columns, 1-5 of 50 rows