Evaluations/LLM As A Judge - Gemini 2.0 Flash - Model Epoch 50
model-epoch-50
results.parquet
image → text
Meta/Llama 4 Scout
Fireworks AI
prediction
Judge the following image on a few different dimensions. Be very critical.

Each of the judgements should be one of three values:

* "poor" if the image does not match the description
* "good" if the image matches the description, but could be better
* "great" if there is nothing that could be improved about the image

Return the judgements in XML format. The XML should contain the following fields:

<reasoning>
  Step by step reasoning of why the image is good or not
</reasoning>
<description>
  Does the character match the description?
</description>
<task>
  How well is the task being portrayed? Are all the items and actions present?
</task>
<quality>
  What is the overall quality of the image?
</quality>

Prompt:
{prompt}

Image:
{image}

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
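
The prompt above asks the judge for four XML-style fields, three of which should hold "poor", "good", or "great". A minimal sketch of how such a response could be parsed in Python follows. The function name `parse_judgement` and the choice to map unexpected values to `None` are assumptions for illustration, not part of the evaluation tool. A regex is used instead of an XML parser because the requested output has no single root element and model output is not guaranteed to be well-formed.

```python
import re

# Values the prompt allows for each judgement field.
ALLOWED = {"poor", "good", "great"}
# Field names taken from the prompt; "reasoning" is free text.
FIELDS = ("description", "task", "quality")


def parse_judgement(response: str) -> dict:
    """Extract the reasoning and the three judgements from a judge response.

    Hypothetical helper: missing tags or values outside ALLOWED become None.
    """
    result = {}
    match = re.search(r"<reasoning>(.*?)</reasoning>", response, re.DOTALL)
    result["reasoning"] = match.group(1).strip() if match else None
    for field in FIELDS:
        match = re.search(rf"<{field}>(.*?)</{field}>", response, re.DOTALL)
        # Normalise quoting and case, e.g. '"Good"' becomes 'good'.
        value = match.group(1).strip().strip('"').lower() if match else None
        result[field] = value if value in ALLOWED else None
    return result
```

Returning `None` for anything outside the three allowed values makes malformed rows easy to count and filter before aggregating scores.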
May 29, 2025, 6:05 PM UTC
5 row sample
5 rows processed, 5742 tokens used ($0.0013)
Estimated cost for all 50 rows: $0.0134
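
The full-run estimate is presumably a linear extrapolation of the sample cost. A rough sketch of that arithmetic is below. The function name `estimate_total_cost` is hypothetical, and the inputs are the rounded figures shown on this page, so the result ($0.0130) differs slightly from the displayed $0.0134, which likely comes from the unrounded sample cost.

```python
def estimate_total_cost(sample_cost: float, sample_rows: int, total_rows: int) -> float:
    """Scale the cost of a sample linearly to the full dataset (assumed method)."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_cost / sample_rows * total_rows


# Figures from the sample run above: $0.0013 for 5 rows, 50 rows in total.
estimate = estimate_total_cost(0.0013, 5, 50)
print(f"${estimate:.4f}")
```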
Sample Results completed
3 columns, 1-5 of 50 rows