Evaluations/Rewritten LLM as a Judge
rewritten
12_sample_rewritten.parquet
text
OpenAI/GPT 4o
is_correct
Compare the two answers and respond with true if the reasoning and answers are the same and false if not. Respond with a single word lower case.

Answer 1:
{reasoning}
{answer}

Answer 2:
{prediction}
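As a rough illustration of how a template like the one above could be applied to a single row, here is a minimal Python sketch. It assumes each row is a dict with `reasoning`, `answer`, and `prediction` keys matching the placeholders; the helper names `build_judge_prompt` and `parse_verdict` are hypothetical and not part of the evaluation tool. The call to the judge model itself is left out, since the page does not show how it is made.

```python
from typing import Optional

# The judge prompt as shown above, with its three placeholders.
JUDGE_TEMPLATE = (
    "Compare the two answers and respond with true if the reasoning and "
    "answers are the same and false if not. Respond with a single word "
    "lower case.\n"
    "\n"
    "Answer 1:\n"
    "{reasoning}\n"
    "{answer}\n"
    "\n"
    "Answer 2:\n"
    "{prediction}"
)


def build_judge_prompt(row: dict) -> str:
    """Fill the template from one row (column names are an assumption)."""
    return JUDGE_TEMPLATE.format(
        reasoning=row["reasoning"],
        answer=row["answer"],
        prediction=row["prediction"],
    )


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the judge's one-word reply to a boolean for is_correct.

    Returns None when the reply is neither "true" nor "false", so that
    malformed replies are not silently counted as incorrect.
    """
    word = reply.strip().strip(".").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None


if __name__ == "__main__":
    example_row = {
        "reasoning": "2 + 2 equals 4.",
        "answer": "4",
        "prediction": "The sum is 4.",
    }
    print(build_judge_prompt(example_row))
    print(parse_verdict("true"))
```

Returning `None` for unparseable replies is a design choice in this sketch: it keeps formatting failures by the judge separate from genuine "false" verdicts when filling the `is_correct` column.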
Oct 4, 2024, 4:16 PM UTC
12 rows processed, 4072 tokens used
completed
6 columns, 12 rows