Evaluations/LLM as a Judge
no_CoT
sample.parquet
text
OpenAI/GPT 4o
is_correct
Compare the two answers and respond with true if the reasoning and answers are the same and false if not. Respond with a single word lower case.

Answer 1:
{response}

Answer 2:
{prediction}
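
As a rough illustration of how a template like the one above might be applied to each row, the sketch below fills the `{response}` and `{prediction}` placeholders and maps the judge's one-word reply onto the `is_correct` column. The function names `build_judge_prompt` and `parse_is_correct` are hypothetical, the call to the judge model is deliberately left out, and nothing here is confirmed to match how the evaluation was actually run.

```python
from typing import Optional

# Judge prompt copied from the evaluation record; placeholders are filled per row.
JUDGE_TEMPLATE = (
    "Compare the two answers and respond with true if the reasoning and "
    "answers are the same and false if not. "
    "Respond with a single word lower case.\n"
    "\n"
    "Answer 1:\n"
    "{response}\n"
    "\n"
    "Answer 2:\n"
    "{prediction}"
)


def build_judge_prompt(response: str, prediction: str) -> str:
    """Fill the template with one row's reference answer and model prediction."""
    return JUDGE_TEMPLATE.format(response=response, prediction=prediction)


def parse_is_correct(judge_output: str) -> Optional[bool]:
    """Map the judge's reply to a boolean; return None if it is not true/false."""
    # Tolerate stray whitespace, capitalization, and a trailing period.
    word = judge_output.strip().lower().rstrip(".")
    if word == "true":
        return True
    if word == "false":
        return False
    return None  # Unparseable replies are left for manual review.
```

Returning `None` for anything other than `true` or `false` keeps malformed judge replies visible instead of silently counting them as incorrect.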
Oct 4, 2024, 3:40 PM UTC
Oct 4, 2024, 3:41 PM UTC
100 rows processed, 35431 tokens used
completed
5 columns, 100 rows