Evaluations/LLM as a Judge/Iteration history
History
Total running cost: $0.0000
Run
Prompt: Compare the two answers and respond with true if the reasoning and answers are the same and false if not. Respond with a single word lower case. Answer 1: {response} Answer 2: {prediction}
Rows: 100
Type: text / text
Model: OpenAI (OpenAI/GPT 4o)
Target: 62efa0086e6fda69
Status: completed
Runtime: 00:01:03
Run: 11 months ago
By: ox
Tokens: 35431
Cost: not shown

Sample
Prompt: Compare the two answers and respond with true if the reasoning and answers are the same and false if not. Respond with a single word lower case. Answer 1: {response} Answer 2: {prediction}
Rows: 10
Type: text / text
Model: OpenAI (OpenAI/GPT 4o)
Target: Sample - N/A
Status: completed
Runtime: 00:00:04
Run: 11 months ago
By: ox
Tokens: 3602
Cost: not shown
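
The judge prompt used in both entries is a template with two placeholders, {response} and {prediction}, and it asks for a one-word true/false verdict. A minimal sketch of how such a template could be filled and its verdicts scored is given below. The function names (build_judge_prompt, parse_verdict, agreement_rate, stub_judge) are hypothetical and not part of the tool shown here, and stub_judge is only a stand-in for a real model call so the example runs offline.

```python
from typing import Callable, Iterable, Tuple

# Prompt text copied from the history entries above.
JUDGE_TEMPLATE = (
    "Compare the two answers and respond with true if the reasoning and "
    "answers are the same and false if not. Respond with a single word "
    "lower case. Answer 1: {response} Answer 2: {prediction}"
)


def build_judge_prompt(response: str, prediction: str) -> str:
    """Fill the two placeholders in the judge template."""
    return JUDGE_TEMPLATE.format(response=response, prediction=prediction)


def parse_verdict(raw: str) -> bool:
    """Map the judge's one-word reply to a bool, tolerating case and a trailing period."""
    word = raw.strip().strip(".").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    raise ValueError(f"unexpected judge reply: {raw!r}")


def agreement_rate(
    rows: Iterable[Tuple[str, str]],
    judge: Callable[[str], str],
) -> float:
    """Fraction of (response, prediction) pairs the judge marks as the same."""
    verdicts = [parse_verdict(judge(build_judge_prompt(r, p))) for r, p in rows]
    if not verdicts:
        raise ValueError("no rows to evaluate")
    return sum(verdicts) / len(verdicts)


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for a model call: plain string equality of the two answers."""
    tail = prompt.split("Answer 1: ", 1)[1]
    first, second = tail.split(" Answer 2: ", 1)
    return "true" if first == second else "false"


if __name__ == "__main__":
    rows = [("4", "4"), ("Paris", "London")]
    print(agreement_rate(rows, stub_judge))  # prints 0.5
```

In practice stub_judge would be replaced by a function that sends the prompt to the chosen model and returns its text reply. Parsing strictly and raising on anything other than true or false keeps malformed replies from being silently counted as disagreements.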