Evaluation e29c3f26-8395-4e04-a7f2-592c4435b46e - ox/GSM8k-sample

Total running cost: $0.0000

	Prompt	Rows	Type	Model	Target	Status	Runtime	Run	By	Tokens	Cost
Run	Compare the two answers and respond with true if the reasoning and answers are the same and false if not. Respond with a single word lower case. Answer 1: {response} Answer 2: {prediction}	100	text → text	OpenAI/GPT 4o	62efa0086e6fda69	completed	00:01:03	2 years ago	ox	35431 tokens
Sample	Compare the two answers and respond with true if the reasoning and answers are the same and false if not. Respond with a single word lower case. Answer 1: {response} Answer 2: {prediction}	10	text → text	OpenAI/GPT 4o	Sample - N/A	completed	00:00:04	2 years ago	ox	3602 tokens
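
The prompt in the table above is a template with two placeholders, {response} and {prediction}, and it asks the model for a single lower-case word. A minimal sketch of how such a template might be filled in and how the one-word verdict might be parsed is shown below. The function names `build_prompt` and `parse_verdict` are hypothetical and are not taken from the evaluation tool; the actual call to the model is omitted.

```python
from typing import Optional

# Prompt text copied from the table; {response} and {prediction} are its placeholders.
PROMPT_TEMPLATE = (
    "Compare the two answers and respond with true if the reasoning and "
    "answers are the same and false if not. Respond with a single word "
    "lower case. Answer 1: {response} Answer 2: {prediction}"
)


def build_prompt(response: str, prediction: str) -> str:
    """Fill the template for one dataset row (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(response=response, prediction=prediction)


def parse_verdict(raw: str) -> Optional[bool]:
    """Map the model's one-word reply to a boolean (hypothetical helper).

    Tolerates surrounding whitespace, stray punctuation, and capitalisation.
    Returns None when the reply is neither "true" nor "false".
    """
    word = raw.strip().strip(".!\"'").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None


if __name__ == "__main__":
    print(build_prompt("The answer is 18.", "18"))
    print(parse_verdict(" True.\n"))
```

Returning None for an unexpected reply, rather than defaulting to False, keeps malformed outputs distinguishable from genuine mismatches when the per-row results are aggregated.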