Evaluations / evals
main
train.jsonl
text → text
OpenAI/GPT 4o mini
base_model_eval
You are a professor and are given the task of evaluating the quality of answers for these educational questions. 
You will be given the question and answer and will evaluate it with only these responses:
"bad"
"ok"
"good"
Do not use any other words as an answer, only these options. 
If the answer is incorrect in any way, always use "bad".
If the answer is correct but repetitive and long, always give "ok".
If the answer is correct and concise, always give "good".
Here is your question:
{prompt}
Here is your answer:
{response}

Remember, only respond with either "bad", "ok", or "good" and no other words.
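The template above takes two placeholders, {prompt} and {response}, and expects a single-word label back. A minimal Python sketch of how such a template might be filled and how the judge's reply might be validated follows. The template text is condensed, and the names JUDGE_TEMPLATE, build_judge_prompt, and parse_label are hypothetical; none of this is taken from the evaluation tool itself.

```python
# Condensed version of the judge prompt; the real template carries the full rubric.
JUDGE_TEMPLATE = (
    "You are a professor evaluating the quality of answers to educational questions.\n"
    'Respond with only one of: "bad", "ok", "good".\n'
    "Here is your question:\n{prompt}\n"
    "Here is your answer:\n{response}\n"
    'Remember, only respond with either "bad", "ok", or "good" and no other words.'
)

# The only labels the rubric allows.
VALID_LABELS = {"bad", "ok", "good"}


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the {prompt} and {response} placeholders for one dataset row."""
    return JUDGE_TEMPLATE.format(prompt=question, response=answer)


def parse_label(raw: str):
    """Normalize the judge's reply; return None if it is not an allowed label."""
    label = raw.strip().strip("\"'").lower()
    return label if label in VALID_LABELS else None


if __name__ == "__main__":
    print(build_judge_prompt("What is 2 + 2?", "4"))
    print(parse_label(' "Good" '))
    print(parse_label("excellent"))
```

Returning None for anything outside the three labels makes off-rubric replies easy to count or retry instead of silently scoring them.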
Jul 18, 2025, 3:31 PM UTC
10 row sample
10 rows processed, 6073 tokens used ($0.0009)
Estimated cost for all 3000 rows: $0.2746
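The full-run estimate is a linear extrapolation from the sample. A rough Python sketch of that arithmetic is shown here; the function name estimate_total_cost is hypothetical, and the tool's actual formula is not shown on the page. Using the displayed sample cost of $0.0009 gives about $0.27, so the page's $0.2746 presumably comes from an unrounded sample cost or a per-token calculation.

```python
def estimate_total_cost(sample_cost: float, sample_rows: int, total_rows: int) -> float:
    """Scale the cost of a sample linearly to the full dataset."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_cost / sample_rows * total_rows


if __name__ == "__main__":
    # Figures as displayed on the page: 10 sample rows, $0.0009, 3000 rows total.
    estimate = estimate_total_cost(0.0009, 10, 3000)
    print(f"Estimated cost for all rows: ${estimate:.4f}")
```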
Sample Results completed
6 columns, 1-10 of 3000 rows