Evaluations/Base_model_eval
main
train.jsonl
text → text
OpenAI/GPT 4o mini
base_model_eval
You are an expert programmer and are given the task of evaluating the quality of answers for programming questions. 
You will be given the question and answer and will give either:
"great"
"good"
"bad"
as your response.
Do not use any other words as an answer, only the three options. 
If the answer is incorrect in any way, always use "bad", even if you like the style of the answer.
If the answer is correct but without an example, always give "good".
If the answer is correct and includes an example, give "great".
Here is your question:
{prompt}
Here is your answer:
{response}

Remember, only respond with either "great", "good", or "bad", no other words.
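The prompt above is a template with two placeholders, {prompt} and {response}, and it expects a one-word verdict back. A minimal Python sketch of how a row might be substituted into the template and how the reply might be checked follows. The names `fill_template`, `parse_label`, and `TEMPLATE_TAIL` are hypothetical and are not part of the evaluation tool; the substitution is done in a single pass so that braces inside a programming answer are left alone.

```python
import re

ALLOWED_LABELS = ("great", "good", "bad")

# Closing portion of the prompt, with the same two placeholders.
TEMPLATE_TAIL = (
    "Here is your question:\n"
    "{prompt}\n"
    "Here is your answer:\n"
    "{response}\n"
    "\n"
    'Remember, only respond with either "great", "good", or "bad", no other words.'
)


def fill_template(template, prompt, response):
    """Substitute {prompt} and {response} in one pass.

    A single regex pass (rather than str.format) means literal braces in
    the question or answer text are never interpreted as placeholders.
    """
    values = {"prompt": prompt, "response": response}
    return re.sub(r"\{(prompt|response)\}", lambda m: values[m.group(1)], template)


def parse_label(reply):
    """Return the verdict if the reply is exactly one allowed label, else None."""
    # Tolerate surrounding whitespace, quotes, a trailing period, and casing.
    cleaned = reply.strip().strip("\"'.").lower()
    return cleaned if cleaned in ALLOWED_LABELS else None


if __name__ == "__main__":
    filled = fill_template(
        TEMPLATE_TAIL,
        "What does {} mean in Python?",
        "An empty dict, e.g. d = {}",
    )
    print(filled)
    print(parse_label('"Great".'))
    print(parse_label("I think it is good"))
```

Returning None for anything other than a bare label makes it easy to count rows where the judge model ignored the "no other words" instruction, instead of silently mapping them to a verdict.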
Jul 16, 2025, 8:00 PM UTC
5 row sample
5 rows processed, 3643 tokens used ($0.0005)
Estimated cost for all 3000 rows: $0.3292
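The full-run estimate can be sanity-checked by scaling the sample cost linearly to the full row count. The sketch below assumes simple linear extrapolation; the tool's actual estimation method is not stated, and `extrapolate_cost` is a hypothetical helper. Using the displayed sample cost gives about $0.30 rather than $0.3292. Working backwards, $0.3292 over 3000 rows corresponds to roughly $0.00055 for 5 rows, which shows as $0.0005 at four decimal places, so the gap is plausibly a rounding effect in the displayed sample cost.

```python
def extrapolate_cost(sample_cost, sample_rows, total_rows):
    """Scale the cost of a sample linearly to the full dataset."""
    return sample_cost / sample_rows * total_rows


if __name__ == "__main__":
    # Figures as displayed on the page: 5 rows, $0.0005, 3000 rows total.
    estimate = extrapolate_cost(0.0005, 5, 3000)
    print(f"Estimate from displayed sample cost: ${estimate:.4f}")

    # Sample cost implied by the reported full-run estimate of $0.3292.
    implied_sample_cost = 0.3292 / 3000 * 5
    print(f"Implied sample cost: ${implied_sample_cost:.6f}")

    # Average tokens per row in the sample.
    print(f"Tokens per row: {3643 / 5:.1f}")
```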
Sample Results completed
5 columns, 1-5 of 3000 rows