Oxen.ai lets you run a model row by row over a dataset, which is useful for labeling data or for evaluating how well a model performs. Once the model has run over your dataset, you can save the output to a new file or branch and compare it to the original dataset.

You are a professor and are given the task of evaluating the quality of answers for these educational questions.
You will be given the question and answer and will evaluate it with only these responses:
"bad"
"ok"
"good"
Do not use any other words as an answer, only these options.
If the answer is incorrect in any way, always use "bad".
If the answer is correct but repetitive and long, always give "ok".
If the answer is correct and concise, always give "good".
Here is your question:
{prompt}
Here is your answer:
{response}
Remember, only respond with either "bad", "ok", or "good" and no other words.
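A minimal sketch of how a template like the one above might be filled in for each row, assuming the `{prompt}` and `{response}` placeholders are replaced with the values of the same-named columns. The rows, the shortened template, and the `render` helper are hypothetical illustrations, not Oxen.ai's API.

```python
# Hypothetical rows; in practice these would come from a dataset file
# with "prompt" and "response" columns.
rows = [
    {"prompt": "What is 2 + 2?", "response": "4"},
    {"prompt": "Name the largest planet.", "response": "Jupiter is the largest planet."},
]

# Shortened version of the evaluation template, using the same placeholders.
TEMPLATE = (
    "Here is your question:\n"
    "{prompt}\n"
    "Here is your answer:\n"
    "{response}\n"
    'Remember, only respond with either "bad", "ok", or "good" and no other words.'
)


def render(template: str, row: dict) -> str:
    """Substitute each {column} placeholder with that row's value."""
    return template.format(**row)


# One rendered prompt per row, ready to be sent to a model.
rendered = [render(TEMPLATE, row) for row in rows]
print(rendered[0])
```

Each rendered string is what the model would see for that row; the model's one-label reply can then be stored in a new column next to the original data.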
You are an expert programmer and are given the task of evaluating the quality of answers for programming questions.
You will be given the question and answer and will evaluate it with only these responses:
"incorrect"
"too long"
"no example"
"perfect"
Do not use any other words as an answer, only these options.
If the answer is incorrect in any way, always use "incorrect".
If the answer is correct but repetitive and too long, always give "too long".
If the answer is correct but without an example, always give "no example".
If the answer is correct and includes an example, give "perfect".
Here is your question:
{prompt}
Here is your answer:
{response}
Remember, only respond with either "incorrect", "too long", "no example", or "perfect" and no other words.
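Even with a strict prompt, a model may return a label with different capitalization, surrounding quotes, or extra words. The sketch below shows one way to check a raw reply against the four allowed labels before saving it. The `normalize_label` helper and the sample replies are assumptions made for illustration.

```python
# The four labels the prompt above permits.
ALLOWED = ("incorrect", "too long", "no example", "perfect")


def normalize_label(raw: str, allowed=ALLOWED):
    """Return the matching allowed label, or None if the reply is off-script."""
    # Drop surrounding whitespace, quotes, and trailing periods, then lowercase.
    cleaned = raw.strip(" \t\n\"'.").lower()
    return cleaned if cleaned in allowed else None


# Hypothetical raw replies from a model.
replies = ['"Perfect"', " Too Long. ", "The answer is perfect", "no example"]
labels = [normalize_label(r) for r in replies]
print(labels)
```

Rows that come back as `None` can be flagged for a re-run or manual review rather than silently saved as a label.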
{prompt}
You are an expert at evaluating coding question responses. You will receive a coding question and a response, and your only job is to answer whether the response is "Bad", "Good", or "Great".
You MUST only answer with one word.
Here is your prompt: {prompt}
Here is your response: {response}
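Once every row has a one-word verdict, summarizing the distribution of verdicts is one simple way to see how the evaluated model is performing. The sketch below assumes the verdicts have already been collected into a list; the sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical one-word verdicts, one per dataset row.
verdicts = ["Good", "Great", "Bad", "Good", "Good"]

# Count each verdict, then convert the counts to a share of all rows.
counts = Counter(verdicts)
shares = {label: counts[label] / len(verdicts) for label in ("Bad", "Good", "Great")}

for label, share in shares.items():
    print(f"{label}: {counts[label]} rows ({share:.0%})")
```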