Repository evaluations - mathi/mlabonne-FineTome-100k | Datasets at Oxen.ai

mlabonne-FineTome-100k

Evals evals evals

9d7ee25b-1aaf-4106-bdc4-e5222a148027

OpenAI/GPT 4o mini (text → text)

Mathias

mathi

11 months ago

Prompt

You are a professor and are given the task of evaluating the quality of answers for these educational questions. 
You will be given the question and answer and will evaluate it with only these responses:
"bad"
"ok"
"good"
Do not use any other words as an answer, only these options. 
If the answer is incorrect in any way, always use "bad".
If the answer is correct but repetitive and long, always give "ok".
If the answer is correct and concise, always give "good".
Here is your question:
{prompt}
Here is your answer:
{response}

Remember, only respond with either "bad", "ok", or "good" and no other words.
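
A minimal sketch of how a template like the one above might be filled in and how the judge's reply could be checked against the three allowed labels. The helper names (`build_judge_prompt`, `parse_label`) and the shortened template are hypothetical illustrations, not part of Oxen.ai or of this evaluation run; only the `{prompt}` and `{response}` placeholders and the labels come from the prompt itself.

```python
from typing import Optional

# Labels the prompt above allows the judge to return.
ALLOWED_LABELS = {"bad", "ok", "good"}

# Abbreviated stand-in for the full prompt; only the two placeholders matter here.
TEMPLATE = (
    "Here is your question:\n{prompt}\n"
    "Here is your answer:\n{response}\n"
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Substitute one dataset row into the {prompt}/{response} placeholders."""
    return TEMPLATE.format(prompt=question, response=answer)


def parse_label(raw: str) -> Optional[str]:
    """Normalize the judge's reply; return None if it is not an allowed label."""
    cleaned = raw.strip().strip('"').strip().lower()
    return cleaned if cleaned in ALLOWED_LABELS else None
```

Returning `None` for anything outside the allowed set makes off-script replies easy to count separately instead of silently mixing them into the results.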

main

main

completed: 3000 rows, 1706427 tokens, $0.2573, 4 iterations

Base_model_eval

fe25bdd7-0865-4288-8f24-c2bfd32967a0

OpenAI/GPT 4o mini (text → text)

Mathias

mathi

11 months ago

Prompt

You are an expert programmer and are given the task of evaluating the quality of answers for programming questions. 
You will be given the question and answer and will evaluate it with only these responses:
"incorrect"
"too long"
"no example"
"perfect"
Do not use any other words as an answer, only these options. 
If the answer is incorrect in any way, always use "incorrect".
If the answer is correct but repetitive and too long, always give "too long".
If the answer is correct but without an example, always give "no example".
If the answer is correct and includes an example, give "perfect".
Here is your question:
{prompt}
Here is your answer:
{response}

Remember, only respond with either "incorrect", "too long", "no example", or "perfect" and no other words.
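
The four rules in this prompt can be read as an ordered decision procedure. The sketch below is one possible reading: the prompt does not say which label wins when a correct answer is both too long and missing an example, so checking "too long" first is an assumption. The function name and boolean inputs are hypothetical.

```python
def expected_label(is_correct: bool, is_too_long: bool, has_example: bool) -> str:
    """Map three judgments about an answer to one of the four rubric labels."""
    if not is_correct:
        # Incorrectness overrides everything else, per the prompt.
        return "incorrect"
    if is_too_long:
        # Assumption: length is checked before the missing-example rule.
        return "too long"
    if not has_example:
        return "no example"
    return "perfect"
```

For instance, a correct, concise answer without an example maps to "no example", while any incorrect answer maps to "incorrect" regardless of length or examples.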

main

main

completed: 3000 rows, 1795290 tokens, $0.2715, 5 iterations

Llama 3.2 3b Base Model Run

ec79f1f5-b228-492a-bbb4-23f373861ee1

Unknown/llama3-2-3b-instruct (text → text)

Mathias

mathi

11 months ago

Prompt

{prompt}

main

main

completed: 3000 rows, 1832594 tokens, $0.0421, 3 iterations

Base Llama 3.2 3B Responses

2ea5a4b5-9cd3-4c56-a1a0-7080e91f80ba

Unknown/llama3-2-3b-instruct (text → text)

Mathias

mathi

11 months ago

Prompt

{prompt}

main

N/A

error: An exception occurred indexing, getting dataframe and running evaluation: %Req.TransportError{reason: :closed}
3000 rows, 1829385 tokens, $0.0419, 2 iterations

Base Llama 3.2 3b Outputs

449c028c-82b4-4ea6-b7d9-f25e70d5f4fa

Unknown/llama3-2-3b-instruct (text → text)

Mathias

mathi

11 months ago

Prompt

{prompt}

main

completed: 5-row sample, 3477 tokens, $0.0001, 1 iteration

Base Model Response

3eb2d2b1-5b66-4db0-89a2-6a8d7d920f5d

Unknown/meta-llama-llama-3-2-3b-instruct-turbo (text → text)

Mathias

mathi

1 year ago

Prompt

{prompt}

main

N/A

error: no case clause matching: {:error, "resource_not_found", 0, 0}
3000 rows, 1863822 tokens, $0.1113, 2 iterations

691ad668-7049-4cff-a7fb-47bfebfa4291

Unknown/meta-llama-llama-3-2-3b-instruct-turbo (text → text)

Mathias

mathi

1 year ago

Prompt

{prompt}

main

completed: 5-row sample, 4351 tokens, $0.0003, 1 iteration

Base Model Response

8227c201-4a2e-49b9-8ccf-3631089ade1a

Unknown/llama-3-2-3b-preview (text → text)

Mathias

mathi

1 year ago

Prompt

{prompt}

main

N/A

error: An exception occurred indexing, getting dataframe and running evaluation: %Req.TransportError{reason: :closed}
3000 rows, 0 tokens, $0.0000, 3 iterations

9c5318ab-8ca8-4ecc-88dd-a594bb2be857

OpenAI/GPT 4o mini (text → text)

Mathias

mathi

1 year ago

Prompt

You are an expert at evaluating coding question responses. You will receive a coding question and a response, and your only job is to answer whether the response is "Bad", "Good", or "Great".
You MUST only answer with one word.
Here is your prompt: {prompt}
Here is your response: {response}

main

completed: 5-row sample, 3191 tokens, $0.0005, 1 iteration