Evaluations
Run models against your data
Evaluations lets you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations lets you run a prompt through an entire dataset quickly.
Once you're happy with the results, write the resulting dataset to a new file, to another branch, or directly as a new commit.
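The prompts listed on this page use placeholders such as {query} and {schema} that correspond to dataset columns. As a rough illustration of the idea (not Oxen's implementation or API), the sketch below fills a template from each row and stores the model output in a "prediction" column. The names render_prompt, run_evaluation, and stub_model are hypothetical, and stub_model stands in for a real model call.

```python
from typing import Callable

# Prompt template with a {query} placeholder, taken from one of the runs on this page.
PROMPT = 'classify the query into "has_date" or "no_date". reply with one word\n\n{query}'


def render_prompt(template: str, row: dict) -> str:
    """Fill {column} placeholders from one dataset row; unused columns are ignored."""
    return template.format(**row)


def run_evaluation(template: str, rows: list, call_model: Callable[[str], str]) -> list:
    """Run the template over every row and attach the model output as 'prediction'."""
    results = []
    for row in rows:
        prompt = render_prompt(template, row)
        results.append({**row, "prediction": call_model(prompt)})
    return results


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model: flags any prompt that contains a digit.
    return "has_date" if any(ch.isdigit() for ch in prompt) else "no_date"


if __name__ == "__main__":
    sample = [{"query": "orders placed after 2021"}, {"query": "list all customers"}]
    for result in run_evaluation(PROMPT, sample, stub_model):
        print(result)
```

Keeping the model call behind a plain function makes it easy to swap in a different provider while the template and dataset stay the same.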
4833fdf4-332f-4280-817b-1bbb60903baa
OpenAI/GPT 4o mini | text → text
Bessie
ox
2 weeks ago
Translate the user query to a sql statement given the table schema. Return only the SQL and nothing else no markdown.

Schema:
{schema}

Query:
{query}
completed | 5 row sample | 493 tokens | $0.0001 | 4 iterations
bbacf907-8971-45d9-b4c4-90161606d12e
OpenAI/GPT 4o mini | text → text
Bessie
ox
3 weeks ago
classify the query into "has_date" or "no_date". reply with one word

{query}
completed | 5 row sample | 231 tokens | $0.0000 | 2 iterations
Google/Gemini 2.0 Flash | text → text
Bessie
ox
3 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 63125 tokens | $0.0113 | 2 iterations
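The judge prompt above asks for a <reason> tag and an <answer> tag containing true or false. A minimal sketch of how such a response could be parsed afterwards is shown here. The function names extract_tag and parse_judgement are hypothetical, and the sketch assumes the model followed the requested format without nesting tags.

```python
import re
from typing import Optional, Tuple


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped text inside the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_judgement(response: str) -> Tuple[Optional[bool], Optional[str]]:
    """Return (is_equivalent, reason); is_equivalent is None if the answer is unusable."""
    reason = extract_tag(response, "reason")
    answer = extract_tag(response, "answer")
    if answer is None:
        return None, reason
    normalized = answer.lower()
    if normalized in ("true", "false"):
        return normalized == "true", reason
    # Anything other than true/false is treated as an unparseable answer.
    return None, reason


if __name__ == "__main__":
    example = (
        "<reason>\n  Statement 2 omits the WHERE clause.\n</reason>\n"
        "<answer>\n  false\n</answer>"
    )
    print(parse_judgement(example))
```

Returning None for a malformed answer, rather than guessing, keeps unparseable rows visible so they can be counted or re-run.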
Google/Gemini 2.0 Flash | text → text
Bessie
ox
3 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 62715 tokens | $0.0123 | 2 iterations
211c9d8d-5450-4afa-a2be-bf75fa802218
OpenAI/GPT 4o | text → text
Bessie
ox
3 weeks ago
Write a SQL statement that is equivalent to the natural language user query below given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
completed | 200 rows | 30565 tokens | $0.1080 | 1 iteration
ff98baff-87ba-4a8b-80e0-84180930d60f
OpenAI/GPT 4o | text → text
Bessie
ox
3 weeks ago
Write a SQL statement that is equivalent to the natural language user query below given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
error: An exception occurred indexing, getting dataframe and running evaluation: %Req.TransportError{reason: :closed} | 200 rows | 30564 tokens | $0.1080 | 2 iterations
6c10db23-ffb9-4b02-992f-5515440c818c
Qwen/Qwen 2.5 Coder 32B Instruct | text → text
Bessie
ox
3 weeks ago
Write a SQL statement that is equivalent to the natural language user query below. You are given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
error: An exception occurred indexing, getting dataframe and running evaluation: %Req.TransportError{reason: :closed} | 500 rows | 105981 tokens | $0.0848 | 3 iterations
3fd776ec-6942-43d1-b56f-b4def7d67c13
Qwen/Qwen 2.5 Coder 32B Instruct | text → text
Bessie
ox
3 weeks ago
Write a SQL statement that is equivalent to the natural language user query. You are given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
completed | 5 row sample | 915 tokens | $0.0007 | 1 iteration
Google/Gemini 2.0 Flash | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
error: An exception occurred indexing, getting dataframe and running evaluation: %Req.TransportError{reason: :closed} | 200 rows | 68095 tokens | $0.0135 | 2 iterations
Google/Gemini 2.0 Flash | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 63788 tokens | $0.0127 | 2 iterations
64e34075-866e-4126-a2d7-76539d44da2d
OpenAI/GPT 4.1 | text → text
Bessie
ox
4 weeks ago
Write a SQL statement that is equivalent to the natural language user query below given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
completed | 200 rows | 30621 tokens | $0.0868 | 2 iterations
OpenAI/GPT 4o mini | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 54977 tokens | $0.0144 | 3 iterations
Google/Gemini 2.0 Flash | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 65020 tokens | $0.0129 | 2 iterations
5c8cd72a-6cd2-45de-8d11-82c44741cfdc
OpenAI/GPT 4.1 | text → text
Bessie
ox
4 weeks ago
Write a SQL statement that is equivalent to the natural language user query below given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
completed | 200 rows | 31614 tokens | $0.0906 | 2 iterations
Google/Gemini 2.0 Flash | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 66004 tokens | $0.0126 | 2 iterations
Google/Gemini 2.0 Flash | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 69267 tokens | $0.0135 | 2 iterations
Google/Gemini 2.0 Flash | text → text
Bessie
ox
4 weeks ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. Do not include any markdown surrounding the xml.

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 200 rows | 64547 tokens | $0.0127 | 3 iterations
9a7c84ae-e782-402c-af17-c66cbab91c5e
OpenAI/GPT 4o | text → text
Bessie
ox
4 weeks ago
Write a SQL statement that is equivalent to the natural language user query below given the schema in the format of a CREATE TABLE SQL statement. Assume the table is called "df". DO NOT give any preamble or extra characters or markdown just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
completed | 200 rows | 31705 tokens | $0.1142 | 3 iterations
Google/Gemini 2.0 Flash Lite | text → text
Bessie
ox
1 month ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. 

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 100 rows | 30099 tokens | $0.0042 | 2 iterations
67719af1-b878-4498-8907-68de58673310
OpenAI/GPT 4o mini | text → text
Bessie
ox
1 month ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. 

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 5 row sample | 0 tokens | $0.0000 | 1 iteration
00278494-61a3-4237-9548-a0808ac1feaa
OpenAI/GPT 4o mini | text → text
Bessie
ox
1 month ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. 

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 100 rows | 28751 tokens | $0.0080 | 2 iterations
427b9bd9-159c-4275-aaee-3010ed11a394
OpenAI/GPT 4o mini | text → text
Bessie
ox
1 month ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. 

For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 100 rows | 29121 tokens | $0.0087 | 2 iterations
1c2fc383-99ae-4ee0-8875-7e5c6dcf7242
OpenAI/GPT 4.1 | text → text
Bessie
ox
1 month ago
Write a SQL statement that is equivalent to the natural language user query below given the schema in the format column_name:type. Assume the table is called "df". Do not give any preamble or extra characters just the SQL query in plain text. Make sure the SQL query is on one line.

Schema:
{schema}

User Query:
{query}

SQL Query:
completed | 100 rows | 14905 tokens | $0.0430 | 2 iterations
53b1ee1c-1761-4309-8c13-7196f8c05d08
OpenAI/GPT 4o mini | text → text
Bessie
ox
1 month ago
Compare the following SQL statements given the database table to see if they are equivalent. If they are not the same, give a reason as to why. Format your response with two xml tags, one for the reasoning, and one a true or false statement indicating whether or not the statements are the same. For example:

<reason>
  The reason the statements differ.
</reason>
<answer>
  true or false
</answer>

Are these two SQL statements equivalent given the schema:

Schema:
{schema}

Statement 1:
{sql}

Statement 2:
{prediction}
completed | 100 rows | 30260 tokens | $0.0089 | 2 iterations
OpenAI/GPT 4o mini | text → text
Bessie
ox
1 month ago
You will be given a user query, a list of CREATE TABLE sql statements, a single table name that the query is interested in, and a corresponding SQL statement. You will do several transformations on the input.

1) Extract the the single CREATE TABLE statement that references the given table.
2) Replace the table name with "df" in both the sql statement and the CREATE TABLE statement
3) Format the response into three sections like below

<query>
  The user query goes here
</query>
<schema>
  The CREATE TABLE statement goes here on one line
</schema>
<sql>
  The SQL statement goes here
</sql>

Make sure both the schema with the CREATE TABLE statement and the sql statement reference "df" instead of the original table name. Only extract one CREATE TABLE STATEMENT and format it onto a single line without any newlines. Think before responding with the proper xml tags.

Here are the original inputs:

<query>
{instruction}
</query>
<schema>
{input}
</schema>
<sql>
{response}
</sql>

Your output goes here:
error: no case clause matching: {:error, "resource_not_found", 0, 0} | 7833 / 7834 rows | 3640759 tokens | $0.8710 | 3 iterations
ff76deda-2ba9-4047-ba15-a063e0de496d
OpenAI/GPT 4o mini | text → text
Bessie
ox
1 month ago
Determine the number of tables used in the following SQL statement given the CREATE TABLE statements below it. Output the table names in a comma separated list.

The output format should be xml tags containing the number of tables used and the table names.

For example:
<num_tables>2</num_tables>
<tables>table_1,table_2</tables>

{schema}
{output}
completed | 5 row sample | 889 tokens | $0.0002 | 1 iteration
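The prompt above requests a <num_tables> count and a comma-separated <tables> list. The sketch below, written under the assumption that the model follows that format, parses both tags and checks that the count agrees with the number of names. The function name parse_table_usage and the "consistent" key are hypothetical.

```python
import re
from typing import Optional


def parse_table_usage(response: str) -> Optional[dict]:
    """Parse <num_tables> and <tables>; return None if either tag is missing."""
    count = re.search(r"<num_tables>\s*(\d+)\s*</num_tables>", response)
    names = re.search(r"<tables>(.*?)</tables>", response, flags=re.DOTALL)
    if not count or not names:
        return None
    # Split the comma-separated list and drop empty entries.
    tables = [name.strip() for name in names.group(1).split(",") if name.strip()]
    num_tables = int(count.group(1))
    return {
        "num_tables": num_tables,
        "tables": tables,
        # Flag rows where the reported count and the listed names disagree.
        "consistent": num_tables == len(tables),
    }


if __name__ == "__main__":
    print(parse_table_usage("<num_tables>2</num_tables>\n<tables>table_1,table_2</tables>"))
```

A consistency flag like this is one cheap way to spot rows worth inspecting by hand before trusting the extracted column.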
6b31ff74-cc0b-414e-b4f4-3d4913257f5d
Qwen/Qwen 2.5 Coder 32B Instruct | text → text
Bessie
ox
2 months ago
Format the following SQL statement properly onto one line. Respond with the formatted SQL statement and nothing else.

{output}

SQL:
num_tables
num_tables
completed | 1034 rows | 115548 tokens | $0.1040 | 3 iterations
c1c2168f-d6cb-48cd-b8be-6333263ee9b8
Meta/Llama 3.3 70B Instruct | text → text
Bessie
ox
2 months ago
Determine the number of tables used in the following SQL statement given the CREATE TABLE statements below it. Output the table names in a comma separated list.

The output format should be xml tags containing the number of tables used and the table names.

For example:
<num_tables>2</num_tables>
<tables>table_1,table_2</tables>

{input}
{response}
conflict/main/ox-536
completed | 10000 rows | 4771910 tokens | $4.29 | 4 iterations
6bd3b681-4ae3-4181-94f2-190e72cd2529
OpenAI/GPT 4o mini | text → text
Bessie
ox
2 months ago
Determine the number of tables used in the following SQL statement given the CREATE TABLE statements below it. Output the table names in a comma separated list.

The output format should be xml tags containing the number of tables used and the table names.

For example:
<num_tables>2</num_tables>
<tables>table_1,table_2</tables>

{schema}
{output}
num_tables
completed | 1034 rows | 40579 tokens | $0.0079 | 2 iterations