Repository evaluations - ox/Text2SQL

Evaluations/Normalize the training dataset to use "df" and provide a schema.

main

train.jsonl

Type: text → text

Model: OpenAI/GPT 4o mini

Provider: OpenAI

Target field: formatted_data

Prompt

You will be given a user query, a list of CREATE TABLE SQL statements, a single table name that the query is interested in, and a corresponding SQL statement. You will apply several transformations to the input.

1) Extract the single CREATE TABLE statement that references the given table.
2) Replace the table name with "df" in both the SQL statement and the CREATE TABLE statement.
3) Format the response into three sections, as shown below.

<query>
  The user query goes here
</query>
<schema>
  The CREATE TABLE statement goes here on one line
</schema>
<sql>
  The SQL statement goes here
</sql>

Make sure both the schema with the CREATE TABLE statement and the SQL statement reference "df" instead of the original table name. Only extract one CREATE TABLE statement and format it onto a single line without any newlines. Think before responding with the proper XML tags.

Here are the original inputs:

<query>
{instruction}
</query>
<schema>
{input}
</schema>
<sql>
{response}
</sql>

Your output goes here:
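
For reference, the three transformations the prompt asks the model to perform can be approximated deterministically. The sketch below is only an illustration, not what the evaluation runs: the function names are hypothetical, the table name is passed in explicitly, statements are assumed to be separated by semicolons, and identifiers are assumed to be unquoted. The rename is a naive whole-word replacement, so it would also rewrite a column or string literal that happens to match the table name.

```python
import re


def extract_create_table(schema: str, table: str) -> str:
    """Return the CREATE TABLE statement for `table`, collapsed onto one line."""
    # Assumption: statements are separated by ';' and contain no ';' in literals.
    for stmt in schema.split(";"):
        m = re.match(
            r"\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)",
            stmt,
            re.IGNORECASE,
        )
        if m and m.group(1).lower() == table.lower():
            # Collapse all runs of whitespace (including newlines) to single spaces.
            return " ".join(stmt.split())
    raise ValueError(f"no CREATE TABLE statement found for {table!r}")


def rename_table(sql: str, table: str, new_name: str = "df") -> str:
    """Replace whole-word, case-insensitive occurrences of `table` with `new_name`."""
    return re.sub(rf"\b{re.escape(table)}\b", new_name, sql, flags=re.IGNORECASE)


def format_example(query: str, schema: str, table: str, sql: str) -> str:
    """Build the three-section output described in the prompt."""
    create_stmt = rename_table(extract_create_table(schema, table), table)
    return (
        f"<query>\n  {query.strip()}\n</query>\n"
        f"<schema>\n  {create_stmt}\n</schema>\n"
        f"<sql>\n  {rename_table(sql.strip(), table)}\n</sql>"
    )


if __name__ == "__main__":
    # Made-up row, for illustration only.
    print(
        format_example(
            "How many department heads are older than 56?",
            "CREATE TABLE head (\n  age INTEGER\n);\nCREATE TABLE dept (id INTEGER)",
            "head",
            "SELECT COUNT(*) FROM head WHERE age > 56",
        )
    )
```

A rule-based version like this can serve as a cheap cross-check on model outputs, but it will miss cases (quoted identifiers, aliases, joins) that the model-based approach is meant to handle.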

Queued: May 14, 2025, 4:55 PM UTC

Completed: May 14, 2025, 4:55 PM UTC

5 row sample

5 rows processed, 1985 tokens used ($0.0005)

Estimated cost for all 7834 rows: $0.7542
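
The figures above can be sanity-checked with a little arithmetic. The page does not say how the full-run estimate is derived; the sketch below assumes a simple linear extrapolation from the sample and assumes the displayed sample cost is rounded to four decimal places. Under those assumptions, scaling the rounded $0.0005 gives a slightly higher number than the reported estimate, while working backwards from $0.7542 gives a sample cost that rounds to $0.0005.

```python
SAMPLE_ROWS = 5
SAMPLE_TOKENS = 1985
TOTAL_ROWS = 7834
DISPLAYED_SAMPLE_COST = 0.0005   # USD, as shown (assumed to be rounded)
REPORTED_TOTAL_COST = 0.7542     # USD, as shown

# Average tokens per row in the sample, and a linear projection to all rows.
tokens_per_row = SAMPLE_TOKENS / SAMPLE_ROWS
projected_tokens = tokens_per_row * TOTAL_ROWS

# Scaling the rounded sample cost overshoots the reported estimate.
naive_total = DISPLAYED_SAMPLE_COST / SAMPLE_ROWS * TOTAL_ROWS

# Working backwards from the reported estimate gives the unrounded sample cost.
implied_cost_per_row = REPORTED_TOTAL_COST / TOTAL_ROWS
implied_sample_cost = implied_cost_per_row * SAMPLE_ROWS

print(f"tokens per row:       {tokens_per_row:.0f}")
print(f"projected tokens:     {projected_tokens:,.0f}")
print(f"naive total cost:     ${naive_total:.4f}")
print(f"implied sample cost:  ${implied_sample_cost:.6f}")
```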

Sample Results completed

8 columns, 1-5 of 7834 rows
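
The constraints the prompt places on each result (three tagged sections, a one-line schema, and "df" in both the schema and the SQL) can be checked mechanically before committing to the full run. The following is a minimal sketch with hypothetical helper names; it assumes the text written to formatted_data contains the three tagged sections, possibly preceded by other text, since the prompt asks the model to think before responding.

```python
import re

# Matches <query>...</query>, <schema>...</schema>, and <sql>...</sql>,
# capturing the section name and its body without surrounding whitespace.
SECTION_RE = re.compile(r"<(query|schema|sql)>\s*(.*?)\s*</\1>", re.DOTALL)


def parse_sections(text: str) -> dict:
    """Return a mapping of section name to body for every tagged section found."""
    return {name: body for name, body in SECTION_RE.findall(text)}


def check_output(text: str) -> list:
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    sections = parse_sections(text)
    for name in ("query", "schema", "sql"):
        if name not in sections:
            problems.append(f"missing <{name}> section")

    schema = sections.get("schema", "")
    sql = sections.get("sql", "")
    if "\n" in schema:
        problems.append("schema spans multiple lines")
    if schema and not re.match(r"CREATE\s+TABLE\s+df\b", schema, re.IGNORECASE):
        problems.append("schema is not a CREATE TABLE statement for df")
    if sql and not re.search(r"\bdf\b", sql):
        problems.append("sql does not reference df")
    return problems


if __name__ == "__main__":
    # Made-up output, for illustration only.
    sample = (
        "<query>\n  How many rows are there?\n</query>\n"
        "<schema>\n  CREATE TABLE df (age INTEGER)\n</schema>\n"
        "<sql>\n  SELECT COUNT(*) FROM df\n</sql>"
    )
    print(check_output(sample))
```

Running a check like this over the 5-row sample is a quick way to spot prompt problems before paying for the remaining rows.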