Evaluations/Normalize the training dataset to use "df" and provide a schema.
main
train.jsonl
text → text
OpenAI/GPT 4o mini
formatted_data
You will be given a user query, a list of CREATE TABLE SQL statements, a single table name that the query is interested in, and a corresponding SQL statement. You will apply several transformations to the input.

1) Extract the single CREATE TABLE statement that references the given table.
2) Replace the table name with "df" in both the SQL statement and the CREATE TABLE statement.
3) Format the response into three sections, as shown below.

<query>
  The user query goes here
</query>
<schema>
  The CREATE TABLE statement goes here on one line
</schema>
<sql>
  The SQL statement goes here
</sql>

Make sure both the schema with the CREATE TABLE statement and the SQL statement reference "df" instead of the original table name. Only extract one CREATE TABLE statement and format it onto a single line without any newlines. Think before responding with the proper XML tags.

Here are the original inputs:

<query>
{instruction}
</query>
<schema>
{input}
</schema>
<sql>
{response}
</sql>

Your output goes here:
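
For reference, the three steps the prompt asks the model to perform can be approximated with a small rule-based function, which may be useful for spot-checking model outputs. This is only a sketch under stated assumptions: `normalize_example` is a hypothetical helper, the table name is passed in explicitly, the statements in the schema are assumed to be separated by semicolons, and the whole-word replacement would also rewrite a column that happens to share the table's name.

```python
import re


def normalize_example(query: str, schema: str, table: str, sql: str) -> str:
    """Hypothetical rule-based version of the prompt's three steps."""
    # Step 1: split the schema into statements (assumes ';' separators)
    # and keep the first CREATE TABLE statement that names `table`.
    statements = [s.strip() for s in re.split(r";\s*", schema) if s.strip()]
    create_re = re.compile(
        r"CREATE\s+TABLE\s+" + re.escape(table) + r"\b", re.IGNORECASE
    )
    matches = [s for s in statements if create_re.search(s)]
    if not matches:
        raise ValueError(f"no CREATE TABLE statement found for {table!r}")
    # Collapse internal whitespace so the statement sits on a single line.
    create_stmt = " ".join(matches[0].split())

    # Step 2: replace the table name with "df" (whole-word matches only).
    name_re = re.compile(r"\b" + re.escape(table) + r"\b")
    create_stmt = name_re.sub("df", create_stmt)
    sql = name_re.sub("df", " ".join(sql.split()))

    # Step 3: format the three sections in the layout the prompt requests.
    return (
        f"<query>\n  {query}\n</query>\n"
        f"<schema>\n  {create_stmt}\n</schema>\n"
        f"<sql>\n  {sql}\n</sql>"
    )


# Made-up example row, not taken from train.jsonl.
example = normalize_example(
    "How many heads are older than 56?",
    "CREATE TABLE head (age INTEGER); CREATE TABLE dept (id INTEGER,\n name TEXT)",
    "head",
    "SELECT COUNT(*) FROM head WHERE age > 56",
)
print(example)
```

A rule-based pass like this cannot infer which table a query is interested in, which is one reason the step is delegated to a model here.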
May 14, 2025, 4:55 PM UTC
5 row sample
5 rows processed, 1985 tokens used ($0.0005)
Estimated cost for all 7834 rows: $0.7542
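
The estimate can be sanity-checked with a linear extrapolation from the sample. The sketch below assumes the tool scales per-row cost by the total row count, which the page does not confirm; `extrapolate_cost` is a hypothetical helper. Because the displayed sample cost is rounded to four decimal places, the true sample cost could lie anywhere between roughly $0.00045 and $0.00055, so the extrapolation yields a range rather than a single figure.

```python
def extrapolate_cost(sample_cost: float, sample_rows: int, total_rows: int) -> float:
    """Scale a sample's cost linearly to the full dataset (assumed method)."""
    return sample_cost / sample_rows * total_rows


SAMPLE_ROWS = 5
TOTAL_ROWS = 7834

# Using the rounded figure shown on the page.
naive = extrapolate_cost(0.0005, SAMPLE_ROWS, TOTAL_ROWS)

# Bounds implied by rounding the sample cost to four decimal places.
low = extrapolate_cost(0.00045, SAMPLE_ROWS, TOTAL_ROWS)
high = extrapolate_cost(0.00055, SAMPLE_ROWS, TOTAL_ROWS)

print(f"naive: ${naive:.4f}, range: ${low:.4f} to ${high:.4f}")
```

The reported $0.7542 falls inside this range, which is consistent with the estimate having been computed from the unrounded sample cost.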
Sample Results completed
8 columns, 1-5 of 7834 rows