Introducing Evaluations, a feature that lets you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt through every row of a dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
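To make "running a prompt through an entire dataset" concrete, here is a minimal sketch of the idea in plain Python. It is not Oxen's implementation or API: `render`, `run_over_dataset`, and `echo_model` are hypothetical names, and the stand-in model only reports the prompt length. The sketch assumes that placeholders such as `{prompt}` and `{image}` refer to dataset columns of the same name.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def render(template: str, row: Dict[str, str]) -> str:
    """Replace each {column} placeholder with the row's value for that column."""
    out = template
    for column, value in row.items():
        out = out.replace("{" + column + "}", value)
    return out


def run_over_dataset(
    template: str,
    rows: Iterable[Dict[str, str]],
    model: Callable[[str], str],
    output_column: str = "prediction",
) -> List[Dict[str, str]]:
    """Render the template for every row and store the model reply in a new column."""
    results = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[output_column] = model(render(template, row))
        results.append(new_row)
    return results


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"[{len(prompt)} chars]"


# A tiny in-memory dataset with the two columns the template refers to.
dataset = io.StringIO("prompt,image\nfixing gears,ox_001.png\n")
rows = list(csv.DictReader(dataset))
results = run_over_dataset("Prompt: {prompt} Image: {image}", rows, echo_model)
```

Each result row keeps the original columns and gains one output column, which matches the idea of writing the resulting dataset to a new file, branch, or commit.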
848394c8-5df0-412c-8493-fdbe32a5d155

ox
2 months ago

# Image Judging Rubric

You are an animator looking at an artist's work. Judge the following image on a few different criteria. Be very critical. We are aiming for a movie quality character.

## Valid Values

Each of the judgements should be one of four values:

* "bad" if the image does not match the criteria
* "okay" if the image has elements of the criteria, but is not good yet
* "good" if the image matches the criteria, but could be better
* "perfect" if there is nothing that could be improved about the image

## Criteria Descriptions

The criteria on which the image should be graded are as follows:

character: Is the character a 3D Pixar-style white furry ox?
task: Is the character performing the task described?
objects: Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic?
expression: Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity?
texture: The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights.
coloring: The fur must NOT contain any yellow or sepia tones. It should be a shade of white with a tone as if it lives in the Arctic or Himalayas.
background: The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ones specified in the prompt.

## Return Format

Return the judgements in xml format. The xml should contain the criteria name in the tag. An example response looks like this:

<reasoning>Your reasoning</reasoning>
<character>good</character>
<task>bad</task>
<objects>bad</objects>
<expression>perfect</expression>
<texture>perfect</texture>
<coloring>good</coloring>
<background>perfect</background>

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.

## Inputs

Prompt: {prompt}
Image: {image}
error no case clause matching: {:error, {%{type: "bad_request", title: "Bad Request", detail: "new_path already exists"}, 400}, 0, 0}
49 rows, 173266 tokens, $0.8797, 1 iteration
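The rubric in this entry asks the model to reply with one XML-style tag per criterion. A minimal sketch of how such a reply could be parsed and validated is shown here. It uses a regular expression rather than an XML parser, because the reply has no single root element and a strict parser would reject it. The names `parse_judgements`, `CRITERIA`, and `VALID_VALUES` are hypothetical and not part of Oxen; the tag names and allowed values are taken from the rubric text.

```python
import re
from typing import Dict

# Criteria tags and allowed values, copied from the rubric text.
CRITERIA = (
    "character", "task", "objects", "expression",
    "texture", "coloring", "background",
)
VALID_VALUES = {"bad", "okay", "good", "perfect"}


def parse_judgements(reply: str) -> Dict[str, str]:
    """Extract one judgement per criterion from a tagged model reply.

    Raises ValueError if a criterion tag is missing or holds an unknown value.
    The <reasoning> tag is optional and returned as-is when present.
    """
    result: Dict[str, str] = {}

    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", reply, re.DOTALL)
    if reasoning is not None:
        result["reasoning"] = reasoning.group(1).strip()

    for name in CRITERIA:
        match = re.search(rf"<{name}>(.*?)</{name}>", reply, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{name}> tag")
        value = match.group(1).strip().lower()
        if value not in VALID_VALUES:
            raise ValueError(f"unexpected value {value!r} for <{name}>")
        result[name] = value
    return result


# The example response from the rubric, used as sample input.
example_reply = (
    "<reasoning>Your reasoning</reasoning> <character>good</character> "
    "<task>bad</task> <objects>bad</objects> <expression>perfect</expression> "
    "<texture>perfect</texture> <coloring>good</coloring> "
    "<background>perfect</background>"
)
judgements = parse_judgements(example_reply)
```

Validating the values at parse time is a design choice: a reply such as `<task>fine</task>` fails loudly instead of ending up as an unexpected category in the output dataset.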
1633a498-2f70-4ec4-8e7e-7779bd843644

ox
2 months ago

# Image Judging Rubric

You are an animator looking at an artist's work. Judge the following image on a few different criteria. Be very critical. We are aiming for a movie quality character.

## Valid Values

Each of the judgements should be one of four values:

* "bad" if the image does not match the criteria
* "okay" if the image has elements of the criteria, but is not good yet
* "good" if the image matches the criteria, but could be better
* "perfect" if there is nothing that could be improved about the image

## Criteria Descriptions

The criteria on which the image should be graded are as follows:

character: Is the character a 3D Pixar-style white furry ox?
task: Is the character performing the task described?
objects: Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic?
expression: Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity?
texture: The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights.
coloring: The fur must NOT contain any yellow or sepia tones. It should be a shade of white with a tone as if it lives in the Arctic or Himalayas.
background: The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ones specified in the prompt.

## Return Format

Return the judgements in xml format. The xml should contain the criteria name in the tag. An example response looks like this:

<reasoning>Your reasoning</reasoning>
<character>good</character>
<task>bad</task>
<objects>bad</objects>
<expression>perfect</expression>
<texture>perfect</texture>
<coloring>good</coloring>
<background>perfect</background>

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.

## Inputs

Prompt: {prompt}
Image: {image}
6c67636e-2d2d-449e-b096-02821a7a157a

ox
2 months ago

Judge the following image on a few different criteria. Be very critical. We are aiming for perfection.

Each of the judgements should be one of four values:

* "bad" if the image does not match the criteria
* "okay" if the image has elements of the criteria, but is not good yet
* "good" if the image matches the criteria, but could be better
* "perfect" if there is nothing that could be improved about the image

Return the judgements in xml format. The xml should contain the following field names and should be graded on the following criteria:

<character></character> Is the character a 3D Pixar-style white furry ox?
<task></task> Is the character performing the task described?
<objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic?
<expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity?
<texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights.
<coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)).
<background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench.

An example response looks like this:

<reasoning>Your reasoning</reasoning>
<character>good</character>
<task>bad</task>
<objects>bad</objects>
<expression>perfect</expression>
<texture>perfect</texture>
<coloring>good</coloring>
<background>perfect</background>

Prompt: {prompt}
Image: {image}

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
3688acba-de8d-4344-85cc-7db6411fd2a4

ox
2 months ago

Judge the following image on a few different criteria. Be very critical. We are aiming for perfection.

Each of the judgements should be one of four values:

* "bad" if the image does not match the criteria
* "okay" if the image has elements of the criteria, but is not good yet
* "good" if the image matches the criteria, but could be better
* "perfect" if there is nothing that could be improved about the image

Return the judgements in xml format. The xml should contain the following field names and should be graded on the following criteria:

<character></character> Is the character a 3D Pixar-style white furry ox?
<task></task> Is the character performing the task described?
<objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic?
<expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity?
<texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights.
<coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)).
<background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench.

An example response looks like this:

<reasoning>Your reasoning</reasoning>
<character>good</character>
<task>bad</task>
<objects>bad</objects>
<expression>perfect</expression>
<texture>perfect</texture>
<coloring>good</coloring>
<background>perfect</background>

Prompt: {prompt}
Image: {image}

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
a02538e1-fe15-4a10-97d1-f25b44013de6

ox
2 months ago

A cute while handsome, 3D Pixar-style white ox character {prompt}. The ox should be with soft, fluffy fur and a slightly rounded, friendly body. The ox has large, curved beige horns, a big pink nose, expressive brown eyes, and a gentle, content smile. The ox should have wide open happy eyes, with a little spark like he is content with the activity he is doing. The character should have an endearing pose with a bit of gravitas. The fur is detailed with soft, realistic texturing and light shading, giving a plush, huggable appearance. The lighting is soft and even, highlighting the texture of the fur. The background is pure white, creating a clean studio look that emphasizes the character. The overall tone is whimsical, heartwarming, yet regal, suitable for an animated feature. The image should NOT contain any yellowish or vintage tone. The ox should be the same texture, lighting and shading as the reference picture.
b03ff09e-491c-48ee-adca-fc26b538a6ca

ox
2 months ago

Judge the following image on a few different criteria. Be very critical. We are aiming for perfection.

Each of the judgements should be one of four values:

* "bad" if the image does not match the criteria
* "okay" if the image has elements of the criteria, but is not good yet
* "good" if the image matches the criteria, but could be better
* "perfect" if there is nothing that could be improved about the image

Return the judgements in xml format. The xml should contain the following field names and should be graded on the following criteria:

<character></character> Is the character a 3D Pixar-style white furry ox?
<task></task> Is the character performing the task described?
<objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic?
<expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity?
<texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights.
<coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)).
<background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench.

An example response looks like this:

<reasoning>Your reasoning</reasoning>
<character>good</character>
<task>bad</task>
<objects>bad</objects>
<expression>perfect</expression>
<texture>perfect</texture>
<coloring>good</coloring>
<background>perfect</background>

Prompt: {prompt}
Image: {image}

Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
model-epoch-50