History
Total running cost: $0.5422
Prompt | Rows | Type | Model | Target | Status | Runtime | Run | By | Tokens | Cost
Run
Judge the following image on a few different criteria. Be very critical. We are aiming for perfection. Each of judgements should be one of three values: * "bad" if the image does not match the criteria * "okay" if the image has elements of the criteria, but is not good yes * "good" if the image matches the criteria, but could be better * "perfect" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names and should be graded on the following criteria: <character></character> Is the character a 3D Pixar-style white furry ox? <task></task> Is the character performing the task described? <objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic? <expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity? <texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights. <coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)). <background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench. An example response looks like this: <reasoning>Your reasoning</reasoning> <character>good</character> <task>bad</task> <objects>bad</objects> <expression>perfect</expression> <texture>perfect</texture> <coloring>good</coloring> <background>perfect</background> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
50 | image → text | Anthropic AI/Claude 3.7 Sonnet | 12a062cde9ab46dfd404c2f4248d4046 | completed | 00:08:31 | 5 months ago | ox | 63003 tokens | $0.4475
Sample
Judge the following image on a few different criteria. Be very critical. We are aiming for perfection. Each of judgements should be one of three values: * "bad" if the image does not match the criteria * "okay" if the image has elements of the criteria, but is not good yes * "good" if the image matches the criteria, but could be better * "perfect" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names and should be graded on the following criteria: <character></character> Is the character a 3D Pixar-style white furry ox? <task></task> Is the character performing the task described? <objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic? <expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity? <texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights. <coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)). <background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench. An example response looks like this: <reasoning>Your reasoning</reasoning> <character>good</character> <task>bad</task> <objects>bad</objects> <expression>perfect</expression> <texture>perfect</texture> <coloring>good</coloring> <background>perfect</background> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Anthropic AI/Claude 3.7 Sonnet | Sample - N/A | completed | 00:00:46 | 5 months ago | ox | 6336 tokens | $0.0453
Sample
Judge the following image on a few different criteria. Be very critical. We are aiming for perfection. Each of judgements should be one of three values: * "bad" if the image does not match the criteria * "okay" if the image has elements of the criteria, but is not good yes * "good" if the image matches the criteria, but could be better * "perfect" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names and should be graded on the following criteria: <character></character> Is the character a 3D Pixar-style white furry ox? <task></task> Is the character performing the task described? <objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic? <expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity? <texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights. <coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)). <background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench. An example response looks like this: <reasoning>Your reasoning</reasoning> <character>good</character> <task>bad</task> <objects>bad</objects> <expression>perfect</expression> <texture>perfect</texture> <coloring>good</coloring> <background>perfect</background> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | OpenAI/GPT 4o | Sample - N/A | completed | 00:00:40 | 5 months ago | ox | 4401 tokens | $0.0185
Sample
Judge the following image on a few different criteria. Be very critical. We are aiming for perfection. Each of judgements should be one of three values: * "bad" if the image does not match the description * "good" if the image matches the criteria, but could be better * "perfect" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names and should be graded on the following: <character></character> Is the character a 3D Pixar-style white furry ox? <task></task> Is the character performing the task described? <objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic? <expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity? <texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights. <coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)). <background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench. An example response looks like this: <reasoning>Your reasoning</reasoning> <character>good</character> <task>bad</task> <objects>bad</objects> <expression>perfect</expression> <texture>perfect</texture> <coloring>good</coloring> <background>perfect</background> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | OpenAI/GPT 4o | Sample - N/A | completed | 00:00:39 | 5 months ago | ox | 4340 tokens | $0.0186
Sample
Judge the following image on a few different criteria. Be very critical. We are aiming for perfection. Each of judgements should be one of three values: * "bad" if the image does not match the description * "good" if the image matches the criteria, but could be better * "perfect" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names and should be graded on the following: <character></character> Is the character a 3D Pixar-style white furry ox? <task></task> Is the character performing the task described? <objects></objects> Are all the necessary objects in the scene? Is there anything wrong with them? Do they look realistic? <expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity? <texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights. <coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)). <background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench. An example response looks like this: <reasoning>Your reasoning</reasoning> <character>good</character> <task>bad</task> <objects>bad</objects> <expression>perfect</expression> <texture>perfect</texture> <coloring>good</coloring> <background>perfect</background> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Meta/Llama 4 Maverick | Sample - N/A | completed | 00:00:09 | 5 months ago | ox | 0 tokens | $0.0000
Sample
Judge the following image on a few different criteria. Be very critical. We are aiming for perfection. Each of judgements should be one of three values: * "bad" if the image does not match the description * "good" if the image matches the criteria, but could be better * "perfect" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names and should be graded on the following: <character></character> Is the character a 3D Pixar-style white furry ox? <task></task> Is the character performing the task described? <objects></objects> Are all the necessary objects in the scene? <expression></expression> Is the character's expression wide open and happy, with a visible spark of joy or engagement, conveying satisfaction in the activity? <texture></texture> The fur must show clear texture and depth, with soft lighting that avoids harsh shadows or bright highlights. <coloring></coloring> The fur must not contain any yellow or sepia tones (e.g., check that RGB values of white areas are near (255, 255, 255)). <background></background> The entire background must be pure white (#FFFFFF) with no visible gradient, vignette, or objects other than the ox, gears, and workbench. An example response looks like this: <reasoning>Your reasoning</reasoning> <character>good</character> <task>bad</task> <objects>bad</objects> <expression>perfect</expression> <texture>perfect</texture> <coloring>good</coloring> <background>perfect</background> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Meta/Llama 4 Maverick | Sample - N/A | completed | 00:00:17 | 5 months ago | ox | 6274 tokens | $0.0026
Sample
Judge the following image on a few different dimensions. Be very critical. Each of judgements should be one of three values: * "poor" if the image does not match the description * "good" if the image matches the description, but could be better * "great" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names: <reasoning> Step by step reasoning of why the image is good or not </reasoning> <style> Is the character in the style specified? Is the fur pure white with a nice texture? Is the character friendly? </style> <task> How well is the task being portrayed? Are all the items and actions present? </task> <quality> What is the overall quality of the character? Are there any defects such as too many legs, eyes not open, etc? </quality> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Meta/Llama 4 Maverick | Sample - N/A | completed | 00:00:33 | 5 months ago | ox | 8358 tokens | $0.0042
Sample
Judge the following image on a few different dimensions. Be very critical. Each of judgements should be one of three values: * "poor" if the image does not match the description * "good" if the image matches the description, but could be better * "great" if there is nothing that could be improved about the image Return the judgements in xml format. The xml should contain with the following field names: <reasoning> Step by step reasoning of why the image is good or not </reasoning> <description> Does the character match the description? </description> <task> How well is the task being portrayed? Are all the items and actions present? </task> <quality> What is the overall quality of the image? </quality> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Unknown or deleted model | Sample - N/A | completed | 00:00:16 | 5 months ago | ox | 5742 tokens | $0.0013
Sample
Judge the following image on a few different dimensions. Return the judgements in xml format. The judgement should be either "poor", "good", or "great". The xml should contain with the following field names: <reasoning> Step by step reasoning of why the image is good or not </reasoning> <description> Does the character match the description? </description> <task> Is the character doing the task specified? </task> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Unknown or deleted model | Sample - N/A | completed | 00:00:17 | 5 months ago | ox | 5854 tokens | $0.0016
Sample
Judge the following image on a few different dimensions. Return the judgements in xml format. The judgement should be either "poor", "good", or "great". The xml should contain with the following field names: <reasoning> Step by step reasoning of why the image is good or not </reasoning> <description> Does the character match the description? </description> <task> Is the character doing the task specified? </task> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Qwen/Qwen2 VL 72B Instruct | Sample - N/A | completed | 00:00:36 | 5 months ago | ox | 2912 tokens | $0.0026
Sample
Judge the following image on a few different dimensions. Return the judgements in xml format. The judgement should be either "poor", "good", or "great". The xml should contain with the following field names: <reasoning> Step by step reasoning of why the image is good or not </reasoning> <description> Does the character match the description? </description> <task> Is the character doing the task specified? </task> Prompt: {prompt} Image: {image} Reason through your thoughts step by step before responding. Put your thoughts in the <reasoning></reasoning> tags.
5 | image → text | Google/Gemini 2.0 Flash | Sample - N/A | completed | 00:00:00 | 5 months ago | ox | 0 tokens | $0.0000
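
The most recent prompts in this history ask the model to answer with a flat sequence of tags (<reasoning>, <character>, <task>, and so on) with no enclosing root element. The following is a minimal sketch of how such a response could be read back into a dictionary. It is an illustration only, not part of the tool shown here: the function name parse_judgement and the constants FIELDS and GRADES are hypothetical, and a regular expression is used instead of an XML parser on the assumption that a model may surround the tags with extra text.

```python
import re

# Field names copied from the example response in the first prompt above.
FIELDS = ["character", "task", "objects", "expression",
          "texture", "coloring", "background"]

# Grade values that the first prompt lists as allowed.
GRADES = {"bad", "okay", "good", "perfect"}


def parse_judgement(response: str) -> dict:
    """Extract the reasoning text and one grade per field from a judge response.

    Missing tags and grades outside GRADES are returned as None rather than
    raising, so a malformed response can be inspected or retried by the caller.
    """
    result = {}

    # The reasoning block may span several lines, hence re.DOTALL.
    match = re.search(r"<reasoning>(.*?)</reasoning>", response, re.DOTALL)
    result["reasoning"] = match.group(1).strip() if match else None

    for field in FIELDS:
        match = re.search(rf"<{field}>(.*?)</{field}>", response, re.DOTALL)
        grade = match.group(1).strip().lower() if match else None
        result[field] = grade if grade in GRADES else None

    return result


# The example response quoted in the prompt, used as sample input.
example = (
    "<reasoning>Your reasoning</reasoning> <character>good</character> "
    "<task>bad</task> <objects>bad</objects> <expression>perfect</expression> "
    "<texture>perfect</texture> <coloring>good</coloring> "
    "<background>perfect</background>"
)

judgement = parse_judgement(example)
```

Returning None for an unrecognized grade, instead of raising an error, is a design choice for this sketch: it lets a caller count how often each model deviates from the requested format.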