Introducing Evaluations, a powerful feature that lets you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process, letting you quickly run a prompt through an entire dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
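The prompts below end with a `{file_path}` placeholder. As a rough illustration of what running a prompt through a dataset amounts to, here is a minimal Python sketch that fills such a placeholder once per row. It does not call the Oxen API; the `render_prompts` helper, the assumption that the placeholder maps to a same-named column, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, List


def render_prompts(template: str, rows: Iterable[Dict[str, str]]) -> List[str]:
    """Fill the template's {column} placeholders once per dataset row."""
    # str.format raises KeyError if a row lacks a column the template names.
    return [template.format(**row) for row in rows]


# Hypothetical rows; a real dataset would supply these values.
rows = [
    {"file_path": "images/0001.png"},
    {"file_path": "images/0002.png"},
]

prompts = render_prompts("Caption this image {file_path}", rows)
for prompt in prompts:
    print(prompt)
```

Note that `str.format` treats every brace as a placeholder, so a prompt containing literal braces would need them doubled or a different templating approach.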
You are writing single-line, comma-separated captions for a content-only training dataset for WAN 2.1 T2V. Output exactly one line per image, 22–40 words, no line breaks. Do not mention style/medium/rendering, camera/lens, brands/IP, or artist names. Use generic nouns and describe only what is visible.
Order (phrases, comma-separated):
Scene summary (what + where)
Subjects with counts (e.g., two people, one dog)
Actions/poses (walking, holding hands, looking at camera)
Setting/background elements (sidewalk, street, trees, buildings)
Composition/shot distance (close portrait, medium shot, full body, wide view, centered/off-center)
Key props/attributes (scarf, umbrella, crown, bottle on table)
Facial expression/eye state if clearly visible (smiling, neutral, left eye closed winking)
Color/lighting only if essential to identify the scene (autumn leaves, night scene, strong shadows)
Text presence if visible: say “a sign with text” (don’t transcribe)
Rules:
No subjective adjectives (e.g., beautiful, cinematic).
Mention gender/age/ethnicity only if obvious and relevant.
If background is plain, say “simple background” or “plain backdrop.”
Omit any field that isn’t visible.
Examples (content-only):
couple walking on a city sidewalk in autumn, two people, holding hands and looking toward the camera, street and buildings with trees and fallen leaves, medium shot full body, light jackets and jeans, neutral expressions, simple daytime scene
product on tabletop, one glass bottle, standing upright with long cast shadow, plain wall and flat surface, centered composition, no label text legible, simple background
ice-cream cone on plain backdrop, single cone with three scoops and sprinkles, static pose, wide view with right-weighted framing, simple background, one small drip visible
Caption this image {file_path}
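Model outputs do not always respect length and wording rules like the ones in the prompt above, so it can help to check the generated captions afterwards. The following is a minimal sketch of such a check, assuming words are counted by splitting on whitespace. The function name `check_content_caption` and the short `BANNED_TERMS` list are illustrative assumptions, not part of the prompt or of any Oxen tooling.

```python
import re
from typing import List

# Illustrative, non-exhaustive list of subjective terms named in the prompt.
BANNED_TERMS = {"beautiful", "cinematic"}


def check_content_caption(caption: str, min_words: int = 22, max_words: int = 40) -> List[str]:
    """Return a list of problems found; an empty list means the caption passes."""
    problems = []
    if "\n" in caption.strip():
        problems.append("contains a line break")
    # Assumption: a "word" is any whitespace-separated token.
    n_words = len(caption.split())
    if not min_words <= n_words <= max_words:
        problems.append(f"word count {n_words} outside {min_words}-{max_words}")
    if "," not in caption:
        problems.append("not comma-separated")
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    for term in sorted(BANNED_TERMS & tokens):
        problems.append(f"subjective term: {term}")
    return problems


example = (
    "product on tabletop, one glass bottle, standing upright with long cast shadow, "
    "plain wall and flat surface, centered composition, no label text legible, simple background"
)
print(check_content_caption(example))             # passes: empty list
print(check_content_caption("beautiful bottle"))  # too short, no commas, subjective term
```

A check like this only catches mechanical violations; whether a caption describes only what is visible still needs a human or a second model pass.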
You are writing single-line captions to train a style LoRA for WAN 2.1 T2V. Produce one comma-separated line (no line breaks) per image, 25–60 words. Do not include character names, brand/IP names, or camera/lens models. Emphasize visual style over identity and keep subjects generic.
Repeat this STYLE ANCHOR STACK verbatim at the start of every caption: digital gouache, soft pastel tones, cool grays + warm reds, clean edges, smooth gradients, soft overcast lighting, simple background, minimal noise
After the anchors, describe what’s visible using these fields (phrases only, separated by commas, any missing fields may be omitted):
Base class & subject (generic): e.g., illustration of a person, cartoon figure, fantasy landscape, product render, animal portrait
Additional style/medium keywords (consistent with anchors): e.g., painterly brush texture, vector-like shading, toon rendering
Palette & materials actually visible: key colors/patterns/materials
Lighting & mood actually visible: e.g., soft studio lighting, high-key, warm rim light, cheerful mood
Composition & shot type: e.g., close portrait, medium shot, full body, three-quarter view, centered composition, white/simple background
Notable visible details (generic): gestures, props, background simplicity, facial expression (e.g., left eye closed winking), motion cues kept mild
Quality/layout helpers (optional): clean edges, smooth gradients, minimal noise
Rules: Be factual; use generic nouns (person/figure/character); if text appears, say “a sign with text” (don’t transcribe); only note gender/age/ethnicity if visually clear and relevant; avoid negatives, camera metadata, and subjective terms. No trigger words.
Output format example:
digital gouache, soft pastel tones, cool grays + warm reds, clean edges, smooth gradients, soft overcast lighting, simple background, minimal noise, illustration of a person, painterly brush texture, muted blue and warm amber palette, soft evening glow, medium shot three-quarter view, holding an umbrella, calm mood, smooth gradients, minimal noise
Caption this image {file_path}
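Because the prompt above requires the style anchor stack to open every caption verbatim, a simple prefix test can flag outputs where the model paraphrased or reordered it. This is a minimal sketch under that reading of the prompt; `has_anchor_prefix` and `word_count_ok` are hypothetical helper names, and words are again assumed to be whitespace-separated tokens.

```python
# The anchor stack copied from the prompt above.
STYLE_ANCHORS = (
    "digital gouache, soft pastel tones, cool grays + warm reds, clean edges, "
    "smooth gradients, soft overcast lighting, simple background, minimal noise"
)


def has_anchor_prefix(caption: str, anchors: str = STYLE_ANCHORS) -> bool:
    """True if the caption starts with the anchor stack, followed by a comma or nothing."""
    text = caption.strip()
    if not text.startswith(anchors):
        return False
    rest = text[len(anchors):]
    return rest == "" or rest.startswith(",")


def word_count_ok(caption: str, lo: int = 25, hi: int = 60) -> bool:
    """True if the whitespace-separated token count falls within lo..hi inclusive."""
    return lo <= len(caption.split()) <= hi


good = STYLE_ANCHORS + ", illustration of a person, painterly brush texture, calm mood"
bad = "illustration of a person, digital gouache, soft pastel tones"

print(has_anchor_prefix(good), word_count_ok(good))
print(has_anchor_prefix(bad), word_count_ok(bad))
```

Counted this way, the anchor stack alone uses 21 of the 60 allowed tokens (the `+` counts as one), which leaves roughly 4 to 39 words for the image-specific description.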
You are creating single-line captions to train a style LoRA for WAN 2.1 T2V.
Output one comma-separated line (no line breaks) per image. Do not include character names, brand/IP names, or camera/lens models.
Goal: describe what’s visible while emphasizing visual style over identity. Keep it generic (no personal/unique names), and suitable for both images and as base prompts for T2V.
Length: 25–60 words.
Order & fields (use as phrases, separated by commas):
Base class & subject (generic): e.g., “illustration of a person”, “cartoon mermaid”, “fantasy landscape”, “product render”, “animal portrait”.
Style & medium keywords: “cartoon animation aesthetic”, “digital illustration”, “vector-like shading”, “painterly brush texture”, “3D toon render”, etc.
Palette & materials: key colors or materials/patterns seen (e.g., “bright cyan and gold palette”, “scales texture”, “glossy highlights”).
Lighting & mood: “soft studio lighting”, “bright high-key”, “warm rim light”, “cheerful mood”.
Composition & shot type: “close portrait”, “medium shot”, “full body”, “three-quarter view”, “centered composition”, “white background”.
Notable visible details (generic): gestures, props, facial expression, background simplicity (e.g., “left eye closed winking”, “holding golden trident”, “hair flowing”, “clean background”).
Quality/layout helpers (optional): “clean edges, smooth gradients, minimal noise”.
Rules:
Be factual (describe only what’s visible).
Use generic nouns (“person”, “figure”, “character”) instead of names or brands.
If text appears in the image, say “a sign with text” (don’t transcribe).
Only mention gender/age/ethnicity if visually clear and relevant.
For faces, you may note expression/eye state (“smiling”, “eyes closed”, “left eye closed winking”).
Avoid negatives, camera metadata, and subjective adjectives like “beautiful”, “masterpiece”.
No trigger words; this is style training, not character training.
Output format example:
cartoon mermaid, digital illustration, cartoon animation aesthetic, bright blue and gold palette with scale textures, soft high-key lighting, full body three-quarter view, hair flowing, holding a golden trident, cheerful mood, clean white background, smooth gradients, minimal noise
More examples:
person portrait, vector-style digital art, saturated warm palette with teal accents, soft studio lighting, close headshot, left eye closed winking and smiling, clean background, smooth shading, crisp linework
fantasy character, painterly toon rendering, cyan and magenta palette, rim-lit, medium shot seated, gesturing with one hand, simple gradient background, subtle film grain, soft brush texture
product render of a single object, flat graphic design aesthetic, soft pastel palette, even lighting, centered composition on white background, icon-like silhouette, clean edges, smooth gradients, minimal shadows
Caption this image {file_path}