History
Total running cost: $0.1371
Prompt | Rows | Type | Model | Target | Status | Runtime | Run | By | Tokens | Cost
Run
You are writing single-line, comma-separated captions to train a content-only dataset for WAN 2.1 T2V. Output exactly one line per image, 22–40 words, no line breaks. Do not mention style/medium/rendering, camera/lens, brands/IP, or artist names. Use generic nouns and describe only what is visible.

Order (phrases, comma-separated):
Scene summary (what + where)
Subjects with counts (e.g., two people, one dog)
Actions/poses (walking, holding hands, looking at camera)
Setting/background elements (sidewalk, street, trees, buildings)
Composition/shot distance (close portrait, medium shot, full body, wide view, centered/off-center)
Key props/attributes (scarf, umbrella, crown, bottle on table)
Facial expression/eye state if clearly visible (smiling, neutral, left eye closed winking)
Color/lighting only if essential to identify the scene (autumn leaves, night scene, strong shadows)
Text presence if visible: say “a sign with text” (don’t transcribe)

Rules:
No subjective adjectives (e.g., beautiful, cinematic).
Mention gender/age/ethnicity only if obvious and relevant.
If background is plain, say “simple background” or “plain backdrop.”
Omit any field that isn’t visible.

Examples (content-only):
couple walking on a city sidewalk in autumn, two people, holding hands and looking toward the camera, street and buildings with trees and fallen leaves, medium shot full body, light jackets and jeans, neutral expressions, simple daytime scene
product on tabletop, one glass bottle, standing upright with long cast shadow, plain wall and flat surface, centered composition, no label text legible, simple background
ice-cream cone on plain backdrop, single cone with three scoops and sprinkles, static pose, wide view with right-weighted framing, simple background, one small drip visible

Caption this image {file_path}
57 | image, text | OpenAI/GPT 4o mini | acd4fe8289b60d14eaf871bb48db0ca9 | completed | 00:03:16 | 2 months ago | shadowworks | 832971 tokens | $0.1260
Sample
Same prompt as the Run entry above.
5 | image, text | OpenAI/GPT 4o mini | Sample - N/A | completed | 00:00:19 | 2 months ago | shadowworks | 73052 tokens | $0.0110
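
The prompt's mechanical rules (one line, 22–40 words, comma-separated phrases, no subjective adjectives) can be spot-checked on returned captions before they go into a training set. The following is a minimal sketch, not part of the evaluation tool itself. The function name `check_caption`, the whitespace-based word count, and the two-word `BANNED_TERMS` list (taken from the prompt's own examples of subjective adjectives) are all assumptions made for illustration.

```python
import re

# Illustrative only: the prompt gives "beautiful" and "cinematic" as examples
# of subjective adjectives; a real list would need to be longer.
BANNED_TERMS = {"beautiful", "cinematic"}


def check_caption(caption: str, min_words: int = 22, max_words: int = 40) -> list[str]:
    """Return a list of rule violations; an empty list means the caption passes."""
    problems = []

    # Rule: exactly one line, no line breaks.
    if "\n" in caption.strip():
        problems.append("contains a line break")

    # Rule: 22-40 words (assumption: words are whitespace-separated tokens).
    word_count = len(caption.split())
    if not min_words <= word_count <= max_words:
        problems.append(f"word count {word_count} outside {min_words}-{max_words}")

    # Rule: comma-separated phrases.
    if "," not in caption:
        problems.append("not comma-separated")

    # Rule: no subjective adjectives (checked against the illustrative list).
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    hits = sorted(tokens & BANNED_TERMS)
    if hits:
        problems.append("banned terms: " + ", ".join(hits))

    return problems


if __name__ == "__main__":
    example = (
        "couple walking on a city sidewalk in autumn, two people, holding hands "
        "and looking toward the camera, street and buildings with trees and "
        "fallen leaves, medium shot full body, light jackets and jeans, "
        "neutral expressions, simple daytime scene"
    )
    print(check_caption(example))
    print(check_caption("one dog, simple background"))
```

A check like this only covers the countable rules. Whether a caption describes only what is visible, or omits fields that are not present, still needs a manual review of the sample rows.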