How to Write AI Image Prompts That Actually Work: The 2026 Visual Prompting Playbook

A practical 2026 playbook for writing AI image prompts that produce usable results on the first try. Covers the 6-layer prompt structure, negative prompts, references, weights, model-specific tips for Midjourney / Imagen 4 / Firefly / Stable Diffusion, and 12 copy-paste prompt templates.

TL;DR

A good AI image prompt has 6 layers: subject → composition → lighting → mood → style → constraints. Skip a layer and the model fills it in for you — usually badly.
Negative prompts are not optional. "No text, no watermarks, no extra fingers, no logos" alone fixes 30% of common failures.
References + weights beat any amount of clever adjectives. If your model supports image references, use them.
Model-specific syntax matters: Midjourney loves comma-separated phrases and --ar/--style flags; Imagen 4 prefers prose; Firefly wants structure; SDXL wants weight notation.
Treat prompting as iteration, not authorship. Plan for 4–8 rerolls per usable result and build a small library of templates you trust.

This article fits inside ImgIvy's broader Complete 2026 Guide to Using AI-Generated Images and Videos Commercially.

Most "AI looks fake" complaints in 2026 are really "this prompt was 4 words long" complaints. The frontier models — Midjourney v7, Imagen 4, Firefly Image 4, SDXL Turbo, FLUX 1.1 Pro — can all produce ad-quality results, but they need direction. This is the playbook our editors actually use when curating the AI image library on ImgIvy.

The 6-layer prompt structure

Every good prompt answers six questions, in roughly this order:

1. Subject       — What / who is the picture of?
2. Composition   — Framing, angle, lens, focal point.
3. Lighting      — Source, direction, quality, color temperature.
4. Mood          — The feeling the viewer should have in one second.
5. Style         — Aesthetic / artistic reference / medium.
6. Constraints   — What the image must NOT contain.

Compare:

Bad prompt (1 layer):

ramen shop at night

Good prompt (6 layers):

A quiet ramen shop at midnight, viewed from a low angle through the steam, 35mm photo, shallow depth of field, single warm tungsten light source above the counter, light rain on the front window catching neon reflections from outside, solitary and cinematic mood, in the style of Wong Kar-wai's In the Mood for Love, photoreal, fine film grain — no people's faces, no English text, no logos.

The second prompt isn't "magic"; it's just specific. Most frontier models in 2026 will produce something usable from that input, often on the first try. The first prompt will give you a stock photo of a ramen shop and you'll wonder why "AI is so generic."
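
One way to make the six layers hard to skip is to store them as named fields and let a small helper tell you which ones are empty. The sketch below is only an illustration: the LayeredPrompt class, its field names and the "Avoid:" joining rule are assumptions made for this example, not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass
class LayeredPrompt:
    """Hypothetical container: one field per layer of the 6-layer structure."""
    subject: str
    composition: str
    lighting: str
    mood: str
    style: str
    constraints: list[str]

    def missing_layers(self) -> list[str]:
        """Return the names of layers that were left empty."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Join the five positive layers with commas, then append constraints."""
        positive = [self.subject, self.composition, self.lighting, self.mood, self.style]
        text = ", ".join(part.strip() for part in positive if part.strip())
        if self.constraints:
            text += ". Avoid: " + ", ".join(self.constraints)
        return text


ramen = LayeredPrompt(
    subject="a quiet ramen shop at midnight",
    composition="low angle through the steam, 35mm photo, shallow depth of field",
    lighting="single warm tungsten light above the counter",
    mood="solitary and cinematic",
    style="photoreal, fine film grain",
    constraints=["people's faces", "English text", "logos"],
)
lazy = LayeredPrompt("ramen shop at night", "", "", "", "", [])

print(ramen.render())
print(lazy.missing_layers())
```

Running missing_layers() on the one-layer version lists the five layers the model would otherwise fill in for you.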

Layer 1 — Subject

The subject layer should be one concrete noun phrase, not a topic.

Topic (vague)	Subject (concrete)
"office"	"a hot desk with two monitors and a half-empty coffee cup"
"happy person"	"a woman in her 30s laughing mid-sentence at a kitchen counter"
"futuristic city"	"a small noodle stall at the base of a 200-storey neon tower"

If you find yourself writing more than one sentence here, you have multiple images, not one prompt. Pick the one you actually want.

Layer 2 — Composition

Composition tells the model where things are in the frame. Without it you get the model's default ("medium shot, centered, eye-level") for everything.

Useful vocabulary:

Shot size: extreme close-up, close-up, medium shot, wide shot, establishing shot.
Angle: low angle, high angle, eye-level, bird's eye, worm's eye, dutch tilt.
Lens: 24mm, 35mm, 50mm, 85mm, 200mm, tilt-shift, macro.
Framing: rule of thirds, centered, leading lines, negative space, symmetry.
Depth: shallow depth of field, deep focus, bokeh, rack focus.

For social or ad work, always specify aspect ratio at the prompt level (Midjourney --ar 16:9, Imagen aspect_ratio="16:9", Firefly preset) rather than cropping after.
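
Some local pipelines ask for a pixel width and height instead of a ratio string. A rough sketch of the conversion follows. The function name dimensions_for, the 1024-pixel long side and the snap-to-a-multiple-of-8 rule are assumptions chosen for illustration (many diffusion pipelines expect dimensions divisible by 8, but check your own tool).

```python
def dimensions_for(aspect: str, long_side: int = 1024, multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) for a 'W:H' ratio, snapped to the given multiple."""
    w_ratio, h_ratio = (int(part) for part in aspect.split(":"))
    if w_ratio <= 0 or h_ratio <= 0:
        raise ValueError(f"aspect ratio must be positive, got {aspect!r}")

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    if w_ratio >= h_ratio:
        # Landscape or square: the width is the long side.
        return snap(long_side), snap(long_side * h_ratio / w_ratio)
    # Portrait: the height is the long side.
    return snap(long_side * w_ratio / h_ratio), snap(long_side)


for ratio in ("16:9", "4:5", "1:1", "21:9"):
    print(ratio, dimensions_for(ratio))
```

Because of the snapping, the result can be slightly off the exact ratio (4:5 comes out as 816 x 1024), which is usually still closer to your intent than cropping a square afterwards.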

Layer 3 — Lighting

If you only had budget for one extra word in your prompt, make it a lighting word. Lighting changes the perceived quality of the image more than any other variable.

A 4-question lighting micro-template:

Source: sunlight / window / lamp / screen / fire / neon / overcast / studio softbox / golden hour / blue hour.
Direction: front-lit / side-lit / back-lit / top-down / rim-lit.
Quality: hard / soft / dappled / harsh / diffused.
Color temperature: warm tungsten / cool fluorescent / neutral daylight / golden / amber / teal-orange.

Example: "backlit by golden-hour sun through dust, soft diffused fill, warm tungsten accent from a desk lamp."

Layer 4 — Mood

Mood is the layer that turns a competent image into one that gets shared. Pick 2–3 mood adjectives that contradict each other slightly: "intimate but cinematic," "playful and a little melancholy," "minimalist yet warm." Generic adjectives like "beautiful" and "amazing" do almost nothing because they appear in so many prompts that they give the model no specific direction.

Layer 5 — Style

Style is where you get to be specific. Three options, in increasing order of risk:

Medium: oil painting, watercolor, charcoal sketch, 35mm film, polaroid, claymation, gouache, isometric 3D, low-poly, pixel art.
Movement / genre: art deco, brutalism, bauhaus, ukiyo-e, cyberpunk, solarpunk, dieselpunk.
Named artist or work: "in the style of [X]." This is the most powerful and the most fraught lever. Major models have begun filtering some living-artist names; many will accept the name but produce only a vague pastiche. For commercial work, prefer movement / genre over named living artists — see Are AI-Generated Images Copyright Free? for the legal context.

A safer modern formula: "in the visual language of [movement] crossed with [medium]." Example: "in the visual language of Japanese ukiyo-e crossed with neon noir."

Layer 6 — Constraints (negative prompts)

This is the layer most people forget and the one that has the highest return on effort.

A baseline constraint string that should sit in almost every prompt:

no text, no watermarks, no logos, no signature, no extra fingers, no extra limbs, no recognizable real people, no brand marks, no copyrighted characters

Add scene-specific constraints:

For product shots: "no shadow under product, no reflection of camera"
For UI/interface mockups: "no real brand names, no readable text inside UI"
For people: "no asymmetrical eyes, no melted ears, no extra teeth"

In Midjourney use the --no flag. In SDXL/ComfyUI use the dedicated negative prompt field. In Imagen/Firefly/Sora, append "Avoid: …" prose to the prompt.
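
If you keep one constraint list and target several tools, a small adapter can place it where each tool expects it. This is a minimal sketch of the three conventions described above. The function name render_constraints, the tool keys and the returned dictionary shape are hypothetical, and no real API is called.

```python
def render_constraints(constraints: list[str], tool: str) -> dict[str, str]:
    """Return the text to append to the prompt and the negative-field text."""
    # Strip a leading "no " so the same list works for every tool.
    cleaned = [c.strip().removeprefix("no ").strip() for c in constraints if c.strip()]
    joined = ", ".join(cleaned)

    if tool == "midjourney":
        # Midjourney convention from the text: the --no flag.
        return {"prompt_suffix": f" --no {joined}", "negative_field": ""}
    if tool in {"sdxl", "comfyui"}:
        # Dedicated negative prompt field, nothing added to the prompt itself.
        return {"prompt_suffix": "", "negative_field": joined}
    if tool in {"imagen", "firefly", "sora"}:
        # Prose-style tools: append an "Avoid:" sentence.
        return {"prompt_suffix": f" Avoid: {joined}.", "negative_field": ""}
    raise ValueError(f"unknown tool: {tool}")


baseline = ["no text", "no watermarks", "extra fingers"]
for name in ("midjourney", "sdxl", "imagen"):
    print(name, render_constraints(baseline, name))
```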

References, weights and seeds

If your tool supports image references, use them. A 3-image reference set will outperform 200 carefully chosen adjectives almost every time.

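Midjourney: --cref <url> for character, --sref <url> for style, --iw 2 to bias toward the image reference. On v7, character locking is handled by Omni Reference (--oref <url>) rather than --cref, so check which flag your model version accepts.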
Imagen 4 / Veo 3: reference image upload in Vertex AI; mix subject + style references.
Firefly: "Structure" and "Style" reference panels in Firefly Image 4.
SDXL / FLUX: IP-Adapter, ControlNet, T2I-Adapter for compositional references.

Seeds lock randomness. When you find a near-perfect result, capture the seed (and the model version) so you can iterate on the same starting point. This is the single biggest workflow upgrade for production teams.
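
To see why a captured seed gives you "the same starting point," here is a toy illustration using Python's standard random module as a stand-in for a model's starting noise. The fake_noise function is hypothetical. Real image models use their own random number generators and far larger noise tensors, but the principle is the same.

```python
import random


def fake_noise(seed: int, size: int = 4) -> list[float]:
    """Toy stand-in for the initial noise a generator would start from."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return [round(rng.gauss(0.0, 1.0), 4) for _ in range(size)]


first_run = fake_noise(42)
second_run = fake_noise(42)
other_seed = fake_noise(43)

print(first_run == second_run)  # same seed, same starting noise
print(first_run == other_seed)  # different seed, different starting noise
```

The same seed only reproduces a result when the model version and other settings are also unchanged, which is why the seed and the version should be recorded together.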

Weights let you tell the model what matters most:

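Midjourney: double-colon multi-prompt weights, e.g. stunning lighting::1.4 (support varies by model version, so check before relying on it). Parenthesized (word:1.4) weights are Stable Diffusion syntax, and --weird controls how unconventional the output looks rather than weighting a phrase.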
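SDXL/ComfyUI: (word:1.3) to boost, (word:0.8) to demote. The [word] shorthand for demoting is an AUTOMATIC1111 convention and may not be honored by other front ends.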
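
To audit how much weighting a prompt already carries, you can pull the (word:1.3) chunks out with a small parser. This is a simplified sketch, not how any front end actually tokenizes prompts. The parse_weights function is hypothetical, and it ignores nested parentheses and bracket syntax.

```python
import re

# Matches "(some text:1.3)"; nested parentheses are deliberately not handled.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets 1.0."""
    chunks: list[tuple[str, float]] = []
    position = 0
    for match in WEIGHTED.finditer(prompt):
        plain = prompt[position:match.start()].strip(" ,")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((match.group(1).strip(), float(match.group(2))))
        position = match.end()
    tail = prompt[position:].strip(" ,")
    if tail:
        chunks.append((tail, 1.0))
    return chunks


parsed = parse_weights("(masterpiece, best quality:1.3), ramen shop, (steam:1.2)")
heavy = [text for text, weight in parsed if weight > 1.0]
print(parsed)
print(heavy)
```

If the list of boosted chunks keeps growing from one iteration to the next, that is usually a sign to reach for a reference image instead.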

Model-specific cheat sheet

Midjourney v7

[subject], [composition], [lighting], [mood], [style] --no [constraints] --ar 3:2 --style raw --v 7 --quality 2

Loves comma-separated phrases and short atmospheric adjectives.
--style raw reduces "Midjourney house aesthetic" when you want photorealism.
--sref and --cref for style / character locking are the biggest workflow upgrades since v5.

Google Imagen 4

Prose paragraph describing subject and scene in natural language. Then a second paragraph for composition, lighting, mood, style. Then "Avoid: …" for constraints. Aspect 16:9.

Prefers full prose to comma lists.
Strong at photoreal humans and on-image text rendering (the best in 2026).
Use Vertex AI for the indemnified commercial flow — see our tool comparison.

Adobe Firefly Image 4

Subject. Setting. Lighting. Style. Mood. Composition tags. Camera tags.

Friendly to a structured "tag list" approach; the prompt UI has a structure panel.
Most commercial-safe option — trained on licensed Adobe Stock and public-domain content.
Use Structure/Style reference panels rather than long style adjectives.

Stable Diffusion (SDXL / FLUX 1.1 Pro)

(masterpiece, best quality:1.3), [subject], [composition], [lighting], [mood], [style]
Negative: text, watermark, logo, lowres, blurry, extra fingers, deformed hands

Weight notation (word:1.3) is part of the syntax; use it sparingly.
Negative prompt field is mandatory in any production workflow.
Combine with ControlNet/IP-Adapter for reproducibility across a series.

12 copy-paste templates

Use these as starting points. Each fills the 6 layers and a baseline constraint string.

1. Product hero shot

"A [product] photographed on a [surface] in soft golden-hour window light, 50mm lens, shallow depth of field, centered with breathing room, mood: aspirational and calm, in the style of modern e-commerce editorial photography, photoreal, fine grain — no text, no watermarks, no other products, no shadows on backdrop, --ar 4:5"

2. Lifestyle marketing photo

"A [person] in their 30s [doing activity] in a sunlit Scandinavian apartment, 35mm photo, medium shot, soft window light from the left, mood: candid and warm, in the style of an unposed Apple lifestyle ad, photoreal, slight film grain — no recognizable likeness, no brand logos, no readable text, --ar 3:2"

3. Editorial / magazine illustration

"An editorial illustration about [topic], flat shapes with paper-cut texture, limited 4-color palette (terracotta, cream, deep navy, mustard), confident negative space, mood: thoughtful and a little ironic, in the visual language of The New Yorker meets Saul Bass, no realistic faces, no text, --ar 3:4"

4. SaaS / tech hero image

"An isometric 3D illustration of a [product workflow], pastel palette (mint, lavender, peach), soft ambient occlusion, mood: friendly and competent, in the style of modern fintech onboarding illustration, no text, no readable UI labels, no logos, --ar 16:9"

5. Cinematic establishing shot

"A wide establishing shot of [location] at [time of day], anamorphic lens flares, deep focus, golden-hour back-lit haze, mood: lonely and grand, in the style of Roger Deakins cinematography, photoreal, subtle 35mm grain, no people in frame, no text, --ar 21:9"

6. Character portrait (commercial-safe)

"A medium close-up portrait of an [age]-year-old [non-specific cultural background] [profession], facing 3/4 toward camera, soft window light from the right, mood: confident and approachable, photoreal, 85mm lens, shallow depth of field, no specific celebrity likeness, no logos on clothing, no asymmetric eyes, no extra fingers, --ar 4:5"

7. Pattern / background

"A seamless repeating pattern of [motif], 2-color palette ([color A] and [color B]), flat vector style, balanced negative space, mood: calm and rhythmic, no text, --ar 1:1 --tile"

8. Food photography

"An overhead flat-lay of [dish], on a [surface], natural diffused window light, mood: warm and inviting, in the style of modern cookbook photography, photoreal, slight steam, no text, no utensils with brand marks, --ar 1:1"

9. Real-estate / interior

"A wide-angle interior of a [room type] with [key feature], natural daylight from a large window on the left, mood: airy and lived-in, in the style of high-end Airbnb photography, photoreal, no people, no readable books, no brand-name objects, --ar 16:9"

10. Concept art / world-building

"A wide concept art piece of [place / scene], dramatic atmospheric perspective with three depth layers, golden rim light from camera-right, mood: awe and quiet menace, in the visual language of Studio Ghibli backgrounds crossed with Simon Stålenhag, painterly digital art, no text, no characters in foreground, --ar 21:9"

11. Social post with caption space

"A high-contrast [subject] against a [color] background, centered with strong negative space at the top for a future caption overlay, hard side-light, mood: bold and direct, modern minimalist style, photoreal, no text, --ar 1:1"

12. Abstract texture / hero background

"Macro photo of [material / phenomenon] catching light at an oblique angle, extremely shallow depth of field, mood: tactile and contemplative, in the style of high-end pharmaceutical / beauty ad backgrounds, photoreal, no logos, no text, --ar 16:9"
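
The bracketed slots in these templates are easy to fill by hand, and just as easy to forget. A minimal sketch of a filler that refuses to return a prompt with an unfilled slot is shown below. The fill_template function and the dictionary-of-values convention are assumptions for illustration only.

```python
import re

# Matches "[dish]", "[color A]", "[place / scene]" and similar slots.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [slot] with its value; fail loudly if one is missing."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


food_template = (
    "An overhead flat-lay of [dish], on a [surface], natural diffused window light, "
    "mood: warm and inviting, photoreal, slight steam, no text, --ar 1:1"
)
filled = fill_template(food_template, {"dish": "shoyu ramen", "surface": "dark slate counter"})
print(filled)
```

Failing on a missing slot is a deliberate choice here: a literal "[surface]" sent to a model tends to produce odd results without any error message.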

A small process upgrade most teams skip

Keep a prompt journal. Every time a prompt produces a result you'd actually ship, copy the full prompt + model version + seed into a doc. Within a month you'll have a 30-prompt template library that's worth more than any course you can buy.

Tag each entry with use case (hero, lifestyle, social), aesthetic (photoreal, painterly, isometric), and aspect ratio. When a project starts, you fish a template out instead of starting from a blank prompt at 9pm.
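
A prompt journal does not need special software. As one possible shape, the sketch below stores each entry as a line of JSON and filters by tag. The JournalEntry fields mirror the items named above (prompt, model version, seed, use case, aesthetic, aspect ratio), while the class and function names and the sample seed values are made up for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class JournalEntry:
    """Hypothetical journal record for one prompt you would actually ship."""
    prompt: str
    model_version: str
    seed: int
    use_case: str
    aesthetic: str
    aspect_ratio: str


def to_jsonl(entries: list[JournalEntry]) -> str:
    """Serialize entries as one JSON object per line."""
    return "\n".join(json.dumps(asdict(entry), ensure_ascii=False) for entry in entries)


def from_jsonl(text: str) -> list[JournalEntry]:
    """Rebuild entries from JSON-lines text, skipping blank lines."""
    return [JournalEntry(**json.loads(line)) for line in text.splitlines() if line.strip()]


def find(entries: list[JournalEntry], **tags: str) -> list[JournalEntry]:
    """Return entries whose fields match every given tag exactly."""
    return [e for e in entries if all(getattr(e, key) == value for key, value in tags.items())]


journal = [
    JournalEntry("A quiet ramen shop at midnight ...", "Midjourney v7", 1234, "hero", "photoreal", "16:9"),
    JournalEntry("An isometric 3D illustration ...", "SDXL", 99, "social", "isometric", "1:1"),
]
restored = from_jsonl(to_jsonl(journal))
heroes = find(restored, use_case="hero", aesthetic="photoreal")
print(len(restored), [entry.seed for entry in heroes])
```

A plain text file in this format can live next to the project assets and stays readable without any tool.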

Common mistakes

Treating the prompt as the whole job. Generation is a step in a pipeline. The other steps — selection, edit, color, retouching — are what take "AI-looking" to "ad-finishable."
No constraints layer. Adding 8 negative words at the end fixes more issues than any model upgrade.
Naming living artists. Both a quality risk (filtered or vague) and a legal risk for commercial work. Prefer movements, mediums and named historical aesthetics. See Are AI-Generated Images Copyright Free? for the legal context.
Skipping references when the tool supports them. This is a 10x quality lever; you're leaving it on the table.
Generating one image and giving up. 4–8 rerolls per usable result is normal, even at the frontier. Budget for it.
Not preserving seeds and model versions. When a client asks for "the same but with red," you need the exact seed and model to keep the rest stable.
Generating recognizable real people or branded objects. No prompt phrasing makes that commercially safe. The legal frame is the same as it is for AI video — see our AI Video Generators 2026 piece for the same warning in motion.

Where this is heading

By the end of 2026, two things are going to change how we prompt:

Visual programming will replace text prompts for production work. Tools like ComfyUI, Runway's node editor and Firefly's Structure panel are already there. The "best prompt writers" of 2027 will be the people who build the cleanest reference + node graphs, not the people who write the prettiest adjectives.
Multi-modal direction will become the default. Sketch + reference image + 2-sentence prompt + voice description will all combine into a single instruction the model understands. Pure text prompting is going to feel as old as a command-line interface within 24 months.

Until then: write the six layers, build your template library, and respect the constraints layer like your job depends on it.

This playbook reflects current behavior of the major AI image tools as of May 2026. Models and prompt syntax change frequently; treat the templates as starting points and adapt to the model you're actually using.