Agent Skill
2/7/2026

generate-image-nb

This skill should be used when the user asks to "generate an image", "create a picture", "make a thumbnail", "design a logo", "draw an illustration", "edit a photo", "add to an image", "remove from an image", or create any visual content including blog headers, icons, diagrams, banners, or graphics. Handles all image generation and editing requests via the Gemini Nano Banana API.

pouriarouzrokh
npx skills add PouriaRouzrokh/pr-claude-plugins

SKILL.md

name: generate-image-nb
description: This skill should be used when the user asks to "generate an image", "create a picture", "make a thumbnail", "design a logo", "draw an illustration", "edit a photo", "add to an image", "remove from an image", or create any visual content including blog headers, icons, diagrams, banners, or graphics. Handles all image generation and editing requests via the Gemini Nano Banana 2 API (gemini-3.1-flash-image-preview).
allowed-tools: Bash(python*), Bash(node*), Bash(pip*), Bash(npm*)

Nano Banana 2 Image Generation

For scalable vector graphics (icons, logos, hero illustrations), use generate-svg instead.

Generate and edit images via Nano Banana 2 (gemini-3.1-flash-image-preview), Google's fastest image generation model. It is 2x faster than the original Nano Banana and 50% cheaper, supports resolutions from 512px to 4K and image search grounding, and reaches ~95% of Pro quality.

IMPORTANT: Default Prompting Behavior

When invoking this skill, ALWAYS construct a JSON prompt that includes BOTH the creative prompt AND a config key to control all generation parameters. The script extracts config before sending the rest to the model. This gives Claude full control over resolution, aspect ratio, output path, and search — nothing is hardcoded.

Build a single JSON object from the user's request:

import json
prompt = json.dumps({
    "task": "generate_image",
    "subject": {"primary": "...", "details": "..."},
    "style": {"primary": "...", "rendering_quality": "..."},
    "composition": {"framing": "...", "perspective": "..."},
    "technical": {"camera": {"focal_length": "..."}, "lighting": "..."},
    "environment": {"atmosphere": "..."},
    "quality": {"include": ["..."], "avoid": ["..."]},
    "config": {
        "output": "filename.png",
        "size": "4K",
        "aspect_ratio": "16:9",
        "search": False,
        "edit": None
    }
})

Then pass the serialized JSON string as the single prompt argument:

python scripts/generate_image.py '<json_string>'
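Single-quoting the JSON on a shell command line breaks as soon as the prompt contains an apostrophe. A minimal sketch, assuming the script is launched from Python, is to pass the prompt as its own argv element so that no shell quoting is involved. The helper name `build_command` is hypothetical; only the script path and the single-argument convention come from this skill.

```python
import json
import shlex
import sys


def build_command(prompt_obj, script="scripts/generate_image.py"):
    """Return an argv list that passes the JSON prompt as one argument."""
    prompt = json.dumps(prompt_obj)
    return [sys.executable, script, prompt]


argv = build_command({
    "task": "generate_image",
    "subject": {"primary": "a cat's portrait"},  # the apostrophe is safe here
    "config": {"output": "cat.png", "size": "1K"},
})

# shlex.join produces a copy-pasteable shell line with correct quoting.
print(shlex.join(argv))

# To actually run it (requires the script and an API key):
# import subprocess; subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` bypasses the shell entirely, so the JSON reaches the script byte for byte.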

Config keys (all optional — omit any that aren't needed):

| Key | Values | Description |
| --- | --- | --- |
| output | any filename | Output path (default: generated_image.png) |
| size | 512, 1K, 2K, 4K | Image resolution |
| aspect_ratio | 1:1, 16:9, 9:16, 2:3, 3:2, 21:9, etc. | Aspect ratio |
| search | true/false | Enable image search grounding |
| edit | file path | Input image for editing |

Creative prompt keys (omit any that aren't relevant):

| Key | Purpose |
| --- | --- |
| task | Always "generate_image" |
| subject | Primary subject and details |
| style | Art style and rendering quality |
| composition | Framing and perspective |
| technical | Camera settings and lighting |
| environment | Atmosphere and mood |
| quality | Include/avoid lists |

For trivially simple requests (e.g., "a cat in a hat"), a plain text prompt is acceptable — the script handles both JSON and plain text.
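The split between `config` and the creative prompt described above can be sketched as follows. This illustrates the idea only and is not the actual code in `scripts/generate_image.py`; the function name `split_prompt` and its fallback rules are assumptions.

```python
import json


def split_prompt(raw):
    """Return (prompt_text, config) from a JSON or plain-text prompt.

    Hypothetical helper: plain text, or JSON that is not an object,
    passes through unchanged with an empty config.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw, {}
    if not isinstance(data, dict):
        return raw, {}
    config = data.pop("config", None)
    if not isinstance(config, dict):
        config = {}
    # Drop unset keys such as "edit": null so that defaults can apply.
    config = {k: v for k, v in config.items() if v is not None}
    return json.dumps(data), config


text, cfg = split_prompt(
    '{"task": "generate_image", "config": {"size": "4K", "edit": null}}'
)
print(text)  # creative prompt without the config key
print(cfg)   # only the config keys that were actually set

print(split_prompt("a cat in a hat"))  # plain text passes through
```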

Setup

Before generating images, verify the API key exists:

[ -n "$GEMINI_API_KEY" ] && echo "API key configured" || echo "Missing GEMINI_API_KEY"

The scripts automatically load GEMINI_API_KEY from the first of these locations that defines it: environment variable → .env in the project root → .claude/.env in the project root → ~/.claude/.env. If none is found, ask the user for their API key.
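The lookup order can be sketched in a few lines of Python. This is an assumption about how such a loader might work, not the scripts' actual code: `read_env_value` and `find_api_key` are hypothetical names, and the parser only handles simple `NAME=value` lines.

```python
import os
from pathlib import Path


def read_env_value(path, name):
    """Return the value of NAME=... from a .env-style file, or None."""
    try:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
    except OSError:
        return None  # missing or unreadable file
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("\"'") or None
    return None


def find_api_key(project_root=".", home=None, environ=None, name="GEMINI_API_KEY"):
    """Check the environment first, then the three .env locations in order."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    if environ.get(name):
        return environ[name]
    root = Path(project_root)
    candidates = (root / ".env", root / ".claude" / ".env", home / ".claude" / ".env")
    for candidate in candidates:
        value = read_env_value(candidate, name)
        if value:
            return value
    return None  # caller should ask the user for a key
```

A `None` result corresponds to the "ask the user" case.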

Install dependencies based on the user's environment:

# Python
pip install google-genai Pillow

# JavaScript
npm install @google/genai

Script Usage

Python

python scripts/generate_image.py "prompt" [options]

JavaScript

node scripts/generate_image.mjs "prompt" [options]

Options

| Option | Description |
| --- | --- |
| -o, --output | Output filename (default: generated_image.png) |
| -a, --aspect | Aspect ratio: 1:1, 16:9, 9:16, 2:3, 3:2, etc. |
| -s, --size | Resolution: 512 (preview), 1K, 2K, 4K |
| -e, --edit | Input image path for editing |
| --search | Enable image search grounding for real-time visual content |
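For orientation, the options above map naturally onto an `argparse` definition. The sketch below is an assumption about how such a CLI could be declared; the real script's parser is not shown in this document, and every default other than the output filename is a guess.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented options (a sketch)."""
    parser = argparse.ArgumentParser(prog="generate_image.py")
    parser.add_argument("prompt", help="Plain-text or JSON prompt")
    parser.add_argument("-o", "--output", default="generated_image.png")
    parser.add_argument("-a", "--aspect", default=None, help="e.g. 1:1, 16:9, 9:16")
    parser.add_argument("-s", "--size", choices=["512", "1K", "2K", "4K"], default=None)
    parser.add_argument("-e", "--edit", default=None, help="Input image to edit")
    parser.add_argument("--search", action="store_true")
    return parser


args = build_parser().parse_args(
    ["kawaii robot app icon", "-s", "512", "-a", "1:1", "-o", "icon_preview.png"]
)
print(args.output, args.size, args.aspect, args.search)
```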

NB2 Capabilities

  • Speed: 2x faster than original Nano Banana, 4-6 seconds per image
  • Resolution: 512px preview mode up to 4K
  • References: Up to 10 object refs, 4 character refs per prompt
  • Search: image_search tool for real-time visual grounding
  • Quality: ~95% of Pro quality for most tasks
  • Cost: ~$0.02/image at 1K, ~$0.07/image at 4K
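The per-image prices make batch costs easy to estimate. A rough sketch using only the two prices quoted above (no figures are listed for 512 or 2K, so the hypothetical helper `estimate_cost` rejects those sizes):

```python
# Approximate per-image prices quoted above; 512 and 2K are not listed.
PRICE_PER_IMAGE_USD = {"1K": 0.02, "4K": 0.07}


def estimate_cost(count, size):
    """Rough cost in USD for `count` images at `size`."""
    if size not in PRICE_PER_IMAGE_USD:
        raise ValueError(f"no price listed for size {size!r}")
    return round(count * PRICE_PER_IMAGE_USD[size], 2)


print(estimate_cost(50, "1K"))
print(estimate_cost(50, "4K"))
```

Treat the result as an order-of-magnitude figure, since the quoted prices are themselves approximate.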

Generation Examples

Text to Image

# Blog header
python scripts/generate_image.py "modern flat illustration of developer coding, purple gradient background, minimalist style, no text" -a 16:9 -o header.png

# High-res landscape
python scripts/generate_image.py "Mountain landscape at golden hour" -s 2K -a 16:9 -o landscape.png

# Quick 512px preview
python scripts/generate_image.py "kawaii robot app icon" -s 512 -a 1:1 -o icon_preview.png

# Logo
python scripts/generate_image.py "minimalist logo for coffee shop called 'The Daily Grind', clean sans-serif font, black and white, coffee bean motif" -a 1:1 -o logo.png

Image Editing

# Add element
python scripts/generate_image.py "Add a small wizard hat on the cat's head, matching the soft lighting" -e cat.jpg -o cat_wizard.png

# Style transfer
python scripts/generate_image.py "Transform into Van Gogh's Starry Night style with swirling brushstrokes" -e city.png -o city_art.png

Real-Time Content (Image Search)

# Photo of a real product
python scripts/generate_image.py "Photo of the latest iPhone model on a wooden desk" --search -o iphone.png

# Current events visualization
python scripts/generate_image.py "Infographic of today's top news headlines" --search -a 16:9 -o news.png

JSON-Structured Prompting

For complex scenes with multiple elements, JSON-structured prompts improve results by 60-80% compared to plain text. The JSON is serialized as a string via json.dumps() / JSON.stringify() and passed as the text prompt — it is NOT used as responseSchema.

JSON Prompt Template

{
  "task": "generate_image",
  "subject": {
    "primary": "main subject",
    "details": "specific details about the subject"
  },
  "style": {
    "primary": "photorealistic | illustration | watercolor | etc.",
    "rendering_quality": "4K | 2K | 1K"
  },
  "composition": {
    "framing": "rule of thirds | centered | etc.",
    "perspective": "wide-angle | close-up | bird's eye | etc."
  },
  "technical": {
    "camera": { "focal_length": "24mm | 50mm | 85mm" },
    "lighting": "golden hour | studio | dramatic | natural"
  },
  "environment": {
    "atmosphere": "misty | clear | moody | vibrant"
  },
  "quality": {
    "include": ["desired elements"],
    "avoid": ["unwanted elements"]
  }
}

When to Use JSON vs Plain Text

  • JSON: Complex scenes with 3+ elements, specific technical requirements, multiple style constraints
  • Plain text: Simple requests, single-subject images, quick iterations

Python Example

import json

prompt = json.dumps({
    "task": "generate_image",
    "subject": {"primary": "mountain landscape", "details": "snow-capped peaks reflected in alpine lake"},
    "style": {"primary": "photorealistic", "rendering_quality": "4K"},
    "composition": {"framing": "rule of thirds", "perspective": "wide-angle panoramic"},
    "technical": {"camera": {"focal_length": "24mm"}, "lighting": "golden hour"},
    "environment": {"atmosphere": "crisp morning mist"},
    "quality": {"include": ["sharp details", "natural colors"], "avoid": ["text", "people"]}
})

# Pass as: python scripts/generate_image.py '<json_string>' -s 2K -a 16:9

Aspect Ratios

| Use Case | Ratio | Example |
| --- | --- | --- |
| YouTube thumbnail | 16:9 | 1344x768 |
| Blog featured image | 16:9 | 1344x768 |
| Instagram square | 1:1 | 1024x1024 |
| Story/Reels | 9:16 | 768x1344 |
| Twitter header | 21:9 | 1536x672 |
| Portrait | 2:3 | 832x1248 |
| Ultra-wide | 1:4 or 4:1 | NB2 exclusive |
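When requests arrive as a use case rather than a ratio, a small lookup keeps the `-a` flag consistent. This is an illustrative sketch: the mapping copies the rows above, while the helper name `aspect_flags` and the 1:1 fallback are assumptions.

```python
# Use case -> aspect ratio, copied from the rows above.
ASPECT_BY_USE_CASE = {
    "youtube thumbnail": "16:9",
    "blog featured image": "16:9",
    "instagram square": "1:1",
    "story/reels": "9:16",
    "twitter header": "21:9",
    "portrait": "2:3",
}


def aspect_flags(use_case, default="1:1"):
    """Return CLI flags selecting the ratio for a use case."""
    ratio = ASPECT_BY_USE_CASE.get(use_case.strip().lower(), default)
    return ["-a", ratio]


print(aspect_flags("Twitter header"))
```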

Prompt Guidelines

Write descriptive scenes, not keyword lists:

Effective: "A photorealistic close-up portrait of a ceramicist with sun-etched wrinkles, inspecting a glazed tea bowl in a sunlit workshop. Soft golden hour light from a window, 85mm lens with bokeh."

Weak: "old man, pottery, workshop, golden hour"

Tips

  • Describe the scene narratively
  • Specify style: "editorial photography", "flat illustration", "watercolor"
  • Include lighting, camera angle, mood
  • Add "no text" if unwanted text appears
  • Use "white background" or "transparent background" for assets
  • Use JSON-structured prompts for complex multi-element scenes

Limitations

  • Supported languages: EN, ar-EG, de-DE, es-MX, fr-FR, hi-IN, id-ID, it-IT, ja-JP, ko-KR, pt-BR, ru-RU, ua-UA, vi-VN, zh-CN
  • No audio or video inputs
  • Up to 10 object reference images, 4 character references
  • The number of images returned may differ from the number requested
  • All images include SynthID watermark

Troubleshooting

| Problem | Solution |
| --- | --- |
| Missing API key | Set GEMINI_API_KEY in env, .env, .claude/.env, or ~/.claude/.env |
| Package not found | Run pip/npm install commands |
| No image generated | Simplify prompt, check for policy violations |
| Poor text rendering | Use detailed prompts with font/style specifications |
| Wrong aspect ratio | Explicitly set -a option |
| HTTP 429 (rate limit) | Wait 30-60 seconds and retry, or reduce request frequency |
| HTTP 400 / SAFETY block | Rephrase prompt to avoid policy-triggering content |
| HTTP 403 (forbidden) | Verify GEMINI_API_KEY is valid and has API access enabled |
| Request timeout | Reduce resolution (-s 1K or -s 512) and retry |
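The HTTP 429 advice (wait 30-60 seconds, then retry) can be wrapped in a small helper. This is a generic sketch: `RateLimitError` is a stand-in, since this document does not say which exception the client raises on a rate limit, and `retry_on_rate_limit` is a hypothetical name.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for whatever error the client raises on HTTP 429."""


def retry_on_rate_limit(call, attempts=3, wait_range=(30.0, 60.0), sleep=time.sleep):
    """Run `call`, waiting 30-60 s between tries when rate limited."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(random.uniform(*wait_range))
```

The `sleep` parameter is injectable so the helper can be exercised without real delays. In real use, `call` would be a function that performs the generation request and translates the client's 429 error into `RateLimitError`.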

Reference

For complete API documentation, prompting strategies, and code examples, see references/nano-banana-docs.md.
