Model Description

Google’s extended Gemini video model (omni-flash-ext) supports three generation modes: text-to-video, single-image animation, and 3-reference-image fusion. Billing is per second, so you pay only for the duration you choose.

Maximum Resolution: 4K
Durations: 4, 6, 8, or 10 seconds

Key Capabilities

  • Per-Second Billing: Pay only for the duration you generate. At the same resolution, a 4s clip costs less than a 10s clip.
  • Three Generation Modes: Text-to-video (0 images), image-to-video (1 image), or reference fusion (exactly 3 images).
  • Up to 4K Resolution: Choose 720p (base), 1080p (1.5× cost multiplier), or 4K (3× cost multiplier).
  • Extended Durations: Supports 4, 6, 8, and 10 second clips, which is more flexibility than most Veo tiers.
  • Reference Fusion: Combine scene, character, and object reference images into a single generated video.
  • Aspect Ratio Control: 16:9 landscape or 9:16 portrait.
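
The billing points above imply that cost grows with both duration and resolution. A minimal sketch of that arithmetic follows, assuming the resolution multiplier scales a per-second base rate linearly. This page does not give the actual rate, so the hypothetical helper `relative_cost` works in units of one 720p second rather than in currency.

```python
# Resolution multipliers as listed under Key Capabilities.
RESOLUTION_MULTIPLIER = {"720p": 1.0, "1080p": 1.5, "4k": 3.0}
VALID_DURATIONS = (4, 6, 8, 10)


def relative_cost(duration: int, resolution: str = "720p") -> float:
    """Return the cost in units of one 720p second (hypothetical helper).

    Assumes cost = duration * resolution multiplier; the real per-second
    rate is not stated in this document.
    """
    if duration not in VALID_DURATIONS:
        raise ValueError(f"duration must be one of {VALID_DURATIONS}, got {duration}")
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return duration * RESOLUTION_MULTIPLIER[resolution]


print(relative_cost(4, "720p"))    # 4.0
print(relative_cost(10, "720p"))   # 10.0
print(relative_cost(6, "1080p"))   # 9.0
print(relative_cost(10, "4k"))     # 30.0
```

Under this assumption, a 10-second 4K clip costs 7.5 times as much as a 4-second 720p clip.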

Image Input Rules

| Images | Mode |
| --- | --- |
| 0 | Text-to-video |
| 1 | Image-to-video (single reference) |
| 3 | Reference fusion (scene + character + object) |
| 2 | ❌ Not supported (will error) |
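
The image-count rules above can be checked on the client before a request is sent. The sketch below uses a hypothetical helper, `generation_mode`, which is not part of the API. It maps the number of images to the mode it selects and rejects every other count, including 2.

```python
# Image count -> generation mode, as listed under Image Input Rules.
MODES = {
    0: "text-to-video",
    1: "image-to-video",
    3: "reference fusion",
}


def generation_mode(image_count: int) -> str:
    """Return the mode selected by the image count (hypothetical helper)."""
    if image_count not in MODES:
        raise ValueError(
            f"{image_count} images is not supported; use 0, 1, or exactly 3"
        )
    return MODES[image_count]


print(generation_mode(0))  # text-to-video
print(generation_mode(3))  # reference fusion
```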

Quick Start

1. Text-to-Video (No Images)

curl --request POST \
  --url https://api.veogen.studio/api/v1/generations \
  --header 'Authorization: Bearer <API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "omni-flash-ext",
    "prompt": "A golden retriever running through a field of wildflowers at sunrise, cinematic slow motion, 4K.",
    "duration": 6,
    "resolution": "1080p",
    "aspect_ratio": "16:9"
  }'

2. Image-to-Video (Single Reference Image)

curl --request POST \
  --url https://api.veogen.studio/api/v1/generations \
  --header 'Authorization: Bearer <API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "omni-flash-ext",
    "prompt": "The character walks forward confidently into the sunlight.",
    "duration": 4,
    "resolution": "720p",
    "image_urls": ["https://example.com/character.jpg"]
  }'

3. Reference Fusion (3 Images)

curl --request POST \
  --url https://api.veogen.studio/api/v1/generations \
  --header 'Authorization: Bearer <API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "omni-flash-ext",
    "prompt": "The character explores the scene with the object in hand.",
    "duration": 8,
    "resolution": "720p",
    "image_urls": [
      "https://example.com/scene.jpg",
      "https://example.com/character.jpg",
      "https://example.com/object.jpg"
    ]
  }'

Response

{
  "data": {
    "id": "gen_def456uvw",
    "model": "omni-flash-ext",
    "status": "pending",
    "created_at": "2026-05-21T09:00:00Z"
  }
}
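
For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a sketch only. The URL, headers, and body fields are copied from the curl examples above, the helper name `build_generation_request` is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.veogen.studio/api/v1/generations"  # from the curl examples


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "omni-flash-ext",
    "prompt": "The character walks forward confidently into the sunlight.",
    "duration": 4,
    "resolution": "720p",
    "image_urls": ["https://example.com/character.jpg"],
}
request = build_generation_request("<API_KEY>", payload)

# To submit, pass `request` to urllib.request.urlopen(). The JSON body of the
# reply carries the generation id, as in the sample response above.
sample_response = '{"data": {"id": "gen_def456uvw", "status": "pending"}}'
generation_id = json.loads(sample_response)["data"]["id"]
print(request.get_method(), request.full_url)
print(generation_id)
```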

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | | Must be "omni-flash-ext" |
| prompt | string | | Text description of the video to generate |
| duration | integer | | 4, 6, 8, or 10 (seconds). Only these exact values are valid. |
| resolution | string | | "720p" (default), "1080p", or "4k" |
| aspect_ratio | string | | "16:9" (default) or "9:16" |
| image_urls | array | | 0, 1, or exactly 3 publicly accessible image URLs. 2 images is not supported. |
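
The parameter constraints above lend themselves to a client-side check that catches invalid requests before they reach the API. The following is a sketch, not the service's own validation. The helper `validate_payload` is hypothetical. Treating `model`, `prompt`, and `duration` as required is an assumption, while the defaults for `resolution` and `aspect_ratio` come from the table.

```python
MODEL_ID = "omni-flash-ext"
VALID_DURATIONS = (4, 6, 8, 10)
VALID_RESOLUTIONS = ("720p", "1080p", "4k")
VALID_ASPECT_RATIOS = ("16:9", "9:16")
VALID_IMAGE_COUNTS = (0, 1, 3)


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    errors = []
    if payload.get("model") != MODEL_ID:
        errors.append(f'model must be "{MODEL_ID}"')
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt must be a non-empty string")
    # Assumption: duration has no default and must be supplied.
    duration = payload.get("duration")
    if not isinstance(duration, int) or duration not in VALID_DURATIONS:
        errors.append("duration must be 4, 6, 8, or 10")
    if payload.get("resolution", "720p") not in VALID_RESOLUTIONS:
        errors.append('resolution must be "720p", "1080p", or "4k"')
    if payload.get("aspect_ratio", "16:9") not in VALID_ASPECT_RATIOS:
        errors.append('aspect_ratio must be "16:9" or "9:16"')
    image_urls = payload.get("image_urls", [])
    if not isinstance(image_urls, list) or len(image_urls) not in VALID_IMAGE_COUNTS:
        errors.append("image_urls must contain 0, 1, or exactly 3 URLs")
    return errors


good = {"model": "omni-flash-ext", "prompt": "A quiet harbor at dawn.", "duration": 6}
bad = {
    "model": "omni-flash-ext",
    "prompt": "A quiet harbor at dawn.",
    "duration": 5,
    "image_urls": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
}
print(validate_payload(good))  # []
print(validate_payload(bad))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.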

FAQ

Why can't I use 2 images?
The underlying Omni-Flash-Ext API explicitly does not support 2-image input. Use 0 images (text-to-video), 1 image (animation), or exactly 3 images (reference fusion).

Which durations are supported?
Only fixed values: 4, 6, 8, or 10 seconds. Values like 5 or 7 will return a validation error.

How does reference fusion work?
When you provide exactly 3 images, the model treats them as scene, character, and object references respectively and fuses them into a single coherent video. The order matters: the first image is the scene, the second is the character, and the third is the object.

Does Omni Flash Ext generate audio?
No. Omni Flash Ext produces silent video output. For AI-generated audio, use Veo 3.1 Fast.
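
Because the order of the three fusion images carries meaning (scene, then character, then object), it can help to build the `image_urls` array from named arguments rather than by position. The helper below, `fusion_image_urls`, is a hypothetical convenience and not part of the API.

```python
def fusion_image_urls(*, scene: str, character: str, obj: str) -> list:
    """Return the reference URLs in the order described above:
    scene first, character second, object third.

    Keyword-only arguments make it hard to swap roles by accident.
    """
    return [scene, character, obj]


image_urls = fusion_image_urls(
    character="https://example.com/character.jpg",
    obj="https://example.com/object.jpg",
    scene="https://example.com/scene.jpg",
)
print(image_urls)
```

The arguments can be passed in any order; the returned list always starts with the scene URL.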