Omni Flash Ext

Model Description

Google’s extended Gemini video model (omni-flash-ext) supports three generation modes (text-to-video, single-image animation, and 3-reference-image fusion) with per-second billing, so you pay only for the duration you choose.

Maximum Resolution: 4K
Durations: 4, 6, 8, or 10 seconds

Key Capabilities

Per-Second Billing: Pay only for the duration you generate. At the same resolution, a 4-second clip is billed for 4 seconds rather than 10.
Three Generation Modes: Text-to-video (0 images), image-to-video (1 image), or reference fusion (exactly 3 images).
Up to 4K Resolution: Choose 720p (base), 1080p (1.5× multiplier), or 4K (3× multiplier).
Extended Durations: Supports 4-, 6-, 8-, and 10-second clips, which is more flexibility than most Veo tiers offer.
Reference Fusion: Combine scene, character, and object reference images into a single generated video.
Aspect Ratio Control: 16:9 landscape or 9:16 portrait.
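
The duration and resolution multipliers above can be combined into a rough relative cost. The sketch below assumes cost scales linearly with duration and that the listed multipliers apply to the per-second rate; the name `relative_cost` is illustrative, and the result is expressed in units of one second of 720p video, not in an actual price.

```python
# Resolution multipliers as listed under Key Capabilities.
RESOLUTION_MULTIPLIER = {"720p": 1.0, "1080p": 1.5, "4k": 3.0}
VALID_DURATIONS = (4, 6, 8, 10)


def relative_cost(duration: int, resolution: str = "720p") -> float:
    """Cost in units of one second of 720p video (assumes linear billing)."""
    if duration not in VALID_DURATIONS:
        raise ValueError(f"duration must be one of {VALID_DURATIONS}")
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return duration * RESOLUTION_MULTIPLIER[resolution]


print(relative_cost(4, "720p"))  # 4.0
print(relative_cost(10, "4k"))   # 30.0
```

Under these assumptions, a 10-second 4K clip costs 7.5 times as much as a 4-second 720p clip.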

Image Input Rules

Images	Mode
0	Text-to-video
1	Image-to-video (single reference)
3	Reference fusion (scene + character + object)
2	❌ Not supported (returns an error)
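
A client can check the image count before sending a request. The following is a minimal sketch of the table above; the function name and the mode labels are illustrative and are not values the API itself returns. Rejecting counts above 3 is an assumption based on the "exactly 3" wording.

```python
def generation_mode(image_urls=None):
    """Map the number of input images to the mode described in the table."""
    count = len(image_urls or [])
    modes = {0: "text-to-video", 1: "image-to-video", 3: "reference-fusion"}
    if count not in modes:
        raise ValueError(f"{count} images is not supported; use 0, 1, or exactly 3")
    return modes[count]


print(generation_mode([]))                              # text-to-video
print(generation_mode(["https://example.com/a.jpg"]))   # image-to-video
```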

Quick Start

1. Text-to-Video (No Images)

curl --request POST \
  --url https://api.veogen.studio/api/v1/generations \
  --header 'Authorization: Bearer <API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "omni-flash-ext",
    "prompt": "A golden retriever running through a field of wildflowers at sunrise, cinematic slow motion, 4K.",
    "duration": 6,
    "resolution": "1080p",
    "aspect_ratio": "16:9"
  }'

2. Image-to-Video (Single Reference Image)

curl --request POST \
  --url https://api.veogen.studio/api/v1/generations \
  --header 'Authorization: Bearer <API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "omni-flash-ext",
    "prompt": "The character walks forward confidently into the sunlight.",
    "duration": 4,
    "resolution": "720p",
    "image_urls": ["https://example.com/character.jpg"]
  }'

3. Reference Fusion (3 Images)

curl --request POST \
  --url https://api.veogen.studio/api/v1/generations \
  --header 'Authorization: Bearer <API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "omni-flash-ext",
    "prompt": "The character explores the scene with the object in hand.",
    "duration": 8,
    "resolution": "720p",
    "image_urls": [
      "https://example.com/scene.jpg",
      "https://example.com/character.jpg",
      "https://example.com/object.jpg"
    ]
  }'

Example response (the generation is created with status "pending"):

{
  "data": {
    "id": "gen_def456uvw",
    "model": "omni-flash-ext",
    "status": "pending",
    "created_at": "2026-05-21T09:00:00Z"
  }
}
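
For callers who prefer Python over curl, the same request can be sketched with the standard library. The endpoint and headers are taken from the curl examples; the helper names `build_request` and `submit` are illustrative, and the sketch assumes the response always wraps its fields in a `data` object as in the sample above.

```python
import json
import urllib.request

# Endpoint taken from the curl examples.
API_URL = "https://api.veogen.studio/api/v1/generations"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(api_key: str, payload: dict) -> dict:
    """Send the request and return the `data` object from the JSON response."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)["data"]
```

Calling `submit("<API_KEY>", {"model": "omni-flash-ext", "prompt": "..."})` would return a dictionary whose `id` and `status` fields correspond to the sample response.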

Parameters

Parameter	Type	Required	Description
`model`	string	✅	Must be `"omni-flash-ext"`
`prompt`	string	✅	Text description of the video to generate
`duration`	integer	—	`4`, `6`, `8`, or `10` (seconds). Only these exact values are valid.
`resolution`	string	—	`"720p"` (default), `"1080p"`, or `"4k"`
`aspect_ratio`	string	—	`"16:9"` (default) or `"9:16"`
`image_urls`	array	—	0, 1, or exactly 3 publicly accessible image URLs. Two images are not supported.
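
The constraints in this table can be checked client-side before a request is sent. The sketch below is one way to do that; `build_payload` is a hypothetical helper, and optional fields are simply omitted when not given so that the server-side defaults apply.

```python
VALID_DURATIONS = {4, 6, 8, 10}
VALID_RESOLUTIONS = {"720p", "1080p", "4k"}
VALID_ASPECT_RATIOS = {"16:9", "9:16"}
VALID_IMAGE_COUNTS = {0, 1, 3}


def build_payload(prompt, duration=None, resolution=None,
                  aspect_ratio=None, image_urls=None):
    """Return a request body, raising ValueError for values the table rules out."""
    if not prompt:
        raise ValueError("prompt is required")
    payload = {"model": "omni-flash-ext", "prompt": prompt}
    if duration is not None:
        if duration not in VALID_DURATIONS:
            raise ValueError("duration must be 4, 6, 8, or 10")
        payload["duration"] = duration
    if resolution is not None:
        if resolution not in VALID_RESOLUTIONS:
            raise ValueError('resolution must be "720p", "1080p", or "4k"')
        payload["resolution"] = resolution
    if aspect_ratio is not None:
        if aspect_ratio not in VALID_ASPECT_RATIOS:
            raise ValueError('aspect_ratio must be "16:9" or "9:16"')
        payload["aspect_ratio"] = aspect_ratio
    if image_urls:
        if len(image_urls) not in VALID_IMAGE_COUNTS:
            raise ValueError("image_urls must contain 0, 1, or exactly 3 URLs")
        payload["image_urls"] = list(image_urls)
    return payload
```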

FAQ

Why can't I use exactly 2 images?

The underlying Omni-Flash-Ext API explicitly does not support 2-image input. Use 0 images (text-to-video), 1 image (animation), or exactly 3 images (reference fusion).

What durations are supported?

Only fixed values: 4, 6, 8, or 10 seconds. Values like 5 or 7 will return a validation error.
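
If an application lets users pick arbitrary lengths, it can snap the request to a supported value before calling the API. This is a small sketch of one such policy (nearest value, with ties rounded up); the helper name and the tie-breaking rule are choices made here, not API behavior.

```python
VALID_DURATIONS = (4, 6, 8, 10)


def nearest_duration(seconds: float) -> int:
    """Return the supported duration closest to `seconds`; ties round up."""
    return min(VALID_DURATIONS, key=lambda d: (abs(d - seconds), -d))


print(nearest_duration(5))   # 6
print(nearest_duration(12))  # 10
```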

How does reference fusion work with 3 images?

When you provide exactly 3 images, the model treats them as scene, character, and object references, in that order, and fuses them into a single coherent video. Order matters: the first image is the scene, the second is the character, and the third is the object.
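
Because the roles are positional, it is easy to pass the images in the wrong order. One way to guard against that is a helper with keyword-only arguments, sketched below; the function and its parameter names are illustrative.

```python
def fusion_image_urls(*, scene: str, character: str, obj: str) -> list:
    """Return the three reference URLs in scene, character, object order."""
    return [scene, character, obj]


urls = fusion_image_urls(
    character="https://example.com/character.jpg",
    obj="https://example.com/object.jpg",
    scene="https://example.com/scene.jpg",
)
print(urls[0])  # https://example.com/scene.jpg
```

The returned list can be used directly as the `image_urls` value in a reference fusion request.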

Does Omni Flash Ext generate audio?

No. Omni Flash Ext produces silent video output. For AI-generated audio, use Veo 3.1 Fast.
