Google’s new video model Veo 3.1 is officially released!
Click to try it now!
Price (credits)
| With Audio | 4s | 6s | 8s |
| --- | --- | --- | --- |
| 720p | 2400 | 3600 | 4800 |
| 1080p | 2600 | 4200 | 5600 |

| Without Audio | 4s | 6s | 8s |
| --- | --- | --- | --- |
| 720p | 1200 | 1800 | 2400 |
| 1080p | 1600 | 2400 | 3200 |
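
To estimate what a batch of generations will cost, the price tables can be turned into a small lookup. This is a minimal sketch: the credit values are copied from the tables above, while the names `PRICES`, `credit_cost`, and `batch_cost` are illustrative and not part of SeaArt.

```python
# Credit prices copied from the tables above; all names here are illustrative.
PRICES = {
    # (with_audio, resolution): {duration_seconds: credits}
    (True, "720p"): {4: 2400, 6: 3600, 8: 4800},
    (True, "1080p"): {4: 2600, 6: 4200, 8: 5600},
    (False, "720p"): {4: 1200, 6: 1800, 8: 2400},
    (False, "1080p"): {4: 1600, 6: 2400, 8: 3200},
}


def credit_cost(resolution: str, duration_seconds: int, with_audio: bool) -> int:
    """Return the listed credit price for one generation, or raise ValueError."""
    try:
        return PRICES[(with_audio, resolution)][duration_seconds]
    except KeyError:
        raise ValueError(
            f"no listed price for {resolution}, {duration_seconds}s, audio={with_audio}"
        ) from None


def batch_cost(jobs) -> int:
    """Sum the price over an iterable of (resolution, seconds, with_audio) tuples."""
    return sum(credit_cost(*job) for job in jobs)


if __name__ == "__main__":
    plan = [("720p", 4, False), ("1080p", 8, True)]
    print(batch_cost(plan))
```

Keeping the table as data makes it easy to compare options, for example a silent 720p draft against a final 1080p render with audio, before spending credits.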
Key features✨
Supports text‑to‑video (T2V), image‑to‑video (I2V), first‑and‑last‑frame‑to‑video (FLF2V), and reference‑image generation
Reference Image: up to three reference images
🚨Note: reference images currently work only for 8‑second videos

{"image": "use the uploaded reference image of a prehistoric/early human; keep face and attire consistent",
"action": "he steps onto a busy modern city crosswalk, pauses, looks left and right, eyes widen in quiet awe, then exhales softly",
"scene_time": "downtown intersection at blue hour after light rain, neon reflections on wet asphalt",
"camera": "steadycam lateral tracking from his right side; finish with a gentle push-in to close-up",
"style": "cinematic, anamorphic bokeh, subtle film grain, high dynamic range",
"lighting_color": "cool ambient with warm shop neon accents; soft rim light on subject",
"mood": "wonder, mild disorientation, non-threatening",
"audio": "soft rain, city ambience, distant traffic, brief horns, crosswalk beeps",
"dialogue": "no dialogue"}
First‑to‑last‑frame video generation

8‑second video generation

At dawn, a cinematic jungle wrapped in mist with shafts of light is captured handheld. An explorer in muddy boots pushes through dense ferns and stumbles past a broken “Jurassic Park” gate. Distant leaf tremors raise the tension.
A Tyrannosaurus rex bursts through the gate. From medium shots into close‑ups: wet scales, saliva, dust, and leaves fly; the camera pans fast and refocuses. The T‑rex roars; the ground thunders; gravel falls; pterosaurs streak overhead.
The explorer runs along a muddy path; the camera shakes with motion blur; his backlit silhouette is clear; he narrowly avoids a bite. The film hard‑cuts as a fallen trunk blocks him and the T‑rex slams in.
Sound design: jungle ambience, distant thunder; heavy footsteps, low‑frequency rumbles, crackling gravel. Music: ominous drone → building percussion and low brass → climactic cacophony. Voice‑over (short, urgent): “Run! Keep going!”
Native audio generation: dialogue, music, natural ambience

Cinematographic Realism: A nighttime stage in a small tavern. Subject: A blonde woman sings passionately into a vintage microphone, swaying gently and pausing to breathe. Warm spotlights, a red curtain, the mic stand, reflections in glasses, and audience silhouettes are visible. Camera Position: Starting with a medium close-up front shot, the lens slowly zooms in and pans slightly across the front row of spectators. Sound: The singer's vocals, live audience voices and applause, clinking glasses, ambient low-frequency bass, and subtle hall reverberation. Transition: As the chorus explodes and audience cheers rise, the camera pans halfway around to the crowd, seamlessly blending with intensifying applause and subtle motion blur. Aesthetics: Amber warm tones, high contrast, lens flares, fine film grain; single continuous take.
Cinematic visuals & True-to-life textures

A hand just off camera combs a puppy and clips on a hairpin.
How to use
🚨Remember: always enter a prompt before generating; otherwise an error will occur.🚨
- SeaArt’s default workflow is image‑to‑video.

- Text‑to‑video: simply delete the “Load Image” node.
- FLF2V: connect the first frame to `image`, the last frame to `last_frame`.

- Reference image to video: connect reference images to `reference_image1/2/3` in order.
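
The wiring rules above can be summarized as a mapping from mode to image inputs. The sketch below is a hypothetical helper, not a SeaArt API: it only returns a dictionary of input name to image path. The input names `image`, `last_frame`, and `reference_image1/2/3` come from the steps above; using `image` for plain image‑to‑video and the mode labels (`"t2v"`, `"i2v"`, `"flf2v"`, `"reference"`) are assumptions.

```python
def plan_inputs(mode, images=()):
    """Map a generation mode to the image inputs it needs.

    Hypothetical helper: returns {input_name: image_path} and does not
    communicate with SeaArt in any way.
    """
    images = list(images)
    if mode == "t2v":
        # Text-to-video: the "Load Image" node is deleted, so no images.
        if images:
            raise ValueError("text-to-video takes no images")
        return {}
    if mode == "i2v":
        # Assumption: the single source image goes to the `image` input.
        if len(images) != 1:
            raise ValueError("image-to-video takes exactly one image")
        return {"image": images[0]}
    if mode == "flf2v":
        if len(images) != 2:
            raise ValueError("FLF2V takes a first frame and a last frame")
        return {"image": images[0], "last_frame": images[1]}
    if mode == "reference":
        if not 1 <= len(images) <= 3:
            raise ValueError("reference mode takes one to three images")
        # Connect in order: reference_image1, reference_image2, reference_image3.
        return {f"reference_image{i}": path for i, path in enumerate(images, start=1)}
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(plan_inputs("flf2v", ["first.png", "last.png"]))
```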

Node parameters
- `duration_seconds`: video length (4 | 6 | 8 seconds)
- `aspect_ratio`: aspect ratio (9:16 | 16:9)
- `resolution`: resolution (720p | 1080p)
- `generate_audio`: whether to generate synchronized audio
- `enhance_prompt`: whether to polish the prompt automatically
- `seed`: random seed
- `control after generate`: how to handle the seed after generation
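
Before queuing a job, it can help to check the settings against the allowed values listed above. The following is a rough sketch under stated assumptions: the parameter names and allowed values come from this list, but `check_params` is a hypothetical helper, and treating `generate_audio` and `enhance_prompt` as booleans is a guess about the node's types.

```python
ALLOWED = {
    "duration_seconds": {4, 6, 8},
    "aspect_ratio": {"9:16", "16:9"},
    "resolution": {"720p", "1080p"},
}


def check_params(params, prompt, uses_reference_images=False):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    # A prompt is always required, otherwise generation errors out.
    if not prompt.strip():
        problems.append("prompt is empty")
    for name, allowed in ALLOWED.items():
        if params.get(name) not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # Assumption: both switches are plain booleans.
    for flag in ("generate_audio", "enhance_prompt"):
        if not isinstance(params.get(flag), bool):
            problems.append(f"{flag} must be True or False")
    # Reference images currently work only for 8-second videos.
    if uses_reference_images and params.get("duration_seconds") != 8:
        problems.append("reference images currently require duration_seconds = 8")
    return problems


if __name__ == "__main__":
    settings = {
        "duration_seconds": 6,
        "aspect_ratio": "16:9",
        "resolution": "720p",
        "generate_audio": True,
        "enhance_prompt": False,
    }
    for problem in check_params(settings, "a fox in snow", uses_reference_images=True):
        print(problem)
```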
How to improve prompt quality
Framework: subject + action + scene/time + camera movement + visual style + lighting/color + mood + audio/dialogue + aspect ratio + details
🔧Tips:
- Choose one camera move only: pan, push, track, or tilt.
- Keep a single core style.
- Specify sound sources (who/what) and ambient audio.
Shot: slow push from medium to close‑up. Subject: an elderly man with silver hair and round glasses slowly waters a bonsai.
Scene/time: by the living‑room window at golden hour. Style: fine film grain, soft cinematic look.
Lighting/color: warm backlight, gentle rim light, pastel palette. Mood: quiet, tender, calm.
Audio: distant city ambience, dripping water, leaves rustling.
Dialogue: English male voice, low and gentle: “Take it slow, and it’ll grow.”
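
One way to keep prompts in the framework's order is to assemble them from named fields. The sketch below is illustrative only: `build_prompt` and its field names are assumptions, and the labeled one‑line‑per‑field layout is just one possible format, not something Veo 3.1 requires.

```python
# Field keys and display labels, in the order given by the framework above.
FIELDS = [
    ("subject", "Subject"),
    ("action", "Action"),
    ("scene_time", "Scene/time"),
    ("camera", "Camera"),
    ("style", "Style"),
    ("lighting_color", "Lighting/color"),
    ("mood", "Mood"),
    ("audio", "Audio"),
    ("dialogue", "Dialogue"),
    ("aspect_ratio", "Aspect ratio"),
    ("details", "Details"),
]


def build_prompt(**fields):
    """Join the supplied fields into a labeled prompt, skipping empty ones."""
    known = {key for key, _ in FIELDS}
    unknown = sorted(set(fields) - known)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    lines = []
    for key, label in FIELDS:
        value = fields.get(key, "").strip()
        if value:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        subject="an elderly man with silver hair and round glasses",
        action="slowly waters a bonsai",
        scene_time="by the living-room window at golden hour",
        camera="slow push from medium to close-up",
        mood="quiet, tender, calm",
    ))
```

Because the order is fixed in `FIELDS`, the arguments can be passed in any order and the output still follows the framework.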

💡Common issues
- Prompt violation: Veo 3.1 has strict safety checks, so avoid prohibited content. If your image passes but the video is flagged, tweak your prompt or image; consider enabling prompt optimization (`enhance_prompt`).
- Lip‑sync off: make dialogue shorter and clearer; reduce noisy background descriptions.
- Face collapse/drift: use a clearer frontal reference for I2V; avoid aggressive camera moves.
- Low accuracy: remove redundancy, split long sentences, and switch to the structured template.
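
Two of these checks, one camera move per prompt and no overlong sentences, can be roughly automated. The sketch below is a keyword heuristic under stated assumptions: `lint_prompt` is a hypothetical helper, the 30‑word limit is an arbitrary threshold, and simple word matching can misfire (for example, on "pan" used as a noun).

```python
import re

# Camera-move stems from the tips (pan, push, track, tilt) plus common endings.
_MOVE_RE = re.compile(r"\b(pan|push|track|tilt)(?:s|es|ed|ing|ning)?\b", re.IGNORECASE)


def lint_prompt(prompt, max_words=30):
    """Return heuristic warnings about camera moves and sentence length."""
    warnings = []
    moves = sorted({m.group(1).lower() for m in _MOVE_RE.finditer(prompt)})
    if len(moves) > 1:
        warnings.append("multiple camera moves: " + ", ".join(moves))
    # Split on sentence-ending punctuation and flag anything too long.
    for sentence in re.split(r"[.!?]+", prompt):
        words = sentence.split()
        if len(words) > max_words:
            preview = " ".join(words[:6])
            warnings.append(f"long sentence ({len(words)} words): {preview}...")
    return warnings


if __name__ == "__main__":
    for warning in lint_prompt("The camera pans left, then tilts up."):
        print(warning)
```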
Veo 3.1 makes video creation simple and fast—click to try it now!
If you run into issues, leave a comment or report them in the SeaArt Discord channel.














