

Tooba Siddiqui
Fri Dec 19 2025
Veo 3 and Veo 3.1 were two major AI video generation models released this year by Google DeepMind. Veo 3 pioneered native audio support, and Veo 3.1 soon built on that breakthrough with video extension and multi-reference input.
Veo 3.1 has been disruptive, but as with any AI tool, ‘what comes next’ becomes the point of focus. Rumor has it that Google DeepMind might release either an ‘extension’ model for Veo 3.1 or an entirely new model, ‘Veo 4’.
A Quick Recap of Veo 3.1
Equipped with a slew of advanced features, Veo 3.1 was introduced as an all-in-one AI video generation and editing model.
- AI video editing and multi-input support (elements, images, video, and keyframes)
- AI scene extension with consistent action sequences, camera movement, and visual style
- Native audio support and audio-visual synchronization
- Multi-shot generation with camera control
- 1080p resolution and up to 1 minute duration (with scene extension)
Read the complete review of Veo 3.1.
Expected Key Improvements in Veo 4
Despite its innovative features, Veo 3.1 has limitations: no 4K support, a maximum duration of 8 seconds per generated clip (longer videos require scene extension), character and object morphing in complex prompts, and inconsistent audio and lip synchronization. Addressing these limitations might become the defining goal of the anticipated Veo 4 AI model.
Improved resolution
Many users expected 4K support when Veo 3.1 was announced and were disappointed to find that it generates videos at standard 1080p resolution. The upcoming AI video model may meet this expectation and deliver production-ready videos without any external upscaling tool.
Improved video duration
Another feature users had hoped for in Veo 3.1 was a longer duration. While Veo 3.1 supports multi-shot generation, the short per-clip duration makes it challenging to create continuous shots and scene transitions. Veo 4 is expected to deliver longer videos alongside a scene extension feature, making it a one-stop AI video creation tool.
Improved consistency
Compared with Veo 3.1, Veo 4 is expected to demonstrate better object permanence, more stable character interactions and motion, and stronger scene and character consistency. It will most likely have improved temporal understanding to prevent continuity drift, awkward changes, and glitches.
Multilingual support
Like Veo 3.1, Veo 4 is expected to have multilingual support, enabling users to generate dialogue in different languages. The new AI video model will also most likely feature better on-screen text rendering for a better user experience. Building on its multilingual support, Veo 4 is expected to focus more on accurate lip-sync for natural dialogue delivery.
Avatar support
Like Veo 3.1’s competitor Sora 2, Veo 4 might come with its own ‘cameo’ feature. Users could upload their own reference images and create lifelike videos of themselves. They might even be able to add dialogue, which the AI would sync with lip movements so that they appear to speak the words naturally.
Conclusion
Veo 4 is likely to push the boundaries of AI video generation, just as its predecessor Veo 3 did with native audio support. It could set a new benchmark that distinguishes it from its competitors and gives users more creative control.
Until then, you can explore and refine your AI video creation skills with the ImagineArt AI video generator, which offers multi-modal inputs, built-in workflows, and customization options.

Tooba Siddiqui
Tooba Siddiqui is a content marketer with a strong focus on AI trends and product innovation. She explores generative AI with a keen eye. At ImagineArt, she develops marketing content that translates cutting-edge innovation into engaging, search-driven narratives for the right audience.