ShengShu Technology, a Chinese startup specializing in multimodal generative artificial intelligence (AI), on Tuesday announced upgraded versions of its reference-to-video platform as it steps up competition with market-leading rivals, OpenAI’s Sora 2 and Google’s Veo 3.1.

The enhanced Vidu Q2 now lets content creators upload up to seven reference images, such as faces, scenes, and props, and combine them into a single cohesive video, according to the company. The platform integrates these visual elements with text prompts through its multiple-entity consistency feature, which the company says maintains the distinctiveness and fidelity of each component.

“We’re moving into a time where AI can mimic human looks and express emotions with cinematic flair,” ShengShu Technology CEO Yihang Luo said in a statement. “This launch goes beyond basic video creation; it’s about teaching AI to act and tell stories alongside creators.”

ShengShu says Multiple-Entity Consistency preserves the original appearance of different characters, objects, and backgrounds even as scenes change. The platform also incorporates cinematic techniques, including depth-of-field effects and camera movements such as panning.

ShengShu highlighted improvements in rendering subtle facial expressions and natural body movements, moving away from what the company characterizes as the stiff, artificial motion typical of earlier AI video tools.

Industry observers said Vidu Q2 generates content faster and at a lower price than Sora 2 and Veo 3.1.

Alongside the consumer release, ShengShu made the Vidu Q2 model-as-a-service (MaaS) API available globally, enabling businesses to integrate the technology into their workflows. The company said it has forged partnerships with advertising and e-commerce firms.

ShengShu Technology has been developing generative AI technology since its founding. In 2022, the research team introduced the U-ViT architecture, described as the first Diffusion-Transformer hybrid model. The team also developed UniDiffuser, which generates both text and images within a single system. Previous versions of Vidu introduced multi-character scene consistency (Vidu 1.5), 10-second video generation (Vidu 2.0), and sound integration (Vidu Q1).

Since launching in April 2024, the Vidu platform has reached users in more than 200 countries, accumulated 30 million registered users, and generated more than 400 million videos, according to company figures.