Column by 字母榜精选 (Zimubang Picks)

In the field of AI video generation, 'far ahead' has become a reality

Reports suggest that ByteDance’s video generation model Seedance 2.1 will be released soon, with output quality expected to improve by 20% over version 2.0. ByteDance told Zimu AI that this report is false.

Although Seedance 2.1 may not be launched imminently, it is true that Seedance 2.0 has seen a significant surge in popularity overseas.

This is because over the weekend, an article titled 'Chinese AI Groups Pull Ahead of US Rivals in Video Generation Race' went viral internationally.

Using Seedance 2.0 and Kling 3.0 as core evidence, the article reached a surprising conclusion: 'China not only leads the U.S. in AI-powered video generation, but this advantage will persist indefinitely.'

This assessment sounds somewhat counterintuitive; it reads more like flattery of Chinese AI. After all, in recent years Silicon Valley has consistently introduced new AI products first, with similar offerings from China following later, a pattern clearly visible to all.

However, after reading the foreign media’s perspective, I realized my view had been too narrow—China genuinely leads the U.S. in AI video generation.

The article interviewed several American AI entrepreneurs and filmmakers who use AI video generation tools, and their verdict was unanimous: Chinese AI video tools have comprehensively surpassed their American counterparts.

More importantly, this lead isn’t just a temporary technological edge—it’s comprehensive dominance, spanning every stage from data to real-world deployment.

Moreover, this lead is of the kind that 'cannot be overtaken'—meaning China’s dominant position will endure indefinitely.

Has 'far ahead' finally become a reality?

Why will Chinese AI always stay ahead of American AI?

One of the article's arguments is that in the field of AI-generated video, the gap at the algorithmic level is rapidly narrowing.

Currently, the technical architectures adopted by various companies are already largely similar. Core technologies such as Transformers, diffusion models, and spatiotemporal attention mechanisms have become relatively transparent.

Therefore, the key issue now boils down to who possesses higher-quality and larger-volume training data.

This happens to align perfectly with ByteDance and Kuaishou’s core strengths. After all, Douyin and Kuaishou are already among the world's largest engines of video production, fed by what their users upload.

More importantly, this data comes with comprehensive user-behavior annotations.

It’s immediately clear from backend data which videos receive likes, saves, or shares, and which ones have high completion rates.

Moreover, these annotations don’t require manual labeling—they’re naturally generated by real user behavior. Such high-quality, annotated data is something you couldn’t necessarily buy even if you had the money.

In contrast, OpenAI and Anthropic lack accumulated video data.

When launching Sora, OpenAI primarily relied on publicly available video data scraped from the internet, along with some licensed film and television content.

The problem is that publicly available videos on the internet often vary widely in quality, containing a large amount of duplicate content, low-quality material, or even reprocessed content with watermarks and advertisements.

As a result, training efforts frequently yield diminishing returns.

On the global evaluation platform Artificial Analysis, ByteDance's Seedance 2.0, Kuaishou's Kling 3.0, and Alibaba's HappyHorse jointly dominate the top spots on both text-to-video and image-to-video leaderboards.

These rankings are generated from votes by real users, meaning that people generally find the AI-generated video content from these three companies visually appealing.

Google has YouTube as a data source and has developed the video generation model Veo 3, but its problem lies in excessive constraints: videos on YouTube are typically longer than five minutes, and current GPUs cannot handle such long, high-resolution videos as training data, which causes failures during model training.

This has led to Veo 3 receiving a lukewarm market response, underperforming compared to Chinese AI video generation models like Seedance 2.0 and Kling 3.0.

Ben Chiang, founder of Director AI, stated, 'Most of the U.S. models we’ve tried haven’t performed well enough in video generation.' He therefore currently relies mainly on Chinese tools such as Kling, Seedance 2.0, and Hailuo for his creative work.

Independent AI filmmaker George Won said, 'Seedance 2.0 is a game-changing tool. It can handle aggressive camera angles and speeds without losing facial details or lighting contrast of characters. Most AI models start to wobble or drift during fast motion.'

Moreover, this data advantage enables the product to achieve 'self-reinforcement.'

ByteDance has already integrated Seedance 2.0 into creative tools like CapCut, allowing it to collect feedback data from over 50 million AI-generated videos daily.

This way, ByteDance can determine which videos users are satisfied with and which ones they aren’t.

Each piece of such feedback makes the development direction of the next-generation Seedance product clearer.

This kind of continuous, large-scale, real-world feedback loop is something that lab environments like those at OpenAI or Anthropic cannot match.

Even with substantial resource investment, it’s extremely difficult to build a similar data flywheel in the short term.

Technology can be caught up with, and algorithms can be imitated, but building an ecosystem and accumulating data require time, a user base, and a complete product loop.

Real-world application scenarios

When a company develops AI-powered video capabilities, it needs a clear 'purpose.'

A data advantage is just the starting point; what truly turns technology into a competitive edge is identifying profitable application scenarios. Only with real-world use cases will companies have the incentive to advance AI video generation.

On this dimension, ByteDance and Kuaishou also outperform their U.S. counterparts.

The first large-scale application scenario is e-commerce videos.

In the past, producing a single professional video for a product cost thousands of yuan, including fees for photographers, lighting technicians, venue rentals, models, post-production editing, and more.

For most small and medium-sized merchants, a typical Taobao store might carry hundreds of products, and filming videos for all of them would cost at least several hundred thousand yuan.
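The arithmetic behind that estimate can be sketched in a few lines of Python. The per-video cost and catalog size below are hypothetical round numbers chosen from within the ranges quoted above ("thousands of yuan" per video, "hundreds of products" per store); they are not figures from any actual merchant.

```python
# Hypothetical inputs: the article gives only rough ranges, so these
# two values are illustrative picks, not reported data.
COST_PER_VIDEO_YUAN = 2_000   # assumed cost of one traditionally filmed video
PRODUCTS_IN_STORE = 300       # assumed size of a typical store's catalog


def catalog_video_cost(cost_per_video: int, product_count: int) -> int:
    """Total cost of filming one traditional video for every product."""
    return cost_per_video * product_count


total = catalog_video_cost(COST_PER_VIDEO_YUAN, PRODUCTS_IN_STORE)
print(f"{total:,} yuan")  # 600,000 yuan
```

Under these assumed inputs the total lands at 600,000 yuan, in line with the "several hundred thousand yuan" figure cited for a full catalog.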

AI-powered video generation technology has changed this situation.

Vincent Yang, CEO of Firework—a video infrastructure company—said, 'A retailer asked us to create 100,000 videos for their product pages. Without AI, that would have been completely unfeasible from a cost perspective. Now, every product can have its own video, and even multiple customized versions tailored to different customers.'

Data shows that product pages with videos achieve conversion rates 30% to 80% higher than those with only text and images. Moreover, Douyin and Kuaishou are among China's largest platforms for live-streaming e-commerce and short-video-driven sales.

Once AI generates the videos, merchants can publish them directly on those same platforms without going anywhere else.

Alibaba’s HappyHorse model explicitly identifies e-commerce video as a core application scenario. It supports batch generation of short product showcase videos and virtual host talking-head videos. Merchants can upload product images and brief text descriptions, and the system automatically generates multiple versions of promotional videos—each tailored to different target audiences with distinct scripts and presentation styles.

The second scenario is advertising.

Traditional TV commercials (TVCs) have an excessively long production cycle.

A 30-second brand advertisement often takes several weeks to complete, from creative planning to filming and production.

With video generation models, dozens of different ad creatives can be produced in just a few minutes.

The third scenario is short-form dramas.

AI-generated short-form dramas experienced explosive growth in 2026. Data shows that the number of AI short dramas airing in March 2026 increased by 138% compared to January, far outpacing the production speed of traditional film and television content.

Thanks to AI video generation, even a small team or individual creator can produce a short drama within just a few days.

And that's not all—Hongguo, ByteDance’s short drama platform, has integrated a 'visual search for identical items' feature.

This feature is straightforward: while watching a short drama, if you’re interested in a character’s outfit, furniture in a scene, or a car parked outside, you can simply click on the image, and the system will recommend identical products for immediate purchase.

This effectively turns short dramas into a commercial setting capable of driving direct conversions.

In contrast, although the U.S. market has content platforms like Netflix and YouTube, none offer such seamless integration and conversion capabilities.

AI-powered video tools in the U.S. remain largely in the realm of creative experimentation, with subscription-based memberships being their only real commercial application.

Moreover, in terms of product functionality, Chinese video generation models are better suited for commercial deployment.

Seedance 2.0 can integrate multiple source images, videos, and audio clips into a single AI-generated video, whereas Sora cannot: it generates videos only from a single image combined with text prompts.

This isn't because Sora's technology is inferior, but rather because it lacks a complete commercial ecosystem capable of harnessing its technical capabilities.

The compute divide

However, Chinese video AI faces an unavoidable hurdle: computing power.

Leading U.S. AI firms treat computing power as gold, snapping up all available capacity on the market.

Anthropic has recently signed compute agreements totaling over 10 gigawatts.

This figure includes leasing the entire capacity of SpaceX’s Colossus 1 data center—equipped with 220,000 NVIDIA GPUs—as well as a 5-gigawatt agreement with Amazon and a 3.5-gigawatt deal with Google and Broadcom.

OpenAI follows the same approach.

Through its deep collaboration with Microsoft, OpenAI gained access to hundreds of thousands of high-end GPUs, and Microsoft also built multiple hyperscale data centers specifically for OpenAI.

In contrast, although Chinese companies have made significant progress in algorithmic efficiency optimization, they still lag behind in absolute computing power scale.

According to foreign media reports, the gap in AI computing power between China and the U.S. was about 3x in 2023 and had widened to roughly 8x by early 2026.

Beyond computing power, China's AI industry faces other challenges.

The first is copyright.

Take Seedance 2.0 as an example: about a month after its launch, six Hollywood giants—Disney, Warner Bros., Paramount, Skydance, Netflix, and another major studio—jointly sent ByteDance a cease-and-desist letter, alleging that Seedance 2.0 had used large volumes of copyrighted film and television content without authorization during its training phase.

Subsequently, ByteDance urgently suspended its original plan to globally launch Seedance 2.0 in mid-March.

If you’ve been using Seedance 2.0 continuously since February, you’ll notice that IP-based characters that could previously be generated are now unavailable, and users can only generate generic, nondescript figures.

The second challenge is rising commercialization barriers.

U.S.-based video-generating AIs like Sora often reject generation requests due to their terms of use, whereas Chinese tools tend to be more permissive and significantly cheaper.

But this has also brought a 'happy headache' to Chinese AI companies.

Demand for Seedance 2.0 has surged since February, and some users have already encountered quota limits and longer wait times.

According to foreign media reports, ByteDance has adopted a more aggressive monetization approach for certain U.S. enterprise clients, requiring them to prepay approximately $2 million in exchange for model access and usage quotas.

The same applies to Kuaishou—they are spinning off the Kling business, which may lead to a separate listing for Kling in the future.

This indicates that Kling operates as an independent business with a stronger growth story than Kuaishou’s core operations.

The bigger the growth story, the clearer the financials must be.

However, AI-generated video is more costly. Generating just a few seconds of video consumes significantly more computing power than generating a piece of text.

The higher the video quality and the longer the duration, the greater the inference costs.

Many video generation models follow this pattern: they start out very cheap or even free, but once users flood in, they quickly impose quotas, introduce waiting queues, and raise prices.

It’s not that companies don’t want to scale up; it’s that even the best-resourced players are running short of spare compute.

So what China's video AI sector must now confront is not just whether it can build good models, but whether it can turn those good models into a viable business.

If prices are too low, faster user growth leads to greater losses; if prices are too high and no users sign up, the effort becomes counterproductive.
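The pricing dilemma can be illustrated with a toy unit-economics model. Everything here is an assumption made for illustration: the flat monthly subscription, the `monthly_profit` helper, and all of the numbers are hypothetical and do not describe any vendor's actual pricing or costs.

```python
def monthly_profit(users: int, price: float, videos_per_user: int,
                   cost_per_video: float) -> float:
    """Subscription revenue minus inference cost for one month."""
    revenue = users * price
    inference_cost = users * videos_per_user * cost_per_video
    return revenue - inference_cost


# Hypothetical case 1: price (30) is below the per-user inference
# cost (100 videos * 0.5 = 50), so every extra user adds a loss of 20.
for users in (1_000, 10_000, 100_000):
    profit = monthly_profit(users, price=30.0, videos_per_user=100,
                            cost_per_video=0.5)
    print(users, profit)
# 1000 -20000.0
# 10000 -200000.0
# 100000 -2000000.0

# Hypothetical case 2: a price high enough to cover costs, but nobody
# signs up, so there is nothing to earn.
print(monthly_profit(0, price=200.0, videos_per_user=100, cost_per_video=0.5))
# 0.0
```

In this sketch, losses grow in step with the user count whenever the price sits below the per-user inference cost, while a price no one will pay earns nothing at all.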

The third issue is the generational gap in model capabilities.

Ultimately, video generation capabilities are built on top of language models.

No matter how advanced a video generation model may be, it still relies on foundational language understanding to interpret user prompts. It then uses reasoning capabilities to grasp the logical relationships between scenes and characters, ensuring coherence in the generated content.

According to assessments by foreign media, OpenAI’s ChatGPT 5.5 and Anthropic’s Mythos already lead domestic Chinese AI companies by nine months to a year.

This capability gap manifests across multiple dimensions, such as reasoning ability, contextual understanding, multi-turn dialogue, and complex task handling.

Although China leads the U.S. in vertical domains like AI-powered video generation, a noticeable gap remains in general-purpose large models.

Overall, China’s lead in AI video generation is real and substantial, but it’s no time to rest easy. The gaps in computing power and foundational models remain a sword hanging overhead. Still, at least for now, we no longer have to gaze longingly at Silicon Valley’s back.
