ME News
wrote a column · Apr 14 22:31

AI Video Showdown: Alibaba Makes a Move, ByteDance Opens Up

Author and source: Financial Story Collection
HappyHorse comes from behind as Alibaba cuts in on Kuaishou and ByteDance.
On April 14, Volcano Engine, under ByteDance, officially launched the Seedance 2.0 series API service, enabling both enterprise and individual users to utilize its video generation capabilities.
Previously, access to these services required a substantial prepayment, with 'minimum consumption' commitments often reaching tens of millions.
When computing power is sufficient, opening APIs broadly is practically the main engine of large-model commercialization. The interesting question is why Seedance 2.0, which has recently drawn complaints from many users about long queue times and indirect price hikes, has suddenly opened its API to everyone.
Behind this, there may be a push from HappyHorse, which went viral last week.
HappyHorse, the video generation model under Alibaba, anonymously topped the Video Arena blind test rankings on April 7, surpassing ByteDance's Seedance 2.0 and Kuaishou's KeLing AI 3.0. On April 10, Alibaba officially claimed ownership and expedited API access and commercial implementation. The model employs a single-stream Transformer architecture, supports multilingual lip-syncing, and scores well on physical consistency (4.52/5) and visual quality (4.80/5). Its technical approach balances world model evolution with commercial viability, in contrast to OpenAI, which shut down Sora over insufficient cost-effectiveness. HappyHorse is deeply integrated into Alibaba's e-commerce ecosystem and aims to create a 'content-transaction-fulfillment' loop, driving cost reduction and efficiency gains in AI video for business customers while intensifying competition among Alibaba, ByteDance, and Kuaishou.
On April 7, HappyHorse anonymously appeared on the authoritative AI platform Video Arena's blind test rankings. In the image-to-video (without audio) category, it scored 1411 points to take first place, surpassing Seedance 2.0 by about 55 points; in the text-to-video (without audio) category, with 1379 points, it ranked ahead of publicly available products like ByteDance's Seedance 2.0, Kuaishou's KeLing AI 3.0, and Kunlun Wanwei's SkyReels V4.
Three days later, Alibaba officially stepped forward to claim this 'thousand-mile horse'.
In March this year, OpenAI announced the shutdown of Sora, and everyone assumed Seedance 2.0 would dominate the AI video generation field. Instead, Alibaba unexpectedly outmaneuvered both ByteDance's Seedance and Kuaishou's KeLing.
In the AI video generation model track, a three-way rivalry pattern is beginning to emerge, with Chinese companies taking the lead globally.
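To put the roughly 55-point Video Arena lead in perspective, the sketch below converts a rating gap into an expected head-to-head win rate. It assumes the arena scores behave like standard Elo ratings (base 10, 400-point scale); the article does not state the scoring formula, and the function name is illustrative.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model under a standard Elo curve.

    Assumption: the arena score follows the usual Elo formula with a
    400-point scale. The article does not confirm this.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Approximate lead reported for HappyHorse over Seedance 2.0
# in the image-to-video (without audio) category.
gap = 55
print(f"Expected win rate at a {gap}-point gap: {expected_win_rate(gap):.1%}")
```

Under that assumption, a 55-point lead corresponds to winning about 58% of blind comparisons: a clear edge, but far from a sweep.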
Since 2024, the core driver behind the leap of AI video generation models from 'toys' to 'tools' has been a fundamental shift in the underlying technical paradigm: from pattern matching based on statistics over massive data to simulating and understanding the laws of the physical world through 'world models.' This shift addresses two gaps that earlier technologies could not bridge: physical plausibility and long-term consistency. Long-term consistency, in particular, is a prerequisite for cinematic-grade applications.
HappyHorse uses a unified Transformer to handle video and audio together, outputting finished clips with sound in a single inference pass, with no post-production stitching required. This relatively unusual single-stream Transformer architecture offers significant advantages for long-term consistency.
The unified Transformer directly processes long sequences of mixed tokens, and its self-attention mechanism can capture long-range dependencies between video frames and audio frames.
This approach is simpler and more direct than running several independent models and then coordinating them: it reduces information loss between modules and is, in theory, better at maintaining coherence across long narratives.
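The single-stream idea can be illustrated with a toy example: video and audio tokens are concatenated into one sequence, and one self-attention pass lets every token attend to every other. This is a minimal sketch, not HappyHorse's actual implementation; the token values, the three-dimensional embeddings, and the use of identity projections in place of learned query/key/value weights are all simplifying assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over one mixed token sequence.

    Toy simplification: queries, keys, and values are the tokens themselves
    (identity projections) rather than learned linear maps.
    """
    dim = len(tokens[0])
    weights, outputs = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * tokens[j][i] for j in range(len(tokens))) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical embeddings: two video-frame tokens followed by two audio tokens.
video_tokens = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4]]
audio_tokens = [[0.0, 1.0, 0.5], [0.1, 0.9, 0.6]]
sequence = video_tokens + audio_tokens  # one stream, no separate audio model

outputs, weights = self_attention(sequence)
# weights[0][2:] is how much the first video token attends to the audio tokens.
print([round(w, 3) for w in weights[0]])
```

Because both modalities share one sequence, cross-modal dependencies fall out of the same attention matrix instead of being handed between separate modules.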
According to Artificial Analysis's evaluation metrics (out of 5), HappyHorse scored 4.52 in physical consistency, 4.80 in visual quality, and 4.18 in text alignment. This indicates that it performs well in basic consistency but still has room for improvement in long-term consistency in complex scenarios.
However, referring to Sora's experience, good long-term consistency does not necessarily mean good commercial usability.
In practice, Sora's long-term consistency relied heavily on the 'memory' capacity of the large model, and its commercial usability rate was extremely low: only 5%-10% of generated videos made it through even preliminary screening, making it more like an uncontrollable 'gacha game.'
Sora’s shutdown was not due to difficulties in technological upgrades but rather an unbalanced economic equation. According to Appfigures estimates, Sora’s total in-app revenue since its launch has been approximately $2.1 million, with a cost-to-benefit ratio close to 2500:1, making it one of the most expensive 'technological fireworks' in AI history.
Olivia Moore, a partner at the Silicon Valley venture capital firm a16z, once posted a SensorTower screenshot on social media showing that the Sora app's 30-day user retention rate was 1% and its 60-day retention rate was 0%. Retention this low clearly falls short of what a commercial application requires.
AI development has reached a stage where capital has become sufficiently rational, even ruthless, towards technologies that cannot generate returns despite massive investments. Therefore, OpenAI, which is preparing for an IPO, had no choice but to shut down Sora and return the $1 billion collaboration payment to Disney.
Moreover, OpenAI needs to focus its efforts on advancing the world model. In a sense, there is no such thing as a standalone large video model; large video models are more like interim milestones on the way to integrating world models with multimodal technologies.
Currently, almost all top-tier video models are based on the DiT (Diffusion Transformer) architecture, which grew out of diffusion models for image generation. The next step could very likely be the Omni-Model. Video generation is essentially these models extending frames along the time dimension, having ingested numerous causal sequences from the physical world during data cleansing.
Creating videos is the lowest threshold for verifying spatiotemporal predictive capabilities. Companies capable of producing large-scale video models can theoretically apply this technology to develop large models in other vertical fields — provided there is sufficient high-quality real data available for training.
Alibaba’s goal, obviously, is not just to create a popular video generation tool.
Video represents an excellent vertical application direction for large AI models because, from a traffic perspective, video is currently the only format that AI can seamlessly integrate into the three major cash-generating domains: entertainment, social media, and e-commerce.
ChatGPT (text) has hundreds of millions of monthly active users, while TikTok (video) boasts billions of daily active users. Humans are naturally inclined to avoid reading text and prefer consuming video content. ByteDance's ability to penetrate deep into the core operations of all internet giants stems from its focus on video as a key element.
The video stream data on Douyin encompasses multidimensional dynamic information such as human behavior, object movement, and scene interaction. Every frame serves as a record of real-world patterns. Vertical application AIs trained with such high-quality data gain a significant head start.
According to estimates by GeekPark, the usability rate of Seedance 2.0 for generating 15-second videos may reach 90%, a substantial improvement compared to the industry average of around 20%. The dual enhancement in technical capability and commercial viability makes the viral success of Seedance 2.0 easily understandable.
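A usability rate translates directly into how many generations a creator must pay for to get one keeper. The sketch below assumes each attempt succeeds independently with the same probability, so the expected number of attempts is 1/p; that independence assumption and the function name are illustrative, not from the source.

```python
def expected_attempts(usable_rate: float) -> float:
    """Expected generations needed per usable clip.

    Assumption: attempts are independent with a fixed success probability,
    so the count follows a geometric distribution with mean 1 / p.
    """
    if not 0.0 < usable_rate <= 1.0:
        raise ValueError("usable_rate must be in (0, 1]")
    return 1.0 / usable_rate


# Rates quoted in the article: ~90% (Seedance 2.0 estimate) vs ~20% (industry average).
for label, rate in [("Seedance 2.0 (estimated)", 0.90), ("industry average", 0.20)]:
    print(f"{label}: {expected_attempts(rate):.2f} attempts per usable clip")
```

On these figures a creator would need about 1.1 attempts per usable clip at a 90% rate versus 5 at 20%, which is why usability matters as much as the headline price.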
Moreover, the commercialization of Seedance 2.0 has been executed with strong pacing. It first used viral templates like 'pet cats and dogs beating up Godzilla' to spark a wave of user-generated content across social media, achieving zero-cost traffic generation and user education. Once both reputation and demand peaked, it swiftly initiated commercial monetization.
On March 4, Volcano Engine announced its commercial pricing: scenarios involving video input cost 28 yuan per million tokens, while those without video input cost 46 yuan per million tokens, translating to a pure video generation cost of approximately 0.95 yuan per second.
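The per-second figure can be related back to the token price with simple arithmetic. The sketch below assumes the 0.95 yuan per second estimate corresponds to the 46 yuan tier (no video input); the article does not say how many tokens one second of video consumes, so that number is derived here, not quoted.

```python
PRICE_WITH_VIDEO_INPUT = 28.0     # yuan per million tokens (from the article)
PRICE_WITHOUT_VIDEO_INPUT = 46.0  # yuan per million tokens (from the article)
COST_PER_SECOND = 0.95            # yuan per second of generated video (from the article)


def implied_tokens_per_second(cost_per_second: float, price_per_million: float) -> float:
    """Tokens consumed per second of video implied by the two quoted prices."""
    return cost_per_second / price_per_million * 1_000_000


def clip_cost(seconds: float, cost_per_second: float = COST_PER_SECOND) -> float:
    """Estimated generation cost in yuan for a clip of the given length."""
    return seconds * cost_per_second


tokens = implied_tokens_per_second(COST_PER_SECOND, PRICE_WITHOUT_VIDEO_INPUT)
print(f"Implied tokens per second of video: {tokens:,.0f}")
print(f"Estimated cost of a 15-second clip: {clip_cost(15):.2f} yuan")
```

Under that assumption, one second of video works out to roughly 20,650 tokens, and a 15-second clip costs about 14.25 yuan before any retries.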
Following this, ByteDance adopted a differentiated pricing strategy, lowering the barrier for public beta testing, and officially opened API applications to enterprise users on April 2. At this point, Seedance 2.0 completed its transformation from a viral AI toy to an enterprise production tool.
This also confirms the shift in the AI industry's investment logic. Whether funding is internal or external, capital flows to areas that can reach vertical applications, monetize quickly, and deliver ROI. Both enterprise (B-end) and consumer (C-end) users follow the same underlying logic when they pay.
A noteworthy detail is that HappyHorse natively supports lip-sync for English, Mandarin, Cantonese, Japanese, Korean, German, and French. This is likely aimed at enabling HappyHorse-generated videos to enter practical application scenarios like e-commerce (including cross-border e-commerce).
After all, Zhang Di, the father of both Kuaishou's KeLing and Alibaba's HappyHorse, understands not only technology but also business operations, as his professional background shows, so he is naturally adept at building business thinking into HappyHorse's technical development.
One piece of evidence is that the commercial performance of Kuaishou's KeLing is already supported by financial data. In Q4 2025, KeLing AI revenue reached 340 million yuan; in December 2025, monthly revenue exceeded 20 million US dollars, with an annualized revenue run rate (ARR) reaching 240 million US dollars.
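The run-rate figure is simply the latest month's revenue projected over twelve months. A minimal sketch, assuming the article's "annualized revenue run rate" uses that common convention:

```python
def annualized_run_rate(monthly_revenue: float) -> float:
    """Project one month's revenue over a full year (monthly revenue x 12)."""
    return monthly_revenue * 12


# December 2025 monthly revenue quoted in the article, in US dollars.
december_revenue_usd = 20_000_000
print(f"Run rate: ${annualized_run_rate(december_revenue_usd):,.0f}")
```

This matches the 240 million US dollar figure, and it also shows the caveat: a run rate extrapolates a single month, so it says nothing about whether that month is sustained.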
Moreover, Alibaba's ATH Innovation Division, to which HappyHorse belongs, has long advocated the slogan “Create Token, Deliver Token, Apply Token.” This organizational design, centered on 'Token consumption' as a core KPI, ensures that all technical efforts ultimately focus on commercial applications.
Like Seedance 2.0, HappyHorse has moved quickly from an anonymous climb up the rankings to an official announcement, with plans to open its API and integrate into Alibaba's BaiLian MaaS platform, demonstrating rapid progress toward commercialization.
Considering that Zheng Bo, head of the HappyHorse team, also serves as CTO of Alibaba Mama, it is highly likely that future applications of HappyHorse will be deeply tied to e-commerce operations.
Integrating AI with core businesses to drive greater ecosystem prosperity is now a key priority for every internet giant. Additionally, Alibaba has long aspired to create traffic entry points that can feed back into core businesses such as e-commerce.
In an ideal scenario, HappyHorse could simultaneously address both critical objectives.
As a video generation tool, HappyHorse can be applied to e-commerce business scenarios such as product advertisements and virtual anchors. If it can also secure a position at the source of AI content generation, it will provide endogenous traffic for e-commerce transactions and other businesses, thereby building a complete closed loop of 'content-transaction-fulfillment'.
This is not baseless speculation. After all, Seedance 2.0 has already entered the e-commerce field.
On April 2, AI Agent company NoDesk AI released a new version of its product DeskClaw, officially integrating with Seedance 2.0. This marks the first time a Claw track product has focused on vertical e-commerce, and it is also one of the first AI products in the e-commerce sector to integrate with Seedance 2.0.
For most investors and practitioners, Alibaba, as the leading e-commerce player, seems to have no reason not to achieve something similar.
The emergence of HappyHorse, an Alibaba-affiliated video generation tool, is in a sense consistent with public expectations. After all, enabling merchants to directly use a stable and reliable video generation tool on e-commerce platforms is the most logical approach.
In the content ecosystem, as long as HappyHorse is stable, reliable, and affordable, it can carve out a significant niche for itself.
For most content creators, cost control is crucial, and what matters is not only whether prices are high or low but whether they are predictable.
Whether they are individual uploaders (UPs), small studios, or MCN agencies, content producers work within a budget and a timeline. A tool with frequent price fluctuations and unpredictable queue times disrupts the entire production plan, making it impossible to quote projects or deliver them on schedule.
Seedance 2.0 currently faces exactly these kinds of issues and risks. On one hand, during peak hours, regular users may face queues of up to 80,000 people, with wait times exceeding seven hours; even paying premium members are not exempt.
On the other hand, the Jimeng platform has consecutively adjusted its pricing in a short period. It is rumored that the pure material generation cost of producing a two-minute AI manga drama has skyrocketed from about 7 yuan initially to 80 yuan, breaching the business model bottom lines of many small and medium-sized teams.
A user complained to Financial Story Collection: 'ByteDance has introduced upgraded tiers like VVIP on top of its annual fees, which is essentially an indirect price hike made possible by Seedance's monopolistic position. Now that HappyHorse has launched, ByteDance will probably have to reconsider its strategy.'
As long as HappyHorse maintains stable quality and delivers a good user experience at a lower price, it can attract a group of customers with genuine content generation needs from competitors. Perhaps smart Alibaba Cloud sales teams have already started reaching out to Volcano Engine's clients.
ByteDance’s decision to open up API access for Seedance 2.0 today is a move to proactively lower user barriers. With HappyHorse as a rival, it’s highly likely that both sides will engage in a price war.
The competition among ByteDance, Kuaishou, and Alibaba continues, with each company striving to improve its model capabilities while steadily reducing computing costs. This lets more small and medium-sized entrepreneurs benefit from technological advances and could quickly ignite the market. This is the AI era we want to see.