On April 23, Tencent released and open-sourced the HunYuan Hy3 preview language model. It is a mixture-of-experts (MoE) model that integrates fast and slow thinking, with 295 billion total parameters and 21 billion activated parameters, and it supports a maximum context length of 256K tokens. It is the first model trained after HunYuan's rebuild and the most capable model HunYuan has developed to date. It shows significant improvements in complex reasoning, instruction following, context learning, coding, agent capabilities, and inference performance.
In February 2026, Tencent HunYuan rebuilt its infrastructure for pre-training and reinforcement learning and set out three principles for model practicality:
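To put the parameter figures in perspective, the share of the model that is active for any given token can be worked out directly from the two numbers above. The snippet below is only an illustrative calculation; the variable names are made up for this example.

```python
# Parameter counts quoted in the article, in billions.
TOTAL_PARAMS_B = 295
ACTIVE_PARAMS_B = 21

# Fraction of the parameters activated per token in this sparse (MoE) model.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Activated share of parameters: {active_fraction:.1%}")
```

In other words, roughly 7% of the parameters are used per token, which is the usual reason a sparse model of this size can be cheaper to serve than a dense model with the same total parameter count.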
1. Systematized capabilities: Not favoring 'one-sided expertise' because even a single application like a code agent involves deep collaboration across reasoning, long-form content, instructions, dialogue, coding, tools, and more.
2. Authentic evaluation: Actively stepping away from public leaderboards that are easily 'gamed,' assessing and improving the model's 'real-world effectiveness' through self-created questions, the latest exams, human evaluations, product beta testing, and other methods.
3. Pursuit of cost-effectiveness: Practicality cannot be separated from commercial viability. The model architecture and the inference framework are co-designed to significantly reduce task costs, making intelligence both affordable and effective.
Hy3 preview can be seen as the starting point for HunYuan's rapid exploration of large-scale practical models and solving real-world problems.
Tencent's Chief AI Scientist, Yao Shunyu, stated that Hy3 preview is the first step in rebuilding the HunYuan large model. He said the team hopes this open-source release will bring genuine feedback from the open-source community and users, helping to enhance the practicality of the official Hy3 version. At the same time, the team is continuing to expand the scale of pre-training and reinforcement learning to raise the model's intelligence ceiling, continuously improving its overall performance in real-world scenarios through deep co-design with many of Tencent's products, and beginning to explore specialized model capabilities.
Currently, Hy3 preview has launched on Tencent Cloud, Yuanbao, IMA, CodeBuddy, WorkBuddy, QQ, QQ Browser, Tencent Docs, and Tencent Enjoyment, among others. Mainline products such as WeChat Official Accounts, Peacekeeper Elite, Tencent News, Tencent Self-Select Stocks, Tencent Customer Service, and WeChat Reading are also gradually rolling it out. Additionally, Hy3 preview supports integration with popular open-source agent products like OpenClaw, OpenCode, and KiloCode, and has been listed on TokenHub, Tencent Cloud's large model service platform.
Hy3 preview focuses on comprehensive practicality, with a significant boost in Agent capabilities.
Multiple evaluations show that Hy3 preview has achieved a comprehensive improvement in model capabilities.
1. Outstanding contextual learning and instruction-following abilities
In real production and everyday scenarios, the primary challenge for models is understanding long, complex contexts while adhering to intricate and ever-changing rules. Drawing on Tencent's business scenarios, Tencent HunYuan introduced CL-bench and CL-bench-Life to assess the model's context learning capabilities, and it significantly enhanced context learning and instruction following in Hy3 preview.

2. Outstanding complex reasoning ability, achieving the highest domestic score in Tsinghua University’s mathematics doctoral qualification exam
Complex reasoning ability is fundamental for a model to solve various problems. Hy3 preview excelled in high-difficulty STEM reasoning tasks such as FrontierScience-Olympiad and IMOAnswerBench. It also achieved excellent results in the latest mathematics doctoral qualifying exam (Spring 2026) of Tsinghua University’s Qiuzhen Academy and the National High School Biology Olympiad (CHSBO 2025), showcasing generalizable and robust reasoning capabilities.

3. The most significant improvements were seen in coding and agent performance, demonstrating high cost-effectiveness
Coding and agents are the areas where Hy3 preview has shown the most notable progress. Thanks to the reconstruction of pre-training and reinforcement learning frameworks, along with an increase in the scale of reinforcement learning tasks, Tencent HunYuan rapidly achieved competitive results in mainstream code agent benchmarks like SWE-Bench Verified and Terminal-Bench 2.0, as well as in major search agent benchmarks such as BrowseComp and WideSearch.

In the digital world, coding focuses on the model's execution capability within development environments, while search emphasizes retrieval, filtering, and integration abilities in open information spaces. Together, they determine whether the model is truly usable in complex agent scenarios (e.g., OpenClaw). Hy3 preview performed exceptionally in evaluations like ClawEval and WildClawBench, indicating that its agent capabilities are steadily becoming comprehensive and practical.

Beyond public leaderboards, Tencent HunYuan has further developed multiple internal evaluation sets to assess the model’s performance in real-world development scenarios. Results indicate that Hy3 preview demonstrated strong competitiveness across backend engineering task sets like Hy-Backend, user interaction-focused development benchmarks such as Hy-Vibe Bench, and highly challenging software engineering task sets like Hy-SWE Max.

When comparing the size and overall agent performance of various open-source models, Hy3 preview demonstrates high cost-effectiveness.

Tencent's core businesses have been fully integrated, with multi-threaded AI product validation yielding clear benefits.
Prior to official launch, Hy3 preview underwent product testing across Tencent’s key AI operations, delivering clearly positive returns.
On the YBD platform, HunYuan and YBD engaged in deep co-design. On one hand, they specifically enhanced the model's performance on hard metrics such as intent understanding accuracy, text creation quality, and deep search capabilities. On the other hand, they made fine-grained optimizations to writing style, prose quality, emotional intelligence, content organization, and content professionalism. The deep collaboration between the model and the product has brought users a smarter and more 'human-like' interactive experience.
In the ima knowledge base Q&A and general Q&A scenarios, test results show that Hy3 preview excels in handling long texts, particularly in retrieval tasks, performing well in terms of answer accuracy, coverage, and comprehensiveness.
In the CodeBuddy and WorkBuddy products, Hy3 preview reduced first token latency by 54%, end-to-end duration by 47%, and increased success rate to over 99.99%. In real user environments, Hy3 preview has stably driven complex Agent workflows of up to 495 steps, covering diverse office scenarios such as document processing, data analysis, knowledge retrieval, and MCP toolchain orchestration.
In the special evaluation of AI avatars for official accounts and AI customer service scenarios, Hy3 preview demonstrated a more comprehensive capability upgrade compared to Hy2. The new model shows greater maturity in understanding user intents, managing complex contexts, and organizing knowledge information. When facing vague questions, short follow-ups, and multi-turn dialogues, it can grasp user needs more accurately and provide clearer and more stable responses. When generating answers based on knowledge bases, user memory, and context, it aligns better with the roles of AI avatars and AI customer service, significantly reducing excessive speculation, subjective assumptions, and emotional expressions, making the overall interaction experience closer to the goal of 'reliable, natural, and efficient' responses.
In the Peacekeeper Elite AI NPC scenario evaluation, the Peacekeeper Elite team quickly integrated Hy3 preview into the AI NPC scenario after its release and conducted evaluations. The overall performance was impressive. In out-of-game character role-playing scenarios, Hy3 Preview not only accurately understands character settings but also provides highly relevant and incrementally valuable content for open-ended questions, offering a more realistic, natural, and immersive conversational experience. In complex battle scenarios within the game, the model's response pace closely mimics real player chat experiences, demonstrating excellent stability and outstanding anthropomorphic role-playing abilities, with remarkable overall effectiveness.
In the Tencent Docs AI PPT scenario, significant progress has been made compared to the previous version (Hy2): generation success rate increased by 20%, evaluation score improved by 10%, and generation time decreased by 20%. Overall, the new model performs excellently in evaluation scenarios, showing outstanding performance in multiple stages such as template selection, color matching, outline generation, and content supplementation, with no hallucinations, alignment with the theme, and good visual effects.
In the QQ AI Assistant XiaoQ product evaluation, compared to the previous version, there have been significant improvements in first-byte latency for long texts, overall response speed, and streaming output efficiency. In core competencies, mathematical reasoning performance has notably improved, and the ability to follow and generalize multi-scenario instructions has improved further. It performs more stably and efficiently in tool invocation reasoning and multi-turn reference resolution, and it achieved outstanding results in the OpenClaw official PinchBench QQ agent scenario test, resulting in a noticeable leap in overall experience.
Inference efficiency increased by 40%, achieving optimal intelligence density at equal cost.
Thanks to the deep collaboration between the model and the inference framework, as well as comprehensive optimizations in the inference framework, operator performance, and quantization algorithms, overall inference efficiency increased by 40%, and the cost of Hy3 preview decreased significantly compared to the previous-generation model.
On Tencent Cloud's large model service platform TokenHub, Hy3 preview input price is as low as 1.2 yuan per million tokens, cached input price is 0.4 yuan per million tokens, and output price is as low as 4 yuan per million tokens. Meanwhile, Tencent Cloud, in collaboration with HunYuan, launched a customized Hy3 preview Token Plan package, with the personal edition priced as low as 28 yuan per month, providing a more cost-effective choice for Agent development and creating 'Lobster' applications.
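As a rough illustration of what these prices mean in practice, the sketch below estimates the cost of a workload using the lowest listed rates. The function name and the example token counts are hypothetical, and the sketch assumes that cached input tokens are billed separately from regular input tokens; actual billing on TokenHub may differ.

```python
# Lowest listed TokenHub prices from the article, in yuan per million tokens.
PRICE_INPUT = 1.2
PRICE_CACHED_INPUT = 0.4
PRICE_OUTPUT = 4.0

TOKENS_PER_MILLION = 1_000_000


def estimate_cost_yuan(input_tokens, cached_input_tokens, output_tokens):
    """Estimate the cost in yuan of one workload (hypothetical helper).

    Assumes cached and non-cached input tokens are counted separately.
    """
    total = (
        input_tokens * PRICE_INPUT
        + cached_input_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / TOKENS_PER_MILLION


# Hypothetical agent workload: 2M fresh input tokens, 5M cached input
# tokens, and 0.5M output tokens.
cost = estimate_cost_yuan(2_000_000, 5_000_000, 500_000)
print(f"Estimated cost: {cost:.2f} yuan")
```

Under these assumptions the example workload comes to about 6.4 yuan, with output tokens and fresh input tokens contributing most of the cost per token.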


