字母榜精选

Why must the big tech companies compete to recruit Guo Daya?

A piece of news has quietly spread in the AI community: DeepSeek researcher Guo Daya has resigned.

The immediate reaction from most people was, “Who? Who is Guo Daya?”

This is not difficult to understand, as Guo Daya is far less well-known than the founder Liang Wenfeng and the 'AI prodigy girl' Luo Fuli.

However, in terms of academic research and contributions to DeepSeek's large models, Guo Daya surpasses both of them by a wide margin.

As of the time of writing, Guo Daya’s published papers have been cited more than 37,000 times, far exceeding those of his peers.

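Guo Daya’s h-index is 37 and his i10-index is 46. In other words, 37 of his papers have each been cited at least 37 times, and 46 have each been cited at least 10 times, indicating that he not only maintains very stable academic output but has also published numerous highly influential papers.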
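For readers unfamiliar with these two metrics, here is a minimal sketch of how they are computed. The citation counts in the example are hypothetical and are not Guo Daya's actual record; the function names are chosen only for illustration.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for six papers (illustration only).
example = [120, 45, 12, 10, 9, 3]
print(h_index(example))    # 5
print(i10_index(example))  # 4
```

A single heavily cited paper raises the total citation count but moves the h-index by at most one, which is why the two numbers are usually read together.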

It can even be said that if you are familiar with Guo Daya’s research direction and the studies he led, you will realize that the "DeepSeek moment" would not have been possible without his contributions.

So where did he go? There are currently two rumors: one says Baidu, the other says ByteDance.

In fact, the top talents who joined ByteDance over the past two years (Zhou Chang, Yu Bowen, and Jiang Lu) all focus on video.

Guo Daya is different. He specializes in code intelligence and mathematics, which allows him to strengthen ByteDance’s capabilities in Vibe Coding and AGI, two critical areas.

If he were going to Baidu, that would make sense too. In March, Wenxin Kuaicode completed its version 4.0 iteration and introduced a multi-agent collaborative end-to-end development feature.

But do you know when Wenxin Kuaicode 3.0 was released? It was November 2024. The gap between these two major versions spans over a year, which is uncommon in the AI community where updates typically occur on a weekly basis.

From this perspective, Baidu actually needs Guo Daya more than ByteDance does.

However, for DeepSeek, which has delayed the release of V4, Guo Daya's departure undoubtedly adds insult to injury.

About Guo Daya

Guo Daya was born in Zhuhai, Guangdong in 1995 and entered Sun Yat-sen University's School of Data Science and Computer Science in 2014. In his senior year, he was selected for the joint Ph.D. program between Sun Yat-sen University and Microsoft Research Asia. Under the guidance of Professor Yin Jian and Dr. Zhou Ming, he pursued his doctoral degree with a focus on natural language processing.

In 2020, he received the Microsoft Research Fellowship, an award granted annually to only 12 PhD students in the Asia-Pacific region. After completing his PhD in 2023, he joined DeepSeek as a researcher, focusing on code intelligence and large language model reasoning.

A notable detail from Guo Daya's PhD period is worth mentioning. During his internship at Microsoft Research Asia, he published papers at both EMNLP and NeurIPS, two top-tier conferences.

According to Sun Yat-sen University's graduation requirements, Guo Daya had already fulfilled the most challenging paper publication requirement for his PhD on the third day of his enrollment.

He himself mentioned this during an interview. So let’s take a look at one of his most influential works together.

In 2020, Guo Daya, as a co-first author, published CodeBERT at EMNLP 2020. The other co-first author was Feng Zhangyin from Harbin Institute of Technology.

CodeBERT was the first state-of-the-art (SOTA) model to demonstrate cross-language generalization through bimodal pre-training, capable of handling both natural language and programming languages. Prior to this, while pre-trained models like BERT had achieved success in natural language processing, research on pre-trained models for programming languages remained relatively limited.

The core innovation of CodeBERT lies in the introduction of a replaced token detection task. Traditional masked language modeling can only utilize paired natural language-code data, whereas replaced token detection draws inspiration from ELECTRA by training the model to detect plausible but incorrect replacement tokens generated by a generator.

This allows CodeBERT to leverage a vast amount of unimodal code data, significantly expanding the scale of training data. The model achieved the best performance at that time in code search and code documentation generation tasks.
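To make the replaced token detection idea concrete, here is a toy sketch of how one training example could be built. It is a simplification under stated assumptions: the real setup uses a learned generator to propose plausible replacements, whereas this sketch samples uniformly from a small vocabulary, and the function name `make_rtd_example` and its parameters are hypothetical.

```python
import random


def make_rtd_example(tokens, vocab, replace_prob=0.15, seed=0):
    """Corrupt some tokens and label each position as original (1) or replaced (0)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            # Stand-in for a learned generator that proposes a plausible token.
            candidate = rng.choice(vocab)
        else:
            candidate = tok
        corrupted.append(candidate)
        # If the sampled token happens to equal the original, it counts as original.
        labels.append(1 if candidate == tok else 0)
    return corrupted, labels


code_tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
toy_vocab = ["a", "b", "c", "+", "-", "return", "def"]
corrupted, labels = make_rtd_example(code_tokens, toy_vocab, replace_prob=0.3)
print(list(zip(corrupted, labels)))
```

The discriminator is then trained to predict `labels` from `corrupted`. Because the labels come from the corruption process itself, no paired natural language description is needed, which is what lets code-only data be used.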

To put it another way, CodeBERT enables AI to understand human language descriptions as well as the logical structure of code. For instance, if you say in Chinese, 'Find me a sorting algorithm,' it can help locate relevant code; or if you provide a piece of code, it can explain in plain language what the code does.

Although this may not seem impressive now, this paper was published in 2020. At that time, code was just code, and natural language was natural language, with a vast gap between the two.

Therefore, the emergence of CodeBERT can actually be regarded as the beginning of today's Vibe Coding.

If you ask which of the works Guo Daya has participated in since joining DeepSeek is the most influential, the answer is definitely DeepSeek-R1.

But if you ask which one ranks second, let me tell you that the answer is not DeepSeek-V3, but DeepSeekMath.

The citation count for DeepSeek-V3's technical report is 3,890, while the citation count for DeepSeekMath is 5,182.

In February 2024, Guo Daya participated in the development of DeepSeekMath as a core contributor. The project came after he had published DeepSeek-Coder as first author, and its goal was to enhance the mathematical reasoning capabilities of large language models.

The key innovation of DeepSeekMath is the introduction of GRPO, or Group Relative Policy Optimization. This is a variant of Proximal Policy Optimization (PPO).

Traditional PPO requires training an independent value function model, which increases memory usage and computational overhead.

Therefore, GRPO completely abandoned the reliance on an independent value function model, instead estimating advantages through relative comparisons within the group, thus reducing the demand for training resources.

The workflow of GRPO is as follows: for the same math problem, the model generates a group of candidate answers, scores each one for correctness, and compares each score against the group average. Answers that score above the average are made more probable, and answers that score below it are made less probable.

This way, the large model will know how to handle similar problems the next time it encounters them.

This method does not require an additional value network, only a reward function capable of verifying the correctness of answers. In mathematical reasoning tasks, the reward function can directly check whether the final answer is correct.
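The group-relative step can be sketched in a few lines. This is a minimal illustration, not DeepSeek's implementation: the exact-match reward, the use of the population standard deviation, the `eps` term, and all function names are assumptions made for the example.

```python
from statistics import mean, pstdev


def exact_match_reward(answer, reference):
    """Assumed rule-based reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group instead of a value network."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical candidate answers sampled for the same problem.
candidates = ["42", "41", "42", "7"]
rewards = [exact_match_reward(c, "42") for c in candidates]
advantages = group_relative_advantages(rewards)

for cand, adv in zip(candidates, advantages):
    direction = "increase" if adv > 0 else "decrease"
    print(f"answer={cand!r} advantage={adv:+.2f} -> {direction} probability")
```

The advantages then weight a PPO-style policy update. If every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal, which is one reason to sample several candidates per problem.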

In effect, the model still learns which answers are good and which are bad, but with much lower memory usage.

DeepSeekMath-RL 7B, optimized with GRPO reinforcement learning, scored 51.7% on the MATH benchmark without external tools or voting ensembles. This approached the performance of contemporaries such as Gemini-Ultra and GPT-4, which was quite rare among open-source large models at the time.

As a result, GRPO was later also applied to the training of DeepSeek-R1.

Where did Guo Daya go?

In fact, if we look at it from the perspective of technical contributions, although Liang Wenfeng and Luo Fuli are more well-known, Guo Daya's contribution to DeepSeek surpasses both of them, which contrasts with public perception.

Liang Wenfeng's role was to provide research direction and resource support. His name often appears toward the end of multiple research papers, indicating he is not the primary technical contributor.

Although Luo Fuli also participated in the development of DeepSeek-V2, her name is not listed as a core contributor in the author list of the relevant paper.

To bring it back to the essence of an agent, it boils down to three things: code understanding, code generation, and program synthesis.

Coincidentally, starting from CodeBERT, Guo Daya's research direction has been exactly this.

Therefore, I believe that Guo Daya's departure will have a significant impact on DeepSeek.

In addition to the previously mentioned CodeBERT, Guo Daya also led the development of GraphCodeBERT and DeepSeek-Coder.

The former enables AI to understand the dependency relationships between variables in code. For example, modifying 'a' affects 'b', and modifying 'b' in turn affects 'c'. This is particularly helpful for code refactoring and bug fixing.
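The 'a' affects 'b' affects 'c' example can be illustrated with a small sketch built on Python's standard `ast` module. This is only an illustration of the data-flow idea under simplifying assumptions (plain assignments to simple names, no control flow); it is not how GraphCodeBERT itself extracts its graph, and the function names are hypothetical.

```python
import ast


def variable_dependencies(source):
    """Map each assigned variable to the variables read on its right-hand side."""
    deps = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    deps.setdefault(target.id, set()).update(reads)
    return deps


def affected_by(deps, changed):
    """Variables whose value transitively depends on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for var, reads in deps.items():
            if current in reads and var not in affected:
                affected.add(var)
                frontier.append(var)
    return affected


snippet = "a = 1\nb = a + 2\nc = b * 3\n"
deps = variable_dependencies(snippet)
print(sorted(affected_by(deps, "a")))  # ['b', 'c']
```

A model that sees these edges alongside the token sequence has an explicit signal for where each value comes from, which is the kind of structure that helps with refactoring and bug localization.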

The latter supports multiple programming languages and longer contexts, allowing the model to understand an entire project’s code architecture at once. The performance of DeepSeek-Coder-V2 is comparable to GPT-4 Turbo during the same period.

In 2024, ByteDance poached Zhou Chang from Alibaba. While at Alibaba’s Tongyi Lab, Zhou was responsible for multimodal research, and after joining ByteDance, he became the head of Seed's visual multimodal team, leading the development of Seedream and Seedance.

During the 2026 Spring Festival, Zhou Chang delivered his first major achievement at ByteDance: Seedance 2.0, which gained global attention with its “director-level” video generation capabilities.

Now, there are rumors that ByteDance might be one of Guo Daya's potential employers. Zhou Chang's focus is on strengthening visual multimodal capabilities, and if the rumors are true, Guo Daya’s goal would be to enhance code intelligence and reasoning abilities.

ByteDance's Seed team underwent a restructuring in early 2025.

After Wu Yonghui took over, he broke down data barriers between model departments and established a three-tier structure: the Edge team handles long-term AGI research, the Focus team tackles core technology challenges, and the Base team ensures stable delivery of the current generation of models.

From a technical alignment perspective, Guo Daya is most likely to be responsible for the full-chain technology breakthroughs in large code models.

For example, leading the next iteration of ByteCode-LLM.

Because Guo Daya excels in core technologies such as pre-training architecture optimization, ultra-long context adaptation, and multilingual support, he is highly likely to bring ByteDance a 'project-level' code generation agent.

The second key direction is reasoning capability.

One of the core strategies of ByteDance’s Seed team at present is the development of a general reasoning large model similar to o1. If Guo Daya joins, it would directly bring ByteDance the industry's most mature GRPO implementation experience.

He would most likely take charge of reinforcement learning algorithm development for reasoning, optimizing the Doubao large model's capabilities in mathematical reasoning, multi-step logical reasoning, and complex task decomposition.

The third direction is the research and development of specialized models for mathematical reasoning.

Mathematical reasoning ability forms the core foundation of a large model’s general logic capabilities. The Edge team, specially established by the Seed team, focuses on long-term AGI fundamental research spanning more than three years.

Additionally, short-term quarterly assessments have been eliminated, allowing the team to invest resources into exploratory research.

The DeepSeekMath project that Guo Daya participated in aligns perfectly with this direction.

If Guo Daya chooses to join Baidu, he could also play a significant role in the field of code intelligence.

As mentioned earlier, Wenxin Kuaicode completed a major upgrade in March 2026, focusing on multi-agent collaboration.

Its specific collaboration method is as follows: the Plan agent handles requirement clarification and task planning, while the Architect agent breaks down complex tasks through the SubAgents mechanism. Each sub-agent has its own independent context to address the 'forgetting' issue in long-context scenarios.
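The division of labor described above can be sketched as follows. This is a purely hypothetical illustration of the pattern (a planner, a dispatcher, and sub-agents with private contexts); none of the class or function names come from Baidu's product, and a real system would call a language model where this sketch returns a string.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Hypothetical sub-agent whose context is private to it."""
    name: str
    context: list = field(default_factory=list)

    def run(self, task):
        # Only this sub-agent's own task enters its context.
        self.context.append(task)
        return f"{self.name} finished: {task}"


def plan(requirement):
    """Hypothetical Plan agent: turn a requirement into ordered subtasks."""
    return [part.strip() for part in requirement.split(";") if part.strip()]


def architect(subtasks):
    """Hypothetical Architect agent: one sub-agent per subtask."""
    agents = [SubAgent(name=f"sub-{i}") for i, _ in enumerate(subtasks, start=1)]
    results = [agent.run(task) for agent, task in zip(agents, subtasks)]
    return agents, results


agents, results = architect(plan("design schema; write API; add tests"))
for agent, result in zip(agents, results):
    print(result, "| context size:", len(agent.context))
```

Because each sub-agent holds only its own task, no single context has to carry the entire project history, which is the point of the design in long-context scenarios.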

Guo Daya happens to have researched every node in this full-link process. In other words, what Baidu is currently doing completely overlaps with Guo Daya’s research focus.

Therefore, if he joins the Wenxin team, he will likely be responsible for optimizing the collaboration mechanisms between these agents to enhance the accuracy of code generation and improve project-level understanding capabilities.

When it comes to poaching core talent from DeepSeek, Alibaba cannot be overlooked. On March 1, 2026, Lin Junyang, the technical lead of Alibaba's Tongyi Qwen, unexpectedly announced his resignation on a social platform.

Even worse, apart from Lin Junyang, Qwen lost several key technical personnel in 2026, including Yu Bowen, the post-training lead, and Hui Binyuan, the code model lead.

At this critical juncture, Alibaba CEO Wu Yongming made a strong statement during the earnings call on March 19, emphasizing that the ATH division’s top priority is to 'build the most intelligent model.'

At this point, if Alibaba can bring in a top expert like Guo Daya, it would be killing two birds with one stone. It would not only fill the technical gap left by Lin Junyang’s departure but also directly acquire core experience in code intelligence and reasoning capabilities from DeepSeek.

To be honest, given DeepSeek's long-delayed V4, Guo Daya has every reason to try to achieve something at ByteDance, Baidu, or Alibaba while he is still young.

Will DeepSeek-V4 ever arrive?

In early January 2026, foreign media cited two insiders claiming that DeepSeek plans to launch its next-generation flagship model, V4, during the Spring Festival. The report also mentioned that V4's programming capabilities have surpassed Claude 3.5 Sonnet and GPT-4o in internal testing.

This news excited the entire AI community.

Since the release of DeepSeek-R1 on January 20, 2025, DeepSeek has not released any major version updates, with the latest model only reaching DeepSeek-V3.2.

The Spring Festival came, but V4 didn't.

On February 11, users noticed that the version number of DeepSeek's app was updated to 1.7.4, with the context window increasing from 128K to 1M, and the knowledge base cutoff date updated to May 2025.

The community immediately buzzed with speculation that this could be the rumored gray-scale testing of V4.

However, sources close to DeepSeek quickly provided a negative response: 'This is not V4; it's just a minor version update.'

Foreign media later reported that DeepSeek would release V4 on March 2. The report also stated that V4 would be optimized for domestic chips and would be the first version in the series to be fully based on the domestic computing power ecosystem.

This news was quickly picked up by a large number of domestic media outlets.

March 2 arrived, and nothing happened.

The very next day, there were reports suggesting that V4 was 'highly likely' to be released within the week. Leaked benchmark data showed that V4 scored 90% on HumanEval, compared to 82% for DeepSeek V3.

In fact, HumanEval is just an entry-level code generation benchmark test, which has now reached a point of metric saturation, making it unable to fully distinguish the real capabilities of top-tier code models. So at that time, I thought this was fake news.

Sure enough, as we approached April, V4 still hadn't appeared.

There are now reports claiming that DeepSeek-V4 is expected to officially launch in April 2026, with a focus on enhancing long-term memory capabilities and deep integration with domestic chips.

The explanations for the delay vary. Some attribute it to the model's growing size, while others claim the integration of multimodal functions is more complex than expected.

However, one detail worth noting is that one of V4's core selling points is its 'superior programming capability.' According to leaked information, V4 can handle logic chains involving 300,000 lines of code.

And Guo Daya is precisely the key figure behind DeepSeek in this area.

DeepSeek has a small core research team. From the list of paper authors, there are no more than 20 recurring names.

In such an elite small team, every member is indispensable, not to mention someone like Guo Daya, who is an exceptional expert.

The logic is clear: if V4 succeeds, it means DeepSeek has found a replacement or the team has successfully completed the technical transition.

If V4 is delayed again, or its code capability does not meet expectations, the impact of Guo Daya's departure will truly become evident.

From the current situation, DeepSeek is undergoing a severe test. It needs to prove that, even after losing key talent, it can still maintain its pace of technological innovation.
