PANews wrote a column · Jun 5 15:46

Worried about self-evolving AI, is Anthropic planning to halt training?

On May 4, 2026, Jack Clark, co-founder of Anthropic, posted a message on social platform X. He wrote: 'I now believe there's a 60% chance that recursive self-improvement will occur by the end of 2028.'
Within minutes of the post, Eliezer Yudkowsky, a longtime AI safety researcher, replied: 'Then we will all perish together.' He immediately followed up with an analogy to the design flaws of the Chernobyl RBMK nuclear reactor, implying that no one truly knows how to stop the system now being switched on.
This exchange, which played out within minutes, threw sudden light on discussions previously confined to technical papers and internal assessments. Recursive self-improvement (RSI), in which an AI system not only optimizes its outputs but also autonomously improves its own improvement process, ultimately creating successor systems more powerful than itself, was long relegated to the fringes of theory. Now Anthropic’s co-founder has put it on a countdown clock: a 60% probability of occurring before the end of 2028.
A month later, Anthropic officially published a lengthy article titled 'When AI Builds Itself.' Co-authored by Marina Favaro and Jack Clark and released by the Anthropic Institute, which had been established only in March, the article handed the public a precisely calibrated signal of acceleration. Drawing on previously undisclosed internal data and a carefully crafted narrative structure, Anthropic conveyed two clear messages: 'We haven't arrived there yet,' and 'But it may arrive faster than most institutions are prepared for.'
In the same month, DeepMind CEO Demis Hassabis used a phrase on the Google I/O stage that had never appeared publicly before: humanity stands at the 'foothills of the singularity.' In a subsequent interview, he adjusted his timeline for artificial general intelligence (AGI) from 'shortly after 2030' to '2029 is a realistic possibility,' openly acknowledging that his dramatic phrasing was 'deliberately provocative,' intended to instill a sense of urgency among governments, economists, and the public.
Two leading institutions that have long positioned themselves as voices of caution and safety in the AI industry nearly simultaneously recalibrated the volume and tone of their public messaging. The timing itself warrants scrutiny as an independent event.
Anthropic’s long-form post published on June 4 opened by clearly stating its narrative objective. It sought not merely to illustrate a technical trend but to demonstrate a directional, accelerating process. To this end, it unveiled a set of previously undisclosed internal data.
The first set of figures points to a structural shift: as of May 2026, over 80% of merged code in Anthropic’s codebase was written by Claude. Two years earlier, that figure was in the low single digits. The same dataset also showed that in Q2 2026, a typical Anthropic engineer merged eight times as much code per day as in 2024.
One can imagine how anyone unfamiliar with deep developments in the AI industry would react upon reading these two numbers for the first time. Yet Anthropic itself acknowledged several important caveats in footnotes: leadership had previously estimated that if scripts and experimental code were included, the share of code written by Claude exceeded 90%; the 80% figure reflects a more conservative metric limited to merged code; lines of code 'are an imperfect measure' and may overstate actual productivity gains; and the code attribution pipeline itself 'has gaps.'
The very way these footnotes are written merits analysis. On the surface, they appear as honest concessions, but in practice, they serve to make the figures in the main text seem carefully self-filtered, thereby enhancing their credibility. This constitutes a dual-layer narrative structure: the main text delivers the signal, while the footnotes provide disclaimers.
The second set of figures concerns speed. On code optimization tasks, Claude Opus 4 achieved approximately a 3x speedup in May 2025—performance that would take skilled human researchers 4 to 8 hours to match. By April 2026, Claude Mythos Preview pushed this multiplier to roughly 52x. The maximum duration AI could work independently on a task also doubled every four months—from 4 minutes in March 2024 to 12 hours by March 2026. The very fact that this doubling occurs every four months creates an easily memorable, geometrically escalating narrative hook.
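As a rough check on the figures above, the quoted endpoints can be converted into an implied doubling time. The sketch below assumes smooth exponential growth between two data points, which the article does not assert, and the function name `doubling_time_months` is purely illustrative rather than anything taken from Anthropic's post.

```python
import math


def doubling_time_months(start_value, end_value, elapsed_months):
    """Months per doubling implied by two endpoints.

    Assumption (not from the source): growth is a constant exponential
    between the two data points.
    """
    if start_value <= 0 or end_value <= start_value or elapsed_months <= 0:
        raise ValueError("need 0 < start_value < end_value and elapsed_months > 0")
    return elapsed_months * math.log(2) / math.log(end_value / start_value)


# Independent task duration: 4 minutes (March 2024) to 12 hours (March 2026).
task_horizon = doubling_time_months(4, 12 * 60, 24)

# Code optimization speedup: about 3x (May 2025) to about 52x (April 2026).
speedup = doubling_time_months(3, 52, 11)

print(f"task duration doubles roughly every {task_horizon:.1f} months")
print(f"optimization speedup doubles roughly every {speedup:.1f} months")
```

Under that assumption, the task-duration endpoints imply a doubling time of a little over three months, somewhat faster than the rounded "every four months" headline, and the speedup figures imply a doubling time of under three months. Two endpoints cannot show whether the curve is actually exponential, so this only tests the internal consistency of the quoted numbers.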
Another dataset comes from an internal survey of 130 Anthropic research team members conducted in March 2026. The median respondent estimated that output using Mythos Preview was about four times higher than without AI. Once again, a footnote noted that prior independent research by METR suggested developers generally overestimate AI-driven productivity gains. The same dual-layer structure reappears.
The third set of figures suggests AI is closing in on human researchers in matters of judgment. In November 2025, Claude Opus 4.5 outperformed human researchers in selecting research directions 51% of the time. By April 2026, that figure rose to 64%. The sample comprised 129 cases, and Anthropic clarified in a footnote that these cases were deliberately selected moments where human choices had room for improvement.
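To give a sense of how much sampling noise a sample of this size carries, here is a minimal sketch of a normal-approximation (Wald) interval around the 64% figure. It rests on two assumptions the article does not confirm: that the 129 cases apply to the April 2026 result, and that they can be treated as independent draws. The helper name `wald_interval` is hypothetical.

```python
import math


def wald_interval(share, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 is about 95%)."""
    if not 0 <= share <= 1 or n <= 0:
        raise ValueError("share must be in [0, 1] and n must be positive")
    half_width = z * math.sqrt(share * (1 - share) / n)
    return share - half_width, share + half_width


# Assumption: 129 cases, 64% of which favored the model's choice.
low, high = wald_interval(0.64, 129)
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

On these assumptions the interval spans very roughly the mid-50s to the low 70s in percentage terms. It reflects sampling noise only; it says nothing about the selection effect the footnote describes, which may matter more.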
Any single data point can be interpreted within multiple explanatory frameworks. But taken together, they point consistently in one direction: the pace is accelerating, the gap is narrowing, and all of this is happening within Anthropic’s own codebase and labs—not as theoretical extrapolation on some external benchmark.
After presenting this data, the long-form article outlines three possible future scenarios.
The first scenario is a stagnation of trends, entering a plateau phase of the S-curve. Anthropic states, 'We do not believe this is very likely.'
The second scenario involves compounding efficiency gains, where AI continuously replaces humans across broader R&D functions, though humans still set direction and define success criteria. Anthropic’s assessment here is that 'evidence suggests we are likely heading toward this scenario.'
The third scenario is full recursive self-improvement, where AI autonomously designs, trains, and deploys successor systems more powerful than itself, with humans no longer in the loop. The wording used is 'possible.'
The ordering and tonal weighting of these three scenarios form a complete narrative gradient. The first is downplayed, serving to accommodate skeptics; the second is anchored in 'evidence,' lending the piece a rational veneer; and the third—through phrases like 'possible' and the conditional 'if current technical trends continue'—pushes the boldest hypothesis to the edge of the reader’s imagination without requiring Anthropic to bear the burden of proof.
At the very core of the entire article, Anthropic’s stance is distilled into a single sentence: 'We are not there yet, and recursive self-improvement is not inevitable. But it may arrive faster than most institutions are prepared for.'
If the long-form article published on June 4 is a carefully composed snapshot, placing that snapshot onto a timeline reveals a much longer trajectory.
In 2023, Anthropic released its Responsible Scaling Policy (RSP). The core commitment of this policy document is: if a model’s capabilities exceed the company’s ability to control it safely, the company will pause training of more powerful models. This was not merely a verbal statement but an internal governance document with defined evaluation frameworks and trigger conditions. For a time, this document was regarded by the AI safety community as an actionable example of 'voluntary regulation.'
In 2024, CEO Dario Amodei published a widely circulated article suggesting the possibility that 'powerful AI' would arrive by 2027. At that time, Anthropic still presented itself as an independent safety-focused entity, maintaining a restrained stance toward narratives of scaling up and acceleration.
On January 26, 2026, Amodei posted a 38-page essay titled 'The Adolescence of Technology' on his personal website. In it, he made a judgment that would be repeatedly cited thereafter: 'Because AI is now writing most of the internal code at Anthropic, it is already substantially accelerating our progress in building the next generation of AI systems. This feedback loop is gaining strength month by month, and we may be only one to two years away from the point where current-generation AI autonomously builds the next generation.' In the same essay, he described the forthcoming 'powerful AI' as a 'nation of geniuses inside data centers.'
This was close to the starting point of Anthropic’s systematic signaling that a 'self-improving feedback loop' was already underway. The essay appeared just as the company was moving from a $350 billion valuation toward an even higher valuation tier.
Less than a month later, the turning point arrived.
On February 25, 2026, CNN reported that Anthropic had revised its Responsible Scaling Policy, removing its core commitment to 'pause training more powerful models if capabilities exceed safety control capabilities,' replacing it instead with a non-binding 'Frontier Safety Roadmap.' In the same week, U.S. Secretary of Defense Pete Hegseth issued an ultimatum to Dario Amodei: retract the safety red lines or lose a $200 million Department of Defense contract.
The report quoted Anthropic’s Chief Scientist Jared Kaplan’s response to Time magazine: 'We believe halting model training actually helps no one… if competitors are sprinting full speed ahead.' The phrasing is noteworthy. 'Helps no one' is not a technical argument but a claim about the strategic game among stakeholders. Meanwhile, 'if competitors are sprinting full speed ahead' structurally mirrors the narrative that 'unilateral pauses only allow the least cautious players to catch up': it replaces the original pause logic, which was anchored in one’s own safety capacity, with a speed logic anchored in competitors’ actions.
Anthropic still emphasized in the CNN report that it retained two red lines: not using AI systems to control weapons systems and not deploying them for large-scale domestic surveillance. This is significant because it shows Anthropic did not abandon its safety stance wholesale; it conceded on some safety dimensions and held firm on others. Yet this selectivity is itself a key clue for analyzing the narrative strategy: where it conceded and where it held firm reveals how the safety boundaries have been recalibrated.
On March 11, the Anthropic Institute was officially established, led by Jack Clark and positioned as a 'public interest research institute.' Less than two months later, on May 4, Clark posted the now-famous '60%' tweet.
When this timeline is laid out side by side, neither the signal density nor the release cadence appears random. From the personal essay in January foreshadowing developments, to the policy revision in February, the institute’s founding in March, the founder’s probabilistic forecast in May, and the official long-form publication in June—this forms a clear, escalating narrative pipeline with carefully calibrated messaging. One cannot directly conclude from this sequence that 'everything was pre-planned,' but the pattern itself poses a critical question analysts must confront: does this sense of rhythm indicate that Anthropic has incorporated the 'acceleration narrative' into its public communications management?
If Anthropic alone had been adjusting its messaging in the first half of 2026, analysts would have sufficient reason to focus on the internal decision-making logic of that single company. However, DeepMind CEO Demis Hassabis made a nearly simultaneous and directionally aligned adjustment, undermining the claim that this was merely an isolated corporate case.
On January 20 at the Davos Forum, Hassabis maintained his long-standing view: a 50% probability of achieving AGI by 2030. About four weeks later, on February 18 at the AI Impact Summit in India, he softened his stance, stating, 'AGI could arrive within five years.'
From May 20 to 22 at Google I/O, Hassabis declared in his keynote address that humanity stands at 'the foothills of the singularity.' Around the same time, OpenAI released GPT-5.3-Codex, describing the model as having 'played a critical role in its own creation,' including assisting with debugging the training process, managing deployment, and analyzing evaluation results. The timing gap among the three leading labs during this window narrowed to just weeks.
Following Google I/O, Hassabis gave an interview to Axios—a segment later widely cited—where his most pivotal remark was an admission that using phrases like 'the foothills of the singularity' was 'deliberately provocative,' intended to spur governments, economists, and the public into recognizing the urgency of AI’s accelerating development. He also revised his AGI timeline from the previously stated 'shortly after 2030' to '2029 is a real possibility,' though the consensus expectation remains centered on 2030, plus or minus one year.
In an even more direct statement to The Seoul Economic Daily, Hassabis said, 'Five to ten years from now, when we look back at 2026 and 2027, we’ll say, “That was the moment we entered the AGI era.”'
The phrase 'deliberately provocative' deserves careful reflection. It is a rare instance where a key figure openly acknowledges their narrative intent. It concedes that at least some of the language used does not passively reflect technical reality but is instead an actively chosen communications tool. This admission itself doesn’t negate the possibility that Hassabis genuinely perceives a technological inflection point—but it clearly extracts 'narrative' from the shadow of 'fact,' making it an object that can be examined independently.
Hassabis’s self-explanation of his own phrasing opens a side door for interpreting this wave of synchronized signals. His 'deliberate provocation' and Anthropic’s use of 'footnoted disclaimers' in its lengthy data-driven argument both exhibit the same amphibious posture: one hand pushes signals strong enough to shock public discourse, while the other retains a safe retreat into 'this is just one possible scenario.'
As Anthropic and DeepMind jointly construct a narrative framework that 'AI is accelerating its own evolution,' external independent researchers have offered alternative interpretations of the same data and phenomena. These interpretations matter not because any one party holds the ultimate truth, but because they reveal just how wide the range of plausible explanations is within the official narrative itself.
The sharpest rebuttal came from Eliezer Yudkowsky. He not only responded directly to Jack Clark but continued voicing his concerns in multiple subsequent forums. MindStudio’s blog captured his full position: he compared current AI safety architectures to the Chernobyl RBMK reactor. The core of this analogy is that if control rods and accelerators are tied into the same system, attempts to slow it down may actually cause it to accelerate out of control.
Nathan Lambert of the Allen Institute for AI introduced the concept of 'Lossy Self-Improvement' (LSI). His argument directly challenges the 'accelerating flywheel' model: as systems grow increasingly complex, each generation’s improvement process incurs friction and loss—much like signal attenuation over long-distance transmission. By this logic, the very improvements that enable AI to write 80% or 90% of code cannot be infinitely replicated in subsequent generations, because those future systems will confront even more complex problem spaces, and the noise and errors inherent in AI outputs will be amplified across generations.
Dean Ball, a senior fellow at the Foundation for American Innovation, offered a blunter framing that cut Anthropic’s data down to size. He told IEEE Spectrum, 'Maybe they’ll eventually automate genius, but not next year. Next year, they’re automating grunt work.' This distinction cuts to the heart of the ambiguity surrounding '80% of code written by AI.' If AI is automating the repetitive patterns in codebases, bulk parameter generation, or end-to-end pipeline configuration, then these tasks do correspond to 'grunt work' in software engineering. The remaining 20% likely involves architectural design, strategic judgment, and trade-offs made under incomplete information, which is where the 'genius' lies.
David Scott Krueger of the University of Montreal, who also founded the AI safety nonprofit Evitable, proposed a red line for pausing development at '99% of code written by AI.' He told IEEE Spectrum, 'I think we might be crossing that line right now.' The tension between his framework and Anthropic’s own already-weakened pause commitment represents one of the most significant structural contradictions in this narrative cycle.
Jeff Clune, a computer scientist at the University of British Columbia, took a different stance in his interview with IEEE Spectrum. He said, 'We are at an inflection point for recursively self-improving systems.' If this statement proves true, it would mean Yudkowsky’s warnings were perfectly timed.
These voices diverge in direction, and even within the same camp there are tensions with more radical factions. What they share is that none relies on the official narrative framework; each offers an independent assessment of the same phenomenon based on its own methodology. The diversity and mutual contradiction among these judgments constitute the strongest rebuttal to the notion that any single narrative can fully encompass the truth.
In January 2026, Anthropic closed a funding round at a $350 billion valuation. Investors included Microsoft and NVIDIA. This figure had been previewed by some media outlets as early as late 2025, but its formal announcement coincided precisely with the release of Dario Amodei’s essay 'The Adolescence of Technology.'
In February, another $30 billion funding round was completed, keeping the valuation roughly in the $350 billion range. That same month, Anthropic revised its safety policy and removed its pause commitment. Meanwhile, a $200 million Pentagon contract hung in the balance.
In May, Reuters, The New York Times, and TechCrunch nearly simultaneously reported that Anthropic had closed a $65 billion funding round, reaching a valuation of $965 billion. This figure not only surpassed its own valuation from just two months prior but also exceeded OpenAI’s $852 billion valuation as of March 2026. The New York Times additionally cited Dario Amodei’s remarks at a developer conference, where he stated the company had achieved $30 billion in annualized revenue and even joked, 'I hope our 80x revenue growth this year doesn’t continue—it would be too crazy.'
On June 4, the Anthropic Institute published a long-form article titled 'When AI Builds Itself.'
Lining up these dates in sequence does not by itself establish a causal arrow between them. Anyone who claims a causal relationship among these events must provide direct evidence. Without internal decision-making records, no analyst can or should make such an assertion.
On the other hand, it would also be unreasonable to ignore the temporal alignment of these events and leave it undocumented. Within just five months, a company saw its valuation surge from $350 billion to $965 billion, nearly tripling, while simultaneously undergoing a major shift in safety policy, building a narrative pipeline of 'acceleration signals' led by its newly founded research institute, and having its co-founder issue a prediction with a 60% probability. When all these developments are compressed into a six-month window, investors at least have the right to ask: to what extent, if any, did these signal releases serve to tell the market that 'we are at the forefront of acceleration'?
The value of analysis lies precisely in raising this question. There may never be a single definitive answer. But once the question is clearly posed, it cannot easily be withdrawn.
Global AI market financing reached $297 billion in the first quarter of 2026, with the top five deals accounting for a significant share of this total. At this level of funding, all frontier labs face the same pressure: you must convince investors that your technology curve will be steeper than your competitors’. Your risk warnings must also be loud enough so that when regulators eventually step in to set rules, your voice is already embedded in the policy framework. Your narrative must simultaneously be compelling enough to attract top researchers to your lab and alarming enough to preserve whatever residual credibility you still hold within the safety community.
These demands contain inherent contradictions. Anthropic’s narrative adjustments in the first half of 2026 can be seen as an attempt to recalibrate the linguistic balance point among these conflicting imperatives. The weakening of safety commitments, the strengthening of acceleration signals, and the repeated invocation of the argument 'we cannot unilaterally stop' together form a set of vectors pointing in the same direction.
We must return to the core question: do these signals more closely reflect a genuine technological inflection point, or are they primarily rhetorical upgrades aimed at capital and regulators?
Existing public evidence does not allow us to simply check one box over the other. This is because both interpretations rely on the same dataset. An 80% code contribution share, a 52x acceleration effect, and task durations doubling every four months can support either the claim that 'an inflection point is arriving' or the interpretation that 'we are communicating to the market a trend our own engineers have already personally observed.' The boundary between these two readings is blurred.
But some facts are certain and do not require taking sides between the two interpretations.
First, Anthropic’s narrative shift in the first half of 2026 was not an isolated case. DeepMind’s Hassabis made a directionally consistent, though differently calibrated, adjustment in nearly the same quarter. OpenAI’s Sam Altman stated at the India Summit that 'the world isn’t ready yet' and released GPT-5.3-Codex in February 2026, claiming it 'played a critical role in its own creation.' If only Anthropic were sending signals, one might analyze it purely as a corporate strategy. But when the top three labs all significantly amplify their messaging within a few tightly packed months, it constitutes an industry-wide narrative shift.
Second, the timing of these signals exhibits a precisely traceable correspondence with the cadence of financing activities, policy adjustments, and institutional restructurings. This correspondence itself does not need to prove anything; it only needs to be presented honestly. Once presented, each individual’s own methodology will determine how they interpret it next.
Third, Anthropic itself still labels the third scenario, 'fully recursive self-improvement,' as 'possible,' not 'likely.' This indicates that, within the company’s own assessment framework, the acceleration narrative has not yet fully closed the loop. The same habits that lead its authors to include qualifiers in academic papers and blog posts continue to rein in their public statements.
Fourth, Hassabis’s admission of 'deliberate provocation' confirms a mechanism that, while widely suspected before, has rarely been explicitly articulated by a participant: at least some leaders of cutting-edge labs choose their wording with clear communicative intent. This means any interpretation of their statements must simultaneously analyze two layers—the factual claims they make, and the rhetorical strategies they employ in making those claims, treated as behavioral acts in themselves.
Those who have carefully read Anthropic’s full dataset receive a vastly different signal strength compared to those who only remember the two figures: '80% of code written by AI' and '52x acceleration.' Yet in this matter, 'how it is remembered' may deserve more analytical attention than 'what was actually said.'
Anthropic’s long article is itself a precise exemplar of the very phenomenon it describes. It constructs a sense of imminent acceleration through data while preserving room to retreat via footnotes and qualifiers; it calls for global coordination and verifiable slowdowns, yet the company had already removed its pause commitment in an earlier policy revision. This is neither hypocrisy nor simple inconsistency. It is a narrative balancing act performed by an institution navigating technological uncertainty, commercial pressures, and public responsibility. Hassabis’s admission of 'deliberate provocation' indirectly confirms that such balancing has become a consciously deployed method among leading labs.
Risk Disclaimer: The above content only represents the author's view. It does not represent any position or investment advice of Futu. Futu makes no representation or warranty.