
Our chip business is booming.
This statement did not come from NVIDIA, Intel, Google, or Microsoft, but from Amazon CEO Andy Jassy’s latest shareholder letter.
He further added that the demand for Trainium chips is exploding.
This sounds somewhat counterintuitive.
Because for a while now, the story you've been hearing about Amazon probably wasn't this one.
What you likely heard was about Amazon’s layoffs, its free cash flow being consumed by AI infrastructure, doubts about AWS’s growth rate, and how it, along with Oracle, was grouped into the category of 'AI subcontractors'.
In many people's minds, Amazon is not a company at the forefront of AI.
When it comes to models, OpenAI has ChatGPT, Anthropic has Claude, and Google has Gemini. If I ask you what Amazon’s strongest model is, many people might actually need to look it up first.
But in areas where no one was really paying attention, Amazon has quietly made its move.
This shareholder letter mentions that AWS's AI business has an annualized revenue exceeding $15 billion.
More crucially, Amazon's in-house chip business, including Graviton, Trainium, and Nitro, now has an annualized revenue surpassing $20 billion, with triple-digit year-over-year growth.
Andy stated that if this segment operated as a standalone business selling chips directly to third parties, the way NVIDIA or Intel does, its annualized scale could reach about $50 billion.
At this scale, it can no longer be called an 'internal cost-saving tool.' It’s genuinely a new business that has taken shape.
More importantly, Amazon isn’t just making chips anymore. It has underlying chips, data centers, AWS, Bedrock, deeply integrated clients like Anthropic, and external major clients such as OpenAI and Apple.
In other words, although the model itself may not stand out, Amazon is indeed a full-stack AI company.
Microsoft is still sorting out the relationships among OpenAI, Copilot, Azure, and Maia, but Amazon's direction is becoming increasingly clear.
Get as much AI as possible running on AWS, consume as many of AWS's own chips as possible, and ultimately reclaim the profit margins and control of the AI era.
That’s why I feel that chips are becoming Amazon's 'AWS.'
Take Anthropic, for instance. Their Claude model is now fully running on Amazon's chips.
In order to secure Anthropic as a major client, Amazon even built a dedicated AI computing cluster called Project Rainier.
This cluster is one of the largest known non-NVIDIA computing clusters globally. The New Carlisle campus in Indiana alone had about 500,000 Trainium2 chips deployed at the time.
By the end of 2025, that had expanded to 1 million chips. And the sole purpose of these chips is to run Claude.
With Amazon being so accommodating, Anthropic naturally had to reciprocate, directly participating in the design work of Amazon's Trainium3 chip.
Previously, AI companies could only train models based on whatever chips were available from chipmakers. Now, it's reversed—AI companies are teaching cloud vendors how to design chips.
The question arises: Why does Amazon insist on making its own chips? Is it really to challenge NVIDIA, or is it just to raise AWS’s profit margin a little higher?
At its core, Amazon’s chip-making endeavor may appear to be a hardware story, but if you look deeper, it’s still Amazon’s platform logic—turning others' growth into its own infrastructure revenue.
01
A high-stakes gamble that began in 2015
Annapurna Labs was founded in 2011 and operated in stealth mode for years.
The company was started by three engineers with a rather rebellious design philosophy: work backwards from cloud computing, data traffic, and actual workload needs to the chip design. That reverse-thinking approach happens to align perfectly with Amazon's way of working.
In early 2015, Amazon acquired Annapurna Labs, an Israeli chip venture, for $350 million.

There was no press release, and in Amazon’s financial report, there was only a brief statement. At the time, everyone thought this was just another insignificant investment among Amazon's many ventures.
After the acquisition, Amazon did not rush to launch products. The Annapurna Labs team first participated in AWS-related projects, and later began hardware design for the underlying Nitro System.
The Nitro System is the foundational infrastructure layer of AWS. It offloads many tasks originally handled by the server's main CPU and traditional virtualization software onto dedicated hardware.
Annapurna Labs was responsible for designing this hardware.
To be fair, this was just small-scale experimentation, or you could directly interpret it as a team-building exercise for Annapurna Labs before fully integrating into the Amazon family. The real story begins in 2018.
Three years later, AWS launched Inferentia, a machine learning chip specifically designed for inference tasks.
Amazon believed that, compared to training, inference places slightly lower demands on a chip and is easier to tackle. This is a common view, and many Chinese AI chip makers share it.
In 2019, the Inf1 instance equipped with Inferentia chips officially launched.
So how did this thing perform? Amazon already offered G4 instances, cloud servers powered by NVIDIA T4 GPUs and used mainly for graphics rendering, video processing, and machine learning inference.
Inf1 delivered up to three times the throughput of G4, with per-inference costs up to 40% lower.
However, Inf1 did not gain much market attention. The reason lies in its optimization specifically for inference scenarios, making it highly suitable for recommendation systems, image recognition, speech recognition, and NLP inference.
The issue is that Amazon's entire chip ecosystem is still in its early stages. To use Inf1, you must rely on Amazon’s own AWS Neuron SDK.
Although this SDK framework supports TensorFlow, PyTorch, and MXNet, its compatibility and maturity fall far short of NVIDIA’s CUDA. Additionally, early compiler restrictions were significant, imposing constraints on control flow, model size, BERT sequence length, and more.
Thus, Inf1 feels more like a concept product, serving as a proof-of-concept for the market, and Amazon itself is well aware of this.
Nevertheless, Inf1 exceeded Amazon's expectations, prompting the company to move forward aggressively.
In 2021, AWS officially released Trainium, its first customer-facing chip for training AI models. Training chips are technically far harder to build than inference chips. The first-generation Trainium, built on a 7-nanometer process, contains roughly 55 billion transistors and began powering EC2 Trn1 instances in 2022.
Amazon stated that, under specific workloads, the per-token cost of Trainium is 54% lower than A100 clusters. For GPT-like models, Trainium's throughput is comparable to A100, but at about half the cost.
At the end of 2023, Amazon unveiled the second-generation Trainium2 chip at the re:Invent conference. This chip is manufactured using a 5-nanometer process, with four times the number of computing cores compared to the first generation, delivering a fourfold increase in training speed and significantly expanded memory capacity.
Trainium2 has been specifically optimized for generative AI training, supporting structured sparsity, enabling more efficient handling of large language model training tasks. Compared to similar cloud configurations based on H200/H100, its price-performance can improve by an additional 30% to 40%.
In December 2024, Amazon previewed the next-generation Trainium3 chip at the re:Invent conference. It is AWS's first AI chip built on a 3-nanometer process.
By the end of 2025, Trainium3 had been integrated into the Trn3 UltraServer, each server equipped with 144 chips delivering a combined 362 petaflops. These servers use liquid cooling, improving energy efficiency roughly fourfold over the previous generation.
AWS claims that in certain training and inference scenarios, customers can reduce costs to about half of NVIDIA’s GPU solutions.
From 2015 to 2026, Amazon's investment in the chip business kept ramping up, with each stage larger than the last.
In 2025, Amazon's capital expenditure reached approximately $125 billion, with the majority allocated to data centers, electricity, and chips required for AI.
In 2026, this figure is expected to reach $200 billion, nearly 40% higher than analyst expectations and surpassing Google’s announced cap of $185 billion.
02
Why are Amazon's chips selling so well?
Money spent has to be earned back.
As mentioned earlier, Amazon's chip business has reached an annualized revenue of more than $20 billion, counting the combined income from Graviton processors, Trainium training chips, and Nitro networking chips.
The letter also noted that if the chip business operated as an independent company selling chips directly to third parties, the way NVIDIA or Intel does, its annualized revenue could reach $50 billion.
Graviton is essentially an Arm server CPU developed by Annapurna Labs, designed to replace traditional x86 processors like those from Intel and AMD. It handles web services, databases, containers, and various enterprise applications.
You may ask, why do we need this? Can’t I just use Intel’s CPU?
A large portion of workloads on AWS don’t actually require GPUs or AI chips. What they rely on is the most basic, stable, and long-term server CPU computing power.
For these common workloads, Graviton is cheaper, more energy-efficient, and easier to deploy at scale.
Currently, Amazon’s chips are mainly provided to customers via AWS in a rental format rather than through direct hardware sales. Customers purchase EC2 instance computing power, which may be backed by Graviton, Trainium, or Inferentia chips.
This business model is entirely different from that of traditional chip makers; it is somewhat like the individual operators who rent out graphics cards online.
In hindsight, Graviton was indeed Amazon's first self-developed chip to close the commercial loop. Unlike Trainium, it doesn't require customers to rewrite large parts of their training pipelines, and unlike Inferentia, it isn't tied to specific inference scenarios.
Among the top 1,000 largest customers of AWS's Elastic Compute Cloud products, more than 90% are using Graviton chips. AWS also disclosed that over 50,000 customers are using Graviton. Well-known companies such as Apple, SAP, Pinterest, and Datadog are users of Graviton.
As the saying goes, even a hero can be stumped by a penny. Many companies migrate to Graviton because it is affordable, stable, and has low migration costs.
Graviton first helped Amazon prove one thing: as long as the price is low, customers don’t care what kind of chip they are using.
Once this was proven, Trainium and Inferentia had real confidence to continue their stories.
Trainium and Inferentia have relatively fewer customers, with Anthropic being their largest client.
At the end of 2024, Anthropic announced Project Rainier, which will use a computing cluster consisting of nearly 500,000 Trainium2 chips to train the Claude model, as I mentioned at the beginning of the article.
This cluster went into operation in 2025 and was one of the largest machine learning training clusters in the world at the time, with a computing power more than five times greater than Anthropic’s previously used clusters.
In 2025, OpenAI signed a long-term cloud commitment with AWS worth $38 billion. By February 2026, Amazon announced a $50 billion investment in OpenAI and confirmed that OpenAI would consume roughly 2 gigawatts of Trainium computing capacity through AWS infrastructure.

Considering that Anthropic and Amazon’s own Bedrock service were already using a large number of Trainium chips, it was impressive that Amazon could still take on OpenAI’s massive order, showing that Amazon had fully committed to chips at that point.
Then there is Apple, whose search features run on Graviton4 and Inferentia2, improving the efficiency of its machine-learning inference workloads by more than 40%. Apple is also testing Trainium2, with early results suggesting that pre-training models on it improves efficiency by about 50%.
However, just as Amazon thought it was about to perfect its chip capabilities, someone threw a bucket of cold water on them.
In July 2025, an internal Amazon document labeled 'Confidential' revealed that AI startup Cohere found the performance of Trainium 1 and Trainium 2 chips to be 'inferior' to NVIDIA's H100 GPU.
Stability AI, the image generation company behind Stable Diffusion, also reached a similar conclusion, stating that Trainium 2 performed poorly in terms of latency, making it 'less competitive' in speed and cost.
Tests conducted by AI Singapore, an AI research institute in Singapore, showed that AWS G6 servers equipped with NVIDIA GPUs outperformed Inferentia 2 in cost efficiency across multiple use cases.
Amazon responded by saying that this feedback does 'not reflect the current situation,' and that Trainium and Inferentia have achieved 'outstanding results' with clients like Ricoh, Datadog, and Metagenomi.
03
Cloud giant builds chips
Despite this, demand is still growing rapidly.
Amazon's shareholder letter revealed that two major AWS clients had requested to purchase all Graviton instance capacity for 2026, but Amazon declined these large orders due to the need to accommodate other client demands.
AWS added 3.9 gigawatts of power capacity in 2025, with total power capacity expected to double by the end of 2027.
Amazon’s journey in chip development, from a discreet acquisition in 2015 to becoming a business generating $20 billion in annual revenue by 2026, took 11 years.

Whether that growth counts as high or fast is debatable, but it is at least respectable. The real question is how far this path can go, and whether it can truly replicate the success of AWS.
The core logic behind Amazon's chip development is simple: reduce costs and increase profit margins. However, whether this logic holds depends on three questions: Are the chips really cheaper? Are customers willing to pay the migration costs for this? And how long will it take to recoup the investment?
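The third of those questions, payback time, comes down to simple arithmetic: one-off migration cost versus monthly savings. Here is a minimal sketch; every dollar figure below is hypothetical and made up purely for illustration, not an actual AWS price.

```python
# Hypothetical break-even sketch. None of these numbers come from AWS pricing;
# they only illustrate how a customer might weigh migration cost against savings.

def breakeven_months(gpu_hourly, custom_hourly, hours_per_month, migration_cost):
    """Months of usage needed before cumulative savings cover the one-off
    engineering cost of porting workloads to a custom chip."""
    monthly_savings = (gpu_hourly - custom_hourly) * hours_per_month
    if monthly_savings <= 0:
        return float("inf")  # the custom chip never pays off
    return migration_cost / monthly_savings

# Illustrative inputs: 10,000 accelerator-hours per month, a 40% discount on a
# $30/hour GPU rate, and three engineer-weeks of porting work priced at $60,000.
months = breakeven_months(gpu_hourly=30.0, custom_hourly=18.0,
                          hours_per_month=10_000, migration_cost=60_000)
print(f"Break-even after roughly {months:.1f} months")
```

Under these toy numbers the migration pays for itself within the first month, which is why heavy, steady workloads are the natural first adopters: the bigger and longer-running the cluster, the faster a fixed migration cost is amortized.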
First, customers need to adapt their code using AWS’s Neuron SDK.
Engineers at Anthropic revealed that migrating the training process to Trainium3 takes about three weeks, which is significantly shorter than the months required for earlier generations of custom chips, but it still represents a substantial engineering investment.
Secondly, not all model architectures can run on Trainium.
Some architectures require CUDA for specific operations, and some companies are well-funded and demand maximum computing power, in which case they can only use NVIDIA’s products.
If we broaden our perspective a bit, you'll find that Amazon’s obsession with chips is actually related to its own awkward position in the AI battlefield.
Let me give you an example. If I ask you what Amazon's model is, could you answer immediately without searching?
Amazon hasn’t neglected model development; on the contrary, it started building its Titan model family early on, later introducing Nova to fill gaps in text, image, and video generation capabilities.
Further, there are various AI products covering consumer to enterprise ends, such as Bedrock, Q, and the upgraded Alexa.
The problem is that Amazon has been involved in every step, but none of these efforts have satisfied customers. This has caused Amazon to fall behind in the AI industry.
In addition to Amazon, other cloud giants are also reshaping the AI chip market.
Google's TPU line has already reached its seventh generation, and Microsoft is developing its own AI chip, Maia.
Although Google does not sell TPUs outright, it at least offers TPU compute to outside users through its cloud platform. Microsoft, by contrast, has never made Maia available to outside customers.
However, the current status of Maia is far from optimistic.
In 2023, Microsoft originally planned to use Maia 100 to run OpenAI's models and Copilot. But once deployed, Maia 100's compute proved far short of what ChatGPT needed at the time, so OpenAI had no choice but to turn to NVIDIA.
In 2025, Microsoft’s next-generation Maia was reported to face production difficulties, causing its mass production schedule to be delayed from 2025 to 2026. Reasons include mid-process design changes, team turnover, and engineering setbacks. Moreover, according to foreign media reports at the time, the performance of the new Maia could not match NVIDIA’s newly released Blackwell.
By January 2026, Microsoft finally unveiled Maia 200, built on a 3nm process. But it is positioned for inference and, unlike NVIDIA's GPUs, cannot be used to train large models.
But by then, the market was already flooded with alternative products. Microsoft not only failed to lead technologically, but also lagged behind Amazon and Google in terms of production capacity and deployment speed. This caused Microsoft’s Maia to fail to make any significant impact.
The future AI chip market may split into two tiers: one dominated by NVIDIA and AMD in the general-purpose market, and another consisting of closed ecosystems controlled by various cloud giants.
For startups and small to medium-sized enterprises, choosing a cloud platform means committing to the underlying chip architecture. If they become heavily reliant on Amazon's Trainium, the cost of migrating to another platform in the future will be extremely high.
Amazon fell behind before, so it wants to lock in the future of small and medium-sized businesses, potentially nurturing a few companies like OpenAI or Anthropic on AWS.
From another perspective, the proprietary chips developed by cloud giants are also driving progress across the entire industry. One reason NVIDIA has been able to maintain high profit margins is the lack of effective competition.
When Amazon, Google, and Microsoft start making their own chips, it will push NVIDIA to lower prices and speed up iterations.
In the end, the entire AI industry benefits.
Whether Amazon’s chip development can become the next AWS depends on how you define 'success.'
If success means creating an entirely new industry and transforming the entire tech ecosystem like AWS did, then clearly, it cannot become the next AWS. The chip industry has existed for decades, and Amazon isn’t creating a new market but rather reallocating shares within the existing market.
However, if success means building a sustainable and competitive business that provides AWS with cost advantages and strategic control, then Amazon has already made significant progress down this path.
An annualized revenue of $20 billion, over 90% adoption by top-tier customers, and landmark cases like Anthropic and OpenAI are all proof enough of Trainium’s success.
More importantly, when you own the full stack from chips to data centers to software platforms, you can optimize end-to-end for specific workloads, something that simply isn’t possible with off-the-shelf chips.
In this sense, the story of Amazon making chips is not about how much money it can make, but about who holds the control.
In the AI era, computing power is the new oil, and whoever controls the production and distribution of computing power will hold the future.
Amazon doesn't want to hand over this control entirely to NVIDIA, just as it didn't want to give Intel control over cloud infrastructure back in the day.
Even if, in the end, Trainium cannot stand shoulder to shoulder with NVIDIA, it has already proven that cloud giants have the ability to challenge the monopoly of chip giants. That in itself is a kind of success.
Risk Disclaimer: The above content only represents the author's view. It does not represent any position or investment advice of Futu. Futu makes no representation or warranty.