Focus on GTC 2026! What signals did Jensen Huang's speech send?

On March 16, Jensen Huang put a number on the table in his GTC 2026 keynote: by 2027, NVIDIA will meet at least one trillion dollars of AI computing demand. A year ago, his forecast was five hundred billion dollars. From ChatGPT kicking off generative AI to the rollout of reasoning models and AI agents, the structure of AI computing demand is undergoing a fundamental shift. A full account of the speech helps explain the logic behind the trillion-dollar figure.
The 'crazy' decision made twenty years ago has today grown into an ecosystem
In 2006, NVIDIA made a decision that few understood at the time: to build CUDA into every GeForce graphics card. The decision put nearly all of the company's profit on the line, and what it got in return was skepticism from Wall Street: why would a gaming graphics card company want to build a computing platform?
Twenty years later, hundreds of millions of GPUs are running CUDA. The ecosystem grew through a self-reinforcing loop: the installed base attracts developers, developers create new algorithms, new algorithms open up new markets, and new markets in turn expand the installed base. As Jensen Huang put it, this is an 'ecosystem flywheel,' and it took twenty years of spinning to reach its current scale.

"The download volume of the CUDA library is growing rapidly now, larger than ever before," Jensen Huang mentioned in his speech. This means that even the Ampere architecture introduced six years ago is now seeing rising prices in the cloud, not because of hardware scarcity, but because it can still run current applications effectively.
90% of data that could only be stored in the past can finally be utilized now.
Jensen Huang explained the source of this trillion-dollar figure with a slide: structured data + generative AI.
Most of the data we deal with daily is in SQL or Excel, structured tables. But such data accounts for only about 10% of the world's total data. The remaining 90% consists of unstructured PDF files, lecture videos, voice recordings, and images. For the past two decades, these data were merely stored due to the lack of indexing and querying tools.

NVIDIA's approach is to provide two foundational libraries: cuDF for accelerating structured data processing, and cuVS for accelerating vector databases and semantic search. Together, they allow companies to query previously inaccessible unstructured data using generative AI methods.
Nestlé's case was cited as evidence: after GPU-accelerated data processing, speed increased fivefold, and costs dropped by 83%. This is not an isolated case; Jensen Huang believes this model will repeatedly emerge in industries like healthcare, finance, manufacturing, and retail.
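To make the semantic-search half of that pairing concrete, here is a minimal CPU sketch in pure Python of the operation that vector search libraries speed up: ranking documents by how close their embedding vectors are to a query vector. It does not use the cuVS API. The file names, the three-dimensional embeddings, and the `search` helper are all made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical unstructured documents with toy embeddings (assumed values).
documents = {
    "q3_invoice.pdf": [0.9, 0.1, 0.0],
    "lecture_video_transcript.txt": [0.1, 0.8, 0.3],
    "support_call_recording.txt": [0.2, 0.3, 0.9],
}

def search(query_embedding, top_k=2):
    """Brute-force nearest-neighbor search: score every document, keep the best."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in documents.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# A query whose (assumed) embedding points in roughly the "invoice" direction.
query = [0.85, 0.2, 0.05]
print(search(query))
```

The brute-force loop scores every document for every query, which is the part that becomes expensive at enterprise scale and is the kind of work a GPU-accelerated index is meant to take over.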

From 'being able to understand' to 'knowing how to work,' AI has taken three steps.

Over the past two years, AI has undergone three technical changes:
First, generative AI. Starting with ChatGPT, AI shifted from understanding information to generating content. Software interaction moved from retrieval-based to generative, marking a shift in the computing paradigm.
Second, reasoning models. Models such as o1 and o3 began to show logical reasoning capabilities, enabling them to plan and break down problems. This grounds AI's answers more in step-by-step reasoning than in probabilistic recombination of text.
Third, AI agents. Represented by Claude Code, AI has begun executing multi-step tasks: reading files, writing code, compiling programs, running tests, evaluating results, and iterating on improvements. Jensen Huang revealed that 100% of NVIDIA engineers currently use AI-assisted programming.
From understanding to generation, from generation to reasoning, and then to task execution—these three changes combined form the fundamental driver of AI computing demand growth.
The turning point has arrived: AI has started 'working,' and computational demand has increased a millionfold.

Image source: ai.lanrenbao
Based on these changes, Jensen Huang's assessment is that the tipping point for inference has arrived.
In recent years, the main computational load for AI came from the training phase, feeding models with massive amounts of data. But now things have changed: every piece of content generated by AI, every inference executed, and every task completed requires computational resources. Inference is replacing training as the primary source of computational demand.
Jensen Huang provided a set of figures: over the past two years, computational demand has increased ten thousand times. His personal feeling is that overall demand has surged a million times. This explains why GPUs have been consistently in short supply. It is not because everyone is training large models; it is because AI has started actually 'working': writing code, reading documents, making plans, and executing tasks. The computing power consumed in these inference scenarios is rapidly surpassing that of the training phase.
Seven chips combine into one system, specifically designed for inference scenarios.
To meet the computational demands of inference scenarios, NVIDIA has launched a new AI computing platform called Vera Rubin. This is not a single chip but a system composed of seven chips, covering computing, networking, and storage.
The core Rubin GPU is manufactured on TSMC's 3nm process, with a dual-die package and 336 billion transistors. It is equipped with 288GB of HBM4 memory with 22TB/s of bandwidth. The accompanying Vera CPU is an 88-core custom Arm design, which for the first time adopts LPDDR5X memory in data centers and is optimized primarily for agent-based inference scenarios.
Notably, NVIDIA has collaborated with Groq. Groq's 3 LPU chips use a dataflow architecture and rely entirely on on-chip SRAM, making them suitable for low-latency token generation. The division of labor is as follows: Rubin handles the prefill phase, which requires massive computational power, while Groq manages the latency-sensitive decoding phase. After integration through the Dynamo software, the inference performance of this system increased 35-fold.
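The split between the two phases can be pictured with a toy sketch. The code below is not Dynamo and runs no model; `KVCache`, `prefill`, and `decode` are hypothetical names used only to show why the phases differ. Prefill touches the whole prompt in a single pass, while decode produces one token per step and each step depends on the one before it.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Stand-in for the attention key/value state handed from prefill to decode."""
    tokens: list = field(default_factory=list)

def prefill(prompt_tokens):
    """Compute-bound phase: the whole prompt is processed in one batched pass."""
    return KVCache(tokens=list(prompt_tokens))

def decode(cache, max_new_tokens):
    """Latency-bound phase: tokens are produced strictly one after another."""
    generated = []
    for _ in range(max_new_tokens):
        # Placeholder for a model forward pass that reads the entire cache.
        next_token = f"tok{len(cache.tokens)}"
        cache.tokens.append(next_token)
        generated.append(next_token)
    return generated

prompt = ["summarize", "this", "contract"]
cache = prefill(prompt)    # would run on the throughput-oriented device
output = decode(cache, 4)  # would run on the low-latency device
print(output)              # ['tok3', 'tok4', 'tok5', 'tok6']
```

Because the decode loop cannot be parallelized across steps, its speed is governed by per-step latency rather than raw throughput, which is the reasoning the article gives for assigning it to a different chip.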

Optical modules are directly soldered onto the chips; the era of copper cables is passing.
Jensen Huang confirmed in his speech that 2026 will be the commercial debut year for silicon photonics. NVIDIA released the world's first Spectrum-X Ethernet switch with CPO (Co-Packaged Optics), which integrates the optical modules directly into the chip package. According to the presentation, this technology, co-developed with TSMC, consumes about 5% of the energy of traditional copper cabling, increases bandwidth density tenfold, and reduces transmission energy consumption by 90%.

In hyperscale AI clusters, interconnect energy consumption and latency have always been bottlenecks. The implementation of CPO means there is now a solution to this problem. Jensen Huang was very direct in his statement: 'We need to do copper, we need to do optics, and we need to do CPO—every one of these requires more production capacity.'
Computational power has been sent into space, eliminating the need for satellites to transmit data back to Earth.

One unexpected announcement in this speech was the “Space-1 Vera Rubin Module,” an orbital data center.
The logic is simple: Satellites capture data in space and send it back to Earth for analysis, which can take several hours round-trip. If computational power is sent into space, satellites can perform analysis and calculations in orbit and only transmit the results back to Earth. The Space-1 Vera Rubin's AI computational power is 25 times that of the H100, allowing large language models and foundational models to run directly in space. This represents a structural change for real-time monitoring and global IoT applications.
Of course, in the vacuum of space there is no surrounding medium to carry heat away by convection or conduction, so cooling relies on radiation alone, which makes heat dissipation an engineering challenge. What NVIDIA can provide is the computational module; the cooling problem has to be solved by aerospace systems engineers.
What will chips look like in 2028? Jensen Huang has given us a sneak preview

As per tradition, Jensen Huang disclosed the next-generation architecture ahead of time, named Feynman after physicist Richard Feynman.
Feynman is manufactured on TSMC's A16 (1.6nm) process and represents the world's first mass-produced AI chip architecture to enter the 1-nanometer era. The GPU will feature customized HBM memory, moving away from the standard HBM specifications. The accompanying Rosa CPU is designed as the orchestration hub for AI agents. Another notable change is the adoption of 3D stacking: the LPU is integrated directly onto the GPU core, drastically shortening data transmission distances. The inference performance of Feynman is projected to be five times that of Blackwell, with energy efficiency 3.2 times that of its predecessor. Jensen Huang confirmed that all seven components of the Feynman platform will be upgraded, with mass production scheduled for 2028.
Data centers are no longer warehouses but factories producing tokens
A noteworthy part of the presentation was Jensen Huang's explanation of 'Token Factory Economics.' He defined future data centers as 'factories producing tokens': no longer storage facilities for files, but assembly lines that process input data into output results.

Based on this definition, AI services can be divided into five commercial tiers: Free Tier, Mid-Tier (approximately $3 per million tokens), Advanced Tier (around $6), High-Speed Tier (about $45), and Ultra-High-Speed Tier (roughly $150). As models grow larger and contexts become longer, AI becomes smarter, but the token generation rate decreases.
Therefore, token throughput per watt directly determines a company's production costs. Jensen Huang calculated that building a 1GW data center, even if left empty, would incur an amortized cost of up to $40 billion over 15 years. Under this cost structure, only by operating the most powerful computer systems can enterprises achieve the lowest token production costs.
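A rough back-of-the-envelope calculation shows how these figures interact. The sketch below takes only the numbers quoted above (the $40 billion, 15-year amortization and the four paid tier prices) and asks how many tokens per second a 1GW facility would have to sell just to cover the cost of the empty building. The assumptions are loud ones: the cost is spread evenly over time, every token is sold at a single tier price, and chips, power, and staff are ignored. None of this comes from the keynote itself.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures quoted in the article.
amortized_cost_usd = 40e9
amortization_years = 15
tier_price_per_million_usd = {
    "Mid-Tier": 3,
    "Advanced Tier": 6,
    "High-Speed Tier": 45,
    "Ultra-High-Speed Tier": 150,
}

# Assumption: the cost is spread evenly across the 15 years.
annual_cost_usd = amortized_cost_usd / amortization_years

def breakeven_tokens_per_second(price_per_million_usd):
    """Tokens/s that must be sold at one price to cover the annual facility cost."""
    tokens_per_year = annual_cost_usd / price_per_million_usd * 1_000_000
    return tokens_per_year / SECONDS_PER_YEAR

for tier, price in tier_price_per_million_usd.items():
    rate = breakeven_tokens_per_second(price)
    print(f"{tier}: about {rate / 1e6:.2f} million tokens per second")
```

Under these assumptions the empty facility alone costs about $2.67 billion a year, which works out to roughly 28 million tokens per second at the $3 tier and fifty times fewer at the $150 tier. That gap is one way to read the claim that throughput per watt sets production cost.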
OpenClaw, like Windows, may usher in a new era

Jensen Huang spent considerable time discussing the recently popular OpenClaw project.
OpenClaw is a personal AI agent capable of invoking large models, accessing tools and file systems, breaking down tasks, and spawning sub-agents. Its star count on GitHub has grown rapidly, surpassing what Linux accumulated over thirty years. Jensen Huang called OpenClaw 'the operating system for intelligent computing': just as Windows ushered in the PC era, OpenClaw might usher in the era of personal agents.

Based on this assessment, NVIDIA has introduced the NemoClaw reference design, which integrates enterprise-grade security, privacy protection, and a policy engine, and can be installed with just two commands. Jensen Huang's prediction is: 'All SaaS companies will disappear.' In the future, most SaaS will evolve into AaaS (Agentic as a Service).
110 robots on show: robotics could become a fifty-trillion-dollar market
This conference showcased 110 robots, covering scenarios such as mobility, industry, healthcare, and scientific research. Jensen Huang’s assessment of the robotics market is that it may reach a scale of fifty trillion US dollars in the future.

NVIDIA has been building its position in this field for a decade, developing three foundational computing platforms: training, simulation, and control. Currently, most robotics companies worldwide use NVIDIA's technology. Autonomous driving is one subset of robotics. NVIDIA announced collaborations with BYD, Geely, Nissan, Hyundai, and other automakers to develop Level 4 autonomous driving based on the DRIVE Hyperion platform. The partnership with Uber is also expanding, with plans to launch an autonomous fleet powered by NVIDIA DRIVE AV software in 28 cities by 2028.

Summary
The focus of AI computing is shifting from training to inference. Over the past two years, three technological shifts (generative AI, reasoning models, and AI agents) have successively taken root, each reshaping the demand structure for computing power. Jensen Huang's trillion-dollar prediction is based on this assessment.
Whether this figure is accurate does not depend on how much computing power NVIDIA's chips can stack, but on a more fundamental question: can AI genuinely enter more practical production scenarios over the next three years, such as writing code, reading contracts, providing customer service, driving cars, and controlling robots?

Image source: NVIDIA
From a product roadmap perspective, NVIDIA’s strategy is clear: Vera Rubin is set to launch this year, Feynman will enter mass production in 2028, silicon photonics and CPO are starting commercial use, and space computing and robotics are advancing simultaneously. From chips to systems, from ground to space, it covers as many technical nodes as possible. However, technology supply is only half the story. The real explosion in demand requires AI to transition from being a 'toy' in the lab to becoming a 'tool' on the production line. Whether this transformation will occur is not found in Jensen Huang’s speech but lies in the actual implementation across various industries over the next three years.
About Pando
Pando is a licensed company providing virtual asset management services. As a participant in the digital asset management field, Pando has obtained Type 1, Type 4, and Type 9 licenses issued by the Hong Kong Securities and Futures Commission and is authorized to provide virtual asset-related services. Additionally, Pando has acquired public offering fund qualifications and has launched two actively managed ETF products and two passively managed virtual asset ETF products. Through strategic positioning, Pando has accumulated extensive experience in digital asset management and compliance, offering diversified investment solutions and attracting numerous investors.
Disclaimer
This content is for reference only. It is neither an invitation nor an offer to buy or sell any securities or other financial instruments. Any information, including facts, opinions, or citations, may be condensed or summarized and is accurate as of the date of writing. Information may change without prior notice, and Pando Limited ('Pando') has no obligation to ensure you are notified of such updates. Investing in products mentioned in this content involves significant risk of loss and may not be suitable for all investors. Valuations may fluctuate, potentially resulting in substantial investment losses. Past performance is not indicative of future results. If an investment is denominated in a currency other than your base currency, exchange rate fluctuations may adversely affect value, price, or income. You should not engage in any investment unless you fully understand the nature of the transaction and the extent of potential losses. If you do not fully understand these risks, you must seek independent advice from your financial advisor. Under no circumstances should this content be interpreted as an express or implied commitment, guarantee, or suggestion by Pando or from Pando that you will profit or limit losses in any way. Investors should note that past performance is not indicative of future results.
Virtual assets are highly speculative and risky investments. Investors should exercise extreme caution when participating in these products. The legal status of virtual assets has not been clearly defined, which may affect the nature and enforceability of investors' rights in such virtual assets. Research reports on virtual assets have not been reviewed by regulatory authorities, and investors cannot benefit from the protection of an investor compensation fund. Virtual assets are not legal tender, and related transactions may be irreversible, meaning losses caused by fraudulent or unintended transactions may not be recoverable. The value of virtual assets stems from market participants' ongoing willingness to exchange them for fiat currency, implying that if the market for a particular virtual asset disappears, its value could be entirely and permanently lost. There is currently no guarantee that virtual assets will continue to be accepted as a means of payment in the future. The volatility and unpredictability of virtual asset prices relative to fiat currencies can lead to significant losses within a short period. Changes in legislation and regulation may also adversely affect the use, storage, transfer, trading, and valuation of virtual assets. Certain virtual asset transactions may only be considered complete once recorded and confirmed by a platform licensed by the Securities and Futures Commission, which may differ from the time the client initiated the transaction. The inherent nature of virtual assets makes them more vulnerable to fraud or cyberattacks. Technical malfunctions could also prevent clients of licensed platforms from executing virtual asset trades.
Phone: +852 3891 3288
Address: Room 1408, Two Exchange Square, 8 Connaught Place, Central, Hong Kong
Email: media@pandofinance.com.hk
Risk Disclaimer: The above content only represents the author's view. It does not represent any position or investment advice of Futu. Futu makes no representation or warranty.
