Titanium Media APP
wrote a column · Apr 18 02:00

I, who can't afford Tokens, have become part of the lower-tier market population in the AI era

(This article was authored by Mirror Studio and published by Titanium Media with authorization)
Text by Mirror Studio, Author Huang Yiting, Editor Wang Shan
In 2026, what will be the most 'luxurious' expense for humanity at work? The answer is not buying a top-of-the-line computer or purchasing a few decent outfits, but rather being able to use the world's most advanced AI tools without restrictions and regardless of cost.
This means you won't have to strain your brain optimizing prompts just to control costs, worrying about the message 'Today's free quota has been used up'; nor will you need to hesitate repeatedly, unwilling to overwork your beloved Claude (a large language model developed by Anthropic, an American AI company), leaving less important tasks to cheaper, lighter models.
AI is certainly useful, but every use comes at a cost. Tokens (the text units that models meter and bill by) have become so expensive to burn through that you can barely afford them. Being meticulous and cautious has become the true state of today's AI 'workhorses'.
This brings to mind two decades ago, during the era of dial-up internet. At that time, bandwidth was scarce and expensive, so developers compressed images and simplified code as much as possible to save on website bandwidth usage, almost never daring to upload videos. Ventures like Tudou, operating in the video space, were rare, with bandwidth consumption from videos becoming a major operational cost for websites.
Yesterday once more.
In the AI industrial chain, computing power flows downward like water. Starting from upstream GPUs (graphics processing units) and data centers, passing through cloud providers and model vendors, it gets encapsulated into API (application programming interface) interfaces, eventually reaching developers and end-users, turning into specific calls and countable Tokens. Though seemingly intangible, each link corresponds to clear costs, such as GPU depreciation, electricity consumption, and high-bandwidth storage, ultimately aggregating into bills.
Now, this pipeline is becoming congested. On one side, demand is exploding, with complex reasoning scenarios like multimodal and Agent (intelligent agents) causing Token consumption to increase a thousandfold. On the other side, supply remains constrained, as GPUs, HBM (high-bandwidth memory), electricity, and data center construction all face physical limits, with GPU utilization still relatively low. Intelligence comes at a cost; although explosive growth has made the unit price of Tokens cheaper, the amount of money needed to invoke them keeps increasing.
Price hikes are cascading down. Upstream, GPUs remain expensive and in short supply amid the computing crunch; midstream, cloud providers are the first to adjust prices. Amazon Web Services, Google Cloud, Baidu Cloud, and Alibaba Cloud have all raised fees for some AI-related services over the past quarter. Model vendors have also ended their subsidy periods, with Tencent and Alibaba halting free public betas and raising API call prices. Notably, Tencent's HunYuan large model saw a maximum price increase of 463%.
The price increases on the model and application sides mean that computing power is no longer an abstract concept confined to competition among giants; it has become a paid lesson, denominated in Tokens, for every ordinary person. It is just like mobile data back in the day, metered in megabytes, when users could easily rack up unexpected charges and have their service suspended.
Jensen Huang recently proposed the concept of 'Token Economics,' suggesting that inference has become the core workload of AI, and Tokens are the new commodity—standardized, measurable, and tradable. Consequently, Tokens have evolved from a technical byproduct of model training into a core production factor driving the digital economy.
In Jensen Huang's view, 'Token' as a commodity has varying levels of quality. Prices per million tokens range from $0 to $150, depending on the tier, from free to top-tier. Tokens requiring low latency and high interactivity (such as real-time conversations and autonomous driving) demand expensive computing power and are priced higher; whereas tokens for high-throughput, offline processing tasks (like large-scale offline inference and batch data processing) are less sensitive to latency and can be produced using cheaper computing power, hence priced lower.
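To put that spread in perspective, here is a minimal sketch of what the same monthly volume of Tokens would cost at different tiers. The tier names and prices are hypothetical examples chosen only to fall within the $0 to $150 per million range cited above; they are not taken from any vendor's price list.

```python
# Hypothetical price ladder within the $0-$150 per million token range cited above.
# Workload names, prices, and the monthly volume are illustrative assumptions.

price_per_million = {
    "free tier / throttled":             0.0,
    "batch offline inference":           0.5,
    "general chat":                      5.0,
    "low-latency real-time interaction": 150.0,
}

monthly_tokens = 20_000_000  # assume a heavy user burning 20M tokens a month
for tier, price in price_per_million.items():
    print(f"{tier:>35}: ${monthly_tokens / 1_000_000 * price:,.2f}/month")
```

Even at these made-up volumes, the gap between batch pricing and latency-sensitive pricing is already a factor of several hundred.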
If tokens have already stratified in value as 'commodities,' what about the people using them? Perhaps in the future, the definition of the 'lower-tier market' population will no longer be limited to whether they can afford physical goods.
"Am I not a valued member?" On the evening of March 11, Su Yu looked at the pop-up window on her computer screen and felt a bit angry. The pop-up reminded her that her Token usage for the week had reached 90% of the limit, and once the limit was exhausted, the use of related models would be suspended until the weekly limit reset.
Su Yu is a doctoral candidate at a certain university and is currently preparing her final dissertation. Over the past three years, Google's Gemini and OpenAI's ChatGPT have been her best companions, and she has been a loyal subscriber to these two 'long-serving AI workers.' In mid-February this year, Anthropic's Claude also joined her toolkit and quickly became one of her most trusted assistants.
‘Claude is incredibly useful, with extremely strong tool-like attributes,’ Su Yu said. She tasked several AI applications with helping her organize and design research frameworks. ChatGPT's responses lacked logical rigor, while Gemini was overly exaggerated and ingratiating. Only Claude, like an objective and professional senior consultant, carefully read through client requirements before delivering a truly usable and inspiring proposal.
After using it for free for over half a month, Su Yu spent approximately 180 RMB to activate a monthly membership for Claude. Compared to Gemini and ChatGPT, Claude's unique feature is that it imposes daily and weekly token consumption limits even for members. This is understandable, as according to LMArena, a globally renowned blind test ranking for large models, as of March 20th, Claude’s main model, Claude-Opus-4-6-thinking, ranked number one worldwide.
But Su Yu had never felt such direct token restrictions before. The first time she triggered Claude's limit mechanism was on a Wednesday: she was cut off midway through a 'grounded theory' analysis, left with a profound sense of academic stagnation. Accustomed to Claude's assistance, she found it difficult to return to her previous way of doing research. She tried working manually, flipping through the original theoretical texts, but her efficiency was extremely low, and she didn't fully trust some of the translated materials. 'In the end, I still had to wait for Claude to come back to double-check everything.' Those four days felt excruciatingly long.
The limitation on Claude's usage made Su Yu exceptionally anxious. On a Tuesday, Su Yu sent a screenshot of Claude’s backend showing that she had already used up 45% of her weekly quota. ‘It’s only been less than two days this week! I’ve been so careful, discussing just one thesis topic per day, and now it’s reached its limit!’ Su Yu was emotionally overwhelmed. Who says AI can’t replace humans? This AI is almost harder to deal with than her advisor.
Su Yu's Claude backend. Photo source: Interviewee
She has developed the habit of checking the backend after asking each question, fearing there might not be enough tokens left. Remembering how she once casually chatted with Teacher Claude and asked it to help her make PowerPoint slides, she couldn’t help but scold herself for wasting resources.
This cautious use of 'effective models' is becoming increasingly common. An entrepreneur in the AI film industry told me that his team uses ByteDance’s AI video model 'Dream' alongside APIs from multiple other model providers. 'The better-performing models are indeed more expensive, so we can only switch between different models to balance costs.'
Not long ago, Dream adjusted the membership points quota downward. On one hand, he felt it was normal, "The C-end has always been subsidized, and now it's just reclaiming a portion." But on the other hand, he worried about his own situation, lamenting, "Now it's even less affordable." The rising cost of AI sometimes directly threatens the lifeline of small ventures.
End users are anxious about Token usage, and model vendors are equally anxious about computing power costs.
Discussing the reasons for the surge in Token usage, academician Wang Jian from the Chinese Academy of Engineering previously drew an analogy with the development of electricity. Early artificial intelligence applications were like "lighting a bulb," consuming limited power. However, new-generation applications represented by OpenClaw (intelligent agents) are more like turning on an "air conditioner," requiring increasingly higher amounts of electricity.
However, Wang Jian emphasized that this growth not only signifies widespread adoption but also implies a reduction in the per-unit cost of Tokens. "If electricity prices don't drop, ordinary people won't be able to afford air conditioners."
But compared to the early days of simple question-and-answer calls, now more tasks are being completed through Agents. The model needs to independently break down problems, call tools, write code, debug, and revise again. A seemingly simple request often corresponds to multiple rounds of reasoning and multiple API calls behind the scenes, leading to an exponential increase in Token consumption. Although the unit price has decreased, the overall computational cost required is much higher.
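A rough sketch of why that happens: if an Agent re-sends its accumulated context on every tool call, billed tokens grow roughly quadratically with the number of rounds. The numbers below are illustrative assumptions, not measurements from any particular model or framework.

```python
# Hypothetical illustration: how multi-round Agent workflows multiply token usage.
# All figures are assumptions for the sake of the example, not real vendor data.

def agent_token_usage(rounds: int, base_context: int, tool_output: int, reply: int) -> int:
    """Estimate total billed tokens when each round re-sends the accumulated context."""
    total = 0
    context = base_context
    for _ in range(rounds):
        total += context + reply          # input context sent + model output generated
        context += tool_output + reply    # tool results and replies grow the next context
    return total

single_call = 2_000 + 500  # one prompt, one answer
agent_run = agent_token_usage(rounds=8, base_context=2_000, tool_output=1_500, reply=500)
print(f"single call: {single_call:,} tokens")
print(f"8-round agent: {agent_run:,} tokens ({agent_run / single_call:.0f}x)")
```

Under these assumptions, eight rounds of tool use cost roughly thirty times as many tokens as a single question-and-answer call, even though the user only asked once.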
"The models have become larger, and inference costs have correspondingly increased. We also hope to bring them back to their normal commercial value. Long-term reliance on low-price competition isn't beneficial to the industry's development, and that's one of our considerations." Zhipu CEO Zhang Peng said. In the past two months, Zhipu has raised prices three times for its GLM series models (large language models developed by Zhipu), with some models' prices approaching those of leading international models.
Another concern Zhang Peng has is that, "the biggest issue we may face in the next 12 months could be computing power. All these technologies, including intelligent agent frameworks, have boosted many people's creativity and efficiency tenfold. But the precondition is that everyone must be able to use them. It shouldn't happen that due to insufficient computing power, I'm left waiting half a day for an Agent to think through a problem without getting an answer."
According to Claude's calculation method, 100 Tokens is approximately equivalent to 75 English words or 50 Chinese characters, and the price of Token output is five times the input price – this is the simplest conversion method. In other words, every response from the AI must be carefully considered, including background thinking, querying, generating, and even the erroneous consumption caused by model hallucinations, all of which are calculated and eventually turned into real bills.
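Those conversions are enough for a back-of-the-envelope bill. The sketch below uses only the ratios quoted above (100 Tokens ≈ 75 English words, output billed at five times the input price); the per-million input price is a placeholder assumption, not Claude's actual rate.

```python
# Rough bill estimator using the conversions quoted above.
# INPUT_PRICE_PER_M is a placeholder; the 5x output multiplier is as cited in the text.

INPUT_PRICE_PER_M = 3.0                        # hypothetical $ per million input tokens
OUTPUT_PRICE_PER_M = INPUT_PRICE_PER_M * 5     # output billed at 5x input, per the article

def words_to_tokens(english_words: int) -> int:
    return round(english_words * 100 / 75)     # 100 tokens ≈ 75 English words

def exchange_cost(prompt_words: int, reply_words: int) -> float:
    input_tokens = words_to_tokens(prompt_words)
    output_tokens = words_to_tokens(reply_words)
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 600-word prompt answered with 1,200 words:
print(f"${exchange_cost(600, 1200):.4f} per exchange")
```

Fractions of a cent per exchange sound trivial, until thousands of exchanges, background reasoning, and hallucinated retries are all metered the same way.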
Lin Zhijia, founder of AGI in the Smart Era, has calculated the numbers. He maintains four 'lobsters,' some deployed locally and others in the cloud. Taking cloud deployment as an example, he purchased a monthly Coding Plan (AI coding subscription service) for about 30 to 40 RMB. With nine days left in March, his Token consumption was still less than 10% of the plan's quota — as a media professional, his demand for Tokens wasn’t particularly high.
But charging based on Token usage doesn’t seem cost-effective. "If I simply have it send me a news update every morning at nine, the Token cost would be around 0.9 RMB per day, totaling over twenty RMB in 30 days, which is almost the same as buying the Coding Plan. There are also losses to consider, and model updates alone might consume Tokens worth three or four RMB."
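Lin's comparison reduces to a simple break-even check. The sketch below plugs in the rough figures he quotes: 0.9 RMB per day for the morning task, a few RMB of overhead for things like model updates, and a 30 to 40 RMB Coding Plan. The exact values are approximations from his account, not billing data.

```python
# Break-even sketch: pay-as-you-go Tokens vs. a monthly Coding Plan,
# using the approximate figures quoted in the text.

daily_token_cost = 0.9   # RMB per day for the morning news task
overhead = 3.5           # RMB, e.g. model updates (the "three or four RMB" mentioned above)
plan_price = 35.0        # RMB, midpoint of the 30-40 RMB Coding Plan

pay_as_you_go = daily_token_cost * 30 + overhead
print(f"pay-as-you-go: {pay_as_you_go:.1f} RMB vs. plan: {plan_price:.1f} RMB")
# At roughly one light task per day, the two approaches already cost about the same.
```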
Balancing between different billing methods has almost become a daily routine for frequent users, and every penny spent on purchasing Tokens ultimately points to the same thing — computing power, and the corresponding depreciation cost of GPUs and data center electricity consumption.
GPUs have become the starting point for everything; the availability of high-end chips determines the system’s upper limit. "Apart from backup machines reserved for some customers, everything else is sold out, not a single card remains," said Liu Hua, Deputy General Manager of UCloud’s Architecture Technology Center.
Beneath the GPUs, data centers, networks, and storage systems need to be built: high-speed interconnects and low-latency transmission are not 'plug-and-play' components. Liu Hua mentioned that the network and storage portions alone could account for about 20% of the total computing power cost.
At the next level are model vendors and API service providers. They deploy large models on this infrastructure and encapsulate them into standardized interfaces for developers to call. In the past two years, these roles have started to overlap: cloud vendors sell computing power as well as provide model APIs, gradually becoming the hub connecting GPUs, models, and developers.
Illustration of how computing power flows. Image source: AI-generated
Computing power percolates down layer by layer like this, with the latest changes occurring at the demand end of the industry. "In the past, most AI was paid for by B2B clients, but now B2C payments are becoming more common," said Lin Zhijia. Models are encapsulated into APIs, entry points are simplified, and usage thresholds are lowered, so individual developers and even ordinary users can tap directly into underlying computing power. "Now you just scroll through social platforms and see that everyone knows how to use it."
There is even a trend toward the retailization of computing power. Around 2024, some cloud vendors began offering GPU 'daily passes,' lightweight cloud hosts, and even 'one-click deployment' trial products. UCloud's 6.9-yuan trial package aimed at 'shrimp farmers,' for example, is essentially a ticket: it bundles the complex environment configuration and computing power scheduling so users can try things out at very low cost. "Many people actually come to 'weed out problems' or try something new," said Liu Hua. "Everyone is a bit anxious, fearing being left behind."
However, lowering the threshold does not mean reducing costs. In Liu Hua's view, "If we compare it to the development stages of the Internet, there is no doubt that the current cost of computing power is still in an early, very expensive stage." It is precisely because of this that developers meticulously calculate their expenses, and platforms dare not easily expand the scale of calls.
Even leading companies are making trade-offs. OpenAI's earlier shutdown of the video generation project Sora was interpreted by many industry insiders as a trade-off between computing power and return on investment: with limited resources, it chose to concentrate on core model capabilities and businesses. Recent adjustments to the AI businesses of major Internet companies such as Alibaba, Tencent, and ByteDance likewise revolve around concentrating computing power resources.
Everyone is starting to realize one thing: the future competition will not be about the scale of computing power but about its utilization rate. The chain reaction set off by the computing power shortage is like a prolonged rainy season in the AI era; no one living through it can avoid getting wet.
Su Yu is currently experimenting with the allocation and scheduling of computing resources.
She categorizes the models by tier: ChatGPT for writing official documents and organizing briefings, Gemini for drawing and handling language details, and Claude reserved for the most crucial work, such as research frameworks, idea design, and long-text analysis. This way, both her efficiency and her wallet go as far as possible.
For instance, she recently processed a batch of interview materials, first letting Claude provide an analytical framework, then handing over this framework to Gemini for initial coding. "I trust Claude more for guiding things, but detailed work can be handed over to cheaper models." If Claude had no quota restrictions, she would even stop using Gemini.
Of course, this is not an advertisement for Claude, Su Yu simply believes that this application suits her needs better. Useful models are becoming scarce, and scarce resources will only be utilized in the most critical areas.
To save further, many users, like Su Yu, have started cutting costs on details.
On social platforms, there was a trend of conversing with AI using classical Chinese, as shorter word counts meant fewer tokens. Some also questioned whether following the trend to say “hello” or “thank you” to AI could be considered an unnecessary waste of resources. After all, AI doesn’t require emotional value.
In fact, much of the waste is beyond users' control; sometimes it stems from how the models are accessed and orchestrated.
Not long ago, Luo Fuli, head of the MiMo large model team, mentioned, "I can't accurately calculate the losses caused by third-party harness integration, but I've closely observed OpenClaw's context management, which is terrible. In a single user query, it triggers multiple rounds of low-value tool calls, each sent as an independent API request, with each request carrying a context window often exceeding 100K tokens. The actual number of requests is several times that of Claude Code's native framework. Converted into API pricing, the real cost is probably dozens of times the subscription price."
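Luo Fuli's complaint can be restated as simple arithmetic: if a single user query fans out into many independent API requests, each carrying a context of over 100K Tokens, the billed input scales with the number of requests rather than with what the user actually asked. The request counts and per-million price below are illustrative assumptions, not measurements of OpenClaw or Claude Code.

```python
# Sketch of how repeated full-context API requests inflate cost,
# following the pattern described in the quote. All prices and counts are placeholders.

CONTEXT_TOKENS = 100_000     # context window carried by each request, per the quote
INPUT_PRICE_PER_M = 3.0      # hypothetical $ per million input tokens

def harness_cost(requests_per_query: int) -> float:
    """Billed input cost for one user query fanned out into N full-context requests."""
    return requests_per_query * CONTEXT_TOKENS * INPUT_PRICE_PER_M / 1_000_000

native = harness_cost(3)        # assume a lean native framework needs ~3 requests
third_party = harness_cost(15)  # assume a chatty third-party harness issues ~15 requests
print(f"native: ${native:.2f}, third-party harness: ${third_party:.2f} ({third_party/native:.0f}x)")
```

The multiplier comes almost entirely from re-sending the same context, which is why "terrible context management" translates directly into a bill many times the subscription price.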
Returning to the usage issue: while users proactively save tokens, platforms likewise dare not fully expand their user bases. This cost-driven 'self-restraint' reflects a contradiction that OpenAI illustrates: in the first half of 2025 it generated $4.3 billion in revenue while posting a net loss of $13.5 billion over the same period, roughly three dollars lost for every dollar earned. The biggest contributor to the loss is spending on computing power.
At present, computing power is no longer just about availability, but rather whether it can be continuously used and to what extent. When AI becomes sufficiently usable, people will reorganize their work methods around it; when tokens become expensive and limited, this new way of organizing will also be forced to shrink.
If, in the future, computing power cannot truly become as widespread as electricity, then AI will inevitably cause divisions, and cognitive gaps between people will widen further. For instance, Su Yu does not plan to fully share her AI usage methods with those around her. How to interact with Teacher Claude and what kind of data to feed are her little secrets, and in the short term, they are also her competitive edge.
If colleagues ask her to recommend good models, she strongly recommends Gemini and ChatGPT. "Of course, DeepSeek is also a good choice," Su Yu said with a mischievous wink.
In today's world where 'one-person companies (OPC)' and 'super individuals' are gradually becoming popular, such 'little tricks' are not uncommon. When the usability of AI corresponds to countable tokens, what truly sets people apart is how they use it.
(Su Yu mentioned in the article is a pseudonym)
Cover image source: The Editor's Department of Cosmic Exploration
Reference materials
Emergent Intelligence: Yang Zhilin/Zhang Peng/Xia Lixue/Luo Fuli/Huang Chao Discuss Lobsters and 'Token Economics'
Every Day Economic News: AI Drives Massive Token Consumption, Memory Hardware Shortage; Rental Surge for Computing Power Leads Operators to Invest Heavily in Liquid-Cooled Servers
Zhipu Zhang Peng: When Models Are Strong Enough, APIs Themselves Become the Best Business Model
Interface News: Zhipu Stock Price Hits Record High, New Generation Model Raises Prices by Another 10%
DeepTide TechFlow: Token Goes Global, Selling China’s Electricity to the World
Silicon Star Pro: Luo Fuli: Everyone Wake Up, It’s Time to End the Token Frenzy
[Copyright Statement] All content copyright belongs to Mirror Studio. Without written permission, no part may be reproduced, excerpted, or used in any other form, unless otherwise stated.