Embodied intelligence, a vision that once lived only in laboratories, is arriving in the real world faster than almost anyone expected.
Morgan Stanley predicts that by 2050 the global market for embodied intelligence will reach US$5 trillion. For the Chinese market, the Development Research Center of the State Council estimates that it will exceed one trillion yuan by 2035.

Yet on this trillion-dollar track, one problem is emerging as the biggest hidden risk: a data desert.
In the physical AI domain, real-robot interaction data currently amounts to only about one hundred-thousandth of the training data available to large language models. The shortfall comes not from a lack of demand but from extremely scarce supply: inconsistent formats, uneven quality, and prohibitively high collection costs. Many embodied intelligence teams are stuck at the same threshold: without enough high-quality data, model training is little more than empty talk.
This is a core bottleneck that the whole industry recognizes, yet no one has systematically addressed it at the infrastructure level.
Until Mifeng Technology appeared.
One-stop physical AI data services: filling the industry gap
April 16, 2026, Zhangjiang Science Hall, Shanghai.
Under the theme 'Bees Travel the World, Data Drives Intelligence,' Mifeng Technology made its official debut. The company positions itself as a 'globally leading one-stop physical AI data service platform.' It aims to be not just a data supplier but the infrastructure for embodied intelligence data.

This is where the difference lies.
Traditional data companies sell datasets as one-time deliverables. Mifeng wants to make high-quality physical AI data 'as readily available as water and electricity': a supply system that covers real-robot teleoperation, robot-free ('non-body') collection, and simulation data across all paradigms, integrating hardware, software, platforms, and operations along the entire chain to deliver systematic, standardized, and scalable data supply.
This is a precise response to three major pain points in the industry:
Data scarcity: Real-robot interaction data is extremely scarce, and current volumes cannot support large-scale model training;
Inconsistent standards: Formats and annotation protocols are not unified, and large amounts of 'dirty data' keep reuse rates very low;
Supply-demand mismatch: Demand-side players cannot access high-quality data, while supply-side players hold resources but struggle to monetize them, leaving the market severely fragmented.
Yao Maoqing, Chairman and CEO of Mifeng Technology, made a key statement at the launch: 2026 will be year one for embodied intelligence data. The industry has moved past the technical validation phase, and competition from here on will center on the efficiency of data collection and transformation.

This assessment is backed by solid numbers.
The AgiBot World dataset, previously open-sourced by Mifeng Technology, has been downloaded more than 1.1 million times on Hugging Face, has earned 29,000 GitHub stars, and has been adopted by leading institutions worldwide, including MIT, Tsinghua, Berkeley, and Harvard. More notably, Mifeng has become the core data provider for NVIDIA's GR00T series models, supplying 80% of their real-robot pre-training data and making it the world's largest provider of real-robot data for embodied systems.
This is not a brand-new startup but a team that already holds a central position in the industry and is now pursuing something even bigger.
MEgo Series: Redefining the Starting Point for Data Collection
The most anticipated part of the launch was the global debut of the MEgo series, hardware for non-body (robot-free) data collection.
In the past, physical AI data collection relied heavily on robot bodies. It required customized robotic arms, specialized sensors, and fixed workstations, carried high equipment costs and long deployment cycles, and was constrained by each robot's form factor, so it fundamentally could not cover the complex, dynamic, and unstructured interactions that occur across real-world scenes.
The result is low data collection efficiency, narrow application scenarios, and poor reusability.

The logic behind the MEgo series is intriguing: let data follow humans. The name MEgo is itself a declaration: 'ME' stands for human-centric, 'go' for portable, lightweight, and boundless, and the combination of M (Mifeng) and Ego (first-person perspective) signals full-chain data governance capability.
Collect data wherever you go.

MEgo Gripper is the core data collection terminal of this system. Weighing just 480 grams, it uses millimeter-level trajectory reconstruction to recover operating trajectories with 1 mm accuracy. Sub-millisecond global time synchronization keeps multimodal data such as vision, touch, and pose precisely aligned.
A 200° fisheye camera paired with a three-dimensional tactile array produces comprehensive interaction data, including vision, depth, IMU readings, motion trajectories, multidimensional tactile feedback, and gripper state. High-speed Wi-Fi 6 transmission is designed to ensure both efficiency and precision.

MEgo View is the industry's first all-scenario, all-perspective, multimodal spatial perception data collection terminal. Its core design is a dual-perspective scheme combining 'over 300° panoramic awareness + close-up wrist interaction views': the head-mounted camera covers an ultra-wide field of more than 300°, while the wrist camera captures fine details of hand operations, with every channel streaming 1080p video at 60 fps.
Sub-millisecond wireless time synchronization and hardware-level precision triggering align multi-sensor data in both time and space. This addresses a long-standing industry challenge: unifying multi-perspective physical-world data in space and time.

More significantly, the MEgo series shares a natively isomorphic design with the Zhiyuan Elf G2 Air robot. Because MEgo's sensors and grippers match the robot's, models trained on MEgo data can be deployed seamlessly to the G2 Air, quickly enabling autonomous robotic operation.
Collected data becomes training data, and training data becomes deployment data, with no information lost across the three steps.
Another product, MEgo Engine, is a data governance engine that bridges the 'last mile.' It automates the full process from raw data to training data: multi-source time alignment and intelligent filtering; 6D trajectory reconstruction and spatial perception reconstruction; quality verification through multi-embodiment playback and intelligent scoring; and automated labeling that is more than 10 times as efficient as traditional manual annotation. Collected data is uploaded with one click, and the engine outputs standardized datasets ready for model training.
Mifeng has, for the first time, connected the full pipeline of data collection, governance, training, and deployment.
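To make 'multi-source time alignment' more concrete, here is a minimal Python sketch of one common approach: pairing each frame of a reference stream with the nearest sample of another stream by timestamp, and discarding pairs that fall outside a tolerance. The helper name `align_to_reference`, the microsecond timestamps, the 60 fps camera and 1 kHz tactile rates, and the 500 µs tolerance are all illustrative assumptions; this is not MEgo Engine's actual method, which is not described here.

```python
from bisect import bisect_left
from typing import List, Optional


def align_to_reference(
    ref_ts: List[int], other_ts: List[int], tolerance_us: int
) -> List[Optional[int]]:
    """Hypothetical helper: for each reference timestamp, return the index of the
    nearest timestamp in other_ts (sorted ascending), or None if the nearest
    sample is more than tolerance_us microseconds away."""
    matches: List[Optional[int]] = []
    for t in ref_ts:
        # Index of the first sample at or after t.
        i = bisect_left(other_ts, t)
        best: Optional[int] = None
        best_dist = tolerance_us + 1  # anything farther than the tolerance is rejected
        # The nearest sample is either just before t or at/after t.
        for j in (i - 1, i):
            if 0 <= j < len(other_ts):
                dist = abs(other_ts[j] - t)
                if dist < best_dist:  # strict: ties keep the earlier sample
                    best, best_dist = j, dist
        matches.append(best)
    return matches


if __name__ == "__main__":
    # Assumed rates: camera at 60 fps (~16,667 us period), tactile sensor at 1 kHz.
    camera_ts = [0, 16_667, 33_333, 100_000]
    tactile_ts = list(range(0, 40_000, 1_000))
    print(align_to_reference(camera_ts, tactile_ts, tolerance_us=500))
    # prints [0, 17, 33, None]: the last frame has no tactile sample within 500 us
```

Nearest-neighbour matching is the simplest version of the idea; a real pipeline would likely interpolate continuous signals such as IMU or pose instead. Even so, the sketch shows why sub-millisecond clock synchronization matters: if device clocks drift apart by more than the tolerance, frames stop finding partners.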

Hive Co-creation Initiative: the start of an ecosystem, or the restructuring of an industry?
Technology and products are only half of the story. The other half is the ecosystem.
At the launch event, Mifeng Technology, together with Shanghai Electric Technology Group, the National Data Standards Committee, and the Ministry of Industry and Information Technology's CCID Research Institute, launched the 'Hive Data Co-Creation Initiative' globally. Dozens of domestic and overseas institutions became the first to respond, including the Beijing Humanoid Robotics Innovation Center, Shanghai National Land Center, Lingchu Intelligence, Pasini Perception Technology, Daxiao Robotics, Wujie Intelligent Aviation, Qingzhi Jiachuang, Aio Intelligence, and Liulan Digital Intelligence.

The goal of this initiative is to break down data silos, unify data standards, connect global supply and demand, and create an open and efficient physical AI data circulation network.
The participation of the National Data Standards Committee and the Ministry of Industry and Information Technology's CCID Research Institute gives the initiative a voice in standard setting. Format protocols, quality benchmarks, and delivery standards for physical AI data may eventually be systematically established through it.
On the same day, Mifeng signed strategic agreements with JD.com Cloud, Baidu Cloud, Alibaba Cloud, Liepin, Guizhou Big Data Group, and Zhangjiang Group, covering data ecosystems, scenario collaboration, computing power support, and talent development.
These partnerships suggest that Mifeng's data services are becoming a complement to China's major cloud computing infrastructure. Where there is data, there is computing power; where there is computing power, the model-training loop can close.
Zhu Zongyao, Director of the Guizhou Provincial Big Data Development Administration, spoke at the launch, recognizing Mifeng as a benchmark in data standardization and ecosystem building. This detail is noteworthy: Guizhou is one of China's most important policy hubs for the big data industry, and government-level recognition of the Mifeng model suggests that this approach to building physical AI data infrastructure may attract further policy support.
Mifeng has set a target for 2030: billion-hour-level data production capacity, and co-building the world's largest physical AI data ecosystem.

The figure sounds enormous today, but in the context of the industry, embodied intelligence needs a data foundation of this magnitude to move from the lab into real-world applications: annual production capacity in the tens of millions of hours by 2026, rising to billions of hours by 2030.
At the roundtable that day, Zhu Zheng, co-founder and chief scientist of Jiajia Vision; Xie Chen, CEO of Lightwheel Intelligence; Fan Haoqiang, co-founder of Original Force; Yao Guocai, head of embodied data at Zhiyuan Research Institute; and Zhang Minying, senior algorithm expert at Alibaba Cloud, reached a shared view: the core competition in embodied intelligence lies in the efficiency of data collection and transformation. They expect the industry's total effective data volume to exceed tens of millions of hours by the end of 2026, laying a solid foundation for deploying embodied intelligence at scale.
This marks the first time the industry has begun framing the discussion at the level of 'data infrastructure.'
In conclusion
In the history of AI development, every leap from laboratory to industry has required the establishment of a critical infrastructure layer.
In the era of large language models, this infrastructure was the massive crawling and cleaning system for internet text data. In the era of embodied intelligence, this infrastructure is the collection, governance, and circulation system for real-world interactive data from physical devices.
Mifeng Technology is attempting to build an industry-level solution for this layer.
The MEgo series has sharply lowered the barrier to data collection, the Hive Co-creation Initiative has extended the ecosystem globally, and the one-stop platform has streamlined the entire process from raw data to training data.
This is not just Mifeng Technology's story; it also marks the beginning of a systematic build-out of industry infrastructure. Once such infrastructure is in place, the competitive moat it creates is often very deep.
Risk Disclaimer: The above content only represents the author's view. It does not represent any position or investment advice of Futu. Futu makes no representation or warranty.
