English
Back
Open Account
半导体行业观察
wrote a column · Apr 17 17:55

Single-channel 400G is ready! Full-stack interconnect company ushers in the era of 800G AI super NIC

Recently, QiYi Moore, an AI full-stack interconnect company, announced that it has successfully built an 800G AI super NIC (SNIC) platform architecture.In addition to the high bandwidth of 800Gb/s and ultra-low latency in the sub-microsecond range, its key technologies also cover enhanced RoCE v2 mechanisms for AI networks, including packet spraying, multi-path transmission, efficient retransmission, and advanced programmable congestion control. The AI SNIC ASIC designed based on this self-developed platform architecture has recently completed chip return and successfully passed silicon validation of the core RDMA architecture, with stable single-channel throughput at 400Gbps and key latency around 1 microsecond. Against the backdrop where domestic high-performance NIC public products and industry narratives still mostly focus on the 100/200G RDMA ASIC engine stage,Singularity Moore is achieving a substantial breakthrough with a single-channel 400G RDMA ASIC engine,unveiling the prelude for domestic AI super NICs to rapidly advance towards 800G ASIC.
Recently, QiYi Moore, an AI full-stack interconnect company, announced that it has successfully built an 800G AI super NIC (SNIC) platform architecture.In addition to the high bandwidth of 800Gb/s and ultra-low latency in sub-microseconds, its key technologies also cover enhanced RoCE v2 mechanisms for AI networks, including packet spraying, multi-path transmission, efficient retransmission, and advanced programmable congestion control. The AI SNIC ASIC designed based on this self-developed platform architecture has recently completed chip return and successfully passed silicon validation of the core RDMA architecture, with single-channel throughput stably at 400Gbps and key latency around 1 microsecond. Against the backdrop where most domestic high-performance NIC public products and industry narratives still focus on the 100/200G RDMA ASIC engine stage,QiYi Moore is achieving a substantial breakthrough with the single-channel 400G RDMA ASIC engine,opening the curtain for domestic AI super NICs to rapidly advance towards 800G ASIC. The ASIC designed based on the 800G AI SNIC platform architecture has successfully passed silicon validation of the RDMA architecture, with single-channel throughput stably at 400Gbps. Why an Ethernet-based RDMA AI native NIC? Before the rise of AI networks, the mainstream classification of smart NICs (SmartNIC) in the industry was not focused on AI training and inference scenarios but rather differentiated based on chip architecture and offload capability depth...
The ASIC designed for the 800G AI SNIC platform architecture has successfully passed silicon validation of the RDMA architecture, with stable single-channel throughput at 400Gbps.
Why Ethernet-based RDMA AI Native NIC?
Before the rise of AI networks, the mainstream classification of SmartNICs in the industry was not centered around AI training and inference scenarios but rather differentiated based on chip architecture and offload capability depth. The core focus was on how much infrastructure work they could offload from the CPU, including inter-VM network switching, storage, encryption/decryption, security and telemetry, compression and decompression, and other hardware offload functions. Typical products like Data Processing Unit (DPU) NICs were not specifically optimized for large model cluster communication.
In the large model era, as AI training scales from thousands to tens of thousands of GPUs and inference moves from single-machine deployment to large-scale distributed services, scale-out networks are becoming critical infrastructure that determines GPU utilization, collective communication efficiency, and per-token cost. Whether it's All-Reduce, Reduce-Scatter, or All-to-All and other typical collective communications, they all require networks to have higher bandwidth density, lower latency, lower tail latency, and faster congestion response capabilities. In this context, the high-performance RDMA route based on Ethernet becomes increasingly clear: it inherits the openness, mature deployment, and cost-controllable advantages of the Ethernet ecosystem while gradually gaining the ability to support large-scale AI clusters through RoCE/RDMA-specific optimizations for AI networks.
This technological trend is particularly evident in the technical demand specifications of leading domestic cloud service providers (CSPs) and the emergence of the Ultra Ethernet Consortium (UEC) abroad—to deliver an open, high-performance Ethernet architecture for AI and HPC scenarios, focusing on solving issues faced by traditional Ethernet in large-scale training environments such as multi-path transmission, rapid congestion response, tail latency control, ease of configuration, and scalability. In other words, Ethernet is no longer just a representative of 'general-purpose networking' but is being redefined in the AI era as one of the main channels for large-scale cluster interconnection.
Recently, QiYi Moore, an AI full-stack interconnect company, announced that it has successfully built an 800G AI super NIC (SNIC) platform architecture.In addition to the high bandwidth of 800Gb/s and ultra-low latency in sub-microseconds, its key technologies also cover enhanced RoCE v2 mechanisms for AI networks, including packet spraying, multi-path transmission, efficient retransmission, and advanced programmable congestion control. The AI SNIC ASIC designed based on this self-developed platform architecture has recently completed chip return and successfully passed silicon validation of the core RDMA architecture, with single-channel throughput stably at 400Gbps and key latency around 1 microsecond. Against the backdrop where most domestic high-performance NIC public products and industry narratives still focus on the 100/200G RDMA ASIC engine stage,QiYi Moore is achieving a substantial breakthrough with the single-channel 400G RDMA ASIC engine,opening the curtain for domestic AI super NICs to rapidly advance towards 800G ASIC. The ASIC designed based on the 800G AI SNIC platform architecture has successfully passed silicon validation of the RDMA architecture, with single-channel throughput stably at 400Gbps. Why an Ethernet-based RDMA AI native NIC? Before the rise of AI networks, the mainstream classification of smart NICs (SmartNIC) in the industry was not focused on AI training and inference scenarios but rather differentiated based on chip architecture and offload capability depth...
800G AI SNIC and Function Description
The Singularity Moore AI Super NIC technology roadmap aligns perfectly with the requirements of leading cloud service providers (CSPs) and is highly convergent with UEC standards.Our established 800G platform architecture capability can effectively support up to 800Gb/s RDMA throughput, with the capacity to handle millions of messages and millions of queues (QP). This greatly enhances the RoCE v2 protocol stack by adding features such as packet spraying, out-of-order reassembly, efficient retransmission, and advanced programmable congestion control—features that are urgently needed for AI networks.The Ethernet technology roadmap ensures product openness, interoperability, and ecosystem compatibility, laying the foundation for entering leading cloud service providers and smoothly integrating into the super Ethernet ecosystem in the future, enabling cross-vendor collaboration.” said Ye Dong, VP of Network Technology at SingularMole.
Ye Dong has over 20 years of experience in network interconnect system architecture design and possesses extensive expertise in AI network protocols, RDMA, virtualization, and software protocol stacks. He worked at Intel (China) for many years, serving as the Technical Director of Intel’s Network Interconnect Products Division, responsible for localizing R&D and deployment of Intel Ethernet, smart NICs, P4 programmable switch chips, and Intel/Google IPU-related system architecture products. He initiated and led the foundational technical solutions for large-scale deployments by multiple major cloud service providers.
In response to the stringent demand for high bandwidth and low latency data transmission in network-intensive large-scale parallel computing, NVIDIA has not only launched a super NIC (SNIC) based on IB networks but also introduced an SNIC for Ethernet, aiming to provide robust network support for AI factories and cloud data centers. NVIDIA's definition of the AI super NIC is clear: it is a new type of network accelerator specifically designed for network-intensive, massively distributed AI computing workloads. Its value goes beyond merely transmitting data packets; it truly transforms communication in multi-GPU, multi-node environments into an acceleration engine for unleashing computing power.
In comparison to NVIDIA ConnectX-8/9's leadership, the 800G AI NIC transitions from being just an 'interface' to becoming a 'central hub.'
NVIDIA publicly stated that ConnectX-8 is the industry's first super NIC (SNIC) to integrate PCIe Gen6-level switching capabilities with ultra-high-speed network processing into a single device. It serves not only AI, HPC, and hyperscale cloud data center scenarios but also consolidates tasks previously requiring separate PCIe switches and independent NICs by incorporating a 48-lane PCIe Gen6 switch. This represents a unique and innovative technological approach.
Traditional 1U servers with eight PCIe GPU cards commonly adopt a tree topology, where the CPU serves as the root node and the PCIe switch connects to GPUs downstream. While this architecture was relatively mature during the general computing era, its structural performance bottlenecks have become increasingly evident in large model training scenarios: GPU-to-GPU communication often needs to go through the PCIe switch, and cross-socket paths may be constrained by host links, leading to increased latency and reduced bandwidth utilization. It struggles to meet the low-latency and high-throughput requirements of collective communications like All-Reduce, Reduce-Scatter, and All-to-All. Therefore, ConnectX-8 integrates PCIe Gen6 switching capabilities with high-speed networking into a single device, replacing traditional discrete PCIe switches, optimizing GPU-to-GPU and GPU-to-NIC data paths while reducing system complexity, power consumption, and total cost of ownership.
Recently, QiYi Moore, an AI full-stack interconnect company, announced that it has successfully built an 800G AI super NIC (SNIC) platform architecture.In addition to the high bandwidth of 800Gb/s and ultra-low latency in sub-microseconds, its key technologies also cover enhanced RoCE v2 mechanisms for AI networks, including packet spraying, multi-path transmission, efficient retransmission, and advanced programmable congestion control. The AI SNIC ASIC designed based on this self-developed platform architecture has recently completed chip return and successfully passed silicon validation of the core RDMA architecture, with single-channel throughput stably at 400Gbps and key latency around 1 microsecond. Against the backdrop where most domestic high-performance NIC public products and industry narratives still focus on the 100/200G RDMA ASIC engine stage,QiYi Moore is achieving a substantial breakthrough with the single-channel 400G RDMA ASIC engine,opening the curtain for domestic AI super NICs to rapidly advance towards 800G ASIC. The ASIC designed based on the 800G AI SNIC platform architecture has successfully passed silicon validation of the RDMA architecture, with single-channel throughput stably at 400Gbps. Why an Ethernet-based RDMA AI native NIC? Before the rise of AI networks, the mainstream classification of smart NICs (SmartNIC) in the industry was not focused on AI training and inference scenarios but rather differentiated based on chip architecture and offload capability depth...
In terms of port configurations, ConnectX-8 has clearly entered the 800G era. According to NVIDIA's official information, the Ethernet version of ConnectX-8 comes in a 2×400GbE form factor, offering not only 800G-level total bandwidth but also more flexible dual-port networking options, making it more suitable for future large-scale AI clusters’ demands for redundancy, traffic diversion, elastic scaling, and complex topology deployments. In terms of shipping configurations, ConnectX-8 integrates high-speed networking capabilities with PCIe Gen6 switch capabilities into a single device, delivered in a unified integrated design form factor.
SingularMole creates a milestone: Domestic AI SNIC moves from 'usable' to 'high-performance.'
It is against this backdrop of global technological evolution thatMorethan Moore's single-channel 400G engine, built on its self-developed 800G AI SNIC ASIC architecture, has completed the core RDMA ASIC tape-out verification, making it particularly significant.Unlike FPGA solutions that are more suitable for early verification and rapid iteration, the ASIC route determines whether a product truly has the performance ceiling, power efficiency, board integration level, and mass production consistency required for large-scale AI training and inference clusters. Morethan Moore is the first to achieve a single-channel 400G RDMA ASIC engine within the domestic industry, with stable throughput at 400Gbps, not only filling the gap in high-bandwidth super NIC chips domestically but also fully demonstrating its mastery of core design capabilities for 2×400G and even higher-rate products, laying a solid technical foundation for the next-generation 800G NIC.
Morethan Moore's latest 800G SNIC aligns comprehensively with NVIDIA's ConnectX-8/9 architectural direction, adopting an integrated design concept consistent with international advanced solutions that combines 'high-speed network processing + PCIe Switch capability.' In terms of product form, it supports 2×400G Ethernet ports total bandwidth, while architecturally optimizing interconnectivity within AI servers and enabling high-speed communication between nodes.This means that the company's 800G product layout is not just about upgrading bandwidth specifications but also about securing a leading position in the future form of high-performance AI cluster interconnection: by integrating PCIe Switch capabilities, reconstructing data paths from GPU to GPU and GPU to NIC, enhancing collective communication efficiency, reducing system complexity, and strengthening the ability to define whole machine platforms and cluster solutions.
Recently, QiYi Moore, an AI full-stack interconnect company, announced that it has successfully built an 800G AI super NIC (SNIC) platform architecture.In addition to the high bandwidth of 800Gb/s and ultra-low latency in sub-microseconds, its key technologies also cover enhanced RoCE v2 mechanisms for AI networks, including packet spraying, multi-path transmission, efficient retransmission, and advanced programmable congestion control. The AI SNIC ASIC designed based on this self-developed platform architecture has recently completed chip return and successfully passed silicon validation of the core RDMA architecture, with single-channel throughput stably at 400Gbps and key latency around 1 microsecond. Against the backdrop where most domestic high-performance NIC public products and industry narratives still focus on the 100/200G RDMA ASIC engine stage,QiYi Moore is achieving a substantial breakthrough with the single-channel 400G RDMA ASIC engine,opening the curtain for domestic AI super NICs to rapidly advance towards 800G ASIC. The ASIC designed based on the 800G AI SNIC platform architecture has successfully passed silicon validation of the RDMA architecture, with single-channel throughput stably at 400Gbps. Why an Ethernet-based RDMA AI native NIC? Before the rise of AI networks, the mainstream classification of smart NICs (SmartNIC) in the industry was not focused on AI training and inference scenarios but rather differentiated based on chip architecture and offload capability depth...
Morethan Moore's VP of Networking Technology, Ye Dong, added that Morethan Moore's latest 800G super NIC is planned for mass production within the year, potentially becoming a major breakthrough in domestication for 800G AI super network chips, system integration solutions, and platform-level architectural capabilities, further enhancing Chinese manufacturers' product definition rights, solution coordination rights, and industrial discourse power in future AI high-performance cluster infrastructures.
The window for localization has opened, and market potential is accelerating release.
From the perspective of industrial space and policy environment, the track Morethan Moore has entered is not a niche one but a rapidly expanding core infrastructure market. Public research shows that the market size for high-performance AI NICs at the ConnectX-7 level alone has exceeded tens of billions of yuan and is still growing continuously. This means that the high-performance AI NIC track represented by ConnectX-7/ConnectX-8 corresponds to a market opportunity worth hundreds of billions of yuan, which is still amplifying.
At the same time, the national strategic demand for autonomous controllable intelligent computing foundations, high-speed interconnects, and localized clusters is continuously strengthening. The National Development and Reform Commission and other departments have explicitly stated the need to accelerate the deployment of intelligent lossless networks, 400G/800G, and other advanced technologies, build a high-speed ubiquitous and secure reliable nationwide integrated computing power network, and strengthen the supply capacity of independent innovation technology.
Morethan Moore is a full-stack solution provider deeply focused on AI interconnects for many years, using networking and chiplet technology as its underlying feature. In addition to AI NIC chips for Scale Out scenarios, the company also provides G2G IO interconnect chiplets for Scale Up supernodes and has already achieved application in the industry. This technological gene differentiates it from other chip manufacturers, building a competitive barrier and granting it greater flexibility, broader development space, and deeper technical depth in next-generation high-performance cluster interconnects.
Risk Disclaimer: The above content only represents the author's view. It does not represent any position or investment advice of Futu. Futu makes no representation or warranty.Read more
Thumbs Up
2
7727 Views
Report
Comments
Write a Comment...
2