HAMi v2.9.0 officially released: Paradigm drives heterogeneous computing scheduling to production grade

Recently, the CNCF Sandbox project HAMi officially released version v2.9.0. As the leader and core contributor of HAMi, Paradigm (PHANCY, 06682.HK) contributed several key capabilities to this release, focused on device ecosystem expansion, CDI standardization, DRA adaptation, resource quota checks, and production stability fixes, accelerating HAMi's adoption as enterprise-grade AI infrastructure. For enterprises, this means they no longer need to repeatedly debug and verify in order to use heterogeneous computing power effectively, manage it well, and keep it running stably.

HAMi, incubated and led by Paradigm, has been continuously enhancing heterogeneous device virtualization, Kubernetes-native standards, and scheduling ecosystems since v2.8, evolving from a 'GPU sharing tool' into an infrastructure platform for unified management and scheduling of heterogeneous computing power.

01 Overview of Paradigm Contributions

In version v2.9.0, the main contributions from Paradigm include:

- Added support for Hanbo devices: Expanding coverage of domestic GPU devices and enriching the heterogeneous computing power management ecosystem.

- Enhanced Volcano vGPU device plugin: Adapted to the nvidia-device-plugin API and added support for CDI mode, promoting standardized device injection.

- HAMi k8s-dra-driver adaptation upgrade: Followed up on new features in v25.12.0, continuously coordinating with the Kubernetes DRA ecosystem evolution.

- Webhook adds resource quota checks: Validates resource quotas at the Pod submission stage, improving scheduling efficiency and production controllability.

- Multiple production stability optimizations and fixes: Optimized the HAMi-Core compilation scripts and fixed scheduler multi-replica upgrade failures, NVIDIA MIG allocation failures in CDI mode, and GPU Device Plugin filter failures.

02 From Device Expansion to Production Stability: Filling Gaps in Enterprise-Level Critical Capabilities

In enterprise AI infrastructure, the coexistence of GPUs and NPUs from multiple vendors has become commonplace. HAMi v2.9.0 added support for Hanbo devices, extending the heterogeneous device ecosystem to more domestic GPU scenarios. Paradigm led this work, driving the integration of domestic computing power into a unified scheduling system.

Meanwhile, Paradigm contributed to CDI mode support in the Volcano vGPU device plugin, which makes device injection more declarative and standardized and reduces coupling between plugins and container runtimes. This has significant engineering value for upgrades, scaling, mixed deployments, and cross-platform deployments in complex AI clusters.
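To illustrate what "declarative" device injection looks like, the following is a minimal sketch of a CDI (Container Device Interface) spec built in Python. The top-level fields (`cdiVersion`, `kind`, `devices`, `containerEdits`, `deviceNodes`) follow the public CDI specification; the vendor `example.com`, the device name `0`, the device path, and both helper functions are hypothetical and are not taken from HAMi or Volcano.

```python
import json


def qualified_device_name(kind: str, name: str) -> str:
    """Return the CDI fully qualified device name: '<vendor>/<class>=<name>'."""
    return f"{kind}={name}"


def build_cdi_spec(kind: str, device_nodes: dict[str, list[str]]) -> dict:
    """Build a minimal CDI spec that maps each device name to its device nodes.

    The runtime applies the listed containerEdits when a container requests
    the device by name, so the plugin does not need runtime-specific hooks.
    """
    return {
        "cdiVersion": "0.6.0",
        "kind": kind,
        "devices": [
            {
                "name": name,
                "containerEdits": {
                    "deviceNodes": [{"path": path} for path in paths],
                },
            }
            for name, paths in device_nodes.items()
        ],
    }


# Hypothetical vendor and device path, for illustration only.
spec = build_cdi_spec("example.com/gpu", {"0": ["/dev/example-gpu0"]})
print(json.dumps(spec, indent=2))
print(qualified_device_name(spec["kind"], "0"))
```

In this model a device plugin only has to hand the runtime a qualified name such as the one printed above; the edits needed to expose the device live in the spec file rather than in plugin code.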

In the Kubernetes-native direction, Paradigm participated in the adaptation upgrade of HAMi k8s-dra-driver, continuing to promote synergy between HAMi and the DRA ecosystem. As a next-generation mechanism for declaring and allocating device resources, DRA can express fine-grained requirements such as GPU memory, computing power, and topology, and represents an important direction for cloud-native heterogeneous computing power scheduling.
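The idea of matching a request against device properties can be sketched in a few lines of Python. This is only a conceptual illustration: in real DRA, drivers publish device attributes and capacity, and claims select devices with CEL expressions. The `Device` and `Request` types, their field names, and the sample inventory below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    name: str
    memory_gib: int   # free GPU memory on the device
    cores_pct: int    # free compute share, in percent
    numa_node: int    # simple stand-in for topology


@dataclass(frozen=True)
class Request:
    min_memory_gib: int
    min_cores_pct: int
    numa_node: Optional[int] = None  # None means "any NUMA node"


def matching_devices(devices: list[Device], request: Request) -> list[Device]:
    """Return the devices that satisfy every constraint in the request."""
    return [
        d for d in devices
        if d.memory_gib >= request.min_memory_gib
        and d.cores_pct >= request.min_cores_pct
        and (request.numa_node is None or d.numa_node == request.numa_node)
    ]


# Hypothetical inventory, for illustration only.
inventory = [
    Device("gpu-0", memory_gib=24, cores_pct=100, numa_node=0),
    Device("gpu-1", memory_gib=8, cores_pct=100, numa_node=0),
    Device("gpu-2", memory_gib=24, cores_pct=100, numa_node=1),
]

pinned = matching_devices(inventory, Request(16, 50, numa_node=0))
anywhere = matching_devices(inventory, Request(16, 50))
print([d.name for d in pinned])
print([d.name for d in anywhere])
```

Adding the topology constraint narrows the candidates from two devices to one, which is the kind of distinction a plain integer device count cannot express.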

In addition, the Webhook resource quota check and fixes for several key issues further strengthen HAMi's production-grade capabilities. Especially in complex scenarios such as multi-replica schedulers, NVIDIA MIG/CDI, and device filtering, these optimizations directly impact cluster stability, resource allocation accuracy, and platform maintainability.
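As a rough sketch of an admission-time quota check, the Python below rejects a request when current usage plus the requested amount would exceed a namespace limit. It is not HAMi's actual webhook code; the `Quota` type, the `check_quota` function, and the `example.com/...` resource names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    limit: int      # maximum amount allowed in the namespace
    used: int = 0   # amount already consumed by admitted Pods


def check_quota(quotas: dict[str, Quota], requests: dict[str, int]) -> tuple[bool, str]:
    """Decide at submission time whether a Pod's requests fit the quotas.

    Returns (True, "") when the Pod can be admitted, otherwise
    (False, reason) naming the first resource that would exceed its limit.
    """
    for resource, amount in requests.items():
        quota = quotas.get(resource)
        if quota is None:
            continue  # no quota configured for this resource, nothing to enforce
        if quota.used + amount > quota.limit:
            reason = (
                f"{resource}: requested {amount}, "
                f"used {quota.used}, limit {quota.limit}"
            )
            return False, reason
    return True, ""


# Hypothetical namespace quota: 16000 MiB of GPU memory, 12000 MiB already used.
quotas = {"example.com/gpumem": Quota(limit=16000, used=12000)}

fits = check_quota(quotas, {"example.com/gpumem": 3000})
too_big = check_quota(quotas, {"example.com/gpumem": 5000})
print(fits)
print(too_big)
```

Rejecting an over-quota Pod at submission means it never enters the scheduling queue, so the user gets an immediate error instead of a Pod that stays pending.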

03 Continuous Collaboration on Building a Heterogeneous Computing Foundation for the AI Era

Competition in AI infrastructure is shifting from 'whether computing power exists' to 'whether it can be managed efficiently, stably, and uniformly.' As large model inference, agent applications, and enterprise-level AI platforms are deployed at an accelerating pace, unified management, flexible partitioning, and stable scheduling of heterogeneous computing resources have become critical foundations for applying AI at scale.

As the incubator and leader of HAMi, Paradigm will continue to invest in areas such as heterogeneous computing virtualization, domestic chip adaptation, Kubernetes-native scheduling, resource isolation, and production stability, working with the community to promote HAMi as a cloud-native computing infrastructure foundation for the AI era.

Paradigm Intelligence (6682.HK) is a leading full-stack AI cloud service platform, offering efficient AI infrastructure including Prophet AIOS, HAMi vGPU, and Xinchuang Model Box. It possesses full-stack capabilities ranging from managing and optimizing heterogeneous computing power at the foundational layer to invoking agent models, empowering intelligent transformation across various industries. It aims to build an efficient and scalable 'Token Factory' in the AI 2.0 era.

With the mission of 'AI for Everyone,' Paradigm positions itself as the 'pioneer in harnessing AI,' striving to become a world-leading general artificial intelligence technology company.
