Minglue Technology's large model achieves another world-class breakthrough: the 72-billion-parameter Mano ranks first among Specialized models on the OSWorld leaderboard.

Minglue Technology (2718.HK) has achieved another world-class breakthrough with its self-developed large model, Mano.

According to the latest data from the official OSWorld E2E leaderboard (as of October 2025), Mano, the GUI agent large model developed in-house by Minglue Technology, achieved a task success rate of 54.0%, setting a new record. It ranks first among Specialized models and second overall, just behind Anthropic's newly released Claude 4.5.

Mano, with 72 billion parameters, ranks second on the OSWorld-Verified Foundation E2E GUI evaluation leaderboard.

Compared with the results submitted in September this year, Mano’s parameter count has grown from 7 billion to 72 billion, and its task success rate has risen from 40.1% to 54.0%, a gain of 13.9 percentage points. This marks a significant performance improvement and demonstrates that dedicated intelligent agents have reached a new level of execution capability in real-world operational tasks.
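
As a quick check on the figures above, the short sketch below computes the absolute gain (in percentage points) and the relative gain between the two reported success rates. The function name `improvement` is ours, chosen for illustration; only the two input values, 40.1 and 54.0, come from the article.

```python
def improvement(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    delta = after_pct - before_pct
    abs_gain = round(delta, 1)                     # percentage points
    rel_gain = round(100 * delta / before_pct, 1)  # percent of the starting rate
    return abs_gain, rel_gain


abs_gain, rel_gain = improvement(40.1, 54.0)
print(f"absolute gain: {abs_gain} percentage points")
print(f"relative gain: {rel_gain}%")
```

The distinction matters when comparing leaderboard entries: a 13.9-point gain on a 40.1% baseline is roughly a one-third relative improvement.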

Mano, with 72 billion parameters, ranks first in the OSWorld-Verified Foundation E2E GUI & Specialized Model evaluations.

From Language to Action: The Next Phase of Intelligent Agents

OSWorld is currently the world's most authoritative evaluation system for 'operational intelligence,' encompassing 10 types of applications and 369 cross-application tasks. It requires models to perform continuous operations in real desktop and browser environments—such as opening spreadsheets, searching for information, organizing data, and completing forms. These tasks are far more complex than question-and-answer generation because each step demands that the model not only understand the content but also comprehend the 'interface structure' while maintaining logical coherence across multiple operations.

In previous tests, even top-tier general-purpose large models often achieved success rates of only 30%–40% on OSWorld. However, Mano 72B’s latest achievement—a 54.0% end-to-end task success rate—not only sets a new record for Chinese models but also places 'specialized intelligent agents' at the forefront of this 'AI operational testing ground' for the first time.

The technical approach behind this differs significantly from traditional language models. In its latest technical report, 'Mano Technical Report' (report link: https://arxiv.org/abs/2509.17336), Minglue Technology systematically outlines its methodology: instead of being trained solely on text-based conversations, the model undergoes repeated trials and learning in a high-fidelity simulated computer environment. This can be understood as placing Mano within an extensive virtual operating system, where it learns to move cursors, click buttons, recognize menus, input data, and master the optimal path to complete tasks through trial and error.

Technical Principle: Enabling the Model to Learn in a 'Real Environment'

Mano’s training framework consists of three stages: Supervised Fine-Tuning (SFT), Offline Reinforcement Learning (Offline RL), and Online Reinforcement Learning (Online RL). Simply put, the SFT stage is akin to 'a teacher demonstrating examples,' where the model learns fundamental operational methods; the offline reinforcement learning stage allows the model to generalize based on past task experiences; and the online reinforcement learning stage involves continuous practice in real environments to discover new strategies.

Minglue Technology has also introduced an execution loop called 'Think–Act–Verify': when performing operations, the model first evaluates the current interface state (Think), then executes specific actions (Act), and finally verifies whether the results are correct (Verify). If errors occur during execution, the model automatically adjusts its steps and retries. This enables Mano to self-correct and handle faults when facing complex and dynamic operational scenarios.
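
The loop described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `ToyDesktop`, `think`, `verify`, and `run_task` are illustrative stand-ins, not Mano's actual interfaces, and the hand-written rule in `think` takes the place of a learned policy. The toy environment fails the first download attempt by dropping the session, so the loop has to notice the changed state and log in again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDesktop:
    """Hypothetical stand-in for a GUI environment (not Mano's real interface)."""
    logged_in: bool = False
    downloaded: bool = False
    session_expires_once: bool = True  # simulates one transient failure
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # Snapshot of the current interface state.
        return {"logged_in": self.logged_in, "downloaded": self.downloaded}

    def execute(self, action: str) -> None:
        self.log.append(action)
        if action == "log_in":
            self.logged_in = True
        elif action == "click_download":
            if self.session_expires_once:
                # First attempt fails: the session drops and nothing downloads.
                self.session_expires_once = False
                self.logged_in = False
            elif self.logged_in:
                self.downloaded = True


def think(state: dict) -> str:
    """Think: choose the next action from the observed state."""
    return "click_download" if state["logged_in"] else "log_in"


def verify(state: dict) -> bool:
    """Verify: check whether the goal state has been reached."""
    return state["downloaded"]


def run_task(env: ToyDesktop, max_steps: int = 10) -> bool:
    """Run Think -> Act -> Verify until success or the step budget runs out."""
    for _ in range(max_steps):
        action = think(env.observe())  # Think
        env.execute(action)            # Act
        if verify(env.observe()):      # Verify
            return True
    return False


env = ToyDesktop()
print(run_task(env), env.log)
```

Because each iteration re-observes the state before choosing an action, recovery from the dropped session needs no special error-handling branch: the failed download simply leaves the environment in a state where logging in is again the chosen action.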

To provide a simple example: when you instruct an agent to 'download a financial report,' a general-purpose large model might only provide a set of operational instructions, whereas Mano will actually open the browser, log into the account, identify the download button, select the correct date range, and retry by re-logging in if an error message appears. This capability is achieved through the synergy of reinforcement learning and a high-fidelity training environment.

According to the paper’s data, after incorporating online reinforcement learning, the model’s average task completion rate improved by approximately 14 percentage points, with particularly stable performance on multi-step (multi-turn) tasks. The research team noted that this 'learning in the environment' approach is key to achieving operational intelligence: the model no longer relies on static corpora but acquires feedback through continuous interaction, thereby gaining the ability to 'learn how to act.'

The Competitiveness of Specialized Intelligent Agents

For a long time, the performance evaluation of large models has primarily focused on tasks such as language comprehension, knowledge-based question answering, or content generation. The emergence of GUI agents has extended the boundaries of AI from the "textual world" to real operating systems. Compared to general-purpose large models, the core advantage of specialized agents lies in their focus—not on encompassing all knowledge but on achieving greater execution depth and stability in specific tasks.

Mano’s achievements exemplify this trend. Through structured task data, targeted reinforcement learning, and validation mechanisms, the model has demonstrated continuous improvement in interface recognition, action planning, and process stability. The official OSWorld review noted that this outcome “demonstrates the potential of specialized agents in executing real-world tasks and also marks the engineering progress of multimodal agent research.”

For Minglue Technology, Mano is not only a research achievement but is also gradually becoming the underlying technology for enterprise intelligent systems. The company is exploring ways to embed Mano’s operational intelligence into specific scenarios such as data analysis, marketing automation, and compliance management, enabling the model to take on the role of a "digital assistant" within actual business processes. The research team also mentioned that future directions include improving inference efficiency, reducing the number of interaction steps, and promoting lightweight on-device deployment, so that the model runs stably even on standard hardware.

From 7B to 72B parameters, and from a 40.1% to a 54.0% task success rate, Mano’s evolution represents more than an increase in parameters: it signifies a capability shift from language understanding to operational intelligence. In their report, the Minglue Technology team stated that Mano will continue to optimize inference efficiency and task generalization while exploring on-device deployment and industry-level implementation paths, integrating agent capabilities into enterprise production workflows. When models no longer merely "output answers" but truly "complete tasks," artificial intelligence begins to demonstrate real-world execution power.

Report link: https://arxiv.org/abs/2509.17336
Leaderboard link: https://os-world.github.io/
