Ping An's Medical LLM Demonstrates World-Class Clinical Reasoning on HealthBench Hard


Ping An Group

10 Jun 2026

Industry-specific AI is the next frontier in healthcare, where domain depth, clinical reasoning, and accuracy matter more than general-purpose capability. Ping An’s latest medical large language model (LLM) has set a new standard. In April 2026, Ping An Medical LLM 3.5 scored 57.27 on HealthBench Hard, the most challenging subset of OpenAI’s HealthBench benchmark for complex clinical reasoning. That score ranked first globally, ahead of Baichuan (44.4), Meta (42.8), and OpenAI (42).

From Health Consultation to Clinical Decision Support

Unlike general-purpose models, Ping An Medical LLM 3.5 is trained on real-world healthcare service data from PKU Healthcare and Ping An Health, spanning screening, disease management, treatment, and rehabilitation. Two innovations underpin its performance:

1. Dynamic Diagnostic Simulation

Drawing on real-world clinical data, the LLM replicates the clinical reasoning of experienced physicians. To address common clinical challenges – such as ambiguous symptoms, evolving conditions, and incomplete information – Ping An Technology’s R&D team designed a dynamic diagnostic simulation environment comprising patient digital twins, adaptive assessment agents, and clinical knowledge graphs. By modelling physicians’ decision-making under uncertainty as a multi-turn reinforcement learning (RL) reasoning task, the system provides a strong foundation for ongoing algorithm optimisation.
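The article does not say how the simulation environment is implemented. Purely as an illustration, the sketch below frames diagnosis as a multi-turn task in the way an RL environment typically would. A hypothetical `PatientTwin` holds a hidden diagnosis and reveals findings only when asked. On each turn the agent either asks a question or commits to a diagnosis. All names and the reward scheme are assumptions, not Ping An’s design. Under that scheme, a correct diagnosis earns +1, while a wrong one or running out of turns earns -1, and each question carries a small cost. The RL training loop itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PatientTwin:
    """Hypothetical patient digital twin: a hidden diagnosis plus findings revealed only on request."""
    diagnosis: str
    findings: dict[str, str]


@dataclass
class DiagnosticEnv:
    """Toy multi-turn environment (illustrative assumption, not Ping An's system)."""
    patient: PatientTwin
    max_turns: int = 5
    question_cost: float = 0.05  # small penalty per question to discourage unnecessary history-taking
    turn: int = 0
    transcript: list = field(default_factory=list)

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply 'ask:<finding>' or 'diagnose:<condition>'; return (observation, reward, done)."""
        self.turn += 1
        kind, _, arg = action.partition(":")
        if kind == "diagnose":
            return "episode over", (1.0 if arg == self.patient.diagnosis else -1.0), True
        if kind == "ask":
            # Findings missing from the twin model incomplete information.
            obs = self.patient.findings.get(arg, "not reported")
            self.transcript.append((arg, obs))
            done = self.turn >= self.max_turns
            # Running out of turns without committing to a diagnosis counts as a failure.
            return obs, -self.question_cost - (1.0 if done else 0.0), done
        raise ValueError(f"unknown action: {action}")


def rollout(env: DiagnosticEnv, policy) -> float:
    """Run one episode and return the total (undiscounted) reward."""
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(policy(env.transcript))
        total += reward
    return total


def simple_policy(transcript) -> str:
    """Hypothetical rule-based stand-in for the LLM policy (a toy rule, not clinical advice)."""
    asked = {question for question, _ in transcript}
    for question in ("fever", "cough", "chest_pain"):
        if question not in asked:
            return f"ask:{question}"
    answers = dict(transcript)
    if answers.get("fever") == "yes" and answers.get("cough") == "yes":
        return "diagnose:pneumonia"
    return "diagnose:unknown"


patient = PatientTwin("pneumonia", {"fever": "yes", "cough": "yes"})
print(round(rollout(DiagnosticEnv(patient), simple_policy), 2))  # 0.85: three questions, then a correct diagnosis
```

A policy trained against an environment like this is pushed to gather enough information before committing, without asking unnecessary questions. A real system would differ in two ways. The rule-based policy would be replaced by the LLM, and the dictionary lookup by a far richer patient model.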

2. Advanced Algorithm Training and Hallucination Control

Given the complexity and high stakes of medical reasoning, the R&D team adopted a three-stage progressive training approach: grounding the model in clinical knowledge and guidelines, strengthening multi-step reasoning through complex cross-disciplinary cases, and improving advanced reasoning performance. In parallel, the team applied on-policy distillation to transfer knowledge more efficiently while preserving the model’s complex reasoning capabilities.
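The article names on-policy distillation but does not specify it. This sketch assumes one common formulation, in which the student generates its own sequences. Training then minimises the reverse KL divergence, KL(student || teacher), between the two models’ next-token distributions at the positions the student actually visited. To keep the example runnable with NumPy alone, it uses hypothetical bigram "models": a logits matrix indexed by the previous token. It only computes the objective. A real setup would minimise it with respect to the student’s parameters using an autodiff framework.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def reverse_kl_per_token(student_logits: np.ndarray, teacher_logits: np.ndarray) -> np.ndarray:
    """KL(student || teacher) at each position; both arrays have shape (T, V)."""
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    return (np.exp(log_ps) * (log_ps - log_pt)).sum(axis=-1)


def sample_from_student(w_student: np.ndarray, length: int, rng, start_token: int = 0) -> list[int]:
    """On-policy step: the *student* generates the sequence (toy bigram model)."""
    tokens = [start_token]
    for _ in range(length):
        probs = np.exp(log_softmax(w_student[tokens[-1]]))
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens


def on_policy_distillation_loss(w_student: np.ndarray, w_teacher: np.ndarray,
                                length: int = 8, seed: int = 0) -> float:
    """Mean per-token reverse KL on a student-generated sequence (hypothetical toy setup)."""
    rng = np.random.default_rng(seed)
    tokens = sample_from_student(w_student, length, rng)
    contexts = np.array(tokens[:-1])  # previous token for each generated position
    # Teacher and student are scored on the same student-visited contexts.
    kl = reverse_kl_per_token(w_student[contexts], w_teacher[contexts])
    return float(kl.mean())


rng = np.random.default_rng(42)
w_teacher = rng.normal(size=(4, 4))
w_student = rng.normal(size=(4, 4))
print(on_policy_distillation_loss(w_student, w_teacher))  # positive; zero only if the student matches the teacher
```

The sequences come from the student rather than from a fixed, teacher-written dataset. As a result, the student is corrected on the mistakes it actually makes, which is the usual motivation for on-policy over off-policy distillation.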

The R&D team also developed an end-to-end hallucination control engine that integrates in-context learning, uncertainty quantification, and other techniques. The engine applies risk controls across prompt design, model training, and inference to continuously reduce hallucination rates.
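The article lists uncertainty quantification as one component of the hallucination control engine but does not detail it. For illustration only, this sketch assumes a simple, widely used proxy. The model is sampled several times on the same question, and the disagreement among the answers is measured as entropy. If that entropy exceeds a threshold, the answer is withheld and the case is escalated to a clinician. The function names and the 0.6-nat threshold are hypothetical.

```python
import math
from collections import Counter


def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of sampled answers."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def answer_or_escalate(samples: list[str], max_entropy: float = 0.6):
    """Return (majority answer, entropy) if samples agree enough, else (None, entropy).

    The threshold is a hypothetical value; None signals escalation to a clinician.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    entropy = answer_entropy(samples)
    if entropy > max_entropy:
        return None, entropy
    majority, _ = Counter(samples).most_common(1)[0]
    return majority, entropy


print(answer_or_escalate(["drug A"] * 4 + ["drug B"]))      # majority answer kept, entropy about 0.50
print(answer_or_escalate(["drug A"] * 3 + ["drug B"] * 2))  # escalated (None), entropy about 0.67
```

Sampling-based agreement needs only the model's outputs, which makes it easy to apply at inference time. Its cost is several generations per question.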

Scaling AI Across the Healthcare Journey

Ping An is applying AI across key healthcare settings, from online platforms and hospitals to home care and enterprise health services. Across medical screening, management, treatment, and rehabilitation, Ping An is building an integrated AI-enabled healthcare service chain, supported by three core capabilities: large-scale deployment, a data flywheel, and alignment with real-world clinical practice.

Ping An has developed more than 90 early disease screening models, deployed across 1,500 primary healthcare institutions in China. To date, these models have supported 1.5 million early screenings. They help 300,000 customers a year identify health risks early.

In chronic disease management, Ping An uses multiple collaborating AI agents to support proactive, community-based health management. This has helped build one of China’s largest chronic disease management communities, serving 2 million patients. Proactive AI intervention has improved patient adherence to treatment plans fivefold, supporting more effective long-term health management.

Making Multidisciplinary Expertise More Accessible with AI

Standardized Multidisciplinary Team (MDT) consultation is an important approach in complex disease management, particularly in oncology, where clinical evidence shows it can improve five-year survival rates by up to 15%. However, access to high-quality MDT services in China has been limited due to the scarcity of top specialists, hospital workflow bottlenecks, and uneven distribution of medical resources.

Ping An is addressing this gap by deploying its medical LLM to support AI-enabled MDT care in real-world healthcare settings. Ping An’s AI-enabled MDT capabilities can analyze comprehensive patient information and provide full-process treatment recommendations covering chemotherapy, targeted therapy, radiotherapy, and endocrine therapy, including dosage, treatment cycles, and sequencing. Internal data shows that physicians have adopted 85% of AI-generated content into their final treatment plans. For serious diseases such as breast cancer, consistency between AI-generated recommendations and those of senior specialists has exceeded 92.5%.

Ping An Chief Technology Officer Ray Wang says, “The goal is to transform top-tier multidisciplinary expertise from a scarce resource concentrated in major urban hospitals into accessible, scalable, and inclusive healthcare capabilities.”

By applying AI across the patient journey, Ping An is transforming medical LLM capabilities into real-world healthcare services – helping more people access care, improving service efficiency, and creating a differentiated, technology-driven health ecosystem for the future.