January 2026
AMBOSS Lisa 1.0 tops the NOHARM AI-CDS benchmark study
Thomas Hagemeijer
Founder & CEO, HGM Advisory

Key takeaway
AMBOSS Lisa 1.0 achieved the highest overall score in the NOHARM benchmark for AI clinical decision support, outperforming models from Google, OpenAI, and Alibaba — as well as licensed physicians — on safety, accuracy, and hallucination resistance across 100+ clinical scenarios.
Developed by German company AMBOSS, Lisa 1.0 topped the NOHARM leaderboard ahead of systems from Google, Glass Health, OpenAI, and Alibaba, as well as a human physician control group. The study tests how safely AI systems make medical decisions across more than 100 clinical cases.
What is the NOHARM benchmark?
NOHARM (National Online Health Assessment for Responsible Medicine) evaluates AI systems that provide clinical decision support. It tests whether AI can take a patient presentation and produce differential diagnoses, recommend workups, and suggest evidence-based treatments — while measuring hallucination rates and harmful recommendations.
The benchmark comprises over 100 clinical vignettes spanning internal medicine, emergency medicine, pediatrics, surgery, and psychiatry. Each case is scored on diagnostic accuracy, treatment appropriateness, safety, completeness, and hallucination rate.
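To make the scoring concrete, here is a minimal sketch of how per-case dimension scores could be rolled up into a single composite number. The five dimensions are taken from the study, but the weights and the weighted-average aggregation are illustrative assumptions, not NOHARM's published methodology.

```python
# Hypothetical sketch of a NOHARM-style composite score.
# Dimensions come from the benchmark description; the weights and
# aggregation method below are illustrative assumptions only.

def composite_score(scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each on a 0-100 scale)."""
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight

# Illustrative weights -- NOT from the study.
weights = {
    "diagnostic_accuracy": 0.30,
    "treatment_appropriateness": 0.25,
    "safety": 0.25,
    "completeness": 0.10,
    "hallucination_resistance": 0.10,
}

# One hypothetical case's per-dimension scores.
case = {
    "diagnostic_accuracy": 85.0,
    "treatment_appropriateness": 80.0,
    "safety": 90.0,
    "completeness": 75.0,
    "hallucination_resistance": 95.0,
}

print(round(composite_score(case, weights), 1))  # → 85.0
```

A benchmark's overall score would then be the mean of these composites across all 100+ vignettes.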
Results: Lisa 1.0 leads the field
AMBOSS Lisa 1.0 achieved the highest composite score of 82.4/100, with a safety score of 91.2 and the lowest hallucination rate of 2.1%.
The physician control group averaged 76.5, putting Lisa 5.9 points ahead of the average licensed physician.
| Rank | System | Composite Score | Safety Score | Hallucination Rate |
|---|---|---|---|---|
| 1 | AMBOSS Lisa 1.0 | 82.4 | 91.2 | 2.1% |
| 2 | Google Med-PaLM 2 | 79.1 | 85.7 | 4.2% |
| 3 | Glass Health AI | 76.8 | 83.4 | 5.1% |
| 4 | Human Physicians (avg) | 76.5 | 82.1 | N/A |
| 5 | OpenAI GPT-4 Medical | 74.2 | 78.6 | 7.3% |
| 6 | Alibaba Qwen-Med | 72.8 | 76.3 | 8.1% |
What this means for the AI-CDS market
For AMBOSS, the results validate a RAG-based approach that grounds AI outputs in verified medical content, delivering meaningfully better safety than systems relying primarily on parametric knowledge.
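The retrieval-grounding idea can be sketched in a few lines: retrieve relevant reference passages first, then constrain the model to answer only from that evidence. This is a generic RAG illustration under my own assumptions, not AMBOSS's implementation; the corpus, overlap-based retriever, and prompt wording are all toy examples.

```python
# Generic retrieval-augmented generation (RAG) sketch for clinical Q&A.
# NOT AMBOSS's implementation -- a minimal illustration of grounding an
# answer in retrieved reference content rather than parametric knowledge.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by simple word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the retrieved evidence."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the reference passages below. "
        "If the evidence is insufficient, say so.\n\n"
        f"References:\n{context}\n\nQuestion: {query}\n"
    )

# Toy reference corpus (illustrative statements, not clinical guidance).
corpus = [
    "First-line treatment for uncomplicated community-acquired pneumonia is amoxicillin.",
    "Metformin is first-line pharmacotherapy for type 2 diabetes.",
    "Acute appendicitis typically presents with right lower quadrant pain.",
]

query = "first-line treatment for community-acquired pneumonia"
prompt = build_prompt(query, retrieve(query, corpus))
print("pneumonia" in prompt)  # the most relevant passage is in the context
```

The safety benefit claimed for this architecture comes from the retrieval step: if the verified corpus does not support an answer, the prompt instructs the model to say so rather than generate one from parametric memory.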
For hospitals evaluating AI-CDS vendors, NOHARM provides the first credible apples-to-apples comparison. Safety scores and hallucination rates will become standard evaluation criteria.
The AI-CDS space is entering a quality differentiation phase. The initial wave — where any AI tool that answered medical questions was impressive — is over. The next phase rewards systems that are safe, consistent, and grounded.