AI Predicts the Onset of Disease up to 20 Years in Advance

A new AI model, Delphi-2M, can predict the onset of over 1,000 diseases up to 20 years in advance using medical records. Published in Nature (2025), it marks a major step forward for preventive and personalized medicine. This article covers how it works, what the study found, and what it could mean for the future of healthcare.

Oct 20, 2025

How a new model, Delphi-2M, is redefining the future of preventive medicine

In a landmark study published in Nature (2025), researchers from the European Molecular Biology Laboratory (EMBL), the German Cancer Research Center (DKFZ), and collaborators unveiled an AI model that can predict the onset of more than 1,000 diseases — up to 20 years before they occur.

The model, called Delphi-2M, represents a leap forward in personalized and preventive healthcare by using medical records to forecast an individual’s long-term health trajectory.

What Makes Delphi-2M Different

Traditional risk models focus on one disease at a time — for example, heart disease or diabetes — and usually over short windows of 5 to 10 years. Delphi-2M takes a holistic approach, modeling an individual’s entire medical history as a sequence of health “events,” much like words in a sentence.

Using a transformer architecture (the same core technology that powers large language models), Delphi-2M learns the “grammar” of disease progression — the order and timing in which conditions tend to appear throughout life.
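
To make the "words in a sentence" analogy concrete, here is a minimal sketch of how a medical history might be turned into a token sequence. The `Event` class, the `encode_history` function, and the example ages are illustrative assumptions, not the authors' code; the codes are standard ICD-10 categories (E66 obesity, E11 type 2 diabetes, N18 chronic kidney disease).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One health event: the age at which it was recorded and its ICD-10 code."""
    age_years: float
    code: str


def encode_history(events, vocab):
    """Sort events by age and map each code to an integer token.

    `vocab` is a plain dict that grows as new codes are seen, much like
    a word-to-index table in a language model (hypothetical helper).
    """
    ordered = sorted(events, key=lambda e: e.age_years)
    tokens = [vocab.setdefault(e.code, len(vocab)) for e in ordered]
    ages = [e.age_years for e in ordered]
    return tokens, ages


# Made-up history, deliberately given out of order.
history = [Event(57.0, "N18"), Event(41.0, "E66"), Event(48.0, "E11")]
vocab = {}
tokens, ages = encode_history(history, vocab)
print(tokens, ages)
```

The model then sees two parallel sequences, the tokens and the ages at which they occurred, which is what lets it learn both the order and the timing of conditions.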

It was trained on data from 400,000 participants in the UK Biobank and validated on 1.9 million individuals from the Danish National Patient Registry. The model proved capable of generalizing across populations, a critical milestone for clinical translation.

Key Findings from the Nature Study

Metric	Result
Diseases modeled	>1,000 ICD-10 conditions
Prediction horizon	Up to 20 years
Median AUROC (discrimination)	~0.80
Top performing categories	Cancers, cardiovascular diseases (AUROC > 0.85)
Cross-population performance drop	3–5% lower in Danish dataset
Calibration error	<0.05 for most major disease groups
Explained comorbidity structure	72% variance explained
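
For readers unfamiliar with the two headline metrics, the sketch below shows one common way to compute them. AUROC is the probability that a randomly chosen person who develops the disease received a higher risk score than a randomly chosen person who did not. The calibration measure here is a binned expected calibration error, which is an assumption: the article does not say which definition the study used. The scores and labels are made up.

```python
def auroc(scores, labels):
    """Fraction of (case, non-case) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_error(scores, labels, n_bins=5):
    """Size-weighted gap between mean predicted risk and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    error = 0.0
    for b in bins:
        if b:
            mean_pred = sum(s for s, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            error += len(b) / len(scores) * abs(mean_pred - observed)
    return error


# Made-up predicted risks and outcomes (1 = disease occurred).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(auroc(scores, labels))  # 14 of 16 pairs ranked correctly -> 0.875
print(calibration_error(scores, labels))
```

An AUROC of 0.5 means the ranking is no better than chance and 1.0 means every case is ranked above every non-case. AUROC says nothing about whether the predicted probabilities themselves are accurate, which is why calibration is reported separately.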

The model not only predicts which diseases are likely to occur, but also when — offering time-to-event estimates for each condition. This temporal awareness makes it far more actionable than static risk calculators.
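
One simple way to get both "which" and "when" from a single set of outputs is to give every candidate event its own rate, in events per year. The sketch below assumes rates that stay constant until the next event (an exponential waiting-time simplification) and uses made-up numbers; the function names are hypothetical and this is not a description of the published implementation.

```python
import math


def next_event_summary(rates_per_year):
    """Return P(each event happens first) and the expected wait until any event."""
    total = sum(rates_per_year.values())
    which = {code: rate / total for code, rate in rates_per_year.items()}
    expected_wait_years = 1.0 / total
    return which, expected_wait_years


def risk_within(rate_per_year, horizon_years):
    """Probability that an event with a constant rate occurs within the horizon."""
    return 1.0 - math.exp(-rate_per_year * horizon_years)


# Made-up rates: hypertension (I10), type 2 diabetes (E11), kidney disease (N18).
rates = {"I10": 0.06, "E11": 0.02, "N18": 0.02}
which, wait = next_event_summary(rates)
print(which)                      # I10 is the most likely next event
print(wait)                       # about 10 years until the next event of any kind
print(risk_within(rates["E11"], 20))
```

In practice the rates would change after every new event and with age, so a 20-year risk computed from a single fixed rate is only a rough approximation.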

How It Works

Sequence learning: Each patient’s medical history is treated as a chronological sequence of “events” (diagnoses, test results, hospital visits).
Generative modeling: Delphi-2M predicts the most likely “next event” and its timing, given the history so far.
Explainability: The model identifies which prior health events contributed most strongly to each prediction, uncovering meaningful comorbidity clusters (e.g., obesity → diabetes → kidney disease).
External validation: Tested on Danish health data, the model maintained high accuracy without retraining, showing real-world robustness.
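
The generative step above can be sketched as a loop that repeatedly asks "what happens next, and when?" and appends the answer to the history. In this toy version a hand-written function, `toy_rates`, stands in for the trained transformer; its numbers, the comorbidity multipliers, and the sampling scheme (the earliest of several random waiting times wins) are all illustrative assumptions.

```python
import random


def toy_rates(history):
    """Stand-in for the model: yearly rates for each not-yet-seen diagnosis."""
    rates = {"E66": 0.03, "E11": 0.01, "N18": 0.005}  # made-up baseline rates
    if "E66" in history:
        rates["E11"] *= 4  # obesity raises the diabetes rate
    if "E11" in history:
        rates["N18"] *= 4  # diabetes raises the kidney disease rate
    return {code: r for code, r in rates.items() if code not in history}


def simulate_trajectory(start_age, max_age, rng):
    """Sample one possible future as a list of (age, code) pairs."""
    history, age, trajectory = [], start_age, []
    while True:
        rates = toy_rates(history)
        if not rates:
            break  # nothing left to diagnose in this toy vocabulary
        # Draw a waiting time for every candidate; the earliest event happens.
        waits = {code: rng.expovariate(rate) for code, rate in rates.items()}
        code = min(waits, key=waits.get)
        age += waits[code]
        if age > max_age:
            break
        history.append(code)
        trajectory.append((age, code))
    return trajectory


print(simulate_trajectory(40.0, 80.0, random.Random(0)))
```

Running the loop many times with different seeds gives a spread of possible futures for the same starting history, from which long-range risks can be estimated by counting how often each condition appears.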

Important Limitations

Despite its promise, the authors emphasize that Delphi-2M is not yet ready for clinical deployment.

Predictive, not causal: The model finds correlations, not causal effects, so it cannot say which interventions would actually change an outcome.
Data bias: UK Biobank participants tend to be healthier and more homogeneous than the general population.
Incomplete data: Many risk factors — diet, stress, environment — remain unrecorded in medical records.
Uncertainty over long horizons: Predictions decades ahead will always carry increasing uncertainty.
Ethical concerns: How should we communicate high long-term disease risk without causing unnecessary anxiety or discrimination?

Why It Matters

If validated further, models like Delphi-2M could revolutionize healthcare in several ways:

Proactive prevention: Identify high-risk individuals decades before symptoms appear.
Personalized care: Tailor lifestyle and screening plans to each person’s unique health trajectory.
Healthcare planning: Predict population-level disease trends for better resource allocation.
Discovery engine: Reveal new patterns and pathways in the natural history of disease.

As Nature’s commentary notes, Delphi-2M “learns the natural history of human disease” in a way that could help medicine shift from treating illness to anticipating it.

🧾 References

Shmatko, A., Jung, A. W., Gaurav, K., Brunak, S., Mortensen, L. H., Birney, E., Fitzgerald, T., & Gerstung, M. (2025). Learning the natural history of human disease with generative transformers. Nature, 647, 248–256. https://doi.org/10.1038/s41586-025-09529-3
Scientific American (2025). New AI Tool Predicts Which of 1,000 Diseases Someone May Develop in 20 Years.
Nature News (2025). AI Predicts Disease Decades Ahead by Learning the “Grammar” of Health.
Inside Precision Medicine (2025). AI Model Predicts Risk for 1,000 Diseases Decades in Advance.
News-Medical (2025). AI Model Maps Lifetime Disease Risks to Transform Future Healthcare Planning.
Science Media Centre (2025). Expert Reaction to New AI Model for Predicting Individual Risk of Disease Over Decades.