Dear Friends and Colleagues,
The attached papers, published by the team at the Ratner Early Detection Initiative (REDI) describe a powerful and proven approach to early cancer detection that we believe will radically reduce cancer deaths.
The core idea is that the blood tests drawn routinely at an annual physical contain dozens of signals, hidden in the relationships between values, when cancer is developing. Analyzed individually, the results may look normal; but when machine learning algorithms are applied, cancer can be detected six to twenty-four months before a doctor could diagnose it under currently accepted screening protocols.
The first paper, "What Your Blood Already Knows," describes early detection opportunities for as many as thirteen cancers through specific algorithms applied to the results of routine blood draws. Two algorithms have been developed and deployed in real clinical settings at Geisinger Health System in Pennsylvania and at Maccabi Healthcare Services in Israel. A colorectal cancer algorithm has been tested in more than 600,000 patients; the outcomes are published in the Journal of the American Medical Informatics Association. The lung cancer algorithm was validated at Kaiser Permanente across nearly 200,000 patients and published in the American Journal of Respiratory and Critical Care Medicine. Gastric and liver cancer algorithms are being validated in pilot projects of 193,000 and 75,000 patients, respectively. Nine additional cancers, including ovarian and pancreatic, have blood signatures documented in the medical literature and are awaiting algorithm development.
The second paper, "Beyond Cancer," extends the same logic to diseases other than cancer. Cardiovascular disease, type 2 diabetes, chronic kidney disease, fatty liver disease, and a number of rarer conditions all leave fingerprints in routine blood work years before conventional screening protocols can detect them. Deploying these algorithms for cancer and other diseases has been conservatively estimated to have the potential to save 400,000 to 675,000 lives annually in the United States, or 13 to 22 percent of all American deaths each year.
The third document, "The Evidence Behind the Numbers," is a question-and-answer companion piece. It anticipates the questions we expect you will ask as you read the material, and it shows the source of every number: the journals, the studies, the databases, and the calculations. If you are a clinician or scientist, this is the document that will answer your methodological questions; if you are a policymaker, it shows the math behind the claims.
REDI Conclusions: The science laid out in our work is not theoretical; it is published, validated, and in some cases already operational. What we lack are the algorithms for the remaining cancers and diseases that machine learning can detect. We urgently need to build institutional will and commitment, both to deploy existing algorithms and to develop additional ones covering diseases known to be detectable by this approach. Building the remaining cancer detection algorithms is estimated to cost less than $75 million and will require access to electronic medical record data from three or four large health systems. The investment is less than 2 percent of the cost of developing a single cancer drug. The blood tests are already being drawn routinely as part of a patient's annual physical. The data is sitting in accessible medical records. We plan to develop the algorithms needed to analyze the blood. Once the algorithms are developed and applied, only the highest-risk patients will undergo additional screening, reducing the overall number of patients screened while significantly increasing the number of cases detected at early stages.
I recognize that you may have questions about specific numbers, about false positive rates, about what happens when an algorithm flags a patient for a cancer that has no established screening test. The evidence paper addresses each of these topics. As you read the papers, please think about what algorithmic blood test analysis could mean for patients. If you are involved in health policy, think about what a $75 million investment could return when the alternative is spending billions treating cancers that could have been caught early when treatment is relatively painless and inexpensive.
On a personal level I implore you to get your blood work done. And if any of this moves you, please share this information with others who might help us in our efforts to have this approach become THE standard in early detection. And of course, share this information with those you care about to ensure their health and wellbeing. We are open to questions, to ideas, to criticism, and most of all, to partners who want to help make this real.
I genuinely believe we are close to using machine learning analysis of blood work to detect cancer and make material advances in the 50-year-old "war on cancer," or, as my wife says, in a way that may finally "conquer cancer."
With deep respect and hope,
Bruce Ratner
Ratner Early Detection Initiative (REDI)
What Your Blood Already Knows
The Problem We Can Solve
Every year in the United States, roughly 393,000 people die from 13 cancers, not because medicine lacks the ability to cure these diseases, but because medicine finds them too late.
Lung cancer kills 127,000 Americans annually. Eighty-five percent of those patients learned they had the disease after it had already spread beyond the lung. Pancreatic cancer kills 51,000 people a year, and the vast majority of them were walking around feeling perfectly fine until the tumor grew large enough to block a bile duct or press against a nerve. By then, the five-year survival rate is 10 percent. Colorectal cancer kills 53,000, even though it is one of the most curable cancers in existence if caught at Stage I. The pattern repeats for ovarian cancer, gastric cancer, esophageal cancer, and liver cancer. The disease grows silently; the patient feels nothing; the diagnosis arrives too late.
The survival numbers are not subtle. Late-stage lung cancer: 8 percent survive five years; early-stage lung cancer: 60 percent; colorectal cancer caught at Stage I: 90 percent survival versus Stage IV: 29 percent; ovarian cancer found early: 93 percent; found at Stage IV (when most women discover it because there is no screening test): 29 percent; pancreatic cancer caught while still confined to the pancreas: 50 percent; caught after it has spread: 10 percent.
The difference between those numbers has nothing to do with better drugs or more aggressive treatment; the difference is when the cancer is found.
Figure 1. Five-year survival rates by stage at diagnosis. Early = Stage I/II. Late = Stage III/IV. Data derived from SEER national cancer surveillance database.
The Signal Hiding in Your Blood
Now here is the part that most people, including most physicians, do not know. Your blood is already telling us about these cancers. Not through some exotic new test, but through the routine blood draw you get at your annual physical.
A standard blood panel measures more than 60 distinct values: red blood cells, white blood cells, platelets, hemoglobin, liver enzymes, kidney function markers, metabolic indicators, cholesterol, thyroid hormone, and blood sugar. Your doctor reviews these numbers against reference ranges. If every value falls within the normal zone, the conclusion is that you are healthy.
However, that conclusion is frequently wrong.
Cancer does not push a single blood value dramatically out of range. It changes the entire landscape in ways too subtle for any human to perceive. White blood cells shift slightly upward. Platelet counts drift a fraction downward. The size distribution of red blood cells, a measurement called mean corpuscular volume, changes by a degree that would never trigger an alert. Liver enzymes creep in one direction while kidney markers shift in another. Individually, every number looks fine. Together, they form a pattern, or a biological fingerprint, that machine learning algorithms can detect 6 to 24 months before the patient notices a single symptom.
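To make the idea concrete, here is a deliberately simplified sketch of how several individually normal results can combine into one abnormal signal. The reference ranges and patient values below are hypothetical illustrations, not output of any deployed algorithm:

```python
# Illustrative sketch: each value sits inside its reference range,
# yet the combined deviation pattern stands out.
# All reference ranges and patient values are hypothetical.

panel = {
    # analyte: (value, low, high)
    "WBC (10^3/uL)":       (10.2, 4.0, 11.0),   # high-normal
    "Platelets (10^3/uL)": (165,  150, 400),    # low-normal
    "MCV (fL)":            (81,   80,  100),    # low-normal
    "Hemoglobin (g/dL)":   (13.6, 13.5, 17.5),  # low-normal
}

def z_like(value, low, high):
    """Position within the reference range, scaled to [-1, +1]."""
    mid = (low + high) / 2
    half = (high - low) / 2
    return (value - mid) / half

scores = {name: z_like(*v) for name, v in panel.items()}

# Every value individually passes the "normal" check...
assert all(low <= value <= high for value, low, high in panel.values())

# ...but every value is hugging an edge of its range, and the
# combined magnitude of deviation is large (near 1.0).
composite = sum(abs(s) for s in scores.values()) / len(scores)
print(f"composite deviation score: {composite:.2f}")
```

Deployed systems use gradient-boosted trees trained on hundreds of thousands of labeled records rather than a hand-built score like this, but the principle is the same: the signal lives in the combination, not in any one value.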
The Proof
This is not a theory. In 2016, a team of Israeli scientists analyzed blood tests from more than 600,000 people enrolled in Maccabi Healthcare Services, one of Israel's largest health systems. They trained a machine learning algorithm using a technique called gradient boosting, which combines hundreds of small observations into a single powerful prediction. Think of a detective who notices that three witnesses independently mentioned the same color car: any one observation means nothing, but the pattern changes everything. The algorithm reads your blood work the same way.
Their study, published in the Journal of the American Medical Informatics Association, demonstrated an area under the curve (AUC) of 0.82 for colorectal cancer, where 0.5 indicates a coin flip and 1.0 perfect prediction. The algorithm was validated in an independent population of 30,000 people in the United Kingdom. When deployed at Geisinger Health System in Pennsylvania, it flagged 706 patients from a pool of 25,610 who were overdue for screening. Among the 104 who completed colonoscopy, 8 percent had colorectal cancer, compared to the 1 percent detection rate in standard screening. That is an eightfold improvement. No new blood tests and no new equipment. The algorithm simply read the blood work that those patients had already received.
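For readers unfamiliar with the metric: the accuracy figure reported above (the area under the curve, or AUC) is the probability that the model scores a randomly chosen cancer case higher than a randomly chosen healthy control. A small self-contained illustration, using toy risk scores rather than study data:

```python
def auc(case_scores, control_scores):
    """Probability a random case outscores a random control (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy risk scores: this model tends to score cases higher, imperfectly.
cases    = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]

print(f"AUC = {auc(cases, controls):.2f}")
# A useless model scores about 0.50; a perfect one scores 1.00.
```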
Lung cancer followed. Researchers at Kaiser Permanente analyzed 6,505 lung cancer cases against 189,597 controls and developed an algorithm that achieved an AUC of 0.86 and identified 40 percent of future lung cancer patients at a false positive rate of just 5 percent, 9 to 12 months before clinical diagnosis. A gastric cancer algorithm was validated across 193,000 patients. A liver cancer algorithm was validated in 75,000 patients. At least nine more cancers, from pancreatic to ovarian to kidney to myeloma, display documented blood chemistry signatures that the same methodology can target.
Figure 2. Current development status of cancer detection algorithms for 13 target cancers.
How Solid Are These Numbers?
The table of 13 cancers and their estimated lives saved rests on three foundations: published survival rates from the national cancer surveillance database maintained by the National Cancer Institute, the documented stage-shift effect of moving diagnosis from late to early stage, and the demonstrated performance of algorithms already deployed in clinical practice. The estimates assume 50 percent population penetration of annual blood testing with algorithm integration; they are deliberately conservative. Full deployment would substantially increase the figures.
Figure 3. Conservative estimates of lives saveable annually per cancer type. Total: ~100,000 to 175,000 lives per year. Assumes 50% population penetration.
What Would It Cost?
This is where the economics become extraordinary. The most expensive element in any machine learning project is acquiring clean, labeled training data. In healthcare, that data already exists. Large hospital systems maintain electronic medical records containing years of blood test results alongside diagnostic records documenting who developed cancer, what type, and when. If three or four major institutions (each with millions of patient records) contributed their existing data, the primary cost would be data science teams, computational resources, clinical validation, and regulatory clearance. Each new algorithm, following the methodology already proven in two cancers, would cost approximately $2 to $5 million to develop and validate; the complete program for all remaining cancers would cost roughly $30 to $50 million.
For perspective, developing a single new cancer drug costs an average of $2.6 billion and takes 12 to 15 years. This entire algorithmic detection program could be completed for less than 2 percent of that cost in a fraction of the time.
Figure 4. Cost comparison: building all 13 cancer algorithms ($40M) vs. single drug development ($2.6B) vs. treating 125 late-stage patients for one year ($75M).
What Happens When the Algorithm Flags You?
For patients flagged by these algorithms, the critical next question is what test confirms the finding? Some cancers already have established follow-up protocols: colonoscopy for colorectal cancer and low-dose CT for lung cancer. However, many of the cancers on this list, including kidney, ovarian, pancreatic, and bladder, have no standard screening test at all. That is precisely why they are caught late and why they kill so efficiently.
Full-body MRI addresses this gap. The Prenuvo Polaris study, presented at the American Association for Cancer Research, scanned 1,011 people with no symptoms. It found biopsy-confirmed cancer in 2.2 percent of participants. Sixty-four percent of the cancers detected were still localized, meaning they were caught while they were curable. Sixty-eight percent were cancer types for which no screening test currently exists, cancers that would have been found only after symptoms appeared, months or years later, at a stage where survival rates collapse.
The false positive profile is reassuring. Of the 41 people referred for biopsy, 51 percent had confirmed cancer or masses requiring treatment. Compare that to mammography, where 39 to 50 percent of biopsied findings turn out to be cancer. The biopsy yield from full-body MRI is right in line with what we already consider worthwhile for screening. The negative predictive value was 99.8 percent: among participants whose scans came back clear, virtually all remained cancer-free for at least one year.
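The comparisons above reduce to simple arithmetic. A quick check, using approximate counts reconstructed from the reported percentages (the exact study counts may differ slightly):

```python
# Quick arithmetic check on the screening statistics reported above.
# Counts are approximate, reconstructed from the published percentages.
scanned = 1011
biopsied = 41
biopsy_positive = 21                       # ~51% of the 41 biopsies

detection_rate = 22 / scanned              # ~2.2% biopsy-confirmed cancer
biopsy_yield = biopsy_positive / biopsied  # comparable to mammography's 39-50%

clear_scans = scanned - biopsied           # participants not referred for biopsy
missed = round(clear_scans * (1 - 0.998))  # NPV of 99.8% implies ~2 missed cancers

print(f"detection rate: {detection_rate:.1%}")
print(f"biopsy yield:   {biopsy_yield:.1%}")
print(f"implied missed cancers among clear scans: {missed}")
```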
Figure 5. Full-body MRI screening performance compared to mammography. Source: Prenuvo Polaris study (n=1,011).
The Opportunity
The total investment to build every remaining cancer detection algorithm for routine blood work is less than the cost of treating 125 late-stage cancer patients for a single year. The return is an estimated 100,000 to 175,000 lives saved annually, using blood tests already drawn and analyzed, with data already sitting in electronic medical records.
The blood is already telling us. We just need to start listening.
Beyond Cancer
The Bigger Picture
Cancer is not the only killer hiding in your blood work, and it may not even be the biggest one.
Cardiovascular disease kills 700,000 Americans every year. More than half of all heart attack victims had no prior symptoms. Type 2 diabetes kills 89,000 directly and contributes to hundreds of thousands more deaths through complications. Chronic kidney disease kills 57,000 and sends another 130,000 patients onto dialysis annually, at $90,000 per patient per year. Heart failure kills 68,000. Fatty liver disease, now the most common liver disorder in the United States, kills 21,000 and is progressing silently in one out of every four American adults.
Every one of these diseases progresses slowly, over years or decades, while changing blood chemistry at every stage; every one of those blood chemistry changes appears in the same routine blood tests drawn at your annual physical.
The algorithms that detect cancer from routine blood work use a proven methodology: train machine learning on the blood test records of people who later developed disease, and teach the computer to recognize the patterns that preceded diagnosis. That methodology applies to every chronic disease with a blood signature, and the Israeli scientists proved it with cancer. The same approach works for heart disease, diabetes, kidney failure, liver disease, and more than a dozen rarer conditions that devastate patients because they are rarely caught early.
What Your Blood Is Already Saying
Consider chronic kidney disease. Your kidneys can lose half their function before the standard blood test value, creatinine, crosses the threshold that triggers concern. However, the decline is not silent. Creatinine creeps from 0.9 to 1.0 to 1.1 mg/dL over three years while remaining technically normal. Blood urea nitrogen drifts upward. Albumin edges downward. Individually, every number passes inspection. Together, they trace a trajectory toward kidney failure that machine learning detects two to five years before conventional diagnosis, with accuracy scores of 0.83 to 0.88.
Or consider type 2 diabetes: the disease develops over 5 to 10 years. During that time, fasting glucose rises gradually from 88 to 94 to 98 mg/dL, always staying below the 100 mg/dL prediabetes threshold. A1C drifts from 5.2 to 5.5 to 5.8. Triglycerides climb. HDL cholesterol drops. The metabolic panel, the lipid panel, and the A1C are all whispering the same warning, but no single value triggers an alert. Algorithms reading all three panels simultaneously predict diabetes onset three to five years before diagnosis, when lifestyle intervention (7 percent weight loss plus 150 minutes of weekly exercise) reduces progression by 58 percent. The disease can be prevented entirely, but only if it is caught in time.
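A minimal sketch of the kind of trend analysis described above, using hypothetical serial glucose values and an ordinary least-squares slope (real models use far richer features and many more analytes):

```python
# Hypothetical serial fasting glucose values (mg/dL) over five annual physicals.
# Each reading is below the 100 mg/dL prediabetes threshold, so no single
# result triggers an alert -- but the trajectory is steadily upward.
years = [0, 1, 2, 3, 4]
glucose = [88, 91, 94, 96, 98]

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

trend = slope(years, glucose)          # mg/dL per year
assert all(g < 100 for g in glucose)   # every value "normal" in isolation
print(f"glucose trend: +{trend:.1f} mg/dL per year")
```

A sustained upward slope like this crosses the diabetic threshold within a few years; flagging the trajectory, rather than any single value, is what buys the three-to-five-year lead time.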
Cardiovascular disease follows the same pattern. Machine learning that analyzes lipid trends, inflammatory markers from the complete blood count, and metabolic panel values achieves an accuracy of 0.78 to 0.82 for predicting heart attacks and strokes 5 to 10 years before they occur, substantially outperforming the Framingham Risk Score, which most physicians rely on today.
Figure 6. Demonstrated algorithm accuracy for non-cancer diseases compared to deployed cancer detection algorithms. AUC of 0.50 = coin flip; 1.0 = perfect prediction.
Figure 7. Detection lead time: how far in advance algorithms can identify disease before conventional diagnosis.
The Diseases Doctors Almost Never Catch
Some of the most compelling opportunities involve conditions that physicians rarely consider until catastrophic damage has occurred.
Familial hypercholesterolemia is a genetic disorder affecting 1 in 250 people that causes heart attacks in the 30s and 40s. It is massively underdiagnosed, even though it produces a distinctive lipid signature (LDL cholesterol consistently above 190 mg/dL from young adulthood, specific LDL-to-HDL ratios) that appears on every lipid panel these patients have ever received. Aggressive statin therapy started in the 20s or 30s prevents cardiovascular disease almost entirely. An algorithm flagging this pattern from population-level lipid data could prevent an estimated 10,000 to 15,000 premature cardiac deaths per year.
Hemochromatosis, iron overload, affects 1 in 200 people of Northern European descent, making it one of the most common genetic disorders in the United States. It causes cirrhosis, heart failure, diabetes, and arthritis. It is almost always diagnosed after organ damage has occurred. Yet the metabolic panel tells the story five to ten years earlier: glucose rises slowly as iron damages the pancreas, liver enzymes trend upward, calcium declines. Treatment is simple phlebotomy, essentially blood donation, and if started early, it prevents every complication. An algorithm detecting this pattern would be one of the highest-value, lowest-cost interventions in preventive medicine.
Wilson disease, copper accumulation causing liver failure and neurological devastation, affects 1 in 30,000 people. It is typically diagnosed after irreversible brain damage. Addison's disease, adrenal insufficiency, kills through sudden adrenal crisis: collapse, shock, and organ failure arriving without warning. For both diseases, blood chemistry changes precede catastrophe by years. Algorithms reading those patterns could prompt simple, definitive testing long before irreversible harm.
The Stakes
Figure 9. Estimated deaths preventable annually across 14 non-cancer conditions. Total: ~300,000 to 500,000 per year. Assumes 50% population penetration.
The combined toll of these diseases exceeds 1.2 million American deaths per year. Conservative estimates, assuming only 50 percent of the population receives annual blood testing with algorithm analysis, project that 300,000 to 500,000 of those deaths could be prevented. Cardiovascular disease and sepsis account for the largest share, but diabetes, kidney disease, and heart failure each represent tens of thousands of preventable deaths.
The Economics
The financial case is overwhelming. A heart attack costs $20,000 to $50,000 for initial treatment, plus ongoing costs for heart failure management that can reach $200,000 over a decade. Preventing that heart attack through early detection and intervention costs a few thousand dollars. Dialysis costs $90,000 per patient per year, but detecting kidney disease at Stage 2 instead of Stage 4 delays dialysis by 5 to 10 years, saving hundreds of thousands of dollars per patient. Diabetes complications, blindness, amputations, cardiovascular disease, and kidney failure cost an average of $250,000 per patient over a lifetime. The Diabetes Prevention Program demonstrated that a $3,500 lifestyle intervention prevents the disease entirely in 58 percent of high-risk patients.
Figure 8. Per-patient cost comparison: treating disease after late detection vs. intervening after early algorithmic detection.
The Full Picture
Add these numbers to the cancer detection estimates, and the total opportunity comes into focus. Algorithmic analysis of routine blood tests, the same six panels drawn at every annual physical, could prevent 400,000 to 675,000 deaths per year in the United States. That represents 13 to 22 percent of all annual deaths. No new tests, no new blood draws, no change in patient behavior. Just a smarter way to analyze the data we already collect 200 million times a year.
The algorithms for cancer are further along. Two are deployed, two more validated, and nine more buildable. The non-cancer algorithms are at an earlier stage, with research demonstrating feasibility across all major conditions. The methodology is identical. The training data sits in the same electronic medical records. The development cost per algorithm remains $2 to $5 million.
The blood tests are already being drawn. The diseases are already leaving their fingerprints. The only thing missing is the software that reads them.
Figure 10. Combined annual impact of cancer and non-cancer algorithmic early detection. Total: 400,000 to 675,000 deaths preventable per year (13–22% of all U.S. annual deaths). Assumes 50% population penetration.
The Evidence Behind the Numbers
This document provides the methodology, citations, and reasoning behind every major claim in the preceding chapters. It is designed to answer the questions that policymakers, physicians, and informed readers will ask.
Part I: The Cancer Selection
Why thirteen cancers and not twenty or fifty?
The 13 cancers on this list were selected because each one meets all four of the following criteria. First, the cancer must be lethal enough to justify the effort: each of these 13 cancers kills at least 2,000 Americans per year. Second, the cancer must show a large survival gap between early and late detection: for every cancer on this list, catching it at Stage I instead of Stage IV at least doubles the five-year survival rate, and for most cancers the improvement is fivefold or greater. Third, the cancer must produce detectable changes in one or more of the six standard blood panels drawn at annual physicals: the complete blood count (CBC), the comprehensive metabolic panel (CMP), liver function tests, renal function tests, thyroid-stimulating hormone, or hemoglobin A1C. Fourth, there must be a plausible biological mechanism explaining why the cancer would alter those blood values.
Cancers that fail any one of these criteria were excluded. Prostate cancer, for example, has a well-established screening test already (the PSA blood test), and its inclusion in a routine panel algorithm would add complexity without addressing an unmet need. Breast cancer is excluded because its primary detection mechanism is imaging rather than blood chemistry. Brain cancers were excluded because most do not produce systemic blood changes detectable in routine panels until very late stages. Melanoma is excluded because it is a surface cancer detected visually rather than through bloodwork.
Thirteen is not a ceiling. As the methodology matures and more institutions contribute data, additional cancers may qualify. However, these 13 represent the clearest, best-documented opportunities where the science, the biology, and the unmet clinical need converge.
What is the biological basis for detecting cancer through routine blood tests?
Cancer is a systemic disease, even when the tumor is localized. Every tumor needs a blood supply, which triggers the formation of new blood vessels and the release of signaling molecules into the bloodstream. Every tumor triggers an inflammatory response, disrupting the balance of white blood cell populations. Every tumor consumes glucose and releases metabolic waste. Additionally, many cancers cause microscopic bleeding that depletes iron stores and alters red blood cell characteristics over weeks and months.
These processes change the composition of blood in ways that are too subtle for a human physician to notice, but machine learning can detect them by analyzing multiple blood values simultaneously and tracking their trajectories over time.
Colorectal and gastric cancers bleed into the gastrointestinal tract, causing a gradual iron deficiency that shows up as declining hemoglobin, shrinking red blood cells (lower mean corpuscular volume), and widening variation in red blood cell size (rising red cell distribution width). Lung cancer triggers systemic inflammation: neutrophil counts rise, lymphocyte counts fall, and platelets increase as part of the inflammatory cascade. Liver cancer disrupts the organ that produces clotting factors and metabolizes toxins, leading to changes in liver enzymes, albumin, and bilirubin that appear on liver function tests. Pancreatic cancer compresses the bile duct as it grows, raising alkaline phosphatase and bilirubin months before the tumor is large enough to detect on imaging, while simultaneously damaging insulin-producing cells and raising blood glucose. Multiple myeloma produces excess immunoglobulins that raise total protein, while bone destruction raises calcium, and kidney damage from abnormal proteins raises creatinine. Leukemia and lymphoma directly alter white blood cell counts, red blood cell production, and platelet levels because they originate in the blood-forming system itself.
The critical insight is that individual values may remain within normal reference ranges, while patterns across multiple values and trends over time reveal elevated risk. This is precisely what machine learning captures and what threshold-based evaluation by human physicians cannot see. 1,2,3,4,5
What peer-reviewed evidence supports the claim that algorithms can detect these cancers?
Validated and Deployed (2 cancers)
Colorectal cancer: The ColonFlag algorithm was developed using data from 606,403 patients from Maccabi Healthcare Services in Israel and externally validated in 30,674 patients in the United Kingdom. It achieved an area under the curve (AUC) of 0.82, with odds ratios of 26 in Israel and 40 in the United Kingdom at a 0.5 percent false positive rate. These findings were published in the Journal of the American Medical Informatics Association in 2016 and separately validated at Kaiser Permanente with an odds ratio of 34.7 at 99 percent specificity. ColonFlag was deployed clinically at Geisinger Health System, where it achieved an eightfold improvement in cancer detection among patients who completed colonoscopy. 1,6,8
Lung cancer: The LungFlag algorithm was validated across 6,505 non-small cell lung cancer cases and 189,597 controls at Kaiser Permanente Southern California. It achieved an AUC of 0.856 in the 9- to 12-month window before clinical diagnosis, with 40.1 percent sensitivity at 95 percent specificity and a diagnostic odds ratio of 12.7. These findings were published in the American Journal of Respiratory and Critical Care Medicine in 2021. The algorithm outperformed both the U.S. Preventive Services Task Force (USPSTF) categorical screening criteria and the PLCOm2012 quantitative risk model. 2
Validated in Large Studies, Awaiting Deployment (2 cancers)
Liver cancer (hepatocellular carcinoma): Validated in a Hong Kong territory-wide study of more than 75,000 patients using a model trained on CBC, liver function, and renal function test values. Achieved 80 percent sensitivity and 81 percent specificity, outperforming AFP (the current tumor marker standard) for early-stage detection. These findings were presented at the ESMO Gastrointestinal Oncology Congress in 2025. 3
Gastric cancer: Validated in a 193,000-patient Hong Kong cohort using machine learning trained on CBC, liver function, and renal function values. Achieved 79 to 96 percent sensitivity. These findings were presented as an abstract at the American Society of Clinical Oncology annual meeting in 2023. 4
Algorithm Needed, Biology Documented (9 cancers)
For the remaining nine cancers, no deployed algorithm yet exists. However, the biological mechanisms by which each cancer alters routine blood chemistry are documented in the peer-reviewed literature, and the methodology for building detection algorithms from this data has been proven through the four cancers above. The Siemens Healthineers "Deep Profiler" model has already demonstrated that a single model analyzing 33 CBC and CMP parameters can simultaneously detect colorectal (AUC 0.76), liver (AUC 0.85), and lung cancer (AUC 0.78), confirming that the multi-cancer, multi-panel approach works. 5
Pancreatic cancer: Liver enzyme (alkaline phosphatase, GGT) and bilirubin elevations from bile duct compression are documented 6 to 18 months before clinical diagnosis. Glucose rises as the tumor damages insulin-producing cells. 10
Multiple myeloma: Cross-panel signature documented two to five years before diagnosis: rising total protein, rising calcium, rising creatinine, declining hemoglobin, and declining albumin. 11
Leukemia: CBC is already the primary diagnostic tool. Abnormal white blood cell counts, anemia, and thrombocytopenia appear months before clinical diagnosis. Currently detected incidentally; no systematic population screening is in place. 12
Lymphoma: CBC changes, including elevated white blood cells, anemia, and inflammatory markers, appear months to a year before diagnosis. Often detected through unexpected CBC abnormalities. 12
Ovarian cancer: Inflammatory signature in CBC and CMP values is detectable before clinical presentation. Multi-cancer early detection research (OncoSeek) confirmed that systemic blood chemistry changes from ovarian tumors are detectable. 13
Esophageal, bladder, kidney, and thyroid cancers: Each can produce documented changes in one or more routine blood panels through mechanisms such as chronic blood loss, inflammation, metabolic disruption, or renal and liver involvement. The Pan-Cancer Tumor Inflammation Signature study confirmed that CBC-related inflammatory markers change 6 to 18 months before clinical cancer diagnosis across multiple tumor types. 12,13
Part II: How We Calculated the Numbers
Where do the "lives saveable" estimates come from?
Each estimate is calculated by multiplying three published data points together.
Step 1: Annual deaths. We use cancer mortality figures from the American Cancer Society's Cancer Statistics 2024, the most widely cited source for United States cancer mortality data. These numbers are derived from the National Cancer Institute's SEER (Surveillance, Epidemiology, and End Results) database and CDC (Centers for Disease Control and Prevention) death certificate data. 6,9
Step 2: Stage-shift survival benefit. We use five-year relative survival rates by stage at diagnosis from the same SEER database. The "stage-shift effect" is the survival improvement gained by detecting cancer at an earlier stage. For example, lung cancer detected at the localized stage has a 60 percent five-year survival rate versus 8 percent for distant-stage disease. The survival benefit from shifting the diagnosis from late to early stage for each cancer is published in Cancer Statistics 2024.
Step 3: Population penetration. We assume 50 percent of the United States adult population receives annual blood testing with algorithm integration; this is conservative, as approximately 200 million routine blood panels are already drawn annually in the United States. We also assume that the algorithm's performance is comparable to the validated ColonFlag and LungFlag platforms. The formula is Annual Deaths × (Early-Stage Survival Rate − Late-Stage Survival Rate) × 50% Population Penetration = Estimated Lives Saveable.
Example: Lung cancer. 127,000 Annual Deaths × (60% Early-Stage Survival − 8% Late-Stage Survival) × 50% Penetration = Approximately 33,000 Lives. We report a range (32,000 to 45,000) to account for uncertainty in algorithm sensitivity, the fraction of cancers detectable in the blood, and variability in screening compliance.
The estimates are deliberately conservative, but full population penetration would roughly double them. These are not projections of what will happen; they are calculations of what is biologically and mathematically achievable if the algorithms are built and deployed.
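The three-step formula reduces to a one-line calculation. A minimal sketch in Python (the function name is ours; the figures are those from the lung cancer worked example above):

```python
def lives_saveable(annual_deaths, early_survival, late_survival, penetration=0.50):
    """Estimated lives saveable = annual deaths x stage-shift survival benefit x penetration."""
    return annual_deaths * (early_survival - late_survival) * penetration

# Worked example from the text: lung cancer.
lung = lives_saveable(127_000, 0.60, 0.08)
print(round(lung))  # prints 33020, which the text rounds to "approximately 33,000"
```

Setting penetration to 1.0 shows why full population coverage would roughly double each estimate.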
Where do the survival rates come from?
All survival rates are five-year relative survival rates from the SEER database, as published in the American Cancer Society's Cancer Statistics 2024. SEER collects data from cancer registries covering approximately 48 percent of the United States population and is the gold standard for cancer survival statistics. The rates represent survival among patients diagnosed at each stage, compared to expected survival in the general population of the same age.
Why do you report ranges instead of single numbers?
Because the estimates involve three multiplicative assumptions, each of which carries uncertainty. Algorithm sensitivity varies by cancer type and has been measured precisely only for colorectal and lung cancer. Screening compliance varies across populations and delivery mechanisms. Not all cancers within each category may produce detectable blood signatures early enough for intervention. The ranges reflect this uncertainty honestly; the low end of each range assumes lower algorithm sensitivity and lower compliance. The high end assumes performance comparable to that already demonstrated for colorectal and lung cancer.
Part III: What Would It Cost?
Where does the $2–5 million per algorithm estimate come from?
Algorithm development costs are estimated by analogy to the known costs of the two algorithms already built, adjusted for the assumption that electronic medical record data is contributed by partner institutions at no cost. The cost components for each algorithm are as follows.
Data science team: A team of three to five machine learning researchers and clinical informaticists working for 12 to 18 months. At fully loaded academic or industry salaries, this represents $600,000 to $1.5 million.
Computational resources: Cloud computing for training gradient-boosted models on datasets of hundreds of thousands to millions of patient records. Modern cloud infrastructure makes this relatively inexpensive: $50,000 to $200,000 per algorithm.
Clinical validation: Statistical analysis, clinical interpretation, preparation and submission of peer-reviewed publications, and internal/external validation across independent populations: $200,000 to $500,000 per algorithm.
Regulatory clearance: FDA 510(k) pathway for algorithms analyzing existing laboratory tests (the pathway used by ColonFlag and LungFlag). Preparation, submission, and review: $500,000 to $1.5 million per algorithm, depending on the amount of clinical data required.
Project management and overhead: $200,000 to $500,000 per algorithm.
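The per-algorithm total is the sum of the five component ranges. A minimal sketch (figures copied from the list above; the dictionary layout is ours):

```python
# Per-algorithm cost components from the text, in millions of dollars (low, high).
components = {
    "data science team":       (0.60, 1.50),
    "computational resources": (0.05, 0.20),
    "clinical validation":     (0.20, 0.50),
    "regulatory clearance":    (0.50, 1.50),
    "project management":      (0.20, 0.50),
}

low = sum(lo for lo, hi in components.values())
high = sum(hi for lo, hi in components.values())
print(f"${low:.2f}M to ${high:.2f}M per algorithm")  # prints $1.55M to $4.20M per algorithm
```

The text quotes approximately $1.5 million to $5 million per algorithm, the upper bound leaving headroom for cancers requiring novel cross-panel approaches or larger validation studies.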
The total per algorithm thus ranges from approximately $1.5 million (for a straightforward extension of existing methodology to a new cancer with abundant training data) to $5 million (for a cancer requiring novel cross-panel approaches or larger validation studies). The industry benchmark supports this range: diagnostic tests in general cost $50 million to $75 million to develop on average, but the vast majority of that cost is consumed by clinical trials for novel biomarkers or devices. Algorithms analyzing existing laboratory tests, using retrospective data from electronic medical records, bypass most of that expense.
Why is the electronic medical record data "free"?
Large health systems already maintain electronic medical records containing years of laboratory results alongside diagnostic records. The data exists and is stored in databases at institutions like Kaiser Permanente (4+ million members), Geisinger (500,000+ patients), and Maccabi Healthcare Services (2.5 million members). When we say "free," we mean that no new data collection is required. The partner institutions would contribute de-identified datasets under data use agreements, as they did for the original ColonFlag and LungFlag studies. The cost of data extraction, cleaning, and de-identification is included in the per-algorithm estimates above. What is not required is the expensive step of collecting new biological samples or running new laboratory tests, which is the dominant cost in most biomarker discovery programs.
Why does a single cancer drug cost $2.6 billion while all thirteen algorithms cost $30–50 million?
Cancer drug development requires synthesizing and testing novel chemical compounds, conducting Phase I, II, and III clinical trials involving thousands of patients over 12 to 15 years, managing manufacturing scale-up, and covering the costs of the many drug candidates that fail. The $2.6 billion figure (from published analyses of pharmaceutical R&D costs) includes both direct costs ($1.4 billion) and the opportunity cost of capital over the development timeline ($1.2 billion). Diagnostic algorithms bypass almost all of this; they analyze existing data, utilize computational methods rather than physical experiments, can be validated retrospectively before prospective testing, and follow a simpler regulatory pathway. The cost difference is structural, not a matter of cutting corners. 7
Part IV: Non-Cancer Disease Detection
Why do you claim the same blood tests can detect heart disease, diabetes, and kidney failure?
For the same reason they can detect cancer: chronic diseases change blood chemistry gradually over years, and those changes appear in routine blood panels long before conventional diagnosis. The key difference is that most chronic diseases progress even more slowly than cancer, giving algorithms an even longer detection window.
Chronic kidney disease alters creatinine, blood urea nitrogen, and albumin over a two- to five-year window before conventional diagnostic thresholds are crossed. Type 2 diabetes affects glucose, A1C, triglycerides, and HDL over a three- to five-year prediabetic phase. Cardiovascular risk is reflected in lipid trends, inflammatory markers on the CBC, and metabolic panel values 5 to 10 years before a heart attack or stroke. Each of these patterns has been demonstrated in peer-reviewed research with published accuracy metrics. 14,16,17,18
Where do the 300,000–500,000 "deaths preventable" numbers come from for non-cancer diseases?
The calculation follows the same logic as the cancer estimates: annual deaths from each condition, multiplied by the published efficacy of early intervention, multiplied by 50 percent population penetration. The two largest contributors are cardiovascular disease (700,000 annual deaths, with 30 to 40 percent estimated preventable through earlier detection and intervention) and sepsis (270,000 deaths, with mortality reducible from 30 to 10 percent through earlier recognition and treatment). The Diabetes Prevention Program provides the strongest single data point: lifestyle intervention reduces diabetes incidence by 58 percent in patients identified during the prediabetic phase. For chronic kidney disease, early detection delays dialysis by 5 to 10 years, directly preventing deaths from dialysis-related complications. 14,16,17,18,20,21
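The non-cancer arithmetic is the same three-factor multiplication. A sketch using the cardiovascular figures quoted above (function and variable names are ours):

```python
def deaths_preventable(annual_deaths, preventable_fraction, penetration=0.50):
    """Same logic as the cancer estimates: deaths x intervention efficacy x penetration."""
    return annual_deaths * preventable_fraction * penetration

# Cardiovascular disease, using the 30-40 percent range quoted in the text.
low = deaths_preventable(700_000, 0.30)
high = deaths_preventable(700_000, 0.40)
print(f"{low:,.0f} to {high:,.0f} deaths preventable per year")
```

Cardiovascular disease alone thus contributes roughly 105,000 to 140,000 of the 300,000 to 500,000 total.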
How reliable are the accuracy numbers for non-cancer algorithms?
The accuracy metrics cited (AUC of 0.78 to 0.88 depending on the disease) come from published research studies, not from deployed clinical systems. This is an important distinction. The cancer algorithms for colorectal and lung cancer have been validated in prospective, real-world clinical settings across multiple institutions. The non-cancer disease algorithms have been demonstrated in research but not yet deployed at scale. The accuracy numbers are therefore best understood as proof of concept: they establish that the patterns exist and are detectable, but real-world performance will need to be confirmed through the same kind of institutional deployment that validated ColonFlag and LungFlag.
What about the rare diseases: hemochromatosis, Wilson disease, Addison's disease?
These are among the most compelling cases, precisely because they are so rarely caught early. Each of these diseases produces a distinctive blood chemistry signature that is well documented in medical literature. Hemochromatosis shows rising glucose (as iron damages the pancreas), slightly elevated liver enzymes, and declining calcium, 5 to 10 years before cirrhosis develops. Wilson disease shows disproportionate liver enzyme elevation, with declining albumin and rising bilirubin, three to seven years before neurological damage. Addison's disease shows gradually declining sodium, rising potassium, and downward-trending glucose, 6 to 18 months before adrenal crisis. In each case, simple, inexpensive treatment (phlebotomy for hemochromatosis, chelation for Wilson disease, hormone replacement for Addison's) prevents all serious complications when the disease is caught early. An algorithm reading these patterns would be one of the highest-value, lowest-cost interventions in medicine. 19,20,21,22
Part V: Full-Body MRI as Confirmation
Where does the claim come from that full-body MRI has a false positive rate comparable to mammography?
The Prenuvo Polaris study, presented at the American Association for Cancer Research, scanned 1,011 asymptomatic individuals at a single clinic in Canada. Of the 1,011 scanned, 41 were referred for biopsy based on MRI findings. Of those 41 biopsies, 21 (51 percent) confirmed cancer or a mass requiring treatment. Compare that to mammography, where published data show that 39 to 50 percent of biopsied findings turn out to be cancer. The biopsy yield from full-body MRI is therefore equivalent to or better than that from mammography, the screening test most widely accepted in medicine. The negative predictive value of the Prenuvo scan was 99.8 percent: among participants whose scans were clear, almost all remained cancer-free for at least one year of follow-up.
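The biopsy-yield comparison is simple arithmetic. A minimal sketch using the Polaris figures above (variable names are ours):

```python
# Biopsy yield (fraction of biopsied findings confirming cancer or a mass
# requiring treatment) from the Prenuvo Polaris study.
referred_for_biopsy = 41
confirmed = 21
mri_yield = confirmed / referred_for_biopsy
print(f"Whole-body MRI biopsy yield: {mri_yield:.0%}")  # prints 51%

# Published mammography biopsy yield, for comparison: 39 to 50 percent.
mammography_yield = (0.39, 0.50)
```

At 51 percent, the MRI yield sits at or above the top of the mammography range.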
Is a single study of 1,011 people enough evidence?
It is a starting point, not a final answer. The Polaris study is the first prospective dataset from asymptomatic individuals undergoing whole-body MRI screening. Its statistical power is limited by sample size, and its generalizability is limited by its single-center design. However, Prenuvo has launched Project Hercules, a 100,000-person study across multiple locations designed to validate and extend these findings. Single studies are how medical progress begins: low-dose CT screening for lung cancer was established by one randomized trial, the National Lung Screening Trial. The Polaris data is compelling enough to justify further study and carefully designed pilot programs, which we recommend.
What about the cancers with no existing screening test?
This is the critical gap that full-body MRI fills. For the six cancers on our list with established follow-up tests (colorectal, lung, liver, leukemia, lymphoma, and thyroid), the algorithm flag triggers a specific next step: colonoscopy, low-dose CT, liver ultrasound, or specialist referral. For the remaining seven cancers (kidney, ovarian, pancreatic, bladder, esophageal, gastric, and multiple myeloma), no targeted screening test currently exists. These are precisely the cancers that kill most efficiently because they are found only after symptoms develop. Full-body MRI can detect solid tumors as small as 1 to 2 centimeters across multiple organs in a single scan. The Polaris study found that 68 percent of cancers detected were types for which no screening test currently exists. This makes full-body MRI the logical next step after a blood algorithm indicates increased risk in a cancer category without an established screening pathway.
Part VI: Anticipated Objections
These estimates assume algorithms that don't exist yet. Isn't this speculative?
Two of the 13 cancer algorithms are built, validated, and deployed in clinical practice. Two more have been validated in large studies and are ready for deployment. The methodology for building the remaining nine is identical to that used to produce the first four. The training data exist in the same electronic medical records. The biological mechanisms are documented in peer-reviewed literature. Calling the remaining algorithms "speculative" would be like saying in 1970 that a vaccine for measles, mumps, and rubella was speculative because only the measles vaccine had been developed. The methodology was established; the extension was a matter of engineering, not invention.
What if the algorithms don't perform as well for new cancers?
This is why we report ranges rather than point estimates and describe our projections as conservative. If a new algorithm achieves an AUC of 0.75 instead of 0.82, it still identifies patients at substantially elevated risk who would benefit from targeted screening. The lives-saveable estimates would decrease, but the fundamental value proposition remains: any algorithm that shifts diagnosis from late stage to early stage saves lives. The question is not whether the algorithms will be perfect. The question is whether they will be better than the current approach, which, for most of these cancers, is no screening at all.
Who would pay for this?
The development program ($30 to $50 million total) could be funded through multiple mechanisms: federal research grants through the National Cancer Institute, philanthropic investment, health system partnerships, or industry partnerships with diagnostic companies (as Roche and Siemens have already demonstrated with the existing cancer algorithms). The per-patient cost of running the algorithms once deployed is near zero: the blood is already being drawn, the data already exist in electronic medical records, and the computational cost of running a trained algorithm on existing data is measured in fractions of a cent per patient. The economic return is overwhelming: preventing even 10,000 late-stage cancer diagnoses per year saves $2 to $5 billion in treatment costs.
Why haven't hospitals done this already?
Three reasons. First, the proof of concept is recent: the first cancer detection algorithm was published in 2016, and the first real-world deployment results appeared from 2017 to 2021. Second, healthcare moves slowly by design, and the regulatory, reimbursement, and workflow integration challenges are real, though solvable. Third, the incentive structure in United States healthcare rewards treatment over prevention. A hospital that prevents cancer makes less money than a hospital that treats it. This is a structural problem that policy intervention can address. The technology is not the bottleneck. The will to deploy it is.
References
1. Kinar Y, Kalkstein N, Akiva P, et al. "Development and Validation of a Predictive Model for Detection of Colorectal Cancer in Primary Care by Analysis of Complete Blood Counts: A Binational Retrospective Study." Journal of the American Medical Informatics Association 23:879-890 (2016). https://doi.org/10.1093/jamia/ocv195
ColonFlag development and validation study across 606,403 Israeli and 30,674 UK patients achieving AUC 0.82 for colorectal cancer prediction from CBC parameters alone, with odds ratios of 26 (Israel) and 40 (UK) at 0.5% false positive rate and 6- to 24-month detection lead time.
2. Gould MK, Huang BZ, Tammemagi MC, et al. "Machine Learning for Early Lung Cancer Identification Using Routine Clinical and Laboratory Data." American Journal of Respiratory and Critical Care Medicine 204:445-453 (2021). https://doi.org/10.1164/rccm.202007-2791oc
LungFlag validation across 6,505 non-small cell lung cancer cases at Kaiser Permanente achieving AUC 0.856 with 9- to 12-month detection lead time, demonstrating that CBC patterns identify lung cancer before imaging would detect a mass.
3. Kwok KN, Chueng KM, Lam SJL, et al. "Development of a Novel Routine Blood-Based AI Model for Hepatocellular Carcinoma Screening: A Territory-Wide Study." ESMO Gastrointestinal Oncology 10: 100241 (2025). https://www.esmogastro.org/article/S2949-8198(25)00110-4/fulltext
Hong Kong territory-wide validation of CBC/LFT/RFT-based AI model for liver cancer detection across 75,000+ patients; achieved 80% sensitivity and 81% specificity, outperforming AFP in early-stage detection.
4. Wong TCB, Lam SJL, Cheung KM, et al. "AI Blood Signature in Common Blood Tests for Detection of Gastric Cancer in a Cohort of 190,000 Individuals." Journal of Clinical Oncology 41(16_suppl), Abstract 1500 (2023).
193,000-patient Hong Kong validation of machine learning model for gastric cancer detection using CBC, LFT, and RFT values, achieving 79 to 96% sensitivity.
5. Singh V, Chaganti S, Siebert M, et al. "Deep Learning-Based Identification of Patients at Increased Risk of Cancer Using Routine Laboratory Markers." Scientific Reports 15:12661 (2025). https://doi.org/10.1038/s41598-025-97331-6
Siemens Healthineers "Deep Profiler" model validated for simultaneous colorectal (AUC 0.76), liver (AUC 0.85), and lung cancer (AUC 0.78) detection from 33 standard CBC and CMP parameters.
6. Siegel RL, Giaquinto AN, Jemal A. "Cancer Statistics, 2024." CA: A Cancer Journal for Clinicians 74:12-49 (2024). https://doi.org/10.3322/caac.21820
Comprehensive annual cancer statistics documenting U.S. cancer mortality by type, stage distribution, and 5-year survival rates. Foundational source for all mortality and stage-shift survival calculations in this document.
7. DiMasi JA, Grabowski HG, Hansen RW. "Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs." Journal of Health Economics 47:20-33 (2016). https://doi.org/10.1016/j.jhealeco.2016.01.012
Cancer drugs average $2.6 billion total development cost.
8. Hornbrook MC, Goshen R, Choman E, et al. "Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data." Digestive Diseases and Sciences 62:2719-2727 (2017). https://pubmed.ncbi.nlm.nih.gov/28836087
Kaiser Permanente validation of ColonFlag demonstrates an odds ratio of 34.7 at 99% specificity. Among flagged patients completing colonoscopy, 8% had cancer, 22% had advanced adenomas.
9. American Cancer Society. "Cancer Facts & Figures 2024." American Cancer Society (2024). https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/2024-cancer-facts-figures.html
Comprehensive epidemiological data on cancer incidence, mortality, and survival in the United States. Source for individual cancer death counts used throughout this document.
10. Sharma C, Eltawil KM, Renfrew PD, et al. "Advances in Diagnosis, Treatment and Palliation of Pancreatic Carcinoma: 1990-2010." World Journal of Gastroenterology 17:867-897 (2011). https://doi.org/10.3748/wjg.v17.i7.867
Documents the pre-diagnostic blood signature of pancreatic cancer: alkaline phosphatase and GGT rising from bile duct compression, bilirubin increasing, albumin declining, and glucose rising from pancreatic cell damage, 6 to 18 months before clinical diagnosis.
11. Kyle RA, Gertz MA, Witzig TE, et al. "Review of 1027 Patients with Newly Diagnosed Multiple Myeloma." Mayo Clinic Proceedings 78:21-33 (2003). https://doi.org/10.4065/78.1.21
Documents the distinctive cross-panel blood signature preceding myeloma: rising total protein, rising calcium, rising creatinine, declining hemoglobin, and declining albumin, appearing 2 to 5 years before clinical diagnosis.
12. American Cancer Society. "Key Statistics for Leukemia" and "Key Statistics for Non-Hodgkin Lymphoma." American Cancer Society (2024). https://www.cancer.org/cancer/types/non-hodgkin-lymphoma.html
Projects 62,770 new leukemia cases (23,670 deaths) and 89,190 new lymphoma cases (21,050 deaths) in 2024. Notes CBC as primary diagnostic tool for hematologic malignancies.
13. Hinestrosa JP, Kurzrock R, Lewis JM, et al. "Early-Stage Multi-Cancer Detection Using an Extracellular Vesicle Protein-Based Blood Test." Communications Medicine 2:29 (2022). https://doi.org/10.1038/s43856-022-00088-6
Pilot study demonstrating that EV protein biomarkers isolated from plasma can detect Stage I and II pancreatic, ovarian, and bladder cancers simultaneously, achieving AUC 0.95 with 71.2% sensitivity at 99.5% specificity.
14. Ambale-Venkatesh B, Yang X, Wu CO, et al. "Cardiovascular Event Prediction by Machine Learning: The Multi-Ethnic Study of Atherosclerosis." Circulation Research 121:1092-1101 (2017). https://doi.org/10.1161/CIRCRESAHA.117.311312
Machine learning analyzing lipid panels, CMP values, and inflammatory markers predicted cardiovascular events 5 to 10 years before occurrence with AUC 0.78 to 0.82, outperforming the Framingham Risk Score.
15. Khera AV, Won HH, Peloso GM, et al. "Diagnostic Yield and Clinical Utility of Sequencing Familial Hypercholesterolemia Genes in Patients With Severe Hypercholesterolemia." Journal of the American College of Cardiology 67:2578-2589 (2016). https://doi.org/10.1016/j.jacc.2016.03.520
Documents the distinctive lipid signature of familial hypercholesterolemia and demonstrates that early statin therapy prevents premature heart attacks in the 30s and 40s.
16. Knowler WC, Barrett-Connor E, Fowler SE, et al. "Reduction in the Incidence of Type 2 Diabetes with Lifestyle Intervention or Metformin." New England Journal of Medicine 346:393-403 (2002). https://doi.org/10.1056/NEJMoa012512
Landmark Diabetes Prevention Program: 7% weight loss plus 150 minutes weekly exercise reduced diabetes incidence by 58% in prediabetic patients, establishing that early identification enables prevention.
17. Razavian N, Blecker S, Schmidt AM, et al. "Population-Level Prediction of Type 2 Diabetes from Claims Data and Analysis of Risk Factors." Big Data 3:277-287 (2015). https://doi.org/10.1089/big.2015.0020
Deep learning using routine lab values predicted diabetes onset 3 to 5 years before diagnosis with AUC 0.80, identifying high-risk patients when glucose remained below diagnostic thresholds.
18. Tangri N, Grams ME, Levey AS, et al. "Multinational Assessment of Accuracy of Equations for Predicting Risk of Kidney Failure: Meta-Analysis." Journal of the American Medical Association 315:164 (2016). https://doi.org/10.1001/jama.2015.18202
Machine learning predicting CKD progression from routine CMP values achieved C-statistics of 0.83 to 0.88, enabling Stage 1 to 2 detection years before conventional diagnosis.
19. Roberts EA, Schilsky ML. "Diagnosis and Treatment of Wilson Disease: An Update." Hepatology 47:2089-2111 (2008). https://doi.org/10.1002/hep.22261
Documents Wilson disease liver function pattern appearing 3 to 7 years before neurological damage. Early chelation therapy is fully preventive of neurological complications.
20. Erichsen MM, Lovas K, Skinningsrud B, et al. "Clinical, Immunological, and Genetic Features of Autoimmune Primary Adrenal Insufficiency." Journal of Clinical Endocrinology & Metabolism 94:4882-4890 (2009). https://doi.org/10.1210/jc.2009-1368
Documents Addison's disease CMP pattern: declining sodium, rising potassium, and hypoglycemia appearing 6 to 18 months before adrenal crisis.
21. Powell LW, Seckington RC, Deugnier Y. "Haemochromatosis." Lancet 388:706-716 (2016). https://doi.org/10.1016/S0140-6736(15)01315-X
Hemochromatosis creates subtle metabolic CMP patterns 5 to 10 years before cirrhosis. Simple phlebotomy initiated early prevents all organ complications.
22. Nieman LK, Biller BMK, Findling JW, et al. "The Diagnosis of Cushing's Syndrome: An Endocrine Society Clinical Practice Guideline." Journal of Clinical Endocrinology & Metabolism 93:1526-1540 (2008). https://doi.org/10.1210/jc.2008-0125
Documents Cushing's syndrome CMP and lipid signature appearing years before diagnosis: rising glucose, declining potassium, rising triglycerides, and falling HDL.
23. Siddiqui MS, Yamada G, Vuppalanchi R, et al. "Diagnostic Accuracy of Noninvasive Fibrosis Models to Detect Change in Fibrosis Stage." Clinical Gastroenterology and Hepatology 17:1877-1885 (2019). https://doi.org/10.1016/j.cgh.2018.12.031
Machine learning analyzing AST/ALT ratios, albumin, bilirubin, and platelet counts detected NAFLD progression to fibrosis 2 to 4 years earlier than conventional criteria, with AUC 0.82.
24. Lindor KD, Bowlus CL, Boyer J, et al. "Primary Biliary Cholangitis: 2018 Practice Guidance from the American Association for the Study of Liver Diseases." Hepatology 69:394-419 (2019). https://doi.org/10.1002/hep.30145
Documents PBC's distinctive liver function signature years before cirrhosis: gradually rising alkaline phosphatase, increasing GGT, elevated cholesterol, and slowly rising bilirubin.