Life expectancy is an underused1,2 but critically important piece of information for medical decision-making.3,4 Numerous clinical guidelines recommend incorporating life expectancy into cancer screening decisions,5,6 since screening exposes patients to burdens and risks immediately while the benefits of decreased cancer mortality occur 10 years later.7,8,9,10 Screening patients who have <10 years of life expectancy exposes them to the potential harms of screening with little chance of benefit.6,10 Thus, there is a clear need for accurate, easy-to-use life expectancy calculators to appropriately target cancer screening and increase rates of guideline-concordant care.

Existing methods for life expectancy estimation have clear limitations that impair their utility to guide screening decisions. Clinicians often rely on their training and intuition to estimate life expectancy, but studies have shown that clinicians frequently overestimate life expectancy.11 Objective, data-driven mortality prediction indexes have been developed and are available at websites such as ePrognosis.com.12,13,14,15,16,17,18 However, these mortality indexes are underused,1,2,3 likely due in part to the substantial time burden needed for collecting and inputting patient data into an online calculator.19

A life expectancy calculator embedded within the EHR would obviate the need for clinicians’ data entry to determine prognosis. While current clinical alerts are triggered by age, this EHR-embedded life expectancy calculator could suppress cancer screening reminders for patients with a limited life expectancy regardless of age and trigger cancer screening reminders for older patients with life expectancy ≥10 years.5 Existing mortality prediction indexes include self-reported physical functioning, which is not routinely captured in the EHR system. Thus, developing new EHR-based life expectancy calculators is a critical first step to make it easier for clinicians to provide guideline-concordant care.

As the largest vertically integrated healthcare system in the US, the Department of Veterans Affairs (VA) is an ideal setting in which to develop EHR-based life expectancy calculators to better target colorectal cancer screening. First, the VA has had a longstanding focus on preventive care.20 Second, studies have found high rates of colorectal cancer screening among Veterans with high comorbidity burden despite the potential for harm in this population,21 suggesting that targeting colorectal cancer screening through the use of life expectancy may be especially important within the VA. Finally, the VA’s EHR provides an ideal setting for life expectancy calculator development, with a large population with >10 years of follow-up and a wide variety of clinical data elements, including pharmacy, laboratory, and vital sign data.

Our objective in this study was to use the VA EHR database to build a life expectancy calculator for adults aged ≥50 years. We sought to use EHR structured data elements to develop a model with approximately 100 predictors that accurately estimates life expectancy.

METHODS

Study Population

We used the VA EHR databases to identify all Veterans aged ≥50 using VA primary care in 2005, with mortality follow-up through 2017. The first primary care visit during 2005 was designated as the index visit for each patient; we then used a 1-year “look-back” period prior to the index visit to capture six domains of predictor variables: demographics, disease diagnoses, medication use, laboratory results, vital signs, and healthcare utilization. Of the 3,971,825 VA patients aged ≥50 with outpatient visits during 2005, 3,708,465 had at least one primary care visit; of these, 3,343 (0.09%) were excluded for an invalid date of death prior to the index primary care visit, resulting in a final sample size of 3,705,122. We separated our data into an 80% random sample for model development (n=2,964,098) and the remaining 20% for model validation (n=741,024).

Predictor Assessment

For disease diagnoses, we collected all ICD-9 diagnosis codes from the inpatient and outpatient medical SAS files and applied the Clinical Classifications Software (CCS)22 to create 281 diagnosis groups. The inpatient and outpatient medical SAS files are constructed by compiling patient-level EHR data from 170 local VA Medical Center EHR systems into a centralized data warehouse.23,24 For medication use, we used the Veterans Health Administration’s National Drug File (NDF) to identify 401 unique medication classes from the Pharmacy Benefits Management file and classified use as any or none in the past year. For each of the 71 laboratory tests, we created a five-level variable: normal, abnormal low, abnormal high, nonsensical values, or not measured. Data on seven vital signs were extracted from the inpatient and outpatient medical SAS files: pulse, temperature, systolic blood pressure, respiration, pain level, BMI (weight and height), and pulse oximetry. In addition, because weight changes can be strongly indicative of poor health,25,26,27,28 we calculated two variables a priori: weight range, the largest difference between weights recorded in the year prior to the index date, and recent weight change, the difference between the most recent weight and the weight closest to two weeks prior to the index date. For healthcare utilization, we created 149 types of healthcare visits according to visit stop code and categorized visit frequency as zero, one, or at least two visits. Demographic variables (age, gender, and race/ethnicity) were extracted from the Clinical Data Warehouse. Race/ethnicity was included in descriptive characteristics but was not incorporated into model building: we did not want the observed shorter life expectancies of minority groups29,30 to influence treatment decisions in ways that could exacerbate existing racial/ethnic inequalities.31 In total, we obtained 913 predictors from the EHR to use for model building.
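As a minimal illustration of the five-level laboratory coding described above, a rule along the following lines could be used; the function, its reference-range arguments, and the serum sodium bounds in the example are hypothetical, not the study's actual specification.

```python
def categorize_lab(value, low, high, plausible_min, plausible_max):
    """Map one lab result to the five-level variable described above:
    normal, abnormal low, abnormal high, nonsensical values, or not
    measured. All arguments besides `value` are illustrative bounds."""
    if value is None:                       # test never ordered
        return "not measured"
    if value < plausible_min or value > plausible_max:
        return "nonsensical"                # physiologically implausible
    if value < low:
        return "abnormal low"
    if value > high:
        return "abnormal high"
    return "normal"

# Example with hypothetical serum sodium bounds (mmol/L):
# normal range 135-145, plausible range 100-200.
sodium_category = categorize_lab(130, 135, 145, 100, 200)  # "abnormal low"
```
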

For missing demographics (other than age) and vital signs (other than pulse oximetry and pain), we conducted a single stochastic conditional mean imputation, using a regression equation that included all variables with any missingness. Variables with missingness were gender (n=40,615; 1.1%), race/ethnicity (n=233,221; 6.3%), pulse (n=221,574; 6.0%), temperature (n=518,032; 14.0%), respiration (n=449,069; 12.1%), weight range (n=967,463; 26.1%), recent weight change (n=978,289; 26.4%), systolic blood pressure (n=215,821; 5.8%), and BMI (n=626,569; 16.9%).

Prior to modeling, we considered the functional form of predictor variables. We used a restricted cubic spline with 4 knots for age. We categorized BMI as <18.5, 18.5–25, 25–30, 30–35, >35, systolic blood pressure as <90, 90–104, 105–119, 120–139, 140–159, 160–179, ≥180, and pulse oximetry as <93, 93–95, ≥96, or no measurement, following commonly used clinical thresholds.
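The cutpoints above translate directly into simple categorization rules. The sketch below is a hypothetical implementation of the systolic blood pressure and pulse oximetry groupings (function names and label strings are illustrative).

```python
def categorize_sbp(sbp):
    """Categorize systolic blood pressure (mmHg) using the clinical
    cutpoints listed above."""
    cuts = [90, 105, 120, 140, 160, 180]
    labels = ["<90", "90-104", "105-119", "120-139",
              "140-159", "160-179", ">=180"]
    for cut, label in zip(cuts, labels):
        if sbp < cut:
            return label
    return labels[-1]          # at or above the highest cutpoint

def categorize_spo2(spo2):
    """Categorize pulse oximetry (%); None means no measurement."""
    if spo2 is None:
        return "no measurement"
    if spo2 < 93:
        return "<93"
    if spo2 <= 95:
        return "93-95"
    return ">=96"
```
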

Outcome Assessment

We used the Veterans Health Administration’s Vital Status File32 to identify all deaths through December 31, 2017, an approach with high accuracy compared to the National Death Index (98% sensitivity and 98% specificity).33

Statistical Analysis

We first examined baseline characteristics of the development and validation cohorts. Using the development cohort (n=2,964,098), we built our life expectancy prediction model in two phases: variable selection and model coefficient estimation. Two phases were needed because currently available statistical software does not support least absolute shrinkage and selection operator (LASSO) regression within parametric survival models.

In the first, variable selection phase of model development, we applied a LASSO Cox proportional hazards regression with a BIC-optimized lambda. To determine this lambda, we evaluated a sequence of 100 lambda values and chose the one that gave the lowest BIC with <100 predictors. We sought to build a model with approximately 100 predictors after considering several factors. First, while there is little penalty for including additional predictors in EHR models (since additional data collection or data entry is not required), models with a greater number of predictors are more likely to be overfit,34 leading to reduced generalizability to different time periods or slightly different patient populations. In addition, we found little difference in discrimination between models with all 913 predictors and with 80 predictors. Thus, a ~100-predictor model appears to best balance optimizing model discrimination with maximizing model generalizability.
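The lambda search can be sketched as follows. This is a schematic of the selection rule only: the `candidates` tuples stand in for the output of a fitted LASSO Cox path (which the study obtained from statistical software), and using the number of events in the BIC penalty is a common convention for Cox models, not a detail stated in the text.

```python
import math

def select_lambda(candidates, n_events, max_predictors=100):
    """Pick the lambda with the lowest BIC among models with fewer than
    `max_predictors` selected variables. `candidates` is a list of
    (lam, partial_log_likelihood, n_selected) tuples, one per lambda
    on the LASSO path. Returns (lambda, BIC, n_selected)."""
    best = None
    for lam, loglik, k in candidates:
        if k >= max_predictors:
            continue                        # too many predictors: skip
        bic = -2.0 * loglik + k * math.log(n_events)
        if best is None or bic < best[1]:
            best = (lam, bic, k)
    return best
```
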

In the second, coefficient estimation phase of model development, we used the variables selected by LASSO Cox regression and fit a Gompertz survival model in the development cohort (n=2,964,098). We chose a parametric survival model to allow life expectancy predictions beyond the observed follow-up time. Of the numerous parametric survival distributions, Gompertz is most appropriate for human mortality curves because it fits the observed doubling of the mortality rate with each decade of life.35
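The Gompertz distribution's suitability can be seen from its hazard, which grows exponentially with time; with shape parameter a = ln(2)/10, the mortality rate doubles exactly once per decade. The parameters below are illustrative choices, not fitted values from the study.

```python
import math

def gompertz_hazard(t, a, b):
    """Gompertz hazard h(t) = b * exp(a*t): mortality grows
    exponentially, doubling every ln(2)/a years."""
    return b * math.exp(a * t)

def gompertz_survival(t, a, b):
    """Gompertz survival function S(t) = exp(-(b/a) * (exp(a*t) - 1))."""
    return math.exp(-(b / a) * (math.exp(a * t) - 1.0))

# Illustrative parameters (assumptions, not the fitted model): this
# shape makes the hazard double once per decade, as cited above.
a = math.log(2) / 10
b = 0.01
```
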

To internally validate our model in the validation cohort (n=741,024), we calculated the integrated area under the curve (iAUC),36,37 a weighted average of the areas under the time-varying receiver operating characteristic curves at all possible cutpoints. We graphed the time-varying AUC over follow-up and report it at three time points of interest: 2 years, 5 years, and 10 years. We evaluated model calibration by creating ventiles (20 bins) of predicted life expectancy and comparing the mean predicted 10th percentile of life expectancy within each group to the observed 10th percentile of life expectancy within each group (time to 10% mortality).
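Schematically, the iAUC summary and the ventile calibration check might look as follows. This sketch ignores censoring for simplicity (the study used survival methods), and the record layout and weighting scheme are assumptions for illustration only.

```python
def integrated_auc(aucs, weights):
    """Weighted average of time-varying AUCs -- a minimal sketch of the
    iAUC summary (in practice, weights derive from the event-time
    distribution)."""
    return sum(a * w for a, w in zip(aucs, weights)) / sum(weights)

def ventile_calibration(records, p=0.10):
    """Bin patients into 20 groups (ventiles) by predicted life
    expectancy and, within each bin, pair the mean predicted p-th
    percentile of life expectancy with the observed time by which a
    fraction p of the bin has died. `records` is a list of
    (predicted_le, predicted_p_le, death_time) tuples; censoring is
    ignored here for simplicity."""
    records = sorted(records, key=lambda r: r[0])
    n = len(records)
    out = []
    for i in range(20):
        bin_ = records[i * n // 20 : (i + 1) * n // 20]
        mean_pred = sum(r[1] for r in bin_) / len(bin_)
        deaths = sorted(r[2] for r in bin_)
        observed = deaths[int(p * len(deaths))]   # time to p*100% mortality
        out.append((mean_pred, observed))
    return out
```
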

To identify which domains of predictors (demographics, diseases, medications, labs, vital signs, and healthcare utilization) most strongly contributed to discrimination in the final model, we evaluated a series of models. First, we fit a Gompertz model with each domain of predictors alone, without any other predictors. Since demographics are known to have strong predictive power, we next fit models with demographics plus each domain of predictors separately. Finally, we evaluated the effect of removing one domain of predictors from the full model, fitting models with 5 of 6 domains. We fit all models in the development cohort and calculated the iAUC in the validation cohort.

To provide an example of the output a clinician would see from this life expectancy calculator, we generated two test cases, “Mr. Unhealthy” and “Mr. Healthy,” taken from individuals at the 10th and 80th percentiles of predicted median life expectancy, respectively. For each test case, we presented the 25th, 50th, and 75th percentiles of predicted life expectancy, as well as the predicted probability of mortality within 2 years, 5 years, and 10 years. To interpret these values: the 25th percentile of life expectancy is the time by which the individual has a 25% chance of dying, while the median life expectancy is the time at which an individual has a 50/50 chance of being alive. All 95% confidence intervals for life expectancy estimates were ≤0.1 year and thus are not presented.
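These percentiles follow from inverting the model's survival function. Under a Gompertz model with shape a and rate b, the time by which cumulative mortality reaches a fraction p has a closed form; the parameter values below are hypothetical, since the actual predictions are patient-specific.

```python
import math

def gompertz_quantile(p, a, b):
    """Time t by which cumulative mortality reaches p under a Gompertz
    model, i.e. S(t) = 1 - p, found by inverting
    S(t) = exp(-(b/a) * (exp(a*t) - 1))."""
    return math.log(1.0 - (a / b) * math.log(1.0 - p)) / a

# Hypothetical parameters for illustration only (not fitted values):
a, b = 0.09, 0.003
q25, q50, q75 = (gompertz_quantile(p, a, b) for p in (0.25, 0.50, 0.75))
# q25: 25% chance of death before this time; q50: median life expectancy
```
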

We conducted sensitivity analyses to ensure results were similar across clinically important subgroups. We examined iAUC and calibration plots across subgroups defined by age, gender, race/ethnicity, and presence of cancer.

RESULTS

There were 2,964,098 patients in the development cohort and 741,024 patients in the validation cohort. Baseline characteristics were essentially the same in the two cohorts (Table 1). The mean age was 68.0 years, 97% were male, 85% were white, and 11% were black. The mean BMI was 28.9. Over half of patients had ≥8 diseases and 42% had ≥6 medications. During a median of 12.1 years of follow-up, 49% of patients died (1,453,767 in the development cohort and 362,934 in the validation cohort).

Table 1 Baseline Characteristics of Veterans Affairs Primary Care Patients Aged 50+ in 2005, n=3,705,122

The final life expectancy calculator had 93 predictors from six domains: 2 demographic predictors (age and sex), 39 disease predictors, 28 medication predictors, 14 laboratory predictors, 4 vital sign predictors, and 6 healthcare utilization predictors (Supplemental Table S1). The iAUC was 0.816 in the development cohort and 0.816 in the validation cohort (95% confidence interval in the validation cohort: 0.815, 0.817). The AUC was consistently around 0.8 throughout follow-up time in the validation cohort (Figure 1), with the lowest value of 0.803 at 2 years and increasing to 0.809 at 5 years and 0.830 at 10 years.

Figure 1. AUC over time from the life expectancy calculator. Overall, the iAUC was 0.8162 (0.8153, 0.8171).

Several predictors had a counterintuitive association with mortality. For example, osteoarthritis, benign neoplasms, and upper respiratory infections were all associated with decreased mortality risk and longer life expectancy. To determine the relative importance of these counterintuitive factors, we conducted sensitivity analyses examining the discrimination of models with these 3 factors removed and found that removing them led to trivial declines in iAUC (0.8162 full model, 0.8156 model without the 3 counterintuitive factors) (Supplemental Table S2). Since our primary goal was to maximize discrimination and previous prediction studies have also reported counterintuitive factors,38 we elected to retain these factors in our prediction model.

The model was well calibrated in the validation cohort. Across 20 bins of predicted life expectancy, the observed time to 10% mortality aligned with the predicted 10th percentile of life expectancy (Figure 2). For example, in the 15th ventile, the predicted 10th percentile of life expectancy was 5.2 years and the observed time to 10% mortality was 5.5 years. For individuals with predicted median life expectancy <12 years, predicted and observed time to 50% mortality was well aligned. In the 6th ventile, the predicted median life expectancy was 9.2 years and the observed time to 50% mortality was 8.6 years.

Figure 2. Calibration plot of predicted vs. observed remaining life expectancy. Each point represents a ventile (1/20) of the validation cohort, or 37,051 patients. In panel A, the 3 lowest-risk ventiles could not be shown since <10% of patients in those ventiles had died by the end of follow-up. In panel B, the 9 lowest-risk ventiles could not be shown since <50% of patients in those ventiles had died by the end of follow-up.

In subgroups defined by age, gender, race/ethnicity, and presence of cancer, the life expectancy calculator showed good discrimination and calibration (Supplemental Table S3). Within age groups, discrimination was good but notably lower than in the overall sample, likely due to the strong predictive power of age (range of iAUC stratified on age: 0.743–0.776). Discrimination was very similar by race and presence of cancer, while discrimination was better in women (iAUC: 0.860) than in men (iAUC: 0.814). Calibration was excellent by gender, race/ethnicity, and presence of cancer (Supplemental Figure S1); within age groups, calibration was excellent for ages ≥65 and good for ages 50–64.

As expected, diseases and demographics contributed highly to discrimination (Figure 3). Including only age and gender produced a model with an iAUC of 0.710, including only diseases produced an iAUC of 0.725, while including only healthcare utilization produced an iAUC of 0.516. When adding to a demographics-only model, the iAUC was highest when including diseases (0.790). Removing demographics from the full model had the largest impact (iAUC drop of 0.0403 to 0.7760), while removing healthcare utilization had a negligible impact (iAUC drop of 0.0007 to 0.8156).

Figure 3. The contributions of different domains of variables to iAUC in the validation cohort.

In the example output for “Mr. Healthy” (Table 2), the predicted 25th percentile life expectancy was 13.7 years and the predicted median life expectancy was 20.7 years, indicating that cancer screening would be recommended. In contrast, Mr. Unhealthy’s predicted median life expectancy was only 5.4 years and his 75th percentile life expectancy was 8.8 years, indicating that cancer screening would be more likely to harm than help.

Table 2 Sample Output from Life Expectancy Prediction Calculator: Life Expectancy Prediction and Predicted Probability of Mortality

DISCUSSION

We developed and internally validated a life expectancy prediction calculator using structured data from the VA EHR; our final model included 93 predictors and had an iAUC of 0.816. As expected, diseases and demographics were the strongest contributors to high discrimination. Our model demonstrated good calibration overall and across demographic subgroups.

Our life expectancy calculator performs comparably to other long-term mortality risk tools with C-statistics around 0.8,12,13,14,16,39 but has two notable differences. First, our life expectancy calculator used EHR data, while other mortality risk tools relied on self-reported data from questionnaires.12,13,14,15,16,40 Using structured data elements from the EHR, we were able to consider nearly one thousand predictors rather than the several dozen considered in other models. Our life expectancy calculator included many more predictors than other indexes because it was designed to be embedded within the EHR, unlike previous mortality risk indexes that rely on self-reported data and clinician data entry. To our knowledge, only the Mathias mortality risk prediction equation used EHR data, from a single multi-specialty group practice; the authors applied predictive mining to select 24 predictors from 980 possible predictors in approximately 8,000 patients.39

The second major difference is that our calculator predicts remaining life expectancy, while other indexes generally only predict mortality risk at one specific time point. Most mortality risk prediction tools are for short-term mortality (≤5 years);39,40 only a few predict ≥10-year mortality risk, which is needed for cancer screening.13,16,18 Of other indexes, only the Lee index predicts life expectancy.15 Remaining life expectancy is difficult to predict because it requires extended follow-up time and relatively high cumulative mortality to generate accurate predictions. The benefit is that life expectancy is more intuitive for patients and provides a more complete picture than mortality risk at a single time.

Our model had better discrimination among females, who made up approximately 3% of the study sample, than among males. We believe this may be because women in our VA population were substantially younger than men (mean age 63 vs. 68 years). Previous studies have shown that measures of discrimination for mortality indexes are higher in younger populations than in older populations.41 Ultimately, it is reassuring that the model performs well in women, even though they were only 3% of the data, suggesting that our life expectancy predictions can be used to target breast cancer screening.

Consistent with other mortality prediction tools, we did not incorporate race/ethnicity into our life expectancy calculator.12,14,16 While non-white individuals tend to have shorter life expectancies,29,30 prediction algorithms incorporating race/ethnicity may perpetuate and exacerbate existing disparities and racism in the medical system.31 However, excluding race/ethnicity from algorithms can still lead to racially biased models,42 and machine learning approaches may be particularly prone to encode racial/ethnic bias in the healthcare system.43,44,45 We took numerous steps to minimize potential racial bias, including manual variable cleaning and specification (e.g., assigning all laboratory values to one of five categories). We welcome the development of “algorithmic stewardship” that will advance the equitable development and use of prediction tools in clinical medicine.46,47

Uncertainty is inherent in the prediction of all future events, and we have quantified this uncertainty in the interquartile range for an individual’s predicted life expectancy. While predictions of “very high” risk or “almost no” risk are easier to understand, communicate, and act upon, many modern clinical decisions rely on modest risk of future events, including predicted cardiovascular risk for statin medications48 and predicted fracture risk for osteoporosis medications.49 We believe it is critical to acknowledge the uncertainty and to have this uncertainty inform the acceptable level of risk and possibility of benefit.50 While some patients may want to make decisions based on the most likely scenario, others may want to continue with treatments even if they have a small chance of benefitting. Eliciting and clarifying these preferences is a core component of shared decision-making.51 Explicitly noting the prediction uncertainty will enable nuanced discussions between clinicians and patients.

Our work has clear implications. Programming our life expectancy calculator into the VA EHR can improve guideline-concordant care for numerous preventive therapies with different times to benefit, acting at both the patient level and the system level. For an individual patient, the median life expectancy provides the best estimate of remaining lifetime. From a systems perspective, integration into the EHR will enable widespread uptake as well as tailoring of clinical decision support systems. For example, clinical reminders for colorectal cancer screening can be suppressed for patients with a life expectancy <10 years instead of using age-based cutoffs, thereby reducing inappropriate screening among younger, sicker patients. Similarly, clinical reminders could be triggered for adults who are outside of the usual age range for colorectal cancer screening but have >10-year life expectancy and thus may benefit from screening.52,53 Using life expectancy rather than age will also help minimize unintended harms of quality measures.21 This life expectancy calculator can be replicated in other EHR settings to improve guideline-concordant care beyond the VA.

Our study had noteworthy strengths. First, we were able to evaluate nearly 1000 predictor variables in 3.7 million Veterans. Second, we used state-of-the-art methods (LASSO Cox regression and Gompertz regression) to build our final model, after determining that machine learning methods would not provide clinically significant improvements in model performance.54,55 Finally, we demonstrated that our model was well calibrated and had high discrimination in numerous subgroups.

Our study had limitations. First, the performance of our life expectancy calculator outside of the VA population is unclear; future studies should validate and/or develop new EHR-based prediction indexes in non-VA populations, particularly given that women made up only 3% of our sample. Our calculator can be translated to other EHR systems using the OMOP Common Data Model.56 Second, we were not able to incorporate other risk factors that influence mortality, including social risk factors and physical function. Third, we were only able to capture healthcare utilization within the VA system, while some Veterans receive care at outside medical institutions. Fourth, life expectancies >12 years are model extrapolations and should be interpreted with caution; however, these extrapolations will have little impact on clinical utility, since any life expectancy >10 years indicates that a patient is likely to benefit from cancer screening. Fifth, the presence of counterintuitive factors may concern some users and slow widespread adoption. However, as “black box” prediction algorithms (e.g., neural networks for breast cancer histology)57 become more common in clinical medicine, we believe that users will increasingly focus more on predictive accuracy and less on model transparency.58 Finally, predictions may not be accurate for patients with rare diseases.

In conclusion, we developed and internally validated a life expectancy calculator with excellent discrimination and calibration using EHR data for Veterans aged ≥50 years. With excellent calibration across 1 to 10 years of mortality prediction, our life expectancy calculator could be useful for a wide variety of clinical decisions, such as statin treatment and glycemic control. Accurate estimates of life expectancy are critical for breast and colorectal cancer screening, and building such estimates into the EHR system will be a foundational step toward improving rates of individualized, guideline-concordant care.