Original Contribution

Machine Learning on High-Dimensional Data to Predict Bleeding Post Percutaneous Coronary Intervention

Corbin Rayfield, MD1*; Pradyumna Agasthi, MD1*; Farouk Mookadam, MBBCh1; Eric H. Yang, MD1; Nithin R. Venepally, MBBS1; Harish Ramakrishna, MD2; Piotr Slomka, PhD3; David R. Holmes Jr, MD4; Reza Arsanjani, MD1


Abstract: Introduction. The purpose of the current study is to determine the accuracy of machine learning in predicting bleeding outcomes post percutaneous coronary intervention (PCI) in comparison with the American College of Cardiology CathPCI bleeding risk (ACC-BR) model. Methods. Mayo Clinic CathPCI registry data were retrospectively analyzed from January, 2003 to June, 2018, including 15,603 patients who underwent PCI. The cohort was randomly divided into a training sample of 11,703 patients (75%) and a unique test sample of 3900 patients (25%) prior to model generation. The risk-prediction model was generated utilizing a boosted classification tree algorithm of 105 unique variables to predict the risk of major and minor bleeding complications within 72 hours after PCI or before hospital discharge. The receiver operating characteristic (ROC) curves and areas under the curve (AUC) for the boosted classification tree algorithm (AI-BR) model and ACC-BR model were compared for the test cohort. Results. The mean age of the patient cohort was 67 ± 12.7 years, and women constituted 30% of the cohort. The rate of major bleeding complications in the entire cohort was 1.8%. The sensitivity and specificity of the AI-BR model were 77.3% and 80.9%, respectively. The ROC-AUC for the AI-BR model (0.873) was superior vs the ACC-BR model (0.764; P=.02) in predicting major bleeding for the test cohort. Conclusion. The AI-BR model accurately predicts bleeding post PCI and outperforms the ACC-BR model in predicting the risk of bleeding in patients undergoing PCI. 

J INVASIVE CARDIOL 2020;32(5):E122-E129.

Key words: bleeding, percutaneous coronary intervention, postprocedural complications, prediction model

Percutaneous coronary intervention (PCI) is the most common invasive cardiac procedure for treatment of patients with coronary artery disease (CAD). In the current era of antiplatelet and antithrombotic therapy, bleeding has become the most commonly encountered early complication following PCI.1 Postprocedural bleeding is associated with short- and long-term death, non-fatal myocardial infarction, stroke, blood transfusion, prolonged hospital stay, rehospitalization, and increased hospital costs.2-4 Several bleeding-avoidance strategies, such as bivalirudin or radial approach, have been proposed to reduce periprocedural bleeding among higher-risk patient groups;5-8 however, these strategies are frequently not utilized among patients with the highest bleeding risk.9 Since bleeding can result in adverse postprocedure outcomes, several bleeding risk models have been developed. Currently, the American College of Cardiology CathPCI Bleeding Risk (ACC-BR) calculator is the most commonly used model for predicting postprocedural bleeding. 

Machine learning is a subfield of computer science that focuses on pattern recognition and computational artificial intelligence; it provides the ability to automatically learn and improve from experience without being explicitly programmed. It has previously been utilized to predict the presence of obstructive CAD and revascularization in patients undergoing single photon emission computed tomography (SPECT).10,11 These algorithms have been shown to outperform traditional statistical methods. 

The aim of this study is to investigate whether bleeding risk can be effectively predicted using a machine learning algorithm and compare the diagnostic accuracy of the artificial intelligence bleeding risk (AI-BR) model with the currently used ACC-BR.


Study population. Retrospective data from the Mayo Clinic Health System across four sites (La Crosse, Wisconsin; Mankato, Minnesota; Rochester, Minnesota; and Phoenix, Arizona) for 21,872 patients who underwent PCI between January 2006 and December 2017 were obtained. Eighty-six variables (Table 1) encompassing clinical and patient demographics were recorded for each patient and evaluated by the predictive model. After the entire cohort was obtained, patients were removed from the cohort if any of the 86 variable data points, including bleeding data, were missing. The final cohort included 15,603 patients. 

Definitions and outcomes. The primary outcome that the model sought to predict was bleeding within 72 hours of PCI and prior to hospital discharge. Bleeding was defined according to the National Cardiovascular Data Registry (NCDR), which considers retroperitoneal, gastrointestinal, genitourinary, and intracranial bleeding, as well as access-site hematoma, as bleeding events. Postcatheterization transfusions were also recorded. 

Model variables. The model was generated on 86 predictor variables (Table 1). Each of the 86 predictor variables was fed into the model for consideration. The final model included data from all predictor variables, but final discrimination did not depend on all 86 variables. 

Model generation. The 15,603 patients were divided into a training set and a test set by a random-number generator. Seventy-five percent of the entire cohort (n = 11,703) was utilized as the training cohort and 25% (n = 3900) was used to test the accuracy, sensitivity, and specificity of the model. Data from patients in both cohorts were scaled according to their median data so that the range of any one variable did not dominate measures of statistical distance. Finally, the training data set was down-sampled so that the relative infrequency of bleeding events did not impact the frequency of bleeding prediction. This had the effect of randomly sampling within the training set so that the frequency of bleeding and non-bleeding patients was balanced. 
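The preprocessing steps described above (random 75/25 split, median-based scaling, and down-sampling of the training set) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code, which was written in R. The function names are hypothetical, and two details are assumptions because the text does not specify them: scaling is taken to mean centering on the training-set median and dividing by the median absolute deviation, and the scaler is fitted on the training set and then applied to both sets.

```python
import random
from statistics import median


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Randomly partition rows into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_median_scaler(rows):
    """Per-feature (median, MAD) pairs; the MAD choice is an assumption."""
    params = []
    for j in range(len(rows[0][0])):
        column = [features[j] for features, _ in rows]
        med = median(column)
        mad = median([abs(v - med) for v in column])
        params.append((med, mad if mad > 0 else 1.0))  # avoid dividing by zero
    return params


def apply_scaler(rows, params):
    """Center each feature on its median and divide by its MAD."""
    return [([(v - med) / mad for v, (med, mad) in zip(features, params)], label)
            for features, label in rows]


def downsample(rows, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Synthetic cohort: 1000 rows, 2 features, 2% event rate (illustrative only).
cohort = [([float(i), float(i % 7)], 1 if i % 50 == 0 else 0) for i in range(1000)]
train, test = split_train_test(cohort)
params = fit_median_scaler(train)
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)
balanced = downsample(train_scaled)  # only the training set is balanced
```

Only the training set is down-sampled; the test set keeps its original event rate so that performance estimates reflect the real prevalence of bleeding.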

Model variable weighting. In order to investigate the relative contribution of individual variables to the overall bleeding prediction, the variable importance of each term was investigated. This was performed after the model had been finalized and was not used to generate the model. For each iteration of the model, the prediction accuracy on the training cohort used during that model generation was recorded. This accuracy was recorded again as each predictor variable was added to the model. The difference between the two accuracies was then averaged over all trees and over each boosted iteration, and normalized by the standard error. 
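As a rough illustration of how an accuracy difference can be turned into a variable-importance score, the sketch below uses permutation importance: each variable is shuffled in turn and the resulting drop in accuracy is averaged over repeats. This is a related, model-agnostic technique named here plainly as a substitute; it is not necessarily the procedure used for the AI-BR model. The `predict` function, the data, and all names are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(1 for xi, yi in zip(X, y) if predict(xi) == yi) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is irrelevant.
X = [[i / 39, (i * 7 % 40) / 39] for i in range(40)]
y = [1 if x[0] > 0.5 else 0 for x in X]


def toy_predict(x):
    """Hypothetical fitted model that looks only at feature 0."""
    return 1 if x[0] > 0.5 else 0


importances = permutation_importance(toy_predict, X, y)
```

In this toy setting the irrelevant feature receives an importance of zero, because shuffling it never changes a prediction, while the informative feature receives a positive score.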

Model testing. All recorded variables were considered candidate variables. The variables, once scaled, were fed into a boosted classification tree algorithm. This model trained the base estimator on the training set, observed the training data samples that the base estimator misclassified, and created a weighted coefficient for these samples. A second base estimator was then trained, applying the above weight coefficient to samples when calculating the entropy measure of homogeneity. Boosting was performed to create successive base classifiers that were programmed to place greater emphasis on the misclassified samples from the training data. For further validation, the model was generated through repeated cross-validation such that 10 separate rounds of 10-fold cross-validation were performed. Finally, a probability of class membership was calculated based on the sum of the individual tree results for each patient. If the sum was >50% probability of bleeding, the patient was predicted to have bled. At the end of this process, the model of bleeding prediction (AI-BR model) post PCI was considered finalized. 
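The reweighting idea behind boosting can be illustrated with a small sketch of discrete AdaBoost using one-split decision stumps. This is an assumption-laden toy: the text does not name the exact boosting variant, the stumps here are chosen by weighted misclassification error rather than an entropy criterion, and the logistic mapping from the weighted vote to a probability is one common convention. All function names and the data are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                pred = [polarity if x[j] > thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] > thr else -polarity


def fit_adaboost(X, y, n_rounds=3):
    """Labels must be -1 or +1. Returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_proba(ensemble, x):
    """Logistic transform of the weighted vote (an assumed convention)."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1.0 / (1.0 + math.exp(-2.0 * score))


# Toy 1-D data that no single stump can classify perfectly.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = fit_adaboost(X, y, n_rounds=3)
predictions = [1 if predict_proba(ensemble, x) > 0.5 else -1 for x in X]
```

On this toy data the best single stump misclassifies three of the ten samples, whereas the three-stump ensemble classifies all ten correctly, which shows how successive classifiers compensate for earlier mistakes. The 50% cut-off mirrors the decision rule described above.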

Following model generation on the training cohort, the AI-BR model was applied to the 3900 patients in the testing data set, who were not used in any of the model-generation steps. A receiver operating characteristic (ROC) curve was generated based on the prediction model for the test cohort. 

Model comparison. In order to ensure improved accuracy versus historical models, the ACC-BR bleeding model score developed by Rao et al14 was calculated for each patient in the test cohort. Patients received a bleeding score composed of the risk variables ST-segment elevation myocardial infarction (STEMI), age, body mass index (BMI), previous PCI, presence of chronic kidney disease, shock, cardiac arrest within 24 hours, gender, hemoglobin, and PCI status, and the composite risk score was compared with the AI-BR model. ROC curves were generated for both models. The differences between the ROC areas under the curve (AUC) were compared using the DeLong method.25
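For readers unfamiliar with the DeLong comparison, the sketch below computes the AUC of two models scored on the same patients and the two-sided P-value for their difference, following the nonparametric approach of DeLong et al.25 It is a plain-Python illustration on made-up scores, not the analysis code used in this study; function names and data are hypothetical.

```python
import math


def _psi(pos_score, neg_score):
    """1 if the event case outranks the non-event case, 0.5 on ties, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def placements(scores, labels):
    """Return (AUC, per-event placement values, per-non-event placement values)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided test for a difference between two correlated AUCs."""
    auc_a, v10_a, v01_a = placements(scores_a, labels)
    auc_b, v10_b, v01_b = placements(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:  # identical rankings: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return auc_a, auc_b, z, p_value


# Made-up scores for 4 events and 6 non-events (illustrative only).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
scores_b = [0.6, 0.3, 0.8, 0.2, 0.5, 0.7, 0.1, 0.4, 0.25, 0.15]
auc_a, auc_b, z, p_value = delong_test(scores_a, scores_b, labels)
```

The AUC computed this way equals the c-statistic: the probability that a randomly chosen patient who bled receives a higher score than a randomly chosen patient who did not. Because both models are evaluated on the same patients, the covariance terms account for the correlation between the two curves.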

Statistical analysis. All statistical analyses were performed in R, version 3.5.1 (July 2, 2018). Augmented intelligence algorithms and data processing were performed, in part, through the “caret” package, version 6.0-80.13 A P-value <.05 was considered statistically significant. Categorical variables are summarized as frequencies and compared using Pearson’s Chi-square test. Continuous variables are summarized as medians and compared using Wilcoxon rank-sum test. 


Study sample. Retrospective data were collected on 21,782 patients from the Mayo Clinic PCI database. After application of the exclusion criteria, a total of 15,603 patients were included in the analysis (Figure 1). Baseline characteristics of the cohort are listed in Table 2. Bleeding occurred in 1.8% of patients who had PCI performed during the time interval considered. Bleeding occurred most often at the access site (0.5%), followed by gastrointestinal (0.26%), retroperitoneal (0.167%), and genitourinary bleeding (0.06%). 

Risk factors for in-hospital bleeding. The top 20 variables associated with post-PCI bleeding risk among the training cohort are displayed in Figure 2. The relative importance of all 86 variables can be found in Supplemental Figure S1. The variables that best predicted bleeding risk were acute coronary syndrome, indication and timing of PCI, presence of ischemic changes on presentation electrocardiogram, ability to perform PCI on the culprit lesion, and number of native vessels treated. The variables least predictive of bleeding risk included use of fondaparinux, the presence of peptic ulcer disease, and presence of mid left anterior descending coronary artery disease. 

Model performance. The out-of-bag error rate for the AI-BR prediction model on the training cohort was 0.102. This measure was calculated as the error rate in predicting training samples that were not included in a given iteration of model development. The predictive model was generated by taking the most accurate predictions as judged by the out-of-bag error rate on the cross-validated training samples. 

To judge the accuracy of the model on data that had not been included in the production of the model, a total of 3900 patients were included in the test set. The accuracy on the test cohort was found to be 77.4% (sensitivity, 77.3%; specificity, 80.1%). The concordance statistic (c-statistic) for the model to predict bleeding post PCI was 87.0%. 
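The relationship between accuracy, sensitivity, and specificity can be made concrete with a short sketch. The counts below are hypothetical and chosen only for illustration; they are not the study's confusion matrix, and the function name is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = bleed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical test set: 1000 patients, 20 bleeds (2% event rate).
# 15 of 20 bleeds are flagged; 784 of 980 non-bleeds are correctly cleared.
y_true = [1] * 20 + [0] * 980
y_pred = [1] * 15 + [0] * 5 + [0] * 784 + [1] * 196
metrics = classification_metrics(y_true, y_pred)
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so with a rare outcome it tracks specificity closely; in this hypothetical case sensitivity is 0.75, specificity is 0.80, and accuracy is 0.799.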

The ACC-BR score was calculated for each patient in the test cohort. The patients were stratified into low risk, medium risk, and high risk according to the prespecified levels previously published.14 Of the 3900 patients in the testing cohort, a total of 1471 (37.7%) were found to be low risk, 2071 (53.1%) were medium risk, and 353 (9.1%) were high risk. The ACC-BR score c-statistic was 76.4% (Figure 3).

Bleeding risk among subgroups for the entire population. Postprocedural bleeding was experienced by 49 of the 3264 patients (1.5%) who underwent PCI via radial access vs 172 of the 9429 patients (1.8%) who underwent PCI via femoral access in the entire study population. Postprocedural bleeding was also experienced by 18 of the 688 patients (2.6%) who underwent PCI with a glomerular filtration rate (GFR) <60 mL/min/1.73 m2 and by 74 of the 4047 diabetic patients (1.8%). Of the 3813 women in the entire population, a total of 101 (2.6%) had postprocedural bleeding, and 123 of the total 9057 patients (1.4%) who were >75 years old also experienced bleeding.

Comparison among subgroups. The cohorts were divided into subgroups, and the performances of the two models were compared (Table 3). The AI-BR model had a statistically significant increase in prediction accuracy vs the ACC-BR score for patients who underwent PCI via radial access (c-score, 0.866 vs 0.774; P=.03) and femoral access (c-score, 0.876 vs 0.762; P=.03), as well as patients with GFR <60 mL/min/1.73 m2 (c-score, 0.880 vs 0.766; P=.04), diabetes mellitus (c-score, 0.843 vs 0.757; P=.04), female gender (c-score, 0.835 vs 0.654; P=.04), and age >75 years (c-score, 0.887 vs 0.718; P=.046). The AI-BR model performed better among patients who had active STEMI or BMI >35 kg/m2 vs the ACC-BR score, but these differences were not statistically significant (STEMI c-score, 0.748 vs 0.693 [P=.07]; BMI >35 kg/m2 c-score, 0.86 vs 0.649 [P=.08]). 


In the current study, machine learning algorithms were superior to conventional bleeding risk predictors in patients undergoing PCI in the Mayo Clinic Health System. In addition, subgroup analysis showed superior performance of the AI-BR model in the following subsets: patients >75 years, radial and femoral access intervention, diabetes, and GFR <60 mL/min/1.73 m2. Both the AI-BR and ACC-BR models had similar ability to predict bleeding in STEMI patients undergoing PCI. The overall rate of bleeding events in the cohort was 1.8%, which is significantly lower than the national average of 5.8%. 

We demonstrated the superior predictive capacity of the AI-BR model in comparison with the ACC-BR model in predicting bleeding risk post PCI. To our knowledge, this is the first study demonstrating the application of machine learning algorithms to identify patients at risk of developing bleeding complications post PCI using a large database of patients undergoing PCI. Unlike traditional approaches to risk prediction, machine learning algorithms have significant advantages, both theoretical and practical. Discretization of continuous variables allows non-linear responses to variables without obligating researchers to perform the tedious task of determining the choice and tuning of these relationships.15 Bleeding is an independent predictor of morbidity and mortality post PCI. Better ability to identify individuals at increased risk of bleeding might allow the provider to tailor the length of dual-antiplatelet therapy (DAPT), especially in the era of newer-generation drug-eluting stents. A recent meta-analysis of 9 randomized control studies comparing short (up to 6 months) vs long-term course of DAPT following drug-eluting stent implantation demonstrated a significant reduction of major bleeding events with no apparent increase in all-cause death, myocardial infarction, stent thrombosis, or stroke.16 

The prevalence of major bleeding based on the NCDR definition was 1.8% in our study population, which was significantly lower than the national average of 5.8% based on the CathPCI registry.14 In our study, we identified the factors highly associated with major bleeding events post PCI, which include: (1) presentation with CAD, specifically STEMI; (2) the indication and timing of PCI; (3) ischemic changes on electrocardiogram (non-ST elevation); (4) PCI performed on non-culprit lesions; (5) high complexity of lesion intervened; (6) low preintervention TIMI flow; and (7) higher number of native vessels treated. Other risk factors identified in the ACC-BR model,14 including age, BMI, chronic kidney disease, cardiogenic shock, cardiac arrest within 24 hours, female sex, preprocedural hemoglobin level, and urgency of PCI, were also noted to have a moderate association with periprocedural bleeding risk. Unlike previous risk calculators, incorporation of intraprocedural data significantly enhanced the accuracy of our model. The list of top 20 variables associated with risk of periprocedural bleeding is presented in Figure 2. Furthermore, among all subgroups, AI-BR showed the greatest improvement in prediction accuracy in patients with BMI >35 kg/m2. This is likely due to the low rate of periprocedural bleeds noted in obese patients, as the performance of traditional risk prediction models developed using linear/logistic regression analysis is limited in cohorts with low event rates. Similarly, no significant difference between AI-BR and ACC-BR was noted in the subgroup of STEMI patients, although there was a trend, potentially due to the higher rates of periprocedural bleeding events noted in this subgroup.

Antiplatelet and antithrombotic therapies have significantly improved patient outcomes post PCI; however, procedure-related bleeding events have negatively affected the outcomes, including mortality and major adverse cardiovascular events. Mortality increases 2- to 6-fold depending upon the contemporary definition used to classify major bleeds.17 The potential mechanisms that explain the increase in mortality after a major bleed include: (1) interruption of DAPT, portending an increased risk of in-stent thrombosis;18 (2) increased erythropoietin production leading to a sustained prothrombotic state beyond the acute phase via induction of plasminogen activator inhibitor-1 and platelet activation;19,20 and (3) transfusion of red blood cells and of non-red blood cell products such as plasma/platelets, causing an acutely increased prothrombotic state via release of CD40 ligand and sustained platelet activation.19,21 Based on prior studies, transfusion of blood products independently increased the risk of long-term mortality following a procedural bleed.22,23 Therefore, the risk stratification of patients at high risk for periprocedural bleeding will help guide the choice and duration of antiplatelet therapy. 

Bleeding avoidance strategies, including use of bivalirudin and radial approach, were previously proposed to reduce frequency of periprocedural bleeding, especially among high-risk patients.5-8 However, contemporary studies have shown less use of bleeding avoidance strategies among high-risk patients.9 Given the current era of public reporting of PCI-related performance measures at both individual and institutional levels, the risk stratification of patients undergoing PCI can lead to appropriate use of bleeding avoidance strategies to prevent periprocedural bleeding, thereby also decreasing PCI-related mortality and major adverse cardiovascular events.

Prior models14,24 that were developed to predict post-PCI bleeding risk were simplified in order to allow clinicians to perform the assessment at the bedside. Our AI-BR model uses 86 different variables to accurately predict a bleeding event post PCI. However, with the advancement of electronic medical records, it is possible to embed complex risk prediction models into patient records, making them easily accessible and readily available to healthcare providers. Smartphone-based/web-based applications can be developed to run the AI-BR model to increase the ease of use and improve penetration among healthcare providers. This may encourage greater adoption of bleeding avoidance strategies, and potentially adjust the length of DAPT. 

Study limitations. The current machine learning algorithm tested in our analysis was developed and evaluated on a single data set. Further studies are needed to validate the findings in an external independent data set. Other models of machine learning, such as neural networks or support-vector machines, were not tested. A higher number of potential risk factors could lead to over-fitting and lower external validity. The patients in the data set might have skewed demographics, as they were predominantly white, which has implications for external validity. The analysis was retrospective, with a low number of major bleeding events in the entire cohort. Other unmeasured confounders, such as operator experience and skill, may be of paramount importance in assessing periprocedural bleeding risk. In order to run this model, 86 variables must be obtained; however, this is likely easier than expected given the wide availability of these variables and the possibility of automatic calculation through the electronic medical record. The use of blood transfusion during the admission might not reflect bleeding due to procedural complication and its use remains controversial. 


Compared with contemporary bleeding risk prediction models, the AI-BR model based on machine learning algorithms significantly improves the accuracy of post-PCI bleeding predictions. Further studies are needed to validate the accuracy of the AI-BR model on large population-based data sets. Incorporation of complex machine learning-based risk prediction models into existing electronic medical records or the development of a smartphone-based application would improve penetration and utilization of these models, thereby leading to better PCI-related quality of care.

*Joint first authors.

From the 1Department of Cardiovascular Diseases, Mayo Clinic Arizona, Phoenix, Arizona; 2Department of Anesthesiology and Perioperative Medicine, Mayo Clinic Arizona, Phoenix, Arizona; 3Department of Cardiovascular Diseases, Cedars-Sinai Medical Center, Los Angeles, California; and 4Department of Cardiovascular Diseases, Mayo Clinic Rochester, Rochester, Minnesota.

Disclosure: The authors have completed and returned the ICMJE Form for Disclosure of Potential Conflicts of Interest. The authors report no conflicts of interest regarding the content herein.

Manuscript submitted November 10, 2019, accepted November 14, 2019.

Address for correspondence: Reza Arsanjani, MD, Senior Consultant Cardiovascular Diseases, Mayo Clinic, 13400 East Shea Boulevard, Scottsdale, AZ 85259. Email: Arsanjani.Reza@mayo.edu

  1. Chhatriwalla AK, Amin AP, Kennedy KF, et al. Association between bleeding events and in-hospital mortality after percutaneous coronary intervention. JAMA. 2013;309:1022-1029.
  2. Lopes RD, Alexander KP, Manoukian SV, et al. Advanced age, antithrombotic strategy, and bleeding in non–ST-segment elevation acute coronary syndromes: results from the ACUITY (Acute Catheterization and Urgent Intervention Triage Strategy) trial. J Am Coll Cardiol. 2009;53:1021-1030.
  3. Cohen DJ, Lincoff AM, Lavelle TA, et al. Economic evaluation of bivalirudin with provisional glycoprotein IIB/IIIA inhibition versus heparin with routine glycoprotein IIB/IIIA inhibition for percutaneous coronary intervention: results from the REPLACE-2 trial. J Am Coll Cardiol. 2004;44:1792-1800.
  4. Manoukian SV, Feit F, Mehran R, et al. Impact of major bleeding on 30-day mortality and clinical outcomes in patients with acute coronary syndromes: an analysis from the ACUITY trial. J Am Coll Cardiol. 2007;49:1362-1368.
  5. Sherev DA, Shaw RE, Brent BN. Angiographic predictors of femoral access site complications: implication for planned percutaneous coronary intervention. Catheter Cardiovasc Interv. 2005;65:196-202.
  6. Kirtane AJ, Piazza G, Murphy SA, et al. Correlates of bleeding events among moderate-to high-risk patients undergoing percutaneous coronary intervention and treated with eptifibatide: observations from the PROTECT–TIMI-30 trial. J Am Coll Cardiol. 2006;47:2374-2379.
  7. Verheugt FW, Steinhubl SR, Hamon M, et al. Incidence, prognostic impact, and influence of antithrombotic therapy on access and nonaccess site bleeding in percutaneous coronary intervention. JACC Cardiovasc Interv. 2011;4:191-197.
  8. Rao SV, Ou F-S, Wang TY, et al. Trends in the prevalence and outcomes of radial and femoral approaches to percutaneous coronary intervention: a report from the National Cardiovascular Data Registry. JACC Cardiovasc Interv. 2008;1:379-386.
  9. Marso SP, Amin AP, House JA, et al. Association between use of bleeding avoidance strategies and risk of periprocedural bleeding among patients undergoing percutaneous coronary intervention. JAMA. 2010;303:2156-2164.
  10. Arsanjani R, Xu Y, Dey D, et al. Improved accuracy of myocardial perfusion SPECT for detection of coronary artery disease by machine learning in a large population. J Nucl Cardiol. 2013;20:553-562.
  11. Arsanjani R, Dey D, Khachatryan T, et al. Prediction of revascularization after myocardial perfusion SPECT by machine learning in a large population. J Nucl Cardiol. 2015;22:877-884.
  12. Brindis RG, Fitzgerald S, Anderson HV, Shaw RE, Weintraub WS, Williams JF. The American College of Cardiology-National Cardiovascular Data Registry™(ACC-NCDR™): building a national clinical data repository. J Am Coll Cardiol. 2001;37:2240-2245.
  13. Kuhn M. CARET: Classification and Regression Training. R package version 6.0-80. 2018.
  14. Rao SV, McCoy LA, Spertus JA, et al. An updated bleeding model to predict the risk of post-procedure bleeding among patients undergoing percutaneous coronary intervention: a report using an expanded bleeding definition from the National Cardiovascular Data Registry CathPCI Registry. JACC Cardiovasc Interv. 2013;6:897-904.
  15. Poppe KK, Doughty RN, Wells S, et al. Developing and validating a cardiovascular risk score for patients in the community with prior cardiovascular disease. Heart. 2017;103:891-892.
  16. Rozemeijer R, Voskuil M, Greving JP, et al. Short versus long duration of dual antiplatelet therapy following drug-eluting stents: a meta-analysis of randomised trials. Neth Heart J. 2018;26:242-251.
  17. Kwok CS, Rao SV, Myint PK, et al. Major bleeding after percutaneous coronary intervention and risk of subsequent mortality: a systematic review and meta-analysis. Open Heart. 2014;1:e000021.
  18. Dangas GD, Claessen BE, Mehran R, et al. Clinical outcomes following stent thrombosis occurring in-hospital versus out-of-hospital: results from the HORIZONS-AMI (Harmonizing Outcomes with Revascularization and Stents in Acute Myocardial Infarction) trial. J Am Coll Cardiol. 2012;59:1752-1759.
  19. Doyle BJ, Rihal CS, Gastineau DA, Holmes DR. Bleeding, blood transfusion, and increased mortality after percutaneous coronary intervention: implications for contemporary practice. J Am Coll Cardiol. 2009;53:2019-2027.
  20. Smith KJ, Bleyer AJ, Little WC, Sane DC. The cardiovascular effects of erythropoietin. Cardiovasc Res. 2003;59:538-548.
  21. Hachem A, Yacoub D, Théorêt J-Fc, Gillis M-A, Mourad W, Merhi Y. Enhanced levels of soluble CD40 ligand exacerbate platelet aggregation and thrombus formation via CD40-dependent TRAF-2/Rac1/p38 MAPK signaling pathway. Arterioscler Thromb Vasc Biol. 2010;30:2424-2433.
  22. Chase AJ, Fretz EB, Warburton WP, et al. The association of arterial access site at angioplasty with transfusion and mortality: the MORTAL study (Mortality benefit of Reduced Transfusion after PCI via the Arm or Leg). Heart. 2008;94:1019-1025.
  23. Rao SV, Jollis JG, Harrington RA, et al. Relationship of blood transfusion and clinical outcomes in patients with acute coronary syndromes. JAMA. 2004;292:1555-1562.
  24. Mehran R, Pocock SJ, Nikolsky E, et al. A risk score to predict bleeding in patients with acute coronary syndromes. J Am Coll Cardiol. 2010;55:2556-2566.
  25. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837-845.