Question
Cardiovascular (heart) disease is a major health problem worldwide. It is estimated that by 2030, over 23 million people will die from cardiovascular diseases each year globally, with the impact disproportionately affecting low- and middle-income countries. In the late 1940s, the U.S. government took steps to study heart disease. To obtain high-quality data, it decided to track a large cohort of initially healthy people over time. The town of Framingham, Massachusetts (a suburb of Boston) was selected as the site for the study, which began in 1948. The study enrolled 5,209 participants aged 30 to 62. Participants were given a questionnaire and a medical examination every two years, and in addition to the medical test data, the researchers recorded the participants' physical and behavioral characteristics. Over the years, the study has expanded to include multiple generations and has collected many more factors, including genetic information. The study and its data are now widely known simply as the Framingham Heart Study. In this problem (adapted from a question developed at MIT), I ask you to construct a predictive model using the Framingham Heart Study data in order to predict coronary heart disease (CHD) and to make recommendations to better prevent it. There are 3,658 observations, each representing the data from one study participant. There are 16 columns in the dataset, which are described in the table below.
Column name        Description
male               Gender of patient (1 = male, 0 = female)
age                Age (in years) at first examination
education          Some high school; high school/GED; some college/vocational school; college
currentSmoker      Is a current smoker
cigsPerDay         Number of cigarettes per day
BPMeds             Is on blood pressure medication at time of first examination
prevalentStroke    Previously had a stroke
prevalentHyp       Currently hypertensive
diabetes           Currently has diabetes
totChol            Total cholesterol (mg/dL)
sysBP              Systolic blood pressure (mmHg)
diaBP              Diastolic blood pressure (mmHg)
BMI                Body mass index (kg/m²)
heartRate          Heart rate (beats/minute)
glucose            Blood glucose level (mg/dL)
TenYearCHD         Experienced coronary heart disease within 10 years of first examination

You will be asked to predict TenYearCHD (whether a patient experiences coronary heart disease within 10 years of his or her first examination). As a result of your modeling efforts, you should be able to identify risk factors, that is, the variables that increase the risk of CHD.

To lower the risk of CHD, physicians can prescribe preventive medication such as blood-pressure-lowering or cholesterol-lowering drugs. When recommending preventive medication to patients at risk of developing CHD, many policy makers rely on evidence-based analysis that weighs the pros and cons of such interventions. Health economic evaluation is a commonly applied decision-making methodology that takes both medical costs and health benefits (a monetized version of improved longevity) into consideration. In fact, many countries establish clinical practice guidelines using such formalized health economic evaluation methodologies (the National Institute for Health and Clinical Excellence in England, for example). Suppose that a colleague of yours has already completed a health economics study analyzing the costs and benefits of a recently approved medication aimed at preventing CHD.
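Before any model can be fit, the columns in the table above (binary indicators, counts, continuous measurements, and one categorical column, education) have to become a numeric feature matrix. The sketch below uses pandas and scikit-learn and is a minimal illustration under stated assumptions: the data is already loaded into a DataFrame with exactly these column names, and education is stored as the four text labels listed in the table. The 1-to-4 ordinal coding, the 70/30 stratified train/test split, and the function names are hypothetical choices, not part of the assignment.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

TARGET = "TenYearCHD"

# ASSUMPTION: education is stored as the four text labels from the column table.
# The ordinal codes 1-4 are an illustrative choice; data that already stores
# education as integers passes through unchanged.
EDUCATION_CODES = {
    "Some high school": 1,
    "High school/GED": 2,
    "Some college/vocational school": 3,
    "College": 4,
}


def prepare_features(df):
    """Return a numeric feature matrix X (every column except the target) and a 0/1 target y."""
    df = df.copy()
    if not pd.api.types.is_numeric_dtype(df["education"]):
        df["education"] = df["education"].map(EDUCATION_CODES)
    X = df.drop(columns=[TARGET])
    y = df[TARGET].astype(int)
    return X, y


def split_train_test(X, y, test_size=0.3, seed=0):
    """Stratified split so the CHD rate is similar in both sets (30% test size is assumed)."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)
```

For example, `X, y = prepare_features(df)` followed by `X_train, X_test, y_train, y_test = split_train_test(X, y)`. An ordinal code suits education because its levels are naturally ordered, and a tree-based model only uses that ordering when it chooses split points.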
The study estimates that patients who experience CHD within the next 10 years will incur an expected lifetime cost of USD 165,000 associated with the disease. This figure includes both the cost of treating CHD (USD 80,000) and a cost intended to capture the reduced quality and length of life experienced by patients with CHD (USD 85,000). The study has also determined that taking the preventive medication reduces a patient's probability of developing CHD within the next 10 years by a factor of 2.3: if a patient's 10-year risk of developing CHD without the medication is p, then the 10-year risk with the medication is p/2.3. Regardless of whether a patient eventually develops CHD, taking this recently approved medication costs USD 7,500.

Part (a) (10 points) Construct a random forest model for this problem with 1000 trees. Leave the other hyperparameters at their default values.

Part (b) (10 points) Suppose a new patient arrives, and the physician retrieves the following information from the patient's electronic medical record:

Variable           Value
male               1
age                55
education          College
currentSmoker      1
cigsPerDay         10
BPMeds             0
prevalentStroke    0
prevalentHyp       1
diabetes           0
totChol            220
sysBP              140
diaBP              100
BMI                30
heartRate          60
glucose            80

Based on your model, what is the predicted probability that this patient will experience CHD in the next ten years?

Part (c) (5 points) To help interpret the model, provide a horizontal bar chart of the feature (i.e., variable) importance measure of the random forest model developed in Part (a). Use scikit-learn's feature_importances_ attribute.

Part (d) (10 points) Suppose you wish to determine the optimal strategy for deciding which patients receive the medication. Given your colleague's analysis of the total expected economic costs associated with CHD and the medication, identify an appropriate threshold value of p.
Provide the value of that threshold. Should the physician prescribe the preventive medication for the patient in Part (b)?

Part (e) (10 points) Describe the test-set performance of the random forest model, using the threshold identified in Part (d) to separate patients into those at high risk for CHD (risk above the threshold p) and those at low risk for CHD (risk below the threshold p). Treat high risk as the positive outcome and low risk as the negative outcome. Calculate the model's accuracy, sensitivity, and specificity.

Part (f) (5 points) Do any aspects of the analysis performed so far raise ethical concerns? If so, how could the analysis be changed to address them?
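The computational steps in Parts (a) through (e) can be sketched in Python with scikit-learn as follows. This is a minimal illustration, not a reference solution. It assumes a numeric feature matrix whose columns are the 15 predictors named in the column table and a train/test split made elsewhere (the problem text does not specify how the split is done). For the Part (b) patient, education is coded 4 on an assumed ordinal scale (1 = some high school, 4 = college). `random_state` is added only for reproducibility, and all helper names are hypothetical. The threshold in `break_even_threshold` comes from comparing expected costs per patient: 165,000p without the medication versus 7,500 + 165,000p/2.3 with it. Prescribing is therefore cheaper in expectation whenever p > 7,500 / (165,000 × (1 - 1/2.3)) = 23/286 ≈ 0.0804.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Patient from Part (b). ASSUMPTION: education is on an ordinal 1-4 scale
# (1 = some high school ... 4 = college), so "College" is coded as 4.
PATIENT_B = {
    "male": 1, "age": 55, "education": 4, "currentSmoker": 1, "cigsPerDay": 10,
    "BPMeds": 0, "prevalentStroke": 0, "prevalentHyp": 1, "diabetes": 0,
    "totChol": 220, "sysBP": 140, "diaBP": 100, "BMI": 30, "heartRate": 60,
    "glucose": 80,
}


def break_even_threshold(cost_chd=165_000, cost_med=7_500, factor=2.3):
    """Part (d): risk p at which both strategies have the same expected cost.

    No medication:   cost_chd * p
    With medication: cost_med + cost_chd * p / factor
    """
    return cost_med / (cost_chd * (1 - 1 / factor))


def fit_forest(X_train, y_train, seed=0):
    """Part (a): 1000 trees; all other hyperparameters left at their defaults."""
    # random_state fixes the bootstrap/feature sampling; n_jobs only parallelizes.
    model = RandomForestClassifier(n_estimators=1000, random_state=seed, n_jobs=-1)
    return model.fit(X_train, y_train)


def predicted_risk(model, X):
    """Predicted probability of TenYearCHD = 1 (per-tree probabilities averaged)."""
    positive_col = list(model.classes_).index(1)
    return model.predict_proba(X)[:, positive_col]


def patient_frame(columns):
    """One-row DataFrame for the Part (b) patient, in the training column order."""
    return pd.DataFrame([PATIENT_B])[list(columns)]


def importance_table(model, feature_names):
    """Part (c): impurity-based importances, sorted ascending for a barh chart."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values()


def threshold_metrics(y_true, risk, threshold):
    """Part (e): 'high risk' (risk > threshold) is the positive prediction."""
    actual = np.asarray(y_true).astype(bool)
    high_risk = np.asarray(risk) > threshold
    tp = np.sum(high_risk & actual)
    tn = np.sum(~high_risk & ~actual)
    fp = np.sum(high_risk & ~actual)
    fn = np.sum(~high_risk & actual)
    return {
        "accuracy": (tp + tn) / actual.size,
        "sensitivity": tp / (tp + fn),  # share of actual CHD cases flagged high risk
        "specificity": tn / (tn + fp),  # share of non-cases flagged low risk
    }
```

Typical use with a held-out split: after `model = fit_forest(X_train, y_train)`, the expression `predicted_risk(model, patient_frame(X_train.columns))[0] > break_even_threshold()` answers the prescribing question in Part (d). `threshold_metrics(y_test, predicted_risk(model, X_test), break_even_threshold())` gives the Part (e) figures, and `importance_table(model, X_train.columns).plot.barh()` draws the Part (c) chart (pandas plotting requires matplotlib).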