Question
The data set we'll be using in the midterm assignment, ClaimsData.csv, is structured to represent a sample of patients in the Medicare program, which provides health insurance to Americans aged 65 and older, as well as some younger people with certain medical conditions. The observations represent a random sample of Medicare beneficiaries, limited to those still alive at the end of […]. Our independent variables are from […], and we will be predicting cost in […].

Our independent variables are the patient's age in years at the end of […], and then several binary variables indicating whether or not the patient had diagnosis codes for a particular disease or related disorder in […]: alzheimers, arthritis, cancer, chronic obstructive pulmonary disease (copd), depression, diabetes, heart.failure, ischemic heart disease (ihd), kidney disease, osteoporosis, and stroke. Each of these variables takes the value 1 if the patient had a diagnosis code for the particular disease and the value 0 otherwise.

Reimbursement[…] is the total amount of Medicare reimbursements for this patient in […], and reimbursement[…] is the total value of all Medicare reimbursements for the patient in […]. Bucket[…] is the cost bucket the patient fell into in […], and bucket[…] is the cost bucket the patient fell into in […].

These cost buckets are defined using the thresholds determined by the data supplier. The first cost bucket contains patients with costs less than $[…], the second contains patients with costs between $[…] and $[…], the third contains patients with costs between $[…] and $[…], the fourth contains patients with costs between $[…] and $[…], and the fifth contains patients with costs greater than $[…].

Q: Calculate the percentage of patients in each bucket by creating a table of the variable bucket and dividing by the number of rows in Claims.

Our goal will be to predict the cost bucket the patient fell into in […] using a CART model.
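The bucket-percentage question can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data frame below is a randomly generated toy stand-in (not the real ClaimsData.csv), and the column name `bucket_target` is a hypothetical placeholder for the bucket variable whose year suffix is missing from the text.

```r
# Toy stand-in for the real data. With the actual file you would instead run:
#   Claims <- read.csv("ClaimsData.csv")
# The column name bucket_target is hypothetical.
set.seed(1)  # arbitrary seed, used only to make the toy data reproducible
Claims <- data.frame(
  bucket_target = sample(1:5, size = 200, replace = TRUE,
                         prob = c(0.65, 0.20, 0.09, 0.05, 0.01))
)

# Number of patients in each bucket, divided by the total number of rows
bucket_share <- table(Claims$bucket_target) / nrow(Claims)

# Show the shares as percentages, rounded to one decimal place
print(round(100 * bucket_share, 1))
```

Dividing a `table()` by `nrow()` gives proportions directly; `prop.table(table(...))` would be an equivalent alternative.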
But before we build our model, we need to split our data into a training set, ClaimsTrain, and a testing set, ClaimsTest. Therefore, load the package caTools, and then set the random seed to […] so that we all get the same split. Set SplitRatio to […].

Q: What is the average age of patients in the training set, ClaimsTrain?

Q: What proportion of people in the training set, ClaimsTrain, had at least one diagnosis code for diabetes?

The baseline method would predict that the cost bucket for a patient in […] will be the same as it was in […]. Its accuracy can be calculated by creating a classification matrix for the baseline method on the test set, and it is […]. Our goal will be to create a CART model that has an accuracy higher than […].

Build your CART model and name it ClaimsTree. Independent variables you should use: age, arthritis, alzheimers, cancer, copd, depression, diabetes, heart.failure, ihd, kidney, osteoporosis, stroke, bucket[…], and reimbursement[…].

Q: Construct the model on the ClaimsTrain data set using the cp value obtained by cross-validation, predict the bucket in ClaimsTest, and compute the accuracy.

Can you please solve these questions using the R programming language in RStudio?
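The workflow described above (split, summary statistics, baseline, CART with a cross-validated cp) might look roughly like the sketch below. Several things here are assumptions rather than the assignment's actual values: the data are a randomly generated toy stand-in with only a few of the listed variables, the column names `bucket_prior` and `bucket_target` are hypothetical, and the seed and SplitRatio are placeholders because the assignment's values are missing from the text. The cp is chosen with rpart's built-in 10-fold cross-validation table, which is one possible approach and not necessarily the method the assignment intends.

```r
library(caTools)  # sample.split()
library(rpart)    # CART implementation

# Toy stand-in data (hypothetical); the real data come from ClaimsData.csv
set.seed(1)
n <- 1000
Claims <- data.frame(
  age = sample(65:95, n, replace = TRUE),
  diabetes = rbinom(n, 1, 0.4),
  heart.failure = rbinom(n, 1, 0.3),
  bucket_prior = sample(1:5, n, replace = TRUE,
                        prob = c(0.65, 0.20, 0.09, 0.05, 0.01))
)
# Make the target bucket usually equal to the prior one, so a baseline exists
same <- rbinom(n, 1, 0.7) == 1
Claims$bucket_target <- ifelse(same, Claims$bucket_prior,
                               sample(1:5, n, replace = TRUE))

# Split: the seed and ratio are placeholders, not the assignment's values
set.seed(123)
spl <- sample.split(Claims$bucket_target, SplitRatio = 0.6)
ClaimsTrain <- subset(Claims, spl == TRUE)
ClaimsTest <- subset(Claims, spl == FALSE)

# Training-set summaries; the mean of a 0/1 variable is a proportion
mean_age <- mean(ClaimsTrain$age)
diabetes_share <- mean(ClaimsTrain$diabetes)

# Baseline: predict that the bucket stays the same as in the prior year
lv <- as.character(1:5)  # fixed levels keep the matrices square
baseline_tab <- table(actual = factor(ClaimsTest$bucket_target, levels = lv),
                      predicted = factor(ClaimsTest$bucket_prior, levels = lv))
baseline_acc <- sum(diag(baseline_tab)) / sum(baseline_tab)

# Grow a large tree; rpart runs 10-fold cross-validation while fitting
set.seed(123)
full_tree <- rpart(bucket_target ~ age + diabetes + heart.failure + bucket_prior,
                   data = ClaimsTrain, method = "class",
                   control = rpart.control(cp = 0.0001, xval = 10))

# Pick the cp with the lowest cross-validated error and prune to it
cp_table <- full_tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
ClaimsTree <- prune(full_tree, cp = best_cp)

# Predict on the test set and compute accuracy from the classification matrix
pred <- predict(ClaimsTree, newdata = ClaimsTest, type = "class")
cart_tab <- table(actual = factor(ClaimsTest$bucket_target, levels = lv),
                  predicted = factor(pred, levels = lv))
cart_acc <- sum(diag(cart_tab)) / sum(cart_tab)

print(c(mean_age = mean_age, diabetes_share = diabetes_share,
        baseline_acc = baseline_acc, cart_acc = cart_acc))
```

With the real data, the formula would list all of the independent variables named in the question, and the comparison of interest is whether `cart_acc` exceeds `baseline_acc`.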
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Certainly I can guide you through solving these questions using R programming language in R...