
Question



The data set we'll be using in the midterm assignment, ClaimsData.csv, represents a sample of patients in the Medicare program. Medicare provides health insurance to Americans aged 65 and older, as well as to some younger people with certain medical conditions. The observations are a 1% random sample of Medicare beneficiaries, limited to those still alive at the end of 2008. The independent variables are from 2008, and we will be predicting cost in 2009.

The independent variables are:
- age: the patient's age in years at the end of 2008.
- A set of binary variables indicating whether the patient had a diagnosis code for a particular disease or related disorder in 2008: alzheimers, arthritis, cancer, chronic obstructive pulmonary disease (copd), depression, diabetes, heart.failure, ischemic heart disease (ihd), kidney disease (kidney), osteoporosis, and stroke. Each takes the value 1 if the patient had a diagnosis code for that disease and 0 otherwise.

The data set also contains these variables:
- reimbursement2008: the total amount of Medicare reimbursements for the patient in 2008.
- reimbursement2009: the same total for 2009.
- bucket2008 and bucket2009: the cost buckets the patient fell into in 2008 and 2009.

The cost buckets are defined by thresholds set by the data supplier. The first bucket contains patients with costs below $3,000. The second covers costs between $3,000 and $8,000, the third between $8,000 and $19,000, and the fourth between $19,000 and $55,000. The fifth contains patients with costs above $55,000.

Q1: Calculate the percentage of patients in each cost bucket. Create a table of the variable bucket2009 and divide it by the number of rows in Claims.
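Q1 can be sketched in R as follows. The sketch assumes ClaimsData.csv is in the working directory and is read into a data frame named Claims. The helper name bucket_shares is hypothetical. It divides the counts from table() by nrow(), exactly as the question describes, and the script only runs when the file is present.

```r
# Q1 sketch: share of patients in each 2009 cost bucket.
# Assumption: ClaimsData.csv is in the working directory and has a column
# named bucket2009. bucket_shares() is a hypothetical helper name.
bucket_shares <- function(claims) {
  # table() counts patients per bucket; dividing by nrow() gives proportions
  table(claims$bucket2009) / nrow(claims)
}

if (file.exists("ClaimsData.csv")) {
  Claims <- read.csv("ClaimsData.csv")
  # Multiply by 100 to report percentages rather than proportions
  print(round(100 * bucket_shares(Claims), 2))
}
```

prop.table(table(Claims$bucket2009)) gives the same proportions. The explicit division simply mirrors the wording of the question.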
Our goal is to predict the cost bucket each patient fell into in 2009 using a CART model. Before building the model, we need to split the data into a training set (ClaimsTrain) and a testing set (ClaimsTest). Load the caTools package and set the random seed to 88 so that everyone gets the same split. Use a SplitRatio of 0.6.

Q2: What is the average age of patients in the training set, ClaimsTrain?

Q3: What proportion of people in the training set (ClaimsTrain) had at least one diagnosis code for diabetes?

The baseline method predicts that a patient's cost bucket in 2009 will be the same as in 2008. Its accuracy on the test set is 0.68, computed from a classification matrix of the actual bucket2009 values against bucket2008. Our goal is therefore a CART model with an accuracy higher than 68%.

Name the CART model ClaimsTree and use these independent variables: age, arthritis, alzheimers, cancer, copd, depression, diabetes, heart.failure, ihd, kidney, osteoporosis, stroke, bucket2008, and reimbursement2008.

Q4: Fit the model on ClaimsTrain with cp = 0.00005 (obtained by cross-validation), predict bucket2009 on ClaimsTest, and compute the accuracy.

Can you please solve these questions using R (RStudio)?
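The split, Q2 to Q4, and the baseline can be sketched in R as below. This is a sketch under stated assumptions, not a verified answer key:
- The caTools and rpart packages are installed.
- ClaimsData.csv is in the working directory, with the column names listed in the question.
- split_claims, accuracy, and fit_claims_tree are hypothetical helper names introduced for illustration.

No numeric results are shown because they depend on the actual data file.

```r
# Sketch for the split, Q2-Q4, and the baseline.
# Assumptions: caTools and rpart are installed; ClaimsData.csv uses the
# column names from the question. Helper names below are hypothetical.
library(caTools)  # sample.split()
library(rpart)    # rpart() for CART

# 60/40 split with seed 88; sample.split() keeps the bucket2009 label
# proportions similar in the training and testing sets
split_claims <- function(claims, seed = 88, ratio = 0.6) {
  set.seed(seed)
  spl <- sample.split(claims$bucket2009, SplitRatio = ratio)
  list(train = claims[spl, ], test = claims[!spl, ])
}

# Fraction of exact label matches (robust to a non-square confusion matrix)
accuracy <- function(actual, predicted) {
  mean(as.character(actual) == as.character(predicted))
}

# Classification tree with the predictors and cp value from the question;
# rpart() passes cp on to rpart.control()
fit_claims_tree <- function(train, cp = 0.00005) {
  rpart(bucket2009 ~ age + arthritis + alzheimers + cancer + copd +
          depression + diabetes + heart.failure + ihd + kidney +
          osteoporosis + stroke + bucket2008 + reimbursement2008,
        data = train, method = "class", cp = cp)
}

if (file.exists("ClaimsData.csv")) {
  Claims <- read.csv("ClaimsData.csv")
  parts <- split_claims(Claims)
  ClaimsTrain <- parts$train
  ClaimsTest  <- parts$test

  # Q2: average age in the training set
  print(mean(ClaimsTrain$age))

  # Q3: proportion with a diabetes diagnosis code (diabetes is coded 0/1)
  print(mean(ClaimsTrain$diabetes == 1))

  # Baseline: predict bucket2009 = bucket2008 on the test set
  print(table(ClaimsTest$bucket2009, ClaimsTest$bucket2008))
  print(accuracy(ClaimsTest$bucket2009, ClaimsTest$bucket2008))

  # Q4: fit on the training set, predict classes on the test set
  ClaimsTree  <- fit_claims_tree(ClaimsTrain)
  PredictTest <- predict(ClaimsTree, newdata = ClaimsTest, type = "class")
  print(table(ClaimsTest$bucket2009, PredictTest))
  print(accuracy(ClaimsTest$bucket2009, PredictTest))
}
```

Accuracy is computed by comparing labels directly rather than with sum(diag(table(...))) / nrow(...). A tree may never predict some bucket, and then the confusion matrix is not square and its diagonal no longer lines up with the correct predictions. When the matrix is square, both approaches give the same number.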

