Question
BMED 4702
Medical Information systems
Midterm Assignment
The data set we'll be using in the midterm assignment, ClaimsData.csv, is structured to
represent a sample of patients in the Medicare program, which provides health insurance to
Americans aged 65 and older, as well as some younger people with certain medical conditions.
The observations represent a random sample of Medicare beneficiaries, limited to those
still alive at the end of
Our independent variables are from and we will be predicting cost in
Our independent variables are the patient's age in years at the end of and then several
binary variables indicating whether or not the patient had diagnosis codes for a particular
disease or related disorder in : alzheimers, arthritis, cancer, chronic obstructive pulmonary
disease (copd), depression, diabetes, heart.failure, ischemic heart disease (ihd), kidney
disease, osteoporosis, and stroke.
Each of these variables will take value if the patient had a diagnosis code for the particular
disease and value otherwise.
Reimbursement is the total amount of Medicare reimbursements for this patient in
And reimbursement is the total value of all Medicare reimbursements for the patient in
Bucket is the cost bucket the patient fell into in and bucket is the cost bucket
the patient fell into in
These cost buckets are defined using the thresholds determined by the data supplier.
So the first cost bucket contains patients with costs less than $, the second cost bucket
contains patients with costs between $ and $, the third cost bucket contains
patients with costs between $ and $, the fourth cost bucket contains patients
with costs between $ and $, and the fifth cost bucket contains patients with costs
greater than $
Q: Calculate the patient number percentages of each bucket by creating a table of the
variable bucket and divide by the number of rows in Claims.
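As a minimal sketch of the calculation asked for above, the proportions can be obtained by dividing a frequency table by the row count. The column name `bucket` and the toy data frame are assumptions made only so the snippet runs on its own; the real column name in ClaimsData.csv may carry a year suffix.

```r
# Toy stand-in for the real data (hypothetical values).
# With the actual file you would instead run:
#   Claims <- read.csv("ClaimsData.csv")
Claims <- data.frame(bucket = c(1, 1, 1, 2, 2, 3, 1, 4, 1, 5))

# Count of patients per bucket, divided by the total number of rows
bucket_share <- table(Claims$bucket) / nrow(Claims)
print(bucket_share)

# The same proportions expressed as percentages
print(round(100 * bucket_share, 1))
```

Because every row falls into exactly one bucket, the proportions should sum to 1, which is a quick sanity check on the result.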
Our goal will be to predict the cost bucket the patient fell into in using a CART model.
But before we build our model, we need to split our data into a training set, ClaimsTrain, and a
testing set, ClaimsTest. Therefore, load the package caTools, and then set our random seed to
so that we all get the same split. And set SplitRatio to be
Q: What is the average age of patients in the training set, ClaimsTrain?
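The split and the average-age question could be approached roughly as follows. This is a sketch under stated assumptions: the seed `1`, the `SplitRatio` of `0.5`, the column names `age` and `bucket`, and the simulated data frame are all placeholders, not the values from the assignment. It assumes the caTools package is installed.

```r
library(caTools)  # provides sample.split()

# Simulated stand-in for the real data (hypothetical values).
# With the actual file you would instead run:
#   Claims <- read.csv("ClaimsData.csv")
set.seed(1)  # placeholder seed; use the seed given in the assignment
Claims <- data.frame(
  age    = sample(40:95, 200, replace = TRUE),
  bucket = sample(1:5, 200, replace = TRUE)
)

# sample.split() returns a logical vector and keeps the outcome's
# class proportions similar in both parts of the split
spl <- sample.split(Claims$bucket, SplitRatio = 0.5)  # placeholder ratio

ClaimsTrain <- subset(Claims, spl == TRUE)
ClaimsTest  <- subset(Claims, spl == FALSE)

# Average age of patients in the training set
mean(ClaimsTrain$age)
```

Setting the seed immediately before the call to `sample.split()` is what makes the split reproducible across machines, since the function draws on R's random number generator.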