BMED 4702
Medical Information systems
Midterm Assignment
The data set we'll be using in the midterm assignment, ClaimsData.csv, is structured to
represent a sample of patients in the Medicare program, which provides health insurance to
Americans aged 65 and older, as well as some younger people with certain medical conditions.
The observations represent a 1% random sample of Medicare beneficiaries, limited to those
still alive at the end of 2008.
Our independent variables are from 2008, and we will be predicting cost in 2009.
The independent variables are the patient's age in years at the end of 2008 and several binary
variables indicating whether the patient had diagnosis codes in 2008 for a particular disease or
related disorder: alzheimers, arthritis, cancer, chronic obstructive pulmonary disease (copd),
depression, diabetes, heart.failure, ischemic heart disease (ihd), kidney disease, osteoporosis,
and stroke.
Each of these binary variables takes the value 1 if the patient had a diagnosis code for that
disease and 0 otherwise.
The variable reimbursement2008 is the total amount of Medicare reimbursements for the patient in
2008, and reimbursement2009 is the total amount of Medicare reimbursements for the patient in
2009.
The variables bucket2008 and bucket2009 give the cost bucket the patient fell into in 2008 and
2009, respectively.
These cost buckets are defined by thresholds set by the data supplier: the first cost bucket
contains patients with costs less than $3,000, the second contains patients with costs between
$3,000 and $8,000, the third between $8,000 and $19,000, the fourth between $19,000 and
$55,000, and the fifth contains patients with costs greater than $55,000.
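The bucket definitions above can be expressed as a simple threshold lookup. The sketch below is an illustration only: the function name cost_bucket and the list THRESHOLDS are hypothetical, and the text does not say which bucket a cost exactly equal to a threshold belongs to ("less than $3,000" suggests the lower bound is inclusive, while "greater than $55,000" suggests the opposite at the top). The sketch assumes a cost equal to a threshold goes into the higher bucket.

```python
import bisect

# Upper thresholds (in dollars) separating the five cost buckets.
# ASSUMPTION: a cost exactly equal to a threshold falls into the higher bucket;
# the assignment text does not specify how boundary values are handled.
THRESHOLDS = [3000, 8000, 19000, 55000]


def cost_bucket(cost: float) -> int:
    """Return the cost bucket (1 to 5) for a yearly reimbursement amount."""
    # bisect_right counts how many thresholds are <= cost;
    # adding 1 turns that count into a 1-based bucket number.
    return bisect.bisect_right(THRESHOLDS, cost) + 1


if __name__ == "__main__":
    for amount in [0, 2999, 3000, 12500, 20000, 120000]:
        print(amount, cost_bucket(amount))
```

Changing the boundary convention only requires swapping bisect_right for bisect_left.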
Q1: Calculate the percentage of patients in each cost bucket by creating a table of the variable
bucket2009 and dividing it by the number of rows in Claims.
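In R this calculation is typically written as table(Claims$bucket2009) / nrow(Claims), which gives the share of rows in each bucket. The Python sketch below mirrors the same arithmetic (count per bucket divided by total row count, scaled to a percentage). The function name bucket_percentages and the toy values are hypothetical and are not taken from ClaimsData.csv.

```python
from collections import Counter


def bucket_percentages(buckets):
    """Percentage of patients in each bucket, relative to the total row count."""
    n = len(buckets)
    counts = Counter(buckets)
    # Same idea as table(bucket2009) / nrow(Claims) in R, multiplied by 100.
    return {b: 100 * counts[b] / n for b in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical toy values, NOT from the real data set.
    toy_bucket2009 = [1, 1, 1, 2, 1, 3, 2, 1, 5, 4]
    print(bucket_percentages(toy_bucket2009))
```

Because every row falls into exactly one bucket, the percentages always sum to 100.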
Our goal is to predict the cost bucket the patient fell into in 2009 using a CART model.
Before building the model, we need to split the data into a training set (ClaimsTrain) and a
testing set (ClaimsTest). To do this, load the package caTools, set the random seed to 88 so
that everyone gets the same split, and set SplitRatio to 0.6.
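In R, the split is usually done with caTools::sample.split, which returns a logical vector that keeps roughly SplitRatio of each outcome value in the training set, for example set.seed(88), then spl <- sample.split(Claims$bucket2009, SplitRatio = 0.6), then ClaimsTrain <- subset(Claims, spl == TRUE) and ClaimsTest <- subset(Claims, spl == FALSE). Passing bucket2009 as the outcome is an assumption, since the assignment does not name the column. The Python sketch below only illustrates the underlying idea of a stratified split; the name stratified_split is hypothetical, and it does NOT reproduce R's random number stream, so it will not select the same rows as the R code with seed 88.

```python
import random
from collections import defaultdict


def stratified_split(labels, split_ratio=0.6, seed=88):
    """Return a list of booleans: True = training row, False = test row.

    Rows are grouped by label and about split_ratio of each group is marked
    True, so each cost bucket keeps roughly the same share in both sets.
    This imitates the idea behind caTools::sample.split but will not match
    R's output, because Python and R use different random generators.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    in_train = [False] * len(labels)
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * split_ratio)
        for i in indices[:n_train]:
            in_train[i] = True
    return in_train


if __name__ == "__main__":
    # Hypothetical toy labels, NOT from ClaimsData.csv.
    toy_bucket2009 = [1] * 10 + [2] * 5
    mask = stratified_split(toy_bucket2009)
    print(sum(mask), "training rows out of", len(mask))
```

Stratifying on the outcome keeps rare high-cost buckets from ending up almost entirely in one set.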
Q2: What is the average age of patients in the training set, ClaimsTrain?
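Once the training rows are selected, Q2 is a plain mean over the age column of the training set; in R this would be mean(ClaimsTrain$age), assuming the age variable is named age in the data file. The Python sketch below takes a list of ages and a training mask like the one produced by a split; the function name average_age and the example values are hypothetical.

```python
from statistics import mean


def average_age(ages, in_train):
    """Mean age over the rows flagged as training rows."""
    return mean(age for age, keep in zip(ages, in_train) if keep)


if __name__ == "__main__":
    # Hypothetical example values, NOT from ClaimsData.csv.
    print(average_age([70, 80, 90], [True, False, True]))
```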
