
Question

In this question, we work with a dataset from the great textbook "An Introduction to Statistical Learning."

(A) Read the dataset file Hearts_s.csv and assign it to a Pandas DataFrame.

(B) Check out the dataset. As you will see, the dataset contains a number of features, including both contextual and biological factors (e.g., age, gender, vital signs). The last column, AHD, is the label: Yes means that a human subject has heart disease, and No means that the subject does not have heart disease.

(C) As you will see, there are at least 3 categorical features in the dataset (Gender, ChestPain, Thal). Let's ignore these categorical features for now: keep only the numerical features and build your feature matrix and label vector.

(D) Split the dataset into testing and training sets with the following parameters: test_size=0.25, random_state=4.

(E) Use KNN (with k=3), Decision Tree (with random_state=5), and Logistic Regression classifiers to predict heart disease based on the training/testing datasets that you built in part (D). Then check, compare, and report the accuracy of these 3 classifiers. Which one is the best? Which one is the worst?

(F) Now, we want to use the categorical features as well! To this end, we have to perform a feature engineering process called one-hot encoding for the categorical features. To do this, each categorical feature is replaced with dummy columns in the feature table (one column for each possible value of the categorical feature) and encoded in a binary manner, such that only one of the dummy columns takes the value 1 at a time (and the rest take 0). For example, Gender can take two values, m and f. Thus, we need to replace this feature (in the feature table) with 2 columns titled m and f. Wherever we have a male subject, we put 1 and 0 in the columns m and f, respectively. Wherever we have a female subject, we put 0 and 1 in the columns m and f, respectively. (Hint: you will need 4 columns to encode ChestPain and 3 columns to encode Thal.)
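One possible way to approach parts (A) through (E) is sketched below with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not a verified solution: the helper names `numeric_features_and_label` and `compare_classifiers` are hypothetical, the file is assumed to contain no missing values and no extra index column, and `max_iter=1000` is an added assumption that the question does not specify.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Column names taken from the question text.
CATEGORICAL = ["Gender", "ChestPain", "Thal"]
LABEL = "AHD"


def numeric_features_and_label(df):
    """Part (C): drop the categorical columns and separate the label."""
    X = df.drop(columns=CATEGORICAL + [LABEL])
    y = df[LABEL]
    return X, y


def compare_classifiers(X, y):
    """Parts (D)-(E): one train/test split, three classifiers, test accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=4
    )
    models = {
        "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
        "Decision Tree": DecisionTreeClassifier(random_state=5),
        # Assumption: max_iter is raised so the solver can converge on unscaled
        # features; a plain LogisticRegression() follows the question literally.
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Usage for parts (A)-(B), assuming Hearts_s.csv is in the working directory:
#   df = pd.read_csv("Hearts_s.csv")
#   print(df.head())
#   X, y = numeric_features_and_label(df)
#   print(compare_classifiers(X, y))
```

The returned dictionary maps each classifier name to its test accuracy, so the best and worst methods can be read off with `max(scores, key=scores.get)` and `min(scores, key=scores.get)`.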

(G) Repeat parts (D) and (E) with the new dataset that you built in part (F). How does the prediction accuracy change for each method?
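The encoding described in part (F) can be sketched with `pandas.get_dummies`. This is one option among several and rests on assumptions: the helper names are hypothetical, and with the default settings pandas names the dummy columns with a prefix (for example `Gender_m` and `Gender_f`) rather than the bare `m` and `f` used in the question.

```python
import pandas as pd

# Column names taken from the question text.
CATEGORICAL = ["Gender", "ChestPain", "Thal"]
LABEL = "AHD"


def one_hot_encode(df):
    """Part (F): replace each categorical column with one 0/1 column per value."""
    return pd.get_dummies(df, columns=CATEGORICAL, dtype=int)


def encoded_features_and_label(df):
    """Build the feature matrix (numeric + dummy columns) and the label vector."""
    encoded = one_hot_encode(df)
    X = encoded.drop(columns=[LABEL])
    y = encoded[LABEL]
    return X, y
```

For part (G), the resulting `X` and `y` can go through the same split (test_size=0.25, random_state=4) and the same three classifiers as in parts (D) and (E), so the accuracies can be compared method by method.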

(H) Now, repeat part (E) with the new dataset that you built in part (F), but this time using cross-validation. That is, rather than splitting the dataset into training and testing sets, use 10-fold cross-validation (as we learned in Lab4) to evaluate the classification methods and report the final prediction accuracy.
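A minimal sketch of the 10-fold evaluation in part (H), using scikit-learn's `cross_val_score`. The helper name `cross_validated_accuracy` is hypothetical, `max_iter=1000` is an assumption not stated in the question, and the approach taught in Lab4 may differ in detail (for example, in whether the folds are shuffled).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def cross_validated_accuracy(X, y, folds=10):
    """Part (H): mean accuracy over `folds` cross-validation folds per classifier."""
    models = {
        "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
        "Decision Tree": DecisionTreeClassifier(random_state=5),
        # Assumption: max_iter is raised to help the solver converge.
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    return {
        # With an integer cv and a classifier, scikit-learn uses stratified folds.
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }
```

Here `X` and `y` would be the one-hot encoded feature matrix and the AHD label from part (F). Each reported value is the average of the 10 per-fold accuracies.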

You can download Hearts_s.csv from this link: https://drive.google.com/open?id=1OWOR-qbyBhHc-Mr6tq5rbxGhINLYNhDm
