Question

1. In this part you will use Python to analyze the heart disease data set (the link and explanation are included below) by training and building a model with regression analysis. Test your model and discuss the results of your test with performance metrics. Make sure you separate the training and testing data properly. Then analyze the input data, explain which inputs have more effect on the output, and modify your model by eliminating non-significant variables. (5 marks)

Heart Disease Dataset: Here is the link to the heart disease dataset of patients: http://archive.ics.uci.edu/ml/datasets/Heart+Disease

After going to this link you will find two folders: (1) Data Folder and (2) Dataset Description. The Data Folder contains the dataset; it is better to use the processed Cleveland data. In the Dataset Description folder you will find the descriptions of the column names, referring to the 14 columns of the dataset listed below. The last attribute (number 14) is the result. Include your R source code for the regression analysis, training, and generating results.

Here are the attributes and their information (please see the data set documents for more details):

1. #3 (age)
2. #4 (sex)
3. #9 (cp)
4. #10 (trestbps)
5. #12 (chol)
6. #16 (fbs)
7. #19 (restecg)
8. #32 (thalach)
9. #38 (exang)
10. #40 (oldpeak)
11. #41 (slope)
12. #44 (ca)
13. #51 (thal)
14. #58 (num) -> result

For more information related to this assignment, you can read Chapter 2 and the Linear Regression section of Chapter 3 of the "Doing Data Science" book.

2. Nonlinear Models: In this part you will use the heart disease data set to analyze the data with logistic regression analysis (or any other nonlinear classifier of your choice). Compare it with linear regression analysis, and then answer which method is better. First, use the two models as estimators (with the CPS521 Lab4 numerical result).
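The Part 1 workflow described above can be sketched in plain Python. The workflow is: load the processed Cleveland data, split it, fit a linear regression, test it, then drop non-significant inputs. This is a minimal, dependency-free sketch, not the expected solution.

It makes the following assumptions:
- The data file is the comma-separated `processed.cleveland.data` from the Data Folder, with no header row and `?` marking missing values.
- Rows with missing values are simply dropped.
- The 70/30 split, the fixed seed and the 0.05 threshold are illustrative choices.
- p-values use a normal approximation to the t distribution.

The helper names (`load_rows`, `train_test_split`, `fit_ols`, `backward_eliminate`) are hypothetical.

```python
# Sketch: OLS on the processed Cleveland data with a train/test split,
# test-set metrics and backward elimination (standard library only).
# ASSUMPTIONS (illustrative): comma-separated file, no header, '?' = missing;
# 70/30 split with seed 0; normal approximation for p-values.
import csv
import random
from statistics import NormalDist

# The 14 attributes in file order; "num" (last) is the result.
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def load_rows(lines):
    """Parse CSV lines into float rows; rows containing '?' are dropped."""
    return [[float(v) for v in rec] for rec in csv.reader(lines)
            if len(rec) == len(COLUMNS) and "?" not in rec]


def train_test_split(rows, test_frac=0.3, seed=0):
    """Shuffle a copy of the rows with a fixed seed; return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fails if m is singular)."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_ols(X, y):
    """Least squares with an intercept; returns (beta, p_values).

    beta[0] is the intercept. p-values use se = sqrt(sigma^2 * diag((X'X)^-1))
    and a normal approximation to the t distribution (assumption).
    """
    Xd = [[1.0] + list(r) for r in X]
    k = len(Xd[0])
    inv = invert([[sum(r[i] * r[j] for r in Xd) for j in range(k)]
                  for i in range(k)])
    xty = [sum(r[i] * t for r, t in zip(Xd, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(Xd, y)]
    sigma2 = sum(e * e for e in resid) / max(len(y) - k, 1)
    pvals = []
    for i in range(k):
        se = max(sigma2 * inv[i][i], 0.0) ** 0.5
        z = abs(beta[i]) / se if se > 0 else float("inf")
        pvals.append(2 * (1 - NormalDist().cdf(z)))
    return beta, pvals


def predict(beta, X):
    """Linear predictions beta[0] + beta[1:] . x for each row."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def r_squared(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant input until every p-value is <= alpha."""
    keep = list(range(len(names)))
    while keep:
        _, p = fit_ols([[r[i] for i in keep] for r in X], y)
        worst = max(range(len(keep)), key=lambda j: p[j + 1])  # skip intercept
        if p[worst + 1] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]
```

A possible usage sequence:
1. Load the data with `rows = load_rows(open("processed.cleveland.data"))`.
2. Split it with `train, test = train_test_split(rows)`.
3. Fit on `[r[:-1] for r in train]` and `[r[-1] for r in train]`.
4. Report `mse` and `r_squared` on the test rows only.
5. Run `backward_eliminate` on the training rows with `COLUMNS[:-1]` as the names.

A library such as statsmodels reports exact t-based p-values. The hand-rolled version here only avoids third-party dependencies.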
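For Part 2, one possible nonlinear estimator is logistic regression on a 0/1 version of `num`. The sketch below is illustrative only and rests on these assumptions:
- `num > 0` is treated as "disease present", since the dataset description uses 0 for absence and 1 to 4 for presence.
- Inputs are standardized with training-set statistics only, so the test set does not leak into training.
- Weights are fitted by plain batch gradient descent. An optional L2 penalty serves as one guard against overfitting.
- The learning rate, epoch count and 0.5 threshold are illustrative.

The function names are hypothetical. The same `evaluate` function can also score the test-set predictions of any linear regression fit on the same 0/1 target. Both models are then compared on identical MSE, R-squared and accuracy figures.

```python
# Sketch: logistic regression by batch gradient descent (standard library only).
# ASSUMPTIONS (illustrative): num > 0 means disease; lr, epochs, threshold chosen ad hoc.
import math


def binarize(num_values):
    """Treat num > 0 (disease present) as 1.0 and num == 0 as 0.0."""
    return [1.0 if v > 0 else 0.0 for v in num_values]


def standardize(train_X, test_X):
    """Scale each column by the TRAINING mean/std so the test set never leaks in."""
    cols = list(zip(*train_X))
    means = [sum(c) / len(c) for c in cols]
    # Constant columns (std 0) are divided by 1.0 instead.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]

    def scale(X):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in X]

    return scale(train_X), scale(test_X)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, X):
    """P(y = 1 | x) for each row; w[0] is the bias term."""
    return [sigmoid(w[0] + sum(wi * v for wi, v in zip(w[1:], r))) for r in X]


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.0):
    """Minimize mean log-loss (+ optional L2 on non-bias weights) by gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for p, r, t in zip(predict_proba(w, X), X, y):
            grad[0] += p - t
            for j, v in enumerate(r):
                grad[j + 1] += (p - t) * v
        w = [wj - lr * (g / n + (l2 * wj if j else 0.0))
             for j, (wj, g) in enumerate(zip(w, grad))]
    return w


def evaluate(y, yhat, threshold=0.5):
    """MSE, R-squared and accuracy of predictions against 0/1 targets."""
    n = len(y)
    mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    correct = sum((b >= threshold) == (a == 1.0) for a, b in zip(y, yhat))
    return {"mse": ss_res / n,
            "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
            "accuracy": correct / n}
```

A typical sequence:
1. Split the rows first.
2. Call `standardize` on the training and test inputs.
3. Fit with `fit_logistic` on the training part only.
4. Pass each model's test-set predictions to `evaluate`.

A large gap between training-set and test-set scores would be one sign of overfitting.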
Here you need to compare both methods by calculating errors such as the Mean Squared Error (MSE) and other performance metrics (R-squared) to find which method predicts more accurately. Make sure you separate the training and testing data and that there is no overfitting.

Exercise-6 (1 Mark): Download the Excel file "Sample-probability-distributions-graph.xlsx" of the sample distributions from the lecture notes. By visual observation and by running a regression analysis (for example, Excel's regression analysis), find out which probability distribution is linear. You can examine this either by fitting the distribution data with a linear regression model or by explaining the equation of each distribution.
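For Exercise-6, the regression check amounts to fitting a straight line to each distribution's (x, y) values and reading off R-squared. A curve that is linear is fitted almost exactly, with R-squared close to 1.

The columns of "Sample-probability-distributions-graph.xlsx" are not reproduced here. The sketch therefore computes stand-in curves from the standard formulas, and these stand-ins are assumptions:
- uniform density and CDF on [0, 4]
- exponential density with rate 1
- normal density with mean 2 and standard deviation 1

Real values would be read from the spreadsheet instead. The function name `linear_fit_r2` is hypothetical.

```python
# Sketch: test whether a distribution's curve is a straight line by fitting
# y = slope * x + intercept with least squares and reading off R-squared.
# ASSUMPTION: the curves below are formula-based stand-ins, not the spreadsheet's data.
import math


def linear_fit_r2(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    # A constant series (such as a uniform density) is fitted exactly by a flat line.
    r2 = 1.0 if ss_tot == 0 else 1 - ss_res / ss_tot
    return slope, intercept, r2


xs = [i / 10 for i in range(41)]  # 0.0 .. 4.0
curves = {
    "uniform pdf on [0, 4]": [0.25 for _ in xs],
    "uniform cdf on [0, 4]": [x / 4 for x in xs],
    "exponential pdf (rate 1)": [math.exp(-x) for x in xs],
    "normal pdf (mean 2, sd 1)": [math.exp(-(x - 2) ** 2 / 2) / math.sqrt(2 * math.pi)
                                  for x in xs],
}
for name, ys in curves.items():
    slope, intercept, r2 = linear_fit_r2(xs, ys)
    print(f"{name:28s} slope={slope:+.4f} R^2={r2:.4f}")
```

With these stand-ins, only the uniform distribution's curves reach an R-squared of 1: its density is a flat line and its CDF is a straight line on its support. The exponential and normal densities fall well short. Applying the same function to each column pair from the spreadsheet would give the corresponding check for the lecture data.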
