Question

Project 1 (DO IN PYTHON)

This dataset: https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/Proj1.csv

is adapted from the World Health Organization on Strokes (it's based on real data but is NOT REAL). Use this dataset to answer the following questions and perform the following tasks. Feel free to add extra cells as needed, but follow the structure listed here and clearly identify where each question is answered. Please remove any superfluous code.

Data Information

  • reg_to_vote: 0 if no, 1 if yes.
  • age: age of the patient in years.
  • hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension.
  • heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease.
  • ever_married: 0 if no, 1 if yes.
  • Residence_type: 0 for Rural, 1 for Urban.
  • avg_glucose_level: average glucose level in blood.
  • bmi: body mass index.
  • smoking_status_smokes, smoking_status_formerly: Whether or not the person smokes, or formerly smoked. If a person has 0's for both these columns, they never smoked.
  • stroke: 1 if the patient had a stroke or 0 if not.
  • dog_owner: 0 if no, 1 if yes.
  • er_visits: number of recorded Emergency Room visits in lifetime.
  • racoons_to_fight: number of raccoons the patient believes they could fight off at once.
  • fast_food_budget_month: amount (in US dollars) spent on fast food per month.
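
Because only the continuous variables are meant to be z-scored, it helps to separate them from the 0/1 columns first. The following is a minimal sketch, not part of the assignment: `split_columns` is a hypothetical helper, and its rule (a predictor whose values are all 0 or 1 is treated as binary) is an assumption that should be checked against the column descriptions above.

```python
import pandas as pd


def split_columns(df, target="stroke"):
    """Split predictors into binary (0/1) and continuous columns.

    Heuristic (an assumption, not part of the assignment): a predictor is
    binary if every non-missing value is 0 or 1; everything else is
    treated as continuous.
    """
    predictors = [c for c in df.columns if c != target]
    binary = [c for c in predictors if df[c].dropna().isin([0, 1]).all()]
    continuous = [c for c in predictors if c not in binary]
    return binary, continuous
```

After loading the file with `pd.read_csv` on the URL above, `split_columns(df)` returns the two lists. Count-like columns such as er_visits land in the continuous list under this rule; whether to z-score them is a judgment call.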

Logistic Regression

Build a logistic regression model to predict whether or not someone had a stroke based on all the other variables in the dataset.

  1. Count the missing data per column, and remove rows with missing data (if any).
  2. Use 10-fold cross-validation for your model validation. Z-score your continuous variables only. Store both the train and test accuracies to check for overfitting. Is the model overfit? How can you tell?
  3. After completing steps 1-2, fit another logistic regression model on ALL of the data (no model validation, but do z-score) using the same predictors as before, and put the coefficients into a dataframe called coef.
  4. Print out a confusion matrix for the model you made in part 3. What does this confusion matrix tell you about your model? How can you tell?

Please help! I cannot figure it out, and I keep getting errors.
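
Steps 1-4 of the logistic regression task could be sketched as follows, assuming scikit-learn and pandas are available. This is one possible approach, not the course's reference solution. The `CONTINUOUS` list is an assumption read off the data descriptions, and `make_model` and `run_project` are hypothetical helper names. Putting the scaler inside a pipeline means it is refit on each training fold, so no information from the test fold leaks into the z-scores.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: these are the continuous/count columns, per the data descriptions.
CONTINUOUS = ["age", "avg_glucose_level", "bmi", "er_visits",
              "racoons_to_fight", "fast_food_budget_month"]
TARGET = "stroke"


def make_model(continuous):
    # Z-score only the continuous columns; 0/1 columns pass through unchanged.
    z = make_column_transformer((StandardScaler(), continuous),
                                remainder="passthrough")
    return make_pipeline(z, LogisticRegression(max_iter=1000))


def run_project(df, continuous=CONTINUOUS, target=TARGET):
    # Step 1: count missing values per column, then drop incomplete rows.
    missing = df.isna().sum()
    df = df.dropna().reset_index(drop=True)
    X, y = df.drop(columns=[target]), df[target]

    # Step 2: 10-fold cross-validation, keeping train and test accuracy.
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_validate(make_model(continuous), X, y, cv=cv,
                            scoring="accuracy", return_train_score=True)
    acc = pd.DataFrame({"train": scores["train_score"],
                        "test": scores["test_score"]})

    # Step 3: refit on all rows and collect the coefficients.
    model = make_model(continuous).fit(X, y)
    # The column transformer outputs the scaled columns first, then the
    # passthrough columns in their original order.
    names = list(continuous) + [c for c in X.columns if c not in continuous]
    coef = pd.DataFrame({"predictor": names, "coef": model[-1].coef_[0]})

    # Step 4: confusion matrix on the same data (rows = actual, cols = predicted).
    cm = confusion_matrix(y, model.predict(X))
    return missing, acc, coef, cm
```

With `df = pd.read_csv(...)` on the dataset URL, `missing, acc, coef, cm = run_project(df)` returns all four results. Comparing `acc["train"].mean()` with `acc["test"].mean()` addresses the overfitting question: a train accuracy noticeably above the test accuracy suggests overfitting, while similar values suggest it is not a concern. For the confusion matrix, the second row shows how many actual strokes the model caught versus missed, which matters when accuracy alone looks high.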
