Question
Project 1 (DO IN PYTHON) This dataset: https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/Proj1.csv is adapted from the World Health Organization on Strokes (it's based on real data but is NOT
Project 1 (DO IN PYTHON)
This dataset: https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/Proj1.csv
is adapted from the World Health Organization on Strokes (it's based on real data but is NOT REAL). Use this dataset to answer the following questions and perform the following tasks. Feel free to add extra cells as needed, but follow the structure listed here and clearly identify where each question is answered. Please remove any superflous code.
Data Information
- reg_to_vote: 0 if no, 1 if yes.
- age: age of the patient in years.
- hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension.
- heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease.
- ever_married: 0 if no, 1 if yes.
- Residence_type: 0 for Rural, 1 for Urban.
- avg_glucose_level: average glucose level in blood.
- bmi: body mass index.
- smoking_status_smokes, smoking_status_formerly: Whether or not the person smokes, or formerly smoked. If a person has 0's for both these columns, they never smoked.
- stroke: 1 if the patient had a stroke or 0 if not.
- dog_owner: 0 if no, 1 if yes.
- er_visits: number of recorded Emergency Room visits in lifetime.
- racoons_to_fight: number of racoons the patient belives they could fight off at once.
- fast_food_budget_month: amount (in US dollars) spent on fast food per month.
Logistic Regression
Build a logistic regression model to predict whether or not someone had a stroke based on all the other variables in the dataset.
- Count the missing data per column, and remove rows with missing data (if any).
- Use 10 fold cross validation for your model validation. Z-score your continuous variables only. Store both the train and test accuracies to check for overfitting. Is the model overfit? How can you tell?
- After completing steps 1-2, fit another logistic regression model on ALL of the data (no model validation; but do z score) using the same predictors as before, and put the coefficients into a dataframe called coef.
- print out a confusion matrix for the model you made in part 3. What does this confusion matrix tell you about your model? How can you tell?
Please help! I cannot figure it out I keep getting errors.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started