Question

DTSC 560 Data Science for Business
Module 5 Assignment: Logistic Regression Analysis

This material is for enrolled students' academic use only and protected under U.S. Copyright Laws. This content must not be shared outside the confines of this course, in line with Eastern University's academic integrity policies. Unauthorized reproduction, distribution, or transmission of this material, including but not limited to posting on third-party platforms like GitHub, is strictly prohibited and may lead to disciplinary action. You may not alter or remove any copyright or other notice from copies of any content taken from BrightSpace or Eastern University's website. Copyright Notice 2024, Eastern University - All Rights Reserved

For this assignment, you will conduct a logistic regression analysis in R. You will not turn in any code or output; rather, you will do the analysis and use the output to answer the questions in the associated assignment quiz on Brightspace. Please read these instructions carefully so that your answers align with the Module 5 assignment quiz in Brightspace.

Data: insurance.csv (download from Module 5 on Brightspace)

We are using a dataset of information on 7,232 car insurance customers, some of whom have made insurance claims and some of whom have not. You will also use a second dataset of new customers, insurance_predictions.csv (download from Module 5 on Brightspace), to predict the probability that each new customer will make an insurance claim.

Background: For this assignment, you work at an auto insurance company and would like to predict the probability of insurance claims based on different customer characteristics.
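Predicting a claim probability from customer characteristics is what the standard binary logistic regression model does. It is written out here for reference as the usual textbook formulation, not as part of the course materials: p is the probability that CLAIM = 1, x_1, ..., x_k are the predictor variables, and β_0, ..., β_k are the coefficients that R estimates.

```latex
\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Each coefficient is on the log-odds scale: β_j is the change in ln(p/(1 - p)) for a one-unit increase in x_j with the other predictors held fixed. Its exponential e^{β_j} is the odds ratio; for a 0/1 predictor such as URBANICITY, it is the odds of a claim for urban customers divided by the odds for rural customers, other predictors held fixed.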
Your business question is: "What is the probability that a customer will make an auto insurance claim based on certain characteristics?"

Variables: The variables in this dataset include:
- CLAIM: Whether a customer has made a recent auto insurance claim (No = 0, Yes = 1)
- KIDSDRIV: Whether a customer has children who are driving (No = 0, Yes = 1)
- AGE: Age of driver in years
- HOMEKIDS: Whether a customer has children at home (No = 0, Yes = 1)
- INCOME: Income in dollars
- HOMEOWN: Whether a customer owns a home (No = 0, Yes = 1)
- MSTATUS: Whether married (No = 0, Yes = 1)
- GENDER: Gender (Male = 0, Female = 1)
- EDUCATION: Level of education (High School only = 0, College or beyond = 1)
- TRAVTIME: Commute time to work in minutes
- CAR_USE: Type of car use, private or commercial (Private = 0, Commercial = 1)
- BLUEBOOK: Value of vehicle in dollars
- TWC: Customer time with insurance company in years
- RED_CAR: Whether a customer's car is red (No = 0, Yes = 1)
- CLM_BEF: Whether a customer has made a previous claim in the last five years (No = 0, Yes = 1)
- REVOKED: Whether a customer has had their license revoked (No = 0, Yes = 1)
- MVR_PTS: Whether a customer has motor vehicle record points (traffic tickets) (No = 0, Yes = 1)
- CAR_AGE: Vehicle age in years
- URBANICITY: Whether a customer lives in an urban or rural area (Rural = 0, Urban = 1)

Assignment Steps: Carry out the steps below to complete the assignment, then answer the questions in the Module 5 Assignment Quiz on Brightspace. The quiz questions are included here with their numbers, in case you prefer to answer them as you work through the assignment and then enter them in the Brightspace quiz all at once (multiple-choice questions are labeled "MC").

1) Generate summary statistics for the variables in the insurance.csv dataset.
Quiz question #1: What percentage of customers have submitted a recent claim?

2) Partition the dataset into a training, validation, and test set, using a 60%-20%-20% split.
*IMPORTANT: To get results that align with the correct answers in the assignment quiz, you MUST set the seed value to 42 using the set.seed() function when you partition your dataset. If you do not, you will not be able to reproduce the answers that correspond to the assignment quiz.
Quiz question #2: How many observations are in the test set?

3) We don't have a severe class imbalance in the insurance dataset, so we will start by fitting a model to the training set. Conduct a logistic regression analysis on the training data frame with CLAIM as the outcome variable and all the other variables in the dataset as predictor variables.
Quiz question #3: What is the coefficient for the KIDSDRIV variable?
Quiz question #4: What is the odds ratio for the URBANICITY variable?
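Steps 1 and 2 can be illustrated with a small, self-contained Python sketch. It is not the R workflow the assignment requires: in R you would typically call summary() on the data frame and draw the partition (for example with sample()) right after set.seed(42). Python's random generator does not reproduce R's draws, so this sketch cannot be used to obtain the quiz answers. The helper names (claim_rate_percent, summarize, partition_60_20_20) are hypothetical, and flooring the training and validation sizes is an assumed convention; the course's partitioning code may round differently.

```python
import random
import statistics


def claim_rate_percent(claims):
    """Percentage of rows coded 1 in a 0/1 column such as CLAIM.

    For 0/1 data this is simply 100 times the column mean.
    """
    return 100.0 * sum(claims) / len(claims)


def summarize(values):
    """Min, quartiles, mean, and max of a numeric column.

    method="inclusive" uses the same linear interpolation as R's default
    quantile type 7, which R's summary() uses for numeric columns.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


def partition_60_20_20(n_rows, seed=42):
    """Shuffle row indices with a fixed seed and cut them 60/20/20.

    Train and validation sizes are floored; the test set takes the
    remainder (an assumed convention). Python's generator does NOT
    reproduce R's set.seed(42) draws.
    """
    rng = random.Random(seed)        # seeded generator -> repeatable shuffle
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_train = n_rows * 60 // 100     # integer arithmetic avoids float rounding
    n_valid = n_rows * 20 // 100
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    # Toy 0/1 column for illustration only; this is not the insurance data.
    toy_claims = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    print(claim_rate_percent(toy_claims))       # 40.0
    print(summarize([1, 2, 3, 4, 5]))
    train, valid, test = partition_60_20_20(10)
    print(len(train), len(valid), len(test))    # 6 2 2
```

Shuffling a single index list and slicing it guarantees the three sets are disjoint and together cover every row, which is the property any 60%-20%-20% partition needs regardless of language.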
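In R, step 3 is typically a call such as glm(CLAIM ~ ., data = train, family = binomial), where the data frame name train is an assumption; coef() on the fitted model returns the coefficients, and exp() of those gives the odds ratios. To show what that fit computes, here is a minimal pure-Python logistic regression estimated by Newton-Raphson, which for the logit link is the same iteration as the iteratively reweighted least squares that glm() uses. The function names are hypothetical, the sketch has none of glm()'s diagnostics (standard errors, p-values, convergence warnings), and it is meant for understanding, not for answering the quiz.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(rows, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    rows: list of predictor lists (no intercept); y: list of 0/1 outcomes.
    Returns [b0, b1, ..., bk] on the log-odds scale; b0 is the intercept.
    """
    X = [[1.0] + list(r) for r in rows]       # prepend an intercept column
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # X'(y - p)
        hess = [[0.0] * k for _ in range(k)]  # X'WX with W = diag(p(1 - p))
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for r in range(k):
                grad[r] += (yi - p) * xi[r]
                for c in range(k):
                    hess[r][c] += w * xi[r] * xi[c]
        step = solve(hess, grad)              # Newton step (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def odds_ratios(beta):
    """exp(coefficient) for each predictor, skipping the intercept."""
    return [math.exp(b) for b in beta[1:]]


if __name__ == "__main__":
    # Toy data (not the insurance data): one 0/1 predictor.
    # x = 0: 1 claim out of 4 (odds 1/3); x = 1: 3 claims out of 4 (odds 3).
    x = [[0], [0], [0], [0], [1], [1], [1], [1]]
    y = [1, 0, 0, 0, 1, 1, 1, 0]
    beta = fit_logistic(x, y)
    print(beta)               # approximately [-1.0986, 2.1972], i.e. [ln(1/3), ln(9)]
    print(odds_ratios(beta))  # approximately [9.0]
```

The toy example is chosen so the answer is known in closed form: with a single 0/1 predictor, the fitted odds ratio equals the ratio of the two groups' odds, (3/1) / (1/3) = 9, which is the same reading the odds ratio for a 0/1 variable such as URBANICITY has in the full model (with the other predictors held fixed).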
