Question
### Course Project (Phase 1)
For the course project you will analyze customer churn data from a bank. Customers that "churn" are those that leave the bank. In Phase 1 of the project you will perform a descriptive analysis of the data and identify variables that may be strong predictors of churn. In Phase 2 of the project you will build predictive models and assess their performance.
**DO NOT build any models in Phase 1**. Note: A possible exception could be building a Random Forest model to help with variable selection.
Begin by loading packages. You may not need the functionality provided by all of these packages.
```{r}
#| include: false
library(tidyverse)
library(caret)
library(randomForest)
library(ranger)
library(rpart)
library(rattle)
library(RColorBrewer)
library(e1071)
library(MASS)
library(GGally)
library(mice)
library(VIM)
library(ROCR)
library(esquisse)
library(tidymodels)
```
Read in the data with the following code:
```{r}
churn = read_csv("Churn_Modelling.csv")
```
The variables in the dataset are:

- RowNumber - Can be deleted
- CustomerId - Unique identifier for each customer
- Surname - Customer's last name
- CreditScore - Customer's credit score (higher is better)
- Geography - Customer's country
- Gender - Customer's gender
- Age - Customer's age
- Tenure - Number of years that the customer has been with the bank
- Balance - Customer's bank balance
- NumOfProducts - The number of bank products that the customer is using
- HasCrCard - Indicates whether the customer has a credit card with the bank (1) or not (0)
- IsActiveMember - Indicates whether the customer is an active member of the bank (1) or not (0)
- EstimatedSalary - The estimated salary of the customer
- Exited - Indicates whether the customer closed their account with the bank (1) or not (0). This is the response variable.
To clean and prepare the data, please complete the following tasks:
- Delete the RowNumber, CustomerId, and Surname columns (NOTE: The MASS package has been loaded, so you will need to be careful if you choose to use the `select` function)
- Convert the Geography, Gender, HasCrCard, IsActiveMember, and Exited columns to factors
- Rename the levels of the HasCrCard, IsActiveMember, and Exited columns from 0 and 1 to No and Yes (you may use more descriptive names if you prefer)
- If there is any missing data, deal with this missingness in an appropriate manner
- You may also remove any data that you consider to be an outlier
- After cleaning and preparing the data, split the data into training and testing sets (use a 70/30 training/testing split with a random number seed of 1234)
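The cleaning and splitting tasks above could be sketched roughly as follows. This is one possible approach, not a required solution. The helper names `clean_churn` and `split_churn` are hypothetical, and stratifying the split on `Exited` is an assumption that the task list does not ask for. `dplyr::select` is called with its namespace because `MASS::select` masks it when MASS is loaded after the tidyverse. Missing data and outliers are not handled here; that decision is left to the analyst.

```r
library(dplyr)
library(rsample)

# Hypothetical helper: drop identifier columns and convert the
# categorical columns to factors with No/Yes labels.
clean_churn <- function(df) {
  df %>%
    # Namespace-qualified because MASS::select masks dplyr::select
    dplyr::select(-RowNumber, -CustomerId, -Surname) %>%
    mutate(
      across(c(Geography, Gender), as.factor),
      across(
        c(HasCrCard, IsActiveMember, Exited),
        ~ factor(.x, levels = c(0, 1), labels = c("No", "Yes"))
      )
    )
}

# Hypothetical helper: 70/30 split with seed 1234.
# Stratifying on Exited is an assumption, not part of the assignment.
split_churn <- function(df, seed = 1234, prop = 0.7) {
  set.seed(seed)
  parts <- initial_split(df, prop = prop, strata = Exited)
  list(train = training(parts), test = testing(parts))
}

# Possible usage once `churn` has been read in:
# churn_clean <- clean_churn(churn)
# colSums(is.na(churn_clean))   # missing values per column
# parts <- split_churn(churn_clean)
# train <- parts$train
# test  <- parts$test
```

Checking `colSums(is.na(...))` before the split shows whether any imputation or row removal is needed at all.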
Then perform appropriate analysis on the training set to examine the relationships between each variable and the Exited response variable. Which variables appear likely to be strong predictors of customer churn? What are the possible implications of your findings?
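As a minimal sketch of this kind of descriptive analysis, the helpers below summarize the churn rate within each level of a categorical variable and plot a numeric variable against `Exited`. The function names `churn_rate_by` and `plot_numeric_by_exit` are hypothetical. The code assumes that `Exited` is a factor with levels No and Yes, as described in the cleaning tasks. No models are built.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: share of customers who left, by level of `var`.
# Assumes Exited has the levels "No" and "Yes".
churn_rate_by <- function(df, var) {
  df %>%
    group_by({{ var }}) %>%
    summarise(
      n = n(),
      churn_rate = mean(Exited == "Yes"),
      .groups = "drop"
    )
}

# Hypothetical helper: boxplot of a numeric variable split by Exited.
plot_numeric_by_exit <- function(df, var) {
  ggplot(df, aes(x = Exited, y = {{ var }})) +
    geom_boxplot() +
    labs(x = "Exited")
}

# Possible usage on the training set (here called `train`):
# churn_rate_by(train, Geography)
# churn_rate_by(train, IsActiveMember)
# plot_numeric_by_exit(train, Age)
```

Large differences in churn rate across levels, or clearly shifted boxplots, suggest variables worth highlighting as candidate predictors.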
**DELIVERABLE** Your deliverable for Phase 1 is a six-slide PowerPoint presentation summarizing your work on this phase. You do not need to deliver (present) the presentation; I only want to see your slides. Your audience for this presentation is non-technical. Do NOT include any R code in your slides. If you are working with a partner, please include their name on the presentation. Submit only one presentation per group via Canvas.