Question
I have an actuarial assignment where I have to predict which customers will 'churn' (leave) a telecommunications company. We are given a training dataset with 50 columns, including Churn (a yes/no response), and an evaluation dataset that has every column except Churn. We have to build several models in R on the training dataset and apply them to the evaluation dataset to find the 3000 of its 10000 customers who are most likely to churn. We also have to select the best model and justify that choice. What I don't understand is how to handle the class imbalance in the training dataset, and how to split the training dataset into train and test samples on which to fit and compare models such as logistic regression, k-nearest neighbours and random forests.
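The workflow described in the question can be sketched roughly as follows. This is a minimal, language-agnostic illustration written in Python with only the standard library. The assignment itself must be done in R, where packages such as caret offer comparable helpers, for example `createDataPartition` for stratified splits. The function names, the `id` field, the 30% test fraction and the synthetic data are all hypothetical choices. Random oversampling is only one way to handle imbalance; class weights, undersampling and SMOTE are common alternatives. The sketch illustrates three steps:

- Split first, stratifying on Churn, so both samples keep the original churn rate.
- Rebalance only the training sample.
- Rank customers by predicted churn probability and keep the top k.

```python
import random
from collections import Counter

# Hypothetical helpers: each row is a dict and "Churn" holds "Yes" or "No".

def group_by_label(rows, label_key="Churn"):
    """Bucket rows by their class label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    return groups

def stratified_split(rows, label_key="Churn", test_frac=0.3, seed=42):
    """Split rows into train/test so each class keeps (roughly) its original share."""
    rng = random.Random(seed)
    train, test = [], []
    for group in group_by_label(rows, label_key).values():
        group = group[:]  # copy so the caller's data is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

def oversample_minority(rows, label_key="Churn", seed=42):
    """Duplicate rows of smaller classes at random until every class matches the largest.
    Apply to the training sample only; the test sample keeps the real churn rate."""
    rng = random.Random(seed)
    groups = group_by_label(rows, label_key)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # sample with replacement to make up the shortfall (k=0 for the largest class)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

def top_k_customers(ids, churn_probs, k=3000):
    """Return the k ids with the highest predicted churn probability."""
    ranked = sorted(zip(ids, churn_probs), key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in ranked[:k]]

if __name__ == "__main__":
    # Synthetic data for illustration only: 1000 customers, 20% churners.
    rows = [{"id": i, "Churn": "Yes" if i < 200 else "No"} for i in range(1000)]
    train, test = stratified_split(rows)
    print(sorted(Counter(r["Churn"] for r in train).items()))  # [('No', 560), ('Yes', 140)]
    print(sorted(Counter(r["Churn"] for r in test).items()))   # [('No', 240), ('Yes', 60)]
    balanced = oversample_minority(train)
    print(sorted(Counter(r["Churn"] for r in balanced).items()))  # [('No', 560), ('Yes', 560)]
    # In practice the probabilities come from a fitted model; these are made up.
    print(top_k_customers(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7], k=2))  # ['b', 'd']
```

Oversampling before the split would leak duplicated churners into the test sample and overstate performance, which is why the split comes first. Because the goal is to flag 3000 of 10000 customers, one reasonable way to compare models on the untouched test sample is to count how many actual churners fall within each model's top 30% of predicted probabilities. Plain accuracy is a weaker yardstick here because it can look high on imbalanced data even when few churners are caught.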