21.3 TAYKO SOFTWARE CATALOGER
Data Link - https://1drv.ms/x/s!Ao_duWhjG7s9hC2CouQBXkre5Gk2?e=fi1FOh
- Develop a logistic regression model for classifying a customer as a purchaser or non-purchaser. Partition the data randomly into a training set (60%) and a validation set (40%). Run logistic regression with an L2 penalty using LogisticRegressionCV. Please submit Python code.
- Tell a high-level story of the steps taken to reach the end result. Start with the framework, i.e., the objective, exploration, and variable selection (PCA, correlation, etc.). Then present the final results and a comparison of performance on the training, validation, and test data.
- Present your findings in PowerPoint format (no more than 5 slides), covering the steps taken and the results.
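A minimal sketch of the core task, assuming scikit-learn and pandas. The OneDrive workbook cannot be loaded here, so the DataFrame below is synthetic placeholder data whose column names (Freq, last_update_days_ago, Web_order, Address_is_res, Purchase) are assumed to resemble the Tayko file; with the real data you would load the workbook with pandas (for example pd.read_excel) instead. LogisticRegressionCV with penalty="l2" picks the inverse regularization strength C from a grid by cross-validation on the training set; Cs=10, cv=5 and random_state=1 are illustrative choices, not values given in the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# SYNTHETIC placeholder data with assumed Tayko-style column names;
# replace with the real workbook from the data link.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "Freq": rng.poisson(1.5, n),
    "last_update_days_ago": rng.integers(1, 4200, n),
    "Web_order": rng.integers(0, 2, n),
    "Address_is_res": rng.integers(0, 2, n),
})
# Hypothetical outcome: purchase odds rise with Freq and Web_order.
logit = (-1.0 + 0.8 * df["Freq"] + 0.5 * df["Web_order"]
         - 0.0003 * df["last_update_days_ago"])
df["Purchase"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = df.drop(columns=["Purchase"])
y = df["Purchase"]

# Random 60% training / 40% validation partition.
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# L2-penalized logistic regression; C is chosen by 5-fold CV over 10 values.
model = LogisticRegressionCV(penalty="l2", Cs=10, cv=5,
                             solver="lbfgs", max_iter=5000)
model.fit(train_X, train_y)

print("chosen C:", model.C_[0])
print("training accuracy:", accuracy_score(train_y, model.predict(train_X)))
print("validation accuracy:", accuracy_score(valid_y, model.predict(valid_X)))
```

Comparing training and validation accuracy is a quick first check for overfitting before moving on to confusion matrices and lift.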
Things you can add:
- Show the shape of the df
- Show some records of the df
- List data types of the variables in the df
- Preliminary exploration: view the data and rename all columns, replacing spaces with underscores
- Look at descriptive statistics
- Count missing values
- Remove certain variables from the outset (i.e., spending and sequence number)
- Count the number of unique values in each variable
- Create dummy variables if needed
- Some visualizations to explore the data
- Histograms, frequency distributions, side-by-side plots against the outcome, scatterplots, pairplots
- Other plots according to your discretion
- Correlation table and heatmap: comment on high correlations
- Conduct a PCA: discuss how many principal components (PCs) to use
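The exploration checklist above might look roughly like this in pandas and scikit-learn. The data are again synthetic, and the column names (including the space in "Web order" and the "=" in "Gender=male") are assumptions about the Tayko file. The 0.7 correlation cutoff and the 90% cumulative-variance rule for choosing the number of PCs are common heuristics picked for illustration, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# SYNTHETIC placeholder data; column names are assumed, not read from the file.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "sequence_number": np.arange(1, n + 1),
    "source_a": rng.integers(0, 2, n),
    "Freq": rng.poisson(1.5, n),
    "last_update_days_ago": rng.integers(1, 4200, n),
    "Web order": rng.integers(0, 2, n),
    "Gender=male": rng.integers(0, 2, n),
    "Purchase": rng.integers(0, 2, n),
    "Spending": rng.gamma(2.0, 50.0, n).round(2),
})

print(df.shape)    # (rows, columns)
print(df.head())   # first few records
print(df.dtypes)   # data type of each variable

# Rename columns: replace spaces with underscores.
df.columns = [c.replace(" ", "_") for c in df.columns]

print(df.describe())     # descriptive statistics
print(df.isna().sum())   # missing values per column

# Drop variables that are not used as predictors.
df = df.drop(columns=["Spending", "sequence_number"])
print(df.nunique())      # unique values per variable

# Correlation table; list pairs with |r| > 0.7 (illustrative cutoff).
corr = df.corr()
high = [(a, b, round(corr.loc[a, b], 3))
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
        if abs(corr.loc[a, b]) > 0.7]
print(high)

# PCA on standardized predictors; cumulative explained variance
# suggests how many PCs to keep (here: enough to reach 90%).
predictors = df.drop(columns=["Purchase"])
pca = PCA().fit(StandardScaler().fit_transform(predictors))
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.searchsorted(cum_var, 0.90) + 1)
print(cum_var.round(3), "PCs for 90% of variance:", n_pcs)
```

A heatmap of corr can be drawn with seaborn.heatmap(corr, annot=True), and plotting cum_var against the component number gives a scree-style view to support the PC discussion.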
The Logistic Regression:
- (Do not incorporate PCA or correlation-based variable reduction here; run the logistic regression on all variables except spending and sequence_number.)
- Randomly partition the whole data set into a training set (60%) and a validation set (40%)
- Run quick descriptive statistics for the training and validation sets
- Fit a logistic regression (set penalty=l2 and C=1e42 to effectively disable regularization) and use the model to predict on the validation set
- Develop gains and lift charts for the test and validation results
- Confusion matrices for all sets
- Show some use of statsmodels if possible
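A sketch of the unregularized fit, the confusion matrices, and a decile-based gains/lift table, again on synthetic placeholder data with assumed column names. Because C is the inverse of the penalty strength, C=1e42 leaves the L2 term negligible. The gains_table helper is hypothetical, written here for illustration: it sorts records by predicted purchase probability, splits them into ten groups, and reports cumulative gains (the share of all purchasers captured so far) and per-decile lift (the group's purchase rate divided by the overall rate). The 60/40 split yields only training and validation sets, so no separate test set appears in this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# SYNTHETIC placeholder predictors with assumed Tayko-style names.
rng = np.random.default_rng(2)
n = 2000
X = pd.DataFrame({
    "Freq": rng.poisson(1.5, n),
    "Web_order": rng.integers(0, 2, n),
    "Address_is_res": rng.integers(0, 2, n),
})
logit = -1.2 + 0.9 * X["Freq"] + 0.6 * X["Web_order"] - 0.7 * X["Address_is_res"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# C is the inverse penalty strength, so C=1e42 makes the L2 penalty negligible.
model = LogisticRegression(penalty="l2", C=1e42, solver="lbfgs", max_iter=1000)
model.fit(train_X, train_y)

# Confusion matrix for each partition (rows: actual, columns: predicted).
for name, Xs, ys in [("training", train_X, train_y),
                     ("validation", valid_X, valid_y)]:
    print(name)
    print(confusion_matrix(ys, model.predict(Xs)))

def gains_table(y_true, prob, n_bins=10):
    """Hypothetical helper: decile cumulative gains and lift by predicted probability."""
    order = np.argsort(-np.asarray(prob))       # highest probability first
    y_sorted = np.asarray(y_true)[order]
    bins = np.array_split(y_sorted, n_bins)
    hits = np.array([b.sum() for b in bins])    # purchasers per decile
    sizes = np.array([len(b) for b in bins])
    overall_rate = y_sorted.mean()
    return pd.DataFrame({
        "decile": np.arange(1, n_bins + 1),
        "cum_gains": hits.cumsum() / hits.sum(),
        "lift": (hits / sizes) / overall_rate,
    })

valid_prob = model.predict_proba(valid_X)[:, 1]
gains = gains_table(valid_y, valid_prob)
print(gains)
```

Plotting cum_gains against decile (with the diagonal as the random baseline) gives the gains chart, and a bar chart of lift gives the decile-wise lift chart; calling the same helper on the training predictions allows a side-by-side comparison.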