Question

21.3 TAYKO SOFTWARE CATALOGER

Data Link - https://1drv.ms/x/s!Ao_duWhjG7s9hC2CouQBXkre5Gk2?e=fi1FOh

  • Develop a logistic regression model for classifying a customer as a purchaser or non-purchaser. Partition the data randomly into a training set (60%) and a validation set (40%). Run logistic regression with an L2 penalty using scikit-learn's LogisticRegressionCV. Please submit Python code.
  • Tell a high-level story of the steps taken to reach the end result. Start with the framework, i.e., objective, exploration, and variable selection (PCA, correlation, etc.). Then provide the final results and a comparison of the training, validation, and test data.
  • Present your findings in PowerPoint format (no more than 5 slides), covering the steps taken and the results.
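
The partition-and-fit step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the real workbook is not loaded here, so the data frame is a synthetic stand-in, and every column name other than the spending and sequence-number fields mentioned in the question (including the outcome name Purchase) is hypothetical. Standardizing the predictors before a penalized fit is a design choice added here, not something the question asks for.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the Tayko data; column names are hypothetical.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "sequence_number": np.arange(1, n + 1),
    "Freq": rng.poisson(2, n),
    "last_update_days_ago": rng.integers(1, 4000, n),
    "Web_order": rng.integers(0, 2, n),
})
logit = (-1.0 + 0.8 * df["Freq"]
         - 0.0005 * df["last_update_days_ago"]
         + 0.5 * df["Web_order"])
df["Purchase"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
df["Spending"] = df["Purchase"] * rng.gamma(2.0, 50.0, n)

# Drop the outcome plus the two variables excluded from the outset.
X = df.drop(columns=["Purchase", "Spending", "sequence_number"])
y = df["Purchase"]

# Random 60% training / 40% validation partition.
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# L2-penalized logistic regression; the penalty strength C is chosen
# by 5-fold cross-validation on the training partition only.
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l2", solver="lbfgs", cv=5, max_iter=1000),
)
model.fit(train_X, train_y)

train_acc = model.score(train_X, train_y)
valid_acc = model.score(valid_X, valid_y)
print("Selected C:", model[-1].C_)
print(f"Training accuracy:   {train_acc:.3f}")
print(f"Validation accuracy: {valid_acc:.3f}")
```

With the real data, the synthetic block would be replaced by reading the workbook, and the column names adjusted to match it.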

Things you can add:

  • Show the shape of the df
  • Show some records of the df
  • List data types of the variables in the df
  • Preliminary exploration (view the data): rename all columns, replacing spaces with underscores
  • Look at descriptive statistics
  • Count missing values
  • Remove certain variables from the outset (i.e., spending and sequence number)
  • Count the number of unique values in each variable
  • Create dummy variables if needed
  • Some visualizations to explore the data
    • Histograms, Frequency Distribution, side by side plots with the outcome, scatterplot, pairplot
    • Other plots according to your discretion
  • Correlation table & Heatmap: Comment on high correlations
  • Conduct a PCA: Discuss how many PCs to use
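
A minimal sketch of the non-graphical exploration steps listed above is given below. It assumes a synthetic stand-in data frame with hypothetical column names (only spending and sequence number come from the question), and it leaves out the plots. The 0.8 correlation cutoff and the 90% cumulative-variance rule for choosing the number of PCs are illustrative assumptions, not requirements of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "sequence number": np.arange(1, n + 1),
    "Freq": rng.poisson(2, n),
    "last update days ago": rng.integers(1, 4000, n),
    "Web order": rng.integers(0, 2, n),
})
# Built to be strongly correlated with "last update days ago".
df["first update days ago"] = df["last update days ago"] + rng.integers(0, 500, n)
df["Purchase"] = rng.integers(0, 2, n)
df["Spending"] = df["Purchase"] * rng.gamma(2.0, 50.0, n).round(2)

# Shape, a few records, and data types.
print(df.shape)
print(df.head())
print(df.dtypes)

# Rename columns: replace spaces with underscores.
df.columns = [c.strip().replace(" ", "_") for c in df.columns]

# Descriptive statistics and missing-value counts.
print(df.describe().T)
missing = df.isna().sum()
print(missing)

# Remove the variables excluded from the outset, then count unique values.
df = df.drop(columns=["Spending", "sequence_number"])
unique_counts = df.nunique()
print(unique_counts)

# Correlation table; list pairs above the (assumed) 0.8 cutoff once each.
corr = df.corr()
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
high = corr.where(upper).stack()
high = high[high.abs() > 0.8]
print(high)

# PCA on standardized predictors; keep enough PCs to reach an
# (assumed) 90% of cumulative explained variance.
predictors = df.drop(columns=["Purchase"])
scaled = StandardScaler().fit_transform(predictors)
pca = PCA().fit(scaled)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.argmax(cum_var >= 0.90) + 1)
print("Cumulative explained variance:", cum_var.round(3))
print("PCs needed for 90% of variance:", n_pcs)
```

If the real data contains categorical columns, pd.get_dummies could be applied before the correlation and PCA steps.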

The Logistic Regression:

  • Do not incorporate PCA or correlation-based variable reduction here; run the logistic regression on all variables apart from spending and sequence_number
  • Partition the whole data set randomly into a training set (60%) and a validation set (40%)
  • Run quick descriptive statistics for the training and validation datasets
  • Fit a logistic regression (set penalty="l2" and C=1e42 to effectively turn off regularization), then use the fitted model to predict on the validation dataset
  • Develop gains and lift charts for test and validation results
  • Produce a confusion matrix for all sets
  • Show some use of statsmodels if possible
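
The fit-and-evaluate steps above might look something like the sketch below. It rests on several assumptions: the data is a synthetic stand-in with hypothetical column names, the liblinear solver is one workable choice for a near-unregularized fit rather than a requirement, and gains_table is a hypothetical helper written for this illustration. Because a 60/40 split defines no separate test partition, the sketch evaluates the training and validation sets only. The statsmodels step is left out.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in predictors and outcome; names are hypothetical.
rng = np.random.default_rng(1)
n = 2000
X = pd.DataFrame({
    "Freq": rng.poisson(2, n),
    "Web_order": rng.integers(0, 2, n),
    "Address_is_res": rng.integers(0, 2, n),
})
logit = -1.0 + 0.8 * X["Freq"] + 0.5 * X["Web_order"]
y = pd.Series((rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int),
              name="Purchase")

train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# Quick descriptive statistics for each partition.
print(train_X.describe().T)
print(valid_X.describe().T)

# A very large C makes the L2 penalty negligible.
logit_reg = LogisticRegression(penalty="l2", C=1e42, solver="liblinear")
logit_reg.fit(train_X, train_y)


def gains_table(actual, prob, bins=10):
    """Hypothetical helper: cumulative gains and lift per decile."""
    ranked = pd.DataFrame({"actual": np.asarray(actual),
                           "prob": np.asarray(prob)})
    ranked = ranked.sort_values("prob", ascending=False).reset_index(drop=True)
    # Decile 1 holds the records with the highest predicted probability.
    ranked["decile"] = (np.arange(len(ranked)) * bins // len(ranked)) + 1
    table = ranked.groupby("decile")["actual"].agg(["count", "sum"])
    table["cum_gain"] = table["sum"].cumsum() / table["sum"].sum()
    table["lift"] = (table["sum"] / table["count"]) / ranked["actual"].mean()
    return table


matrices = {}
for name, part_X, part_y in [("training", train_X, train_y),
                             ("validation", valid_X, valid_y)]:
    # Rows are actual classes, columns are predicted classes.
    matrices[name] = confusion_matrix(part_y, logit_reg.predict(part_X))
    print(f"Confusion matrix ({name}):")
    print(matrices[name])

valid_prob = logit_reg.predict_proba(valid_X)[:, 1]
valid_gains = gains_table(valid_y, valid_prob)
print(valid_gains)
```

Plotting the cum_gain column against the decile gives the gains chart, and a bar plot of the lift column gives the decile lift chart.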
