
Question



Use the HousingCA.csv file at the link below and the Orange open-source data visualization software to complete questions a, b, c, and d.


https://uiowa.instructure.com/courses/216942/files/23138924/download?download_frd=1


Housing information on 20,640 census tracts (specified by a longitude/latitude) is provided in the file HousingCA.csv. The variables in this data set are:


Longitude: Longitude of region
Latitude: Latitude of region
HousingMedianAge: Median age of housing in region
TotalRooms: Total number of rooms in region's housing
TotalBedrooms: Total number of bedrooms in region's housing
Population: Region's population
Households: Number of households in region
MedianIncome: Median income of region's residents
MedianHouseValue: Median house value of region (in $1000s)


Using the data in ChurnImbalanced.csv, construct the following classification models to classify a customer observation as "leave" or "stay." Note that the primary target class of interest is the "leave" category as the phone company would like to intervene and retain these customers. Split the data so that 80% is used for training/validation in a 10-fold cross-validation experiment and 20% is used for a test set.
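The split described above can be sketched in plain Python. This is only an illustration under stated assumptions: the labels are synthetic stand-ins for the real target column, the 10% "leave" rate is made up, and the helper names `stratified_split` and `stratified_folds` are hypothetical. Stratifying by class is a design choice here, not something the assignment requires, but it keeps the rare "leave" class present in every fold.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(indices, labels, n_folds=10, seed=0):
    """Deal each class's shuffled indices round-robin into n_folds folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(n_folds)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


# Synthetic, imbalanced labels (assumption: not the real ChurnImbalanced.csv data).
labels = ["leave"] * 100 + ["stay"] * 900

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
folds = stratified_folds(train_idx, labels, n_folds=10)

# In each cross-validation round, one fold is held out for validation
# and the other nine are used for training.
for k, fold in enumerate(folds):
    n_leave = sum(labels[i] == "leave" for i in fold)
    print(f"fold {k}: {len(fold)} rows, {n_leave} 'leave'")
```

The test indices are set aside before any cross-validation, so the tuning in parts (a) to (c) never sees them.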


a) Use lasso regularization in conjunction with 10-fold cross-validation to evaluate and select a logistic regression model. Report the value of the lasso penalization determined in the cross-validation experiment and the corresponding value of the AUC. Then, construct your final model on all of the training/validation data and report its performance measures (confusion matrix metrics, AUC, lift) on the test set.

b) Use ridge regularization in conjunction with 10-fold cross-validation to evaluate and select a logistic regression model. Report the value of the ridge penalization determined in the cross-validation experiment and the corresponding value of the AUC. Then, construct your final model on all of the training/validation data and report its performance measures (confusion matrix metrics, AUC, lift) on the test set.
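For reference, parts (a) and (b) differ only in the penalty applied to the coefficients. One common way to write the penalized logistic regression objective is shown below; the notation ($\lambda$, $\beta$, $p_i$, $P$) is chosen here for illustration and does not come from the assignment.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\,\beta}\;
    -\frac{1}{n}\sum_{i=1}^{n}
      \Bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Bigr]
    + \lambda\, P(\beta),
\qquad
p_i = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + x_i^{\top}\beta)\bigr)}

% Lasso (L1) penalty: can shrink some coefficients exactly to zero
P_{\text{lasso}}(\beta) = \sum_{j=1}^{p} \lvert \beta_j \rvert

% Ridge (L2) penalty: shrinks coefficients toward zero without zeroing them
P_{\text{ridge}}(\beta) = \sum_{j=1}^{p} \beta_j^{2}
```

Here $y_i = 1$ denotes "leave" and $\lambda \ge 0$ is the penalization value selected by cross-validation. Many tools expose the penalty as an inverse strength $C$, roughly $1/\lambda$, so check which convention your software reports before writing down the value.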

c) Employ k-nearest neighbors in conjunction with 10-fold cross-validation to evaluate and select a classification model. Identify the value of k that results in the largest value of AUC. Then, construct your final model on all of the training/validation data and report its performance measures (confusion matrix metrics, AUC, lift) on the test set.

d) Compare the classification models in parts (a), (b), and (c).
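The comparison in part (d) rests on the test-set measures named in parts (a) to (c). As a rough sketch of how those measures are defined, the plain-Python functions below compute confusion-matrix counts, AUC, and lift from predicted scores. The ten labels and scores are invented toy values, the 0.5 threshold and the top-20% lift cutoff are assumptions, and the function names are hypothetical; Orange reports its own versions of these numbers.

```python
def confusion_counts(y_true, y_pred, positive="leave"):
    """Return (tp, fp, fn, tn) with `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def auc(y_true, scores, positive="leave"):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(n^2) and is meant
    for illustration, not for large data sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction=0.2, positive="leave"):
    """Positive rate in the top-scoring `fraction` divided by the base rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(t == positive for _, t in ranked[:n_top]) / n_top
    base_rate = sum(t == positive for t in y_true) / len(y_true)
    return top_rate / base_rate


# Toy test set (assumption: invented values, not model output).
y_true = ["leave", "leave", "stay", "leave", "stay",
          "stay", "stay", "stay", "stay", "stay"]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]  # P("leave")

# Classify as "leave" when the score reaches the assumed 0.5 threshold.
y_pred = ["leave" if s >= 0.5 else "stay" for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # sensitivity for the "leave" class
test_auc = auc(y_true, scores)
test_lift = lift_at(y_true, scores, fraction=0.2)

print(tp, fp, fn, tn, precision, recall, test_auc, test_lift)
```

Because "leave" is the class of interest and the data are imbalanced, recall, precision, AUC, and lift for "leave" are usually more informative than overall accuracy when ranking the three models.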




Step by Step Solution



Step: 1

To complete questions a, b, c, and d, you'll need to follow these general steps. Data Preprocessing: Load the dataset from the provided CSV file using Orange ...
