Answered step by step
Verified Expert Solution
Link Copied!

Question

...
1 Approved Answer

(c) A rst idea is to use tree _score to classify potential customers (the hypothesis being he that dog owners might seek residences located close

image text in transcribed
image text in transcribed
(c) A rst idea is to use tree _score to classify potential customers (the hypothesis being he that dog owners might seek residences located close to a locale with many trees). To use this idea, we would set a threshold for this variable such that we send pamphlets only to customers whose tree score is above (or below ) that threshold Find the optimal threshold for the training data , and construct the confusion matrix for the training data. What would the resulting prot have been if DogBark inc. had used this method to target potential customers in the training data with this method (d) Construct a confusion matrix and calculate expected prot for the classier from the previous question on the validation data. Is the performance better or worse? Explain why. (e) if DogBark inc. had a perfect classier , what would its prot be on the validation data '2 Is this higher or lower than the performance on the previous part? Explain why. (f) Fit a logistic regression to the training data using all the covariates to predict dog ownership .Print the estimated coefcients and interpret them. (g) Use the output of the logistic regression to create a classier .What is the threshold that maximizes DogBark inc.'s prots , based on the training set? What is the confusion matrix on the validation set? What is the profit on the validation set? (h) Fit a decision tree to the training data to predict who is a dog owner . Use cross -validation to nd the best tree and plot it. How would you use your tree to decide whom to send pamphlets to (based on the training data). What is the confusion matrix on the validation data 1? What is the prot on the validation set? (i) Which method would you recommend using? (j) Suppose that DogBark inc. got stuck with a large inventory of dog toys, and wishes to change the goal of the market campaign . Instead of maximizing profits , DogBark inc. wishes to use the marketing campaign to get 1,000 purchases by sending the minimal number of pamphlets. DogBark inc. has mailing information and the indexes included in the dataset about 1,000,000 potential customers .The data in dog .csv is a representative sample of the larger dataset . Using the classier you selected in the previous question, how many pamphlets (on average) would DogBark inc. have to send in order to get 1,000 purchases 'E' Question 2 (Quality of Classication ) Tahoe Healthcare has been approached by a healthcare analytics company, Xaltra, about a new system for managing readmissions. The CEO of Tahoe believes the initial success with the CareTracker system could be enhanced through the use of better predictive analytics and is intrigued by Xaltra's approach The Xaltra system merges data from a variety of hospital systems to provide state -of-the -art predictions of readmissions risk. It uses up to 50,000 variables and generates its predictions using advanced machine learning algorithms. Xaltra has been adopted by a number of hospitals throughout the country. The price for the Xaltra system is $250,000 in up-front licence and integration fees plus an ongoing $45,000 per year fee for maintenance and support . The expected lifetime of the system is 10 years . Xaltra requested a sample of data to test their systems's performance against Tahoe's current logistic regression method (the one we developed together in class }. Tahoe provided Xaltra with the same dataset used in our class session ,which includes admissions and outcome data for all AMI patients treated by Tahoe in the last three years. The ROC curve results for the Xaltra test are provided in the file xaltra .csv . For convenience, these are shown alongside the ROC curve for Tahoe's current logistic prediction system. (Note: for this problem, all the data required for the analysis is provided in the problem . There is no need to use data from the corresponding class session.) You will remember that the dataset used to generate these ROC curves had 998 positive cases (patients readmitted) and 3,384 negative cases (patients not readmitted). The cost of CareTracker is $1,200 per patient , and it reduces the chances of readmission by 40%. The nancial penalty for a readmission is $8,000

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access with AI-Powered Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Managerial Accounting Tools for Business Decision Making

Authors: Jerry J. Weygandt, Paul D. Kimmel, Donald E. Kieso, Ibrahim M. Aly

4th Canadian edition

978-1118856994

Students also viewed these Mathematics questions

Question

rank order clustering

Answered: 1 week ago