Question

1 Approved Answer

Posted on Oct 13, 2024

Case 2 - LAB - Smart Party Ware BUAD425 -2 units DATA ANALYSIS FOR DECISION MAKING Information & Operations Management Marshall School of Business University

Case 2 - LAB - Smart Party Ware BUAD425 -2 units DATA ANALYSIS FOR DECISION MAKING Information & Operations Management Marshall School of Business University of Southern California Scenario Applichem is interested in diversifying its portfolio, one of the company it is interested in is Smart Partyware Company. Smart Partyware Company (SPW) is in the niche party ware business, currently with a fixed customer base, and that they sell innovative plastic party ware to their members. Since they sell a plastic product, they may be a good vertical acquisition target. The Smart Partyware Company's business model is direct-to-consumer marketing. Over the years they have gained dedicated upscale customers and currently have 500,000 members in their database. In the direct-marketing industry, the response rate is measured as a percentage of customers who buy the directly mailed product. Smart Partyware's historical response rate for direct mail to selected members is approximately 10%far above the industry average. SPW was using RFM (Recency-Frequency-Monetary) analysis to target customers. Smart Partyware wants to increase the response rate well beyond the 10% rate. SPW designs new party ware for every campaign, gives a new name to its party ware, and broadly classifies the party ware under one of its many party themes. Most of the designs cut across many themes but are classified into a particular category based on the main design theme in the party ware. The recent product to be marketed is Celebrating American Arts. It has famous American art works printed in the party ware, and even though it falls under the Art Party theme the party ware can be used as well for pool or barbeque or one of the other parties. For analysis purposes, if the member bought the American Arts package, the value of the Art Party variable increases by one. Exhibit 1: Partial List of Variables in SPW Database Variable Name Seq# ID# Gender M R F FirstPurch Sports Party Pool Party Barbeque Party Birthday Party End-of-School-Term Party Art Party Block Party Cooking Party Get Together Movie Night Success Description Sequence number in the partition Identification number in the full (partitioned) market test data set O = Male 1 = Female Monetarytotal money spent on Partyware Recencymonths since last purchase Frequencytotal number of purchases Months since first purchase Number of purchases from the category: Sports Party Number of purchases from the category: Pool Party Number of purchases from the category: Pool Party Number of purchases from the category: Birthday Party Number of purchases from the category: End-of-School-Year Party Number of purchases from the category: Art Party Number of purchases from the category: Block Party Number of purchases of the category: Cooking Party Number of purchases of the category: Get Together Number of purchases of the category: Movie night =1 Celebrating American Arts was bought, = 0 if not Each marketing campaign starts with a trial marketing of 2,000 members: the newly designed party ware is sent to 2,000 randomly selected members from the database, and they have one week to respond. The Information & Operations Management Marshall School of Business University of Southern California packages come with paid return postage; if the member likes it, he or she can keep it, otherwise they have to return it within one week. After two weeks, SPW has all the data it needs to go for mass marketing. The current company policy is not to send packages to more than 100,000 members so that the members do not become tired of repeated marketing campaigns. The members always have the opportunity to visit the SPW Website and buy current and old packages. Most of the old packages are returned packages from marketing campaigns and are sold at discounted values. After analyzing the recent Celebrating American Arts trial marketing data, it is found that 11.2% of the members have bought the new package. The selling price for the package is $60, the mailing cost is $4.50, and the return mail cost is the same. The total cost of producing the package is $10. If the package is returned, it can be sold at discounted rate or destroyedhistorically the expected salvage value has been $15. Based on these assumptions, it is calculated that if the package is mailed to 100,000 randomly selected members then the profit from the marketing campaign will be $154,000, and if they can mine the data perfectly and send only to the members interested in the package they will make $2,548,000. The range is extremely widecurrently, SPW is making an average profit of $700,000 per marketing campaign and a yearly profit of $8.4 million. Selling price per Product Cost per Product Salvage Value per Product Cost of Mailing the Product Cost of Returning the Product Based on the Training Data Set, Level Count Non Buyer 888 Buyer 112 Total 1000 60 10 15 4.5 4.5 Prob 0.88800 0.11200 1.00000 Buyers = 11.2% buyers for this Product and if we assume there are 500,000 potential members, then the total number of Buyers in the 500,000 members is 500,000 * 0.112 = 56,000 Buyers Profit per Product after mailing cost Cost of Mailing the Product to not a buyer 60-10-4.5= 45.5 -10-4.5-4.5+15 = -4 The Maximum profit that can be made is 56,000 * 45.5 = $2,548,000, if we mail Product only to the buyers. If we mail more Products than some of the Products will be returned and it will cost us money. This cost is $4 = (Product cost - Salvage +postage both ways) = (10-15 +9) Marketing department has suggested it is prudent to mail the Products only to maximum of 100,000 members so the Product club members do not become tired of repeated marketing campaign. Let us calculate the Baseline profit if we mail Product randomly. = 100,000 * 0.112 * 45.5 + 100,000 *0.888* (-4) = 154,000 Information & Operations Management Marshall School of Business University of Southern California The Low case Scenario is 154,000 and the best case Scenario is $2,548,000 There are two ways to increase our baseline profit, increase the percentage of identification of buyers and reduce the number of Products shipped (the range will be between 56,000 to 100,000). Our objective is to beat the average profit of $700,000 by using decision tree method or by using the logistic regression method. Plan of action Use the recent Celebrating American Arts trial marketing data to prove we can do a better job than RFM analysis. 1. Provide calculations to show that the Maximum profit based on the training data is $2.548 Million 2. Provide calculations to show that the profit based on the training data is $0.154 Million, if 100,000 packages are mailed randomly to members. 3. Build the Best Decision tree Model using JMP (Go option) on the following conditions, Y = Success X = All predictors Cutoff Probability for mailing = 0.15 a. Interpret the decision tree? b. Interpret R2 and how many splits did you have in the model? c. Examine each of the split variables to explain whether they make business sense? d. Create the confusion matrix for the testing data set. (cutoff Prob. = 0.15) e. What is the expected profit based on the confusion matrix. 4. Build the Best Logistic Regression Model (stepwise) using JMP on the following conditions, Y = Success X = All predictors Cutoff Probability for mailing = 0.15 a. What is the estimated logistic regression equation? = 1.005 - 0.403 GENDER + 0.117 R + 0.49 BDAY - 0.841 ART - 0.511 BLOCK b. Is this Logistic regression model useful? Provide statistical evidence to support your answer and where appropriate use a significance level of 5%. Nominal Logistic Fit for Success Converged in Gradient, 6 iterations Whole Model Test Model Difference Full Reduced RSquare (U) AICc -LogLikelihood 54.75045 295.92605 350.67650 DF 5 ChiSquare 109.5009 0.1561 603.937 Information & Operations Management Marshall School of Business University of Southern California Prob>ChiSq <.0001* BIC Observations (or Sum Wgts) 633.299 1000 Measure Entropy RSquare Generalized RSquare Mean -Log p RMSE Mean Abs Dev Misclassification Rate N Training 0.1561 0.2058 0.2959 0.2931 0.1720 0.1060 1000 Definition 1-Loglike(model)/Loglike(0) (1-(L(0)/L(model))^(2/n))/(1-L(0)^(2/n)) -Log([j])/n (y[j]-[j])/n |y[j]-[j]|/n ([j]Max)/n n Lack Of Fit Source Lack Of Fit Saturated Fitted DF 325 330 5 -LogLikelihood 118.61465 177.31140 295.92605 Parameter Estimates Term Intercept Gender[0] R Birthday Party Art Party Block Party ChiSquare 237.2293 Prob>ChiSq 0.9999 Estimate 1.055833 -0.4038561 0.11760304 0.49312962 -0.8416034 -0.5114807 Std Error 0.2094527 0.1123229 0.0186788 0.1860686 0.1561663 0.1317824 ChiSquare 25.41 12.93 39.64 7.02 29.04 15.06 Prob>ChiSq <.0001* 0.0003* <.0001* 0.0080* <.0001* 0.0001* For log odds of 0/1 Effect Likelihood Ratio Tests Source Nparm DF 1 1 1 1 1 1 1 1 1 1 Gender R Birthday Party Art Party Block Party L-R ChiSquare 12.5279272 54.0152515 8.26973223 28.6412009 14.4414813 Prob>ChiSq 0.0004* <.0001* 0.0040* <.0001* 0.0001* Prediction Profiler c. Interpret the summary values of R2 and how many variables are there in the model? How much of the variability in the model is explained, .1561 Information & Operations Management Marshall School of Business University of Southern California d. Explain the coefficients of variables and state whether they make business sense? (hint: Use profiler) R - Recency, if the number goes up you are further away from buying (more recent buyers buy more) Birthday Party - unlikely to throw other parties Art Parties - if you've had art parties in the past, the odds of buying goes up Block party - if goes up, the odds of buying goes up e. Has the fit (R2) improved compared to Decision Tree? Why? Yes, r2 recorded 10% in the decision tree compared to 15% f. Create the confusion matrix for the testing data set. (cutoff Prob. = 0.15) 5. Contingency Analysis of LogisticPrediction By Success 6. Mosaic Plot 7. 8. 9. Contingency Table 10. Success By LogisticPrediction Count 0 1 0 711 39 750 11. 12. 1 184 66 250 895 105 1000 Tests N 1000 DF 1 -LogLike 38.360474 RSquare (U) 0.0682 13. Test Likelihood Ratio Pearson ChiSquare 76.721 89.673 Prob>ChiSq <.0001* <.0001* 14. Fisher's Exact Test Left Right 2-Tail Prob Alternative Hypothesis 1.0000 Prob(LogisticPrediction=1) is greater for Success=0 than 1 <.0001* Prob(LogisticPrediction=1) is greater for Success=1 than 0 <.0001* Prob(LogisticPrediction=1) is different across Success a. What is the expected profit based on the confusion matrix. $906,800 testing Information & Operations Management Marshall School of Business University of Southern California $920,000 training 5. Build your own best models to predict who will buy \"Celebrating American Arts\" party ware using Logistic regression and Decision Tree. 6. Find the estimated profit based on best models. Use the Profit Calculator - Excel Sheet to calculate profit. PART 1 - Decision Tree Model(s) Note 1: The data has been colored based on buyer and non-buyer and divided into training and testing datasets. To create testing and training data set from raw data set, refer to Appendix 1. Note 2: The process of building a good model is long; it involves the following steps, a. Build a decision tree model on the training data set b. Use the decision tree to predict the propensity(probability) of a member buying the product and store it in JMP as columns (for both training & testing data set) c. Use the propensity to decide who will be mailed the product. d. Switch to testing dataset to get confusion matrix. e. Get Confusion matrix. f. Use the confusion matrix to find out how many members were sent the product and how many bought the product g. Use the confusion matrix to get the profit estimate. Step A: Build a decision tree model on the training data set (for the first 1000 rows of data) 1. 2. Open the SmartPartyWare_Case2.jmp file in JMP, you should see the following file in JMP You should get a Screen like this. 3. Click, Analyze menu Modeling Partition Information & Operations Management Marshall School of Business University of Southern California 4. For Y, Response, select Success; for X columns, select from Gender, M, R, F, Movie Night (select all the predictors) OK. 5. The following screen will show up Information & Operations Management Marshall School of Business University of Southern California 6. Click on the red triangle and at the upper left corner Display options show Split Prob 7. The following screen will show up, note the split probabilities are show in the decision tree. Information & Operations Management Marshall School of Business University of Southern California Based on the above printout, the percentage of buyers in the 1000 training dataset is 0.112 or 11.2% Now we can build the decision tree using \"In-built JMP algorithm or manually\" , if you click the \"Split\" repeatedly you will be building the algorithm manually and if you click on Go then JMP will build the decision tree for you. 8. Click on Go and you will get the following decision tree. The JMP algorithm finds the best decision tree that will do a good job on the training data set and testing data set based on the \"R-square\" KPI, the decision tree algorithm may not be the best choice for our objective of maximizing the profit. To maximize the profit you have to send as many products as possible ( at most 100,000) at the same time select members with high propensity. Information & Operations Management Marshall School of Business University of Southern California i) Let us understand the first split of the decision tree, The first split states that if you mail products to members with a. Recency of less than 16 then the propensity to buy the product is 0.1469 (14.69%) b. Recency of more than or equal to16 then the propensity to buy the product is 0.0443 (4.43%) ii) a. b. Now if we split the R<16 group further, then we get the following groups, If you mail the products to members with R < 16 and Art Party >=1 then the propensity to buy the product is 0.2867 (28.67%) If you mail the products to members with R < 16 and Art Party <1 then the propensity to buy the product is 0.1083 (10.83%) Note: R<16 is a profitable group, among the profitable group we were able to find an unprofitable segment (ArtParty <1). iii) Now if we split the R<16 & Art Party <1 group further, then we get the following groups, Information & Operations Management Marshall School of Business University of Southern California a. b. If you mail the products to members with R < 16 and Art Party < 1 & Recency <8 then the propensity to buy the product is 0.1693 (16.93%) If you mail the products to members with R < 16 and Art Party < 1 & Recency >= 8 then the propensity to buy the product is 0.0733 (7.33%) Note: R<16 & art party <1 is an unprofitable subgroup, among the subgroup we were able to find profitable segment (recency <8). iv) let us understand bottom blocks of decision tree, there are 5 groups and can get additional information about from leaf diagram, operations management marshall school business university southern california v) click on red triangle at upper left corner report you will following screen, above gives propensity buy for various (5 this tree). now have decide which group select mail product. vi) know basic response rate 11.2%, if 15% as cut off then, these be selected mailing, note: number mailing (36 + 107 189) =232 per thousand members, so approximately 23.2% 0.232*500,000 =116,000 more than 100,000, but 100,000 out 116,000 mail. vii) 19 % 107) =143 14.3% 0.143*500,00 0 =71,500 less 100,000. see higher cutoff, lower members sent products. our objective maximum (close 100,000) with high buy. so, split further or play cutoff probabilities groups. step b: use tree predict (probability) a member buying product store it in jmp columns (for both training testing data set) 1. save prediction formula. do predicteds, only saves values first 1000 rows. formula, all 2. created file **note: not show up window *** main column prob(success == 1) column, estimates c: who mailed create new named decisionbuy (any name ok - i decisionbuy1 inform algorithm had built buyer using tree) right empty space (give column) modeling type change nominal ok. \"decisionbuy1\" created. 2) formula 3) opens, set functions conditional 4) window, restricts formulas, given desired formulas.> 0.15, then DecisionBuy1 =1 Else, DecisionBuy1 =0 This step involves steep learning curve, so practice it. 5) Now Select Prob(Success==1) from the table column and click it. The following window will show up. Information & Operations Management Marshall School of Business University of Southern California 6) The next step is to compare the Prob(Success=1), Select the comparison on the function group shown above and select a >= b option, The following will show in the formula window, The following will show in the formula window, Information & Operations Management Marshall School of Business University of Southern California The Red rectangle is the active window in the formula window, whatever you type will be entered here, 7) Type in 0.15 (the propensity you have selected), you will see the following on the formula screen, 8) Click on the \"then clause\" window and type in 1and Click on the \"else clause\" window and type in 0 and You will see the following on the formula screen, 9) Now click ok and the \"DecisionBuy1\" column will be updated. Step D: Switch to testing dataset 1) Currently the first 1000 rows form the \"Training data\" for analysis we have to study the effectiveness of the Decision tree algorithm on the \"Testing data\" which is the bottom 1000 rows. We will switch the dataset as follows, 2) Highlight the rows from 1001 to 2000, by left click on row 1001 and scrolling to 2000 row, then right click to get the menu given below, and then select Exclude/Unexclude option as show below, 3) The Highlighted rows from 1001 to 2000 will now be Unexcluded as shown below, Information & Operations Management Marshall School of Business University of Southern California 4) Now, highlight the rows from 1 to 1000, by left click on row 1 and scrolling to 1000 row, then right click to get the menu given below, and then select Exclude/Unexclude option as show below, 5) The Highlighted rows from 1 to 1000 will now be Excluded as shown below, Now we have switched from \"Training Data set\" to \"Testing Dataset\" Step E: Get Confusion matrix 1) Go to the Analyze Menu and select Fit Y by X as shown below, Information & Operations Management Marshall School of Business University of Southern California 2) A new screen will open up as follows, Now select DecisionBuy1 column for Y, Response and click Now select Success Column for X and click And Click OK 3) You should have the following screen, Information & Operations Management Marshall School of Business University of Southern California Y, response X, Factor 4) You can now make the confusion matrix simpler by clicking on the red triangle and unselect the following. Unselect Total %, Col % and Row %. 5) Now the result will look like Information & Operations Management Marshall School of Business University of Southern California Note: now you have got the confusion matrix for the testing dataset. Step F: Use the confusion matrix to find out how many members were sent the product and how many bought the product 1) From the above Confusion Matrix, we get the following information, Based on the Testing Data Set, Level Members Selected for Mailing Total Members Mailed based on Algorithm Actual number of Members Mailed based on Restriction Probability of Buying for the mailed members Probability of Non-Buying for the mailed members Count 324 (324/1000) 500,000 = 162,000 100,000 * Percentage 324/1000 = 32.4% 32.4% 1.00000 = (66/324) = 20.3704 % == (258/324) = 79.6296 % Marketing department has suggested it is prudent to mail the Products only to maximum of 100,000 members so the Product club members do not become tired of repeated marketing campaign. Let us calculate the Profit for the selected Decision Tree, = 100,000 * 0.203704 * 45.5 + 100,000 *0.796296* (-4) = 608,333 We did not beat the bench mark of $700,000, maybe we need to split further to reduce the mailing percentage and increase the propensity of buy. Information & Operations Management Marshall School of Business University of Southern California PART 2 - Logistic Regression Model(s) a. b. c. d. e. f. g. Build a Logistic Regression model on the training data set Use the Logistic Regression to predict the propensity(probability) of a member buying the product and store it in JMP as columns (for both training & testing data set) - These two steps are similar to regression The following steps are similar to Part1 Use the propensity to decide who will be mailed the product. Switch to testing dataset to get confusion matrix. Get Confusion matrix. Use the confusion matrix to find out how many members were sent the product and how many bought the product Use the confusion matrix to get the profit estimate. 1) Open the SmartPartyWare_Case2.jmp file in JMP, you should see the following file in JMP (I am starting from the beginning from the original data set). 2) Let us do the logistic Regression Analysis, go to Analyze and click on Fit Model 3) The following screen will show up; Select \"Success\" on the Select Columns and then click on \"Y\" under Pick Role Variables. Select the X variables \"Gender - MovieNight\" and click on Add under Construct Model Effects, then under Personality select the \"Stepwise\" option. You will see the following screen with information filled as shown below, then click Run Information & Operations Management Marshall School of Business University of Southern California 4) The following screen will show up; the stepwise gives you three options, forward, backward or mixed, we will use Forward option. If you are using the forward option then none of the parameters should be entered, if you are using the backward option then all of the parameters should be entered. Information & Operations Management Marshall School of Business University of Southern California 5) Click on Go button and you will get the following window, According to \"Stepwise Regression\" the most important variable is Art Party\" followed by \"R (Recency)\" etc. The best Model consists of \"6\" parameters (5 variables plus intercept\". Now click on Make Model to select the best model. Information & Operations Management Marshall School of Business University of Southern California 6) The following screen will show up; According to \"Stepwise Regression\" the most important variable is Art Party\" followed by \"R (Recency)\" etc. Now click on run. 7) The following screen will show up; this is the Multiple Regression model, I have clicked on Prediction Profiler to show the propensity relationship. Information & Operations Management Marshall School of Business University of Southern California Information & Operations Management Marshall School of Business University of Southern California Step B: Use the Logistic Regression to predict the propensity (probability) of a member buying the product and store it in JMP as columns (for both training & testing data set) 1. Click the red triangle at the upper left corner Save Probability Formula 2. The following columns will be created in the JMP file **Note: it will not show up in the Logistic Tree window *** The main column is the Prob[ 1] column, it estimates the member's propensity to buy the product. Step C: Use the propensity to decide who will be mailed the product. (Similar to Part1) 1) Create a new column named LogisticBuy1 (any name is ok - I selected LogisticBuy1 to inform it is the first algorithm I had built to predict the buyer using Logistic Regression) 2) Right click the empty column space New Column LogisticBuy1 (Give a name for the new column) click on Modeling type and change it to Nominal Click OK. Information & Operations Management Marshall School of Business University of Southern California A new Column \"LogisticBuy1\" is created. 3) Right click LogisticBuy1column Formula 4) New window opens, Set the Functions as Conditional If 5) The following will show in the formula window, Note: JMP restricts the formulas, you have to select the functions given in the formula window to create desired formulas. Information & Operations Management Marshall School of Business University of Southern California Our Objective is to create the following formula, If Prob(Success = =1) > 0.15, then LogisticBuy1 =1 Else, LogisticBuy1 =0 This step involves steep learning curve, so practice it. 6) Now Select Prob[1] from the table column and click it.. 7) The next step is to compare the Prob[1], Select the comparison on the function group shown above and select a >= b option, The following will show in the formula window, The Red rectangle is the active window in the formula window, whatever you type will be entered here, 8) Type in 0.15 (the propensity you have selected), you will see the following on the formula screen, 9) Click on the \"then clause\" window and type in 1and Click on the \"else clause\" window and type in 0 and you will see the following on the formula screen, 10) Now click ok and the \"Logistic Buy1\" column will be updated. Step D: Switch to testing dataset - Same as Part1 Information & Operations Management Marshall School of Business University of Southern California Step E: Get Confusion matrix - Same as Part1 Step F: Use the confusion matrix to find out how many members were sent the product and how many bought the product Let us calculate the Profit for the selected Logistic Regression, = 100,000 * (66/250) * 45.5 + 100,000 *(184/250) * (-4) = $ 906,800 We did beat the bench mark of $700,000, Can we do better? Try other variables in the logistic regression or have higher cutoff value for Sending products , may be use 0.2 instead of 0.15 Appendix 1 - 1. To open a file in JMP \"JMP \" \"File\" menu \"Open\" Locate \".JMP\" and click \"Open\" SmartPartyWare_Case2.jmp 2. To color the rows Click the triangular part below the diagonal of the corner cell of the data form \"Color or Mark by Column\" Select \"Success\" OK. Information & Operations Management Marshall School of Business University of Southern California 3. To create the training data set and test data set 1) Create a new column named \"random\". Here is how to create a new column: a) Right click the empty column space b) \"New Column\"; c) Give a name to the column, \"Random\