Question

Posted on Oct 13, 2024

Be as clear as possible. Vague answers, even long ones, will not receive full credit. Hands-on questions are often open-ended in nature. You are encouraged to write down opinions and comments that go beyond what the questions ask. Nevertheless, clearly incorrect information (or data included in your submission that serves no purpose, or that is formatted in a way that makes reading and assessment difficult), even if not asked for, can lead to a penalty. Therefore, proofread your report and tidy it up before submission.

Questions

1. Conducting Classification Using Decision Trees (4 Points)

A supermarket is offering a new line of organic products. The supermarket's management wants to determine which customers are likely to purchase these products. The supermarket has a customer loyalty program. As an initial buyer incentive plan, the supermarket provided coupons for the organic products to all of the loyalty program participants and collected data that includes whether these customers purchased any of the organic products.

The ORGANICS data set (available in the SAS Metadata Repository in library AAEM, the one you downloaded earlier for in-class use) contains 13 variables and over 22,000 observations.
The variables in the data set are shown below with the appropriate roles and levels:

Name             Model Role  Measurement Level  Description
ID               ID          Nominal            Customer loyalty identification number
DemAffl          Input       Interval           Affluence grade on a scale from 1 to 30
DemAge           Input       Interval           Age, in years
DemCluster       Rejected    Nominal            Type of residential neighborhood
DemClusterGroup  Input       Nominal            Neighborhood group
DemGender        Input       Nominal            M = male, F = female, U = unknown
DemRegion        Input       Nominal            Geographic region
DemTVReg         Input       Nominal            Television region
PromClass        Input       Nominal            Loyalty status: tin, silver, gold, or platinum
PromSpend        Input       Interval           Total amount spent
PromTime         Input       Interval           Time as loyalty card member
TargetBuy        Target      Binary             Organics purchased? 1 = Yes, 0 = No
TargetAmt        Rejected    Interval           Number of organic products purchased

Although two target variables are listed, these exercises concentrate on the binary variable TargetBuy.

a. Create a new diagram named Organics.
b. Define the data set AAEM.ORGANICS as a data source for the project.
   1) Set the model roles for the analysis variables as shown above.
   2) Examine the distribution of the target variable. What proportion of individuals purchased organic products? Put your answer in your report. Likewise, answer all of the following questions in your report.
   3) The variable DemClusterGroup contains collapsed levels of the variable DemCluster. Presume that, based on previous experience, you believe that DemClusterGroup is sufficient for this type of modeling effort. Set the model role for DemCluster to Rejected.
   4) As noted above, only TargetBuy will be used for this analysis and should have a role of Target. Can TargetAmt be used as an input for a model used to predict TargetBuy? Why or why not?
   5) Finish the Organics data source definition.
c. Add the AAEM.ORGANICS data source to the Organics diagram workspace.
d. Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.
e. Add a Decision Tree node to the workspace and connect it to the Data Partition node.
f. Create a decision tree model automatically. Use average square error as the model assessment statistic. Include the resulting tree in your report.
   1) How many leaves are in the optimal tree?
   2) Which variable was used for the first split?
g. Add a second Decision Tree node to the diagram and connect it to the Data Partition node.
   1) In the Properties panel of the new Decision Tree node, change the maximum number of branches to 3 to enable three-way splits.
   2) Create a decision tree model. Use average square error as the model assessment statistic.
   3) How many leaves are in the optimal tree?
h. Based on average square error, which of the decision tree models appears to be better?

2. Conducting Classification Using Regression (3 Points)

Continue with the ORGANICS data set and the diagram you have just created.

a. Is any missing-value imputation needed? (Hint: you can use the StatExplore node to check whether there are any missing values.) If yes, should you do imputation before generating the decision tree models? Why or why not?
b. Add an Impute node to the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.
c. Add a Regression node to the diagram and connect it to the Impute node.
d. Choose stepwise as the selection model and the validation error as the selection criterion.
e. Run the Regression node and view the results. Which variables are included in the final model?

3. Conducting Classification Using Neural Network (3 Points)

Continue with the ORGANICS data set and the diagram you have just created.

a. Add a Neural Network tool to the Organics diagram.
Connect the Impute node to the Neural Network node.
b. Set the model selection criterion to average error.
c. Run the Neural Network node and examine the validation average squared error. How does it compare to other models
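
The model comparisons asked for in questions 1h and 3c rest on average squared error measured on the validation data. As a rough illustration outside SAS Enterprise Miner, the Python sketch below assumes the statistic for a binary target is the mean squared difference between the 0/1 outcome and the predicted purchase probability. The function name, the sample outcomes, and the two sets of predicted probabilities are hypothetical and are not taken from the ORGANICS data set.

```python
def average_squared_error(actual, predicted_prob):
    """Mean of (actual - predicted probability)^2 over all cases.

    Assumption: `actual` holds 0/1 outcomes and `predicted_prob` holds the
    model's predicted probability of a 1 for the same cases.
    """
    if len(actual) != len(predicted_prob):
        raise ValueError("actual and predicted_prob must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    squared_errors = [(y - p) ** 2 for y, p in zip(actual, predicted_prob)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical validation outcomes: 1 = purchased organics, 0 = did not.
target_buy = [1, 0, 0, 1, 0]

# Hypothetical predicted purchase probabilities from two candidate models.
model_a = [0.8, 0.2, 0.1, 0.6, 0.3]
model_b = [0.5, 0.5, 0.5, 0.5, 0.5]

ase_a = average_squared_error(target_buy, model_a)
ase_b = average_squared_error(target_buy, model_b)

# The model with the lower validation ASE is the one that appears better.
better = "model_a" if ase_a < ase_b else "model_b"
print(f"ASE model_a = {ase_a:.4f}, ASE model_b = {ase_b:.4f}, better: {better}")
```

In this made-up case model_a has the lower value, so it would be the preferred model. The same reading applies when comparing the two decision trees, the regression, and the neural network on their validation results.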