Watershed is a media services company that provides online streaming movie and television content. As a result of the competitive market of streaming service providers, Watershed is interested in proactively identifying customers who will unsubscribe in the next three months based on the customer's characteristics. For a test set of customers, the file Watershed contains an indication of whether a customer unsubscribed in the past three months and the classification model's estimated unsubscribe probability for the customer.

In an effort to prevent customer churn, Watershed wishes to offer promotions to customers who may unsubscribe. It costs Watershed $15 to offer a promotion to a customer. If offered a promotion, it successfully persuades a customer to remain a Watershed customer with probability 0.8, and retaining the customer is worth $70 to Watershed.

Assuming customers will be offered the promotion in order of decreasing estimated unsubscribe probability, determine how many customers Watershed should offer the promotion to in order to maximize the profit of the intervention campaign. Compute the average profit from offering the top n customers a promotion as:

Profit = (Number of unsubscribing customers in top n) × [P(unsubscribing customer persuaded to remain) × (70 − 15) + P(unsubscribing customer is not persuaded) × (0 − 15)] + (Number of customers in top n who don't intend to unsubscribe) × (0 − 15)

The maximum profit of $____ occurs when ____ customers are offered the promotion.

Honey is a technology company that provides online coupons to its subscribers. Honey's analytics staff has developed a classification method to predict whether a customer who has been sent a coupon will apply the coupon toward a purchase. For a sample of customers, the following table lists the classification model's estimated coupon usage probability for a customer.
For this particular campaign, suppose that when a customer uses a coupon, Honey receives $1 in revenue from the product sponsor. To target the customer with the coupon offer, Honey incurs a cost of $0.05. Honey will offer a customer a coupon as long as the expected profit of doing so is positive. Using the equation

Expected Profit of Coupon Offer = P(coupon used) × Profit if coupon used + (1 − P(coupon used)) × Profit if coupon not used

determine which customers should be sent the coupon. Determine the expected profit for each customer. Round your answers to the nearest cent. Enter a negative value as a negative number, if any.

The expected profit is positive for customers ____, so these customers ____.

A university is applying classification methods in order to identify alumni who may be interested in donating money. The university has a database of 58,205 alumni profiles containing numerous variables. Of these 58,205 alumni, only 576 have donated in the past. The university has oversampled the data and trained a random forest of 100 classification trees. For a cutoff value of 0.5, the following confusion matrix summarizes the performance of the random forest on a validation set:

The following table lists some information on individual observations from the validation set:

(a) Choose the correct explanation for how the probability of Donation was computed for the three observations.
(i) The probability of Donation for each observation is the proportion of the 100 individual classification trees that classified the observation as "Donation."
(ii) The probability of Donation for each observation is the proportion of the 100 individual classification trees that classified the observation as "No Donation."
(iii) The probability of Donation for each observation is the ratio of the individual classification trees that classified the observation as "Donation" and those that classified it as "No Donation."
(iv) The probability of Donation for each observation is the ratio of the individual classification trees that classified the observation as "No Donation" and those that classified it as "Donation."

Why were Observations A and C classified as Donation and Observation B classified as No Donation? If required, round your answers to one decimal place.

The probability of Donation for Observation A is ____. It is ____ than 0.5, so Observation A is classified as Donation by the random forest.
The probability of Donation for Observation B is ____. It is ____ than 0.5, so Observation B is classified as No Donation by the random forest.
The probability of Donation for Observation C is ____. It is ____ than 0.5, so Observation C is classified as Donation by the random forest.

(b) Compute the values of accuracy, sensitivity, specificity, and precision. Explain why accuracy is a misleading measure to consider in this case. Evaluate the performance of the random forest, particularly commenting on the precision measure.

If required, round your answer to three decimal places.
Accuracy = ____

If required, round your answers to the nearest whole percentage.
Accuracy is not the best measure to use for unbalanced data sets because less than ____% donated.

If required, round your answers for Sensitivity and Specificity to three decimal places and round your answer for Precision to four decimal places.
Sensitivity = ____
Specificity = ____
Precision = ____

The value of precision seems disturbingly ____. The precision measure represents the percentage of alumni classified by the random forest as ____ that are donors.
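The four measures in part (b) all come straight from the confusion-matrix counts. The sketch below shows one way to compute them. The confusion matrix itself is not reproduced in this text, so the counts passed in (`tp`, `fn`, `fp`, `tn`) are hypothetical placeholders chosen only to illustrate an unbalanced data set; they are not the values from the problem. The function name `classification_metrics` is likewise an assumption. The base rate uses the 576 donors out of 58,205 alumni stated in the problem.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity, and precision
    from confusion-matrix counts, with "Donation" as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all observations classified correctly
        "sensitivity": tp / (tp + fn),      # share of actual donors caught
        "specificity": tn / (tn + fp),      # share of actual non-donors caught
        "precision": tp / (tp + fp),        # share of predicted donors who really donated
    }


# Hypothetical counts for illustration only; NOT the problem's confusion matrix.
metrics = classification_metrics(tp=8, fn=2, fp=90, tn=900)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Proportion of donors in the full alumni database (figures from the problem).
base_rate = 576 / 58205
print(f"base rate of donors: {base_rate:.4f}")
```

With heavily unbalanced classes, a classifier that labels everyone "No Donation" already achieves an accuracy close to one minus the base rate, which is why precision is more informative when compared against the base rate than when read on its own.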
Comparing the value of precision with the proportion of observations corresponding to donations, there ____ a tremendous improvement in the ability to target alumni who may be more likely to donate.

Casey Deesel is a sports agent negotiating a contract for Titus Johnston, an athlete in the National Football League (NFL). An important aspect of any NFL contract is the amount of guaranteed money over the life of the contract. Casey has gathered data on 506 NFL athletes who have recently signed new contracts. Each observation (NFL athlete) includes values for the percentage of his team's plays that the athlete is on the field (SnapPercent), the number of awards an athlete has received recognizing on-field performance (Awards), the number of games the athlete has missed due to injury (GamesMissed), and millions of dollars of guaranteed money in the athlete's most recent contract (Money, dependent variable). Casey has trained a full regression tree on 304 observations and then used the validation set to prune the tree to obtain a best-pruned tree. The best-pruned tree (as applied to the 202 observations in the validation set) is:

(a) Titus Johnston's variable values are: SnapPercent = 96, Awards = 7, and GamesMissed = 3. How much guaranteed money does the regression tree predict that a player with Titus Johnston's profile should earn in his contract? If required, round your answers to two decimal places.

The predicted result is $____ million of guaranteed money.
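For the Watershed promotion problem, the profit formula can be evaluated for every cutoff n by ranking customers and keeping a running total. Each unsubscribing customer in the top n contributes 0.8 × (70 − 15) + 0.2 × (0 − 15) = 41 dollars in expectation, and each customer who would have stayed anyway contributes −15. The sketch below is a minimal illustration under those figures. The Watershed data file is not reproduced here, so the `customers` list is made-up data, and the names `best_cutoff`, `PROMO_COST`, `RETAIN_VALUE`, and `P_PERSUADE` are assumptions introduced for this example.

```python
PROMO_COST = 15.0     # cost of offering one promotion (from the problem)
RETAIN_VALUE = 70.0   # value of retaining a customer (from the problem)
P_PERSUADE = 0.8      # chance a promotion persuades an unsubscriber (from the problem)


def best_cutoff(customers):
    """customers: list of (estimated_unsubscribe_probability, unsubscribed) pairs.

    Returns (best_n, best_profit, profits), where profits[n] is the profit
    from offering the promotion to the top n customers (profits[0] == 0).
    """
    # Offer promotions in order of decreasing estimated unsubscribe probability.
    ranked = sorted(customers, key=lambda c: c[0], reverse=True)
    gain_unsubscriber = (P_PERSUADE * (RETAIN_VALUE - PROMO_COST)
                         + (1 - P_PERSUADE) * (0 - PROMO_COST))
    gain_stayer = 0 - PROMO_COST
    profits = [0.0]  # n = 0: no promotions offered
    for _, unsubscribed in ranked:
        profits.append(profits[-1] + (gain_unsubscriber if unsubscribed else gain_stayer))
    best_n = max(range(len(profits)), key=lambda n: profits[n])
    return best_n, profits[best_n], profits


# Hypothetical test set for illustration only; NOT the Watershed data file.
customers = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
             (0.5, False), (0.4, False), (0.3, False), (0.2, False)]
best_n, best_profit, profits = best_cutoff(customers)
print(f"offer to top {best_n} customers, profit about ${best_profit:.2f}")
```

For this made-up list the running profit peaks at n = 4 with about $108. Applying the same function to the actual test set would fill in the two blanks of the Watershed problem.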
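For the Honey coupon problem, the expected-profit equation can be applied customer by customer. The sketch below assumes that the profit when a coupon is used is the $1 revenue net of the $0.05 targeting cost, and that the profit when it is not used is −$0.05; under that reading the expression simplifies to P(coupon used) − 0.05, so the break-even probability is 0.05. The table of estimated probabilities is not reproduced in this text, so the customers and probabilities in `sample` are hypothetical, as is the function name `expected_coupon_profit`.

```python
REVENUE_IF_USED = 1.00   # revenue from the product sponsor (from the problem)
TARGET_COST = 0.05       # cost of targeting one customer (from the problem)


def expected_coupon_profit(p_used):
    """Expected profit of sending a coupon to a customer who uses it with probability p_used."""
    profit_if_used = REVENUE_IF_USED - TARGET_COST   # assumed: revenue net of targeting cost
    profit_if_not_used = -TARGET_COST                # cost is incurred either way
    return p_used * profit_if_used + (1 - p_used) * profit_if_not_used


# Hypothetical probabilities for illustration only; NOT the problem's table.
sample = {"Customer X": 0.40, "Customer Y": 0.03, "Customer Z": 0.10}
for name, p_used in sample.items():
    profit = expected_coupon_profit(p_used)
    decision = "send" if profit > 0 else "do not send"
    print(f"{name}: expected profit ${profit:.2f} -> {decision}")
```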