For the following problems, follow all directions and answer all questions. Be brief but precise in your answers.

1) You have available two files: probs.csv and truths.csv. Each file contains "test" data on 47944 consumers. The first file contains the estimated probability that each consumer will purchase after having been given an offer, as estimated by a predictive model. The second file contains the actual "truth" of whether that consumer purchased or not after having been given an offer. The instance numbers are aligned across the two files. Question: Do the probabilities seem to predict purchase well? How would you know?
2a) Let's say that you will target an offer if the estimated probability is >0.5 (this is the threshold that most machine learning methods institute by default). Show the resulting confusion matrix. Calculate the precision, recall, lift, True Positive Rate, and False Positive Rate.
2b) Let's say that you will target an offer if the estimated probability is >0.05 (that's the approximate base rate in the population). Show the resulting confusion matrix. Calculate the precision, recall, lift, True Positive Rate, and False Positive Rate.
2c) Which do you think is a better threshold, and why?
3) Plot the ROC curve for these data. Plot the CRC curve. Explain the difference between the two. Can you show the points on each curve to which the two thresholds above correspond?
4) Assess whether the probabilities are well calibrated. Recall that "well calibrated" means that when one estimates a probability of p%, approximately p% of the people with that probability actually purchase. (Hint: bucket the estimated probabilities based on some reasonable intervals.)
5a) If people respond to the offer, we earn $18 net of product costs. To send the offer itself costs $1 (separate from the product costs). Show the cost/benefit matrix for this problem.
5b) Evaluate the overall benefit that would be (would have been) achieved with each threshold for this dataset.
6) Recommend the best targeting threshold to use for these data. Explain how you determined it.
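
As a possible starting point for the threshold-based calculations in problems 2 and 5, here is a minimal Python sketch. It is not a prescribed solution. The function names, the toy arrays, and the payoff values are illustrative assumptions: the layout of probs.csv and truths.csv is not specified here, so the sketch works on in-memory arrays rather than reading the files. Lift is taken to be precision divided by the base rate. The payoffs assume one reading of 5a: a targeted purchaser nets $18 - $1 = $17, a targeted non-purchaser costs $1, and anyone not targeted contributes $0.

```python
import numpy as np


def confusion_counts(probs, truths, threshold):
    """Return (TP, FP, FN, TN) when targeting consumers with prob > threshold."""
    probs = np.asarray(probs, dtype=float)
    truths = np.asarray(truths).astype(bool)
    targeted = probs > threshold
    tp = int(np.sum(targeted & truths))    # targeted and purchased
    fp = int(np.sum(targeted & ~truths))   # targeted but did not purchase
    fn = int(np.sum(~targeted & truths))   # not targeted but would have purchased
    tn = int(np.sum(~targeted & ~truths))  # not targeted, no purchase
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Precision, recall (same as TPR), FPR, and lift from confusion counts."""
    nan = float("nan")
    n = tp + fp + fn + tn
    base_rate = (tp + fn) / n if n else nan
    precision = tp / (tp + fp) if (tp + fp) else nan
    recall = tp / (tp + fn) if (tp + fn) else nan
    fpr = fp / (fp + tn) if (fp + tn) else nan
    # Assumed definition: lift = precision / base rate.
    lift = precision / base_rate if base_rate else nan
    return {"precision": precision, "recall": recall,
            "tpr": recall, "fpr": fpr, "lift": lift}


def total_benefit(tp, fp, gain_per_tp=17.0, loss_per_fp=1.0):
    """Overall benefit under the assumed payoffs; untargeted consumers count as $0."""
    return tp * gain_per_tp - fp * loss_per_fp


# Toy data for illustration only; NOT taken from probs.csv or truths.csv.
toy_probs = [0.9, 0.6, 0.4, 0.08, 0.03, 0.02]
toy_truths = [1, 0, 1, 0, 0, 0]

for t in (0.5, 0.05):
    counts = confusion_counts(toy_probs, toy_truths, t)
    print(t, counts, rates(*counts), total_benefit(counts[0], counts[1]))
```

To apply this to the real data, load the probability and truth columns into two aligned arrays and pass them in place of the toy lists. Sweeping `threshold` over a grid and comparing `total_benefit` is one way to approach problem 6.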