
Question:

A national veterans’ organization wishes to develop a predictive model to improve the cost-effectiveness of their direct marketing campaign. The organization, with its in-house database of over 13 million donors, is one of the largest direct-mail fundraisers in the United States. According to their recent mailing records, the overall response rate is 5.1%. Out of those who responded (donated), the average donation is $13.00. Each mailing, which includes a gift of personalized address labels and assortments of cards and envelopes, costs $0.68 to produce and send. Using these facts, we take a sample of this dataset to develop a classification model that can effectively capture donors so that the expected net profit is maximized. Weighted sampling is used, underrepresenting the non-responders so that the sample has equal numbers of donors and non-donors.
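The figures above already imply a break-even response rate, which is worth computing before any modeling. The following is a minimal sketch using only the three numbers given in the text; the function and variable names are illustrative, not part of the case.

```python
AVG_DONATION = 13.00        # average gift among responders (from the text)
MAILING_COST = 0.68         # cost to produce and send one mailing (from the text)
BASE_RESPONSE_RATE = 0.051  # overall response rate (from the text)


def expected_net_profit(p_donate: float) -> float:
    """Expected net profit of one mailing to a person who donates with probability p_donate."""
    return p_donate * AVG_DONATION - MAILING_COST


# Response rate at which a mailing exactly pays for itself.
break_even_rate = MAILING_COST / AVG_DONATION

# Expected profit per mailing when mailing everyone at the overall response rate.
profit_at_base_rate = expected_net_profit(BASE_RESPONSE_RATE)

print(f"break-even response rate: {break_even_rate:.4f}")
print(f"expected profit per mailing at 5.1%: {profit_at_base_rate:.3f}")
```

Because the break-even rate (about 5.2%) sits slightly above the overall response rate, mailing the whole list loses a little money on average, which is why a model that ranks likely donors is useful here.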
Data: The file Fundraising.csv contains 3120 records with 50% donors (TARGET_B = 1) and 50% non-donors (TARGET_B = 0). The amount of donation (TARGET_D) is also included but is not used in this case. The descriptions for the 22 attributes (including two target attributes) are listed in Table 23.9.
Assignment:
Step 1 - Partitioning: Partition the dataset into 60% training and 40% holdout (set the seed to 12345).
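A 60/40 partition with a fixed seed can be sketched as follows. This is only an illustration on row indices using Python's standard library; the helper name `partition_indices` is hypothetical, and a different tool given the same seed (for example a data-mining package's own partitioning function) will generally produce a different split.

```python
import random


def partition_indices(n_rows, train_fraction=0.6, seed=12345):
    """Shuffle row indices reproducibly and split them into training and holdout sets."""
    rng = random.Random(seed)          # seeded generator, so the split is repeatable
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Fundraising.csv has 3120 records according to the text.
train_idx, holdout_idx = partition_indices(3120)
print(len(train_idx), len(holdout_idx))
```

With 3120 records this yields 1872 training rows and 1248 holdout rows, and no row appears in both sets.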
Step 2 - Model Building: Follow the steps below to build, evaluate, and choose a model.
1. Select classification tool and parameters: Run at least two classification models of your choosing. Be sure NOT to use TARGET_D in your analysis.
Describe the two models that you chose, with sufficient detail (method, parameters, attributes, etc.) so that it can be replicated.
2. Classification under asymmetric response and cost: What is the reasoning behind using weighted sampling to produce a training set with equal numbers of donors and non-donors? Why not use a simple random sample from the original dataset?
3. Calculate net profit: For each method, calculate the cumulative gains of net profit for both the training and holdout sets based on the actual response rate (5.1%). Again, the expected donation, given that they are donors, is $13.00, and the total cost of each mailing is $0.68. (Hint: To calculate estimated net profit, we need to undo the effects of the weighted sampling and calculate the net profit that would reflect the actual response distribution of 5.1% donors and 94.9% non-donors. To do this, divide each row’s net profit by the oversampling weight applicable to the actual status of that row. The oversampling weight for actual donors is 50%/5.1% = 9.8. The oversampling weight for actual non-donors is 50%/94.9% = 0.53.)
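The weight adjustment in the hint can be sketched per row as below. The dollar amounts and weights come from the text; the function name `adjusted_net_profit` is illustrative, and the sketch assumes every row in question is mailed, so a donor nets the donation minus the mailing cost and a non-donor nets only the cost.

```python
AVG_DONATION = 13.00   # expected donation given a donor (from the text)
MAILING_COST = 0.68    # cost per mailing (from the text)

# Oversampling weights from the hint: sample share divided by actual share.
DONOR_WEIGHT = 0.50 / 0.051       # about 9.8
NON_DONOR_WEIGHT = 0.50 / 0.949   # about 0.53


def adjusted_net_profit(actual_donor: int) -> float:
    """Net profit of mailing one sample row, rescaled to the actual 5.1% / 94.9% mix."""
    if actual_donor == 1:
        return (AVG_DONATION - MAILING_COST) / DONOR_WEIGHT
    return -MAILING_COST / NON_DONOR_WEIGHT


print(f"donor row:     {adjusted_net_profit(1):.4f}")
print(f"non-donor row: {adjusted_net_profit(0):.4f}")
```

On a balanced sample, averaging the two adjusted values gives about -$0.017 per row, which equals 0.051 × $12.32 - 0.949 × $0.68, the expected profit per mailing when everyone is mailed at the actual response rate. That agreement is a quick check that the weights undo the oversampling.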

4. Draw lift charts (cumulative gains curves): Draw the different models’ net profit cumulative gains curves for the holdout set in a single plot (net profit on the y-axis, proportion of list or number mailed on the x-axis).
Is there a model that dominates?
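One way to compute the points of a net profit cumulative gains curve is sketched below: sort the holdout rows by predicted donor probability, then accumulate the weight-adjusted net profit. The probabilities and outcomes are made-up toy values, the function names are illustrative, and plotting (for example with a charting library) is left out so the sketch stays self-contained.

```python
from itertools import accumulate

AVG_DONATION = 13.00              # from the text
MAILING_COST = 0.68               # from the text
DONOR_WEIGHT = 0.50 / 0.051       # oversampling weight for actual donors
NON_DONOR_WEIGHT = 0.50 / 0.949   # oversampling weight for actual non-donors


def adjusted_net_profit(actual_donor):
    """Net profit of mailing one row, rescaled to the actual response distribution."""
    if actual_donor == 1:
        return (AVG_DONATION - MAILING_COST) / DONOR_WEIGHT
    return -MAILING_COST / NON_DONOR_WEIGHT


def cumulative_net_profit(probabilities, actuals):
    """Cumulative adjusted net profit when mailing rows in descending order of probability."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return list(accumulate(adjusted_net_profit(actuals[i]) for i in order))


# Toy holdout data (hypothetical): predicted donor probabilities and actual TARGET_B values.
toy_probs = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
toy_actuals = [1, 1, 0, 1, 0, 0]

curve = cumulative_net_profit(toy_probs, toy_actuals)
best_n = curve.index(max(curve)) + 1   # number mailed at the curve's peak

for n_mailed, profit in enumerate(curve, start=1):
    print(n_mailed, round(profit, 4))
print("peak after mailing", best_n, "rows")
```

Computing one such curve per model on the same holdout set and overlaying them shows whether one model's curve lies above the others across the whole list, which is what "dominates" means here.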
5. Select best model: From your answer in (4), what do you think is the “best” model?
Step 3 - Testing: The file FutureFundraising.csv contains the attributes for future mailing candidates.
6. Using your “best” model from Step 2 (number 5), which of these candidates do you predict as donors and non-donors? List them in descending order of the probability of being a donor. Starting at the top of this sorted list, roughly how far down would you go in a mailing campaign?
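Ranking the future candidates and picking a mailing depth might look like the sketch below. The candidate IDs and probabilities are invented for illustration, `rank_candidates` is a hypothetical helper, and `mail_fraction` is an assumed input that would in practice be read off the holdout net profit curve (the proportion of the list at which cumulative net profit peaks).

```python
def rank_candidates(candidate_probs, mail_fraction):
    """Sort candidates by predicted donor probability and select the top share to mail."""
    ranked = sorted(candidate_probs.items(), key=lambda item: item[1], reverse=True)
    n_mail = round(len(ranked) * mail_fraction)
    to_mail = [candidate_id for candidate_id, _ in ranked[:n_mail]]
    return ranked, to_mail


# Hypothetical predicted probabilities for five future candidates.
toy_candidates = {"A": 0.31, "B": 0.74, "C": 0.52, "D": 0.66, "E": 0.45}

# Assumption: the holdout curve peaked at 40% of the list.
ranked, to_mail = rank_candidates(toy_candidates, mail_fraction=0.4)

for candidate_id, prob in ranked:
    print(candidate_id, prob)
print("mail to:", to_mail)
```

Tying the cutoff to the holdout curve rather than to a fixed probability threshold keeps the decision on the net profit scale that the assignment asks to maximize.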
