Question

1 Approved Answer

Posted on Oct 07, 2024

1. Load the LoanData.csv data set into R. It lists the outcomes of 5611 loans. The variables include loan status (current, late, or in default), credit grade (from the best rating, AA, to the worst, HC for heavy risk), loan amount, loan age (in months), borrower's interest rate, and the debt-to-income ratio. Code loan status as a binary outcome (0 for current loans, 1 for late or default loans). Code the debt-to-income ratio into three levels ('low' for ratio < 10%, 'medium' for ratio between 10% and 30%, 'high' for ratio above 30%). [10 points]
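The recoding in step 1 might look roughly like the sketch below. It runs on a small made-up data frame rather than the real file, and the column names `Status` and `Debt.To.Income.Ratio`, the status labels, and the assumption that the ratio is stored as a fraction (0.10 for 10%) are all guesses that should be checked against the actual CSV.

```r
# Toy stand-in for read.csv("LoanData.csv"); all values are invented.
loans <- data.frame(
  Status = c("Current", "Late", "Default", "Current"),  # assumed column name and labels
  Debt.To.Income.Ratio = c(0.05, 0.10, 0.45, 0.30)      # assumed column name, stored as a fraction
)

# Binary outcome: 0 for current loans, 1 for late or default loans.
loans$Bad <- ifelse(loans$Status == "Current", 0, 1)

# Three debt-to-income levels. right = FALSE makes the intervals [a, b),
# so exactly 10% falls in "medium" and exactly 30% falls in "high";
# the boundary treatment is a judgment call the question leaves open.
loans$DTI.Level <- cut(loans$Debt.To.Income.Ratio,
                       breaks = c(-Inf, 0.10, 0.30, Inf),
                       labels = c("low", "medium", "high"),
                       right = FALSE)

print(loans)
```

If the file stores the ratio in percent rather than as a fraction, the breaks would be 10 and 30 instead.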

2. Fit a logistic regression to the recoded data set. Use Credit.Grade, Amount, Age, Borrower.Rate, and the recoded debt-to-income ratio as the explanatory variables. Copy the glm summary output from R and paste it below. [10 points]
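A minimal sketch of the fit in step 2, using simulated data in place of LoanData.csv. The outcome column `Bad`, the recoded column `DTI.Level`, the intermediate credit grades, and every simulated value are assumptions made only so the example runs on its own; the summary it prints says nothing about the real loans.

```r
set.seed(1)
n <- 500

# Simulated stand-in for the recoded loan data.
loans <- data.frame(
  # Only AA (best) and HC (worst) come from the question; A, B, C are placeholders.
  Credit.Grade  = factor(sample(c("AA", "A", "B", "C", "HC"), n, replace = TRUE)),
  Amount        = round(runif(n, 1000, 25000)),
  Age           = sample(1:36, n, replace = TRUE),
  Borrower.Rate = runif(n, 0.05, 0.30),
  DTI.Level     = factor(sample(c("low", "medium", "high"), n, replace = TRUE),
                         levels = c("low", "medium", "high"))
)

# Invented outcome: default risk rises with the rate and the loan age.
loans$Bad <- rbinom(n, 1, plogis(-4 + 12 * loans$Borrower.Rate + 0.03 * loans$Age))

# Logistic regression: binomial family with the default logit link.
fit <- glm(Bad ~ Credit.Grade + Amount + Age + Borrower.Rate + DTI.Level,
           data = loans, family = binomial)
print(summary(fit))
```

Factor predictors are expanded into dummy variables against a reference level, so the summary shows one coefficient per non-reference grade and per non-reference debt-to-income level.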

3. Evaluate the in-sample fit of your logistic regression model using 0.5 as the cutoff probability. Display the confusion matrix below. [10 points]
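One way to build the confusion matrix for a chosen cutoff is sketched below on a simulated one-predictor model. The helper `confusion()` is a hypothetical name introduced here, not a function from any package, and the data are invented.

```r
set.seed(2)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x))   # simulated 0/1 outcome
fit <- glm(y ~ x, family = binomial)

# Cross-tabulate actual outcomes against predictions at a given cutoff.
# Fixing the factor levels keeps the table 2 x 2 even when a high cutoff
# predicts no 1s at all.
confusion <- function(actual, prob, cutoff = 0.5) {
  predicted <- factor(as.integer(prob > cutoff), levels = c(0, 1))
  table(actual = factor(actual, levels = c(0, 1)), predicted = predicted)
}

p_hat <- fitted(fit)                      # in-sample predicted probabilities
cm <- confusion(y, p_hat, cutoff = 0.5)
print(cm)

# Misclassification rate: share of observations off the diagonal.
misclass <- 1 - sum(diag(cm)) / sum(cm)
print(misclass)
```

The same helper can be called again with a different `cutoff` value to see how the table and the misclassification rate change.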

4. The cutoff probability should be around 92.43% with symmetric costs of misclassification. Why? Display the confusion matrix using the updated cutoff probability below. What is the overall in-sample misclassification rate in this case? [10 points]

5. Randomly select 4611 of the 5611 loans as your training set. Apply the fitted logistic model to the remaining 1000 loans, which form your test set. Choose the appropriate cutoff probability assuming symmetric costs of misclassification [see step 4]. What is your out-of-sample prediction accuracy rate based on the test set's confusion matrix? [10 points]
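The train/test procedure in step 5 can be sketched as follows, assuming simulated data and scaled-down sizes (600 rows with 100 held out, standing in for 5611 and 1000). The cutoff of 0.5 is only a placeholder for whichever value step 4 leads to.

```r
set.seed(3)
n <- 600        # stand-in for 5611
n_test <- 100   # stand-in for 1000

dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 1.5 * dat$x))   # invented outcome

# Hold out a random test set; everything else is training data.
test_idx <- sample(n, n_test)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Fit on the training rows only, then predict probabilities for the test rows.
fit <- glm(y ~ x, data = train, family = binomial)
p_test <- predict(fit, newdata = test, type = "response")

cutoff <- 0.5   # placeholder; substitute the cutoff chosen in step 4
pred <- as.integer(p_test > cutoff)

cm <- table(actual = test$y, predicted = pred)
print(cm)

# Accuracy: share of test loans classified correctly.
accuracy <- mean(pred == test$y)
print(accuracy)
```

Note `type = "response"`: without it, `predict()` on a glm returns values on the logit scale rather than probabilities.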

6. Sort the 1000 loans in your test set by predicted default probability in decreasing order. Use a FOR loop to calculate the lift. Then plot the lift chart for your test set. [10 points]
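A rough sketch of the loop in step 6, with simulated probabilities and outcomes standing in for the test-set predictions. Here the chart plots cumulative defaults found against the number expected under random selection, and lift is taken as the ratio of the two; the course may define lift slightly differently, so treat this as one common convention rather than the required one.

```r
set.seed(5)
n_test <- 200                          # stand-in for the 1000 test loans
p_test <- runif(n_test)                # stand-in predicted default probabilities
y_test <- rbinom(n_test, 1, p_test)    # stand-in actual outcomes (1 = late/default)

# Sort outcomes by predicted probability, highest risk first.
ord <- order(p_test, decreasing = TRUE)
y_sorted <- y_test[ord]
base_rate <- mean(y_test)              # overall default rate in the test set

cum_hits <- numeric(n_test)            # defaults found among the top i loans
expected <- numeric(n_test)            # defaults expected if the top i were chosen at random
running <- 0
for (i in seq_len(n_test)) {
  running <- running + y_sorted[i]
  cum_hits[i] <- running
  expected[i] <- base_rate * i
}
lift <- cum_hits / expected

# Lift chart: model curve (solid) against the random-selection baseline (dashed).
plot(seq_len(n_test), cum_hits, type = "l",
     xlab = "Number of loans (sorted by predicted probability)",
     ylab = "Cumulative number of defaults")
lines(seq_len(n_test), expected, lty = 2)
```

A model with predictive value keeps the solid curve above the dashed line until the two meet at the last loan, where the lift ratio returns to 1.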

7. Calculate the out-of-sample prediction accuracy rate for 20 random test samples (sample size = 1000). Display the 20 accuracy rates and their mean below. [10 points]
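Repeating the split many times is easiest with the split wrapped in a function. The sketch below does that on simulated data with scaled-down sizes; `split_accuracy()` is a hypothetical helper name, and the cutoff default of 0.5 is a placeholder for the value chosen in step 4.

```r
set.seed(4)
n <- 600                                 # stand-in for 5611
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 1.5 * dat$x))   # invented outcome

# One random split: fit on the training rows, score the held-out rows.
split_accuracy <- function(data, n_test, cutoff = 0.5) {
  idx <- sample(nrow(data), n_test)
  fit <- glm(y ~ x, data = data[-idx, ], family = binomial)
  p   <- predict(fit, newdata = data[idx, ], type = "response")
  mean(as.integer(p > cutoff) == data$y[idx])
}

# 20 independent random test samples and their average accuracy.
acc <- replicate(20, split_accuracy(dat, n_test = 100))
print(round(acc, 3))
print(mean(acc))
```

The spread of the 20 rates gives a sense of how much a single split's accuracy depends on which loans happen to land in the test set.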

8. Briefly explain why the Naive Bayes classifier is considered a naive implementation of Bayes' Theorem. [10 points]

9. Load the packages textir and e1071 into R. Perform a Naive Bayes analysis using the political sentiment data set (as we did in lecture 10). Use 300 randomly selected observations as the training set and the remaining 100 as the test set. Display the test set's confusion matrix below. [10 points]
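The mechanics of step 9 can be sketched with `naiveBayes()` from e1071. Because the lecture's political sentiment data set is not reproduced here, the example uses an invented 400-row data frame with a two-level class and three binary word indicators; the names `party`, `w1`, `w2`, `w3` and all values are assumptions, and the real analysis would load its data from textir instead.

```r
library(e1071)   # provides naiveBayes()

set.seed(6)
n <- 400

# Invented stand-in for the document data: a class label plus three
# binary "word present" features whose frequencies differ by class.
party <- factor(sample(c("D", "R"), n, replace = TRUE))
word_prob <- ifelse(party == "R", 0.7, 0.3)
make_word <- function() factor(rbinom(n, 1, word_prob), levels = c(0, 1))
docs <- data.frame(party = party, w1 = make_word(), w2 = make_word(), w3 = make_word())

# 300 random observations for training, the remaining 100 for testing.
train_idx <- sample(n, 300)
train <- docs[train_idx, ]
test  <- docs[-train_idx, ]

nb <- naiveBayes(party ~ ., data = train)
pred <- predict(nb, newdata = test)      # predicted class labels

cm <- table(actual = test$party, predicted = pred)
print(cm)
print(mean(pred == test$party))          # test-set accuracy
```

Wrapping the split, fit, and accuracy lines in a function and calling it repeatedly gives the repeated-sample accuracy rates asked for in the next step.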

10. Calculate the out-of-sample prediction accuracy rates for ten random test samples (sample size = 100). Display the 10 accuracy rates and their mean below. [10 points]