Question
For this assignment you will write an R program to complete the tasks given below. You will hand in two files for this assignment:
A file with your R program
A PDF/DOC file with your output code

Use the following files:
R Data Set: HMEQScrubbed.csv in the zip file attached
The Data Dictionary in the zip file

Note: The HMEQScrubbed.csv file is a simple scrubbed file from the previous week's homework. If you did more advanced scrubbing of the data last week, you may use your own data file instead. You might get better accuracy! If you decide to use your own version of HMEQScrubbed.csv, please hand it in along with the other deliverables.

This assignment is an extension of the Week assignment. We will now incorporate Regression Analysis into the problem.

Step : Use the Decision Tree Random Forest Decision Tree code from Week as a Starting Point
In this assignment, we will build off the models developed in Week. Now we will add Regression to the models.

Step : Classification Models
Using the code discussed in the lecture, split the data into training and testing data sets.
Do not use TARGETLOSSAMT to predict TARGETBADFLAG.
Create a LOGISTIC REGRESSION model using ALL the variables to predict the variable TARGETBADFLAG.
Create a LOGISTIC REGRESSION model using BACKWARD VARIABLE SELECTION.
Create a LOGISTIC REGRESSION model using a DECISION TREE and FORWARD STEPWISE SELECTION.
List the important variables from the Logistic Regression Variable Selections.
Compare the variables from the Logistic Regression with those of the Random Forest and the Gradient Boosting.
Using the testing data set, create ROC curves for all models. They must all be on the same plot.
Display the Area Under the ROC Curve (AUC) for all models.
Determine which model performed best and why you believe this.
Write a brief summary of which model you would recommend using. Note that this is your opinion. There is no right answer. You might, for example, select a less accurate model because it is faster or easier to interpret.

Step : Linear Regression
Using the code discussed in the lecture, split the data into training and testing data sets.
Do not use TARGETBADFLAG to predict TARGETLOSSAMT.
Create a LINEAR REGRESSION model using ALL the variables to predict the variable TARGETLOSSAMT.
Create a LINEAR REGRESSION model using BACKWARD VARIABLE SELECTION.
Create a LINEAR REGRESSION model using a DECISION TREE and FORWARD STEPWISE SELECTION.
List the important variables from the Linear Regression Variable Selections.
Compare the variables from the Linear Regression with those of the Random Forest and the Gradient Boosting.
Using the testing data set, calculate the Root Mean Square Error (RMSE) for all models.
Determine which model performed best and why you believe this.
Write a brief summary of which model you would recommend using. Note that this is your opinion. There is no right answer. You might, for example, select a less accurate model because it is faster or easier to interpret.

Step : Probability / Severity Model (Push Yourself!)
Using the code discussed in the lecture, split the data into training and testing data sets.
Use any LOGISTIC model from Step in order to predict the variable TARGETBADFLAG.
Use a LINEAR REGRESSION model to predict the variable TARGETLOSSAMT using only records where TARGETBADFLAG is
List the important variables for both models.
Using your models, predict the probability of default and the loss given default.
Multiply the two values together for each record.
Calculate the RMSE value for the Probability / Severity model.
Comment on how this model compares to using the model from Step
Which one would you recommend using?
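
The Probability / Severity step above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the lecture code: the data are simulated stand-ins for HMEQScrubbed.csv, the predictor names LOAN and DEBTINC are hypothetical, the 70/30 split ratio is assumed, and a TARGETBADFLAG value of 1 is assumed to mark a default because the prompt omits the value.

```r
# Simulated stand-in for HMEQScrubbed.csv (assumption: real columns differ).
set.seed(1)
n <- 2000
df <- data.frame(LOAN = runif(n, 5000, 50000),   # hypothetical predictor
                 DEBTINC = runif(n, 10, 50))     # hypothetical predictor
df$TARGETBADFLAG <- rbinom(n, 1, plogis(-4 + 0.08 * df$DEBTINC))
# Assumption: the loss amount is 0 for records that did not default.
df$TARGETLOSSAMT <- ifelse(df$TARGETBADFLAG == 1,
                           pmax(0, 0.4 * df$LOAN + rnorm(n, 0, 2000)),
                           0)

# Train/test split (the 70/30 ratio is an assumption).
idx   <- sample(n, size = round(0.7 * n))
train <- df[idx, ]
test  <- df[-idx, ]

# Probability model: logistic regression that does not use TARGETLOSSAMT.
prob_model <- glm(TARGETBADFLAG ~ LOAN + DEBTINC,
                  data = train, family = binomial)

# Severity model: linear regression fit only on defaulted records
# (flag value 1 is assumed), and it does not use TARGETBADFLAG.
sev_model <- lm(TARGETLOSSAMT ~ LOAN + DEBTINC,
                data = subset(train, TARGETBADFLAG == 1))

# Probability of default times loss given default, per test record.
p_default          <- predict(prob_model, newdata = test, type = "response")
loss_given_default <- predict(sev_model, newdata = test)
expected_loss      <- p_default * loss_given_default

# Root Mean Square Error helper.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_ps <- rmse(test$TARGETLOSSAMT, expected_loss)

# Baseline for comparison: one linear regression on all training records.
direct_model <- lm(TARGETLOSSAMT ~ LOAN + DEBTINC, data = train)
rmse_direct  <- rmse(test$TARGETLOSSAMT, predict(direct_model, newdata = test))

print(c(prob_severity = rmse_ps, direct = rmse_direct))
```

With the real file, the simulated data frame would be replaced by the result of read.csv on HMEQScrubbed.csv, and the model formulas by whichever variables the selection steps retain. Which RMSE comes out lower depends on the data, so the comparison has to be run rather than assumed.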