Question
#QUESTION 1:

#1a) Read the data, drop missing values, and split into equal training and test sets
ames <- read.csv("ames-1.csv", stringsAsFactors = TRUE)  # keep categoricals as factors for randomForest/gbm
ames <- na.omit(ames)
set.seed(1)
idx <- sample(1:nrow(ames), nrow(ames) / 2)
ames_train <- ames[idx, ]
ames_test  <- ames[-idx, ]

#1b) Unpruned regression tree (rpart also runs 10-fold CV internally, which 1c uses)
library(rpart)
set.seed(2)
tree_unpruned <- rpart(Sale_Price ~ ., data = ames_train)
plot(tree_unpruned)
text(tree_unpruned)

#1c) Prune using the cross-validation results stored in the fitted tree's cptable
plot(tree_unpruned$cptable[, "nsplit"], tree_unpruned$cptable[, "xerror"], type = "b",
     xlab = "Number of Splits", ylab = "Cross-validated Error")
best_cp <- tree_unpruned$cptable[which.min(tree_unpruned$cptable[, "xerror"]), "CP"]
prune_cv_mytree <- prune(tree_unpruned, cp = best_cp)
plot(prune_cv_mytree)
text(prune_cv_mytree)
title("Pruned tree with tree size tuned by CV")

#1d) Bagging: mtry must equal the number of predictors
library(randomForest)
set.seed(1)
bagged_tree <- randomForest(Sale_Price ~ ., data = ames_train,
                            mtry = ncol(ames_train) - 1, importance = TRUE)
bagged_tree
varImpPlot(bagged_tree)
yhat.bag <- predict(bagged_tree, newdata = ames_test)
mse.bagged <- mean((yhat.bag - ames_test$Sale_Price)^2)
mse.bagged

#1e) Random forest: mtry uses only a subset of predictors (p/3 is randomForest's regression default)
set.seed(1)
rf.ames <- randomForest(Sale_Price ~ ., data = ames_train, mtry = 4, importance = TRUE)
yhat.rf <- predict(rf.ames, newdata = ames_test)
mse.r <- mean((yhat.rf - ames_test$Sale_Price)^2)
mse.r

#1f) Boosting
library(gbm)
set.seed(1)
boost_ames <- gbm(Sale_Price ~ ., data = ames_train, distribution = "gaussian",
                  n.trees = 500, interaction.depth = 4)
yhat_boosted <- predict(boost_ames, newdata = ames_test, n.trees = 500)
mse_boosted <- mean((yhat_boosted - ames_test$Sale_Price)^2)
print(mse_boosted)

Above is my R code for the questions listed below. For some reason, when I try to calculate the MSE I keep getting errors. I have consulted ChatGPT for help and got nowhere. I am stuck and don't know what to do. The questions and data file are listed below. Please help!

The data for this example is the Ames housing data available in ames.csv. The goal is to predict Sale_Price using all independent variables. The data description is available at https://cran.r-project.org/web/packages/AmesHousing/AmesHousing.pdf. It is good practice to make sure that your data has no missing values.
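One common reason the MSE lines error out is that an earlier model call already failed (for example, randomForest or gbm rejecting character columns or factors with too many levels), so the prediction object the MSE line needs was never created. This is a minimal sanity-check sketch, assuming the file name ames-1.csv and the Sale_Price column used in the code above:

# Minimal data check (file name and column name assumed from the code above)
ames <- read.csv("ames-1.csv", stringsAsFactors = TRUE)  # factors instead of character columns
sum(is.na(ames))                                # total missing values before na.omit()
class(ames$Sale_Price)                          # should be numeric or integer
max(sapply(Filter(is.factor, ames), nlevels))   # randomForest cannot handle factors with more than 53 levels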
1a. Split this data into two equal parts: one for training and another for testing.
1b. Fit a regression tree to the training data for predicting Sale_Price using all available independent variables. Plot the tree, discuss its size, and interpret the tree that you constructed.
1c. Prune this tree using cross-validation as discussed in class. Plot the pruned tree, discuss its size, and interpret the pruned tree that you constructed. Now compare the MSE of the pruned tree and the unpruned tree on your test data. Which one has a lower MSE? Which tree would you recommend?
1d. In this sub-part, you will construct a bagged regression tree on the training data and compute the MSE of your bagged regression tree on the test data. Remember to carefully specify the mtry parameter.
1e. In this sub-part, you will construct a random forest regression tree on the training data and compute the MSE on the test data. Remember to carefully specify the mtry parameter.
1f. Finally, you will develop a boosted regression tree on the training data and compute the MSE on the test data. You may try tweaking the n.trees and shrinkage parameters to see how they affect your test set MSE.
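Note that the posted code never actually computes the test MSEs that 1c asks you to compare. A minimal sketch of that comparison, reusing the tree_unpruned, prune_cv_mytree, and ames_test objects defined in the code above:

# Test MSE comparison for 1c, using objects defined in the code above
yhat_unpruned <- predict(tree_unpruned, newdata = ames_test)
mse_unpruned  <- mean((yhat_unpruned - ames_test$Sale_Price)^2)
yhat_pruned <- predict(prune_cv_mytree, newdata = ames_test)
mse_pruned  <- mean((yhat_pruned - ames_test$Sale_Price)^2)
c(unpruned = mse_unpruned, pruned = mse_pruned)

For 1f, shrinkage can be passed directly to gbm. This is only an illustrative tweak; the values 0.01 and 2000 are my own assumptions, not part of the assignment:

# Illustrative shrinkage/n.trees tweak for 1f (values are examples, not requirements)
boost_slow <- gbm(Sale_Price ~ ., data = ames_train, distribution = "gaussian",
                  n.trees = 2000, interaction.depth = 4, shrinkage = 0.01)
yhat_slow <- predict(boost_slow, newdata = ames_test, n.trees = 2000)
mean((yhat_slow - ames_test$Sale_Price)^2)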