Question

#QUESTION 1:
#1a)
ames_daya <- read.csv('ames-1.csv')
library(ISLR)
attach(ames)
ames<-na.omit(ames)
set.seed(1)
idx <- sample(1:nrow(ames), nrow(ames) / 2)
ames_train <- ames[idx, ]
ames_train
ames_test <- ames[-idx, ]
ames_test

#1b)
library(rpart)
tree_unpruned <- rpart(Sale_Price ~ ., data = ames_train)
plot(tree_unpruned)
text(tree_unpruned)

#1c)
library(rpart)
set.seed(2)
cv_tree <- prune(tree_unpruned, cp = 0.05)
plot(cv_tree$cptable[, "nsplit"], cv_tree$cptable[, "xerror"], type = 'b', xlab = "Number of Splits", ylab = "Cross-validated Error")
best_cp <- cv_tree$cptable[which.min(cv_tree$cptable[, "xerror"]), "CP"]
prune_cv_mytree <- prune(tree_unpruned, cp = best_cp)
plot(prune_cv_mytree)
text(prune_cv_mytree)
title("Pruned tree with tree size tuned by CV!")

#1d)
library(randomForest)
set.seed(1)
bagged_tree <- randomForest(Sale_Price ~ ., data = ames_train, mtry = 10,importance=TRUE)
bagged_tree
varImpPlot(bagged_tree)
yhat.bag=predict(bagged_tree,newdata=ames_test)
mse.bagged=mean((yhat.bag-ames_test$Sale_Price)^2)

#1e)
set.seed (1)
rf.ames =randomForest(Sale_Price~.,data=ames_train,mtry=4, importance =TRUE)
yhat.rf = predict(rf.ames,newdata =ames_test)
mse.r=mean((yhat.rf-ames_test$Sale_Price)^2)
mse.r

#1f)
library(gbm)
set.seed(1)
set.seed(1)
boost_ames <- gbm(Sale_Price ~ ., data = ames_train, distribution = 'gaussian', n.trees = 500, interaction.depth = 4)
yhat_boosted <- predict(boost_ames, newdata = ames_test)
mse_boosted <- mean((yhat_boosted - ames_test$Sale_Price)^2)
print(mse_boosted)

Above is my R code for the questions listed below. When I try to calculate the MSE, I keep getting errors. I have consulted ChatGPT for help and got nowhere. I am stuck and don't know what to do. The questions and the data file are described below. Please help!

The data for this example is the Ames housing data available in ames.csv. The goal is to predict the Sale_Price using all independent variables.
The data description is available at https://cran.r-project.org/web/packages/AmesHousing/AmesHousing.pdf. It is good practice to make sure that your data has no missing values.
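One plausible cause of the errors, assuming the CSV contains text columns: since R 4.0, read.csv leaves strings as character vectors, while randomForest and gbm expect numeric or factor predictors, and predict needs the factor levels in the test half to match those seen in training. The posted code also stores the data in ames_daya but then uses ames. The sketch below is a minimal illustration of cleaning and splitting under those assumptions; prepare_split is a hypothetical helper, not part of the original post, and the small data frame is made up to stand in for ames-1.csv.

```r
# Hypothetical helper (not from the original post): drop incomplete rows,
# convert text columns to factors, then split the rows into two halves.
prepare_split <- function(df, seed = 1) {
  df <- na.omit(df)
  # Convert BEFORE splitting so both halves carry the same factor levels.
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  set.seed(seed)
  idx <- sample(seq_len(nrow(df)), floor(nrow(df) / 2))
  list(train = df[idx, ], test = df[-idx, ])
}

# Made-up stand-in for the Ames data (values are illustrative only).
demo <- data.frame(
  Sale_Price   = c(200, 150, NA, 310, 120, 275, 180, 90),
  Lot_Area     = c(8000, 6000, 7000, 12000, 5000, 9500, NA, 4000),
  Neighborhood = c("A", "B", "A", "C", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)
parts <- prepare_split(demo)
str(parts$train)

# With the real file the call would look like this (file name taken from the post):
# ames  <- read.csv("ames-1.csv")
# parts <- prepare_split(ames)
```

Assigning the result of read.csv to the same name that the later lines use (ames) removes the need for attach altogether, since every model call already passes a data argument.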

1a. Split this data into two equal parts: one for training and another for testing.
1b. Fit a regression tree to the training data for predicting Sale_Price using all available independent variables. Plot the tree, discuss the size and interpret the tree that you constructed.
1c. Prune this tree using cross-validation as discussed in class. Plot the pruned tree, discuss the size and interpret the pruned tree that you constructed. Now compare the MSE of the pruned tree and the unpruned tree on your test data. Which one has a lower MSE? Which tree would you recommend?
1d. In this sub-part, you will construct a Bagged Regression tree on the training data and compute the MSE of your Bagged Regression tree on the test data. Remember to carefully specify the mtry parameter.
1e. In this sub-part, you will construct a Random Forest Regression tree on the training data and compute the MSE on the test data. Remember to carefully specify the mtry parameter.
1f. Finally, here you will develop a Boosted Regression tree on the training data and compute the MSE on the test data. You may try tweaking the n.trees and shrinkage parameters to see how they affect your test set MSE.
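Three things in the posted code look worth checking against parts 1c to 1f: the cp table is read from a tree that was already pruned at cp = 0.05 rather than from the unpruned fit, the bagged model uses mtry = 10 although bagging corresponds to mtry equal to the number of predictors, and predict is called on the gbm fit without n.trees. The sketch below shows one way the test MSEs could be computed. score_models and mse are hypothetical names, the data are simulated, and the randomForest and gbm parts run only if those packages are installed; treat it as a sketch under those assumptions, not as the expected solution.

```r
library(rpart)

# Mean squared error between predictions and observed values.
mse <- function(pred, actual) mean((pred - actual)^2)

# Hypothetical helper: fit each model on `train` and return its test-set MSE.
score_models <- function(train, test, target = "Sale_Price",
                         n_trees = 500, shrinkage = 0.1) {
  form <- as.formula(paste(target, "~ ."))
  y <- test[[target]]
  p <- ncol(train) - 1  # number of predictors

  # 1b/1c: rpart runs 10-fold CV while fitting, so the cp table of the
  # UNPRUNED tree already holds the cross-validated error (xerror).
  full <- rpart(form, data = train, method = "anova")
  tab <- full$cptable
  pruned <- prune(full, cp = tab[which.min(tab[, "xerror"]), "CP"])
  out <- c(unpruned = mse(predict(full, newdata = test), y),
           pruned   = mse(predict(pruned, newdata = test), y))

  # 1d/1e: bagging tries all p predictors at each split; a regression
  # forest conventionally tries about p / 3.
  if (requireNamespace("randomForest", quietly = TRUE)) {
    bag <- randomForest::randomForest(form, data = train, mtry = p)
    rf  <- randomForest::randomForest(form, data = train,
                                      mtry = max(1, floor(p / 3)))
    out <- c(out, bagged = mse(predict(bag, newdata = test), y),
             forest = mse(predict(rf, newdata = test), y))
  }

  # 1f: pass n.trees to predict explicitly.
  if (requireNamespace("gbm", quietly = TRUE)) {
    boost <- gbm::gbm(form, data = train, distribution = "gaussian",
                      n.trees = n_trees, interaction.depth = 4,
                      shrinkage = shrinkage)
    out <- c(out, boosted = mse(predict(boost, newdata = test,
                                        n.trees = n_trees), y))
  }
  out
}

# Simulated stand-in for the Ames data (not real housing values).
set.seed(1)
n <- 400
demo <- data.frame(x1 = runif(n), x2 = runif(n),
                   grp = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
demo$Sale_Price <- 100 + 50 * (demo$x1 > 0.5) + 20 * (demo$grp == "b") +
  rnorm(n, sd = 5)
idx <- sample(seq_len(n), n / 2)
res <- score_models(demo[idx, ], demo[-idx, ])
print(res)
```

On the real data the same function would take ames_train and ames_test, and rerunning it with different n_trees and shrinkage values gives the comparison that part 1f asks for.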
