Question

Posted on Sep 26, 2024

The code for each Part of this lab should be self-contained: each of Parts 1, 2, and 3 should contain all the code it needs and must not rely on code from another Part in order to run.
All parts of the lab should be done using Python, sklearn, pandas, numpy, and matplotlib.

Part 1 - Creating and evaluating a random forest model

In this part of the lab, you should:

read in the data;
verify that all the data is numeric and that there are no missing values;
split the data into training and validation sets (don't worry about creating a final test set);
create a random forest model using the data;
evaluate the model on both the training and validation sets using MAE and % error.
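
The Part 1 steps above could be sketched roughly as follows. The lab does not name its data file, so this sketch builds a synthetic stand-in with `make_regression`; the column names, the 80/20 split, the `random_state` values, and the choice of `RandomForestRegressor` (a regressor, since MAE is the metric) are all assumptions. The lab also does not define "% error", so it is taken here to mean MAE divided by the mean of the actual target values.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the lab's data, which is not named.
# With the real data this would be something like pd.read_csv(<your file>).
features, target = make_regression(
    n_samples=500, n_features=8, noise=10.0, random_state=0
)
df = pd.DataFrame(
    features, columns=[f"feature_{i}" for i in range(features.shape[1])]
)
# Shift the target away from zero so that a percentage error is meaningful.
df["target"] = target + 500.0

# Verify that every column is numeric and that nothing is missing.
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
missing_total = int(df.isna().sum().sum())
assert not non_numeric, f"non-numeric columns: {non_numeric}"
assert missing_total == 0, f"{missing_total} missing values found"

# Split into training and validation sets (no final test set).
X = df.drop(columns="target")
y = df["target"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit a random forest with default hyper-parameters.
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# Evaluate with MAE on both sets.
train_mae = mean_absolute_error(y_train, model.predict(X_train))
val_mae = mean_absolute_error(y_val, model.predict(X_val))

# ASSUMPTION: "% error" = MAE as a percentage of the mean actual target.
train_pct = 100.0 * train_mae / y_train.mean()
val_pct = 100.0 * val_mae / y_val.mean()

print(f"train MAE: {train_mae:.2f} ({train_pct:.1f}% error)")
print(f"val   MAE: {val_mae:.2f} ({val_pct:.1f}% error)")
```

A validation MAE noticeably higher than the training MAE is expected here, because the forest has already seen the training rows.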

Part 2 - Exploring the n_estimators hyper-parameter

In this part of the lab, you should:

use a for loop to create a random forest model for each value of n_estimators from 1 to 30;
evaluate each model on both the training and validation sets using MAE;
visualize the results by creating a plot of n_estimators vs MAE for both the training and validation sets.
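
One possible shape for the Part 2 loop and plot is sketched below. It is self-contained, as the lab requires, so it rebuilds the same synthetic stand-in data; the data, the split, the `random_state` values, and the variable names are assumptions and not part of the lab. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the lab's (unnamed) data set.
features, target = make_regression(
    n_samples=500, n_features=8, noise=10.0, random_state=0
)
X = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(8)])
y = pd.Series(target + 500.0, name="target")
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

n_values = list(range(1, 31))  # n_estimators from 1 to 30 inclusive
train_maes, val_maes = [], []

for n in n_values:
    model = RandomForestRegressor(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    train_maes.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_maes.append(mean_absolute_error(y_val, model.predict(X_val)))

# Pick the value with the lowest validation MAE (one reasonable criterion).
best_n = n_values[int(np.argmin(val_maes))]
print(f"lowest validation MAE at n_estimators={best_n}: {min(val_maes):.2f}")

# Plot n_estimators vs MAE for both sets.
fig, ax = plt.subplots()
ax.plot(n_values, train_maes, marker="o", label="training")
ax.plot(n_values, val_maes, marker="o", label="validation")
ax.set_xlabel("n_estimators")
ax.set_ylabel("MAE")
ax.set_title("n_estimators vs MAE")
ax.legend()
# In an interactive session, call plt.show() here to display the figure.
```

Selecting on validation MAE and not training MAE is a design choice: training MAE tends to keep falling as trees are added, so it says little about how the model generalizes.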

After that you should answer the following questions:

Which value of n_estimators gives the best results?
Explain how you decided that this value for n_estimators gave the best results.
Why is the plot you created above not smooth?
Was the result here better than the result of Part 1? What % better or worse was it?
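
For the "% better or worse" questions, one common convention is the relative change in validation MAE against the earlier Part's result. The helper name `percent_improvement` and the numbers below are hypothetical and only illustrate the arithmetic; the lab does not prescribe a formula.

```python
def percent_improvement(baseline_mae: float, new_mae: float) -> float:
    """Relative change in MAE, as a percentage of the baseline.

    Positive means the new result is better (lower MAE);
    negative means it is worse.
    """
    if baseline_mae == 0:
        raise ValueError("baseline MAE must be non-zero")
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical values, not results from the lab data.
better = percent_improvement(baseline_mae=10.0, new_mae=8.0)
worse = percent_improvement(baseline_mae=10.0, new_mae=11.0)
print(f"{better:.1f}% better")
print(f"{-worse:.1f}% worse")
```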

Part 3 - Exploring the max_features hyper-parameter

In this part of the lab, you should:

use a for loop to create a random forest model for each value of max_features from 1 to the total number of features in the data;
for each model, use the value for n_estimators as determined in Part 2;
evaluate each model on both the training and validation sets using MAE;
visualize the results by creating a plot of max_features vs MAE for both the training and validation sets.
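
The Part 3 loop might look like the sketch below. Because each Part has to run on its own, the n_estimators value found in Part 2 is written in as a constant; `N_ESTIMATORS = 20` is a placeholder, not a result. The synthetic data, the split, and the `random_state` values are likewise assumptions standing in for the lab's unnamed data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# PLACEHOLDER: replace with the n_estimators value you chose in Part 2.
N_ESTIMATORS = 20

# ASSUMPTION: synthetic stand-in for the lab's (unnamed) data set.
features, target = make_regression(
    n_samples=500, n_features=8, noise=10.0, random_state=0
)
X = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(8)])
y = pd.Series(target + 500.0, name="target")
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

n_features = X.shape[1]
max_feature_values = list(range(1, n_features + 1))  # 1 .. total features
train_maes, val_maes = [], []

for m in max_feature_values:
    model = RandomForestRegressor(
        n_estimators=N_ESTIMATORS, max_features=m, random_state=0
    )
    model.fit(X_train, y_train)
    train_maes.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_maes.append(mean_absolute_error(y_val, model.predict(X_val)))

# Pick the value with the lowest validation MAE (one reasonable criterion).
best_max_features = max_feature_values[int(np.argmin(val_maes))]
print(
    f"lowest validation MAE at max_features={best_max_features}: "
    f"{min(val_maes):.2f}"
)

# Plot max_features vs MAE for both sets.
fig, ax = plt.subplots()
ax.plot(max_feature_values, train_maes, marker="o", label="training")
ax.plot(max_feature_values, val_maes, marker="o", label="validation")
ax.set_xlabel("max_features")
ax.set_ylabel("MAE")
ax.set_title("max_features vs MAE")
ax.legend()
# In an interactive session, call plt.show() here to display the figure.
```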

After that you should answer the following questions:

Which value of max_features gives the best results?
Explain how you decided that this value for max_features gave the best results.
Was the result here better than the result of Part 2? What % better or worse was it?