Question
- The code for each Part of this lab should be self-contained; that is, each of Parts 1, 2, and 3 should contain all the code it needs and should not rely on code from another Part in order to run.
- All parts of the lab should be done using Python, sklearn, pandas, numpy, and matplotlib.
Part 1 - Creating and evaluating a random forest model
In this part of the lab, you should:
- read in the data;
- verify that all the data is numeric and that there are no missing values;
- split the data into training and validation sets (don't worry about creating a final test set);
- create a random forest model using the data;
- evaluate the model on both the training and validation sets using MAE and % error.
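The Part 1 steps could be sketched roughly as follows. The lab's data file is not named in the question, so this sketch substitutes a synthetic table built with `make_regression`; the column names, the `target` column, the 75/25 split, and the definition of % error as MAE divided by the mean target value are all assumptions made for illustration, not part of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data. In the real lab, replace this with
# pd.read_csv(...) on the file you were given.
X_arr, y_arr = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
df = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])
df["target"] = y_arr + 500.0  # shift so the mean target is well away from zero

# Verify that every column is numeric and that nothing is missing.
all_numeric = df.dtypes.apply(pd.api.types.is_numeric_dtype).all()
no_missing = df.isnull().sum().sum() == 0
print("All numeric:", all_numeric, "| No missing values:", no_missing)

# Split into training and validation sets (no separate test set).
X = df.drop(columns="target")
y = df["target"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a random forest with default hyper-parameters.
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# Evaluate with MAE and % error (ASSUMPTION: % error = 100 * MAE / mean target).
mae_train = mean_absolute_error(y_train, model.predict(X_train))
mae_val = mean_absolute_error(y_val, model.predict(X_val))
pct_train = 100.0 * mae_train / y_train.mean()
pct_val = 100.0 * mae_val / y_val.mean()
print(f"Train MAE: {mae_train:.2f} ({pct_train:.2f}% error)")
print(f"Val   MAE: {mae_val:.2f} ({pct_val:.2f}% error)")
```

If your course defines % error differently (for example, as the mean of per-row percentage errors), swap that definition in at the last step.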
Part 2 - Exploring the n_estimators hyper-parameter
In this part of the lab you should:
- use a for loop to create a random forest model for each value of n_estimators from 1 to 30;
- evaluate each model on both the training and validation sets using MAE;
- visualize the results by creating a plot of n_estimators vs MAE for both the training and validation sets.
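One possible shape for the Part 2 loop is sketched below. It is self-contained, as the lab requires, so it rebuilds the same synthetic stand-in data rather than reusing Part 1; the data, the fixed `random_state`, and the output filename `n_estimators_mae.png` are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data; replace with the lab's real file.
X_arr, y_arr = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])
y = pd.Series(y_arr + 500.0, name="target")
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

n_values = list(range(1, 31))  # n_estimators from 1 to 30 inclusive
train_mae, val_mae = [], []

for n in n_values:
    model = RandomForestRegressor(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    train_mae.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_mae.append(mean_absolute_error(y_val, model.predict(X_val)))

# One simple selection rule: lowest validation MAE.
best_n = n_values[int(np.argmin(val_mae))]
print("n_estimators with lowest validation MAE:", best_n)

# Plot n_estimators against MAE for both sets.
fig, ax = plt.subplots()
ax.plot(n_values, train_mae, marker="o", label="Training MAE")
ax.plot(n_values, val_mae, marker="o", label="Validation MAE")
ax.set_xlabel("n_estimators")
ax.set_ylabel("MAE")
ax.set_title("n_estimators vs MAE")
ax.legend()
fig.savefig("n_estimators_mae.png")
```

Picking the minimum of the validation curve is only one defensible rule; choosing the smallest value after which the curve flattens is another, and the written answer should state which rule was used.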
After that you should answer the following questions:
- Which value of n_estimators gives the best results?
- Explain how you decided that this value for n_estimators gave the best results;
- Why is the plot you created above not smooth?
- Was the result here better than the result of Part 1? What % better or worse was it?
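For the "% better or worse" question, a small helper like the one below may be useful. The function name `percent_change` and the example MAE values are hypothetical; the convention assumed here is that the earlier Part's validation MAE is the baseline, so a positive result means the new MAE is lower (better).

```python
def percent_change(baseline_mae, new_mae):
    """Return the % improvement of new_mae over baseline_mae.

    Positive: new_mae is lower (better). Negative: new_mae is higher (worse).
    """
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical numbers, not results from the lab data.
print(percent_change(40.0, 36.0))  # 10.0  (10% better)
print(percent_change(40.0, 44.0))  # -10.0 (10% worse)
```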
Part 3 - Exploring the max_features hyper-parameter
In this part of the lab you should:
- use a for loop to create a random forest model for each value of max_features from 1 to the total number of features in the data;
- for each model, use the value for n_estimators as determined in Part 2;
- evaluate each model on both the training and validation sets using MAE;
- visualize the results by creating a plot of max_features vs MAE for both the training and validation sets.
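The Part 3 loop could follow the same pattern, as in this sketch. It is again self-contained and uses synthetic stand-in data; `N_ESTIMATORS = 20` is a placeholder for whatever value you determined in Part 2, and the output filename is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data; replace with the lab's real file.
X_arr, y_arr = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])
y = pd.Series(y_arr + 500.0, name="target")
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

N_ESTIMATORS = 20  # PLACEHOLDER: use the value you chose in Part 2
n_features = X.shape[1]
mf_values = list(range(1, n_features + 1))  # 1 .. total number of features
train_mae, val_mae = [], []

for mf in mf_values:
    model = RandomForestRegressor(
        n_estimators=N_ESTIMATORS, max_features=mf, random_state=0
    )
    model.fit(X_train, y_train)
    train_mae.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_mae.append(mean_absolute_error(y_val, model.predict(X_val)))

# One simple selection rule: lowest validation MAE.
best_mf = mf_values[int(np.argmin(val_mae))]
print("max_features with lowest validation MAE:", best_mf)

# Plot max_features against MAE for both sets.
fig, ax = plt.subplots()
ax.plot(mf_values, train_mae, marker="o", label="Training MAE")
ax.plot(mf_values, val_mae, marker="o", label="Validation MAE")
ax.set_xlabel("max_features")
ax.set_ylabel("MAE")
ax.set_title("max_features vs MAE")
ax.legend()
fig.savefig("max_features_mae.png")
```

Passing an integer to `max_features` tells scikit-learn to consider that many features at each split, which matches the "1 to the total number of features" wording in the task.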
After that you should answer the following questions:
- Which value of max_features gives the best results?
- Explain how you decided that this value for max_features gave the best results;
- Was the result here better than the result of Part 2? What % better or worse was it?