Question

  • The code for each Part of this lab should be self-contained: each of Parts 1, 2, and 3 should contain all the code it needs and should not rely on code from another Part in order to run.
  • All parts of the lab should be done using Python, sklearn, pandas, NumPy, and Matplotlib.

Part 1 - Creating and evaluating a random forest model

In this part of the lab, you should:

  • read in the data;
  • verify that all the data is numeric and that there are no missing values;
  • split the data into training and validation sets (don't worry about creating a final test set);
  • create a random forest model using the data;
  • evaluate the model on both the training and validation sets using MAE and % error.
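The Part 1 steps above could be sketched roughly as follows. This is a minimal illustration, not the lab's reference solution. The lab's data file is not named here, so synthetic data from `make_regression` stands in for it; the column name `target`, the 75/25 split, and the `random_state` values are all assumptions. "% error" is taken to mean MAE divided by the mean target value, times 100, which is one common reading; adjust it if your course defines it differently.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: the lab's data file is not named, so synthetic data stands in.
# With the real file, replace this block with a pd.read_csv(...) call.
features, target = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
df = pd.DataFrame(features, columns=[f"x{i}" for i in range(features.shape[1])])
df["target"] = target + 500.0  # shifted so the mean target is far from zero

# Check that every column is numeric and that nothing is missing.
all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)
n_missing = int(df.isna().sum().sum())
print("all numeric:", all_numeric, "| missing values:", n_missing)

# Split into training and validation sets (no separate test set).
X = df.drop(columns="target")
y = df["target"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a random forest with default hyper-parameters.
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)


def evaluate(X_part, y_part):
    """Return (MAE, % error); % error is assumed to be MAE / mean target * 100."""
    mae = mean_absolute_error(y_part, model.predict(X_part))
    return mae, 100.0 * mae / y_part.mean()


train_mae, train_pct = evaluate(X_train, y_train)
val_mae, val_pct = evaluate(X_val, y_val)
print(f"training:   MAE = {train_mae:.2f}, % error = {train_pct:.1f}%")
print(f"validation: MAE = {val_mae:.2f}, % error = {val_pct:.1f}%")
```

A training MAE well below the validation MAE is expected here, since the forest has already seen the training rows.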

Part 2 - Exploring the n_estimators hyper-parameter

In this part of the lab you should:

  • use a for loop to create a random forest model for each value of n_estimators from 1 to 30;
  • evaluate each model on both the training and validation sets using MAE;
  • visualize the results by creating a plot of n_estimators vs MAE for both the training and validation sets.
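The Part 2 loop and plot might look something like the sketch below. As before, the synthetic data, the `target` column, the split, and the output file name `n_estimators_vs_mae.png` are assumptions made so the example runs on its own. Picking the value with the lowest validation MAE is one reasonable selection rule, not the only defensible one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the lab's data; swap in pd.read_csv(...) for real use.
features, target = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
df = pd.DataFrame(features, columns=[f"x{i}" for i in range(features.shape[1])])
df["target"] = target + 500.0

X = df.drop(columns="target")
y = df["target"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Train one forest per n_estimators value and record both MAEs.
n_values = list(range(1, 31))
train_maes, val_maes = [], []
for n in n_values:
    model = RandomForestRegressor(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    train_maes.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_maes.append(mean_absolute_error(y_val, model.predict(X_val)))

# One possible selection rule: the value with the lowest validation MAE.
best_n = n_values[int(np.argmin(val_maes))]
print(f"best n_estimators = {best_n}, validation MAE = {min(val_maes):.2f}")

# Plot n_estimators vs MAE for both sets.
fig, ax = plt.subplots()
ax.plot(n_values, train_maes, marker="o", label="training")
ax.plot(n_values, val_maes, marker="o", label="validation")
ax.set_xlabel("n_estimators")
ax.set_ylabel("MAE")
ax.legend()
fig.savefig("n_estimators_vs_mae.png")  # hypothetical file name
```

In a notebook or interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving the figure.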

After that you should answer the following questions:

  • Which value of n_estimators gives the best results?
  • Explain how you decided that this value for n_estimators gave the best results;
  • Why is the plot you created above not smooth?
  • Was the result here better than the result of Part 1? What % better or worse was it?
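For the "% better or worse" question, one plausible calculation is the relative change in validation MAE against the earlier Part's result. The helper name `percent_improvement` and the two MAE values below are made up purely for illustration; substitute the numbers your own runs produce.

```python
def percent_improvement(baseline_mae, new_mae):
    """Relative change in MAE versus the baseline, in percent.

    Positive means the new model is better (lower MAE); negative means worse.
    """
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Illustrative numbers only: a baseline MAE of 4.0 compared with a new MAE of 3.5.
change = percent_improvement(4.0, 3.5)
if change >= 0:
    print(f"{change:.1f}% better")
else:
    print(f"{-change:.1f}% worse")
```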

Part 3 - Exploring the max_features hyper-parameter

In this part of the lab you should:

  • use a for loop to create a random forest model for each value of max_features from 1 to the total number of features in the data;
  • for each model, use the value for n_estimators as determined in Part 2;
  • evaluate each model on both the training and validation sets using MAE;
  • visualize the results by creating a plot of max_features vs MAE for both the training and validation sets.
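A comparable sketch for the Part 3 sweep over max_features follows. The value `n_estimators = 20` is only a placeholder for whatever Part 2 selected, and the synthetic data, `target` column, split, and output file name are again assumptions so that the example is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the lab's data; swap in pd.read_csv(...) for real use.
features, target = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
df = pd.DataFrame(features, columns=[f"x{i}" for i in range(features.shape[1])])
df["target"] = target + 500.0

X = df.drop(columns="target")
y = df["target"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

n_estimators = 20  # PLACEHOLDER: use the value you chose in Part 2

# Train one forest per max_features value, from 1 up to the number of features.
feature_counts = list(range(1, X.shape[1] + 1))
train_maes, val_maes = [], []
for k in feature_counts:
    model = RandomForestRegressor(n_estimators=n_estimators, max_features=k, random_state=0)
    model.fit(X_train, y_train)
    train_maes.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_maes.append(mean_absolute_error(y_val, model.predict(X_val)))

# One possible selection rule: the value with the lowest validation MAE.
best_k = feature_counts[int(np.argmin(val_maes))]
print(f"best max_features = {best_k}, validation MAE = {min(val_maes):.2f}")

# Plot max_features vs MAE for both sets.
fig, ax = plt.subplots()
ax.plot(feature_counts, train_maes, marker="o", label="training")
ax.plot(feature_counts, val_maes, marker="o", label="validation")
ax.set_xlabel("max_features")
ax.set_ylabel("MAE")
ax.legend()
fig.savefig("max_features_vs_mae.png")  # hypothetical file name
```

Passing an integer to `max_features` tells scikit-learn to consider that many features at each split, which matches the "1 to the total number of features" range in the task.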

After that you should answer the following questions:

  • Which value of max_features gives the best results?
  • Explain how you decided that this value for max_features gave the best results;
  • Was the result here better than the result of Part 2? What % better or worse was it?
