Congratulations on finishing your prediction model for home sale prices in Cook County! In the following section, we'll delve deeper into the implications of predictive modeling within the CCAO case study, especially because statistical modeling is how the CCAO values properties. Refer to Lecture 15 if you're having trouble getting started!

### Question 9

When evaluating your model, we used root mean squared error (RMSE). In the context of estimating the value of houses, what does error mean for an individual homeowner? How does it affect them in terms of property taxes?

_Type your answer here, replacing this text._

It is time to build your own model! Just as in the guided model from the previous question, you should encapsulate as much of your workflow into functions as possible. Your job is to select better features and define your own feature engineering pipeline inside the function `process_data_fm` in the following cell. You must not change the parameters of `process_data_fm`.

To evaluate your model, we will start by defining a linear regression model called `final_model`. Then, we will process the training data using your `process_data_fm`, fit `final_model` with this training data, and compute the training RMSE. Next, we will process the test data with your `process_data_fm`, use `final_model` to predict `Log Sale Price` for the test data, transform the predicted and original log values back into their original forms, and compute the test RMSE.

See below for an example of the code we will run to grade your model; an illustrative sketch of one possible `process_data_fm` and of the `delog`/`rmse` helpers appears after the grading scheme.

**Note**: `delog` is a function we will run to undo the log transformation on your predictions and the original sale prices.

```python
final_model = lm.LinearRegression(fit_intercept=True)

training_data = pd.read_csv('cook_county_train.csv')
test_data = pd.read_csv('cook_county_test.csv')

X_train, y_train = process_data_fm(training_data)
X_test, y_test = process_data_fm(test_data)

final_model.fit(X_train, y_train)
y_predicted_train = final_model.predict(X_train)
y_predicted_test = final_model.predict(X_test)

training_rmse = rmse(delog(y_predicted_train), delog(y_train))
test_rmse = rmse(delog(y_predicted_test), delog(y_test))
```

**Note**: It is your responsibility to make sure that all of your feature engineering and selection happens in `process_data_fm`, and that the function runs as expected without errors. We will NOT accept regrade requests that require us to go back and rerun code that needs typo or bug fixes.

**Hint**: Some features may have missing values in the test set but not in the training set. Make sure `process_data_fm` handles missing values appropriately for each feature!

**Note**: You MUST remove any additional new cells you add below the current one before submitting to Gradescope to avoid autograder errors.

### Grading Scheme

Your grade for this question will be based on your training RMSE and test RMSE. The thresholds are as follows:

| Points | 3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Training RMSE | Less than 80k | [80k, 160k) | [160k, 260k) | More than 260k |

| Points | 3 | 2 | 1 | 0 |
|---|---|---|---|---|
| Test RMSE | Less than 85k | [85k, 165k) | [165k, 265k) | More than 265k |
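
For concreteness, here is a minimal sketch of what a `process_data_fm` implementation could look like. It is not a reference solution: the columns it uses (`Sale Price`, `Building Square Feet`, `Bedrooms`) are assumptions carried over from the guided model, and median imputation is just one reasonable way to satisfy the hint about missing test-set values.

```python
import numpy as np


def process_data_fm(data):
    """Illustrative feature-engineering pipeline that returns (X, y).

    The columns referenced here ('Sale Price', 'Building Square Feet',
    'Bedrooms') are placeholder choices, not required features.
    """
    data = data.copy()

    # Log-transform the target; the grading code undoes this with delog.
    data['Log Sale Price'] = np.log(data['Sale Price'])

    # Log-transform a heavily right-skewed numeric feature.
    data['Log Building Square Feet'] = np.log(data['Building Square Feet'])

    feature_cols = ['Log Building Square Feet', 'Bedrooms']

    # A feature can be fully observed in the training set yet contain NaNs
    # in the test set, so impute every feature unconditionally.
    for col in feature_cols:
        data[col] = data[col].fillna(data[col].median())

    X = data[feature_cols]
    y = data['Log Sale Price']
    return X, y
```

Because the grading code calls `process_data_fm` separately on the training and test data, any imputation statistic in a sketch like this has to be computed inside the function from whichever DataFrame it receives.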
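
The staff-provided `delog` and `rmse` helpers are not reproduced in this notebook; the versions below are only a sketch of their presumed behavior, assuming the target was transformed with `np.log` so that RMSE ends up in the original dollar units. If your transformation differs (for example, `np.log1p`), `delog` must invert exactly that transformation.

```python
import numpy as np


def delog(y_log):
    """Invert the log transform, assuming np.log was applied to Sale Price."""
    return np.exp(y_log)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length arrays."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```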