Question


Posted on Sep 10, 2024

I just need help on the second question.


Exercise 1: Generate fake data using a linear regression model with known parameters and some noise, as shown below:

    import numpy as np

    # simulate fake normally distributed values with a given mean and standard deviation
    mean, stdDev = 5, 7
    N = 1000  # number of samples to generate
    x_1 = np.random.normal(mean, stdDev, N)

    # generate normal noise with mean 0 and standard deviation 2
    trueError = np.random.normal(0, 2, N)

    # beta parameters used for generating the data
    trueBeta0 = 1.1
    trueBeta1 = -8.2

    # generate data

Plot the histogram of the fake data. Use mlab.normpdf to add the best-fit pdf. Also, make a scatter plot between x_1 and y.

Build a regression model from scratch and demonstrate that it recovers the true values of the βs. Repeat the exercise with the Scikit package.

Create a new variable, Z, that is equal to x_1*2. Include this as one of the predictors in your model. See what happens when you fit a model that depends on x_1 only and then also on Z. For this exercise you will evaluate the model for different sample sizes, from 100 to 5000 in increments of 100 samples. You will split your samples into training and test sets (80%/20%) using the train_test_split function available in the Scikit package. Plot the mean squared error of the training set and of the test set versus sample size for both models (one with x_1 only and the other that includes Z).

Exercise 2: For this exercise you will use real estate sale data for Brooklyn, available in the resource folder on Blackboard (rollingsales_brooklyn).

Analyze sales using regression with any predictors you feel are relevant. Justify why regression was appropriate to use. Visualize the coefficients and the fitted model.

Predict the neighborhood using a k-NN classifier. Be sure to withhold a subset of the data for testing. Find the variables and the k that give you the lowest prediction error.

Report and visualize your findings. Describe any decisions that could be made or actions that could be taken from this analysis.
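For the Exercise 1 plots, one caveat: matplotlib.mlab.normpdf no longer exists in current Matplotlib releases, so this sketch computes the normal density directly with NumPy (scipy.stats.norm.pdf gives the same values if SciPy is installed). It assumes the missing "generate data" line is the usual y = trueBeta0 + trueBeta1 * x_1 + trueError, and it histograms x_1, the normally distributed variable; the helper normal_pdf and the output filename are illustrative choices, not part of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
mean, stdDev, N = 5, 7, 1000
x_1 = rng.normal(mean, stdDev, N)
trueError = rng.normal(0, 2, N)
trueBeta0, trueBeta1 = 1.1, -8.2
# ASSUMPTION: the "generate data" step is a straight-line model plus noise
y = trueBeta0 + trueBeta1 * x_1 + trueError


def normal_pdf(x, mu, sigma):
    """Normal density; same values as scipy.stats.norm.pdf(x, mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


# maximum-likelihood normal fit: sample mean and population (ddof=0) std
muHat, sigmaHat = x_1.mean(), x_1.std()

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
# density=True rescales bar heights so the histogram integrates to 1, like the pdf
_, bins, _ = ax_hist.hist(x_1, bins=30, density=True, alpha=0.6)
grid = np.linspace(bins[0], bins[-1], 200)
ax_hist.plot(grid, normal_pdf(grid, muHat, sigmaHat), "r-")
ax_hist.set_title(f"x_1: best-fit mu={muHat:.2f}, sigma={sigmaHat:.2f}")

ax_scatter.scatter(x_1, y, s=5)
ax_scatter.set_xlabel("x_1")
ax_scatter.set_ylabel("y")
fig.tight_layout()
fig.savefig("exercise1_plots.png")
```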
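"Build a regression model from scratch" can be done with the normal equations, (XᵀX)β = Xᵀy, where X has a leading column of ones for the intercept. This is a minimal sketch under the same assumed completion of the data-generation step (y = trueBeta0 + trueBeta1 * x_1 + trueError); ols_from_scratch is a hypothetical helper name. With N = 1000 the estimates should land close to 1.1 and -8.2.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the estimates are reproducible
mean, stdDev, N = 5, 7, 1000
x_1 = rng.normal(mean, stdDev, N)
trueError = rng.normal(0, 2, N)
trueBeta0, trueBeta1 = 1.1, -8.2
# ASSUMPTION: the "generate data" step is a straight-line model plus noise
y = trueBeta0 + trueBeta1 * x_1 + trueError


def ols_from_scratch(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    X is an (n, p) predictor matrix without an intercept column; a column of
    ones is prepended so that beta[0] is the intercept.
    """
    X_design = np.column_stack([np.ones(len(X)), X])
    # solve() avoids forming the explicit inverse, which is less stable
    return np.linalg.solve(X_design.T @ X_design, X_design.T @ y)


betaHat = ols_from_scratch(x_1.reshape(-1, 1), y)
print("estimated beta0, beta1:", betaHat)
```

The same helper accepts extra predictor columns, so a model with both x_1 and Z only needs np.column_stack([x_1, Z]) as its X.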
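The scikit-learn part and the sample-size experiment can be sketched together with LinearRegression, train_test_split and mean_squared_error. Two assumptions are flagged in the code: the data follow y = trueBeta0 + trueBeta1 * x_1 + noise, and Z is read as x_1 squared (the transcription shows "x_1*2"; change one line if a doubled copy of x_1 was intended). make_data and the output filename are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

trueBeta0, trueBeta1 = 1.1, -8.2
rng = np.random.default_rng(0)


def make_data(n):
    """Draw n samples from the assumed model y = trueBeta0 + trueBeta1 * x_1 + noise."""
    x_1 = rng.normal(5, 7, n)
    y = trueBeta0 + trueBeta1 * x_1 + rng.normal(0, 2, n)
    return x_1, y


# scikit-learn on a 1000-sample draw: should agree with a from-scratch OLS fit
x_1, y = make_data(1000)
skModel = LinearRegression().fit(x_1.reshape(-1, 1), y)
print("sklearn intercept, slope:", skModel.intercept_, skModel.coef_[0])

sampleSizes = np.arange(100, 5001, 100)  # 100, 200, ..., 5000
mse = {name: {"train": [], "test": []} for name in ("x_1 only", "x_1 + Z")}

for n in sampleSizes:
    x_1, y = make_data(n)
    Z = x_1 ** 2  # ASSUMPTION: Z = x_1 squared; use 2 * x_1 if a doubled copy was meant
    designs = {"x_1 only": x_1.reshape(-1, 1), "x_1 + Z": np.column_stack([x_1, Z])}
    for name, X in designs.items():
        # fixed random_state: both models get the identical 80/20 split for this n
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42)
        model = LinearRegression().fit(X_train, y_train)
        mse[name]["train"].append(mean_squared_error(y_train, model.predict(X_train)))
        mse[name]["test"].append(mean_squared_error(y_test, model.predict(X_test)))

fig, ax = plt.subplots(figsize=(8, 4))
for name, curves in mse.items():
    ax.plot(sampleSizes, curves["train"], label=f"{name}, train")
    ax.plot(sampleSizes, curves["test"], "--", label=f"{name}, test")
ax.set_xlabel("sample size")
ax.set_ylabel("mean squared error")
ax.legend()
fig.savefig("mse_vs_sample_size.png")
```

Because the x_1-only model is nested inside the x_1 + Z model and both are fit on the same training rows, the larger model's training MSE can never be higher; whether Z helps on the test set is the thing to read off the plot. Both curves should hover around the noise variance (2² = 4). If Z is instead 2 * x_1, the two columns are perfectly collinear: LinearRegression still returns a fit, but the separate coefficients on x_1 and Z are no longer identifiable.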
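For the Exercise 2 regression, one defensible setup is ordinary least squares on log(sale price): price is a continuous outcome, and a log scale tames its strong right skew so a linear model is more plausible. This is a sketch, not a prescribed method. The column names are assumptions based on the NYC Department of Finance rolling-sales layout and must be checked against your copy of rollingsales_brooklyn. Treating $0 sales and zero square footage or year built as non-market or missing records is also an assumption. The helper names (to_number, clean_sales, fit_log_price_model, plot_model) are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# ASSUMED column names (NYC Department of Finance rolling-sales layout);
# check them against the header row of your rollingsales_brooklyn file.
PRICE = "SALE PRICE"
PREDICTORS = ["GROSS SQUARE FEET", "LAND SQUARE FEET", "YEAR BUILT", "TOTAL UNITS"]


def to_number(series):
    """Strip '$', ',' and whitespace, then coerce to float (unparseable -> NaN)."""
    cleaned = series.astype(str).str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


def clean_sales(raw):
    """Normalize headers, convert numeric columns, drop non-market or incomplete rows."""
    df = raw.copy()
    # collapse embedded newlines/double spaces, e.g. "SALE\nPRICE" -> "SALE PRICE"
    df.columns = [" ".join(str(c).split()) for c in df.columns]
    for col in [PRICE] + PREDICTORS:
        df[col] = to_number(df[col])
    # ASSUMPTION: $0 prices and zero areas/years are non-sales or missing data
    keep = (df[PRICE] > 0) & (df["GROSS SQUARE FEET"] > 0) & (df["YEAR BUILT"] > 0)
    return df.loc[keep].dropna(subset=[PRICE] + PREDICTORS)


def fit_log_price_model(df):
    """OLS of log(sale price) on log(gross sq ft) and the other predictors."""
    X = df[PREDICTORS].copy()
    X["GROSS SQUARE FEET"] = np.log(X["GROSS SQUARE FEET"])
    # z-score the predictors so coefficient sizes are comparable in a bar chart
    Xs = (X - X.mean()) / X.std()
    y = np.log(df[PRICE])
    model = LinearRegression().fit(Xs, y)
    coefs = pd.Series(model.coef_, index=Xs.columns)
    return model, coefs, Xs, y


def plot_model(model, coefs, Xs, y, path="brooklyn_regression.png"):
    """Left: standardized coefficients. Right: predicted vs. actual log price."""
    fig, (ax_coef, ax_fit) = plt.subplots(1, 2, figsize=(11, 4))
    coefs.sort_values().plot.barh(ax=ax_coef)
    ax_coef.set_title("Standardized coefficients")
    ax_fit.scatter(model.predict(Xs), y, s=4, alpha=0.4)
    lims = [y.min(), y.max()]
    ax_fit.plot(lims, lims, "r--")  # points on this line are predicted exactly
    ax_fit.set_xlabel("predicted log sale price")
    ax_fit.set_ylabel("actual log sale price")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

With the real file, load it with pandas.read_excel, setting skiprows to however many title rows sit above the header in your version, then pass the result through clean_sales, fit_log_price_model and plot_model. Standardizing the predictors lets the bar chart compare effects per standard deviation rather than per raw unit.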
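For the k-NN part, choosing the variables and k directly on the test set would make the reported test error optimistic. This sketch therefore withholds a stratified test split, picks the feature subset and k by 5-fold cross-validation on the training split only, and scores the winner once on the test split. It uses scikit-learn's KNeighborsClassifier behind a StandardScaler, since unscaled square footage would dominate the distances. The target column "NEIGHBORHOOD", the candidate features, the k grid, and min_per_class (which drops neighborhoods too rare to stratify) are assumptions to adjust.

```python
from itertools import combinations

import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_knn(k):
    # scale first: unscaled, k-NN distances are dominated by large-unit columns
    return make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))


def knn_search(df, candidate_features, target="NEIGHBORHOOD", ks=range(1, 31, 2),
               test_size=0.2, min_per_class=10, seed=0):
    """Pick features and k by 5-fold CV on a training split, then score once on the test split.

    Returns (table sorted by CV error, best feature list, best k, test error rate).
    """
    data = df.dropna(subset=list(candidate_features) + [target])
    # drop very rare neighborhoods so stratified splitting and 5-fold CV are possible
    counts = data[target].value_counts()
    data = data[data[target].isin(counts[counts >= min_per_class].index)]
    train, test = train_test_split(data, test_size=test_size,
                                   random_state=seed, stratify=data[target])

    rows = []
    for r in range(1, len(candidate_features) + 1):
        for features in combinations(candidate_features, r):  # every non-empty subset
            cols = list(features)
            for k in ks:
                acc = cross_val_score(make_knn(k), train[cols], train[target], cv=5).mean()
                rows.append({"features": features, "k": k, "cv_error": 1.0 - acc})
    table = pd.DataFrame(rows).sort_values("cv_error", kind="stable").reset_index(drop=True)

    best_cols, best_k = list(table.loc[0, "features"]), int(table.loc[0, "k"])
    final = make_knn(best_k).fit(train[best_cols], train[target])
    test_error = 1.0 - final.score(test[best_cols], test[target])
    return table, best_cols, best_k, test_error
```

The exhaustive subset search costs 2^p - 1 subsets times len(ks) times 5 fits, so keep the candidate list short (for example, a handful of the numeric columns used in the regression). The returned table is convenient for the "report and visualize" step, e.g. plotting cv_error against k for the best few feature subsets.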