Question: Load the data stored in the tab-delimited file nyc.txt into a DataFrame named nyc. Use head() to display the first
rows of this DataFrame.
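A minimal sketch of the loading step, assuming pandas is available. The three data rows below are synthetic placeholders written to a temporary file so the example runs on its own; they are not the contents of the real nyc.txt. The row count passed to head() is also a placeholder, since the number requested by the exercise is missing from the text.

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in rows (NOT the real nyc.txt contents), written to a
# temporary tab-delimited file so the example can run anywhere.
sample_text = (
    "Price\tFood\tDecor\tService\tWait\tEast\n"
    "43\t22\t18\t20\t36\t0\n"
    "32\t20\t19\t19\t18\t0\n"
    "34\t21\t13\t18\t22\t1\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "nyc.txt")
    with open(path, "w") as handle:
        handle.write(sample_text)

    # sep="\t" tells pandas that the columns are separated by tabs.
    nyc = pd.read_csv(path, sep="\t")

# head(n) returns the first n rows; 3 is a placeholder row count.
print(nyc.head(3))
```

With the real file in the working directory, only the `pd.read_csv("nyc.txt", sep="\t")` call and the `head()` call would be needed.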
The columns contained in this DataFrame are described below.
Price The average price in US dollars for a meal for two.
Food The Zagat customer rating of the quality of the food. On a scale from
Decor The Zagat customer rating of the quality of the decor. On a scale from
Service The Zagat customer rating of the quality of the service. On a scale from
Wait The average wait time, in minutes, to be seated during dinner rush on a Friday evening.
East A binary variable indicating if the restaurant is east or west of 5th Avenue.
Our goal in this problem will be to create a linear regression model to predict the value of Price using the other five
columns as features.
Perform the following steps in a single code cell:
Create a 2D feature array named X containing the relevant features, as well as a 1D label array named y
containing the labels. Note: These should be arrays, and not DataFrames or Series. See note below.
Use train_test_split to split the data into training and testing sets using an split. Name the
resulting arrays X_train, X_test, y_train, and y_test. Set random_state
Print the shapes of X_train and X_test. Include text labeling the two results as shown below. Add
spacing to ensure that the shape tuples are left-aligned.
Training Features Shape: xxxx
Test Features Shape: xxxx
Note: You can extract a NumPy array from a pandas DataFrame or Series object by adding .values to the end of it. For
example, if df is a DataFrame object and you run the statement X = df.iloc[:, some_columns].values, then X will
be a 2D array containing the information for the selected columns.
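The array-extraction and splitting steps above can be sketched as follows, assuming scikit-learn is installed. The DataFrame here is filled with random placeholder values rather than the real data, and `test_size=0.2` and `random_state=1` are stand-ins, because the split ratio and seed given in the exercise are missing from the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the nyc DataFrame (random values, not real data).
rng = np.random.default_rng(0)
n_rows = 50
nyc = pd.DataFrame({
    "Price": rng.integers(20, 70, n_rows),
    "Food": rng.integers(15, 28, n_rows),
    "Decor": rng.integers(10, 28, n_rows),
    "Service": rng.integers(14, 26, n_rows),
    "Wait": rng.integers(0, 60, n_rows),
    "East": rng.integers(0, 2, n_rows),
})

feature_cols = ["Food", "Decor", "Service", "Wait", "East"]

# .values converts the pandas objects to NumPy arrays.
X = nyc[feature_cols].values   # 2D array, one column per feature
y = nyc["Price"].values        # 1D array of labels

# test_size and random_state are placeholder values (assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Trailing spaces in the second label line up the two shape tuples.
print("Training Features Shape:", X_train.shape)
print("Test Features Shape:    ", X_test.shape)
```

Selecting columns by name avoids depending on the column order in the file; `nyc.iloc[:, 1:].values` would work equally well if Price happens to be the first column.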
We will now create a linear regression model that can be used to estimate the price at a similar restaurant.
Create a linear regression model named nyc_mod and then fit it to the training data. Display the intercept and
coefficients for the final model with text labels as shown below. Add spacing to ensure that the values replacing the
xxxx characters are left-aligned. The intercept should appear as a single number. The coefficients should be in the form
of an array and should be displayed on a single line.
Intercept: xxxx
Coefficients: xxxx
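A possible sketch of the fitting step, using scikit-learn's `LinearRegression`. To keep the example checkable, the data are synthetic and noise-free, generated from made-up coefficients (`true_intercept`, `true_coefs`), so the fitted values should match them closely. None of these numbers come from the real dataset, and the split settings are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free stand-in data (NOT the real nyc.txt values).
rng = np.random.default_rng(0)
n_rows = 60
X = np.column_stack([
    rng.integers(15, 28, n_rows),   # Food
    rng.integers(10, 28, n_rows),   # Decor
    rng.integers(14, 26, n_rows),   # Service
    rng.integers(0, 60, n_rows),    # Wait
    rng.integers(0, 2, n_rows),     # East
]).astype(float)

# Hypothetical "true" parameters used only to generate the labels.
true_intercept = 5.0
true_coefs = np.array([1.5, 2.0, 0.5, 0.1, 3.0])
y = true_intercept + X @ true_coefs

# Placeholder split settings (assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

nyc_mod = LinearRegression()
nyc_mod.fit(X_train, y_train)

# intercept_ is a single float for a 1D label array; coef_ is a 1D array.
print("Intercept:   ", nyc_mod.intercept_)
print("Coefficients:", nyc_mod.coef_)
```

The fitted parameters live in the `intercept_` and `coef_` attributes, which only exist after `fit` has been called.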
We will now calculate the r-squared score for the model on both the training set and the test set.
Calculate and print the training and testing r-squared values for your model, rounded to four decimal places. Include
text labels explaining which value is which, as shown below. Add spacing to ensure that the scores are left-aligned.
Training r-Squared: xxxx
Testing r-Squared: xxxx
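One way to compute these scores is the `score` method of a fitted `LinearRegression`, which returns the coefficient of determination. The sketch below uses noisy synthetic data in place of the real dataset, and the split settings are placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with random noise (NOT the real nyc.txt values).
rng = np.random.default_rng(0)
n_rows = 60
X = np.column_stack([
    rng.integers(15, 28, n_rows),   # Food
    rng.integers(10, 28, n_rows),   # Decor
    rng.integers(14, 26, n_rows),   # Service
    rng.integers(0, 60, n_rows),    # Wait
    rng.integers(0, 2, n_rows),     # East
]).astype(float)
y = 5.0 + X @ np.array([1.5, 2.0, 0.5, 0.1, 3.0]) + rng.normal(0, 5, n_rows)

# Placeholder split settings (assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

nyc_mod = LinearRegression().fit(X_train, y_train)

# score(X, y) returns r-squared for the model's predictions on X.
train_r2 = nyc_mod.score(X_train, y_train)
test_r2 = nyc_mod.score(X_test, y_test)

# The extra space after the second label left-aligns the two scores.
print("Training r-Squared:", round(train_r2, 4))
print("Testing r-Squared: ", round(test_r2, 4))
```

The training score of a least-squares fit with an intercept falls between 0 and 1, while the test score can be lower and may even be negative if the model generalizes poorly.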
We will now use the model to generate predictions for the restaurants in the test set.
Use your model to generate price estimates based on the feature values in the test set. Store the results in a variable
named test_pred. Print the first observed y-values for the test set, and then the first predictions, rounded to
decimal places. Include text labels with your output as shown below. Each price array should be displayed on a single
line, and the two arrays should be left-aligned.
Observed Prices: xxxx
Estimated Prices: xxxx
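A sketch of the prediction step under the same assumptions as before: the data are synthetic, and `n_show` and `n_decimals` are hypothetical placeholders, because the number of values to print and the rounding precision are missing from the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with random noise (NOT the real nyc.txt values).
rng = np.random.default_rng(0)
n_rows = 60
X = np.column_stack([
    rng.integers(15, 28, n_rows),   # Food
    rng.integers(10, 28, n_rows),   # Decor
    rng.integers(14, 26, n_rows),   # Service
    rng.integers(0, 60, n_rows),    # Wait
    rng.integers(0, 2, n_rows),     # East
]).astype(float)
y = 5.0 + X @ np.array([1.5, 2.0, 0.5, 0.1, 3.0]) + rng.normal(0, 5, n_rows)

# Placeholder split settings (assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
nyc_mod = LinearRegression().fit(X_train, y_train)

# predict returns one estimated price per row of X_test.
test_pred = nyc_mod.predict(X_test)

n_show = 5       # placeholder: how many values to display
n_decimals = 2   # placeholder: rounding precision

# The extra space after the first label left-aligns the two arrays.
print("Observed Prices: ", np.round(y_test[:n_show], n_decimals))
print("Estimated Prices:", np.round(test_pred[:n_show], n_decimals))
```

Keeping `n_show` small helps each array print on a single line; NumPy wraps long arrays once they exceed its default line width.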
Suppose that you wish to use the model to estimate the price for three new restaurants that were not included in the
original dataset. Assume that the feature values for these restaurants are as follows:
Food Decor Service Wait East
Create a DataFrame named nyc_new that contains the feature values for these restaurants. Pass this DataFrame to
the predict method of your model, storing the results in a variable named new_pred. Print the price predictions
stored in this variable, rounded to decimal places, with a message as shown below.
Estimated Prices: xxxx
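A sketch of predicting for new restaurants. The feature values in the table above are missing from the text, so the three rows in `nyc_new` below are hypothetical, as are the noise-free training data and the rounding precision. The columns of the new DataFrame must appear in the same order as the features used for training.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free training data (NOT the real nyc.txt values).
rng = np.random.default_rng(0)
n_rows = 60
X_train = np.column_stack([
    rng.integers(15, 28, n_rows),   # Food
    rng.integers(10, 28, n_rows),   # Decor
    rng.integers(14, 26, n_rows),   # Service
    rng.integers(0, 60, n_rows),    # Wait
    rng.integers(0, 2, n_rows),     # East
]).astype(float)

# Hypothetical "true" parameters used only to generate the labels.
true_intercept = 5.0
true_coefs = np.array([1.5, 2.0, 0.5, 0.1, 3.0])
y_train = true_intercept + X_train @ true_coefs

nyc_mod = LinearRegression().fit(X_train, y_train)

# Hypothetical feature values for three new restaurants.
nyc_new = pd.DataFrame({
    "Food": [22, 18, 25],
    "Decor": [12, 19, 22],
    "Service": [20, 22, 24],
    "Wait": [15, 34, 36],
    "East": [0, 1, 0],
})

# The model was fitted on a plain array, so recent scikit-learn versions
# may warn that the DataFrame carries feature names; predictions still work.
new_pred = nyc_mod.predict(nyc_new)

# Rounding to 2 decimal places is a placeholder precision.
print("Estimated Prices:", np.round(new_pred, 2))
```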
