Question

Using Python

Plot predictions

Dataset: https://www.kaggle.com/camnugent/california-housing-prices

I keep running into errors. Code so far:

    import math
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from sklearn.model_selection import train_test_split
    from sklearn import linear_model
    from sklearn.metrics import mean_absolute_error as mae
    from scipy import stats
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    real_estate_data = pd.read_csv("C:/housing.csv")

    ## Replacing missing total bedrooms with an average
    total_bedrooms_mean = real_estate_data['total_bedrooms'].mean()
    total_bedrooms_mean = math.ceil(total_bedrooms_mean)
    real_estate_data['total_bedrooms'] = real_estate_data['total_bedrooms'].fillna(total_bedrooms_mean)

    ## Getting average bedrooms and rooms from household
    real_estate_data['avg_bedrooms'] = real_estate_data['total_bedrooms']/real_estate_data['households']
    real_estate_data['avg_rooms'] = real_estate_data['total_rooms']/real_estate_data['households']

    ## Create numerical categories for ocean proximity values
    ocean_dict = {"ocean_proximity": {"NEAR BAY": 1, "<1H OCEAN": 2, "INLAND": 3, "NEAR OCEAN": 4, "ISLAND": 5}}
    real_estate_data.replace(ocean_dict, inplace=True)

    ## remove outliers
    real_estate_data[(np.abs(stats.zscore(real_estate_data)) < 10).all(axis=1)]

    ## Getting X and y
    X = pd.DataFrame(real_estate_data[['housing_median_age', 'population', 'median_income', 'ocean_proximity', 'avg_rooms']])
    y = pd.DataFrame(real_estate_data[['median_house_value']])

    ## Adding constant to X
    X = sm.add_constant(X)

    ## Partitioning the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    ## Creating model using scikits LinearRegression()
    lm = linear_model.LinearRegression()
    lm.fit(X_train, y_train)
    predictions = lm.predict(X_test)
    print("Predictions: ", predictions[0:5])

    # The coefficients
    print("Coefficients: ", lm.coef_)
    # The mean squared error
    print("Mean squared error: %.2f" % mean_squared_error(y_test, predictions))
    # The coefficient of determination: 1 is perfect prediction
    print("Coefficient of determination: %.2f" % r2_score(y_test, predictions))

    # Plot outputs
    plt.scatter(X_train.iloc[:,0], y_train, color="black")
    plt.plot(X_train.iloc[:,0], predictions, color="blue", linewidth=3)
    plt.xticks(())
    plt.yticks(())
    plt.show()
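Reading the code above, a few likely sources of the errors stand out: `mean_squared_error` and `r2_score` are called but never imported; the outlier filter is computed but its result is never assigned; and the plot pairs `X_train` (after `sm.add_constant`, column 0 is the constant) with `predictions` made from `X_test`, so the two arrays differ in length. The following is a minimal sketch of one way to address these, not the site's hidden solution. The names `prepare`, `fit_and_plot`, `FEATURES`, `OCEAN_CODES`, and `seed` are hypothetical, and the predicted-versus-actual scatter is a swapped-in plot, since a model with five features cannot be drawn as a single line against one of them.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical helper constants; column names come from the question's code.
FEATURES = ["housing_median_age", "population", "median_income",
            "ocean_proximity", "avg_rooms"]
OCEAN_CODES = {"NEAR BAY": 1, "<1H OCEAN": 2, "INLAND": 3,
               "NEAR OCEAN": 4, "ISLAND": 5}


def prepare(df):
    """Clean the housing frame as the question does, and keep the result."""
    df = df.copy()
    # Fill missing bedroom counts with the column mean, rounded up.
    bedrooms_mean = np.ceil(df["total_bedrooms"].mean())
    df["total_bedrooms"] = df["total_bedrooms"].fillna(bedrooms_mean)
    df["avg_rooms"] = df["total_rooms"] / df["households"]
    # Same integer codes as the question (an ordinal encoding, kept for parity).
    df["ocean_proximity"] = df["ocean_proximity"].map(OCEAN_CODES)
    # Outlier filter: the boolean mask must be applied and the result returned.
    numeric = df[FEATURES + ["median_house_value"]].to_numpy(dtype=float)
    keep = (np.abs(stats.zscore(numeric)) < 10).all(axis=1)
    return df[keep]


def fit_and_plot(df, seed=0):
    """Fit a linear model and draw predicted against actual test values."""
    X = df[FEATURES]
    y = df["median_house_value"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)

    # LinearRegression fits its own intercept, so no constant column is added.
    model = LinearRegression()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    print("Coefficients:", model.coef_)
    print("Mean squared error: %.2f" % mean_squared_error(y_test, predictions))
    print("Coefficient of determination: %.2f" % r2_score(y_test, predictions))

    # Both axes use test-set values, so the arrays have the same length.
    fig, ax = plt.subplots()
    ax.scatter(y_test, predictions, s=5, color="black")
    limits = [y_test.min(), y_test.max()]
    ax.plot(limits, limits, color="blue", linewidth=2)  # perfect-prediction line
    ax.set_xlabel("Actual median_house_value")
    ax.set_ylabel("Predicted median_house_value")
    return model, predictions, y_test, ax
```

With the file path from the question, this could be run as `fit_and_plot(prepare(pd.read_csv("C:/housing.csv")))` followed by `plt.show()`. Points close to the blue diagonal are well predicted; systematic spread away from it suggests the linear model is missing structure in the data.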
