Question
Using Python, plot the predictions.
Dataset: https://www.kaggle.com/camnugent/california-housing-prices
I keep running into errors. Code so far:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error as mae
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

real_estate_data = pd.read_csv("C:/housing.csv")

## Replacing missing total bedrooms with an average
total_bedrooms_mean = real_estate_data['total_bedrooms'].mean()
total_bedrooms_mean = math.ceil(total_bedrooms_mean)
real_estate_data['total_bedrooms'] = real_estate_data['total_bedrooms'].fillna(total_bedrooms_mean)

## Getting average bedrooms and rooms from household
real_estate_data['avg_bedrooms'] = real_estate_data['total_bedrooms']/real_estate_data['households']
real_estate_data['avg_rooms'] = real_estate_data['total_rooms']/real_estate_data['households']

## Create numerical categories for ocean proximity values
ocean_dict = {"ocean_proximity": {"NEAR BAY": 1, "<1H OCEAN": 2, "INLAND": 3, "NEAR OCEAN": 4, "ISLAND": 5}}
real_estate_data.replace(ocean_dict, inplace=True)

## remove outliers
real_estate_data[(np.abs(stats.zscore(real_estate_data)) < 10).all(axis=1)]

## Getting X and y
X = pd.DataFrame(real_estate_data[['housing_median_age', 'population', 'median_income', 'ocean_proximity', 'avg_rooms']])
y = pd.DataFrame(real_estate_data[['median_house_value']])

## Adding constant to X
X = sm.add_constant(X)

## Partitioning the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

## Creating model using scikits LinearRegression()
lm = linear_model.LinearRegression()
lm.fit(X_train, y_train)
predictions = lm.predict(X_test)
print("Predictions: ", predictions[0:5])

# The coefficients
print("Coefficients: ", lm.coef_)
# The mean squared error
print("Mean squared error: %.2f" % mean_squared_error(y_test, predictions))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination: %.2f" % r2_score(y_test, predictions))

# Plot outputs
plt.scatter(X_train.iloc[:,0], y_train, color="black")
plt.plot(X_train.iloc[:,0], predictions, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
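A few likely sources of the errors can be read off the code itself. mean_squared_error and r2_score are called but never imported. The final plt.plot pairs X_train values with predictions made from X_test, so the two arrays have different lengths. Column 0 of X after sm.add_constant is the constant, not a feature. The outlier filter result is also never assigned back, so it has no effect. The following is a minimal sketch of one way to fit the model and plot predicted against actual values, assuming the data frame already has the columns built in the question. The helper names fit_and_predict and plot_predictions, the seed parameter, and the choice of a predicted-vs-actual scatter are illustrative assumptions, not the original author's method.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score  # missing in the question
from sklearn.model_selection import train_test_split

# Feature columns taken from the question's code.
FEATURES = ["housing_median_age", "population", "median_income",
            "ocean_proximity", "avg_rooms"]
TARGET = "median_house_value"


def fit_and_predict(data: pd.DataFrame, seed: int = 0):
    """Hypothetical helper: split, fit a linear model, predict on the test set."""
    X = data[FEATURES]
    y = data[TARGET]  # a 1-D Series, so predictions come back 1-D as well
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed
    )
    # LinearRegression fits its own intercept, so no constant column is added.
    lm = LinearRegression()
    lm.fit(X_train, y_train)
    predictions = lm.predict(X_test)
    return lm, X_test, y_test, predictions


def plot_predictions(y_test, predictions):
    """Hypothetical helper: scatter predicted vs. actual values of the test set."""
    fig, ax = plt.subplots()
    # Both arrays come from the test set, so their lengths match.
    ax.scatter(y_test, predictions, s=5, color="black")
    lo = min(np.min(y_test), np.min(predictions))
    hi = max(np.max(y_test), np.max(predictions))
    # Reference line: points on it are predicted exactly.
    ax.plot([lo, hi], [lo, hi], color="blue", linewidth=2)
    ax.set_xlabel("Actual median_house_value")
    ax.set_ylabel("Predicted median_house_value")
    return fig
```

With real_estate_data prepared as in the question, a call such as lm, X_test, y_test, predictions = fit_and_predict(real_estate_data) followed by plot_predictions(y_test, predictions) and plt.show() would display the figure, and mean_squared_error(y_test, predictions) and r2_score(y_test, predictions) give the two metrics. To make the outlier step take effect, the filtered frame would need to be assigned back to real_estate_data before building X and y.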