Question

Using Python

Create a model to predict future home values

Data location: https://www.kaggle.com/camnugent/california-housing-prices

I have a model, but I am not sure how to use it to predict future values.

Code so far:

import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error as mae
from scipy import stats

real_estate_data = pd.read_csv("$housing.csv")

## Replacing missing total bedrooms with an average
total_bedrooms_mean = real_estate_data['total_bedrooms'].mean()
total_bedrooms_mean = math.ceil(total_bedrooms_mean)
real_estate_data['total_bedrooms'] = real_estate_data['total_bedrooms'].fillna(total_bedrooms_mean)

## Getting average bedrooms and rooms from household
real_estate_data['avg_bedrooms'] = real_estate_data['total_bedrooms'] / real_estate_data['households']
real_estate_data['avg_rooms'] = real_estate_data['total_rooms'] / real_estate_data['households']

## Removing values with prices 500001 as it is an arbitrary value when looking at dataset
## (.copy() so the column assignment below works on an independent frame)
real_estate_data = real_estate_data[real_estate_data.median_house_value < 500001].copy()

## Get unique values for ocean proximity
ocean_proximity = real_estate_data["ocean_proximity"].unique()

## Create numerical categories for ocean proximity values
ocean_dict = {"NEAR BAY": 1, "<1H OCEAN": 2, "INLAND": 3, "NEAR OCEAN": 4, "ISLAND": 5}
real_estate_data["ocean_proximity"] = real_estate_data["ocean_proximity"].map(ocean_dict)

## Remove outliers (the filtered frame has to be assigned back, otherwise nothing is removed)
real_estate_data = real_estate_data[(np.abs(stats.zscore(real_estate_data)) < 10).all(axis=1)]

## Understanding the data
## Scatter plot
real_estate_data.plot.scatter(x='avg_bedrooms', y='median_house_value')
plt.show()

## Ocean proximity to value
real_estate_data_prox_1 = real_estate_data[real_estate_data.ocean_proximity == 1]['median_house_value']
real_estate_data_prox_2 = real_estate_data[real_estate_data.ocean_proximity == 2]['median_house_value']
real_estate_data_prox_3 = real_estate_data[real_estate_data.ocean_proximity == 3]['median_house_value']
real_estate_data_prox_4 = real_estate_data[real_estate_data.ocean_proximity == 4]['median_house_value']
real_estate_data_prox_5 = real_estate_data[real_estate_data.ocean_proximity == 5]['median_house_value']

# plt.hist([real_estate_data_prox_1,
#           real_estate_data_prox_2,
#           real_estate_data_prox_3,
#           real_estate_data_prox_4,
#           real_estate_data_prox_5],
#          bins = 10,
#          stacked = True)
# plt.legend(['Near Bay = 1',
#             '<1H Ocean = 2',
#             'Inland = 3',
#             'Near Ocean = 4',
#             'Island = 5'])
# plt.title('Histogram of Value with Ocean Proximity Overlay')
# plt.xlabel('Value');
# plt.ylabel('Frequency');
# plt.show();

## Normalized ocean proximity to value
(n, bins, patches) = plt.hist([real_estate_data_prox_1,
                               real_estate_data_prox_2,
                               real_estate_data_prox_3,
                               real_estate_data_prox_4,
                               real_estate_data_prox_5],
                              bins=10, stacked=True, density=True)

## Creating table from variable n created in prior command.
## With stacked=True the returned heights are cumulative across categories,
## so take differences to recover each category's own height per bin.
n_table = np.diff(np.column_stack((n[0], n[1], n[2], n[3], n[4])), axis=1, prepend=0)

## Normalizing the previously created table
n_norm = n_table / n_table.sum(axis=1)[:, None]

## Creating custom bins from bins variable created in previous command
ourbins = np.column_stack((bins[0:10], bins[1:11]))
bin_widths = ourbins[:, 1] - ourbins[:, 0]

## Each bar starts at the cumulative height of the categories below it
bottoms = np.cumsum(n_norm, axis=1) - n_norm

p1 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 0], width=bin_widths, align='edge')
p2 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 1], width=bin_widths, align='edge', bottom=bottoms[:, 1])
p3 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 2], width=bin_widths, align='edge', bottom=bottoms[:, 2])
p4 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 3], width=bin_widths, align='edge', bottom=bottoms[:, 3])
p5 = plt.bar(x=ourbins[:, 0], height=n_norm[:, 4], width=bin_widths, align='edge', bottom=bottoms[:, 4])

# plt.legend(['Near Bay = 1',
#             '<1H Ocean = 2',
#             'Inland = 3',
#             'Near Ocean = 4',
#             'Island = 5'])
# plt.title('Normalized Histogram of Value with Ocean Proximity Overlay')
# plt.xlabel('Value');
# plt.ylabel('Frequency');
# plt.show();

print(real_estate_data['median_house_value'].describe().apply(lambda x: format(x, 'f')))

## Getting X and y
X = real_estate_data[['longitude', 'latitude', 'housing_median_age', 'population',
                      'households', 'median_income', 'ocean_proximity',
                      'avg_bedrooms', 'avg_rooms']]
y = real_estate_data[['median_house_value']]
X = sm.add_constant(X)

## Partitioning the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)
predictions = lm.predict(X_train)
print(predictions[0:5])

model_train = sm.OLS(y_train, X_train).fit()
model_test = sm.OLS(y_test, X_test).fit()

## Coefficients and intercept
print(model.coef_)
print(model.intercept_)
print(model_train.summary())
print(model_test.summary())
print(model_train.bse)

## MAEBaseline and MAERegression
ypred = model_train.predict(X_test)
print('==============Mean==============')
print(y_train.mean())
print('==============MAE (train)==============')
print(mae(y_train.values, predictions))
## Held-out error: score the model fitted on the training rows against the test rows
print('==============MAE (test)==============')
print(mae(y_test.values, ypred))
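
On the open question of how to predict with the fitted model: one possible approach is to build a DataFrame of new rows with the same feature columns, in the same order and with the same preprocessing as the training data, and pass it to `predict`. The sketch below is a minimal illustration under stated assumptions, not the approved answer: the training data is synthetic (made-up numbers standing in for `X_train` and `y_train` from the code above), only three of the features are used, and `feature_cols` and `new_districts` are hypothetical names. Note also that this dataset is a single snapshot from the 1990 California census, so a model trained on it can estimate values for unseen districts, but it has no time dimension from which to forecast prices in later years.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the cleaned training data (values are made up).
# In the real script, use X_train / y_train from train_test_split instead.
rng = np.random.default_rng(0)
feature_cols = ["median_income", "housing_median_age", "avg_rooms"]
X_train = pd.DataFrame({
    "median_income": rng.uniform(1, 10, 200),
    "housing_median_age": rng.uniform(1, 52, 200),
    "avg_rooms": rng.uniform(3, 8, 200),
})
y_train = (40_000 * X_train["median_income"]
           + 500 * X_train["housing_median_age"]
           + rng.normal(0, 10_000, 200))

model = LinearRegression().fit(X_train[feature_cols], y_train)

# New, unseen districts: same columns, same order, same preprocessing
# as the training features (hypothetical example rows).
new_districts = pd.DataFrame({
    "median_income": [3.5, 8.0],
    "housing_median_age": [20, 35],
    "avg_rooms": [5.0, 6.5],
})

predicted_values = model.predict(new_districts[feature_cols])
print(predicted_values)
```

With the statsmodels model from the question, the equivalent call would be `model_train.predict(...)` on new rows that also carry the constant column added by `sm.add_constant`.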

