Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

1.1 Preprocess the raw data When given a new dataset, we need to deal with the missing values and categorical features. In [10]; import pandas

image text in transcribed

image text in transcribed

1.1 Preprocess the raw data When given a new dataset, we need to deal with the missing values and categorical features. In [10]; import pandas as pd import numpy as np from sklearn.preprocessing import LabelEncoder from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from sklearn.linear_model import LinearRegression, Ridge, Lasso from sklearn.metrics import mean_absolute_error, mean_squared_error import matplotlib.pyplot as plt df = pd.read_csv('housing.csv') # 0. fill in missing values mean_val = df['total_bedrooms ' ) .mean() df ['total_bedrooms'] = df['total_bedrooms'].fillna(mean_val) print (df.isnull().sum()) # 1. convert categorical features to numerical values labelencoder = LabelEncoder() df['ocean_proximity'] = labelencoder.fit_transform(df['ocean_proximity']) print (df.info()) 0 longitude 0 latitude 0 housing_median_age 0 total rooms 0 total bedrooms 0 population 0 households 0 median_income 0 median_house_value ocean_proximity 0 dtype: int64 Range Index: 20640 entries, 0 to 20639 Data columns (total 10 columns): longitude 20640 non-null float64 latitude 20640 non-null float 64 housing_median_age 20640 non-null int64 total_rooms 20640 non-null int64 total_bedrooms 20640 non-null float64 population 20640 non-null int64 households 20640 non-null int64 median_income 20640 non-null float64 median_house_value 20640 non-null int64 ocean_proximity 20640 non-null int64 dtypes: float64(4), int64(6) memory usage: 1.6 MB None 1.5 Use the ridge regression model to do prediction minw || y Xw Il + || w |13 1.5.1 Compare its performance on the testing set with that of the standard linear regression model minw ll y Xw I12 1.5.2 Use different 2 to see how it affects the performance of the ridge regression model on the testing set In [18]: # your code

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Temporal Databases Research And Practice Lncs 1399

Authors: Opher Etzion ,Sushil Jajodia ,Suryanarayana Sripada

1st Edition

3540645195, 978-3540645191

More Books

Students also viewed these Databases questions