Question
# run this code to load and process the data
import pickle, sklearn
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, plot_tree
from sklearn.model_selection import train_test_split, KFold, GridSearchCV, LeaveOneOut
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import r2_score, accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
Sales = pd.read_csv("NYCSales.csv")
# the value that replace() maps to NaN is missing from this copy, so it is left as ...
Sales = Sales.replace(..., np.nan).dropna().astype({"LANDSQUAREFEET": int,
                                                    "GROSSSQUAREFEET": int,
                                                    "YEARBUILT": int})
Sales["SALEPRICELOG"] = np.log(Sales["SALEPRICE"])
Sales["GROSSSQUAREFEETLOG"] = np.log(Sales["GROSSSQUAREFEET"])
Sales["LANDSQUAREFEETLOG"] = np.log(Sales["LANDSQUAREFEET"])
Sales = Sales[["BOROUGH", 'RESIDENTIALUNITS', 'COMMERCIALUNITS',
               'YEARBUILT', 'SALEPRICELOG', 'GROSSSQUAREFEETLOG', 'LANDSQUAREFEETLOG']]
Setting
This dataset contains a sample of the buildings and building units (apartments, etc.) sold in the New York City property market over a period of months, with a sale price above a minimum threshold in USD.
This dataset contains the following important properties of the building units sold.
BOROUGH: A digit code for the borough the property is located in. In order, these are:
1: Manhattan
2: Bronx
3: Brooklyn
4: Queens
5: Staten Island
RESIDENTIALUNITS: Number of residential units in the property
COMMERCIALUNITS: Number of commercial units in the property
YEARBUILT: Year when the property was built
SALEPRICELOG: log of the sales price
GROSSSQUAREFEETLOG: log of the gross square feet
LANDSQUAREFEETLOG: log of the land square feet
Question 1
Assume we want to use a decision tree model to predict SALEPRICELOG. For the predictors, we want to use the following
'RESIDENTIALUNITS',
'COMMERCIALUNITS',
'YEARBUILT',
'GROSSSQUAREFEETLOG',
'LANDSQUAREFEETLOG'
For this question:
Split the dataset into training and test sets with a fixed random_state.
Use the training set to perform k-fold cross-validation to tune one parameter for this decision tree model. Feel free to choose which parameter to tune and the range of values for the grid search.
For the model you found in the previous step, compute r2_score on the testing set using the function in the sklearn documentation.
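The steps for this question can be sketched as follows. This is only a minimal illustration under stated assumptions: the test fraction (0.2), random_state (0), number of folds (5), and the max_depth grid are not given in this copy of the assignment, and a small synthetic frame stands in for the Sales data so the block runs on its own. Column names are used as they appear in this text.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the processed Sales frame (assumption, not the real data).
rng = np.random.default_rng(0)
n = 400
gross = rng.normal(7.5, 0.5, n)
Sales = pd.DataFrame({
    "RESIDENTIALUNITS": rng.integers(0, 5, n),
    "COMMERCIALUNITS": rng.integers(0, 3, n),
    "YEARBUILT": rng.integers(1900, 2015, n),
    "GROSSSQUAREFEETLOG": gross,
    "LANDSQUAREFEETLOG": gross + rng.normal(0, 0.3, n),
    "SALEPRICELOG": 6 + 0.9 * gross + rng.normal(0, 0.4, n),
})

predictors = ["RESIDENTIALUNITS", "COMMERCIALUNITS", "YEARBUILT",
              "GROSSSQUAREFEETLOG", "LANDSQUAREFEETLOG"]
X, y = Sales[predictors], Sales["SALEPRICELOG"]

# Assumed split: 80/20 with random_state=0.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Tune max_depth (one possible choice of parameter) with 5-fold CV on the training set.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best model on the full training set by default,
# so predict() here uses the tuned tree.
tree_r2 = r2_score(y_test, search.predict(X_test))
print("best max_depth:", search.best_params_["max_depth"])
print("test R^2:", tree_r2)
```

Other reasonable parameters to tune include min_samples_leaf or ccp_alpha; max_depth is used here only because it is the simplest to reason about.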
Question 2
Assume we want to use a linear regression model to predict SALEPRICELOG. For the predictors, we want to use the following
'RESIDENTIALUNITS',
'COMMERCIALUNITS',
'YEARBUILT',
'GROSSSQUAREFEETLOG',
'LANDSQUAREFEETLOG'
For this question:
Split the dataset into training and test sets with a fixed random_state.
Train this model on the training set. On the testing set, compute r2_score using the function in the sklearn documentation.
Pay attention to the interpretation of r2_score given there. Discuss whether your model in Q1 or Q2 is better.
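A minimal sketch of the linear regression step, under the same kind of assumptions: test_size=0.2 and random_state=0 are placeholders for the values missing from this copy, and the data is a synthetic stand-in. Note that r2_score takes the true values first and the predictions second. A score of 1.0 means perfect prediction, 0.0 matches a model that always predicts the mean of the true values, and the score can be negative for a model that does worse than that.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed Sales frame (assumption, not the real data).
rng = np.random.default_rng(0)
n = 400
gross = rng.normal(7.5, 0.5, n)
Sales = pd.DataFrame({
    "RESIDENTIALUNITS": rng.integers(0, 5, n),
    "COMMERCIALUNITS": rng.integers(0, 3, n),
    "YEARBUILT": rng.integers(1900, 2015, n),
    "GROSSSQUAREFEETLOG": gross,
    "LANDSQUAREFEETLOG": gross + rng.normal(0, 0.3, n),
    "SALEPRICELOG": 6 + 0.9 * gross + rng.normal(0, 0.4, n),
})

predictors = ["RESIDENTIALUNITS", "COMMERCIALUNITS", "YEARBUILT",
              "GROSSSQUAREFEETLOG", "LANDSQUAREFEETLOG"]
X, y = Sales[predictors], Sales["SALEPRICELOG"]

# Assumed split: 80/20 with random_state=0.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Argument order matters: r2_score(y_true, y_pred).
lin_r2 = r2_score(y_test, model.predict(X_test))
print("test R^2 (linear regression):", lin_r2)
```

When both models are scored on the same test split, the one with the higher test R^2 explains more of the variance in SALEPRICELOG on unseen data, which is the natural basis for the comparison.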
Question 3
Assume we want to use a random forest model to predict BOROUGH. For the predictors, we want to consider the following
'RESIDENTIALUNITS',
'COMMERCIALUNITS',
'YEARBUILT',
'GROSSSQUAREFEETLOG',
'LANDSQUAREFEETLOG',
'SALEPRICELOG'
For this question:
Split the dataset into training and test sets with a fixed random_state.
Use the training set to perform k-fold cross-validation to tune a parameter for the random forest. Feel free to choose which parameter to tune and the range of values for the grid search.
For the model you found in the previous step, compute the following based on the testing set
The average accuracy
The name and the precision for the borough with the highest precision
The name and the recall for the borough with the highest recall
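One way to sketch these steps is shown below. The split settings, the 5 folds, the choice of max_depth, and n_estimators=50 are all assumptions made for illustration, and the frame is synthetic. The borough names are mapped to codes 1 to 5 in the order listed in the setting.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in for the processed Sales frame (assumption, not the real data).
rng = np.random.default_rng(0)
n = 500
borough = rng.integers(1, 6, n)  # codes 1..5
Sales = pd.DataFrame({
    "BOROUGH": borough,
    "RESIDENTIALUNITS": rng.integers(0, 5, n),
    "COMMERCIALUNITS": rng.integers(0, 3, n),
    "YEARBUILT": rng.integers(1900, 2015, n),
    "GROSSSQUAREFEETLOG": rng.normal(7.5, 0.5, n),
    "LANDSQUAREFEETLOG": rng.normal(7.8, 0.5, n) + 0.3 * borough,
    "SALEPRICELOG": rng.normal(14.5, 0.5, n) - 0.4 * borough,
})

codes = [1, 2, 3, 4, 5]
names = ["Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"]
predictors = ["RESIDENTIALUNITS", "COMMERCIALUNITS", "YEARBUILT",
              "GROSSSQUAREFEETLOG", "LANDSQUAREFEETLOG", "SALEPRICELOG"]
X, y = Sales[predictors], Sales["BOROUGH"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # assumed split settings

# Tune max_depth (one possible choice) with 5-fold CV on the training set.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 6, 9]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)
y_pred = search.predict(X_test)

acc = accuracy_score(y_test, y_pred)
# output_dict=True returns per-class metrics keyed by the target names.
report = classification_report(y_test, y_pred, labels=codes,
                               target_names=names, output_dict=True,
                               zero_division=0)
best_prec = max(names, key=lambda b: report[b]["precision"])
best_rec = max(names, key=lambda b: report[b]["recall"])
print("accuracy:", acc)
print("highest precision:", best_prec, report[best_prec]["precision"])
print("highest recall:", best_rec, report[best_rec]["recall"])
```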
Question 4
Assume we want to use a multinomial logit model with L penalty to predict BOROUGH. For the predictors, we want to use the following
'RESIDENTIALUNITS',
'COMMERCIALUNITS',
'YEARBUILT',
'GROSSSQUAREFEETLOG',
'LANDSQUAREFEETLOG',
'SALEPRICELOG'
For this question:
Split the dataset into training and test sets with a fixed random_state.
Use the training set to perform k-fold cross-validation to tune the regularization strength within the given range. Set max_iter for the regression.
For the model you found in the previous step, compute the following based on the testing set
The average accuracy
The name and the precision for the borough with the highest precision
The name and the recall for the borough with the highest recall
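A rough sketch of this question follows. Several specifics are missing from this copy, so the block makes explicit assumptions: the penalty is taken to be L2 (the scikit-learn default for LogisticRegression), the tuned parameter is taken to be C, the grid of C values and max_iter=1000 are placeholders, and the split and fold settings are illustrative. The data is again a synthetic stand-in.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in for the processed Sales frame (assumption, not the real data).
rng = np.random.default_rng(0)
n = 500
borough = rng.integers(1, 6, n)  # codes 1..5
Sales = pd.DataFrame({
    "BOROUGH": borough,
    "RESIDENTIALUNITS": rng.integers(0, 5, n),
    "COMMERCIALUNITS": rng.integers(0, 3, n),
    "YEARBUILT": rng.integers(1900, 2015, n),
    "GROSSSQUAREFEETLOG": rng.normal(7.5, 0.5, n),
    "LANDSQUAREFEETLOG": rng.normal(7.8, 0.5, n) + 0.3 * borough,
    "SALEPRICELOG": rng.normal(14.5, 0.5, n) - 0.4 * borough,
})

codes = [1, 2, 3, 4, 5]
names = ["Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"]
predictors = ["RESIDENTIALUNITS", "COMMERCIALUNITS", "YEARBUILT",
              "GROSSSQUAREFEETLOG", "LANDSQUAREFEETLOG", "SALEPRICELOG"]
X, y = Sales[predictors], Sales["BOROUGH"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # assumed split settings

# Default penalty is L2 (assumed here); smaller C means stronger regularization.
# The C grid and max_iter are placeholders for the values lost from the text.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)
y_pred = search.predict(X_test)

acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, labels=codes,
                               target_names=names, output_dict=True,
                               zero_division=0)
best_prec = max(names, key=lambda b: report[b]["precision"])
best_rec = max(names, key=lambda b: report[b]["recall"])
print("best C:", search.best_params_["C"], "accuracy:", acc)
print("highest precision:", best_prec, report[best_prec]["precision"])
print("highest recall:", best_rec, report[best_rec]["recall"])
```

Because the predictors are on very different scales (YEARBUILT is in the thousands while the log features are near 10), the solver may emit convergence warnings on unscaled inputs.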
Question 5
Perform Q4 again. This time, use StandardScaler and Pipeline to incorporate feature standardization.
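The standardized variant can be sketched by wrapping the scaler and the classifier in a Pipeline, so that the scaler is fit only on the training folds inside each cross-validation split. The step names "scaler" and "logit", the C grid, max_iter=1000, the split settings, and the synthetic data are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processed Sales frame (assumption, not the real data).
rng = np.random.default_rng(0)
n = 500
borough = rng.integers(1, 6, n)  # codes 1..5
Sales = pd.DataFrame({
    "BOROUGH": borough,
    "RESIDENTIALUNITS": rng.integers(0, 5, n),
    "COMMERCIALUNITS": rng.integers(0, 3, n),
    "YEARBUILT": rng.integers(1900, 2015, n),
    "GROSSSQUAREFEETLOG": rng.normal(7.5, 0.5, n),
    "LANDSQUAREFEETLOG": rng.normal(7.8, 0.5, n) + 0.3 * borough,
    "SALEPRICELOG": rng.normal(14.5, 0.5, n) - 0.4 * borough,
})

predictors = ["RESIDENTIALUNITS", "COMMERCIALUNITS", "YEARBUILT",
              "GROSSSQUAREFEETLOG", "LANDSQUAREFEETLOG", "SALEPRICELOG"]
X, y = Sales[predictors], Sales["BOROUGH"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # assumed split settings

# Step names are arbitrary labels; they prefix the parameter names in the grid.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logit", LogisticRegression(max_iter=1000)),  # default L2 penalty (assumed)
])

search = GridSearchCV(
    pipe,
    param_grid={"logit__C": [0.01, 0.1, 1, 10, 100]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

# The fitted pipeline applies the training-set scaling to X_test automatically.
acc = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["logit__C"], "accuracy:", acc)
```

Putting the scaler inside the pipeline, rather than scaling the whole dataset up front, keeps test-set and validation-fold statistics out of the fitted transformation.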