# Set-up

Before doing anything, run this chunk:

```{r, message=FALSE}
library(tidyverse)
```

# Estimating Boston housing prices

## Data

**CODE** *Start by loading the data from "ast2018sample.csv" with `read.csv()`. Assign the output to an object called "df".*

```{r}
# your code here
```

**CODE** *Use `dim()` to check how many rows and columns are in the data:*

```{r}
# your code here
```

You will build a model of housing prices using linear regression, so first you need to check whether the housing price variable (`AV_TOTAL`) is approximately normally distributed.

**CODE** *Use `ggplot` with `geom_density()` to plot the distribution of house prices:*

```{r}
# your code here
```

**WRITING** *What do you notice about your plot?*

**CODE** *Next, create a variable called "log_price" that calculates the log of house prices. Make sure you add this variable to `df`.*

```{r}
# your code here
```

**CODE** *OK, now use `ggplot` with `geom_density()` to plot the distribution of **log** house prices:*

```{r}
# your code here
```

**WRITING** *What do you notice about your plot of the distribution of **log** prices?*

## Summarization

**CODE** *Use `summarize()` to calculate average log prices:*

```{r}
# your code here
```

**CODE** *Use `group_by()`, `summarize()` and `n()` to calculate the number of observations per ZIP code (`ZIPCODE`):*

```{r}
# your code here
```

**CODE** *Use `group_by()` and `summarize()` to calculate the average and standard deviation of log prices by ZIP code (`ZIPCODE`):*

```{r}
# your code here
```

Let's focus the rest of our analysis on the ZIP code [2132](https://www.google.com/maps/place/Boston,+MA+02132/@42.2922626,-71.2432236,11.76z/data=!4m5!3m4!1s0x89e37f3c80e4b4ad:0xeef4364fd79e1466!8m2!3d42.2797554!4d-71.1626756).
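As a rough illustration of the log transform and grouped summaries in this section, here is a minimal sketch on a tiny made-up data frame. The object names `toy` and `by_zip`, the summary column names, and all values are hypothetical; only the column names `ZIPCODE`, `AV_TOTAL` and `log_price` come from the assignment. It is not the solution for the real data.

```r
library(dplyr)

# Hypothetical toy data: the prices and ZIP codes are invented for illustration.
toy <- data.frame(
  ZIPCODE  = c(2132, 2132, 2132, 2131, 2131),
  AV_TOTAL = c(400000, 500000, 650000, 300000, 450000)
)

# Add the natural log of price as a new column.
toy <- toy %>% mutate(log_price = log(AV_TOTAL))

# Count rows and summarize log prices within each ZIP code.
by_zip <- toy %>%
  group_by(ZIPCODE) %>%
  summarize(
    n_obs          = n(),
    mean_log_price = mean(log_price),
    sd_log_price   = sd(log_price)
  )

print(by_zip)
```

`group_by()` only marks the groups. The following `summarize()` collapses each group to a single row, which is why `by_zip` has one row per ZIP code.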
**CODE** *Filter out all observations that are not from the ZIP code 2132 and assign the output to a new object called "df_2132":*

```{r}
# your code here
```

**CODE** *Calculate the average, median and standard deviation of log price for `df_2132`:*

```{r}
# your code here
```

## Visualization

(Make sure you use the data `df_2132`.)

**CODE** *Make a scatterplot of log prices on the y-axis and the number of bedrooms (`R_BDRMS`) on the x-axis:*

```{r}
# your code here
```

**WRITING** *What do you notice?*

**CODE** *Make a scatterplot of log prices on the y-axis and the total square feet of living area (`LIVING_AREA`) on the x-axis:*

```{r}
# your code here
```

**WRITING** *What do you notice?*

**CODE** *Use a scatterplot to check if there is a relationship between the year a house was built (`YR_BUILT`) and its log price:*

```{r}
# your code here
```

**WRITING** *What do you notice?*

**CODE** *Use `stat_summary` to plot the average log price against `OWN_OCC` (a dummy variable indicating whether a property is owner-occupied):*

```{r}
# your code here
```

**WRITING** *Does it look like log prices vary depending on whether a property is owner-occupied?*

## Inference

**CODE** *Estimate a linear model of log house prices as a function of:*

* the total living area (`LIVING_AREA`)
* the number of bedrooms (`R_BDRMS`)
* the number of full bathrooms (`R_FULL_BTH`)
* the number of half bathrooms (`R_HALF_BTH`)
* whether the property is owner-occupied (`OWN_OCC`)
* the number of fireplaces (`R_FPLACE`)

*and assign your output to the object `m1`:*

```{r}
# your code here
```

**WRITING** *Interpret the coefficient on `LIVING_AREA` (recall it is measured in square feet). Be sure to note whether the coefficient is significant (and if so, why):*

**WRITING** *Interpret the coefficient on `OWN_OCC`.
Be sure to note whether the coefficient is significant (and if so, why):*

# Estimating the value of air quality

Load this [data on housing prices](https://deepblue.lib.umich.edu/handle/2027.42/22636) by running the chunk:

```{r}
# Load the hprice2 data set from the wooldridge package and preview it
hprice2 <- wooldridge::hprice2
head(hprice2)
```

The data include the following variables:

* `price`: median housing price.
* `nox`: Nitrous Oxide (NOX) concentration; parts per million.
* `crime`: number of reported crimes per capita.
* `rooms`: average number of rooms in houses in the community.
* `dist`: weighted distance of the community to 5 employment centers.
* `stratio`: average student-teacher ratio of schools in the community.
* `proptax`: property tax per $1,000.

**CODE** *Make a scatterplot of prices as a function of NOX.*

```{r}
# your code here
```

**WRITING** *What do you notice?*

**CODE** *Make a scatterplot of prices as a function of crime.*

```{r}
# your code here
```

**WRITING** *What do you notice?*

**CODE** *Run a regression that estimates price as a linear function of*

* `nox`: [Nitrous Oxide (NOX)](https://en.wikipedia.org/wiki/Nitrous_oxide) concentration; parts per million.
* `crime`: number of reported crimes per capita.
* `rooms`: average number of rooms in houses in the community.
* `dist`: weighted distance of the community to 5 employment centers.
* `stratio`: average student-teacher ratio of schools in the community.
* `proptax`: property tax per $1,000.

*and save the output to the object `m2`:*

```{r}
# your code here
```

**CODE** *Verify the t-value for `proptax` is equal to the estimated coefficient divided by the standard error:*

```{r}
# your code here
```

**CODE** *Verify the p-value for `proptax` using `pt()` (hint: there are 499 degrees of freedom, because we have $n=506$ observations and $6+1=7$ covariates including the intercept):*

```{r}
# your code here
```

**WRITING** *What does it mean for a regression coefficient to be statistically significant?*

**WRITING** *What is the hypothesis test on `proptax`?
(Be sure to state both the null and alternative hypotheses)*

**WRITING** *What is the hypothesis test on `nox`? (Be sure to state both the null and alternative hypotheses)*

**WRITING** *Interpret the coefficient on `nox` (recall it is measured in parts per million). Be sure to note whether the coefficient is significant (and if so, why):*

**WRITING** *Which has a larger average effect on prices in this sample: pollution or crime?*
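The two verification steps above (the t-value as estimate over standard error, and the two-sided p-value from `pt()`) can be sketched on simulated data. Everything here is an assumption made for illustration: the data frame `sim`, the predictors `x1` to `x6`, and the simulated coefficients are invented, and the data are sized to 506 rows and six predictors only so that the residual degrees of freedom come out to 499, as in the hint. The same pattern should carry over to `m2` and `proptax`.

```r
# Simulated stand-in data (not hprice2): 506 rows, six predictors.
set.seed(1)
n <- 506
sim <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(sim) <- paste0("x", 1:6)
sim$y <- 1 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = sim)
coefs <- summary(fit)$coefficients  # matrix: Estimate, Std. Error, t value, Pr(>|t|)

# t-value by hand: estimate divided by its standard error.
est <- coefs["x1", "Estimate"]
se  <- coefs["x1", "Std. Error"]
t_manual <- est / se

# Two-sided p-value by hand: twice the upper-tail area beyond |t|
# under a t distribution with n - 7 residual degrees of freedom.
df_resid <- df.residual(fit)
p_manual <- 2 * pt(abs(t_manual), df = df_resid, lower.tail = FALSE)

# Compare the hand-computed values with what summary() reports.
print(c(t_manual = t_manual, t_reported = coefs["x1", "t value"]))
print(c(p_manual = p_manual, p_reported = coefs["x1", "Pr(>|t|)"]))
```

Using `abs()` together with `lower.tail = FALSE` keeps the calculation correct whether the coefficient is positive or negative.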

[Data preview screenshot: the first 23 rows of the property data, showing the columns `X1`, `PID`, `CM_ID`, `GIS_ID`, `ST_NUM`, `ST_NAME`, `ST_NAME_SUF` and `UNIT_NUM`.]