
Question


History of Data: The dataset was compiled by David Harrison of Harvard and Daniel Rubinfeld of the University of Michigan, who in the late 1970s investigated the relationship between housing values and the willingness to pay for clean air. The hypothesis of their study is that environmental pollution should have a negative impact on house prices. The Boston Housing Dataset contains 506 observations and includes 14 non-constant variables (13 candidate independent variables plus the response variable, MEDV), which are listed below.

1. CRIM per capita crime rate by town

2. ZN proportion of residential land zoned for lots over 25,000 sq. ft.

3. INDUS proportion of non-retail business acres per town

4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

5. NOX nitric oxides concentration (parts per 10 million)

6. RM average number of rooms per dwelling

7. AGE proportion of owner-occupied units built prior to 1940

8. DIS weighted distances to five Boston employment centers

9. RAD index of accessibility to radial highways

10. TAX full-value property-tax rate per $10,000

11. PTRATIO pupil-teacher ratio by town

12. B 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town

13. LSTAT % lower status of the population

14. MEDV (Y) Median value of owner-occupied homes in $1000's (response variable)

A. From the variables given: Build a model to test, i.e., choose the dependent variable and the independent variable or variables. You must justify your choice of dependent variables and independent variables (this means justifying excluding variables as well). Justify whether you would expect the independent variable(s) to have a positive or negative effect on the dependent variable. You must also justify your choice of a linear or non-linear model.

Theory Formation:

Multiple regression is used to predict the value of one variable from the values of two or more other variables. The variables used for prediction are called the independent variables. The dependent or target variable, in this case the monetary value of houses (MEDV), is the variable we seek to predict; here it depends on four independent variables.

To predict the value of houses, a multiple regression model will be constructed with the following features (independent variables): NOX (nitric oxides concentration, parts per 10 million), RM (average number of rooms per dwelling), LSTAT (% lower status of the population), and PTRATIO (pupil-teacher ratio by town).

The justification for selecting the four features (independent variables) above is as follows. The study's hypothesis is that higher nitric oxide concentrations (NOX) reduce housing prices, so NOX is expected to have a negative effect on MEDV. A higher RM should be associated with a higher MEDV, because larger houses (more rooms) typically cost more; RM is therefore expected to have a positive effect. A higher LSTAT should be associated with a lower MEDV, because a neighborhood with more “lower class” citizens will generally have lower-priced housing; LSTAT is therefore expected to have a negative effect. Finally, a higher PTRATIO means fewer teachers per pupil, which is typically related to lower student performance and is more common in areas with lower housing costs. Consequently, PTRATIO is expected to have a negative effect on MEDV.
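The four-feature model and the expected signs argued above can be written out as a linear specification. This is only a sketch: the coefficient symbols (beta) and the error term (epsilon) are notation introduced here for illustration, and the linear form is an assumption that the residual diagnostics would still need to support.

```latex
\mathrm{MEDV}_i = \beta_0 + \beta_1\,\mathrm{NOX}_i + \beta_2\,\mathrm{RM}_i
                + \beta_3\,\mathrm{LSTAT}_i + \beta_4\,\mathrm{PTRATIO}_i + \varepsilon_i,
\qquad i = 1, \dots, 506

% Expected signs from the justification above:
\beta_1 < 0, \quad \beta_2 > 0, \quad \beta_3 < 0, \quad \beta_4 < 0
```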

Conversely, the nine non-constant independent variables not selected for this study are CRIM, ZN, INDUS, CHAS, AGE, DIS, RAD, TAX, and B. While crime often appears in areas of lower housing values, it seems to be a result of lower economic status, not a predictor of lower housing values. Regarding ZN and INDUS, the proportion of residential land zoned for lots over 25,000 sq. ft. and the proportion of non-retail business acres per town do not seem to have a direct relationship to MEDV…………

Calculate Statistics:

B. Carefully and completely explain your results.

C. Test: conduct all hypothesis tests discussed in class on your model.
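As a rough illustration of parts B and C, the sketch below fits an ordinary least squares model and computes the statistics behind two standard tests: a t-statistic for each coefficient and the overall F-statistic. Which tests were "discussed in class" is not stated, so this choice is an assumption. The function name `ols_summary` and the dictionary keys are hypothetical, and only NumPy is used.

```python
import numpy as np

def ols_summary(X, y):
    """Fit y = b0 + X @ b by least squares and return basic test statistics.

    X: (n, k) array of predictors (e.g. columns NOX, RM, LSTAT, PTRATIO).
    y: (n,) array of the response (e.g. MEDV).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ beta
    dof = n - k - 1                                # residual degrees of freedom
    sse = float(resid @ resid)                     # unexplained sum of squares
    sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    sigma2 = sse / dof                             # estimated error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance
    std_err = np.sqrt(np.diag(cov))
    return {
        "coef": beta,                              # [intercept, b1, ..., bk]
        "std_err": std_err,
        "t_stats": beta / std_err,                 # H0: coefficient = 0
        "r2": 1.0 - sse / sst,
        "f_stat": ((sst - sse) / k) / sigma2,      # H0: all slopes = 0
        "dof": dof,
    }
```

To use it, load the four chosen columns into `X` and MEDV into `y`, then compare each t-statistic with the critical value of a t distribution with `dof` degrees of freedom, and the F-statistic with an F distribution with `k` and `dof` degrees of freedom. The signs of the fitted slopes can then be checked against the expectations stated in the theory section.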

BOSTON DATASET (extract). Columns, in sheet order: CRIM, ZN, INDUS, CHAS, AGE, DIS, RAD, TAX, B, MEDV, LSTAT, RM, PTRATIO, NOX. [Spreadsheet image; the individual cell values are not recoverable from the extracted text.]

Step by Step Solution


Step: 1

Solution: To make this question a little simpler, we will solve it in an Excel sheet, of which some pict...
