Question

1 Approved Answer

Posted on Jun 29, 2024

Multiple linear regression predicts the value of one dependent variable from the values of more than one independent variable, so consistent notation becomes very important. In STATDISK, the multiple linear regression model (under the "analysis" and "multiple regression" drop-down menus) can use up to sixteen independent variables to predict the value of one dependent variable. This multiple linear regression model has the form:

y = b_0 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + b_6 x_6 + b_7 x_7 + b_8 x_8 + b_9 x_9 + b_{10} x_{10} + b_{11} x_{11} + b_{12} x_{12} + b_{13} x_{13} + b_{14} x_{14} + b_{15} x_{15} + b_{16} x_{16} + b_{17} x_{17}

where

y is the dependent variable selected by choosing the appropriate column value for "dependent variable column." For consistency purposes, I usually list the dependent variable in column 1.
x_i, for all i ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17}, are the independent variables selected by "select[ing] the columns to include in the regression analysis." Notice that the first column is also selected, because it is included in the regression analysis as the dependent variable.
b_0 is the constant coefficient.
b_i is the coefficient describing the strength of the independent variable value, x_i.

It is not necessary to select all of the available independent variables! The model will have fewer b_i x_i terms when fewer columns are included in the regression analysis.

Using this model, several independent variable values may predict the value of one dependent variable. For instance, if b_0 = 7, b_2 = -2, b_3 = 3, b_4 = -1, b_5 = 8, b_6 = 123, b_7 = 2, b_8 = 1, b_9 = -5, b_{10} = 1, b_{11} = 4, b_{12} = 6, b_{13} = 9, b_{14} = -13, b_{15} = -4, b_{16} = 11, and b_{17} = 12, then the regression model is expressed as:

y = 7 - 2 x_2 + 3 x_3 - x_4 + 8 x_5 + 123 x_6 + 2 x_7 + x_8 - 5 x_9 + x_{10} + 4 x_{11} + 6 x_{12} + 9 x_{13} - 13 x_{14} - 4 x_{15} + 11 x_{16} + 12 x_{17}

If the values of the independent variables are x_2 = 3, x_3 = -1, x_4 = 5, x_5 = 2, x_6 = 0, x_7 = -9, x_8 = -19, x_9 = -2, x_{10} = 66, x_{11} = 2, x_{12} = 0, x_{13} = 0, x_{14} = 0, x_{15} = 1, x_{16} = 0, and x_{17} = 0, then the linear regression equation predicts that the corresponding dependent variable value is:

y = 7 - 2(3) + 3(-1) - (5) + 8(2) + 123(0) + 2(-9) + (-19) - 5(-2) + (66) + 4(2) + 6(0) + 9(0) - 13(0) - 4(1) + 11(0) + 12(0) = 52

This means that if x_2 = 3, x_3 = -1, x_4 = 5, x_5 = 2, x_6 = 0, x_7 = -9, x_8 = -19, x_9 = -2, x_{10} = 66, x_{11} = 2, x_{12} = 0, x_{13} = 0, x_{14} = 0, x_{15} = 1, x_{16} = 0, and x_{17} = 0, then y = 52.
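
The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the names `coefficients`, `values`, and `predict` are illustrative choices, and the numbers are the made-up example coefficients and inputs from the text, not STATDISK output.

```python
# Example coefficients from the text, keyed by subscript (b_0 is the constant).
coefficients = {0: 7, 2: -2, 3: 3, 4: -1, 5: 8, 6: 123, 7: 2, 8: 1, 9: -5,
                10: 1, 11: 4, 12: 6, 13: 9, 14: -13, 15: -4, 16: 11, 17: 12}

# Example independent-variable values from the text, keyed by subscript.
values = {2: 3, 3: -1, 4: 5, 5: 2, 6: 0, 7: -9, 8: -19, 9: -2, 10: 66,
          11: 2, 12: 0, 13: 0, 14: 0, 15: 1, 16: 0, 17: 0}


def predict(coefficients, values):
    """Return b_0 plus the sum of b_i * x_i over the supplied columns."""
    return coefficients[0] + sum(coefficients[i] * x for i, x in values.items())


y = predict(coefficients, values)
print(y)  # 52
```

Keying both dictionaries by subscript keeps the code aligned with the notation above, where the subscripts start at 2 because column 1 holds the dependent variable.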

The multiple regression function in STATDISK is under the analysis menu. It behaves similarly to the correlation and regression function. The multiple regression function must be used when there is more than one independent variable predicting the dependent variable.

Background: Dummy Variables

A dummy variable (also known as an indicator variable) is one that takes the value 0 or 1 to indicate the absence or presence of some categorical effect. Dummy variables are used as devices to sort data into mutually exclusive categories (for example, a home either has central air conditioning or does not). The numerical value of a dummy variable is either 0 (usually for "no") or 1 (usually for "yes").

In some cases, there is a need for more than one dummy variable to sort data into mutually exclusive categories. For example, Hollister homes were sold during one of six time frames:

1998JAN-JUN,
1998JUL-DEC,
1999JAN-JUN,
1999JUL-DEC,
2000JAN-JUN, or
2000JUL-DEC.

Six mutually exclusive categories yield only five degrees of freedom; therefore, only five dummy variables are present in the data. Specifically, 1998JAN-JUN is not present.

For example, if a home was sold in March, 2000, then

1998JUL-DEC = 0,
1999JAN-JUN = 0,
1999JUL-DEC = 0,
2000JAN-JUN = 1, and
2000JUL-DEC = 0.

Notice that if a home was sold in January, 1998, then

1998JUL-DEC = 0,
1999JAN-JUN = 0,
1999JUL-DEC = 0,
2000JAN-JUN = 0, and
2000JUL-DEC = 0

meaning that the home was not sold in any of the dummy variable categories present, which forces the sale into the only remaining (unlisted) category, 1998JAN-JUN.
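
The encoding described above can be sketched in Python. The function name `sale_period_dummies` and the constant `DUMMY_COLUMNS` are hypothetical helpers introduced for illustration; only the five period labels come from the text.

```python
# Period labels from the text. The first period (1998JAN-JUN) is the baseline
# and deliberately has no dummy column of its own.
DUMMY_COLUMNS = ["1998JUL-DEC", "1999JAN-JUN", "1999JUL-DEC",
                 "2000JAN-JUN", "2000JUL-DEC"]


def sale_period_dummies(year, month):
    """Return the five dummy values for a sale in the given year and month."""
    half = "JAN-JUN" if month <= 6 else "JUL-DEC"
    label = f"{year}{half}"
    # A column is 1 only if it matches the sale period; a sale in the
    # baseline period matches no column, so every value is 0.
    return {column: int(column == label) for column in DUMMY_COLUMNS}


print(sale_period_dummies(2000, 3))  # only 2000JAN-JUN is 1
print(sale_period_dummies(1998, 1))  # all zeros: the baseline period
```

A sixth column for 1998JAN-JUN would add no information, because its value is already determined by the other five.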

Question:

Using the data_1998-2000_MREG.csv dataset, find the multiple linear regression model predicting the sale price of Hollister homes and predict the sale price of a 1600-square-foot, 5-year-old Hollister home on a 10,000-square-foot lot with central air conditioning, a pool, and a two-car garage, sold in February 2000 and located in the Cerra Vista Elementary School District (CV).

Hint: After finding the multiple linear regression model including all sixteen independent variables, use the following variable values to predict the sale price of the home:

1998JUL-DEC = 0
1999JAN-JUN = 0
1999JUL-DEC = 0
2000JAN-JUN = 1
2000JUL-DEC = 0
CAL = 0
CV = 1
GAB = 0
ROH = 0
SS = 0
CENTRAL = 1
POOL = 1
GARAGE = 2
SQUARE FOOTAGE = 1600
AGE = 5
LOT SIZE = 10000
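
For readers without STATDISK, the fit-then-predict procedure can be sketched in plain Python. This is a sketch under stated assumptions, not STATDISK's method: it solves the least-squares normal equations directly, the function names are hypothetical, and the toy rows below are made up for illustration rather than taken from the dataset. With the real data, each row would hold the sixteen independent-variable values for one home, and the prediction row would hold the hint values in the same column order.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear columns).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [constant, coeff_1, coeff_2, ...] minimizing squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the constant
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_from_fit(coefficients, row):
    """Evaluate the fitted model: constant plus sum of coefficient * value."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Made-up toy data that follows y = 10 + 2*x1 - 3*x2 exactly.
toy_rows = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1)]
toy_y = [10 + 2 * x1 - 3 * x2 for x1, x2 in toy_rows]

fitted = fit_multiple_regression(toy_rows, toy_y)
print([round(b, 6) for b in fitted])
print(round(predict_from_fit(fitted, (6, 1)), 6))
```

The normal equations are easy to read but can lose precision when columns differ greatly in scale (for example, lot size in the thousands next to 0/1 dummies), so a numerical library's least-squares routine is usually the safer choice for the real dataset.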

Possible answers:

$316,577
$343,151
$387,704
$156,993
$322,054
$344,834
$436,096
$310,682
$302,432

STATDISK output:

Number of columns used: 9
Coeff, b0 (Sale Price): 18.35324
Coeff, b2 (Square Foot): 28.62503
Coeff, b3 (Age): 7.379748
Coeff, b4 (Lot Size): 20.85235
Coeff, b5 (Air conditioning): 39.62026
Coeff, b6 (Pool): 27.84638
Coeff, b7 (Garage): 0.1105289
Coeff, b8 (Date): -0.0928854
Coeff, b9 (Location): 0.0004159
Total Variation: 2.112268e+7
Explained Variation: 9.478577e+6
Unexplained Variation: 1.164410e+7
Standard Error: 98.26059
Coeff of Det, R^2: 0.4487393
Adjusted R^2: 0.4450825
P Value: 0
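
As a quick consistency check on the STATDISK output, the coefficient of determination should equal the explained variation divided by the total variation, and the explained and unexplained variation should add up to the total. The short Python sketch below uses only the figures reported in the output; the variable names are illustrative.

```python
import math

# Figures as reported in the STATDISK output.
total_variation = 2.112268e7
explained_variation = 9.478577e6
unexplained_variation = 1.164410e7
reported_r_squared = 0.4487393

# R^2 is the share of the total variation that the model explains.
r_squared = explained_variation / total_variation
print(round(r_squared, 4))  # 0.4487

# Both identities hold up to the rounding in the reported figures.
print(math.isclose(r_squared, reported_r_squared, abs_tol=1e-6))
print(math.isclose(explained_variation + unexplained_variation,
                   total_variation, rel_tol=1e-6))
```

An R^2 near 0.45 means this nine-column model explains a little under half of the variation in sale price.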