Question
Scenario In an effort to estimate the plant biomass in a desert environment, field measurements on the diameter and height and laboratory determination of oven
Scenario
In an effort to estimate the plant biomass in a desert environment, field measurements on the diameter and height and laboratory determination of oven dry weight were obtained for a sample of plants in a sample of transects (area). The data are to be used to see how well the weight can be estimated by the more easily determined field observations.
Part 1 Section A
Multiple regression will be used with weight as the dependent variable and diameter and height as independent variables. For consistency, give the model in terms of the variable names. That is, use the variable names weight, diameter, and height in the model equation.
(i)Describe relationships between the dependent variable weight and independent variables diameter and height.
(ii)Describe the relationship between the independent variables diameter and height.
(b) Paste the coefficients table and give the estimated model in terms of the variable names:
(c) Interpret the estimated slope parameters for diameter and height.
(d) Have the assumptions for regression analysis been met? Justify your answer.
(e)Should we be concerned about multicollinearity? Justify your answer.
(f)Should we be concerned about potential outliers and influential observations? Should any observations be excluded? Justify your answer.
(g) List the estimate of the model standard deviation and the Adjusted R-squared?
-----------------------------------
#####################################################################
#
# Part 1 Section A Main Effects Model
#
The following objects are masked from 'package:base':
format.pval, units
> mat <- as.matrix(dataobj[,1:3])
> rcorr(mat, type="pearson")
diameter height weight
diameter1.000.420.75
height0.421.000.61
weight0.750.611.00
n= 34
P
diameter height weight
diameter0.0131 0.0000
height0.01310.0001
weight0.00000.0001
> # multiple regression model with original data
> model <- lm(weight ~ diameter + height, data=dataobj)
> summary(model)
Call:
lm(formula = weight ~ diameter + height, data = dataobj)
Residuals:
Min1QMedian3QMax
-4.5648 -1.0265 -0.04900.79754.9367
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.116780.71430-2.9630.00580 **
diameter0.206910.038935.314 8.72e-06 ***
height0.118460.038243.0980.00412 **
---
Signif. codes:0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.042 on 31 degrees of freedom
Multiple R-squared:0.6698, Adjusted R-squared:0.6484
F-statistic: 31.43 on 2 and 31 DF,p-value: 3.484e-08
> # confidence intervals on model parameters
> confint(model,level=0.95)
2.5 %97.5 %
(Intercept) -3.57360473 -0.6599513
diameter0.127507420.2863204
height0.040474030.1964519
> # confidence interval on E(y) for specified values of all
> #independent variables
> forecastdata = data.frame(diameter=33,
+height=35)
> predict(model,newdata=forecastdata,interval="confidence")
fitlwrupr
1 8.857584 7.092585 10.62258
> # prediction interval on y for specified values of all
> #independent variables
> predict(model,newdata=forecastdata,interval="prediction")
fitlwrupr
1 8.857584 4.333817 13.38135
> par(mfrow = c(2, 2))
> plot(model, main="Plant Biomass")
> # Shapiro-Wilk test for normality of errors and use alpha=0.01
> shapiro.test(resid(model))
Shapiro-Wilk normality test
data:resid(model)
W = 0.97422, p-value = 0.5868
> # load library lmtest for Breusch-Pagan test
> library(lmtest)
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Warning messages:
1: package 'lmtest' was built under R version 3.6.3
2: package 'zoo' was built under R version 3.6.2
> # Breusch-Pagan Test for a common error variance and use alpha=0.01
> bptest(model)
studentized Breusch-Pagan test
data:model
BP = 15.591, df = 2, p-value = 0.0004115
> par(mfrow = c(1, 1))
> # histogram of residuals
> hist(resid(model), main="Plant Biomass Histogram of Residuals",xlab="Residuals")
> # boxplot of residuals
> boxplot(resid(model), main="Plant Biomass Boxplot of Residuals")
> # Influential observations
> influence.measures(model)
Influence measures of
lm(formula = weight ~ diameter + height, data = dataobj) :
dfb.1_dfb.dmtrdfb.hghtdffit cov.rcook.dhat inf
10.0824440.195108 -1.30e-010.350378 0.894 3.88e-02 0.0441
2-0.0297970.0043241.76e-02 -0.033188 1.161 3.79e-04 0.0518
3-0.0327070.0040312.02e-02 -0.036595 1.162 4.61e-04 0.0531
4-0.0261660.188374 -2.71e-01 -0.332889 1.126 3.70e-02 0.1059
5-0.0445940.042521 -2.46e-02 -0.073330 1.142 1.85e-03 0.0450
6-0.0012640.0003572.20e-04 -0.001709 1.141 1.01e-06 0.0327
70.2726810.297897 -4.03e-010.587959 0.728 1.01e-01 0.0622
8-0.2343820.2254652.08e-010.468202 1.054 7.14e-02 0.1135
90.789282 -2.1611408.42e-01 -2.261384 0.722 1.33e+00 0.3437*
100.119935 -0.043514 -6.00e-020.121882 1.178 5.09e-03 0.0793
110.048982 -0.020534 -1.92e-020.050375 1.174 8.73e-04 0.0636
12 -0.0493340.068868 -8.80e-02 -0.158686 1.097 8.52e-03 0.0457
130.0184750.012872 -2.57e-020.031278 1.214 3.37e-04 0.0918
140.252433 -0.025285 -1.90e-010.276750 1.109 2.57e-02 0.0838
150.004223 -0.0099509.09e-030.013108 1.248 5.92e-05 0.1158
16 -0.0024740.071370 -1.89e-01 -0.268702 1.045 2.39e-02 0.0582
170.0261670.134479 -1.32e-010.171191 1.358 1.01e-02 0.1972*
18 -1.4145550.5382121.65e+002.246276 0.614 1.26e+00 0.3094*
190.096394 -0.051810 -2.82e-020.098934 1.172 3.36e-03 0.0709
200.035383 -0.031914 -3.20e-02 -0.068156 1.253 1.60e-03 0.1225
21 -0.0585310.083289 -7.14e-02 -0.132468 1.137 5.99e-03 0.0573
22 -0.1000170.0615131.42e-010.233646 1.161 1.85e-02 0.0964
230.056733 -0.013302 -3.19e-020.060158 1.169 1.24e-03 0.0615
24 -0.0006740.0004874.18e-06 -0.000769 1.171 2.04e-07 0.0577
250.154717 -0.099495 -3.31e-020.160225 1.172 8.76e-03 0.0846
260.0429560.046456 -1.07e-030.171680 1.045 9.86e-03 0.0322
27 -0.0805090.061843 -7.97e-02 -0.207800 1.019 1.43e-02 0.0354
28 -0.033923 -0.3468532.64e-01 -0.433256 1.067 6.15e-02 0.1093
29 -0.0189520.0131303.95e-04 -0.021649 1.166 1.61e-04 0.0544
30 -0.0355320.030123 -9.83e-03 -0.048379 1.154 8.05e-04 0.0484
31 -0.0049240.0007031.30e-03 -0.007007 1.139 1.69e-05 0.0317
32 -0.049112 -0.0282194.46e-02 -0.088137 1.129 2.66e-03 0.0403
330.043199 -0.169632 -3.42e-02 -0.325788 0.950 3.42e-02 0.0489
34 -0.0219670.358729 -5.05e-01 -0.590631 1.076 1.13e-01 0.1516
> # Check for multicollinearity
> # The VIF function is in R package regclass
> # load library regclass for VIF function
> library(regclass)
Loading required package: bestglm
Loading required package: leaps
Loading required package: VGAM
Loading required package: stats4
Loading required package: splines
Attaching package: 'VGAM'
The following object is masked from 'package:lmtest':
lrtest
Loading required package: rpart
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Attaching package: 'randomForest'
The following object is masked from 'package:ggplot2':
margin
Important regclass change from 1.3:
All functions that had a . in the name now have an _
all.correlations -> all_correlations, cor.demo -> cor_demo, etc.
Attaching package: 'regclass'
The following object is masked from 'package:lattice':
> VIF(model)
diameterheight
1.215745 1.215745
#####################################################################
#
# Part 1 Section B Log Transformed Model
#
> # Based on advice from a statistical consultant, a natural log
> #transformation for all the variables will be investigated
> # Create new variables using natural logarithm
> dataobj$lndiameter <- log(dataobj$diameter)
> dataobj$lnheight <- log(dataobj$height)
> dataobj$lnweight <- log(dataobj$weight)
> str(dataobj)
'data.frame': 34 obs. of6 variables:
$ diameter: num20.5 10 10.1 10.5 9.2 12.1 18.6 29.5 45 5 ...
$ height: num13 6.2 5.9 27 16.1 12.3 7.2 29 16 3.1 ...
$ weight: num6.84 0.4 0.36 1.38 1.01 ...
$ lndiameter: num3.02 2.3 2.31 2.35 2.22 ...
$ lnheight: num2.56 1.82 1.77 3.3 2.78 ...
$ lnweight: num1.92279 -0.91629 -1.02165 0.3257 0.00995 ...
> # examine scatterplot matrix of transformed variables
> plot(dataobj[,4:6],main="Plant Biomass Natural Log Transformed Variables")
> # correlation matrix with p-values
> #correlation matrix is first, then n, then p-values for each
> #pairwise correlation
> mat <- as.matrix(dataobj[,4:6])
> rcorr(mat, type="pearson")
lndiameter lnheight lnweight
lndiameter1.000.330.88
lnheight0.331.000.40
lnweight0.880.401.00
n= 34
P
lndiameter lnheight lnweight
lndiameter0.05970.0000
lnheight0.05970.0179
lnweight0.00000.0179
> # multiple regression model with log transformed data
> lnmodel <- lm(lnweight ~ lndiameter + lnheight, data=dataobj)
> summary(lnmodel)
Call:
lm(formula = lnweight ~ lndiameter + lnheight, data = dataobj)
Residuals:
Min1QMedian3QMax
-1.00305 -0.484790.041980.321641.62388
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)-4.42120.4617-9.576 8.93e-11 ***
lndiameter1.66940.17079.778 5.47e-11 ***
lnheight0.20890.13911.5020.143
---
Signif. codes:0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6473 on 31 degrees of freedom
Multiple R-squared:0.795, Adjusted R-squared:0.7818
F-statistic: 60.12 on 2 and 31 DF,p-value: 2.144e-11
> # confidence intervals on model parameters
> confint(lnmodel,level=0.95)
2.5 %97.5 %
(Intercept) -5.36292722 -3.4795669
lndiameter1.321176222.0176019
lnheight-0.074790440.4925318
> # confidence interval on E(y) for specified values of all
> #independent variables
> forecastdata = data.frame(lndiameter=3.50,
+lnheight=3.56)
> predict(lnmodel,newdata=forecastdata,interval="confidence")
fitlwrupr
1 2.165194 1.712027 2.618361
> # prediction interval on y for specified values of all
> #independent variables
> predict(lnmodel,newdata=forecastdata,interval="prediction")
fitlwrupr
1 2.165194 0.7693451 3.561043
> par(mfrow = c(2, 2))
> plot(lnmodel, main="Plant Biomass Natural Log")
> # Shapiro-Wilk test for normality of errors and use alpha=0.01
> shapiro.test(resid(lnmodel))
Shapiro-Wilk normality test
data:resid(lnmodel)
W = 0.9615, p-value = 0.2687
> # Breusch-Pagan Test for a common error variance and use alpha=0.01
> bptest(lnmodel)
studentized Breusch-Pagan test
data:lnmodel
BP = 0.56949, df = 2, p-value = 0.7522
> par(mfrow = c(1, 1))
> # histogram of residuals
> hist(resid(lnmodel), main="Plant Biomass Natural Log Histogram of Residuals",xlab="Residuals")
> # boxplot of residuals
> boxplot(resid(lnmodel), main="Plant Biomass Natural Log Boxplot of Residuals")
> # Influential observations
> influence.measures(lnmodel)
Influence measures of
lm(formula = lnweight ~ lndiameter + lnheight, data = dataobj) :
dfb.1_ dfb.lndmdfb.lnhgdffit cov.rcook.dhat inf
1-0.084700.15880 -0.0143020.26984 1.000 2.39e-02 0.0465
2-0.151730.012260.127095 -0.24405 1.015 1.97e-02 0.0436
3-0.180710.005980.163955 -0.29242 0.974 2.78e-02 0.0460
4-0.00287 -0.021990.0443460.05858 1.184 1.18e-03 0.0720
50.01348 -0.023120.0240680.04931 1.149 8.36e-04 0.0449
60.01739 -0.004360.0116720.09156 1.107 2.86e-03 0.0299
70.006690.25300 -0.2305010.42539 0.883 5.67e-02 0.0576
8-0.126280.101800.0791650.18832 1.179 1.21e-02 0.0951
90.42413 -0.592060.063836 -0.68563 0.964 1.47e-01 0.1371
10 -0.397920.180760.247341 -0.42845 1.093 6.05e-02 0.1177
11 -0.205820.126620.071280 -0.23258 1.097 1.82e-02 0.0666
120.002530.00571 -0.019800 -0.03373 1.152 3.92e-04 0.0449
130.035880.05060 -0.0991710.11032 1.342 4.18e-03 0.1824*
140.896560.13843 -1.2514401.37180 0.830 5.39e-01 0.2363*
150.44066 -0.928550.7594181.18596 0.572 3.71e-01 0.1330*
160.02703 -0.00422 -0.052507 -0.08450 1.152 2.45e-03 0.0542
170.015730.14477 -0.1814370.21585 1.451 1.60e-02 0.2503*
18 -0.215980.139390.1647430.29485 1.220 2.94e-02 0.1417
19 -0.006290.005190.000819 -0.00693 1.227 1.65e-05 0.1010
20 -0.006970.005520.0044350.01027 1.224 3.64e-05 0.0985
210.00481 -0.012170.0147530.02440 1.167 2.05e-04 0.0552
22 -0.119590.083760.0958270.19205 1.158 1.25e-02 0.0841
230.02807 -0.00706 -0.0213640.03619 1.172 4.51e-04 0.0608
24 -0.086660.09478 -0.026782 -0.12231 1.169 5.12e-03 0.0739
25 -0.088980.09073 -0.004761 -0.10107 1.432 3.51e-03 0.2323*
26 -0.073550.090260.0501460.22837 1.023 1.73e-02 0.0416
270.034760.00969 -0.151256 -0.31638 0.906 3.19e-02 0.0390
280.10731 -0.292000.148827 -0.37366 1.027 4.57e-02 0.0803
29 -0.178060.18926 -0.054115 -0.25919 1.068 2.24e-02 0.0631
300.01663 -0.022110.0141980.03574 1.160 4.40e-04 0.0512
310.008360.005780.0058570.07533 1.117 1.94e-03 0.0299
320.00186 -0.036730.020487 -0.08090 1.128 2.24e-03 0.0376
330.13944 -0.14354 -0.072212 -0.26891 1.048 2.40e-02 0.0593
34 -0.002610.07493 -0.120393 -0.15289 1.188 7.99e-03 0.0922
> # Check for multicollinearity
> # The VIF function is in R package regclass
> VIF(lnmodel)
lndiameterlnheight
1.1191241.119124
>
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started