Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Scenario In an effort to estimate the plant biomass in a desert environment, field measurements on the diameter and height and laboratory determination of oven

Scenario

In an effort to estimate the plant biomass in a desert environment, field measurements on the diameter and height and laboratory determination of oven dry weight were obtained for a sample of plants in a sample of transects (area). The data are to be used to see how well the weight can be estimated by the more easily determined field observations.

Part 1 Section A

Multiple regression will be used with weight as the dependent variable and diameter and height as independent variables. For consistency, give the model in terms of the variable names. That is, use the variable names weight, diameter, and height in the model equation.

(i)Describe relationships between the dependent variable weight and independent variables diameter and height.

(ii)Describe the relationship between the independent variables diameter and height.

(b) Paste the coefficients table and give the estimated model in terms of the variable names:

(c) Interpret the estimated slope parameters for diameter and height.

(d) Have the assumptions for regression analysis been met? Justify your answer.

(e)Should we be concerned about multicollinearity? Justify your answer.

(f)Should we be concerned about potential outliers and influential observations? Should any observations be excluded? Justify your answer.

(g) List the estimate of the model standard deviation and the Adjusted R-squared?

-----------------------------------

#####################################################################

#

# Part 1 Section A Main Effects Model

#

The following objects are masked from 'package:base':

format.pval, units

> mat <- as.matrix(dataobj[,1:3])

> rcorr(mat, type="pearson")

diameter height weight

diameter1.000.420.75

height0.421.000.61

weight0.750.611.00

n= 34

P

diameter height weight

diameter0.0131 0.0000

height0.01310.0001

weight0.00000.0001

> # multiple regression model with original data

> model <- lm(weight ~ diameter + height, data=dataobj)

> summary(model)

Call:

lm(formula = weight ~ diameter + height, data = dataobj)

Residuals:

Min1QMedian3QMax

-4.5648 -1.0265 -0.04900.79754.9367

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -2.116780.71430-2.9630.00580 **

diameter0.206910.038935.314 8.72e-06 ***

height0.118460.038243.0980.00412 **

---

Signif. codes:0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.042 on 31 degrees of freedom

Multiple R-squared:0.6698, Adjusted R-squared:0.6484

F-statistic: 31.43 on 2 and 31 DF,p-value: 3.484e-08

> # confidence intervals on model parameters

> confint(model,level=0.95)

2.5 %97.5 %

(Intercept) -3.57360473 -0.6599513

diameter0.127507420.2863204

height0.040474030.1964519

> # confidence interval on E(y) for specified values of all

> #independent variables

> forecastdata = data.frame(diameter=33,

+height=35)

> predict(model,newdata=forecastdata,interval="confidence")

fitlwrupr

1 8.857584 7.092585 10.62258

> # prediction interval on y for specified values of all

> #independent variables

> predict(model,newdata=forecastdata,interval="prediction")

fitlwrupr

1 8.857584 4.333817 13.38135

> par(mfrow = c(2, 2))

> plot(model, main="Plant Biomass")

> # Shapiro-Wilk test for normality of errors and use alpha=0.01

> shapiro.test(resid(model))

Shapiro-Wilk normality test

data:resid(model)

W = 0.97422, p-value = 0.5868

> # load library lmtest for Breusch-Pagan test

> library(lmtest)

Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

as.Date, as.Date.numeric

Warning messages:

1: package 'lmtest' was built under R version 3.6.3

2: package 'zoo' was built under R version 3.6.2

> # Breusch-Pagan Test for a common error variance and use alpha=0.01

> bptest(model)

studentized Breusch-Pagan test

data:model

BP = 15.591, df = 2, p-value = 0.0004115

> par(mfrow = c(1, 1))

> # histogram of residuals

> hist(resid(model), main="Plant Biomass Histogram of Residuals",xlab="Residuals")

> # boxplot of residuals

> boxplot(resid(model), main="Plant Biomass Boxplot of Residuals")

> # Influential observations

> influence.measures(model)

Influence measures of

lm(formula = weight ~ diameter + height, data = dataobj) :

dfb.1_dfb.dmtrdfb.hghtdffit cov.rcook.dhat inf

10.0824440.195108 -1.30e-010.350378 0.894 3.88e-02 0.0441

2-0.0297970.0043241.76e-02 -0.033188 1.161 3.79e-04 0.0518

3-0.0327070.0040312.02e-02 -0.036595 1.162 4.61e-04 0.0531

4-0.0261660.188374 -2.71e-01 -0.332889 1.126 3.70e-02 0.1059

5-0.0445940.042521 -2.46e-02 -0.073330 1.142 1.85e-03 0.0450

6-0.0012640.0003572.20e-04 -0.001709 1.141 1.01e-06 0.0327

70.2726810.297897 -4.03e-010.587959 0.728 1.01e-01 0.0622

8-0.2343820.2254652.08e-010.468202 1.054 7.14e-02 0.1135

90.789282 -2.1611408.42e-01 -2.261384 0.722 1.33e+00 0.3437*

100.119935 -0.043514 -6.00e-020.121882 1.178 5.09e-03 0.0793

110.048982 -0.020534 -1.92e-020.050375 1.174 8.73e-04 0.0636

12 -0.0493340.068868 -8.80e-02 -0.158686 1.097 8.52e-03 0.0457

130.0184750.012872 -2.57e-020.031278 1.214 3.37e-04 0.0918

140.252433 -0.025285 -1.90e-010.276750 1.109 2.57e-02 0.0838

150.004223 -0.0099509.09e-030.013108 1.248 5.92e-05 0.1158

16 -0.0024740.071370 -1.89e-01 -0.268702 1.045 2.39e-02 0.0582

170.0261670.134479 -1.32e-010.171191 1.358 1.01e-02 0.1972*

18 -1.4145550.5382121.65e+002.246276 0.614 1.26e+00 0.3094*

190.096394 -0.051810 -2.82e-020.098934 1.172 3.36e-03 0.0709

200.035383 -0.031914 -3.20e-02 -0.068156 1.253 1.60e-03 0.1225

21 -0.0585310.083289 -7.14e-02 -0.132468 1.137 5.99e-03 0.0573

22 -0.1000170.0615131.42e-010.233646 1.161 1.85e-02 0.0964

230.056733 -0.013302 -3.19e-020.060158 1.169 1.24e-03 0.0615

24 -0.0006740.0004874.18e-06 -0.000769 1.171 2.04e-07 0.0577

250.154717 -0.099495 -3.31e-020.160225 1.172 8.76e-03 0.0846

260.0429560.046456 -1.07e-030.171680 1.045 9.86e-03 0.0322

27 -0.0805090.061843 -7.97e-02 -0.207800 1.019 1.43e-02 0.0354

28 -0.033923 -0.3468532.64e-01 -0.433256 1.067 6.15e-02 0.1093

29 -0.0189520.0131303.95e-04 -0.021649 1.166 1.61e-04 0.0544

30 -0.0355320.030123 -9.83e-03 -0.048379 1.154 8.05e-04 0.0484

31 -0.0049240.0007031.30e-03 -0.007007 1.139 1.69e-05 0.0317

32 -0.049112 -0.0282194.46e-02 -0.088137 1.129 2.66e-03 0.0403

330.043199 -0.169632 -3.42e-02 -0.325788 0.950 3.42e-02 0.0489

34 -0.0219670.358729 -5.05e-01 -0.590631 1.076 1.13e-01 0.1516

> # Check for multicollinearity

> # The VIF function is in R package regclass

> # load library regclass for VIF function

> library(regclass)

Loading required package: bestglm

Loading required package: leaps

Loading required package: VGAM

Loading required package: stats4

Loading required package: splines

Attaching package: 'VGAM'

The following object is masked from 'package:lmtest':

lrtest

Loading required package: rpart

Loading required package: randomForest

randomForest 4.6-14

Type rfNews() to see new features/changes/bug fixes.

Attaching package: 'randomForest'

The following object is masked from 'package:ggplot2':

margin

Important regclass change from 1.3:

All functions that had a . in the name now have an _

all.correlations -> all_correlations, cor.demo -> cor_demo, etc.

Attaching package: 'regclass'

The following object is masked from 'package:lattice':

qq

> VIF(model)

diameterheight

1.215745 1.215745

#####################################################################

#

# Part 1 Section B Log Transformed Model

#

> # Based on advice from a statistical consultant, a natural log

> #transformation for all the variables will be investigated

> # Create new variables using natural logarithm

> dataobj$lndiameter <- log(dataobj$diameter)

> dataobj$lnheight <- log(dataobj$height)

> dataobj$lnweight <- log(dataobj$weight)

> str(dataobj)

'data.frame': 34 obs. of6 variables:

$ diameter: num20.5 10 10.1 10.5 9.2 12.1 18.6 29.5 45 5 ...

$ height: num13 6.2 5.9 27 16.1 12.3 7.2 29 16 3.1 ...

$ weight: num6.84 0.4 0.36 1.38 1.01 ...

$ lndiameter: num3.02 2.3 2.31 2.35 2.22 ...

$ lnheight: num2.56 1.82 1.77 3.3 2.78 ...

$ lnweight: num1.92279 -0.91629 -1.02165 0.3257 0.00995 ...

> # examine scatterplot matrix of transformed variables

> plot(dataobj[,4:6],main="Plant Biomass Natural Log Transformed Variables")

> # correlation matrix with p-values

> #correlation matrix is first, then n, then p-values for each

> #pairwise correlation

> mat <- as.matrix(dataobj[,4:6])

> rcorr(mat, type="pearson")

lndiameter lnheight lnweight

lndiameter1.000.330.88

lnheight0.331.000.40

lnweight0.880.401.00

n= 34

P

lndiameter lnheight lnweight

lndiameter0.05970.0000

lnheight0.05970.0179

lnweight0.00000.0179

> # multiple regression model with log transformed data

> lnmodel <- lm(lnweight ~ lndiameter + lnheight, data=dataobj)

> summary(lnmodel)

Call:

lm(formula = lnweight ~ lndiameter + lnheight, data = dataobj)

Residuals:

Min1QMedian3QMax

-1.00305 -0.484790.041980.321641.62388

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)-4.42120.4617-9.576 8.93e-11 ***

lndiameter1.66940.17079.778 5.47e-11 ***

lnheight0.20890.13911.5020.143

---

Signif. codes:0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6473 on 31 degrees of freedom

Multiple R-squared:0.795, Adjusted R-squared:0.7818

F-statistic: 60.12 on 2 and 31 DF,p-value: 2.144e-11

> # confidence intervals on model parameters

> confint(lnmodel,level=0.95)

2.5 %97.5 %

(Intercept) -5.36292722 -3.4795669

lndiameter1.321176222.0176019

lnheight-0.074790440.4925318

> # confidence interval on E(y) for specified values of all

> #independent variables

> forecastdata = data.frame(lndiameter=3.50,

+lnheight=3.56)

> predict(lnmodel,newdata=forecastdata,interval="confidence")

fitlwrupr

1 2.165194 1.712027 2.618361

> # prediction interval on y for specified values of all

> #independent variables

> predict(lnmodel,newdata=forecastdata,interval="prediction")

fitlwrupr

1 2.165194 0.7693451 3.561043

> par(mfrow = c(2, 2))

> plot(lnmodel, main="Plant Biomass Natural Log")

> # Shapiro-Wilk test for normality of errors and use alpha=0.01

> shapiro.test(resid(lnmodel))

Shapiro-Wilk normality test

data:resid(lnmodel)

W = 0.9615, p-value = 0.2687

> # Breusch-Pagan Test for a common error variance and use alpha=0.01

> bptest(lnmodel)

studentized Breusch-Pagan test

data:lnmodel

BP = 0.56949, df = 2, p-value = 0.7522

> par(mfrow = c(1, 1))

> # histogram of residuals

> hist(resid(lnmodel), main="Plant Biomass Natural Log Histogram of Residuals",xlab="Residuals")

> # boxplot of residuals

> boxplot(resid(lnmodel), main="Plant Biomass Natural Log Boxplot of Residuals")

> # Influential observations

> influence.measures(lnmodel)

Influence measures of

lm(formula = lnweight ~ lndiameter + lnheight, data = dataobj) :

dfb.1_ dfb.lndmdfb.lnhgdffit cov.rcook.dhat inf

1-0.084700.15880 -0.0143020.26984 1.000 2.39e-02 0.0465

2-0.151730.012260.127095 -0.24405 1.015 1.97e-02 0.0436

3-0.180710.005980.163955 -0.29242 0.974 2.78e-02 0.0460

4-0.00287 -0.021990.0443460.05858 1.184 1.18e-03 0.0720

50.01348 -0.023120.0240680.04931 1.149 8.36e-04 0.0449

60.01739 -0.004360.0116720.09156 1.107 2.86e-03 0.0299

70.006690.25300 -0.2305010.42539 0.883 5.67e-02 0.0576

8-0.126280.101800.0791650.18832 1.179 1.21e-02 0.0951

90.42413 -0.592060.063836 -0.68563 0.964 1.47e-01 0.1371

10 -0.397920.180760.247341 -0.42845 1.093 6.05e-02 0.1177

11 -0.205820.126620.071280 -0.23258 1.097 1.82e-02 0.0666

120.002530.00571 -0.019800 -0.03373 1.152 3.92e-04 0.0449

130.035880.05060 -0.0991710.11032 1.342 4.18e-03 0.1824*

140.896560.13843 -1.2514401.37180 0.830 5.39e-01 0.2363*

150.44066 -0.928550.7594181.18596 0.572 3.71e-01 0.1330*

160.02703 -0.00422 -0.052507 -0.08450 1.152 2.45e-03 0.0542

170.015730.14477 -0.1814370.21585 1.451 1.60e-02 0.2503*

18 -0.215980.139390.1647430.29485 1.220 2.94e-02 0.1417

19 -0.006290.005190.000819 -0.00693 1.227 1.65e-05 0.1010

20 -0.006970.005520.0044350.01027 1.224 3.64e-05 0.0985

210.00481 -0.012170.0147530.02440 1.167 2.05e-04 0.0552

22 -0.119590.083760.0958270.19205 1.158 1.25e-02 0.0841

230.02807 -0.00706 -0.0213640.03619 1.172 4.51e-04 0.0608

24 -0.086660.09478 -0.026782 -0.12231 1.169 5.12e-03 0.0739

25 -0.088980.09073 -0.004761 -0.10107 1.432 3.51e-03 0.2323*

26 -0.073550.090260.0501460.22837 1.023 1.73e-02 0.0416

270.034760.00969 -0.151256 -0.31638 0.906 3.19e-02 0.0390

280.10731 -0.292000.148827 -0.37366 1.027 4.57e-02 0.0803

29 -0.178060.18926 -0.054115 -0.25919 1.068 2.24e-02 0.0631

300.01663 -0.022110.0141980.03574 1.160 4.40e-04 0.0512

310.008360.005780.0058570.07533 1.117 1.94e-03 0.0299

320.00186 -0.036730.020487 -0.08090 1.128 2.24e-03 0.0376

330.13944 -0.14354 -0.072212 -0.26891 1.048 2.40e-02 0.0593

34 -0.002610.07493 -0.120393 -0.15289 1.188 7.99e-03 0.0922

> # Check for multicollinearity

> # The VIF function is in R package regclass

> VIF(lnmodel)

lndiameterlnheight

1.1191241.119124

>

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Calculus Early Transcendentals

Authors: James Stewart

8th edition

1285741552, 9781305482463 , 978-1285741550

More Books

Students also viewed these Mathematics questions

Question

How high is the cost of switching from one supplier to another?

Answered: 1 week ago