Introduction To Robust Estimation And Hypothesis Testing 5th Edition Rand R. Wilcox - Solutions
1. Comment on the relative merits of using a linear model versus a smoother in the context of ANCOVA.
12. Generate data from a bivariate normal distribution with the R command x=rmul(200). Then enter the R command y=x[,1]+x[,2]+x[,1]*x[,2]+rnorm(200), and examine the plot returned by the R command gamplot(x,y,scale=TRUE). Compare this to the plot returned by the R command gamplotINT(x,y,scale=TRUE).
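The R functions rmul and gamplot used here come from Wilcox's software for the book. As a language-neutral sketch, the data-generating step (but not the gamplot smoothing plot, which has no one-line analogue) can be reproduced with numpy; the seed is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# stands in for x = rmul(200): 200 points from a bivariate normal,
# uncorrelated components
x = rng.standard_normal((200, 2))

# y = x[,1] + x[,2] + x[,1]*x[,2] + rnorm(200): additive terms plus an
# interaction term plus normal noise
y = x[:, 0] + x[:, 1] + x[:, 0] * x[:, 1] + rng.standard_normal(200)
```

The point of the exercise is that a smoother fit without an interaction term (gamplot) and one allowing an interaction (gamplotINT) should look visibly different for data generated this way.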
11. Generate 25 pairs of observations from a bivariate normal distribution having correlation 0, and store them in x. (The R function rmul, written for this book, can be used.) Generate 25 more observations, and store them in y. Create a smooth using rplot using scale=T, and compare it to the
10. Generate 25 observations from a standard normal distribution, and store the results in the R variable x. Generate 25 more observations, and store them in y. Use rungen to plot a smooth based on the Harrell–Davis estimator of the median. Also create a smooth with the argument scat=F. Comment
9. For the reading data in the upper right panel of Fig. 11.5, recreate the smooth. If you wanted to find a parametric regression equation, what might be tried? Examine how well your suggestions perform.
8. The data in the lower left panel of Fig. 11.5 are stored in the file agegesell.dat. Remove the two pairs of points having the largest x value, and create a running interval smoother using the data that remain.
7. For the reading data in the file read.dat, use the R function rplot to investigate the shape of the regression surface when predicting the 20% trimmed mean of WWISST2 (the data in column 8) with RAN1T1 and RAN2T1 (the data in columns 4 and 5).
6. For the reading data in file read.dat, let x be the data in column 2 (TAAST1), and suppose it is desired to predict y, the data in column 8 (WWISST2). Speculate on whether there are situations where it would be beneficial to use x2 to predict y taking into account the value stored in column 3
5. The example at the end of Section 11.5.5 is based on the data stored in the file A3B3C_dat.txt, which can be downloaded as described in Section 1.10. Verify the results in Section 11.5.5 when leverage points are removed.
4. Use the function winreg to estimate the slope and intercept of the star data using 20% Winsorization. (The data are stored in the file star.dat. See Section 1.8 on how to obtain the data.)
3. For the data in Exercise 1, test H0: β1 = 0 with the functions regci and regtest. Comment on the results.
2. Section 8.6.2 reports data on the effects of consuming alcohol on three different occasions. Using the data for group 1, suppose it is desired to predict the response at time 1 using the responses at times 2 and 3. Test H0: β1 = β2 = 0 using the R function regtest and β̂m.
1. For the data in Exercise 1 of Chapter 10, the 0.95 confidence interval for the slope, based on the least squares regression line, is (0.0022, 0.0062). Using R, the 0.95 confidence interval for the slope returned by lsfitci is (0.003, 0.006). The 0.95 confidence interval returned by the R
15. Graphically illustrate the difference between a regression outlier and a good leverage point. That is, plot some points for which y = β1x + β0 and then add some points that represent regression outliers and good leverage points.
14. For the data in Exercise 13, identify any leverage points using the hat matrix. Next, identify leverage points with the function reglev. How do the results compare?
13. For the data used in Exercise 11, RAN1T1 and RAN2T1 (stored in columns 4 and 5) are measures of digit naming speed and letter naming speed. Use M regression with Schweppe weights to estimate the regression parameters when predicting WWISST2. Use the function elimna, described in Chapter 1, to
12. For the data used in Exercise 11, compute the hat matrix and identify any leverage points. Also check for leverage points with the R function reglev. How do the results compare?
11. The file read.dat contains reading data collected by Doi. Of interest is predicting WWISST2, a word identification score (stored in column 8), using TAAST1, a measure of phonological awareness stored in column 2, and SBT1 (stored in column 3), another measure of phonological awareness. Compare
10. For the data in Exercise 6, verify that the 0.95 confidence intervals for the regression parameters, using the R function regci with M regression and Schweppe weights, are (−0.2357, 0.3761) and (−0.0231, 1.2454). Also verify that if regci is used with OLS, the confidence intervals are
9. Referring to Exercise 6, how do the results compare to the results obtained with the R function reglev?
8. For the data used in the previous exercise, compute 0.95 confidence intervals for the parameters using OLS as well as M regression with Schweppe weights.
7. The example in Section 6.6.1 reports the results of drinking alcohol for two groups of subjects measured at three different times. Using the group 1 data, compute an OLS estimate of the regression parameters for predicting the time 1 data using the data based on times 2 and 3. Compare the
6. Compute the hat matrix for the data in Exercise 1. Which x values are identified as leverage points? Relate the result to the previous exercise.
5. For the data in Exercise 1, use the R function reglev to comment on the advisability of using M regression with Schweppe weights.
4. Let T be any regression estimator that is affine equivariant. Let A be any non-singular square matrix. Argue that the predicted y values, ŷi, remain unchanged when xi is replaced by xiA.
3. Using the data in Exercise 1, show that the estimate of the slope given by β̂ch is 0.057. In contrast, the OLS estimate is 0.0045, and β̂m = 0.0042. Comment on the differences among the three estimates.
2. Discuss the relative merits of β̂ch.
1. The average LSAT scores (x) for the 1973 entering classes of 15 American law schools and the corresponding grade point averages (y) are as follows:
x: 576 635 558 578 666 580 555 661 651 605 653 575 545 572 594
y: 3.39 3.30 2.81 3.03 3.44 3.07 3.00 3.43 3.36 3.13 3.12 2.74 2.76 2.88 2.96
Using
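Exercise 3 above quotes an OLS slope of 0.0045 for these data. Since the full x and y vectors are given here, that value can be checked with a few lines of plain Python (a quick verification sketch, not the book's R code):

```python
# LSAT scores and GPAs from the exercise
x = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605,
     653, 575, 545, 572, 594]
y = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13,
     3.12, 2.74, 2.76, 2.88, 2.96]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# least squares slope = S_xy / S_xx
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
slope = sxy / sxx              # approximately 0.0045
intercept = my - slope * mx
```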
14. Let X be a standard normal random variable, and suppose Y is a contaminated normal with probability density function given by Eq. (1.1). Let Q = ρX + √(1 − ρ²)Y, −1 ≤ ρ ≤ 1. Verify that the correlation between X and Q is
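The statement is cut off before the target expression. Under the usual reading of this model (X standard normal, Y an independent contaminated normal with variance σ_Y² = 1 − ε + εK², and Q = ρX + √(1 − ρ²)Y), the verification sketches as:

```latex
\operatorname{Cov}(X, Q) = \rho \operatorname{Var}(X) = \rho, \qquad
\operatorname{Var}(Q) = \rho^2 + (1-\rho^2)\,\sigma_Y^2,
```

```latex
\operatorname{corr}(X, Q)
  = \frac{\rho}{\sqrt{\rho^2 + (1-\rho^2)\,\sigma_Y^2}},
\qquad \sigma_Y^2 = 1 - \epsilon + \epsilon K^2 .
```

This is a derivation sketch under the stated assumptions, not the book's printed answer.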
13. If in the definition of the biweight midcovariance, the median is replaced by the biweight measure of location, the biweight midcovariance is equal to zero under independence. Describe some negative consequences of replacing the median with the biweight measure of location.
12. The definition of the percentage bend correlation coefficient, ρpb, involves a measure of scale, ωx, that is estimated with ω̂ = W(m), where Wi = |Xi − Mx| and m = [(1 − β)n], where 0 ≤ β ≤ 0.5. Note that this measure of scale is defined even when β > 0.5. Argue that the finite
11. For the data in the file read.dat, test for independence using the data in columns 4 and 5 and β = 0.1.
10. For the data used in the last two exercises, test the hypothesis of independence using the function indt. Why might indt find an association not detected by any of the correlations covered in this chapter?
9. Examine the variables in the last exercise using the R function mscor.
8. Use the data in the file read.dat and test for independence using the data in columns 2, 3, and 10 and the R function pball. Try β = 0.1, 0.3, and 0.5. Comment on any discrepancies.
7. The method for detecting outliers, described in Section 6.4.3, could be modified by replacing the MVE estimator with the Winsorized mean and covariance matrix. Discuss how this would be done and its relative merits.
6. Repeat the previous problem using the data for group 2.
5. Using the group 1 alcohol data in Section 8.6.2, compute the MVE estimate of correlation, and compare the results to the biweight midcorrelation, the percentage bend correlation using β = 0.1, 0.2, 0.3, 0.4, and 0.5, the Winsorized correlation using γ = 0.1 and 0.2, and the skipped correlation.
4. Use the function cov.mve(m,cor=TRUE) to compute the MVE correlation for the star data in Fig. 9.2. Compare the results to the Winsorized, percentage bend, skipped, and biweight correlations, as well as the M-estimate of correlation returned by the R function relfun.
3. Demonstrate that heteroscedasticity affects the probability of a Type I error when testing the hypothesis of a zero correlation based on any type M correlation and non-bootstrap method covered in this chapter.
2. Repeat Exercise 1 with Spearman’s rho, the percentage bend correlation, and the Winsorized correlation.
1. Generate 20 observations from a standard normal distribution, and store them in the R variable ep. Repeat this, and store the values in x. Compute y=x+ep, and compute Kendall’s tau. Generally, what happens if two pairs of points are added at (2.1, −2.4)? Does this have a large impact on tau?
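A sketch of this experiment in Python, using scipy.stats.kendalltau in place of the book's R code; the seed, and therefore the particular τ values obtained, are illustrative assumptions:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
ep = rng.standard_normal(20)
x = rng.standard_normal(20)
y = x + ep

tau_before, _ = kendalltau(x, y)

# add two outlying pairs at (2.1, -2.4), as the exercise asks
x2 = np.append(x, [2.1, 2.1])
y2 = np.append(y, [-2.4, -2.4])
tau_after, _ = kendalltau(x2, y2)
```

Comparing tau_before with tau_after shows how much (or little) two discordant outliers move the estimate.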
6. Analyze the data for murderers in Section 6.3.5 using the methods in Sections 8.6.1 and 8.6.4.
5. Repeat Exercises 3 and 4 using the data for the murderers in Table 6.1.
4. Repeat Exercise 3 using the rank-based method in Section 8.5. How do the results compare to using a measure of location?
3. Analyze the data for the control group mentioned in Section 6.12.1 using the methods in Sections 8.1 and 8.2. The data are stored in the file schiz2.data, which can be accessed as described in Section 1.10. Compare and contrast the results.
2. For the data used in Exercise 1, compute confidence intervals for all pairs of trimmed means using the R function pairdepb.
1. Section 8.6.2 reports data on hangover symptoms. For group 2, use the R function rmanova to compare the trimmed means corresponding to times 1, 2, and 3.
12. For the schizophrenia data in Section 7.8.4, compare the groups with t1way and pbadepth.
11. Generate data for a 2-by-3 design, and use the function pbad2way. Note the contrast coefficients for interactions. If you again use pbad2way, but with conall=FALSE, what will happen to these contrast coefficients? Describe the relative merits of using conall=TRUE.
10. Snedecor and Cochran (1967) report weight gains for rats randomly assigned to one of four diets that varied in the amount and source of protein. The data are stored in the file Snedecor_dat.txt, which can be retrieved as described in Section 1.10. Verify the results based on the R function
9. For the data in the previous two exercises, perform all pairwise comparisons using the Harrell–Davis estimate of the median.
8. For the data in the previous exercise, compare the groups using both the Rust–Fligner and the Brunner–Dette–Munk method.
7. Suppose three different drugs are being considered for treating some disorder, and it is desired to check for side effects related to liver damage. Further suppose that the following data are collected on 28 participants. The values under the columns headed by ID indicate which of the three drugs
6. Using the data from the previous two exercises, compare the 20% trimmed means of the experimental group to the control, taking into account grade. Also test for no interactions using lincon and linconb. Is there reason to suspect that the confidence interval returned by linconbt will be longer
5. Using the data in the previous exercise, use the function lincon to compare the experimental group to the control group, taking into account grade and the two tracking abilities. (Again, tracking abilities 2 and 3 are combined.) Comment on whether the results support the conclusion that the
4. Some psychologists have suggested that teachers’ expectancies influence intellectual functioning. The file VIQ.dat contains pretest verbal IQ scores for students in grades 1 and 2 who were assigned to one of three ability tracks. (The data are from Elashoff and Snow, 1970, and originally
3. From well-known results on the random effects model (e.g., Graybill, 1976; Jeyaratnam and Othman, 1985), it follows that Use these results to derive an alternative estimate of ρWI.
2. If data are generated from exponential distributions, what problems would you expect in terms of probability coverage when computing confidence intervals? What problems with power might arise?
1. Describe how M-measures of location might be compared in a two-way design with a percentile bootstrap method. What practical problem might arise when using the bootstrap and sample sizes are small?
18. This exercise deals with the likelihood of a correct decision when using classification method C3 in Section 6.16. Imagine that training data contain 100 individuals from the first of two groups and 200 from the other. Further imagine that the test data consist of 10 individuals from the first
16. Argue that when testing Eq. (6.27), this provides a metric-free method for comparing groups based on scatter.
17. The goal is to run simulations to compare the mean squared error and bias of two estimators designed for functional data. Here is some R code for generating data according to a
15. For the EEG data, compare the two groups with the method in Section 6.11.
14. For the EEG data, compare the two groups with the method in Section 6.10.
13. For the EEG data, compare the two groups with the method in Section 6.9.
12. For the EEG data used in Exercise 1, compare the two groups with the method in Section 6.7.
11. For the cork boring data mentioned in Section 5.9.10, imagine that the goal is to compare the north, east, and south sides to the west side. How might this be done with the software in Section 6.6.1? Perform the analysis and comment on the results. (The data are stored in the file corkall.dat;
10. The file read.dat contains data from a reading study conducted by L. Doi. Columns 4 and 5 contain measures of digit naming speed and letter naming speed. Use both the relplot and the MVE method to identify any outliers. Compare the results and comment on any discrepancies.
9. The MVE method of detecting outliers, described in Section 6.4.3, could be modified by replacing the MVE estimator of location with the Winsorized mean, and replacing the covariances with the Winsorized covariances described in Section 5.9.3. Discuss how this would be done and its relative
8. The average LSAT scores (X) for the 1973 entering classes of 15 American law schools and the corresponding grade point averages (Y) are as follows:
X: 576 635 558 578 666 580 555 661 651 605 653 575 545 572 594
Y: 3.39 3.30 2.81 3.03 3.44 3.07 3.00 3.43 3.36 3.13 3.12 2.74 2.76 2.88 2.96
Use a
7. Give a general description of a situation where for n = 20, the minimum depth among all points is 3/20.
6. Suppose that for each row of an n-by-p matrix, its depth is computed relative to all n points in the matrix. What are the possible values that the depths might be?
5. Repeat the last two exercises, but now use the cork data mentioned in Section 6.7.3.
4. Repeat the last exercise using the data for group 2.
3. For the data used in the last two exercises, check for outliers among the first group using the methods in Section 6.4. Comment on why the number of outliers found differs among the methods.
2. Repeat the last exercise using the data for group 2. These are the measures in the last four columns.
1. For the EEG data mentioned in Section 6.3.5, compute the MVE, MCD, OP, and the Donoho–Gasko 0.2 trimmed mean for group 1. This corresponds to the first four measures, which are based on murderers.
16. Section 5.9.15 described a method for comparing the variances of two dependent variables. It was noted that when distributions differ in shape, the method can fail to control the Type I error probability. Execute the following R commands to get a sense of why:
15. Using R, generate 30 observations from a standard normal distribution and store the values in x. Generate 20 observations from a chi-squared distribution with one degree of freedom, and store them in z. Compute y=4(z-1), so x and y contain data sampled from distributions having identical
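The same data-generating step sketched in Python with numpy (a hedged analogue of the R commands in the exercise; the seed is an illustrative assumption). Note that both x and y have population mean 0, since E[z] = 1 for a chi-squared variable with one degree of freedom, but y is bounded below by −4 and heavily skewed:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(30)           # 30 obs, standard normal
z = rng.chisquare(df=1, size=20)      # 20 obs, chi-squared, 1 df
y = 4 * (z - 1)                       # population mean 0, like x, but skewed
```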
14. Let D = X − Y , let θD be the population median associated with D, and let θX and θY be the population medians associated with X and Y , respectively. Verify that under general conditions, θD = θX − θY .
13. The file tumor.dat contains data on the number of days to occurrence of a mammary tumor in 48 rats injected with a carcinogen and subsequently randomized to receive either the treatment or the control. The data were collected by Gail et al. (1980) and represent only a portion of the results
12. Continuing the last exercise, examine a boxplot of the data. What would you expect to happen if the 0.95 confidence interval is computed using a bootstrap-t method? Verify your answer using the R function yuenbt.
11. The file pyge.dat (see Section 1.8) contains pretest reasoning IQ scores for students in grades 1 and 2 who were assigned to one of three ability tracks. (The data are from Elashoff and Snow, 1970, and originally collected by R. Rosenthal.) The file pygc.dat contains data for a control group.
10. Section 5.9.6 used some hypothetical data to illustrate the R function yuend with 20% trimming. Use the function to compare the means. Verify that the estimated standard error of the difference between the sample means is smaller than the standard error of the difference between the 20% trimmed
9. The example at the end of Section 5.3.3 examined some data from an experiment on the effects of drinking alcohol. Another portion of the study consisted of measuring the effects of alcohol over 3 days of drinking. The scores for the control group, for the first 2 days of drinking, are 4, 47, 35,
8. Compute a confidence interval for p using Salk’s data.
7. Apply the Yuen–Welch method using Salk’s data, where the amount of trimming is 0, 0.05, 0.1, and 0.2. Compare the estimated standard errors of the difference between the trimmed means.
6. Verify that if X and Y are independent, the third moment about the mean of X − Y is μx[3] − μy[3].
5. Compare the deciles only, using the Harrell–Davis estimator and Salk’s data.
4. Consider two independent groups having identical distributions. Suppose four observations are randomly sampled from the first and three from the second. Determine P(D = 1) and P(D = 0.75), where D is given by Eq. (5.4). Verify your results with the R function kssig.
3. Summarize the relative merits of using the weighted versus unweighted Kolmogorov–Smirnov test. Also discuss the merits of the Kolmogorov–Smirnov test relative to comparing measures of location.
2. For the ozone data stored in the file rat_data.txt, compare the two groups using the weighted Kolmogorov–Smirnov test. Plot the shift function and its 0.95 confidence band. Compare the results with the unweighted test.
1. For Salk’s data stored in the file salk_dat.txt, compare the two groups using the weighted Kolmogorov–Smirnov test. Plot the shift function and its 0.95 confidence band. Compare the results with the unweighted test.
12. Generate 20 observations from a g-and-h distribution with g = h = 0.5. (This can be done with the R function ghdist, written for this book.) Examine a boxplot of the data. Repeat this 10 times. Comment on the strategy of examining a boxplot to determine whether the confidence interval for the
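ghdist is an R function written for the book. The transformation it implements is the standard Tukey g-and-h: X = ((e^{gZ} − 1)/g) e^{hZ²/2} for Z standard normal, with X = Z e^{hZ²/2} when g = 0. A Python sketch assuming that definition:

```python
import numpy as np

def ghdist(n, g=0.0, h=0.0, rng=None):
    """Sample n values via the standard Tukey g-and-h transform.
    A sketch; Wilcox's R function of the same name is the reference."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(n)
    if g == 0.0:
        x = z                              # h-only case: no skewing
    else:
        x = (np.exp(g * z) - 1.0) / g      # g controls skewness
    return x * np.exp(h * z * z / 2.0)     # h controls tail heaviness

sample = ghdist(20, g=0.5, h=0.5, rng=np.random.default_rng(2))
```

With g = h = 0.5 the distribution is both heavily skewed and heavy-tailed, which is the point of the boxplot exercise: small samples from it often show few or no flagged outliers despite the extreme tails.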
11. For the LSAT data in Table 4.3, compute a 0.95 bootstrap-t confidence interval for the mean using the R function trimcibt with plotit=T. Note that a boxplot finds no outliers. Comment on the plot created by trimcibt in terms of achieving accurate probability coverage when using Student’s t. What
10. Verify Eq. (4.5) using the decision rule about whether to reject H0 described in Section 4.4.3.
9. Discuss the relative merits of using the R function sint versus qmjci and hdci.
8. For the exponential distribution, would the sample median be expected to have a relatively high or low standard error? Compare your answer to the estimated standard error obtained with data generated from the exponential distribution.
7. Do the skewness and kurtosis of the exponential distribution suggest that the bootstrap-t method will provide a more accurate confidence interval for μt versus the confidence interval given by Eq. (4.3)?
6. If the exponential distribution has variance μ[2] = σ², then μ[3] = 2σ³ and μ[4] = 9σ⁴. Determine the skewness and kurtosis. What does this suggest about getting an accurate confidence interval for the mean?
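Given the moments quoted, the answer follows directly: skewness = μ[3]/σ³ = 2 and kurtosis = μ[4]/σ⁴ = 9 (versus 3 for the normal), both scale-free. A trivial check in Python:

```python
# moments quoted in the exercise; skewness and kurtosis are scale-free,
# so take sigma = 1 without loss of generality
sigma = 1.0
mu3 = 2 * sigma**3
mu4 = 9 * sigma**4

skewness = mu3 / sigma**3   # = 2: strongly right-skewed
kurtosis = mu4 / sigma**4   # = 9: much heavier-tailed than the normal (3)
```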