Questions and Answers of Statistical Techniques in Business
7.18. Outline extensions of Problem 7.16 to multivariate monotone patterns where the factored likelihood method works.
7.17. For the model of Problem 7.16, consider now the reverse monotone missing-data pattern, with Y completely observed but n − r values of X missing, and an ignorable mechanism. Does the factored
7.16. (i) Consider the following simple form of the discriminant analysis model for bivariate data with binary X and continuous Y: (a) X is Bernoulli with Pr(X = 1) = 1 − Pr(X = 0) = π, and
7.15. If data are MAR and the data analyst discards values to yield a data set with all complete-data factors, then are the resultant missing data necessarily MAR? Provide an example to illustrate
7.14. Create a factorization table (see Rubin, 1974) for the data in Example 7.11. State why the estimates produced in Example 7.11 are ML.
7.13. Estimate the parameters of the distribution of X1, X2, X3, and X5 in Example 7.8, pretending X4 is never observed. Would the calculations be more or less work if X3 rather than X4 was never
7.12. Show how to compute partial correlations and multiple correlations using SWP.
7.11. Show that RSW is the inverse operation to SWP.
7.10. Prove that SWP is commutative and conclude that the order in which a set of sweeps is taken is irrelevant algebraically. (However, it can be shown that the order can matter for computational
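As a concrete reference for the three sweep-operator problems above (7.10–7.12), here is a minimal numerical sketch of SWP and its inverse RSW, following the standard definition (pivoting a symmetric matrix on one diagonal position); the function names and the test matrix are mine.

```python
import numpy as np

def swp(G, k):
    """Sweep the symmetric matrix G on diagonal position k."""
    G = np.asarray(G, dtype=float)
    H = G.copy()
    p = G[k, k]
    other = [j for j in range(G.shape[0]) if j != k]
    H[k, k] = -1.0 / p
    for j in other:
        H[j, k] = H[k, j] = G[j, k] / p
        for l in other:
            H[j, l] = G[j, l] - G[j, k] * G[k, l] / p
    return H

def rsw(H, k):
    """Reverse sweep: rsw(swp(G, k), k) recovers G (Problem 7.11)."""
    H = np.asarray(H, dtype=float)
    G = H.copy()
    p = H[k, k]
    other = [j for j in range(H.shape[0]) if j != k]
    G[k, k] = -1.0 / p
    for j in other:
        G[j, k] = G[k, j] = -H[j, k] / p
        for l in other:
            G[j, l] = H[j, l] - H[j, k] * H[k, l] / p
    return G

A = np.array([[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 5.0]])
assert np.allclose(rsw(swp(A, 0), 0), A)                  # RSW undoes SWP
assert np.allclose(swp(swp(A, 0), 1), swp(swp(A, 1), 0))  # sweeps commute
assert np.allclose(swp(swp(swp(A, 0), 1), 2), -np.linalg.inv(A))  # full sweep
```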
7.9. Using the computer or otherwise, generate a bivariate normal sample of 20 cases with parameters μ1 = μ2 = 0, σ11 = σ12 = 1, σ22 = 2, and delete values of Y2 so that Pr(y2 missing | y1, y2)
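A sketch of the simulation in Problem 7.9. The missingness probability is cut off above, so the logistic MAR mechanism below is a hypothetical stand-in, not the problem's specification.

```python
import numpy as np

rng = np.random.default_rng(20)
n = 20
mean = [0.0, 0.0]                        # mu1 = mu2 = 0
cov = [[1.0, 1.0], [1.0, 2.0]]           # sigma11 = sigma12 = 1, sigma22 = 2
y = rng.multivariate_normal(mean, cov, size=n)
y1, y2 = y[:, 0], y[:, 1]

# Hypothetical MAR mechanism depending on y1 only (assumed, since the
# problem's Pr(y2 missing | y1, y2) is truncated in the listing above).
p_miss = 1.0 / (1.0 + np.exp(-y1))
y2_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y2)
```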
7.8. Show that the factorization of Example 7.1 does not yield distinct parameters {φj} for a bivariate normal sample with means (μ1, μ2), correlation ρ and common variance σ², with missing values
7.7. Show that for the set-up of Problem 7.6, the estimate of β12·2 obtained by maximizing the complete-data loglikelihood over parameters and missing data is β̃12·2 = β̂12·2 σ̂22/σ̂*22, where (in
7.6. Compute the large-sample variance of β̂12·2 in Problem 7.5, and compare with the variance of the complete-case estimate, assuming MCAR.
7.5. For the bivariate normal distribution, express the regression coefficient β12·2 of Y1 on Y2 in terms of the parameters φ in Section 7.2, and hence derive its ML estimate for the data in Example
7.4. Prove the six results on Bayes inference for monotone bivariate normal data after Eq. (7.17) in Section 7.3. (For help, see Chapter 2 of Box and Tiao (1973), and the material in Section 6.1.4.)
7.3. Compare the asymptotic variance of μ̂2 − μ2 given by (7.13) and (7.14) with the small-sample variance computed in Problem 7.2.
7.2. Assume the data in Example 7.1 are MCAR. By first conditioning on (y11, …, yn1), find the exact small-sample variance of μ̂2. (Hint: If u is chi-squared on d degrees of freedom, then
7.1. Assume the data in Example 7.1 are MAR. Show that given (y11, …, yn1), β̂20·1 and β̂21·1 are unbiased for β20·1 and β21·1. Hence show that μ̂2 is unbiased for μ2.
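Problems 7.1–7.5 above all build on the factored-likelihood ML estimate of μ2 for monotone bivariate normal data (Example 7.1): regress Y2 on Y1 from the r complete cases, then evaluate at the Y1 mean from all n cases. A minimal sketch, assuming Y1 fully observed and NaN marking missing Y2:

```python
import numpy as np

def monotone_ml_mu2(y1, y2):
    """ML estimate of mu2 for bivariate normal monotone data:
    mu2_hat = b20.1 + b21.1 * mean(Y1 over all n cases)."""
    obs = ~np.isnan(y2)
    y1_r, y2_r = y1[obs], y2[obs]
    # regression of Y2 on Y1 fitted from the complete cases only
    b21_1 = np.cov(y1_r, y2_r, bias=True)[0, 1] / np.var(y1_r)
    b20_1 = y2_r.mean() - b21_1 * y1_r.mean()
    # adjust using the Y1 mean from ALL n cases, not just the r complete ones
    return b20_1 + b21_1 * y1.mean()
```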
6.20. The definition of MAR can depend on how the complete data are defined. Suppose that X = (x1, …, xn) is an iid random sample, Z = (z1, …, zn) are completely unobserved latent
6.19. Describe ML estimates of the parameters in Example 6.24 under (a) the fixed effects model of Eq. (6.56) with fixed parameters ({μi}, σ²) and (b) the random effects model of Eqs.
6.18. Suppose that given sets of (possibly overlapping) covariates X1 and X2, yi1 and yi2 are bivariate normal with means xi1β1 and xi2β2, variances σ1² and σ2² = 1, and correlation ρ. The data
6.17. For a bivariate normal sample on (Y1, Y2) with parameters θ = (μ1, μ2, σ11, σ12, σ22) and values of Y2 missing, state for the following missing-data mechanisms whether (i) the data are MAR,
6.16. Find large-sample variance estimates for the two ML estimates in Example 6.22.
6.15. Suppose the following data are a random sample of n = 7 from the Cauchy distribution with median θ: Y = (−4.2, −3.2, −2.0, 0.5, 1.5, 1.5, 3.5). Compute and compare 90% intervals for θ using
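A sketch of one likelihood-based interval for Problem 6.15. The minus signs on the first three data values are reconstructed from garbled text, and the grid/likelihood-ratio approach is one of several methods the problem may intend.

```python
import numpy as np

# Data for Problem 6.15 (signs on the first three values reconstructed)
y = np.array([-4.2, -3.2, -2.0, 0.5, 1.5, 1.5, 3.5])

theta = np.linspace(-10, 10, 20001)
# Cauchy (unit scale) log-likelihood over a grid, up to an additive constant
loglik = -np.log1p((y[:, None] - theta) ** 2).sum(axis=0)
mle = theta[loglik.argmax()]

# Likelihood-ratio 90% region: keep theta with 2*(max - loglik) <= 2.706,
# the 0.90 quantile of chi-squared on 1 df.  The Cauchy likelihood can be
# multimodal, so the region may be a union of intervals; min/max bound it.
inside = 2 * (loglik.max() - loglik) <= 2.706
print(mle, theta[inside].min(), theta[inside].max())
```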
6.14. Derive the modifications of the posterior distributions in Eqs. (6.30)–(6.32) for weighted linear regression, discussed at the end of Example 6.16. Show that for the special case of weighted linear regression with zero intercept (β0 = 0), a single covariate X, and weight wi = xi for observation i, the ratio estimator ȳ/x̄ is (a) the ML estimate of β1, and (b) the posterior mean of β1 when the prior distribution
6.13. Derive the posterior distributions in Eqs. (6.30–6.32) for Example 6.16.
6.12. Derive the posterior distributions in Eqs. (6.22)–(6.24) for Example 6.15.
6.11. In Example 6.14, show that for large n, LR = t².
6.10. Show that, for random samples from "regular" distributions (differentials can be passed through the integral), the expected squared score function equals the expected information.
6.9. For the distributions of Problem 6.1, calculate the observed information and the expected information.
6.8. Summarize the theoretical and practical differences between the frequentist and Bayesian interpretation of Approximation 6.1. Which is closer to the direct likelihood interpretation?
6.7. Show, by similar arguments to those in Problem 6.6, that for the model of Eq. (6.9), Var(yi | δi, φ) = φ b″(δi), where δi = δ(xi, β), and double prime denotes differentiation twice with
6.6. Show that for the GLIM model of Eq. (6.9), E(yi | xi, β) = b′[δ(xi, β)], where prime denotes differentiation with respect to the function argument. Conclude that the canonical link Eq.
6.5. Suppose the data are a random sample of size n from the uniform distribution between 0 and θ, θ > 0. Show that the ML estimate of θ is the largest data value. (Hint: differentiation of the score
6.4. (a) Relate ML and least squares estimates for the model of Example 6.10. (b) Show that if the data are iid with the Laplace (double exponential) distribution, f(yi | θ) = 0.5 exp(−|yi − θ|)
6.3. For a univariate normal sample, find the ML estimate of the coefficient of variation, σ/μ.
6.2. Find the score function for the distributions in Problem 6.1. Which have closed-form ML estimates? Find the ML estimates for those distributions that have closed-form estimates.
6.1. Write the likelihood function for an iid sample from the (a) beta distribution; (b) Poisson distribution; (c) Cauchy distribution.
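For reference, the three complete-data likelihoods asked for in Problem 6.1, written for an iid sample y1, …, yn (assuming the usual two-parameter beta and a unit-scale Cauchy):

```latex
L(\alpha,\beta \mid y) = \prod_{i=1}^{n} \frac{y_i^{\alpha-1}(1-y_i)^{\beta-1}}{B(\alpha,\beta)},
\qquad
L(\lambda \mid y) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{y_i}}{y_i!},
\qquad
L(\theta \mid y) = \prod_{i=1}^{n} \frac{1}{\pi\{1+(y_i-\theta)^2\}}.
```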
5.14. Is MI better than single imputation of a draw from the predictive distribution of the missing values (SI) because (i) It yields more efficient estimates from the filled-in data? (ii) Unlike SI,
5.13. Is multiple imputation (MI) better than imputation of a mean from the conditional distribution of the missing value because (i) It yields more efficient estimates from the filled-in data? (ii)
5.12. (a) Modify the multiple imputation approach of Problem 5.11 to give the correct inferences for large r and N/r. Hint: For example, add sR r^(−1/2) zd to the imputed values for the dth
5.11. Suppose multiple imputations are created using the method of Problem 5.10 D times, and let ȳ(d) and U(d) be the values of ȳ and U for the dth imputed data set. Let ȳ∗ = Σ
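Problems 5.11–5.12 use Rubin's rules for combining the D completed-data analyses; a minimal sketch (the function name is mine):

```python
import numpy as np

def mi_combine(estimates, variances):
    """Rubin's combining rules for D multiply-imputed analyses."""
    est = np.asarray(estimates, dtype=float)   # ybar(d), d = 1..D
    var = np.asarray(variances, dtype=float)   # U(d),    d = 1..D
    D = len(est)
    qbar = est.mean()                 # combined point estimate
    W = var.mean()                    # average within-imputation variance
    B = est.var(ddof=1)               # between-imputation variance
    T = W + (1 + 1 / D) * B           # total variance
    return qbar, T
```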
5.10. Suppose in Problem 5.9, imputations are randomly drawn with replacement from the r respondents' values. (a) Show that ȳ is unbiased for the population mean Ȳ. (b) Show that conditional on
5.9. Consider a simple random sample of size n with r respondents and m = n − r nonrespondents, and let ȳR and s²R be the sample mean and variance of the respondents' data, and ȳNR and s²NR
5.7. Repeat Problem 5.4 or 5.5 using
5.6. Repeat Problem 5.4 or 5.5 for D = 2, 5, 10, 20, and 50 multiple imputes, and compare answers. For what value of D does the inference stabilize?
5.5. As discussed in Section 5.4, the imputation method in Problem 5.4 is improper, since it does not propagate the uncertainty in the regression parameter estimates. One way of making it proper is
5.4. For the data in Problem 5.3, create 10 multiply-imputed data sets with different sets of conditional draws of the parameters, using the method of Problem 5.3. Compute 90% confidence intervals
5.3. Repeat Problem 5.2, with the same observed data, but with missing values imputed using conditional draws rather than conditional means. That is, add a random normal deviate with mean zero and
5.2. Create missing values of Y2 for the data in Problem 5.1 by generating a latent variable U with values ui = 2*(yi1 − 1) + zi3, where zi3 is a standard normal deviate, and setting yi2 as
5.1. As in Problem 1.6, generate 100 bivariate normal observations {(yi1, yi2), i = 1, …, 100} on (Y1, Y2) as follows: yi1 = 1 + zi1, yi2 = 5 + 2*zi1 + zi2,
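A sketch of the data generation in Problems 5.1–5.2 above. The cutoff rule in 5.2 is truncated in the listing, so setting yi2 missing when ui < 0 is an assumed choice.

```python
import numpy as np

rng = np.random.default_rng(51)
z = rng.standard_normal((100, 3))      # zi1, zi2, zi3 iid N(0, 1)

y1 = 1 + z[:, 0]                       # Problem 5.1
y2 = 5 + 2 * z[:, 0] + z[:, 1]

u = 2 * (y1 - 1) + z[:, 2]             # latent variable of Problem 5.2
# Assumed cutoff (the problem statement is truncated above): yi2 is
# set missing whenever ui < 0.
y2_mis = np.where(u < 0, np.nan, y2)
```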
4.15. For the artificial data sets generated for Problem 1.6, compute and compare estimates of the mean and variance of Y, from the following methods: (a) Complete-case analysis; (b) Buck's method,
4.14. Outline a situation where the "Last Observation Carried Forward" method of Example 4.11 gives poor estimates. (See, for example, Little and Yau, 1996).
4.13. Which of the metrics in Example 4.9 give the best imputations for a particular outcome Y? Propose an extension of the predictive mean matching metric to handle a set of missing outcomes Y.
4.12. Another method for generating imputations is the sequential hot deck, where responding and nonresponding units are treated in a sequence, and a missing value of Y is replaced by the nearest
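One plausible reading of the sequential hot deck in Problem 4.12, replacing each missing value with the nearest preceding respondent's value; a sketch, not the book's exact specification.

```python
import numpy as np

def sequential_hot_deck(y):
    """Process units in order; impute each missing y from the most
    recent responding unit (nearest preceding donor)."""
    y = np.asarray(y, dtype=float).copy()
    last_donor = np.nan                # no donor before the first respondent
    for i in range(len(y)):
        if np.isnan(y[i]):
            y[i] = last_donor          # carry the last observed value forward
        else:
            last_donor = y[i]
    return y

# sequential_hot_deck([3.0, np.nan, 5.0, np.nan, np.nan]) -> [3, 3, 5, 5, 5]
```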
4.11. Consider a hot deck like that of Example 4.7, except that imputations are by random sampling of donors without replacement. To define the procedure when there are fewer donors than recipients,
4.10. Derive the expressions (4.6)–(4.8) for the simple hot deck where imputations are by simple random sampling with replacement. Show that the proportionate variance increase of ȳHDI over ȳ is at
4.9. Derive the expressions for large-sample variance of the Cmean and Cdraw estimates in the discussion of Example 4.5.
4.8. Derive the expressions for large-sample bias in Table 4.1.
4.7. Buck's method (Example 4.3) might be applied to data with both continuous and categorical variables, by replacing the categorical variables by a set of dummy variables, numbering one less than
4.6. Show that Buck's (1960) method yields consistent estimates of the means when the data are MCAR and the distribution of the variables has finite fourth moments.
4.5. Suppose data are an incomplete random sample on Y1 and Y2, where Y1 given θ = (μ1, σ11, β20·13, β21·13, β23·13, σ22·13) is N(μ1, σ11) and Y2 given Y1 and θ is N(β20·13 + β21·13 Y1 +
4.4. Derive the expressions for the biases of Buck's (1960) estimators of σjj and σjk, stated in Example 4.3.
4.3. Describe the circumstances where Buck's (1960) method clearly dominates both complete-case and available-case analysis.
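Several problems above (4.3–4.7, 4.15) concern Buck's (1960) method. A minimal bivariate sketch, imputing conditional means from the complete-case regression; note the filled-in variance is biased low, which is the point of Problems 4.2 and 4.4.

```python
import numpy as np

def buck_bivariate(y1, y2):
    """Buck's method for bivariate data with Y2 partly missing: regress
    Y2 on Y1 using the complete cases, fill in conditional means, then
    estimate moments from the completed data."""
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    cc = ~np.isnan(y2)                         # complete cases
    b1 = np.cov(y1[cc], y2[cc])[0, 1] / np.var(y1[cc], ddof=1)
    b0 = y2[cc].mean() - b1 * y1[cc].mean()
    y2_filled = np.where(np.isnan(y2), b0 + b1 * y1, y2)
    # mean is consistent under MCAR; the naive variance understates sigma22
    return y2_filled.mean(), np.var(y2_filled, ddof=1)
```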
4.2. Assuming MCAR, determine the percentage bias of estimates of the following quantities computed from the filled-in data: (a) the variance of Y1 (σ11); (b) the covariance of Y1 and Y2 (σ12);
4.1. Consider a bivariate sample with n = 45, r = 20 complete cases, 15 cases with only Y1 recorded, and 10 cases with only Y2 recorded. The data are filled in using unconditional means, as in
3.18. Review the results of Haitovsky (1968), Kim and Curry (1977), and Azen and Van Guilder (1981). Describe situations where complete-case analysis is more sensible than available-case analysis,
3.17. Consider the relative merits of complete-case analysis and available-case analysis for estimating (a) means, (b) correlations, and (c) regression coefficients when the data are not MCAR.
3.16. (a) Why does the estimated correlation (3.19) always lie in the range (−1, 1)? (b) Suppose the
3.15. Construct a data set where the estimated correlation (3.18) lies outside the range (−1, 1).
3.14. For the data in Problem 3.13, compute the odds ratio of response rates discussed in Problem 3.12. Repeat the computation with the respondent counts 5 and 8 in the second row of (b) in Problem 3.13 interchanged. By comparing
3.13. Compute raked estimates of the class counts from the sample counts and respondent counts in (a) and (b) below, using population marginal counts in (c): (a) sample {njl}; (b) respondent {rjl}; (c)
3.12. Show that raking the class sample sizes and raking the class respondent sample sizes (as in the previous example) yield the same answer if and only if πij πkl/(πil πjk) = 1, for all i, j, k
3.11. Oh and Scheuren (1983) propose an alternative to the raked estimate ȳrake in Section 3.3.3, where the estimated counts N*jl are found by raking the respondent sample sizes {rjl} instead of
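A sketch of the raking (iterative proportional fitting) step behind Problems 3.11–3.13 above, scaling a two-way count table to given margins; the function name and convergence tolerance are mine.

```python
import numpy as np

def rake(counts, row_margins, col_margins, iters=100, tol=1e-10):
    """Iterative proportional fitting: alternately rescale rows and
    columns of a positive count table until both margins match."""
    x = np.asarray(counts, dtype=float).copy()
    row_margins = np.asarray(row_margins, dtype=float)
    col_margins = np.asarray(col_margins, dtype=float)
    for _ in range(iters):
        x *= (row_margins / x.sum(axis=1))[:, None]   # match row margins
        x *= col_margins / x.sum(axis=0)              # match column margins
        if np.allclose(x.sum(axis=1), row_margins, atol=tol):
            break
    return x
```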
3.10. Generalize the response propensity method in Example 3.7 to a monotone pattern of missing data. (See Little, 1986; Robins, Rotnitzky and Zhao, 1995).
3.9. The following table shows respondent means of an incomplete variable Y (say, income in $1000), and response rates (respondent sample size / sample size), classified by three fully observed
3.8. Apply the Cassel, Särndal and Wretman (1983) estimator discussed in Example 3.7 to the data of Problem 3.7. Comment on the resulting weights as compared with those of the weighting class
3.7. Calculate Horvitz–Thompson and weighting class estimators in the following artificial example of a stratified random sample, where the xi and yi values displayed are observed, the selection
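A sketch of the Horvitz–Thompson estimator used in Problem 3.7. The problem's data and selection probabilities are truncated above, so the inputs here are generic; dividing by the sum of the weights instead of N would give the Hájek variant.

```python
import numpy as np

def horvitz_thompson_mean(y, pi, N):
    """Horvitz-Thompson estimate of a population mean: weight each
    observed unit by the inverse of its selection probability pi_i."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return (y / pi).sum() / N

# e.g. horvitz_thompson_mean(y=[10.0, 12.0], pi=[0.2, 0.5], N=20)
```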
3.6. Suppose census data yield the following age distribution for the county of interest in Problems 3.4 and 3.5: 20–30: 20%; 30–40: 40%; 40–50: 30%; 50–60: 10%. Calculate the
3.5. Compute the weighting class estimate (3.6) of the mean cholesterol level in the population and its estimated mean squared error (3.7). Hence construct an approximate 95% confidence interval for
3.4. Compute the mean cholesterol for the respondent sample and its standard error. Assuming normality, compute a 95% confidence interval for the mean cholesterol for respondents in the county. Can
3.3. Show that for dichotomous Y1 and Y2, the odds ratio based on complete cases is a consistent estimate of the population odds ratio if the logarithm of the probability of response
3.2. Show that if missingness (of Y1 or Y2) depends only on Y2, and Y1 has a linear regression on Y2, then the sample regression of Y1 on Y2 based on complete cases yields unbiased estimates of the
3.1. List some standard multivariate statistical analyses that are based on the sample means, variances, and correlations.
2.13. Carry out a standard ANOVA for the following data, where three values have been deleted from a (5 × 5) Latin square (Snedecor and Cochran, 1967, p. 313). Yields (grams) of plots of millet, with letters A–E denoting treatments:

Row\Column   1        2        3        4        5
1            B: —     E: 230   A: 279   C: 287   D: 202
2            D: 245   A: 283   E: 245   B: 280   C: 260
3            E: 182   B: —     C: 280   D: 246   A: 250
4            A: —     C: 204   D: 227   E: 193   B: 259
5            C: 231   D: 271   B: 266
2.12. Carry out the computations leading to the results of Example 2.3.
2.11. Carry out the computations leading to the results of Example 2.2.
2.10. Show Eq. (2.22) and then Eq. (2.23) and (2.24).
2.9. Justify Eqs. (2.17)–(2.20).
2.8. Carry out the computations leading to the results of Example 2.1.
2.7. Using the notation and results of Section 2.5.4, justify Eq. (2.16) and the method for calculating B and r that follows it.
2.6. Provide intermediate steps leading to Eqs. (2.13), (2.14), and (2.15).
2.5. Prove that Eq. (2.12) follows from the definition of U1.
2.4. Summarize the argument that Bartlett’s ANCOVA method leads to correct least squares estimates of missing values.
2.3. Outline the distributional results leading to Eq. (2.6) being distributed as F.
2.2. Prove that β̂ in Eq. (2.2) is (a) the least squares estimate of β, (b) the minimum variance unbiased estimate, and (c) the maximum likelihood estimate under normality. Which of these properties
2.1. Review the literature on missing values in ANOVA from Allan and Wishart (1930) through Dodge (1985).
1.6. One way to understand missing data mechanisms is to generate hypothetical complete data and then create missing values by specific mechanisms. This is common in simulation studies of
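A sketch of the mechanism taxonomy behind Problem 1.6, using the complete-data recipe repeated in Problem 5.1 above; the three deletion rules and their coefficients are illustrative choices, not the problem's specification.

```python
import numpy as np

rng = np.random.default_rng(16)
z = rng.standard_normal((100, 2))
y1 = 1 + z[:, 0]                       # complete data as in Problems 1.6/5.1
y2 = 5 + 2 * z[:, 0] + z[:, 1]

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Three illustrative rules for deleting Y2 (coefficients arbitrary):
mcar = rng.uniform(size=100) < 0.3               # MCAR: ignores the data
mar  = rng.uniform(size=100) < expit(y1 - 1)     # MAR: depends on observed Y1
mnar = rng.uniform(size=100) < expit(y2 - 5)     # MNAR: depends on missing Y2
```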
1.5. Let Y = (yij) be a data matrix and let M = (mij) be the corresponding missing-data indicator matrix, where mij = 1 indicates missing and mij = 0 indicates present. (a) Propose situations
1.4. What impact does the occurrence of missing values have on (a) estimates and (b) tests and confidence intervals for the analyses in Problem 1.2? For example, are estimates consistent for