This exercise is based on results in McNamee (2003) on the use of two-phase sampling to estimate disease prevalence

Question:

This exercise is based on results in McNamee (2003) on the use of two-phase sampling to estimate disease prevalence. An inexpensive, but possibly inaccurate, screening test for the disease is given in the phase I sample, an SRS of size n^(1). Let x_i = 1 if person i tests positive on the screening test and x_i = 0 if person i tests negative on the screening test. Persons are then classified into stratum 1 (x_i = 0) and stratum 2 (x_i = 1). The persons sampled in phase II are given a test for the presence of the disease that, for purposes of this exercise, is assumed to be 100% accurate: the phase II response is y_i = 1 if person i has the disease and y_i = 0 otherwise. We can write the population values in a contingency table:

                      Stratum 1 (x = 0)   Stratum 2 (x = 1)   Total
No disease (y = 0)    C11                 C12                 C1+
Disease (y = 1)       C21                 C22                 C2+
Total                 N1                  N2                  N

We wish to estimate p = ȳ_U = C2+/N from the two-phase sample; p1 = C21/N1 and p2 = C22/N2 are the proportions with the disease in strata 1 and 2, respectively.
a. Epidemiologists often use the concepts of specificity and sensitivity to assess a test for a disease, with S1 = Specificity = P(test is negative | disease absent) = C11/C1+ and S2 = Sensitivity = P(test is positive | disease present) = C22/C2+.
Show that

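The definitions above can be checked on a small example. The sketch below is only an illustration: the four counts are hypothetical (they do not come from the exercise), the first subscript of C is read as disease status and the second as screening result, as the definitions of p1, p2, S1 and S2 imply, and W1 = N1/N, W2 = N2/N are assumed to be the stratum proportions.

```python
from fractions import Fraction

# Hypothetical population counts, chosen only for illustration.
# First subscript: disease status (1 = absent, 2 = present).
# Second subscript: screening result (1 = negative, 2 = positive).
C11, C12 = 8550, 450
C21, C22 = 100, 900


def summarize(C11, C12, C21, C22):
    """Return prevalence, stratum proportions, specificity and sensitivity."""
    N1 = C11 + C21          # stratum 1: screened negative
    N2 = C12 + C22          # stratum 2: screened positive
    N = N1 + N2
    C1_plus = C11 + C12     # persons without the disease
    C2_plus = C21 + C22     # persons with the disease
    return {
        "p": Fraction(C2_plus, N),      # overall prevalence
        "p1": Fraction(C21, N1),        # prevalence in stratum 1
        "p2": Fraction(C22, N2),        # prevalence in stratum 2
        "S1": Fraction(C11, C1_plus),   # specificity
        "S2": Fraction(C22, C2_plus),   # sensitivity
        "W1": Fraction(N1, N),          # assumed notation: W_h = N_h / N
        "W2": Fraction(N2, N),
    }


stats = summarize(C11, C12, C21, C22)
for name, value in stats.items():
    print(f"{name} = {value} ({float(value):.4f})")
```

Exact fractions are used so that identities such as p = W1 p1 + W2 p2 can be compared without rounding error.
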
b. Suppose that the optimal allocation is used (see Section 12.5.1) and that 0

where R is the population Pearson correlation coefficient between x and y, given in (4.1). For the second term, first show that R S_y = p (S2 − W2)/√(W1 W2).
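Before proving the hint algebraically, it can help to confirm it numerically. The following sketch uses hypothetical counts (not from the exercise), assumes W1 = N1/N and W2 = N2/N, and takes S_y to be the population standard deviation with divisor N − 1. Under that convention the two sides differ by a factor of √(N/(N − 1)), so they agree only approximately for large N.

```python
import math

# Hypothetical counts: first subscript = disease status (1 absent, 2 present),
# second subscript = screening result (1 negative, 2 positive).
C11, C12, C21, C22 = 8550, 450, 100, 900


def r_times_sy(C11, C12, C21, C22):
    """Compute R * S_y directly from the 0/1 population values x and y."""
    N = C11 + C12 + C21 + C22
    x_mean = (C12 + C22) / N                 # proportion screened positive
    y_mean = (C21 + C22) / N                 # proportion with the disease
    sxx = N * x_mean * (1 - x_mean)          # sum of (x_i - x_mean)^2
    syy = N * y_mean * (1 - y_mean)          # sum of (y_i - y_mean)^2
    sxy = C22 - N * x_mean * y_mean          # sum of cross-products
    R = sxy / math.sqrt(sxx * syy)           # Pearson correlation
    S_y = math.sqrt(syy / (N - 1))           # assumed divisor N - 1
    return R * S_y


def closed_form(C11, C12, C21, C22):
    """Evaluate p (S2 - W2) / sqrt(W1 W2) from the same counts."""
    N = C11 + C12 + C21 + C22
    p = (C21 + C22) / N
    S2 = C22 / (C21 + C22)
    W1 = (C11 + C21) / N
    W2 = (C12 + C22) / N
    return p * (S2 - W2) / math.sqrt(W1 * W2)


lhs = r_times_sy(C11, C12, C21, C22)
rhs = closed_form(C11, C12, C21, C22)
print(f"R * S_y      = {lhs:.6f}")
print(f"closed form  = {rhs:.6f}")
```
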
c. Calculate the ratio of variances in (b) when S1 = S2 and R = min{S1 + S2 − 0.9, 0.95}, for S1 ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 0.95} and c^(1)/c^(2) ∈ {0.0001, 0.01, 0.1, 0.5, 1}. Display your results in a table. For which settings would you recommend two-phase sampling to estimate disease prevalence?
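One way to organize part (c) is to enumerate the settings first and plug in the variance ratio afterwards. The sketch below reads the correlation setting as R = min{S1 + S2 − 0.9, 0.95} with S1 = S2. The variance-ratio expression from part (b) is not reproduced here, so it is passed in as a caller-supplied function; `ratio_fn`, `build_table` and `format_table` are hypothetical names introduced for this illustration.

```python
S_VALUES = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]        # S1 = S2
COST_RATIOS = [0.0001, 0.01, 0.1, 0.5, 1]          # c(1) / c(2)


def correlation_setting(s1, s2):
    """R as specified in part (c): min{S1 + S2 - 0.9, 0.95}."""
    return min(s1 + s2 - 0.9, 0.95)


def build_table(ratio_fn):
    """Evaluate ratio_fn(R, cost_ratio) for every setting.

    ratio_fn is a placeholder for the expression derived in part (b);
    it is supplied by the caller and is not defined in this sketch.
    """
    rows = []
    for s in S_VALUES:
        r = correlation_setting(s, s)
        rows.append([ratio_fn(r, c) for c in COST_RATIOS])
    return rows


def format_table(rows):
    """Render the results with one row per S1 and one column per cost ratio."""
    header = "S1=S2     R  " + "".join(f"{c:>10g}" for c in COST_RATIOS)
    lines = [header]
    for s, row in zip(S_VALUES, rows):
        r = correlation_setting(s, s)
        cells = "".join(f"{v:>10.4f}" for v in row)
        lines.append(f"{s:>5.2f} {r:>5.2f}  " + cells)
    return "\n".join(lines)
```

Calling `print(format_table(build_table(my_ratio)))`, where `my_ratio(R, cost_ratio)` implements the result of part (b), prints the table. Entries below 1 indicate settings in which two-phase sampling has the smaller variance.
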
