Question:

In Exercise 34.8 we saw a tactic that researchers can use to encourage honest answers to sensitive questions, for example when investigating drug use among teenagers. The answer of each respondent is known, but not the question to which they are responding. Defining $X$ to be the number of "yes" answers, we found that $X \sim \text{Binomial}\left(n, \frac{p+1}{3}\right)$, where $n$ is the number of respondents and $p$ is the rate of illegal drug use that we want to estimate. To simplify notation, define $s = \frac{p+1}{3}$.

a. Using the formula for the MLE of a binomial parameter, write down the maximum likelihood estimator $\hat{s}$ of $s$.

b. Write down an expression for $\mathrm{Var}(\hat{s})$ in terms of the unknown value $s$.
Because p is a simple function of s, the value of p that maximises the likelihood is easily found from the value of s that maximises the likelihood.

c. Let $\hat{p}$ be the maximum likelihood estimator of $p$. Using your answers to (a) and (b), write down $\hat{p}$ in terms of $\hat{s}$. Hence also write down $\mathrm{Var}(\hat{p})$ in terms of $\mathrm{Var}(\hat{s})$, and $\mathrm{sd}(\hat{p})$ in terms of $\mathrm{sd}(\hat{s})$.

d. Given that $X = 216$ out of $n = 600$ pupils answer "yes" to the question on their card, find $\hat{s}$ and $\mathrm{se}(\hat{s})$, and hence find $\hat{p}$ and $\mathrm{se}(\hat{p})$. Use these results to specify an approximate 95% confidence interval for $p$.
If you get a confidence limit < 0 or > 1 for a parameter like p that can only take values between 0 and 1, just quote the result at the corresponding boundary, 0 or 1.

e. Comment on your confidence interval for $p$. Is $\hat{p}$ a precise estimator?

f. You should have found in part (d) that $\hat{p} = 0.08$. Imagine that, instead of the indirect sampling performed here, there is some way of asking students directly whether they are using illegal drugs, and getting honest answers. Suppose you ask the same 600 students and receive 48 "yes" responses, so your estimate of $p$ remains at 0.08. What is the approximate 95% confidence interval under the direct sampling scheme?

g. Comment on the difference in precision between the estimators under the indirect sampling scheme and the direct sampling scheme. Why do you think this happens? Which sampling scheme would you prefer to use, assuming both of them could be implemented effectively?
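
The arithmetic behind parts (d) and (f) can be checked with a short Python sketch. It rests on assumptions that the question implies rather than states: that $s = \frac{p+1}{3}$, so $p = 3s - 1$, and that the intervals use the usual normal approximation with multiplier 1.96. The helper `approx_interval` and all variable names are illustrative only, not taken from the exercise.

```python
import math


def approx_interval(estimate, se, z=1.96):
    """Approximate 95% interval, clipped to [0, 1] as the hint to part (d) suggests."""
    lower = max(0.0, estimate - z * se)
    upper = min(1.0, estimate + z * se)
    return lower, upper


n = 600

# Indirect (randomized-response) scheme: X = 216 "yes" answers.
x_indirect = 216
s_hat = x_indirect / n                          # binomial MLE of s
se_s = math.sqrt(s_hat * (1 - s_hat) / n)       # estimated standard error of s_hat
p_hat_indirect = 3 * s_hat - 1                  # assumes p = 3s - 1
se_p_indirect = 3 * se_s                        # scaling by 3 scales the sd by 3
ci_indirect = approx_interval(p_hat_indirect, se_p_indirect)

# Direct scheme: 48 honest "yes" answers from the same 600 students.
x_direct = 48
p_hat_direct = x_direct / n
se_p_direct = math.sqrt(p_hat_direct * (1 - p_hat_direct) / n)
ci_direct = approx_interval(p_hat_direct, se_p_direct)

print(f"indirect: p_hat={p_hat_indirect:.3f}, se={se_p_indirect:.4f}, CI={ci_indirect}")
print(f"direct:   p_hat={p_hat_direct:.3f}, se={se_p_direct:.4f}, CI={ci_direct}")
```

Under these assumptions both schemes give the same point estimate, but the standard error under the indirect scheme comes out several times larger, which is the contrast part (g) asks about.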
