I need help solving the following.
18. Exponential censoring. [Information Theory, Inference, and Learning Algorithms by David J. C. MacKay]. Unstable particles are emitted from a source and decay at a distance X ~ exp(λ), where λ is unknown. Scientists are interested in finding the mean decay distance, given by 1/λ. Their equipment is such that decay events can be observed only if they occur in a window extending from x = 1cm to x = 20cm.
(a) Let Z(λ) be the probability that an emitted particle decays in the window of detection. Find Z(λ) in terms of λ.
(b) A decay event is observed at location x. Find the likelihood f(x|λ). Hint: This is the probability that an observed decay event occurs at location x, given λ. Use (a).
(c) Suppose that based on earlier experiments, scientists believe that the mean decay distance 1/λ is equally likely to be anywhere between 5cm and 30cm. By transforming random variables, this corresponds to a prior for λ of f_Λ(λ) = 1/(25λ²) on [1/30, 1/5]. Over the course of a new experiment, 4 decay events are observed at locations {6, 11, 13, 14}. Find the posterior odds that the mean decay distance is greater than 10cm (i.e., λ ≤ 1/10). Express your answer as a ratio of two integrals (you do not need to evaluate these integrals; in practice you would hand them to a computer).

Exam 2 Practice 2, Spring 2014

9 NHST

19. z-test. Suppose we have 49 data points with sample mean 6.25 and sample variance 12. We want to test the following hypotheses:
H0: the data is drawn from a N(4, 10²) distribution.
HA: the data is drawn from N(μ, 10²) where μ ≠ 4.
(a) Test for significance at the α = 0.05 level. Use the tables at the end of this file to compute p-values.
(b) Draw a picture showing the null pdf, the rejection region and the area used to compute the p-value.

20. t-test. Suppose we have 49 data points with sample mean 6.25 and sample variance 36. We want to test the following hypotheses:
H0: the data is drawn from N(4, σ²), where σ is unknown.
HA: the data is drawn from N(μ, σ²) where μ ≠ 4.
(a) Test for significance at the α = 0.05 level. Use the t-table to find the p-value.
(b) Draw a picture showing the null pdf, the rejection region and the area used to compute the p-value for part (a).

21. There are lots of good NHST problems in psets 7 and 8 and the reading, including two-sample t-test, chi-square, ANOVA, and F-test for equal variance.

11. Twins. Suppose 1/3 of twins are identical and 2/3 of twins are fraternal. If you are pregnant with twins of the same sex, what is the probability that they are identical?

12. Dice. You have a drawer full of 4, 6, 8, 12 and 20-sided dice. You suspect that they are in proportion 1:2:10:2:1. Your friend picks one at random and rolls it twice, getting 5 both times.
(a) What is the probability your friend picked the 8-sided die?
(b) (i) What is the probability the next roll will be a 5? (ii) What is the probability the next roll will be a 15?

13. Sameer has two coins: one fair coin and one biased coin which lands heads with probability 3/4. He picks one coin at random (50-50) and flips it repeatedly until he gets a tails. Given that he observes 3 heads before the first tails, find the posterior probability that he picked each coin.
(a) What are the prior and posterior odds for the fair coin?
(b) What are the prior and posterior predictive probabilities of heads on the next flip? Here prior predictive means prior to considering the data of the first four flips.

6 Bayesian Updating: continuous prior, discrete likelihood

14. Peter and Jerry disagree over whether 18.05 students prefer Bayesian or frequentist statistics. They decide to pick a random sample of 10 students from the class and get Shelby to ask each student which they prefer. They agree to start with a prior f(θ) ~ beta(2, 2), where θ is the percent that prefer Bayesian.
(a) Let x1 be the number of people in the sample who prefer Bayesian statistics. What is the pmf of x1?
(b) Compute the posterior distribution of θ given x1 = 6.
(c) Use R to compute 50% and 90% probability intervals for θ. Center the intervals so that the leftover probability in both tails is the same.
(d) The maximum a posteriori (MAP) estimate of θ (the peak of the posterior) is given by θ̂ = 7/12, leading Jerry to concede that a majority of students are Bayesians. In light of your answer to part (c), does Jerry have a strong case?
(e) They decide to get another sample of 10 students and ask Neil to poll them. Write down in detail the expression for the posterior predictive probability that the majority of the second sample prefer Bayesian statistics. The result will be an integral with several terms. Don't bother computing the integral.

7 Bayesian Updating: discrete prior, continuous likelihood

15. Suppose that Alice is always X hours late to class and X is uniformly distributed on [0, θ]. Suppose that a priori, we know that θ is either 1/4 or 3/4, both equally likely. If Alice arrives 10 minutes late, what is the most likely value of θ? What if she had arrived 30 minutes late?

8 Bayesian Updating: continuous prior, continuous likelihood

16. Suppose that you have a cable whose exact length is θ. You have a ruler with known error normally distributed with mean 0 and variance 10^-4. Using this ruler, you measure your cable, and the resulting measurement x is distributed as N(θ, 10^-4).
(a) Suppose your prior on the length of the cable is θ ~ N(9, 1). If you then measure x = 10, what is your posterior pdf for θ?
(b) With the same prior as in part (a), compute the total number of measurements needed so that the posterior variance of θ is less than 10^-6.

17. Gamma prior. Customer waiting times (in hours) at a popular restaurant can be modeled as an exponential random variable with parameter λ. Suppose that a priori we know that λ can take any value in (0, ∞) and has density function [formula not legible in the source]. Suppose we observe 5 customers, with waiting times x1 = 0.23, x2 = 0.80, x3 = 0.12, x4 = 0.35, x5 = 0.5.
Compute the posterior density function of λ. (Hint: … = (a − 1)! …)

(d) Given that the jury makes an incorrect decision, what is the probability that the decision is to release a guilty person?

36. Consider all courtroom trials with a single defendant who is charged with a felony. Suppose that you are given the following probabilities for this situation.
Seventy-five percent of the defendants are, in fact, guilty. Given that the defendant is guilty, there is a 70 percent chance the jury will convict the person. Given that the defendant is not guilty, there is a 40 percent chance the jury will convict the person.
For simplicity, assume that the only options available to the jury are: to convict or to release the defendant.
(a) What proportion of the defendants will be convicted by the jury?
(b) Given that a defendant is convicted, what is the probability the person is, in fact, guilty?
(c) What is the probability that the jury will make a correct decision?
(d) Given that the jury makes an incorrect decision, what is the probability that the decision is to release a guilty person?

37. Recall that a confidence interval is too small if the number being estimated is larger than every number in the confidence interval. Similarly, a confidence interval is too large if the number being estimated is smaller than every number in the confidence interval.
Each of four researchers selects a random sample from the same population. Each researcher calculates a confidence interval for the median of the population. The intervals are below.
[24, 41], [30, 39], [20, 33], and [35, 45].
(a) Nature announces, "Two of the intervals are correct and two are too small." Given this information, what is the narrowest interval that is known to contain the median? (Hint: The answer is not any of the four confidence intervals.)
(b) Nature announces, "Two of the intervals are correct, one interval is too small and one interval is too large." Given this information, what is the narrowest interval that is known to contain the median? (Hint: The answer is not any of the four confidence intervals.)
(c) It is possible all of these intervals are incorrect. For example, if the median is 100 then every interval is incorrect. But what is the maximum number of these intervals that can be correct? What values of the median will give this maximum number of correct intervals? (Hint: The answer is not any of the four confidence intervals.)

38. Recall that a confidence interval is too small if the number being estimated is larger than every number in the confidence interval. Similarly, a confidence interval is too large if the number being estimated is smaller than every number in the confidence interval.
Each of four researchers selects a random sample from the same population. Each researcher calculates a confidence interval for the median of the population. The intervals are below.
[14, 31], [20, 29], [10, 23], and [25, 35].
(a) Nature announces, "Two of the intervals are correct and two are too large." Given this information, what is the narrowest interval that is known to contain the median? (Hint: The answer is not any of the four confidence intervals.)
(b) Nature announces, "Two of the intervals are correct, one interval is too small and one interval is too large." Given this information, what is the narrowest interval that is known to contain the median? (Hint: The answer is not any of the four confidence intervals.)
(c) It is possible all of these intervals are incorrect. For example, if the median is 100 then every interval is incorrect. But what is the maximum number of these intervals that can be correct? What values of the median will give this maximum number of correct intervals? (Hint: The answer is not any of the four confidence intervals.)

39. Homer performs three simulation studies. His population is skewed to the right. For one study he has his computer generate 10,000 random samples of size n = 10 from the population. For each random sample, the computer calculates the Gosset 95% confidence interval for μ and checks to see whether the interval is correct. His second study is like his first, but n = 100. Finally, his third study is like the first, but n = 200. In one of his studies, Homer obtains 9,504 correct intervals; in another he obtains 9,478 correct intervals; and in the remaining study he obtains 8,688 correct intervals. Based on what we learned in class, match each sample size to its number of correct intervals. Explain your answer.

40. Independent random samples are selected from two populations. Below are the sorted data from the first population.
362 428 476 481
545 564 585 589 590 600
671 694 723 724 904
Hint: The mean and standard deviation of these numbers are 571.1 and 144.7.
Below are the sorted data from the second population.
387 530 544 547 646 766 786 864
Hint: The mean and standard deviation of these numbers are 633.8 and 160.8.
(a) Calculate Gosset's 90% confidence interval for the mean of the first population.
(b) Calculate a confidence interval for the median of the second population. Select your confidence level and report it with your answer.
(c) Suppose that we now learn that the two samples came from the same population. Thus, the two samples can be combined into one random sample from the one population. Use this combined sample to obtain the 95% confidence interval for the median of the population.

Independent random samples are selected from two populations. Below are the sorted data from the first population.
53.2 54.2 54.7 55.3 55.9 56.0
56.3 57.0 58.2 58.5 58.7 61.0
62.5 62.8 64.4 66.3 67.0 69.0
Hint: The mean and standard deviation of these numbers are 59.50 and 4.80.
Below are the sorted data from the second population.
49.2 53.8 56.9 57.8 58.1
58.4 62.0 65.4 69.4
Hint: The mean and standard deviation of these numbers are 59.00 and 6.00.
(a) Calculate Gosset's 90% confidence interval for the mean of the first population.
(b) Calculate a confidence interval for the median of the second population. Select your confidence level and report it with your answer.
(c) Suppose that we now learn that the two samples came from the same population. Thus, the two samples can be combined into one random sample from the one population. Use this combined sample to obtain the 95% confidence interval for the median of the population.
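
Going back to problem 18 at the top of the post: part (c) says the two integrals would in practice be handed to a computer, so here is a minimal Python sketch of that step. It assumes the detection window is 1 cm to 20 cm, the prior is 1/(25λ²) on [1/30, 1/5], and the observations are the four locations quoted in the post. The expressions Z(λ) = e^(−λ) − e^(−20λ) and f(x|λ) = λe^(−λx)/Z(λ) are worked out from the problem statement for parts (a) and (b). The function names and the Simpson's-rule integrator are illustrative choices, not part of the problem.

```python
import math

X_MIN, X_MAX = 1.0, 20.0        # detection window in cm (assumed from problem 18)
LAM_LO, LAM_HI = 1 / 30, 1 / 5  # prior support for lambda (mean distance 5 cm to 30 cm)
DATA = [6, 11, 13, 14]          # observed decay locations in cm, as quoted in the post


def window_prob(lam):
    """Z(lambda): probability that a decay falls inside the detection window."""
    return math.exp(-lam * X_MIN) - math.exp(-lam * X_MAX)


def likelihood(x, lam):
    """f(x | lambda): exponential density renormalised to the window."""
    if not X_MIN <= x <= X_MAX:
        return 0.0
    return lam * math.exp(-lam * x) / window_prob(lam)


def prior(lam):
    """Prior on lambda implied by 1/lambda being uniform on [5, 30]."""
    return 1 / (25 * lam**2) if LAM_LO <= lam <= LAM_HI else 0.0


def unnormalised_posterior(lam, data=DATA):
    """Prior times the product of the likelihoods of the observations."""
    value = prior(lam)
    for x in data:
        value *= likelihood(x, lam)
    return value


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def posterior_odds(data=DATA, split=1 / 10):
    """Odds that lambda < split (mean distance > 1/split) against lambda > split."""
    post = lambda lam: unnormalised_posterior(lam, data)
    return simpson(post, LAM_LO, split) / simpson(post, split, LAM_HI)


if __name__ == "__main__":
    print(f"posterior odds that the mean decay distance exceeds 10 cm: {posterior_odds():.3f}")
```

The normalising constant of the posterior cancels in the ratio, which is why only the unnormalised posterior is integrated. Changing DATA or the split point reuses the same two integrals.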