Question:

Problem 5 of Chapter 2 considers haplotype frequency estimation for two linked, biallelic loci. The EM algorithm discussed there relies on the allele-counting estimates $p_A$, $p_a$, $p_B$, and $p_b$.

(a) Construct the Dirichlet prior from these estimates mentioned in Section 3.8 and devise an EM algorithm that maximizes the product of the prior and the likelihood of the observed data. In particular, show that the EM update for $p_{AB}$ is

$$p_{m+1,AB} = \frac{2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab} + \beta_{AB}}{2n + \beta},$$

$$n_{mAB/ab} = n_{AaBb}\,\frac{2p_{mAB}\,p_{mab}}{2p_{mAB}\,p_{mab} + 2p_{mAb}\,p_{maB}},$$

where $\beta_{AB} = \alpha_{AB} - 1$ and $\beta = \alpha - 4$.

There are similar updates for $p_{Ab}$, $p_{aB}$, and $p_{ab}$. (Hint: The log prior passes untouched through the conditional expectation of the E step of the EM algorithm.)
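One way to see where the update comes from is sketched here; it is not the book's worked solution. The symbol $c_{mh}$ is introduced only for this sketch and denotes the conditional expected count of haplotype $h$ given the data and the current estimate $p_m$. The sketch also uses $\alpha = \alpha_{AB} + \alpha_{Ab} + \alpha_{aB} + \alpha_{ab}$, which is what $\beta = \alpha - 4$ implies when each $\beta_h = \alpha_h - 1$.

$$Q(p \mid p_m) + \ln \pi(p) = \sum_{h} \left(c_{mh} + \alpha_h - 1\right) \ln p_h + \text{const}$$

$$p_{m+1,h} = \frac{c_{mh} + \beta_h}{\sum_k c_{mk} + \sum_k \beta_k} = \frac{c_{mh} + \beta_h}{2n + \alpha - 4}$$

The second line follows from maximizing the first subject to $\sum_h p_h = 1$ with a Lagrange multiplier, together with $\sum_k c_{mk} = 2n$. For $h = AB$ the expected count is $c_{m,AB} = 2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab}$. The argument assumes every numerator $c_{mh} + \beta_h$ is positive.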

(b) Implement this EM algorithm on the mosquito data given in Table 2.5 of Chapter 2 for the value $\alpha - 4 = 10$ and starting from the estimated linkage equilibrium frequencies. You should find that $\hat{p}_{AB} = .717$, $\hat{p}_{Ab} = .083$, $\hat{p}_{aB} = .121$, and $\hat{p}_{ab} = .079$.
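A minimal Python sketch of the iteration in part (b) might look like the following. Several things here are assumptions rather than content from the problem: the function name `map_em_haplotypes`, the genotype-label dictionary format, and the choice $\alpha_h = \alpha \times$ (product of the allele-counting frequencies) for the Dirichlet prior, which is one plausible reading of "construct the prior from these estimates". The example counts are hypothetical placeholders, not the Table 2.5 mosquito data, so the output will not match the frequencies quoted above.

```python
def map_em_haplotypes(counts, beta_total=10.0, tol=1e-10, max_iter=10_000):
    """MAP-EM sketch for two biallelic loci.

    counts maps genotype labels such as "AABB" or "AaBb" to observed counts.
    beta_total plays the role of beta = alpha - 4 in the problem statement.
    """
    def c(genotype):
        return counts.get(genotype, 0)

    n = sum(counts.values())  # number of individuals, so 2n haplotypes

    # Allele-counting estimates of p_A and p_B.
    p_A = (2 * (c("AABB") + c("AABb") + c("AAbb"))
           + c("AaBB") + c("AaBb") + c("Aabb")) / (2 * n)
    p_B = (2 * (c("AABB") + c("AaBB") + c("aaBB"))
           + c("AABb") + c("AaBb") + c("aaBb")) / (2 * n)

    # Start from the linkage equilibrium frequencies.
    p = {"AB": p_A * p_B, "Ab": p_A * (1 - p_B),
         "aB": (1 - p_A) * p_B, "ab": (1 - p_A) * (1 - p_B)}

    # ASSUMPTION: alpha_h = alpha * (linkage equilibrium frequency of h),
    # so beta_h = alpha_h - 1 and the beta_h sum to alpha - 4 = beta_total.
    alpha = beta_total + 4
    beta = {h: alpha * q - 1 for h, q in p.items()}

    # Haplotype counts that are unambiguous (all but the double heterozygotes).
    fixed = {
        "AB": 2 * c("AABB") + c("AABb") + c("AaBB"),
        "Ab": 2 * c("AAbb") + c("AABb") + c("Aabb"),
        "aB": 2 * c("aaBB") + c("AaBB") + c("aaBb"),
        "ab": 2 * c("aabb") + c("Aabb") + c("aaBb"),
    }

    for _ in range(max_iter):
        # E step: split the AaBb individuals between the phases AB/ab and Ab/aB.
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        n_cis = c("AaBb") * cis / (cis + trans)
        n_trans = c("AaBb") - n_cis
        phased = {"AB": n_cis, "ab": n_cis, "Ab": n_trans, "aB": n_trans}

        # M step: expected counts plus beta_h, normalized by 2n + beta.
        new_p = {h: (fixed[h] + phased[h] + beta[h]) / (2 * n + beta_total)
                 for h in p}
        change = max(abs(new_p[h] - p[h]) for h in p)
        p = new_p
        if change < tol:
            break
    return p


# Hypothetical genotype counts for illustration only (NOT Table 2.5).
example_counts = {"AABB": 20, "AABb": 8, "AAbb": 1, "AaBB": 10, "AaBb": 6,
                  "Aabb": 2, "aaBB": 2, "aaBb": 1, "aabb": 0}
print(map_em_haplotypes(example_counts))
```

The sketch does not guard against a negative numerator, which can occur when some $\beta_h$ is negative and the corresponding expected count is small. A more careful implementation would check for that case.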

(c) Describe how you would generalize the algorithm to more than two loci and more than two alleles per locus.
