Question:
The tools of this class are useful to computer scientists, but many of them are useful beyond just "classic" computer science. In this assignment you'll consider an application of Bayes' Rule in the real world. We will consider the use of DNA evidence in criminal trials. A full discussion of DNA evidence would require a discussion of many issues[2] - for this assignment, we are going to limit ourselves to just how information about DNA tests should be communicated to juries.

This assignment is a mix of technical tasks (appropriately applying theorems) and non-technical ones (considering tradeoffs between various real-world effects and groups). The technical aspects can be "right" or "wrong", but the non-technical aspects are unlikely to be simply "right" or "wrong" - we don't have to agree with the non-technical aspects of your analysis to consider it a good analysis. Our evaluation of the non-technical aspects will be based on how well they connect to the technical aspects, as well as the depth of reasoning demonstrated.

7.1. Bayes in Court

DNA evidence has been used in court cases for decades. Over time some common patterns of (dubious) argumentation have emerged, which you'll analyze in this problem. Consider the following scenario:

A crime is committed somewhere in Seattle. There were no witnesses to the crime, but blood was left at the scene, and DNA was extracted from it. The DNA was run against the 13 million DNA samples on file with the FBI. There was one match: a person who lived in Tacoma at the time of the crime.

You know the following facts about the DNA test that was run:
- The false positive rate of the test is only 1/10,000,000.
- The false negative rate of the test is 1/100,000,000.

7.1.1. The Prosecution

The prosecutor argues as follows:

The DNA match with the blood on the scene is strong. There is only a 1/10,000,000 chance that the defendant is innocent (after all, the test only has a 1/10,000,000 rate of failure) - certainly not a reasonable amount of doubt.
You must vote to convict.

Let T be the event of a positive test, and G be the event that the defendant is guilty.

(a) In terms of G and T, what probability or conditional probability is the prosecutor describing with their phrase "the chance the defendant is innocent, knowing about the test"? [1 point]
(b) What probability or conditional probability does the 1/10,000,000 come from? [1 point]
(c) Describe the prosecutor's error concisely (2-3 sentences). [2 points]

[2] Among others: under what circumstances should DNA samples be taken from people and/or stored in databases.

[3] For example, trying to calculate a probability and getting 1.2 for an answer would involve a technical mistake. Saying "Witnesses shouldn't say the DNA evidence is reliable, because I saw an episode of CSI where it wasn't reliable" is not good reasoning for this assignment because it does not connect to the technical aspects of the problem. Saying "DNA evidence should be allowed as long as the Bayes factor is at least 100" relates to technical aspects and is considered good analysis whether or not we agree with you on "Bayes factor at least 100" being the right place to draw the line between allowable or not.

7.1.2. The Defense

The defense attorney argues as follows:

The test isn't as good as it sounds. If we ran the test on all 330,000,000 people in the country, we'd have 33 innocent people come up with positive tests. The true probability of my client being guilty is only about 1/34.

Recreate the Bayes' Rule application that the defense attorney is using.

(a) What prior is being used? Recall that the "prior" is the probability of the event you're hoping to analyze, before running the test. Your answer here should include both a number and where it came from. [2 points]
(b) Now use Bayes' Rule to confirm that (starting from that prior) the calculation is correct. [2 points]

7.2. Your Analysis

From the information given in the problem, what is your estimate of the probability the defendant is guilty?[4] You might want to incorporate the following information:
- The 13 million DNA samples in the database are not from a random section of the population, but they do come from people across the whole U.S.
- The Seattle metro area has about 4 million people.

(a) Do a Bayes' Rule calculation to give your estimate of the probability that the defendant is guilty. What is your prior? Briefly explain where it comes from. [3 points]
(b) Name at least one limitation of your estimate (something you haven't accounted for that you would have liked to, or more information you would have liked about the scenario). (2-3 sentences) [2 points]

7.3. Make Another Argument

In this part, you'll use an application of Bayes' Rule to make an argument about whatever real-world scenario you would like. Your scenario can be close to home (say, something about an RSO you're involved in), a political issue, or anything else, as long as it's based in the "real world."[5] If you can't think of a new real-world scenario, you might want to continue with one of the ones from Lecture 9 (we'll also have some suggestions in the next section). A sample argument is given in Section 7.4 for your reference.

(a) Define events A and B on which you'll apply Bayes' Rule (along with any other events you need). [2 points]
(b) State probabilities (or probability estimates) for three of the four quantities you need to use Bayes' Rule. For those estimates, either cite a source for the numbers that you think is reliable or give a justification for your estimate. [6 points]
(c) What is your takeaway from this calculation? [2 points]
(d) Discuss at least one limitation of your calculation/application (e.g. factors that didn't go into your estimates, or assumptions you are making that might not be correct). [2 points]

[4] Since this is your estimate, there are multiple answers possible! We aren't grading whether we get the same answer; we're grading whether you have a correct application of reasonable assumptions. You can use either (or both) of the bullets above. If you use neither bullet, you must incorporate some other information and have something different from either of the analyses in the last subsection.

[5] We will be quite lenient about what counts as real world - the hope is that you will pick something you care about. If it's just the probability that the second and third card of a deck of cards are the same value, it's probably not "real-world." But if you're an avid poker player, and you want to use Bayes' Rule to analyze a particular game scenario, that would definitely count.

7.4. Sample Solution and Ideas

Since we haven't asked you to do tasks exactly like these before, here is a sample of the kind of answers we'd be expecting for an application.

When doing research, scientists often use statistical significance testing. In that framework one writes a hypothesis ("smoking causes cancer"), and then asks for the p-value: the probability of seeing the data in the study, if the hypothesis is false. p = 0.05 (or less) is usually taken as statistically significant.

(a) Let H be the event "the hypothesis is true" and D be the event "we saw data like this in an experiment."
(b) We'll analyze a statistically significant experiment, so $P(D \mid \overline{H}) = 0.05$. We'll consider an experiment where the result would be surprising - one where, before running the experiment, $P(H) = .2$. Furthermore, we'll suppose the data show only a weak effect, so $P(D \mid H) = .5$. Applying Bayes' Rule:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} = \frac{.5 \cdot .2}{.5 \cdot .2 + .05 \cdot .8} \approx .714$$

(c) The chances of the hypothesis in the paper being true are pretty good - but not nearly the 95% one might imagine if one misinterprets the meaning of the p-value.
(d) Estimating the probability of a hypothesis being true without experimenting would be quite difficult in the real world.
With a lower prior for the hypothesis, the probability that it is true given the data would drop; one should perhaps be careful when reading papers (even ones with statistically significant results), particularly when they make a surprising claim.

Some Ideas

You're encouraged to pick your own example(s), but here are some you might think about:
- We saw in class that routine medical tests can lead to false positives. See if there's a difference between a COVID test taken by someone who is symptomatic and one given to a randomly selected (not necessarily symptomatic) individual. You could also think about confirmatory testing (after taking an at-home test, with lower accuracy, taking a more accurate but slower PCR test): if the tests disagree, which should you believe?
- How hard should Captchas and other "I'm not a robot" tests be to stop robots that guess randomly, but still allow through fallible humans?
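
As a numerical check on the sample calculation in Section 7.4, here is a minimal Python sketch. The function name `posterior` and its parameter names are illustrative choices, not part of the assignment, and the lower priors in the loop are hypothetical values chosen only to show how the result depends on the starting estimate.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | D) using Bayes' Rule.

    The denominator P(D) is expanded with the law of total probability:
    P(D) = P(D | H) P(H) + P(D | not H) P(not H).
    """
    numerator = p_data_given_h * prior
    evidence = numerator + p_data_given_not_h * (1 - prior)
    return numerator / evidence


# Numbers from the sample solution: P(H) = .2, P(D | H) = .5, P(D | not H) = .05
sample = posterior(0.2, 0.5, 0.05)
print(round(sample, 3))  # 0.714

# Hypothetical lower priors (assumed values, not from the assignment),
# illustrating the limitation discussed in part (d) of the sample.
for prior in (0.2, 0.1, 0.05):
    print(prior, round(posterior(prior, 0.5, 0.05), 3))
```

Keeping the three inputs as explicit arguments makes it easy to see which estimate drives the conclusion: here, changing the prior alone moves the posterior substantially.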