2.3 Definition of a Discrete Probability Function

Definition: Let S be a discrete sample space from some experiment. A function P, defined on all events in S, is said to be a probability function if it satisfies the following:
1. For any event E, $0 \le P(E)$.
2. $P(S) = 1$.
3. If $E = \bigcup_{i=1}^{\infty} A_i$, where the sets $A_i$ are mutually exclusive, then $P(E) = \sum_{i=1}^{\infty} P(A_i)$.

The pair $\mathcal{P} = (S, P)$ is said to be a probability space. Therefore, a probability space is a sample space along with a probability function. Note that at this point our probability function is nothing more than a function on a sample space. The probability function does not represent the "chance" of an event. We will soon be looking at functions P that do represent our perception of chance.

It should be noted that in some texts, prior to defining a probability function, we would define the Event Space to be the collection of all possible subsets of S (often called the power set of S and denoted F). We then define our probability function on the event space F. Then our probability space would be the triple $\mathcal{P} = (S, F, P)$.

Example: Suppose that we roll a 6-sided die. Then $S = \{1,2,3,4,5,6\}$. We now look at some possible choices for our probability function.

s       1     2     3     4     5     6
P1(s)   .2    .2    .2    .2    .2    .2
P2(s)   .1    .1    .1    .1    .1    .1
P3(s)   .2    .1   -.1    .2    .3    .3
P4(s)   .1    .1    .1    .1    .1    .5
P5(s)   1/6   1/6   1/6   1/6   1/6   1/6
P6(s)   .1    .1    .2    .2    .25   c
P7(s)   .1    .1    .1    .1    c     .85 - 2c

Which of P1 through P5 are probability functions?

(Side Note) There are $2^6 = 64$ possible events in S. That is, Card(F) = 64.

If $P_5(\{1,2\}) = .4$, is P5 still a probability function?
What value of c makes P6 a probability function?
What value of c makes P7 a probability function?

Theorem: For any event E, $P(E) \le 1$.
Proof: $S = E \cup E^C$. Since E and $E^C$ are disjoint, we have $1 = P(S) = P(E) + P(E^C)$. This gives $P(E) = 1 - P(E^C) \le 1$.

Oftentimes books state the first part of the definition of a probability function as: for any event E, $0 \le P(E) \le 1$. The latter part of that inequality is unnecessary in our definition because of this simple theorem.

Theorem: For any event E, $P(E^C) = 1 - P(E)$.
Proof: The proof is seen in part of the above proof.

2.4 Some Rules of Probability

We have already seen our first rule of probability (the part that is not in the definition): for any event E, $0 \le P(E) \le 1$.

Example: Suppose that we accept that when drawing a single card from a deck of cards, the probability that we get an Ace is $P(\text{Ace}) = 4/52$. Then the probability of not getting an Ace is $P(\text{Not an Ace}) = 48/52$.

Also, based on the definition of a probability function, we know that if sets A and B are disjoint, then $P(A \cup B) = P(A) + P(B)$.

Example: Suppose that we accept that when drawing a single card from a deck of cards, the probability that we get an Ace is $P(\text{Ace}) = 4/52$ and the same is true for a King, $P(\text{King}) = 4/52$. Then the probability of getting an Ace or a King is $P(\text{Ace} \cup \text{King}) = 4/52 + 4/52 = 8/52$.

Is it always true that $P(A \cup B) = P(A) + P(B)$? That is not part of our definition of a probability function, but maybe it is always true and we should say so in a theorem. Let's consider some examples.

Example: We will be drawing a card from a deck of cards. Let A be the set of Red cards and let B be the set of Diamonds. We will assign $P(A) = .5$ and $P(B) = .25$. Since B is a subset of A, it should be clear that $P(A \cup B) = P(A) = .5$. So if $P(A \cup B) = P(A) + P(B)$ were always true, we would have $P(A \cup B) = .5 + .25 = .75 \ne .5$.

Learning to do this sort of investigative thinking will serve you well in this or any mathematics-type course.
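This kind of investigation is easy to support with a quick brute-force check. The Python sketch below is not part of the original notes (the deck encoding and the helper name `prob` are invented for illustration): it treats the 52 cards as equally likely and compares $P(A \cup B)$ with $P(A) + P(B)$ for the red-card/diamond example, anticipating the corrected union rule derived next.

```python
from fractions import Fraction

# Equally likely 52-card deck; A = red cards, B = diamonds (so B is a subset of A).
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in range(1, 14)]

def prob(event):
    """Probability of an event under the equally likely assignment."""
    return Fraction(len(event), len(deck))

A = {card for card in deck if card[1] in ("hearts", "diamonds")}
B = {card for card in deck if card[1] == "diamonds"}

print(prob(A | B))                        # 1/2, since B is a subset of A
print(prob(A) + prob(B))                  # 3/4, so P(A)+P(B) overshoots here
print(prob(A) + prob(B) - prob(A & B))    # 1/2, the general union rule
```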
Example: Let the sets A and B both be S (the sample space). Then $P(S \cup S) = P(S) + P(S) = 1 + 1 = 2$. This contradicts our theorem that states all probabilities are less than or equal to 1.

Clearly, we need to work on $P(A \cup B)$ when A and B are not disjoint. We now define the set $A - B$ to be those outcomes in A that are not in B, and similarly for the set $B - A$. We can now write the union as three disjoint sets as follows:
$A \cup B = (A - B) \cup (A \cap B) \cup (B - A)$
This gives
$P(A \cup B) = P(A - B) + P(A \cap B) + P(B - A)$
$= P(A - B) + P(A \cap B) + P(B - A) + P(A \cap B) - P(A \cap B)$   (Adding 0 is a beautiful thing)
$= P(A) + P(B) - P(A \cap B)$   (Since $A = (A - B) \cup (A \cap B)$ and $B = (B - A) \cup (A \cap B)$, and in each case the two sets are disjoint.)

Now we have our general rule for the probability of the union of two events.

Theorem: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Example: If $P(A) = .4$, $P(B) = .5$ and $P(A \cap B) = .1$, determine $P(A \cup B)$.
$P(A \cup B) = .4 + .5 - .1 = .8$

Example: If $P(A) = .25$, $P(B) = .37$ and $P(A \cap B) = .12$, determine $P(A \cup B)$.

At this point in time, some of you may be considering a new rule for intersections based on things from the past that are partially remembered: $P(A \cap B) = P(A)P(B)$. Is this statement true? Do some investigative thinking!

Theorem: If events A and B are disjoint, $P(A \cap B) = 0$.
Proof: By the theorem above, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Also, by Rule 3, for disjoint events we have $P(A \cup B) = P(A) + P(B)$. We see that $P(A \cup B)$ is equal to two different quantities and we will set those two quantities equal to each other:
$P(A) + P(B) = P(A) + P(B) - P(A \cap B)$, which gives $P(A \cap B) = 0$.

Theorem: If $A \subseteq B$, then $P(A \cap B) = P(A)$.
Proof: Since $A \subseteq B$, the set $A \cup B = B$. Using our rule for union, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Substituting B for $A \cup B$ gives
$P(B) = P(A) + P(B) - P(A \cap B)$, so $0 = P(A) - P(A \cap B)$, and $P(A) = P(A \cap B)$.
An even simpler proof: since $A \subseteq B$, the event $A \cap B = A$, so $P(A \cap B) = P(A)$.

So, when it comes to the intersection of two events, $0 \le P(A \cap B) \le \min\{P(A), P(B)\}$.

Theorem: $P(A \cap B) = P(A) + P(B) - P(A \cup B)$.
This is an algebraic manipulation of our union theorem. It seems clear that we will want to revisit the probability of the intersection of two events later.

2.5 Probability Viewed as a Relative Frequency

While a probability function has been defined mathematically as a function that follows a set of rules, we seek to put some practical meaning to probability. This might make studying probability more interesting. It would allow us to use this field of study to get a better understanding of real-world situations.

Suppose that we have a finite sample space with n outcomes. We could make the following assignment of probability for each outcome in S:
$P(O_i) = \frac{1}{n}$ for $i = 1, 2, \ldots, n$.
This is clearly a probability function on our sample space, and this function assigns an equal probability to each outcome in S.

Suppose that we roll a single die. As noted earlier, $S = \{1,2,3,4,5,6\}$ and now $P(O_i) = 1/6$. If we now think of probability as the "chance" that a specific outcome is selected, we might say that each outcome is "equally likely" to be selected. Based on our lifetime of experience with the words chance, probability and likelihood, we should start feeling more comfortable about this recently defined function called a probability function.

Note that a sample space does not have to have a finite number of outcomes, but it does if we want to apply the equally likely concept to our probability assessment (assignment). Additionally, just because a sample space has a finite number of outcomes and it is legal to assign $P(O_i) = 1/n$, it may be an unwise assignment if equally likely goes against our thought process of chance. What sample space have we seen before that we would not want to assign equal probability to all outcomes? (Even though it would be legal, it goes against our concept of chance.)
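The relative frequency view can be illustrated with a short simulation. This sketch is not from the notes; it simply rolls a fair die many times and shows the observed fraction of each face settling near the equally likely value 1/6.

```python
import random
from collections import Counter

# Relative-frequency view: simulate many rolls of a fair die and compare
# the observed fraction of each face with the equally likely value 1/6.
random.seed(1)
n_rolls = 60_000
counts = Counter(random.randint(1, 6) for _ in range(n_rolls))

for face in range(1, 7):
    print(face, round(counts[face] / n_rolls, 4))   # each should be near 0.1667
```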
Suppose that we have a standard deck of 52 cards. If we are going to select a card from the deck, we believe that each card has an equal chance of being selected. Therefore, a reasonable assignment of probability to each card would be 1/52. Let A be the event that a heart is selected. Since each card is disjoint from every other card, we would determine the probability of our event as follows:
$P(A) = \frac{\text{Cardinality}(A)}{52}$
To finish the $P(A)$ determination, we would need to count the number of outcomes in A. There are 13 hearts in the deck, so $P(A) = P(\text{Heart}) = 13/52$.

Example: Determine $P(\text{King})$.
Example: Determine $P(\text{Red Card})$.
Example: Determine $P(\text{Red Five})$.

Let's consider some more complicated experiments, namely experiments that contain two or more actions. Examples:
Draw two cards from a deck.
Draw three cards from a deck.
Toss a coin five times.
Make a license plate that consists of 3 letters followed by 4 digits.
Choose 5 balls from a bin of 54 balls numbered 1 through 54, without replacing a ball after it has been selected.
Choose 4 balls from a bin of 10 balls numbered 0 through 9, where we replace each ball after it has been selected.

In the experiment examples above, it seems reasonable that the equally likely probability assignment to each outcome in the sample space is a good probability model (assignment). Then $P(O_i) = 1/n$ for each outcome in S would be our desired assignment. Of course, to give this assignment an actual value, we would need to count the number of outcomes in S. Additionally, to determine the probability of an event E, we would need to count the number of outcomes in E.

2.6 Counting Techniques

A coin is tossed 3 times. We can draw a Tree Diagram to help determine what the sample space will contain. At each point, prior to a toss, we have two possibilities: heads and tails. So, on our first toss we have 2 possibilities. From each of those two possibilities, we have two possibilities for a total of 4 after two tosses. From each of those 4, we have two possibilities for a new total of 8 possibilities. Our conclusion here would be that our sample space has 8 outcomes.

Example: Consider rolling a die two times. There would be 6 branches to the right of our initial node. From each of these 6 nodes, we would have 6 secondary branches for a total of 36 final possibilities. Our sample space would have 36 outcomes.

Example: Draw two cards from a deck. We would have 52 branches from the initial node. From each of those 52 branches we would have 51 secondary branches for a total of $(52)(51) = 2652$ final nodes. Our sample space would contain 2652 outcomes. Sometimes we do not care about the order in which the cards come. In such a case, the number of distinct outcomes when order does not matter would be different from 2652. We will consider that situation soon.

Theorem: Suppose that we have a two-stage experiment. In stage 1, we have $a_1$ possible choices (branches) for the experiment. For each of those choices, we have $a_2$ possible choices (branches). The number of outcomes in our sample space is $\text{Card}(S) = a_1 a_2$.

Theorem: If we have a k-stage experiment with $a_i$ choices for the i-th stage, the number of outcomes in our sample space is $\text{Card}(S) = \prod_{i=1}^{k} a_i$.

Example: Make a license plate that consists of 3 letters followed by 4 digits. Determine Card(S).

Example: Choose 5 balls from a bin of 54 balls numbered 1 through 54, without replacing a ball after it has been selected. Determine Card(S). (If order does not matter, the sample space size would be reduced.)
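The multiplication rule turns these counting questions into one-liners. Here is a small sketch, not part of the notes, using Python's standard `math` module to compute Card(S) for several of the experiments listed above.

```python
import math

# Multiplication rule: a k-stage experiment with a_i choices at stage i
# has a_1 * a_2 * ... * a_k outcomes.
coin_three_tosses = 2 ** 3                       # 8
two_dice = 6 * 6                                 # 36
two_cards_ordered = 52 * 51                      # 2652
license_plates = 26 ** 3 * 10 ** 4               # 3 letters then 4 digits
balls_no_replacement = 54 * 53 * 52 * 51 * 50    # 5 ordered picks from 54
balls_with_replacement = 10 ** 4                 # 4 picks from 10, replaced each time

print(coin_three_tosses, two_dice, two_cards_ordered)
print(license_plates, balls_no_replacement, balls_with_replacement)
print(math.perm(54, 5) == balls_no_replacement)  # True; same count via nPr
```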
Suppose that we have n objects to choose from that we are to place in r slots, and that the order of the objects in the slots is important. By that we mean 1, 4, 7 is different from 4, 1, 7.

Theorem: The number of ordered arrangements of n objects placed into r slots is equal to $\frac{n!}{(n-r)!}$. (Ordered arrangements are often called permutations and are denoted $_nP_r$.)
Proof: Using our previous theorem, there are n possibilities for the first position, $(n-1)$ possibilities for the second position, down to $(n-r+1)$ for the last (r-th) position. Using the multiplication rule, we have
$_nP_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$

Example: How many ways can we place 10 objects into 4 slots? It is easier to use our brains than the formula. The answer is $(10)(9)(8)(7) = 5{,}040$.

Suppose that the order of our n objects placed into our r slots does not matter. That is, we see 1, 4, 7 as the same as 4, 1, 7. The number of possibilities now is much less than the number of possibilities when order is important. When order is irrelevant, we call the number of possibilities combinations. The number of combinations is denoted $_nC_r$.

Theorem: $_nC_r = \frac{n!}{r!(n-r)!}$
Proof: If order matters, the number of possibilities is $\frac{n!}{(n-r)!}$. In our current case, order does not matter, so we have overcounted the number of possibilities. Fortunately, each combination has been overcounted the same number of times. To determine the value of $_nC_r$, we need to divide $_nP_r$ by the number of times each combination appears in the collection of permutations. The number of ways the r objects in a combination can fit into the r slots is $r(r-1)(r-2)\cdots(2)(1) = r!$. This gives our desired result.

Often, we invent notation to ease the task of writing things out. Our notation for the important counting quantity that we used above is:
$\binom{n}{r} = \frac{n!}{r!(n-r)!}$

Example: How many combinations of 5-card hands are there in a deck of cards?
$\binom{52}{5} = {}_{52}C_5 = \frac{52!}{5!\,47!} = \frac{(52)(51)(50)(49)(48)}{5!} = 2{,}598{,}960$
Of course, we can solve the problem very quickly without the use of formulas. We have 52 items that can be placed into 5 slots. There are $52 \cdot 51 \cdot 50 \cdot 49 \cdot 48$ ways to put the 52 cards in the 5 slots. Once the cards are picked, there are $5 \cdot 4 \cdot 3 \cdot 2 \cdot 1$ ways of arranging these exact 5 cards in the 5 slots. Same answer without a formula.

Example: How many 3-card hands are there?

We are about to see more situations where counting is necessary. What can be challenging for students is figuring out what to use when. This most often happens when we choose to memorize instead of think. Thus, we prefer to think about the situation and rely less on memory. If things were not so similar, yet still different, we might consider memorizing.
Remember: Memorizers Lose and Thinkers Win. Please memorize the above statement.
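Both counts are easy to compute or check in Python; `math.perm` and `math.comb` implement exactly these formulas. The helper functions below are illustrative, not part of the notes.

```python
import math

# nPr = n!/(n-r)!        ordered arrangements (permutations)
# nCr = n!/(r!(n-r)!)    unordered selections (combinations)
def nPr(n, r):
    return math.factorial(n) // math.factorial(n - r)

def nCr(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(nPr(10, 4))                           # 5040: 10 objects into 4 ordered slots
print(nCr(52, 5))                           # 2598960: five-card hands
print(nCr(52, 3))                           # the three-card-hand exercise
print(math.perm(10, 4), math.comb(52, 5))   # standard-library equivalents
```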
Here are more ideas in counting.

Putting n chips into n slots: We will start with n = 4 chips, with 2 types of chips (red and black) and 2 of each type. Suppose that we have two red chips numbered 1 and 2 and two black chips numbered 1 and 2. How many ways can we arrange the four chips in four slots? The answer depends. Can we see the numbers on the chips? A student in our class, Mike, has special glasses that allow him to see the numbers on the chips. We do not have the special glasses, so we can only see the color of the chips.

Mike can see the numbers. Mike sees that there are $4! = 24$ possible ways of putting the 4 chips in the 4 slots:
R1R2B1B2  R1R2B2B1  R1B1R2B2  R1B1B2R2  R1B2R2B1  R1B2B1R2
R2R1B1B2  R2R1B2B1  R2B1R1B2  R2B1B2R1  R2B2R1B1  R2B2B1R1
B1R1R2B2  B1R1B2R2  B1R2R1B2  B1R2B2R1  B1B2R1R2  B1B2R2R1
B2R1R2B1  B2R1B1R2  B2R2R1B1  B2R2B1R1  B2B1R1R2  B2B1R2R1

Since we cannot see the numbers, we say that there are only $\frac{4!}{2!\,2!} = 6$ possible arrangements (patterns) of these four chips:
RRBB  RBRB  RBBR  BRRB  BRBR  BBRR
We think that Mike counted every arrangement (pattern) 4 times.

If there were 3 red chips ($R_1$, $R_2$ and $R_3$) and 2 black chips ($B_1$ and $B_2$), Mike would see $5! = 120$ possible ways of putting the 5 chips in 5 slots. Since we cannot see the numbers on the chips, we would say there are $\frac{5!}{3!\,2!} = 10$ possible arrangements of these 5 chips. We think that Mike counted every arrangement 12 times. If we "boxed" up all the possibilities like we did above, there would be 10 boxes with 12 items each. A typical box would be:
R1R2R3B1B2, R1R2R3B2B1, R1R3R2B1B2, R1R3R2B2B1,
R2R1R3B1B2, R2R1R3B2B1, R2R3R1B1B2, R2R3R1B2B1,
R3R1R2B1B2, R3R1R2B2B1, R3R2R1B1B2, R3R2R1B2B1
These 12 arrangements that Mike sees are all RRRBB to us. Therefore, we would only see 1/12 as many arrangements as Mike.

Generalizing what we see above: if there were n chips with r red chips and $n-r$ black chips, we would conclude that there are $\frac{n!}{r!(n-r)!}$ different ways to arrange the r red and $n-r$ black chips in the n slots. We note that this is the same answer to the different question of placing n objects into r slots when order does not matter.

Generalizing to more than 2 groups (colors) gives us the following. Suppose that we have k different colors of chips, with $n_1$ of the first color, $n_2$ of the second color, ..., $n_k$ of the k-th color, where $n_1 + n_2 + \cdots + n_k = n$. Then we would say that there are $\frac{n!}{n_1!\,n_2!\cdots n_k!}$ possible arrangements (patterns) of the n chips in n slots.

As before, we invent notation to ease the task of writing things out. Our notations for the important counting quantities that we discovered in our examples above are:
$\binom{n}{r} = \frac{n!}{r!(n-r)!}$ and $\binom{n}{n_1\; n_2\; \cdots\; n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}$ (Not much of a time saver.)

These last examples considered putting n objects into n cells where the n objects were not all distinct to us. The only thing that was distinct was the actual color of the objects. The reds were indistinguishable from each other, as were the blacks.
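A brute-force enumeration confirms these counts. The sketch below is mine, not from the notes: it generates every ordering Mike would see, then collapses the labels to colors to count the patterns we would see.

```python
from itertools import permutations
from math import factorial

# Mike sees labeled chips; we see only colors. Collapsing the labels and
# deduplicating divides the count by 2!*2! (or 3!*2! in the second case).
def counts(chips):
    labeled = set(permutations(chips))                            # what Mike sees
    patterns = {tuple(c[0] for c in perm) for perm in labeled}    # colors only
    return len(labeled), len(patterns)

print(counts(["R1", "R2", "B1", "B2"]))                  # (24, 6)
print(counts(["R1", "R2", "R3", "B1", "B2"]))            # (120, 10)
print(factorial(5) // (factorial(3) * factorial(2)))     # 10, from the formula
```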
Now suppose that we have n distinct objects that we wish to arrange in n slots. Also assume that the n slots have been partitioned into k different groups.

Example: Suppose that we have 12 distinct objects to put in 12 slots and that the 12 slots are partitioned into 5 slots, 3 slots and 4 slots.
___ ___ ___ ___ ___ | ___ ___ ___ | ___ ___ ___ ___
We will now determine the number of ways that we can put the 12 objects into the 12 slots with the given partition, under the condition that we do not care in what order they entered a group. Clearly, if order did matter, there would be 12! ways of putting the 12 objects into the 12 slots. But since order of placement within a group does not matter, there is no difference between the following two placements of the 12 objects:
(1 2 3 4 5 | 6 7 8 | 9 10 11 12) and (2 1 3 4 5 | 6 7 8 | 9 10 11 12)
The following two placements are not considered the same:
(1 2 3 4 5 | 6 7 8 | 9 10 11 12) and (2 1 3 4 6 | 5 7 8 | 9 10 11 12)
These are considered different placements since object 5 is now in group 2 and object 6 is in group 1.
The number of ways the numbers 1 through 5 can be arranged in the first 5 slots is 5!. The number of ways 6 through 8 can be arranged in the middle 3 slots is 3!. The number of ways 9 through 12 can be arranged in the last 4 slots is 4!. Thus, the total number of ways of putting the 12 distinct objects in the 12 slots with the given partition (5 objects | 3 objects | 4 objects) is
$\binom{12}{5\;3\;4} = \frac{12!}{5!\,3!\,4!}$
and $\binom{n}{n_1\; n_2\; \cdots\; n_k} = \frac{n!}{n_1!\,n_2!\cdots n_k!}$ in general.

Consider the following seemingly different problems with the same solution.

Problem 1: How many patterns are there when we place 4 H's and 6 T's into 10 slots? Since each H is identical to the others and each T is identical to the others, this problem is covered by our first example and the solution is:
$\binom{10}{4} = \frac{10!}{4!\,6!} = 210$

Problem 2: Suppose that we have a deck of cards with only 10 distinct cards in it. How many possible 4-card hands are there? That is, how many 4-card combinations are there? (As in all card games, the order of the cards does not matter.) When we are drawing 4 cards from the deck, this is the same as partitioning the cards into two groups, the 4 you get and the 6 you do not get.
___ ___ ___ ___ | ___ ___ ___ ___ ___ ___
So the number of possible hands here falls into the second type of problem we discussed. The total possible number of 4-card hands is then:
$\binom{10}{4} = \frac{10!}{4!\,6!} = 210$

Are these problems really so different? Consider the following as we revisit the first problem. We will change the objects from chips to cards. Suppose that Mike only looks at the cards in the first two slots. Also, Mike does not care about the order in which the cards come; he only cares about what cards they are. All of the permutations in the first cell, $R_1R_2|B_1B_2$, $R_1R_2|B_2B_1$, $R_2R_1|B_1B_2$, $R_2R_1|B_2B_1$, are the same to Mike. Mike only sees the first 2 cards, so in all 4 cases Mike sees $R_1R_2$. This is the same for each of the 6 cells, so there are only 6 possible 2-card hands when we choose 2 cards from the deck of 4 cards: $R_1R_2$, $R_1B_1$, $R_1B_2$, $R_2B_1$, $R_2B_2$, $B_1B_2$. There are 6 possible combinations.

Example: Suppose that we have 18 distinct objects to put in 18 slots and that the slots are partitioned into 4 groups as shown below. (Determine the answer in the proper formula notation format and also the final number.)
___ ___ ___ ___ ___ | ___ ___ ___ | ___ ___ ___ ___ | ___ ___ ___ ___ ___ ___

Example: We draw 5 cards from a deck of 52 cards. How many ways can all 5 of the cards be hearts? I'm not asking for the probability here.

Now that we can count, we can determine probabilities.

Example: We draw 5 cards from a deck of 52 cards. Determine the probability that all of the cards are hearts.
$P(\text{All Hearts}) = \frac{\text{Number of All-Hearts Hands}}{\text{Number of All Possible Hands}} = \frac{\binom{13}{5}}{\binom{52}{5}} = \frac{13!/(5!\,8!)}{52!/(5!\,47!)} = \frac{(13)(12)(11)(10)(9)}{(52)(51)(50)(49)(48)} \approx .0004952$

Example: Draw 3 cards from the deck of cards. Determine the probability that all 3 cards are aces.
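These "favorable over total" counts are one-liners with `math.comb`. The short sketch below, which is mine and not part of the notes, reproduces the all-hearts probability and the all-aces exercise.

```python
from math import comb

# P(event) = (number of favorable hands) / (number of possible hands)
p_all_hearts = comb(13, 5) / comb(52, 5)   # choose 5 of the 13 hearts
p_all_aces = comb(4, 3) / comb(52, 3)      # choose 3 of the 4 aces

print(round(p_all_hearts, 7))              # ~0.0004952
print(round(p_all_aces, 7))                # the 3-aces exercise
```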
2.7 Conditional Probability

A bin contains 8 white chips and 5 red chips. We will select two chips from the bin without replacement. Determine the probability that both chips are white.

Solution: Since there are 13 total chips and we are choosing 2, the probability is
$P(\text{Both White}) = \frac{\binom{8}{2}}{\binom{13}{2}} = \frac{(8)(7)}{(13)(12)} = \frac{8}{13}\cdot\frac{7}{12}$
Let's look deeper into the first and last parts of this.
$P(\text{Both White}) = P(\text{White on First} \cap \text{White on Second}) = P(W_1 \cap W_2)$
Now, 8/13 is the probability of a white on the first pick. We then realize that we are down to 12 chips and 7 of them are white. Thus, 7/12 is the probability that the second chip is white under the condition that we drew a white chip on the first selection. This gives us
$P(W_1 \cap W_2) = P(W_1)\,P(W_2 \text{ given that we got } W_1)$
We invent shortcut notation for the second term in the product. Our notation will be $P(W_2|W_1)$. So we have $P(W_1 \cap W_2) = P(W_1)\,P(W_2|W_1)$. Isolating the conditional probability term leads to the following definition for conditional probability.

Def: The conditional probability that an event A occurs given that B has occurred is
$P(A|B) = \frac{P(A \cap B)}{P(B)}$

Example: Given $P(A \cap B) = .3$ and $P(B) = .5$, determine $P(A|B)$.
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{.3}{.5} = .6$

Example: Given $P(A \cup B) = .8$, $P(A) = .4$ and $P(B) = .5$, determine $P(A|B)$.

Drawing 3 cards and getting 3 Aces (revisited). Draw 3 cards from a standard deck of cards. Determine $P(\text{All 3 Aces})$. When we only "own" counting techniques, there is only one way to solve the problem:
$P(\text{All 3 Aces}) = \frac{\text{Number of 3-card hands containing 3 Aces}}{\text{Number of 3-card hands}} = \frac{\binom{4}{3}}{\binom{52}{3}} = \frac{4(3)(2)}{52(51)(50)}$
Once we "own" conditional probability, we have options.
$P(\text{All 3 Aces}) = P(A_1 \cap A_2 \cap A_3)$
That is, we get an Ace on the first card, followed by an Ace on the second card, followed by an Ace on the third card. We can now do the following, which we couldn't do before conditional probability:
$P(\text{All 3 Aces}) = P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2)$
Here, we use the conditional probability definition twice. Finishing the problem gives
$P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2) = \frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}$
The second term is correct because if the first card is an Ace, there are 51 cards left and 3 are Aces. The third term is correct because if the first 2 are Aces, there are 50 cards left and 2 are Aces.
So, once we learn (and then own) conditional probability, we have extra options when solving problems. This should prompt us to want to learn (own) as many skills as possible so that we increase our chances of solving a problem (or solve it more efficiently). Combining this technique that uses conditional probability with the notes that count patterns can make short work of some problems. I know what you're thinking: "TEACH US MORE" - Don't worry, I will!!!

Example: We have 20 workers for 20 slots. There are 4 types of jobs: 6 jobs of Type 1 (the desirable type), 4 jobs of Type 2, 5 jobs of Type 3 and 5 jobs of Type 4. Determine the probability that all 4 workers in a certain racial group (call it R) end up in the most desirable job type (Type 1). This is a book problem done by counting. Here we do the following. For the 6 Type 1 slots, the probability of the pattern RRRRNN, where N represents "not R", is, by conditional probability:
$\frac{4}{20}\cdot\frac{3}{19}\cdot\frac{2}{18}\cdot\frac{1}{17}\cdot\frac{16}{16}\cdot\frac{15}{15} \approx .000206$
Since there are $\binom{6}{2} = 15$ ways to arrange the 4 R's and 2 N's, the final probability is $(.000206)(15) \approx .0031$.

More on Determining Conditional Probability

                      Blood Type
Ethnic Group     O      A      B     AB    Total
G1              225    250    200    50      725
G2              800     75    100    25     1000
G3              150    350    125    60      685
Total          1175    675    425   135     2410

When data is presented in tabular form, we can determine conditional probabilities (or any probabilities) directly from the table. Our method will be "portion of the whole". We have interviewed 2410 people and asked their blood type and ethnic group. The data is summarized in the table.

Example: A person is selected at random from the table. Determine P(Type B | G2). We could solve the problem using our definition of conditional probability:
$P(\text{Type B} \mid G2) = \frac{P(\text{Type B} \cap G2)}{P(G2)} = \frac{100/2410}{1000/2410} = \frac{100}{1000}$
Or, we can get the solution without the formula as follows. Since we are told that the person chosen is from G2, there are no longer 2410 possibilities; there are only 1000 possibilities. Of those 1000 people in G2, only 100 have type B blood, so our answer is 100/1000.
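Table problems like this are natural to encode as nested dictionaries. The sketch below is mine (the variable names are not from the notes); it computes P(Type B | G2) both by the definition and by the portion-of-the-whole shortcut.

```python
from fractions import Fraction

# Counts from the blood-type table: rows are ethnic groups, columns are blood types.
table = {
    "G1": {"O": 225, "A": 250, "B": 200, "AB": 50},
    "G2": {"O": 800, "A": 75,  "B": 100, "AB": 25},
    "G3": {"O": 150, "A": 350, "B": 125, "AB": 60},
}
total = sum(sum(row.values()) for row in table.values())          # 2410

p_B_and_G2 = Fraction(table["G2"]["B"], total)
p_G2 = Fraction(sum(table["G2"].values()), total)
print(p_B_and_G2 / p_G2)                                  # definition: 1/10
print(Fraction(table["G2"]["B"], sum(table["G2"].values())))  # shortcut: also 1/10 (100/1000)
```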
So when we are answering probability questions from tabulated data, we never need to use any formulas. We only need to know what unions and intersections are and what conditional probability means.

Here are some for you to try. Do not reduce the fractions! (If you are asked to enter an answer in Top Hat, you must use a decimal answer.)
1. P(G1)   [answer: 725/2410]
2. P(Type A Blood)
3. P(Type B Blood ∪ G3)   [answer: (685 + 200 + 100)/2410]
4. P(G1 ∩ Type AB Blood)   [answer: 50/2410]
5. P(G1 | Type O Blood)   [answer: 225/1175]
6. P(Type O Blood | G1)
7. P[(G1 ∪ …) | Type B Blood]
8. P(G2 | Type A Blood)
9. P(Type A Blood | Type B Blood)

2.8 The Law of Total Probability and Bayes' Theorem

The Law of Total Probability: To determine the probability of an event A, you sum up the probabilities of all the ways that A can occur.

Example: An urn contains 8 white, 5 red and 3 blue chips. A person selects 4 chips without replacement. Determine the following probability: P(The third chip is red). One way to solve this problem is to determine all of the ways that the third chip can be red. There are 4 such ways, based on the colors of the first two chips. Thus,
$P(R_3) = P(R_1 \cap R_2 \cap R_3) + P(R_1 \cap NR_2 \cap R_3) + P(NR_1 \cap R_2 \cap R_3) + P(NR_1 \cap NR_2 \cap R_3)$
$= \frac{5}{16}\cdot\frac{4}{15}\cdot\frac{3}{14} + \frac{5}{16}\cdot\frac{11}{15}\cdot\frac{4}{14} + \frac{11}{16}\cdot\frac{5}{15}\cdot\frac{4}{14} + \frac{11}{16}\cdot\frac{10}{15}\cdot\frac{5}{14} = \frac{1050}{3360} = \frac{5}{16} = .3125$
The second and third fractions in each term are determined by conditional probability. For the first term, a red on the first chip has probability 5/16. Once the first chip is red, we are down to 15 chips of which only 4 are red, so $P(R_2|R_1) = 4/15$.
The simpler way to work THIS problem is to consider that you will pick the third chip first, set it aside and then pick the first two. Then the answer is more simply calculated as 5/16. This simple alternative solution should not be expected to be available on other problems.

Example: Two cards are drawn from a standard deck of cards. Determine the probability that the first card is a Heart and the second card is an Ace. (The simple method does not work on this problem. Here there are two things that need to take place, not just one thing (third chip red) as in the problem above. Therefore, this problem needs a full Law of Total Probability approach.)
$P(H_1 \cap A_2) = P(\text{Ace of Hearts}_1 \cap A_2) + P(\text{Heart[not Ace]}_1 \cap A_2) = \frac{1}{52}\cdot\frac{3}{51} + \frac{12}{52}\cdot\frac{4}{51} = \frac{51}{(52)(51)} = \frac{1}{52}$
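The law-of-total-probability sum is easy to verify exactly. Here is a short check of the third-chip-is-red answer using fractions; this is my own sketch, not part of the notes.

```python
from fractions import Fraction as F

# Urn: 8 white, 5 red, 3 blue (16 chips). Sum over the four ways the first
# two chips can be red / not-red, each term built by conditional probability.
terms = [
    F(5, 16) * F(4, 15) * F(3, 14),    # R, R, then R
    F(5, 16) * F(11, 15) * F(4, 14),   # R, not-R, then R
    F(11, 16) * F(5, 15) * F(4, 14),   # not-R, R, then R
    F(11, 16) * F(10, 15) * F(5, 14),  # not-R, not-R, then R
]
print(sum(terms))            # 5/16, matching the pick-the-third-chip-first shortcut
print(float(sum(terms)))     # 0.3125
```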
Example: An urn contains 7 white chips, 5 red chips and 3 blue chips. A chip is randomly selected from the urn, the color is noted, the chip is returned to the urn, and then the number of chips of the noted color is doubled. Now a second chip is selected. Determine the probability that the first chip was white given that the second chip is blue.
$P(W_1|B_2) = \frac{P(W_1 \cap B_2)}{P(B_2)} = \frac{P(W_1 \cap B_2)}{P(W_1 \cap B_2) + P(R_1 \cap B_2) + P(B_1 \cap B_2)} = \frac{P(W_1)P(B_2|W_1)}{P(W_1)P(B_2|W_1) + P(R_1)P(B_2|R_1) + P(B_1)P(B_2|B_1)}$
$= \frac{\frac{7}{15}\cdot\frac{3}{22}}{\frac{7}{15}\cdot\frac{3}{22} + \frac{5}{15}\cdot\frac{3}{20} + \frac{3}{15}\cdot\frac{6}{18}}$
The first step above is the definition of conditional probability. Step 2 uses the law of total probability. Step 3 is conditional probability from the intersection point of view. Step 4 is plugging in the numbers. The entire thing is Bayes' Theorem.

Definition: Events $A_1, A_2, \ldots, A_n$ are said to form a partition of the sample space S provided the events are disjoint and $\bigcup_{i=1}^{n} A_i = S$. (We say the sets are mutually exclusive and exhaustive.)

Bayes' Theorem: Let the events $A_1, A_2, \ldots, A_n$ form a partition of S. Also, let B be a subset of S (an event). Then
$P(A_k|B) = \frac{P(A_k)P(B|A_k)}{\sum_{i=1}^{n} P(A_i)P(B|A_i)}$
Proof:
$P(A_k|B) = \frac{P(A_k \cap B)}{P(B)} = \frac{P(A_k \cap B)}{\sum_{i=1}^{n} P(A_i \cap B)} = \frac{P(A_k)P(B|A_k)}{\sum_{i=1}^{n} P(A_i)P(B|A_i)}$
It should be noted that using the theorem requires several other conditional probabilities. If we have access to them, we have no difficulty determining our desired conditional probability.

Example: Suppose that it is known that a certain disease occurs in .5% of the population. Suppose also that we have a certain medical test to determine if a person has this disease. The test produces a positive reading on 99.6% of those infected with the disease. Unfortunately, this means that .4% of those with the disease go undetected. The test is not perfect. Another bad aspect of most tests is that they also give false positives. That is, the test shows a positive result in some individuals that do not have the disease. Suppose that this test gives a positive result in healthy patients 2% of the time. We now want to determine the probability that a person has the disease given that they have tested positive. Let D be the event that the person has the disease and T the event that the test is positive.
$P(D|T) = \frac{P(D \cap T)}{P(T)} = \frac{P(D)P(T|D)}{P(D \cap T) + P(D^C \cap T)} = \frac{P(D)P(T|D)}{P(D)P(T|D) + P(D^C)P(T|D^C)} = \frac{(.005)(.996)}{(.005)(.996) + (.995)(.02)} \approx .20016$
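A direct numerical check of this Bayes computation mirrors the formula and anticipates the population-table solution that follows. The function below is my own illustrative sketch, not part of the notes.

```python
# Bayes' theorem for the disease-test example:
# P(D | positive) = P(D) P(pos | D) / [P(D) P(pos | D) + P(not D) P(pos | not D)]
def posterior(prior, sensitivity, false_positive_rate):
    numer = prior * sensitivity
    denom = numer + (1 - prior) * false_positive_rate
    return numer / denom

print(round(posterior(0.005, 0.996, 0.02), 5))   # ~0.20016
```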
Here are some examples: The length of time it takes a person run to run a mile tomorrow The number of spots we will see after a die is rolled Your weight tomorrow morning The number of customers that will come to our restaurant tomorrow for lunch The number of patients that will be admitted to the emergency room today How long it will take us to drive to work today All of these quantities are variables since they can take on many possible values. The difference between these variables and the number of widgets made in the example above is that the actual value of our variable is not picked by us or anybody else. It is random. That is, the value is somewhat a matter of chance. There are any possibilities, but we do not pick the value. So the value eventually given to these variables is random and depends on probability. 2 3.1 What is a Random Variable Definition: Let S be a discrete sample space from some experiment. A Random Variable is a function from a sample space into the real numbers. Random variables are always denoted with a capital letter. Usually, we would want our random variable to be some meaningful quantity. Example: Our experiment is to roll two dice. Our sample space in the collection of all permutations of the integers 1 through 6. Some meaningful random variables might be\" X1 equals the sum of the two dice. We can write what our random variable X1 does to outcomes in S in our familiar functional notation: , , etc. X1 (1,3) 4 X1 (2,2) 4 X1 (3,1) 4 We can define other random variables on our sample space: X2 X3 equals the larger of the two values on the dice. equals the larger number minus the smaller number. As in algebra or calculus, we can use subscripts to distinguish one function from another. Example: Determine X1 (4,5) 3 Example: Determine Example: Determine X2 (4,5) X 3 (4,5) Example: Suppose that we are going to toss a coin 3 times. Let Determine the values of X count the number of heads tossed. , and . X (HHH) X (TTH) X (HTT ) So, our random variable takes outcomes from the sample space of a random experiment and turns them into real numbers. Thus, we now have a collection of real numbers that have been randomly chosen. These real numbers can be seen as a sample space from an experiment. We would now like to define probabilities to these numbers in a natural way. We define P( X x) Ai S ; X ( Ai ) x P( Ai ) Consider our rolling two dice experiment with the random variable X1 that equals the total number of spots rolled. Each of our 36 outcomes in the sample space has probability 1/36. We now determine the probability that our random variable takes on the value 4. Since the only outcomes that satisfy Ai S X1 (Ai ) 4 are (1,3), (2,2) and (3,1) we have P( X1 4) Certainly 1 1 1 3 36 36 36 36 P( X1 2.75) 0 since our random variable does not map any outcomes into the value 2.75. 4 Definition: The Support of a discrete random variable is defined to be where x such that 0 P( X x ) , denotes all real numbers. Example: The support of the random variable random variable X2 is S 1,2,3,4,5,6 X1 is S 2,3,4,5,6,7,8,9,10,11,12 . The support of the . Example: Determine the support of the random variable X3 . Example: A coin is tossed three times. Let X count the number of heads tossed. Determine the support of . X Notice that the support can be thought of as a sample space. Since we have endowed the support with a probability function, the support along with the collection of probabilities, is a probability space. 3.2 The Distribution Our experiment is to toss a coin three times. 
Let toss? Than is, what value will X X count the number of heads. How many heads will we take on? The answer is \"We don't know - it's random\". We can however come up with a collection of ordered pairs of numbers that detail what can happen and what are the associated probabilities. So, we will make a list (or create a formula) that associates the probability of occurrence for each value in our support. Definition: For a random variable X , the Probability Mass Function (pmf) is a function that assigns the probability of occurrence to each value in the support. We denote this function f (x) . 5 Our pmf can be written out in table format or in the form of a formula. Consider our experiment where we toss three coins and an experiment where we roll a single die. x 0 1 2 3 f(x) 1/8 3/8 3/8 1/8 n x x 3 1 1 f (x) x 2 2 for x 0,1,2,3 x 1 2 3 4 5 6 f(x) 1/6 1/6 1/6 1/6 1/6 1/6 f (x ) 1 for x 1,2,3,4,5,6 6 When we write out our pmf in functional form, we want to be sure to include the support. It should be noted that writing out the pmf in tabular form carries the exact same information as it does in functional form. We could also graph each pmf as done below. This format also carries the same information. pmf of Rolling a Die pmf of Tossing 3 Coins and X = Number of Heads 0.22 0.375 0.20 0.18 0.250 0.16 0.14 0.125 0.12 0 1 2 C4 3 1 2 3 4 5 6 C1 Certainly, if the support was a larger set, this graph could be very useful. We purposely use dots with a line projected down. The probabilities hare are point masses and are not spread over an interval on the x-axis. There is a mass at 0, 1, 2 and 3. As always, the total mass (probability) is equal to one. In all three formats, the total probability of 1 is distributed over the support. We therefore refer the information about the support and associated probabilities as the Distribution. The stick figure graph given above carries more information when we have more than just a few values that the random variable X can take on. Consider the visualization of four random variables given below. The graph can contain much more intuitive information than we would get by looking at a table of values. While we will use the table of values to determine answers to probability questions, the graph can give us a feeling of the distribution. 6 In each of these graphs we can instantly get a feeling for the distribution. If we looked at random variables with many more possible x-values, we might see even more information that might not be seen when looking at a table of values (see below). The pmf is one way to convey the distribution (the story of probability) of a random variable. A second method for writing out the distribution is the Cumulative Distribution Function (CDF). 7 Definition: For a random variable F ( x ) P( X x ) X , we define the Cumulative Distribution Function (CDF) as . The information contained in the pmf and CDF are identical. Neither tells us more about the story of probability than the other, so both are considered the distribution. The distribution considered in its CDF format can be very handy for discrete random variables and is a necessity for continuous random variables that we will discuss in the next chapter. For our two examples, the CDFs are given below. x 0 1 2 3 x 1 2 3 4 5 6 F(x) 1/8 4/8 7/8 1 F(x) 1/6 2/6 3/6 4/6 5/6 1 We get from the pmf to the CDF by addition and from the CDF to the pmf by subtraction. When we study continuous distributions, can you guess how we get from one to the other? 
When we are discussing discrete random variables, a graph of the CDF is not as useful as the pmf. We can however see some things from these sample graphs. Theorem: The CDF is right continuous. Theorem: lim F (x) 1 x Theorem: lim F (x) 0 x Theorem: P(A X B) F (B) F (A ) 8 Example: Determine P(2 X 4) for the single die experiment. P(2 X 4) F (4) F (2 ) 4 1 3 6 6 6 For many of the distributions that we will study in the next few sections, we will be using CDF charts with a support being a subset of the integers. Therefore, we would be wise to master using CDF charts. Example: The pmf for some random variable X is given below. Determine the CDF. x f(x) 1 F(x) x f(x) F(x) .15 1 .15 .15 2 .11 2 .11 .26 3 .09 3 .09 .35 4 .35 4 .35 .70 5 .17 5 .17 .87 6 .13 6 .13 1.00 To get from the pmf to the CDF we add all probability at or below each value in the support. When determining f (4) to F (3) . This asks for all values of X less than 4 and including 4. P( X 4) F (3) , we just add The chart values increase each stop of the way and terminate at 1 in the finite support case and have a limit of 1 in the countable case. P( X 4) not including 4. F (4) F (4) This asks for all values of X less than 4 but 9 P(6 X ) This asks for all values of X greater than 6 and including 6. P(6 X ) 1 F (5) This asks for all values of X greater than 6 not including 6. 1 F (6) P(4 X 8) F (8) F (4 ) F (8) F (3) 3.3 The Expected Value of a Random Variable Las Vegas, Nevada - where dreams come true (for casino owners). Casinos, just on the strip, have averaged over $6,000,000,000 ($6 billion) per year from 2005 through 2016. Are they cheating? Or, do they just expect to win? Why would they expect to win? Maybe they have the edge. Each time anybody places a bet, they are doing a random experiment and the amount won or lost on the bet is a number. Thus, we have a random variable that randomly chooses how much you win or lose. 10 What would it mean for the casino to have the edge? Does it mean that they have a better chance of winning than the player? In some cases, yes, but not in all. Let's look at some game examples and decide if we would like to place such a bet. These are not actual games in Las Vegas. Example: We roll a single die. If we roll a 1, we win $10. For all other rolls, we lose $1. In this bet, we would lose far more often than we would win. But when we win, we make much more than we lose on the times that we do lose. If everything goes according to plan, like in Disneyland, we would win $10 exactly one-sixth of the time and lose $1 exactly five-sixths of the time. Overall, we would show a profit. Of course, this isn't Disneyland and things might not go as planned, but theoretically, we would expect to win. We would want to take this bet. Example: Consider a bet that we have a 50% chance of winning. In this game, if we win the bet, we collect $10 profit. If we lose the bet, we pay $11. In this game, we will win as often as we lose in a perfect world. The bad part is clear. Each win ($10) does not make up for each loss ($11). In the long run, we would expect to lose money. We would not want to play this game. In this same example, if we only pay $10.01 when we lose, the game is not nearly as unbalanced. Example: In this game, when we win, we are paid $10. When we lose, we pay $10. Would we want to play this game? If you have already answered (Yes or No), you have answered too soon. You don't know what the probability of winning is. 
If your theoretical chance of winning is 50%, this is a fair game and playing is fine. If your theoretical chance of winning is 49%, the game is unfair, but not too unfair. If your theoretical chance of winning is 40%, the game is very unfair and playing would be very bad. Based on these examples, we see that your theoretical expectation is a combination of your probability of winning along with what you get paid when you win and what you pay when you lose. Definition: The mean of a discrete random variable xf (x) X , is defined by the formula (provided the sum converges) xSupport If we are in a situation where we have two random variables, say X and Y subscripts to denote which mean is connected to which random variable: , we will make use of X and Y . There are times where we prefer a different name and notation for the mean of a random variable. We will use the phrase Expected Value of a random variable interchangeably with the Mean of a random variable. Our new associated notation for the expected value is . E[ X ] 11 Remember that a random variable chooses numbers. We should thing of the mean of a random variable as what the data average , , would be if we collect a huge amount of data. x Example: A single die is rolled and X denotes the number of spots viewed. The mean of X can be determined using the above formula. 1 1 1 1 1 1 (1) (2) (3) (4) (5) (6) 3.5 6 6 6 6 6 6 Obviously, we will never roll a 3.5. But, that is the theoretical mean or average. Consider rolling the die 60 times in Disneyland. Since everything goes according to plan, we will get exactly 10 of each of the six possibilities. Our data average would be x 10(1) 10(2) 10(3) 10(4) 10(5) 10(6) 3.5 60 Example: Consider spinning the spinner below. What is the expected value of the spinner? We could make our pmf chart and then calculate the mean of the random variable. x f(x) 1 .1 2 .2 3 .3 4 .4 (1)(.1) (2)(.2) (3)(.3) (4)(.4) 3.0 Suppose that we had a game that had 300 spaces on it. We move as many spaces as we spin on the spinner. Since our expected number on the spinner is 3.0, we would expect the game to take about 100 spins. Example: Consider spinning the spinner below. What is the expected value of the spinner? We could make our pmf chart and then calculate the mean of the random variable. x f(x) 1 .4 2 .3 3 .2 Determine the expected value of 4 .1 X . About how many spins would it take to get through our 300 spaces on our board game? 12 Not all random variables have a finite mean. Let 1 1 1 f (x) for x 1,2,3,L x(x 1) x x 1 X be a random variable with pmf . You should verify that this sums to 1. To determine the mean of this random variable, we calculate 1 1 x x( x 1) x 1 ( x 1) x 1 . Since this series diverges, our random variable does not have a finite mean. Consider f (x ) k(1 / (n ^2 1) on the integers 3.4 The Expected Value of a Function of a Random Variable Often times in the real world, we wish to look at a function of our variables - or transform them. Changing feet into meters, Fahrenheit into centigrade, etc. In this section we seek to determine the expected value of this new variable. If we transform a random variable by some function , it X Y g( X ) should be clear that Y is a random variable. Theorem: Given a random variable Proof: X and the transformation , . Y aX b E[Y ] E[aX b] aE[ X ] b E[aX b] (ax b) f (x) (ax) f (x) (b) f (x) a xf ( x) b f (x) aE[ X ] b a X b Thus, the expected value of a random variable is a linear operator. 
Example: If a random variable X has mean $\mu_X = 4$, determine the mean of Y, where $Y = 2X + 3$.
$\mu_Y = E[Y] = E[2X + 3] = 2E[X] + 3 = 11$
If we think about this last example, it seems like a no-brainer that it is true. The average score is 4. I double everybody's score; the new average is 8. I now add 3 to everybody's doubled score; the new average is 11.

In this linear case, notice that if we consider $Y = g(X) = 2X + 3$, then $E[Y] = E[g(X)] = 2\mu_X + 3 = g(E[X])$. We would naturally wonder if that statement always holds. That is, will it always be true that if $Y = g(X)$ then $E[Y] = g(E[X])$? The answer is no; this is not always true. How will we show this? Suppose that we have a discrete random variable X with pmf given below. We now let $Y = X^2$. To determine $E[Y]$, we will find $f_Y(y)$ and then compute $E[Y] = \sum y f_Y(y)$.

x          -2     -1      0      1      2      3      4
f_X(x)     1/8    1/8    2/8    1/8    1/8    1/8    1/8
x f_X(x)  -2/8   -1/8     0     1/8    2/8    3/8    4/8

$E[X] = 7/8$

y           0      1      4      9     16
f_Y(y)     2/8    2/8    2/8    1/8    1/8
y f_Y(y)    0     2/8    8/8    9/8   16/8

$E[Y] = 35/8$

What is $g(E[X]) = E^2[X]$? It is $\left(\frac{7}{8}\right)^2 = \frac{49}{64} \ne \frac{35}{8} = E[X^2] = E[Y]$. So we have our counterexample, and in general it is not true that $E[Y] = g(E[X])$.

We do, however, see something helpful when looking at the two tables above. Each y-value is the square of an x-value, and the probability attached to a y-value is exactly the probability carried by the x-values that map to it. This leads us to the following theorem that will help us determine $E[Y]$.

Theorem: Given a discrete random variable X and a function $Y = g(X)$,
$E[Y] = E[g(X)] = \sum g(x) f_X(x)$
The theorem is much bigger than it seems. It allows us to determine the expected value of a transformation of a random variable without determining the distribution of the new random variable. We easily found $f_Y(y)$ in the above problem. With continuous random variables, it can often be difficult to determine $f_Y(y)$, so this will be huge when we get to continuous random variables.
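The counterexample is quick to confirm with exact arithmetic. In the sketch below (mine, not part of the notes), $E[X^2]$ is computed directly as $\sum g(x) f_X(x)$, using the theorem just stated, and compared with $(E[X])^2$.

```python
from fractions import Fraction as F

# The counterexample above: E[g(X)] computed as sum of g(x) f(x), versus g(E[X]).
pmf = {-2: F(1, 8), -1: F(1, 8), 0: F(2, 8), 1: F(1, 8),
       2: F(1, 8), 3: F(1, 8), 4: F(1, 8)}

E_X = sum(x * p for x, p in pmf.items())          # 7/8
E_X2 = sum(x**2 * p for x, p in pmf.items())      # 35/8, via the theorem
print(E_X, E_X2, E_X**2)                          # 7/8  35/8  49/64
# E[X^2] != (E[X])^2, so in general E[g(X)] != g(E[X]).
```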
k f (x ) 3 for x 1,2,3,L x , where k is the value that makes random variable has finite mean since f (x) a pmf. That is, so that k k k 2 E[ X ] x 3 2 x x 1 x 6 x 1 variance of our random variable, we need to determine E[ X 2 ] . So, this random variable has finite mean but not finite variance. k 1 3 x 1 x . This , which is finite. To determine the k k E [ X ] x 3 x x 1 x x 1 2 2 which diverges. 16 The E[ X ] is often called the first moment of Definition: The kth moment of X X and is defined to be E[ X 2 ] is called the second moment of X . E[ X k ] These moments are important features of a random variable. Sometimes they are called the moments about the origin and then moments about the mean would be defined as . E[( X )k ] 3.5 The Moment Generating Function We now look at a very special function of our random variable : . Why that function? Let's X Y e Xt investigate. We are not at appoint where we are trying to find the distribution of our transformation, but we are interested in the expected value of our transformation. So we will consider the . E[Y ] Definition: We define the Moment Generating Function of a random variable MX (t ) E[e Xt ] e xt f (x) some open interval of Theorem: , provided t 0 MX '(t ) t 0 E[ X ] , except X to b the sum converges for all values of t in possibly at t 0 itself. 17 Proof: MX '(t ) E[e Xt ] ' e xt f (x) ' xe xt f ( x) Theorem: Proof: MX ''(t ) t 0 . Plugging in t 0 yields: MX '(0) xf ( x) E[ X ] E[ X 2 ] MX ''(t ) MX '(t ) ' xe xt f (x) ' x 2 e xt f (x) . Plugging in t 0 yields: MX ''(0) x 2 f (x) E[ X 2 ] Our generalized theorem is then. Theorem: MX (k ) (t ) t 0 E[ X k ] What should we call this function that generates the moments of the distribution (random variable)? Not all random variables have a MGF. Suppose that the MGF of Suppose that the MGF of X X is is MX (t ) (q pet )n pet MX (t ) 1 qet , determine the mean and variance of , determine the mean and variance of X X . . 1. A coin is tossed 5 times. Let X count the number of heads tossed. Determine 2. A coin is tossed 3 times and then a die is rolled. Let spots on the die. Determine , X (HHT 6) X (TTT 4) 3. A red die is rolled and then a green die. Let X (3,4) and X (4,3) 4. Determine f (x) and X X (HHTHT ) . be the number of heads times the number of X (HTT 3) . X 5(Spots on Red) (Spots on Green) . Determine . for the random variable in problem 1. 5. Determine the pmf for the random variable in problem 2. 6. Two dice are to be rolled. Let X be the largest value you see. Determine the distribution of 7. The collection of values that our random variable assigns non-zero probability is called? X
