Question:
A random experiment was conducted in which Person A tossed five coins and recorded the number of "heads". Person B rolled two dice and recorded the larger of the two numbers. Simulate this scenario (use columns 10000 rows long) and answer questions 10 to 13.
Hint: check Lecture 26 in the book.
10. Which of the two persons (A or B) is more likely to get the number 3?
- a. Person A
- b. Person B
- c. Not possible to determine
11. Which of the two persons will have the higher median among their outcomes?
- a. Person A
- b. Person B
- c. Not possible to determine
12. What is the probability that Person B obtains the number 5 or 6?
- a. About 23%
- b. About 32%
- c. About 40%
- d. About 55%
13. Which of the two persons has the higher probability of getting the number 3 or larger?
- a. Person A
- b. Person B
- c. Not possible to determine
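The scenario in questions 10 to 13 can also be simulated outside Excel. The following is a minimal Python sketch, not the book's method: the function names `simulate_a_and_b` and `share` are made up for illustration, and `seed=None` plays the role of leaving the Random Seed box empty.

```python
import random
from statistics import median

def simulate_a_and_b(trials=10_000, seed=None):
    """Return (a_outcomes, b_outcomes) for the two-person experiment."""
    rng = random.Random(seed)  # seed=None: a different run every time
    # Person A: number of heads among five fair coins (each coin is 0 or 1).
    a = [sum(rng.randint(0, 1) for _ in range(5)) for _ in range(trials)]
    # Person B: the larger of two fair dice.
    b = [max(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(trials)]
    return a, b

def share(outcomes, predicate):
    """Fraction of outcomes for which predicate(outcome) is true."""
    return sum(1 for x in outcomes if predicate(x)) / len(outcomes)

if __name__ == "__main__":
    a, b = simulate_a_and_b()
    print("Q10  P(A = 3):", share(a, lambda x: x == 3),
          " P(B = 3):", share(b, lambda x: x == 3))
    print("Q11  median A:", median(a), " median B:", median(b))
    print("Q12  P(B is 5 or 6):", share(b, lambda x: x >= 5))
    print("Q13  P(A >= 3):", share(a, lambda x: x >= 3),
          " P(B >= 3):", share(b, lambda x: x >= 3))
```

For comparison, counting outcomes directly gives P(A = 3) = 10/32 (about 31%), P(B = 3) = 5/36 (about 14%), P(B is 5 or 6) = 20/36 (about 56%), P(A ≥ 3) = 16/32 = 50% and P(B ≥ 3) = 32/36 (about 89%). The simulated shares should land close to these values.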
Lecture 26: The Die is Cast
Task 1
Use Excel to roll a die 1000 times and based on this data create the appropriate histogram.
Step 1: (Important) First we need to tell the computer that we are dealing with a die.
Create two columns that will be used to generate "a die": one holding the values 1, 2, 3, ..., 6 and one holding their probabilities, each equal to 1/6. In simple terms, this is where we tell the computer how to generate random numbers. (Caution: Excel sometimes interprets the entry "=1/6" as a date (January 6th); to avoid this, type "=1.0/6.0" instead.)
Step 2: Go to Random Number Generation, select the Discrete distribution, and highlight the values you created in Step 1. (Do not highlight the headers, just the values!) Press OK and that is it! We have just rolled a die 1000 times and recorded all the outcomes with the click of a button.
Leave the Random Seed box empty!
Step 3: Create Bins = 1, 2, 3, ..., 6 and follow the steps described earlier to create the histogram.
Comment: As expected, the relative frequency of each outcome is approximately 1/6.
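The same task can be sketched in Python as a cross-check on the Excel result. This is an illustration under assumptions, not part of the lecture: `roll_die` and `relative_frequencies` are hypothetical helper names, and the values and weights lists mirror the two columns created in Step 1.

```python
import random
from collections import Counter

def roll_die(n=1000, seed=None):
    """Draw n outcomes from a discrete distribution describing a fair die."""
    rng = random.Random(seed)   # seed=None corresponds to an empty Random Seed box
    values = [1, 2, 3, 4, 5, 6]
    weights = [1 / 6] * 6       # the probability column from Step 1
    return rng.choices(values, weights=weights, k=n)

def relative_frequencies(outcomes, bins):
    """Share of outcomes falling in each bin (the histogram heights)."""
    counts = Counter(outcomes)
    return {b: counts[b] / len(outcomes) for b in bins}

if __name__ == "__main__":
    rolls = roll_die()
    for face, freq in relative_frequencies(rolls, range(1, 7)).items():
        print(face, round(freq, 3))
```

With only 1000 rolls the frequencies scatter noticeably around 1/6 ≈ 0.167; raising `n` tightens them.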
Task 2
Imagine an experiment: Three dice are rolled and the maximum of the three numbers is recorded. The resulting number is obviously random and the question is: What is the most likely outcome? Or in other words: if you have to bet some money, which number would you pick? And what odds would you have of winning? The task is to simulate this experiment and answer the above questions.
Step 1: Create three columns of dice, each 1000 long. Essentially repeat the previous task, but this time replace the number "1" with the number "3" in the Number of Variables box. This will create 1000 experiments in which three dice were rolled. Leave the Random Seed box empty!
Step 2: Next, in cell D1, type "=Max(a1:c1)". This command computes the maximum of the three dice in that row. Now click on the bottom-right corner of the cell and drag it down (or use the Shift Ctrl-D trick).
The Shift Ctrl-D trick: Scrolling down 1000 cells is cumbersome. The following little procedure is a life saver when dealing with large data files. It allows us to perform mathematical operations on whole columns and rows without the need to scroll.
- Make the D1 cell (in this case just type "=Max(a1:c1)").
- Split the screen and scroll the bottom half to the end (in this case, 1000th cell).
- Click on the D1 cell.
- With the other hand, press the Shift key and hold it!
- While holding the Shift key, move the cursor to the 1000th cell (i.e. D1000) and click on it (this should highlight the whole D column).
- Now let go of the Shift key, press and hold the Ctrl key, and then press the letter D on your keyboard.
This operation essentially takes the first highlighted cell (in this case D1) and then tells Excel to repeat whatever was done there to the whole highlighted column (or multiple columns if needed).
Step 3: Now that we have created a thousand experiments (each of which consists of rolling three dice and then picking the maximum of the three numbers), all we need is to create the histogram. Hint: identify the bins by listing all the possible outcomes.
The resulting table and chart nicely describe the "distribution" of this experiment. We can observe that the number 6 is the most likely outcome. In this simulation, 6 occurred roughly twice as often as 4 and about 200 times as often as 1. With only 1000 trials the count for a rare outcome such as 1 is very noisy; counting outcomes directly gives probabilities of 91/216 for 6, 37/216 for 4 and 1/216 for 1, so in the long run 6 is about 2.5 times as likely as 4 and 91 times as likely as 1.
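A short Python sketch of the same experiment may help in comparing a simulation with the exact distribution of the maximum. It is not the lecture's procedure, and the names `simulate_max` and `exact_max_probability` are invented for this example. The exact formula uses the fact that the maximum equals k when all dice are at most k but not all are at most k - 1.

```python
import random
from collections import Counter
from fractions import Fraction

def simulate_max(n_dice=3, trials=1000, seed=None):
    """Roll n_dice fair dice `trials` times and record the maximum each time."""
    rng = random.Random(seed)
    return [max(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(trials)]

def exact_max_probability(k, n_dice=3):
    """P(max = k) = P(all dice <= k) - P(all dice <= k - 1)."""
    return Fraction(k**n_dice - (k - 1)**n_dice, 6**n_dice)

if __name__ == "__main__":
    counts = Counter(simulate_max())
    print("outcome  simulated count  exact probability")
    for k in range(1, 7):
        print(k, counts[k], round(float(exact_max_probability(k)), 4))
```

Running it a few times shows how much the count for the outcome 1 jumps around between runs, while the count for 6 stays near 420 out of 1000.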
Task 3
Imagine an experiment: Five coins are tossed and you denote the heads by "1" and the tails by "0." The task is to make a new random number that would be the sum of all heads among the five coins. Use a histogram to describe the distribution.
Step 1: As before, we need to tell the computer how to generate a coin. Here we let the outcomes be 0 and 1 (0 for Tail and 1 for Head), and the probabilities are 0.5 and 0.5, respectively. Hint: We could increase the column length here by typing 5000 for the Number of Random Numbers. Leave the Random Seed box empty!
Step 2: Mimic the previous exercise (Shift Ctrl-D trick). Here we need five random variables and the computation becomes "=Sum(a1:e1)". Finally, to create the histogram, choose the bins by identifying all possible outcomes. The results should look like this:
Comment: The table and chart on the left nicely describe the distribution of this experiment. We can see that the chart is symmetric: the numbers 2 and 3 are equally likely to occur, as are 1 and 4, and 0 and 5.
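For readers who want to reproduce the table without Excel, here is a hedged Python sketch of the five-coin experiment. The helper name `heads_in_five_coins` is an assumption of this example; each row of five coins is drawn from the values 0 and 1 with weights 0.5 and 0.5 and then summed, mirroring "=Sum(a1:e1)".

```python
import random
from collections import Counter

def heads_in_five_coins(trials=5000, seed=None):
    """Toss five fair coins `trials` times; return the number of heads per toss."""
    rng = random.Random(seed)
    # Each row: five draws of 0 (Tail) or 1 (Head), summed across the row.
    return [sum(rng.choices([0, 1], weights=[0.5, 0.5], k=5)) for _ in range(trials)]

if __name__ == "__main__":
    sums = heads_in_five_coins()
    counts = Counter(sums)
    for outcome in range(6):  # bins: all possible outcomes 0..5
        print(outcome, counts[outcome], round(counts[outcome] / len(sums), 3))
```

The printed shares should be roughly symmetric around 2.5, with small run-to-run differences between the paired outcomes.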
What is more likely: The occurrence of number "5" or number "1"?
Correct answer: The number "5" appeared in 116 out of 5000 trials, while the number "1" appeared 774 times. Thus, "1" is more likely.
What is the approximate probability that the number "3" will appear?
Task 4
Imagine an experiment where two dice are rolled and the numbers on the dice are recorded as Die1 and Die2. Answer the following questions.
What is the (approximate) probability that Die1+Die2=7?
Correct answer: There are two ways to answer this. The first is to use the probability theory we learned earlier: list all the possible outcomes for two dice and count how often Die1+Die2=7. The second is to simulate this experiment 10000 times and make a histogram of the outcomes. Hint: first "roll" two dice, then make a new column containing the sum of the two columns (i.e. =A1+B1, filled down all 10000 rows). Create the histogram using the bins 2, 3, ..., 12 and observe the frequency of the outcome "7".
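In this simulation there were 1700 instances where the sum of the two dice equals 7. Thus the approximate probability is 1700/10000 = 0.17 = 17%, in line with the exact value 6/36 ≈ 16.7% obtained by counting the six favorable outcomes among the 36 possible ones.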
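Both approaches described in the answer can be sketched in a few lines of Python. This is only an illustration with made-up function names (`exact_probability_of_sum`, `simulated_probability_of_sum`): the first function lists all 36 outcomes and counts, the second simulates the rolls.

```python
import random
from fractions import Fraction
from itertools import product

def exact_probability_of_sum(target):
    """List all 36 (Die1, Die2) outcomes and count those summing to target."""
    outcomes = list(product(range(1, 7), repeat=2))
    hits = [pair for pair in outcomes if sum(pair) == target]
    return Fraction(len(hits), len(outcomes))

def simulated_probability_of_sum(target, trials=10_000, seed=None):
    """Roll two dice `trials` times and return the share of rolls summing to target."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if rng.randint(1, 6) + rng.randint(1, 6) == target
    )
    return hits / trials

if __name__ == "__main__":
    print("exact:    ", exact_probability_of_sum(7))
    print("simulated:", simulated_probability_of_sum(7))
```

The simulated value changes slightly from run to run but should stay close to the exact 1/6.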
Theoretical Digression: Sample Distribution and the Bell-shaped Curve
A keen reader might have noticed that we have covered two distinct topics: Statistics and Probability. And although, traditionally, the two come as a pair, it is not that obvious why. Why do we combine questions about rolling dice and tossing coins, which are about Probability, with questions regarding the slopes of regression lines or confidence intervals? Where is the connection? The question is a tricky one, and frankly in my experience, very few people have a good answer to this one, instructors and lecturers included, unfortunately.
The answer is the P-value. The P-value, obviously, plays an instrumental role in Statistics, and we have covered it extensively. But we never explained how we got this value. The mathematics behind this "magical" value is way too complex for this type of class, but the intuition we can address here. It all boils down to the few charts we have created above.
Sampling Distribution:
Experiment:
Toss 5 coins and record the SUM:
Roll 3 dice and record the SUM:
The above two histograms were based on a sample of 5000 sums. That is, for the first histogram we tossed 5 coins 5000 times and then computed the sum for each of these 5-coin samples. Thus the first histogram represents the Sampling Distribution of the "sum-of-five-coins." Analogously, the second chart represents the Sampling Distribution of the "sum-of-three-dice."
What can you observe?
Both charts resemble the Bell-Shaped curve! Surprisingly, although a "Coin" is very different from a "Die," their sums behave remarkably similarly. And this is true for many other situations. As long as one adds a few random experiments, the result seems to behave as a Bell-Shaped curve. And this is a universal law of nature. For example, imagine the following "crazy" example. One randomly picks a person and records the following 5 numbers:
- The last two digits of her social security number
- Her height in inches
- Her weight in pounds
- The last two digits of her zip-code
- The number of cousins she has
Each of these five numbers is random and completely different. Next, in this strange experiment, we add these five numbers and call the result S1 (indicating the sum of values for the first person). Now if you repeat this with a thousand randomly chosen individuals and make a histogram based on S1, S2, ..., S1000, you would get a Bell-Shaped curve. Strange, isn't it? This bizarre phenomenon has been mathematically confirmed by the celebrated Central Limit Theorem, and it is the theoretical ingredient responsible for the P-value.
We cannot delve into the theory, but we can describe the intuition:
Heuristics:
- When we collect the data, we actually sample the observations. This in turn is mathematically modeled as a list of random experiments. We call these observations (or random experiments) X1, X2, X3, ... as well as Y1, Y2, Y3, ...
- The statistical quantities of interest, like the Average, R-square, the Slope of the Regression Line, and the like, are based on mathematical formulas, which in turn are based on the addition of these random observations. Example: The Average is computed as (X1+X2+X3+...+Xn)/n, and the formulas for the slope of the regression line and for R-square consist of sums of X's and Y's as well.
- The Central Limit Theorem now claims that all these sums will have a very similar behavior. They all have an approximately Gaussian distribution (regardless of what we actually measure). This fact, coupled with some Mathematics, now yields the P-value.
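The heuristic above can be checked numerically. The sketch below is an illustration only, with hypothetical helper names (`sample_averages`, `share_within_one_sd`) and an arbitrary choice of 30 observations per sample. It computes many averages of dice rolls and of coin tosses and reports what share of them falls within one standard deviation of the mean; for a Gaussian distribution that share is about 68%.

```python
import random
from statistics import fmean, pstdev

def sample_averages(draw, n, samples, seed=None):
    """Compute `samples` averages, each of n observations produced by draw(rng)."""
    rng = random.Random(seed)
    return [fmean(draw(rng) for _ in range(n)) for _ in range(samples)]

def share_within_one_sd(values):
    """Share of values lying within one standard deviation of their mean."""
    m, s = fmean(values), pstdev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)

if __name__ == "__main__":
    die = lambda rng: rng.randint(1, 6)    # X: one roll of a fair die
    coin = lambda rng: rng.randint(0, 1)   # X: one toss of a fair coin
    for name, draw in [("die", die), ("coin", coin)]:
        avgs = sample_averages(draw, n=30, samples=5000)
        print(name, "mean of averages:", round(fmean(avgs), 3),
              " share within 1 SD:", round(share_within_one_sd(avgs), 3))
```

Although a die and a coin are very different, both shares come out at roughly two thirds. Because the outcomes are discrete, the match with 68% is only approximate.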
