Question
1) Write down 10 numbers between 1 and 100. These can be whole numbers, but we will assume this is CONTINUOUS data (not DISCRETE). Rank order them.

#:         1   2   3   4   5   6   7   8   9   10
x-values:

MEAN: ____, VARIANCE: ____, STD DEV: ____, Q1: ____, Q2: ____, Q3: ____, IQR: ____

2) Use those statistics to determine if any of your data values are "UNUSUAL".
(a) Mean ± 2 standard deviations = ____ and ____. Unusual data values? ____________
(b) Mean ± 1.5 * IQR = _____ and _____. Unusual data values? _______________

3) Fill in this Frequency Table by putting your data points into the ranges given.

RANGE  | FREQUENCY | RELATIVE FREQ. | CUMULATIVE RELATIVE FREQ (BE CAREFUL HERE)
1-10   |           |                |
11-20  |           |                |
21-30  |           |                |
31-40  |           |                |
41-50  |           |                |
51-60  |           |                |
61-70  |           |                |
71-80  |           |                |
81-90  |           |                |
91-100 |           |                |

Using the above table:
(a) What percent of your data values are at or below 50: _______
(b) What percent of your data are at or below 61: ______ and at or below 90: _______
(c) So, what percent are between 61 and 90? ______________

4) Lastly, let's STANDARDIZE the data (convert the x-values to z-values).

#:                     1   2   3   4   5   6   7   8   9   10
x-values:
z-values:
Probability to LEFT*:
Probability to RIGHT*:

* From the z-TABLES (NOT SOFTWARE), determine the area to the LEFT of each standardized x-value. This is the PROBABILITY that our data is less than or equal to that data point. Subtract that area from 1.0000 to get the probability that our data are greater than that data point. Obviously, these two probabilities MUST add up to 1.0000 or 100%, which accounts for all of our data.

OK, let's see how probabilities determined from the z-values compare to those determined from the Frequency Table. We have the percent of data at or below 50 and the percent of data between 61 and 90.

(a) STANDARDIZE "50, 61 and 90" using your data set's statistics (i.e., mean and SD).

x-values:              50   61   90
z-values:
PROBABILITY*:
from FREQUENCY TABLE:

* From the z-Table, these are the areas (probabilities) to the left of these data points.
(b) Subtracting the area (probability) to the left of 61 from the probability to the left of 90 gives us the probability of data being between 61 and 90. How do these probabilities compare to the Cumulative Relative Frequencies?

WEEK 4 HOMEWORK: LANE CHAPTER 7 AND ILLOWSKY CHAPTERS 6 AND 7
THE NORMAL DISTRIBUTION

Z-TABLES ARE ATTACHED AND YOU ARE TO USE THEM RATHER THAN SOFTWARE TO SOLVE THESE PROBLEMS. (THIS IS STRAIGHTFORWARD TABLE READING.)

THIS WEEK'S CONCEPTS ARE REALLY THE HEART OF OUR COURSE. PROBABILITY (FROM LAST WEEK) IS THE DRIVING FORCE BEHIND STATISTICS (AND THEORETICAL PHYSICS - WATCH THE PBS MOVIE "PARTICLE FEVER").

PRINT OUT THE NORMAL DISTRIBUTION TABLES (ONE PAGE) AND REVIEW THEM AS YOU READ ON. THESE AND OTHER TABLES ARE IN OUR COURSE: COURSE CONTENT > COURSE RESOURCES > STATISTICAL RESOURCES > STANDARD NORMAL DISTRIBUTION TABLE

THE AREAS UNDER PARTS OF THIS GRAPH ARE ALL WE ARE TRYING TO FIGURE OUT IN STATISTICS. IT'S THAT EASY. HERE IS HOW WE DO IT.

FOR ANY DATA SET OF X-VALUES, WHICH CAN BE NUMBERS REPRESENTING HEIGHTS, WEIGHTS, AGES, ETC., WE FIRST CALCULATE THE MEAN, VARIANCE AND STANDARD DEVIATION. WE THEN NEED TO "STANDARDIZE" OR CONVERT THESE INDIVIDUAL DATA VALUES TO Z-VALUES:

Z = (X - MEAN) / STANDARD DEVIATION (DO THE SUBTRACTION FIRST)

THE STANDARDIZED "Z-VALUE" IS SIMPLY THE NUMBER OF STANDARD DEVIATIONS THAT OUR CONVERTED X-VALUE IS FROM THE MEAN.

[ IF WE HAVE A Z-VALUE AND WANT TO DETERMINE WHAT THE RAW DATA POINT (X-VALUE) WAS, WE USE: X = Z * STANDARD DEVIATION + MEAN (DO THE MULTIPLICATION FIRST) ]

IF WE GRAPHED ANY SET OF DATA POINTS (AS YOU DID IN WEEK 2 WITH THE 3 SETS OF 10 DATA POINTS), SOME DATA POINTS (X-VALUES) WOULD BE ABOVE THE MEAN AND SOME BELOW. WHEN STANDARDIZED, THE MEAN OF THESE STANDARDIZED DATA POINTS (Z-VALUES) BECOMES ZERO; THOSE POINTS BELOW THE MEAN HAVE A NEGATIVE Z-VALUE (A NEGATIVE NUMBER OF STANDARD DEVIATIONS) AND THOSE ABOVE IT HAVE A POSITIVE Z-VALUE.
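The two conversion formulas above can be sketched in a few lines of Python. This is only an illustration: the data set is hypothetical (it is not the Week 2 data), the function names are made up for this example, and the population standard deviation (`pstdev`) is an assumption, since the handout does not say whether to use the population or the sample formula.

```python
from statistics import mean, pstdev

def z_score(x, mu, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sd  # subtraction first, then division

def x_from_z(z, mu, sd):
    """Recover the raw data value (x-value) from a z-value."""
    return z * sd + mu  # multiplication first, then addition

# Hypothetical data set, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = mean(data)
sd = pstdev(data)  # assumption: population standard deviation

z_values = [z_score(x, mu, sd) for x in data]
for x, z in zip(data, z_values):
    print(f"x = {x:>2}  z = {z:+.2f}")
```

Values below the mean print with a negative sign and values above it with a positive sign, and the z-values average to zero, as described above.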
THE AREAS IN THE TABLE SIMPLY CORRESPOND TO THE PROBABILITY OF A DATA POINT BEING LESS THAN OR EQUAL TO THAT Z-VALUE. SUBTRACT THAT AREA FROM 1.0000 AND WE HAVE THE PROBABILITY THAT OUR DATA POINT IS GREATER THAN OUR Z-VALUE.

SO WHAT? LONG STORY SHORT: THERE ARE SPECIFIC Z-VALUES OF INTEREST REFERRED TO AS "CRITICAL VALUES", AND THOSE ARE THE ONES THAT CORRESPOND TO THE SMALL (RARE) AREAS IN ONE OR BOTH "TAILS" OF OUR NORMAL DISTRIBUTION. THESE CRITICAL Z-VALUES CORRESPOND TO "SIGNIFICANCE LEVELS", WHICH ARE THE AREAS TO THE LEFT OR RIGHT OF THAT CRITICAL Z-VALUE. THE COMMON SIGNIFICANCE LEVELS ARE 1%, 5%, AND 10% (OR 0.0100, 0.0500 AND 0.1000), AND THESE ARE THE AREAS IN THE BODY OF THE TABLES THAT ARE ALSO THE PROBABILITIES.

HERE ARE THIS WEEK'S HOMEWORK PROBLEMS:

SOLVE THESE PROBLEMS USING THE TABLES, NOT SOFTWARE. (HOWEVER, SOFTWARE IS FINE AT THIS POINT TO CALCULATE ANY MEANS, VARIANCES OR STANDARD DEVIATIONS.)

1) BELOW ARE THE 30 DATA POINTS WE USED IN WEEK 2:

80, 71, 81, 99, 1, 54, 55, 16, 20, 27, 61, 62, 79, 68, 35, 37, 38, 41, 45, 49, 50, 21, 27, 50, 51, 55, 55, 60, 61, 70

MEAN: _______________, VARIANCE: __________________, STANDARD DEVIATION: ______________

YOU MIGHT WANT TO RANK ORDER (low to high) THESE X-VALUES IN THE TABLE FIRST.

#:         1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
X-VALUES:
Z-VALUES:

#:         16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
X-VALUES:
Z-VALUES:

2a) NOW, WHAT CRITICAL Z-VALUES (STANDARD DEVIATIONS) CORRESPOND TO THE -1%, -5% AND -10% AREAS UNDER THE CURVE? [THIS IS THE FAR LEFT AREA OF THE CURVE, AND REMEMBER THAT THE TABLES ONLY GIVE AREAS TO THE LEFT, SO YOU CAN JUST READ THESE Z-VALUES FROM THE TABLE.]

2b) WHAT CRITICAL Z-VALUES (STANDARD DEVIATIONS) CORRESPOND TO THE +1%, +5% AND +10% AREAS UNDER THE CURVE? THESE ARE THE AREAS TO THE FAR RIGHT, BUT YOU ONLY GET AREAS TO THE LEFT FROM THE TABLE, SO WHAT AREAS ARE YOU LOOKING FOR IN THE TABLE? [HINT: IF IT'S 1% TO THE RIGHT, WHAT PERCENT MUST IT BE TO THE LEFT?]
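The problems ask for table reading, so the following Python sketch is meant only as a way to cross-check values after they have been read from the z-table. It relies on the standard library's `statistics.NormalDist` (Python 3.8+); the helper names `left_tail_critical` and `right_tail_critical` are made up for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def left_tail_critical(alpha):
    """z-value with area `alpha` to its LEFT (far left tail)."""
    return STANDARD_NORMAL.inv_cdf(alpha)

def right_tail_critical(alpha):
    """z-value with area `alpha` to its RIGHT.

    The table only gives areas to the left, so look up 1 - alpha.
    """
    return STANDARD_NORMAL.inv_cdf(1 - alpha)

for alpha in (0.01, 0.05, 0.10):
    left = left_tail_critical(alpha)
    right = right_tail_critical(alpha)
    print(f"significance {alpha:.2f}: left {left:+.2f}, right {right:+.2f}")
```

Because the standard normal curve is symmetric about zero, each right-tail critical value is the left-tail value with the sign flipped. Table lookups may differ from the software values in the last decimal place.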
THESE CRITICAL VALUES DON'T CHANGE AND YOU WILL USE THEM OFTEN, SO KEEP THESE CRITICAL VALUES HANDY.

REMEMBER WHEN WE WERE TRYING TO IDENTIFY "OUTLIERS" IN A DATA SET? ONE WAY WAS TO SEE IF OUR DATA POINT WAS MORE THAN 2 STANDARD DEVIATIONS ABOVE OR BELOW THE MEAN. NOTE THAT IN THE ABOVE GRAPH, 95.4% OF DATA IN A NORMAL DISTRIBUTION ARE IN THAT AREA OF THE CURVE (± 2 SD'S FROM THE MEAN). ABOUT 5% ARE NOT (ABOUT 2.5% AT EACH EXTREME). THIS IS A RULE-OF-THUMB SUBSTITUTE FOR THE CRITICAL VALUE.

WE CAN USE THESE Z-VALUES TO SEE IF ANY OF OUR DATA ARE IN THE "RARE" OR "UNUSUAL" AREAS TO THE FAR LEFT (OR FAR RIGHT) IN OUR NORMAL DISTRIBUTION. WHY DO WE CARE IF DATA ARE UNUSUAL? YOU WILL SEE.

3) LET'S SEE IF ANY OF OUR 30 DATA POINTS WOULD BE CONSIDERED "UNUSUAL". HOW? WE MUST DECIDE AT WHAT SIGNIFICANCE LEVEL WE WOULD CONSIDER A DATA POINT "UNUSUAL".

IF WE CHOSE A SIGNIFICANCE LEVEL OF 10%, THAT MEANS THAT A DATA POINT WOULD HAVE TO HAVE A POSITIVE Z-VALUE (STANDARD DEVIATION) THAT CORRESPONDED TO A TABLE AREA OF 0.9000 TO THE LEFT (SINCE THIS DATA POINT IS IN THE 10% AREA IN THE FAR RIGHT TAIL). OR, IF THE DATA POINT HAD A NEGATIVE Z-VALUE, ITS VALUE WOULD HAVE TO CORRESPOND TO A TABLE AREA OF 0.1000 (10%) IN THE FAR LEFT TAIL OF THE CURVE.

A SIGNIFICANCE LEVEL OF 5% WOULD NEED A +Z-VALUE CORRESPONDING TO A TABLE AREA OF 0.9500 TO THE LEFT (LEAVING 0.0500 TO THE FAR RIGHT), OR A -Z-VALUE CORRESPONDING TO A TABLE AREA OF SIMPLY 0.0500 TO THE LEFT IN THE FAR LEFT TAIL.

b) FILL IN THE BLANKS: A SIGNIFICANCE LEVEL OF 1% WOULD NEED A +Z-VALUE CORRESPONDING TO A TABLE AREA OF ________ TO THE LEFT (LEAVING ________ TO THE FAR RIGHT), OR A -Z-VALUE CORRESPONDING TO A TABLE AREA OF SIMPLY _________ TO THE LEFT IN THE FAR LEFT TAIL.

c) IN QUESTION (2) YOU DETERMINED THE GENERAL CRITICAL Z-VALUES FOR SIGNIFICANCE LEVELS OF +10%, +5% AND +1%, SO COMPARE YOUR STANDARDIZED DATA TO THEM AND LIST YOUR Z-VALUES AND ORIGINAL X-VALUES THAT ARE "UNUSUAL" AT THESE SIGNIFICANCE LEVELS.
LIST ANY "UNUSUAL" VALUES.

SO WHAT? WE USE THIS SAME METHODOLOGY IN STATISTICAL HYPOTHESIS TESTING. WE CALCULATE A "TEST STATISTIC" BASED ON THE SAMPLE AND POPULATION DATA WE HAVE AND THEN COMPARE IT TO THE CRITICAL VALUES AT THE SIGNIFICANCE LEVEL WE HAVE CHOSEN (THE SAME CRITICAL VALUES WE HAVE IDENTIFIED ABOVE STILL APPLY). IF THE TEST STATISTIC IS GREATER THAN THE POSITIVE (+) CRITICAL Z-VALUE, WE ARE IN THE "UNUSUAL" OR RARE AREA IN THE RIGHT TAIL OF THE NORMAL DISTRIBUTION AND WE WOULD "REJECT" OUR HYPOTHESIS. OR, IF THE TEST STATISTIC IS LESS THAN THE NEGATIVE (-) CRITICAL VALUE, IT IS ALSO IN THE RARE AREA IN THE LEFT TAIL AND AGAIN WE REJECT OUR HYPOTHESIS. BE CAREFUL WITH THE NEGATIVES: WHILE A Z-VALUE OR STANDARD DEVIATION OF +2.36 IS GREATER THAN +2.34 (HENCE REJECT), -2.36 IS SMALLER (FURTHER LEFT IN THE TAIL) THAN -2.34 AND AGAIN WE REJECT.

4) THE SPEED OF VEHICLES ALONG A STRETCH OF I-95 HAS AN APPROXIMATELY NORMAL DISTRIBUTION WITH A MEAN OF 75 MPH AND A STANDARD DEVIATION OF 10 MPH.
(a) THE SPEED LIMIT IS 70 MPH. WHAT IS THE PROPORTION OF VEHICLES GOING LESS THAN OR EQUAL TO THE SPEED LIMIT?
(b) WHAT PROPORTION OF THE VEHICLES WOULD BE GOING LESS THAN 60 MPH?
(c) A NEW SPEED LIMIT WILL BE INITIATED SUCH THAT APPROXIMATELY 10% OF VEHICLES WILL BE OVER THAT SPEED LIMIT. WHAT IS THE NEW SPEED LIMIT BASED ON THIS CRITERION? (NEED TO CALCULATE THE X-VALUE)
(d) DO YOU THINK THE ACTUAL DISTRIBUTION (HOW THE CURVE LOOKS) OF SPEEDS DIFFERS FROM A NORMAL BELL-SHAPED DISTRIBUTION?

5) STUDENTS TAKE A STATISTICS TEST. THE GRADE DISTRIBUTION IS NORMAL WITH A MEAN OF 30 AND A STANDARD DEVIATION OF 6.
(a) ANYONE WHO SCORES IN THE TOP 20% OF THE DISTRIBUTION GETS A GRADE OF "A" OR "B". WHAT IS THE LOWEST SCORE SOMEONE CAN GET AND STILL GET A "B"?
(b) THE BOTTOM 20% GET A "D" OR "F". WHAT IS THE LOWEST SCORE THAT STILL PASSES WITH A "C"?

6) WE CAN USE THE NORMAL DISTRIBUTION TO APPROXIMATE THE BINOMIAL DISTRIBUTION.
YOU REMEMBER THE COMPLICATED BINOMIAL EQUATIONS (WEEK 3)? THE EQUATIONS USING THE NORMAL TO APPROXIMATE THE BINOMIAL ARE MUCH SIMPLER, BUT THEY ARE NOT AS PRECISE. LET'S SOLVE THIS SAME BINOMIAL PROBLEM USING THE NORMAL DISTRIBUTION SHORTCUT AND SEE HOW CLOSE IT IS TO OUR BINOMIAL CALCULATION OF WEEK 3.

Here is a web site that offers a decent explanation of what we will be doing here: http://onlinestatbook.com/2/normal_distribution/normal_approx.html
You can also check LANE around pages 204 and 264 for a review of the DISCRETE data handling.

Problem Statement from Week 3: "You and a friend play 3 rounds of a game which you typically win 70% of the time. What is the probability that your FRIEND will win more games than you?"

What did you determine this probability was in Week 3? P(0) + P(1) = _____

Now, to use the Normal Distribution as an approximation, we first need to calculate the MEAN and STANDARD DEVIATION of that data set. In addition to the above web site, our LANE text shows these formulas around page 204.

The MEAN is the number of trials (data points), in this case 3 games, times the probability of your winning, here 70% or 0.70. The mean is therefore: 3 x 0.70 = ________

The VARIANCE is the number of trials times the probability of winning times the probability of losing. (The probability of losing is simply 1.00 - probability of winning, since wins and losses must have a total probability of 100% or 1.00, so no "ties" allowed.) The variance is therefore: 3 x 0.70 x (1.0 - 0.70) = _______

The STANDARD DEVIATION is simply the square root of the Variance, so it is: _______

Back to the problem. What is the number of trials? N = 3. What are the data points (x-values) which stand for the number of YOUR wins? ZERO AND 1. These binomials are DISCRETE numbers representing wins and losses, so losing 1.345 games is NOT possible. If we graphed discrete numbers, like the number of games won, it would be a bar chart.
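As a way to check the hand calculations for the exact (discrete) side of this comparison, here is a minimal Python sketch of the binomial mean, variance, standard deviation and P(0) + P(1). The function name `binomial_pmf` and the variable names are made up for this illustration; the inputs (3 games, 70% chance of winning) come from the problem statement.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Exact probability of exactly k wins in n games, each won with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 3      # number of games (trials)
p = 0.70   # probability that YOU win a single game

mean = n * p                  # trials x probability of winning
variance = n * p * (1 - p)    # trials x P(win) x P(lose)
std_dev = sqrt(variance)      # square root of the variance

# Your friend wins more games than you when you win 0 or 1 of the 3.
p_friend_wins_more = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)

print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.4f}")
print(f"P(0) + P(1) = {p_friend_wins_more:.4f}")
```

The four probabilities P(0) through P(3) should add up to 1, which is a quick sanity check on any hand calculation.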
However, the NORMAL DISTRIBUTION represents CONTINUOUS DATA where fractions are possible. So, how do we handle discrete data? Kind of simple: take the discrete ONE win. To make it continuous, it would go from 0.5 to 1.5. Five discrete wins would become 4.5 to 5.5 continuous wins. ZERO wins is a little trickier. It would be represented by a continuous range of -0.5 to +0.5.

Let's proceed to calculating the probability using the Normal Approximation. We now have our continuous range for our discrete data of -0.5 to +1.5. Now we STANDARDIZE the two ends of this range: the x-value of -0.5 becomes the standardized z-value of: _____________ and +1.5 becomes the z-value of: ________ (KEEP TWO DECIMAL PLACES, LIKE 1.45 OR 0.98 OR EVEN 2.40)

These z-values are the standard deviations (SD) from the mean that we go to our TABLE with to get the probabilities of winning 0 (lower SD) and 1 games (upper SD) out of 3. The upper SD probability is _________, meaning that this percent of our data have that probability (or less) of occurring. In this case it is winning only 1 game. The lower SD probability is ________, which is the probability of winning zero games.

SINCE THESE ARE AREAS TO THE LEFT, WE MUST SUBTRACT THE LOWER AREA (PERCENTAGE OR PROBABILITY) FROM THE UPPER AREA. THE RESULT IS THE PROBABILITY OF WINNING ZERO OR ONE GAMES OUT OF THREE. WHAT DID YOU CALCULATE? ______________

HOW CLOSE IS THIS NORMAL APPROXIMATION TO THE MORE ACCURATE DISCRETE NUMBER YOU CALCULATED IN WEEK 3?

7) HERE IS A GRAPH OF A NORMAL DISTRIBUTION. DRAW OVER IT WHAT A DISTRIBUTION WOULD LOOK LIKE:
(a) IF IT HAD THE SAME MEAN BUT A SMALLER STANDARD DEVIATION. (be careful with the height as well as the width)
(b) IF IT HAD THE SAME MEAN BUT A LARGER STANDARD DEVIATION. (same caution)
(c) WHY DON'T THE ENDS TOUCH THE ZERO LINE? (remember this is probability, not "death & taxes")

8) HEIGHT AND WEIGHT ARE TWO MEASUREMENTS USED TO TRACK A CHILD'S DEVELOPMENT.
THE WEIGHTS FOR ALL 11-YEAR-OLD GIRLS, 4' 8" TALL, IN A REFERENCE POPULATION HAD A MEAN OF 74 POUNDS WITH A STANDARD DEVIATION OF 2 LBS. ASSUME THESE WEIGHTS ARE NORMALLY DISTRIBUTED. CALCULATE THE Z-SCORES THAT CORRESPOND TO THE FOLLOWING WEIGHTS AND INTERPRET THEM. THIS IS USEFUL STATISTICS.
(a) 70 LBS
(b) 86 LBS
(c) 60 LBS
(d) IF YOU WERE THE PARENT OF ANY OF THESE CHILDREN, WOULD YOU BE CONCERNED? (WHY?)

9) A LEGAL STATISTICAL PROBLEM: A PATERNITY LAWSUIT. THE LENGTH OF A PREGNANCY IS NORMALLY DISTRIBUTED WITH A MEAN OF 280 DAYS AND A STANDARD DEVIATION OF 13 DAYS. AN ALLEGED FATHER WAS OUT OF THE COUNTRY FROM 240 TO 306 DAYS BEFORE THE BIRTH OF THE CHILD, SO THE PREGNANCY WOULD HAVE BEEN LESS THAN 240 DAYS OR MORE THAN 306 DAYS LONG IF HE WAS THE FATHER. A HEALTHY CHILD WAS BORN WITH NO COMPLICATIONS, BUT:
(a) WHAT IS THE PROBABILITY THAT HE IS NOT THE FATHER?
(b) WHAT IS THE PROBABILITY THAT HE COULD BE THE FATHER?
(HINT: CALCULATE THE Z-SCORES FIRST, AND THEN USE THOSE TO DETERMINE THE PROBABILITIES)

10) SUPPOSE THAT THE DISTANCES OF FLY BALLS HIT TO THE OUTFIELD (IN BASEBALL) ARE NORMALLY DISTRIBUTED WITH A MEAN OF 240 FEET AND A STANDARD DEVIATION OF 40 FEET. WE RANDOMLY SAMPLE 50 FLY BALLS. IF X̄ = AVERAGE DISTANCE IN FEET FOR 50 FLY BALLS, THEN:
(a) WHAT IS THE PROBABILITY THAT THE 50 FLY BALLS TRAVELED AN AVERAGE OF LESS THAN 230 FEET? SKETCH THE GRAPH. SCALE THE HORIZONTAL AXIS FOR X̄. SHADE THE REGION CORRESPONDING TO THE PROBABILITY. FIND THE PROBABILITY.
(b) FIND THE 75TH PERCENTILE OF THE DISTRIBUTION OF THE AVERAGE OF 50 FLY BALLS.
(c) DETERMINE THE LONGEST DISTANCE ONE OF THE 50 FLY BALLS TRAVELS
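For problem 10, parts (a) and (b), the point to keep in mind is that the AVERAGE of 50 fly balls varies less than a single fly ball does: its standard deviation is the population standard deviation divided by the square root of the sample size. The Python sketch below is only a cross-check for table work; the variable names are made up for this illustration, and it uses `statistics.NormalDist` from the standard library.

```python
from math import sqrt
from statistics import NormalDist

mu = 240      # mean distance of a single fly ball, in feet
sigma = 40    # standard deviation of a single fly ball, in feet
n = 50        # number of fly balls in the sample

# Standard deviation of the sample average (the "standard error").
standard_error = sigma / sqrt(n)

# Distribution of the average distance of 50 fly balls.
sample_mean_dist = NormalDist(mu, standard_error)

# (a) area to the left of 230 feet
z_230 = (230 - mu) / standard_error
p_below_230 = sample_mean_dist.cdf(230)

# (b) x-value with 75% of the area to its left
percentile_75 = sample_mean_dist.inv_cdf(0.75)

print(f"standard error = {standard_error:.3f} feet")
print(f"z for 230 feet = {z_230:.2f}")
print(f"P(average < 230) = {p_below_230:.4f}")
print(f"75th percentile = {percentile_75:.2f} feet")
```

Rounding the z-value to two decimals before going to the table, as the course asks, may change the last digit of the probability slightly.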