Question

This chapter explores data analysis in the form of graphical displays (dotplots, stem-and-leaf plots, boxplots, histograms), center, spread, identifying unusual observations, and summarizing data, with added emphasis on explaining the message the data appear to convey.

Shape is one of the basic characteristics used to describe the distribution of your data. Most distributions can be classified as skewed right, skewed left, or symmetric (for example, normally distributed).

Shape: Symmetric
The left and right sides are roughly the same. The bell curve (normal frequency curve) is the standard example; a rectangular, or uniform, distribution is also symmetric.

Shape: Skewed
Most of the data is on one side with a long tail (or skew) on the other side. A distribution that is skewed right has its tail on the right and is called positively skewed. A distribution that is skewed left has its tail on the left and is called negatively skewed.

[Figure: two frequency histograms, "Hours of TV per Week" and "Grade (out of 32)".]

A histogram is used for continuous data. A bar chart is used for categorical data or discrete data (for example, candy counts).

We are going to learn how to summarize and describe the distribution of data using graphs.

1. A dotplot displays the data of a sample by representing each data value with a dot positioned along a horizontal scale, with the frequency on the vertical scale. This display is a convenient technique to use as you first begin to analyze the data.
Use a dotplot to display the following data set: 4,4, 4.7, 4.7, 7.7, 5.1, 5.1, 5.1, 5.1, 5.1. 8

2. The stem-and-leaf display is a combination of a graphical technique and a sorting technique in statistics, and it is well suited to computer applications. (Time series: xy plot, scatter plot.)
Use a stem-and-leaf display to summarize the following data set: 10, 10, 11, 12, 14, 14, 15, 20, 21, 21, 21, 22, 25, 30, 30, 34.

3. Pie charts and bar charts are graphs that are used to summarize qualitative data.

What not to get for mothers on Mother's Day!
A recent study among mothers in the USA shows that mothers prefer not to receive certain items as gifts on Mother's Day, as shown below:

Teddy bears         45
Chocolate           30
Jewelry             25
Wireless earbuds    50

4. The scatter plot is an appropriate display of bivariate data when both variables are quantitative. Bivariate data refers to the values of two different variables that are obtained from the same population.

5. A histogram is used to summarize continuous quantitative data.

How to summarize your data

All graphic representations of sets of data need to be completely self-explanatory. That includes a descriptive, meaningful title and identification of the vertical and horizontal scales.

How to summarize your data using numerical values

We are going to learn how to compute and interpret the mean (average), the k% trimmed mean, the median, the midrange, and the mode. These measures of central location are used to define, in some sense, the center of a set of measurements. The mean, median, and midrange apply to quantitative data.

The average is the typical, or mean, value in the distribution. The symbol for the sample mean is $\bar{x}$, and $\mu$ (Greek letter mu) is the symbol for a population mean. The mathematical formula for calculating the sample mean is $\bar{x} = \frac{\sum x}{n}$. The mean is the average that is the balance point of a distribution. The mean is pulled toward extreme values in an unbalanced distribution (i.e., a right-skewed or left-skewed distribution). It is computed by adding all the numbers and then dividing by the number of numbers.

The median is the "centermost" data value in the distribution when the data are arranged from lowest to highest. To find the median, first arrange the data from smallest to largest or from largest to smallest. If there is an odd number of data values, the median is the middle data value. If there is an even number of data values, find the middle two data values and average them.

Data: 1, 2, 3, 4, and 5 (odd number of values; the median is 3).
Data: 1, 2, 3, 4, 5, 6 (even number of values; the median is $\frac{3+4}{2} = 3.5$).
Worked examples

Data: 1, 2, 3, 4, 5, 6. Average $= \frac{1+2+3+4+5+6}{6} = 3.5$.
Data: 1, 2, 3, 4, 5. Average $= \frac{1+2+3+4+5}{5} = 3$.
Data: 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. Population mean $= \mu$.
Data: 1, 2, 3, 4, and 10. Sample mean $= \bar{x} = \frac{\sum x}{n} = \frac{1+2+3+4+10}{5} = 4$.

The median is abbreviated med.

Midrange $= \frac{\text{smallest data value} + \text{largest data value}}{2}$. For the data 1, 2, 3, 4, 5, 6, the midrange is $\frac{1+6}{2} = \frac{7}{2} = 3.5$.

The 10% trimmed mean is computed by removing the largest 10% of values and the smallest 10% of values from the data set and then computing the mean of the remaining middle 80% of values.

What is the 10% trimmed mean for the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10? There are 10 values and $(10)(0.10) = 1$, so one value is removed from each end, leaving 8 numbers: 10% trimmed mean $= \frac{2+3+4+5+6+7+8+9}{8} = 5.5$. With 100 values, $(100)(0.10) = 10$ values would be removed from each end.

How do we determine the total amount of property taxes for the city if we know the number of properties, the mean dollar value of all properties, and the tax rate?

The mode is used to describe the most frequently observed data value in the distribution, and it can be used with qualitative data. No special computation is involved in finding the mode. A simple inspection of the frequency of occurrence of each data value is all that is required.

Remember: the mode is not a frequency, but rather the value that occurs most often. For example, in the set of scores below,

73, 73, 73, 75, 78, 79, 82, 82, 85, 88, 89, 95, 95, 98

the mode is 73.

Blood type data: O, O, O, O, O, A, A, B, B, and B. What is the mode for this blood type data? The mode is blood type O.

The mean, median, midrange, and mode are used to describe the center of the data set.

Suppose a study of houses that have sold recently in Santa Clara County showed the following frequency distribution for the number of bedrooms:

Bedrooms    Frequency
1           15
2           18
3           120
4           50
5           12

1. Based on this information, how do we compute the mean, median, and mode?
2. Which is the better measure of the center for these data, the mean or the median? Explain.

When the picture of a distribution is symmetric, the mode will be approximately equal to the median and the mean.
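The measures of center described above can be checked with a short script. What follows is a minimal sketch in Python, not a prescribed method. The helper names `mean`, `midrange`, and `trimmed_mean` are hypothetical and chosen only for illustration, and the trimming rule (drop n × percent / 100 values, rounded down, from each end) is an assumed convention; other conventions exist.

```python
import statistics


def mean(values):
    """Add all the values, then divide by how many there are."""
    return sum(values) / len(values)


def midrange(values):
    """Average of the smallest and the largest data value."""
    return (min(values) + max(values)) / 2


def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end.

    Assumption: the count trimmed per end is n * percent / 100, rounded down.
    """
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # number of values removed from each end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Data sets taken from the notes.
one_to_six = [1, 2, 3, 4, 5, 6]
with_extreme = [1, 2, 3, 4, 10]
one_to_ten = list(range(1, 11))
scores = [73, 73, 73, 75, 78, 79, 82, 82, 85, 88, 89, 95, 95, 98]
blood_types = ["O", "O", "O", "O", "O", "A", "A", "B", "B", "B"]

print(mean(one_to_six), statistics.median(one_to_six), midrange(one_to_six))
print(mean(with_extreme), statistics.median(with_extreme))
print(trimmed_mean(one_to_ten, 10))
print(statistics.mode(scores), statistics.mode(blood_types))
```

For the data 1, 2, 3, 4, 10 the mean is 4 while the median stays at 3, which illustrates how a single extreme value pulls the mean but not the median.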
Sample size: for data given as a frequency table, the sample size is the sum of the frequencies.

Quartile values

Measures of position, such as percentiles and quartiles, are used to make statements about the relative position of an observed value within a particular set of data, but they are not the same as percentages. Suppose a student scored 75 out of 100 points on a test (a score of 75%). This score could be the lowest, the highest, or somewhere in the middle of the class. However, if a score of 68 corresponds to the 75th percentile, or third quartile, then a student with that score performed as well as or better than about 75% of the students in the class. In general, the kth percentile is the value such that k percent of the data values in the data set fall at or below that value.

Sample of temperatures from summer 2019: 82, 83, 85, 85, 86, 87, 88, 88, 90, 90, 92, 93, 94, 95, 97.

The ordered data are split into four parts of about 25% each by the quartiles, which lie between the smallest and the largest value:

First quartile = lower quartile = $Q_1$
Second quartile = $Q_2$ = median
Third quartile = upper quartile = $Q_3$ = 75th percentile

Quartiles divide the distribution into fourths. Each quarter contains 25% of the data. The quartiles $Q_1$, $Q_2$, and $Q_3$ are used to rank data.

Example: The dotplot shows the distribution of weights for a class of introductory statistics students. The vertical lines slice the distribution into four parts, so each part has about 25% of the observations. 25% of the weights are between 101 and 121 pounds, and so on.

[Figure: dotplot of Weight (pounds), with the horizontal axis running from 100 to 200 and vertical lines marking four parts of 25% each.]

Interquartile range = third quartile - first quartile: $\mathrm{IQR} = Q_3 - Q_1$.

Measuring the spread

Range, variance, and standard deviation: which data set has more spread, 1, 2, 3 or 1, 1, 4? What is the variation of the numbers 1, 2, and 3?

Range = largest number - smallest number in your data set.
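The quartiles, interquartile range, and range discussed above can be sketched in Python. Textbooks and software use several slightly different conventions for locating quartiles. This sketch assumes the default ("exclusive") method of the standard library function `statistics.quantiles`, so its results may differ a little from a hand method taught in class. The helper name `data_range` is hypothetical.

```python
import statistics

# Sample of temperatures from the notes (summer 2019).
temperatures = [82, 83, 85, 85, 86, 87, 88, 88, 90, 90, 92, 93, 94, 95, 97]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(temperatures, n=4)

# Interquartile range = third quartile - first quartile.
iqr = q3 - q1


def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)


range_a = data_range([1, 2, 3])
range_b = data_range([1, 1, 4])

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", iqr)
print("range of 1, 2, 3 =", range_a, "; range of 1, 1, 4 =", range_b)
```

Judged by the range alone, the data set 1, 1, 4 (range 3) is more spread out than 1, 2, 3 (range 2).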
Locating a quartile: the position of $Q_1$ in the ordered data can be found as $0.25 \times (\text{number of data values}) + 0.5$.

The range is the difference between the largest data value and the smallest data value in your data set. The range is an indicator of the variation in your data set. Variation is sometimes referred to as dispersion or spread.

The symbol $s^2$ is used for the sample variance and $\sigma^2$ (sigma squared) for the population variance. The formula for the sample variance is the sum of squared deviations divided by the degrees of freedom, $n - 1$:

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

The sample standard deviation ($s$) is the square root of the sample variance. The formula for the standard deviation is:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

How to compute the standard deviation:
1. Find the deviation (or distance) of each observation, $x$, from the mean.
2. Square each deviation to make it positive.
3. Add all the squared deviations (take the sum).
4. Divide by 1 less than the sample size (see text). Think of this as averaging the squared deviations.
5. Take the square root to restore the original units.

Example: for the data 1, 2, 3, the mean is $\bar{x} = \frac{1+2+3}{3} = 2$.

x    x - x̄          (x - x̄)²
1    1 - 2 = -1     (-1)² = 1
2    2 - 2 = 0      (0)² = 0
3    3 - 2 = 1      (1)² = 1

$\sum (x - \bar{x})^2 = 2$, so $s^2 = \frac{2}{3 - 1} = 1$ and $s = 1$.

Symbols: $s$ = sample standard deviation, $s^2$ = sample variance, $\sigma$ = population standard deviation, $\sigma^2$ = population variance.

In business terms, standard deviation is regarded as a measure of an investment's risk.

Deviations from the mean sum to zero: $\sum (x - \bar{x}) = 0$. The sum of squared deviations is $\sum (x - \bar{x})^2 = SS$.

A value may be an outlier if it is either greater than $Q_3 + 1.5(\mathrm{IQR})$ or less than $Q_1 - 1.5(\mathrm{IQR})$.

Coefficient of variation: the coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percent:

Coefficient of variation $= \frac{\text{standard deviation}}{\text{mean}} \times 100\%$

The coefficient of variation is abbreviated CV. CV is used to compare the spread of two distributions or data sets.
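The definitions above translate directly into code. The following is a minimal sketch in Python that follows the computation step by step for the two small data sets from the notes. The function names are hypothetical and chosen only for illustration.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    # Deviation of each observation from the mean, squared to make it positive.
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    # Add the squared deviations, then divide by one less than the sample size.
    return sum(squared_deviations) / (n - 1)


def sample_std(values):
    """Square root of the sample variance, which restores the original units."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return sample_std(values) / (sum(values) / len(values)) * 100


def may_be_outlier(x, q1, q3):
    """Apply the 1.5 * IQR rule to a single value, given Q1 and Q3."""
    iqr = q3 - q1
    return x > q3 + 1.5 * iqr or x < q1 - 1.5 * iqr


for data in ([1, 2, 3], [1, 1, 4]):
    print(data, sample_variance(data), sample_std(data),
          coefficient_of_variation(data))
```

Both data sets have a mean of 2, but 1, 1, 4 has a sample variance of 3 against 1 for 1, 2, 3, so it has more spread.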
"sample variance = sample standard deviation CV= standard devia xloog = NBaT = $1 mean NX15 = X5 XSample variance, sample standard deviation, interquartile range and range are used to measure the variation of the data set, Bell curve positive skewed normal skewed right curve center = mean mean >median = median skewed left or = mode negatively skewed isiveb biebasie s meanmedian The variance for the normal, curve is smaller than the variance from the left or right skewed,6. The results of a ten-point quiz in a statistics class are as follows: Quiz score 10 9 8 6 5 4 (S.S) yousupri svitelot Percent of students 30% 20% 20% 15% 10% 5% (S.S) eshabrod zza!) obtaining score a. Find the mean and standard deviation if 20 students took the quiz. b. Find the mean and standard deviation if 60 students took the quiz. c. Suppose you don't know how many students took the quiz. Can you obtain the mean and standard deviation? Explain. (5.") bewade bigist9. Another way of combining the average and the standard deviation of a data set is to compute what is called the coefficient of variation. The coefficient of variation (denoted by CV) is expressed as a percent and is given by CV = -.100%. 1 no 2-57The CV measures relative variation. For example, a standard deviation of $1 in the price of a gallon of milk has a totally different meaning from a standard deviation of $1 in the price of refrigerators. In the former, the variability is enormous, while in the latter it is quite insignificant. Data set A gives the actual weight (in ounces) of the content of cans of pet food having a net weight of 8 ounces. Data set B gives the actual weight (in pounds) of the contents of bags of dry pet food having a net weight of 50 pounds. Data set A 8.3 7 7.6 8 7.6 8.3 8.1 7.8 7.7 7.5 Data set B 52 517 52.1 48 49 47 50.5 50.3 49 48 Milan a. For each data set compute the mean and standard deviation. b. Compute the CV for each data set. Comment on the results.12. 
The data sets give milk production (in grams/day) for lactating mothers breast-feeding their newborns. The data sets are from the paper "Smoking During Pregnancy and Lactation and Its Effects on Breast Milk Volume" (American Journal of Nutrition, 1991, pages 1011-1016).

Smoking mothers: 621 793 593 545 753 655 895 767 714 598 693
Non-smoking mothers: 947 945 1086 1202 973 981 930 745 903 899 961

Compare and contrast the two data sets.

13. Suppose the distribution of an exam score is approximately bell-shaped with mean = 65 and standard deviation = 12. The 25th percentile for this distribution is 55.
a. Find the 75th percentile of the distribution.
b. What is an approximate value for the median of this distribution?
c. Find an approximate value for the interquartile range of this distribution.
d. What exam score corresponds to a z score of 2.5?
e. If 100 students took the exam, approximately how many scored above 89?
f. If 100 students took the exam, how many are expected to score between 55 and 75?

14. The reputable business magazine Forbes reports that for the year 1998 the net worth of its readers was $1 million or $2 million, depending on which "average" it reports. Which of these numbers is the mean? Which one is the median? Explain.

15. For data sets with a fairly small number of outliers, statisticians sometimes compute what is called the trimmed mean. The trimmed mean is a measure of center that is resistant to outliers and contains more information than the median. To compute a trimmed mean, a percent of the data is "trimmed" and the rest of the data are used to compute an average. A 10% trimmed mean is most popular. It is computed by discarding the lower and upper 10% of the data and averaging the remaining 80% of the data values.
a. Compute the 10% trimmed mean for the data of problem 11 above.
b. Compare the trimmed mean from part a with the untrimmed mean and the median. Comment.
