Question

1 Approved Answer

Posted on Oct 13, 2024


GSB420 CLASS 7 STATISTICS (CH 9 - 12)

Chapter 9. Sampling Distribution

The Central Limit Theorem (CLT)

Population Mean = Sample Mean

[Figure: sample means X1 through X7 and the population mean]

CASE I: POPULATION MEAN (μ) and VARIANCE (σ²) are known

POPULATION MEAN, SAMPLE MEAN, DISTRIBUTIONS of POPULATION, SAMPLE, and SAMPLE MEAN

EXAMPLES of the Central Limit Theorem

1. According to the U.S. Labor Department, the average hourly wage for private-sector production and non-supervisory workers was $20.00 in February 2013. Assume the standard deviation for this population is $6.00 per hour. A random sample of 36 workers from this group was selected.
   1) What is the probability that the mean for this sample is less than $19.00?
   2) What is the probability that the mean for this sample is less than $21.00?

2. Given a population mean of 100 and a population standard deviation of 10, you took a sample of 25 observations.
   a. What is the probability of picking a sample mean that is less than 99?
   b. between 97 and 104?

Chapter 10. Confidence Interval of Population Mean

CASE III: POPULATION MEAN (μ) is unknown and VARIANCE (σ²) is known

APPLICATIONS
1. If a sample of size 100 yielded a mean of 25, given its population standard deviation of 5, determine the 95% confidence interval for the population mean.
2. If a sample of size 30 yielded a mean of 20, given its population standard deviation of 3, determine the 99% confidence interval for the population mean.

CASE IV: POPULATION MEAN (μ) and VARIANCE (σ²) are unknown

APPLICATIONS
1.
If a sample of size 30 yielded a mean of 20, given its sample standard deviation of 3, determine the 99% confidence interval for the population mean.
2. If a sample of size 100 yielded a mean of 25, given its sample standard deviation of 5, determine the 95% confidence interval for the population mean.

E. Sample Size Determination

Before a survey is conducted, the number of sample observations to collect is often a major issue. Among many methods, a simple one is presented here. Assuming that the population standard deviation is known, the $(100 - \alpha)\%$ confidence interval for $\mu$ is given as follows:

$$\mu = \bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

The last term in the above equation, $Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, is the amount of estimation error, or the margin of error, around the sample mean. Because it depends on the sample size, n, it is also known as the sampling error. Denoting this sampling error by E, we have:

$$\text{Sampling Error} = \text{Margin of Error} = E = Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Solving for n in the above equation yields the sample size:

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \frac{Z_{\alpha/2}^2\,\sigma^2}{E^2}$$

Note 1: It is not true that the larger the population size, the larger the sample size should be. As shown in the above equation, the sample size, n, is NOT determined by the population size, N. The sample size is determined by the level of confidence chosen ($Z_{\alpha/2}$), the amount of tolerable error (E), and the population standard deviation ($\sigma$).

Note 2: The population standard deviation is often unknown. Therefore, it has to be estimated by taking a small sample or on the basis of prior experience and knowledge.

Example 1> Suppose that you are to conduct a survey to collect primary data on the average speed of cars on a highway. From previous research, the population variance is known to be 36. You wish to have 95% confidence in your statement while allowing a sampling error of only ±2 m.p.h. How many cars should you sample?
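The sample-size formula above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `required_sample_size` is hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a Z table, so it is 1.95996... rather than the rounded 1.96. The inputs are those of Example 1.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Return (z, exact n, n rounded up) for n = (z * sigma / margin)^2.

    Assumes sigma (population standard deviation) is known, as in the notes.
    """
    alpha = 1.0 - confidence
    # Two-sided critical value Z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = (z * sigma / margin) ** 2
    # Always round up so the margin of error is not exceeded.
    return z, n_exact, math.ceil(n_exact)


# Example 1 inputs: variance 36 (so sigma = 6), E = 2 m.p.h., 95% confidence.
z, n_exact, n = required_sample_size(sigma=math.sqrt(36), margin=2.0, confidence=0.95)
print(f"z = {z:.2f}, exact n = {n_exact:.2f}, sample size = {n}")
```

Rounding up with `math.ceil` reflects the rule that a fractional sample size must be raised to the next integer; ordinary rounding could leave the sample slightly too small.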
Answer: Given $\alpha = 5\%$, $\sigma = \sqrt{36} = 6$, and $E = \pm 2$,

$$n = \frac{(1.96)^2 (6)^2}{(2)^2} = 34.57 \approx 35$$

Note that the sample size should always be rounded up to the next integer to ensure an adequate sample size. Otherwise, the sample will be less than adequate.

Chapter 11. One-Sample Hypothesis Testing

The Summary of Hypothesis Testing Methods Using the t-statistic

Two-Tail Test
  Hypotheses: $H_0: \mu = k$ vs. $H_a: \mu \neq k$
  Critical value: If $|t_c| \le t_{\alpha/2,\,n-1}$, accept $H_0: \mu = k$. Otherwise, reject $H_0$ and accept $H_a$.
  p-value: If the calculated p-value > the chosen $\alpha$, accept $H_0: \mu = k$.
  Confidence interval: If $k - t_{\alpha/2,\,n-1} S_{\bar{X}} \le \bar{X} \le k + t_{\alpha/2,\,n-1} S_{\bar{X}}$, accept $H_0: \mu = k$.
  [Figure: two-tail test. There are two cutoff values (critical values), $-t$ (lower) and $+t$ (upper), defining the regions of rejection, each with area $\alpha/2$; $H_0$ is not rejected between them.]

Upper-Tail Test
  Hypotheses: $H_0: \mu \le k$ vs. $H_a: \mu > k$
  Critical value: If $t_c < +t_{\alpha,\,n-1}$, accept $H_0: \mu \le k$. Otherwise, reject $H_0$ and accept $H_a$.
  p-value: If the calculated p-value > the chosen $\alpha$, accept $H_0: \mu \le k$.
  Confidence interval: If $\bar{X} \le k + t_{\alpha,\,n-1} S_{\bar{X}}$, accept $H_0: \mu \le k$.
  [Figure: upper-tail test. There is only one critical value, since the rejection area is in only one tail.]

Lower-Tail Test
  Hypotheses: $H_0: \mu \ge k$ vs. $H_a: \mu < k$
  Critical value: If $t_c > -t_{\alpha,\,n-1}$, accept $H_0: \mu \ge k$. Otherwise, reject $H_0$ and accept $H_a$.
  p-value: If the calculated p-value > the chosen $\alpha$, accept $H_0: \mu \ge k$.
  Confidence interval: If $k - t_{\alpha,\,n-1} S_{\bar{X}} \le \bar{X}$, accept $H_0: \mu \ge k$.
  [Figure: lower-tail test. There is only one critical value, since the rejection area is in only one tail.]

GSB420 CLASS 7 Sample Distribution (CH9, 10)

2) To find out whether the average speed of a car on the same highway has remained the same or not, you conducted a survey and found that the average speed of 16 cars surveyed was 63.90375 m.p.h.
and the sample standard deviation was 6 m.p.h., assuming you don't know the population mean.

3) You know from previous experience that the average speed of a car on the highway is 60 m.p.h. To find out whether the average speed of a car on the same highway has increased, you conducted a survey and found that the average speed of 16 cars surveyed was 63.90375 m.p.h. and the sample variance was 36 (m.p.h.)². Verify at a 5% significance level whether the average car speed has increased.

$H_0: \mu \le 60$ vs. $H_1: \mu > 60$

Chapter 12. Section 2. The F-Test for the Difference Between Two Independent Variances

1. Given the following temperature data for Memorial Day and Labor Day in the Chicago area, can you verify the claim that Labor Day is as warm as Memorial Day at a 95% confidence level? Before conducting the analysis, identify all assumptions that you need to make.

            Memorial Day                     Labor Day
Year        High      Low       Average     High      Low       Average
2000        67        44        55.5        70        61        65.5
2001        67        50        58.5        85        62        73.5
2002        78        49        63.5        82        66        74
2003        71        46        58.5        65        61        63
2004        66        58        62          80        56        68
2005        74        46        60          87        58        72.5
2006        91        68        79.5        71        60        65.5
Average=    73.42857  51.57143  62.5        77.14286  60.57143  68.85714
Variance=   78.95238  73.28571  62.91667    71.14286  9.952381  19.80952

Are the variances of temperature on Memorial Day and Labor Day significantly different? Test at a 5% significance level.

$H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$

GSB420 Class 8 HW
Dr. Jin Man Lee

1. Here is the unemployment rate in the US from 1993 to 2006. (Use Minitab to answer these questions; all output from Minitab needs to be included.)

Year  Unemp_Rate (%)
1993  6.80
1994  6.10
1995  5.60
1996  5.40
1997  4.90
1998  4.50
1999  4.20
2000  4.00
2001  4.80
2002  5.80
2003  6.00
2004  5.50
2005  5.00
2006  5.10

1) Find the sample mean and the standard deviation of the sample mean (standard error) using Minitab.
2) Based on this sample, find the 95% and 99% confidence intervals of the average unemployment rate.
3) Many studies have found that the natural rate of unemployment in the US is 6.00%. Using the data, test whether the unemployment rate during the sample period is the same as the natural rate of unemployment.
4) Some macroeconomists argue that the US unemployment rate after the 1990s was less than the natural rate of unemployment, which is 6%. Prove or disprove the argument.

2. Let's think about house prices. According to the Case-Shiller Home Price Indices in August 2009, Chicago and San Francisco have the following sample means and population standard deviations (the sample mean was calculated on a daily basis, so the sample size was 30):

               Sample Mean  Population Standard Deviation
Chicago        130.55       9
San Francisco  132.47       12

1) Using a hypothesis test, determine whether these house price indices are the same. (Set up a hypothesis, show your work in performing the test, and state your verdict.)
2) Some people argue that San Francisco has higher house prices than Chicago. Prove or disprove the argument using a hypothesis test.
3) Let's assume the population standard deviations are unknown, and the sample standard deviation for Chicago is 9.2 and that for San Francisco is 11.5. Some people argue that San Francisco has higher variability (higher variance) in house prices than Chicago. Set up a hypothesis, perform the test, and prove or disprove the argument.

GSB420 CLASS 8 STATISTICS (CH13)

CLASS 8
Chapter 13.
Simple Regression Analysis

Example 1) Consumption and Income

Obs  Person   income  consumption  Fit     Residual
1    Person1  10000   25000        15474     9526
2    Person2  35000   32000        38124    -6124
3    Person3  30000   27000        33594    -6594
4    Person4  25000   31000        29064     1936
5    Person5  70000   59000        69832   -10832
6    Person6  65000   64000        65302    -1302
7    Person7  83000   95000        81610    13390

[Figure: Scatterplot of consumption vs income, points labeled Person1 through Person7]

Example 2) Wealth and Salary

Obs  Wealth  Salary  Fit    Residual
1    100.0   50      324.3  -224.3
2    200.0   75      381.7  -181.7
3    150.0   45      312.8  -162.8
4    300.0   100     439.1  -139.1
5    205.0   80      393.1  -188.1
6    100.0   30      278.3  -178.3
7    150.0   35      289.8  -139.8
8    270.0   50      324.3   -54.3
9    280.0   55      335.7   -55.7
10   700.0   90      416.1   283.9
11   500.0   30      278.3   221.7
12   300.0   25      266.9    33.1
13   650.0   110     462.0   188.0
14   600.0   50      324.3   275.7
15   600.0   30      278.3   321.7

[Figure: Scatterplot of Wealth vs Salary, points labeled by observation number]

A simple regression model tries to establish a statistically significant relationship between two variables. One of them, called an independent variable, is believed to explain (or determine or influence) the other variable, called a dependent variable. This causal relationship must be established prior to data collection on the basis of economic theory, business intuition, life experience, etc.
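The Fit and Residual columns of Example 1 can be reproduced with ordinary least squares. The following is a minimal sketch in plain Python, assuming the standard closed-form OLS estimates (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x); the notes themselves use Minitab, and the helper name `fit_simple_ols` is hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Example 1 data (Person1 through Person7).
income = [10000, 35000, 30000, 25000, 70000, 65000, 83000]
consumption = [25000, 32000, 27000, 31000, 59000, 64000, 95000]

a, b = fit_simple_ols(income, consumption)
fitted = [a + b * xi for xi in income]
residuals = [yi - fi for yi, fi in zip(consumption, fitted)]

for xi, yi, fi, ei in zip(income, consumption, fitted, residuals):
    print(f"income={xi:6d} consumption={yi:6d} fit={fi:9.1f} residual={ei:9.1f}")
```

The printed fits should agree with the Fit column above up to rounding, and the residuals sum to zero (up to floating-point error) because the model includes an intercept.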
Given the population simple regression equation

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

where
  $Y_t$ = dependent variable = actual, observed value of Y at time period t
  $X_t$ = independent variable = actual, observed value of X at time period t
  $\alpha$ = population intercept term
  $\beta$ = population slope or coefficient term
  $\varepsilon_t$ = population error term at time period t

we are implicitly saying that X causes or explains Y. Furthermore, we must estimate (or guess) this population regression equation by:

$$Y_t = a + b X_t + e_t$$

where
  a = intercept term calculated from sample data, an estimate for $\alpha$
  b = slope (or coefficient) term calculated from sample data, an estimate for $\beta$
  $e_t$ = error term at time period t based on sample data, an estimate for $\varepsilon_t$

1. Estimation of simple regression model

2. Meaning of Regression Equation

3. Regression Output Example in Minitab

    MTB > Regress 'Wealth' 1 'Salary';
    SUBC> Constant;
    SUBC> Brief 2.

    Regression Analysis: Wealth versus Salary

    The regression equation is
    Wealth = 209 + 2.30 Salary

    Predictor  Coef   SE Coef  T     P
    Constant   209.5  127.7    1.64  0.125
    Salary     2.296  2.030    1.13  0.279

    S = 208.953   R-Sq = 9.0%   R-Sq(adj) = 2.0%

    Analysis of Variance
    Source          DF  SS      MS     F     P
    Regression      1   55828   55828  1.28  0.279
    Residual Error  13  567595  43661
    Total           14  623423

4. Decomposition of the ANOVA

The regression analysis provides a very convenient way to understand how much the independent variable explains the variability (= movement) of the dependent variable.
This can be accomplished by decomposing the relationship and then organizing it into an ANalysis Of VAriance (ANOVA) table as follows:

Given $Y_t = \hat{Y}_t + e_t$, we note that the following relationships hold:

$$(Y_t - \bar{Y}) = (\hat{Y}_t - \bar{Y}) + e_t$$

$$\sum (Y_t - \bar{Y})^2 = \sum (\hat{Y}_t - \bar{Y})^2 + \sum e_t^2$$

Note that these sums of squares can be named as follows:

$$TSS = RSS + ESS$$

where
  TSS = Total Sum of Squares = $\sum (Y_t - \bar{Y})^2$
  RSS = Regression Sum of Squares = $\sum (\hat{Y}_t - \bar{Y})^2$
  ESS = Error (or Residual) Sum of Squares = $\sum e_t^2$ (see Note 1)

Note 1: We called this the SSE (sum of squared errors) in Section 1 of this Chapter 13.

That is, we can construct the following ANOVA table for regression analysis:

Source of Variation         Sum of Squares  Degrees of Freedom  Mean Square (= Variance)
Due to Regression           RSS             k                   MSR = RSS / k
Due to Error (or Residual)  ESS             n-k-1               MSE = ESS / (n-k-1)
Total                       TSS             n-1                 Var(Y) = TSS / (n-1)

Goodness of fit measurement:

$$R^2 = \frac{RSS}{TSS} = \frac{TSS - ESS}{TSS} = 1 - \frac{ESS}{TSS}$$

Calculated F value:

$$F_c = \frac{MSR}{MSE}$$

5. Statistical Properties of Regression Equation

1) Gauss-Markov Theorem

2) Confidence Interval of coefficients

3) Single Coefficient Hypothesis Test (t test)

    Regression Analysis: Wealth versus Salary

    The regression equation is
    Wealth = 209 + 2.30 Salary

    Predictor  Coef   SE Coef  T     P
    Constant   209.5  127.7    1.64  0.125
    Salary     2.296  2.030    1.13  0.279

    S = 208.953   R-Sq = 9.0%   R-Sq(adj) = 2.0%

WEEK 9 Example: Gold price vs DJIA

YM       GOLD    DJIA
2006M01  568.75  10865
2006M02  556     10993
2006M03  582     11109
2006M04  644     11367
2006M05  653     11168
2006M06  613.5   11150
2006M07  632.5   11186
2006M08  623.5   11381
2006M09  599.25  11679
2006M10  603.75  12081
2006M11  646.7   12222
2006M12  635.7   12463
2007M01  650.5   12622
2007M02  664.2   12269
2007M03  661.75  12354
2007M04  677     13063
2007M05  659.1   13628
2007M06  650.5   13409
2007M07  665.5   13212
2007M08  672     13358
2007M09  743     13896

[Figure: time-series plot of DJIA and GOLD, 2006M01 to 2007M09]
    MTB > Regress 'GOLD' 1 'DJIA';
    SUBC> Constant;
    SUBC> Brief 2.

    Regression Analysis: GOLD versus DJIA

    The regression equation is
    GOLD = 232 + 0.0334 DJIA

    Predictor  Coef      SE Coef   T     P
    Constant   232.39    75.51     3.08  0.006
    DJIA       0.033357  0.006188  5.39  0.000

    S = 26.9785   R-Sq = 60.5%   R-Sq(adj) = 58.4%

    Analysis of Variance
    Source          DF  SS     MS     F      P
    Regression      1   21150  21150  29.06  0.000
    Residual Error  19  13829  728
    Total           20  34979

1) Write the estimated equation and explain its economic meaning.
2) Find the TSS (Total Sum of Squares), RSS (Regression Sum of Squares), ESS (Error Sum of Squares), R-square, and the correlation coefficient between GOLD and DJIA.
3) Perform a t test of whether DJIA influences the GOLD price.
4) The DJIA was 13,896 in September 2007. What were the predicted gold price and the error according to this regression model?
5) Today's DJIA was 17600. What is the predicted GOLD price according to the model? The actual price of 1 oz of gold was $1151 at the market. What was the error from the regression model?
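The quantities these questions ask for follow arithmetically from the Minitab output. The following is a minimal Python sketch under stated assumptions: all constants are copied from the output above, the function name `predict_gold` is hypothetical, the error is taken as actual minus predicted, and the correlation coefficient is taken as the square root of R-square carrying the sign of the slope (valid for a simple regression with one predictor).

```python
import math

# Values copied from the Minitab output for the 2006M01 to 2007M09 sample.
INTERCEPT = 232.39
SLOPE = 0.033357
RSS = 21150.0   # regression sum of squares (the notes' naming)
ESS = 13829.0   # error (residual) sum of squares
TSS = 34979.0   # total sum of squares


def predict_gold(djia):
    """Predicted GOLD price from the fitted line GOLD = a + b * DJIA."""
    return INTERCEPT + SLOPE * djia


r_squared = RSS / TSS
# In a one-predictor regression, r has the same sign as the slope.
correlation = math.copysign(math.sqrt(r_squared), SLOPE)
print(f"R-square = {r_squared:.3f}, correlation = {correlation:.3f}")

# (DJIA, actual GOLD) pairs from questions 4 and 5.
for djia, actual in [(13896, 743.0), (17600, 1151.0)]:
    predicted = predict_gold(djia)
    error = actual - predicted  # assumed sign convention: actual minus fit
    print(f"DJIA={djia}: predicted={predicted:.2f}, actual={actual:.2f}, error={error:.2f}")
```

The second prediction extrapolates well beyond the sample range of DJIA (10865 to 13896), so a large error there does not by itself say much about the fit within the sample.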
GSB 420 WEEK 9 Example: Gold price vs DJIA

OBS  YM       GOLD     DJIA
1    2008.01  923.25   12650.36
2    2008.02  971.5    12266.39
3    2008.03  933.5    12262.89
4    2008.04  871      12820.13
5    2008.05  885.75   12638.32
6    2008.06  930.25   11350.01
7    2008.07  918      11378.02
8    2008.08  833      11543.55
9    2008.09  884.5    10850.66
10   2008.10  730.75   9325.01
11   2008.11  814.5    8829.04
12   2008.12  865      8776.39
13   2009.01  919.5    8000.86
14   2009.02  952      7062.93
15   2009.03  916.5    7608.92
16   2009.04  883.23   8168.12
17   2009.05  975.5    8500.33
18   2009.06  934.5    8447
19   2009.07  939      9171.61
20   2009.08  955.5    9496.28
21   2009.09  995.75   9712.28
22   2009.10  1040     9712.73
23   2009.11  1175.75  10344.84
24   2009.12  1104     10428.05
25   2010.01  1078.5   10067.33
26   2010.02  1108.25  10325.26
27   2010.03  1115.5   10856.63
28   2010.04  1179.25  11008.61

[Figure: line chart of DJIA and GOLD for observations 1 through 28]

    Regression Analysis: GOLD versus DJIA

    The regression equation is
    GOLD = 900 + 0.0057 DJIA

    Predictor  Coef     SE Coef  T     P
    Constant   900.4    133.4    6.75  0.000
    DJIA       0.00572  0.01301  0.44  0.664

    S = 109.983   R-Sq = 0.7%   R-Sq(adj) = 0.0%

    Analysis of Variance
    Source          DF  SS      MS     F     P
    Regression      1   2342    2342   0.19  0.664
    Residual Error  26  314505  12096
    Total           27  316847

1) Write the estimated equation and explain its economic meaning.
2) Find the TSS (Total Sum of Squares), RSS (Regression Sum of Squares), ESS (Error Sum of Squares), R-square, and the correlation coefficient between GOLD and DJIA.
3) Perform a t test of whether DJIA influences the GOLD price.
4) The DJIA was 11008.61 in April 2010. What were the predicted gold price and the error?
5) Today's DJIA was 17484. What is the predicted GOLD price according to the model? The price of 1 oz of gold was $1140 at the market. What was the error from the regression model?

Required Steps to explain a simple regression model

1. The meaning of the coefficient
2. ANOVA (Analysis of variance)
3. Goodness of fit
4. t test for the individual coefficient and check of the statistical significance (using the p-value approach)
5. Understand the key statistics in the regression output
6. Diagnostic test of the model (overall performance of the model based on 1-5)
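Steps 2 through 4 can be cross-checked against a regression output with a few lines of arithmetic. The following is a minimal Python sketch, not the notes' own procedure: the function name `summarize_simple_regression` is hypothetical, the inputs are copied from the 2008.01 to 2010.04 GOLD versus DJIA output, and the p-value is taken as reported by Minitab instead of being recomputed.

```python
def summarize_simple_regression(coef, se_coef, rss, ess, n, k=1):
    """Recompute ANOVA-based statistics using the notes' naming.

    rss = regression sum of squares, ess = error sum of squares,
    n = number of observations, k = number of independent variables.
    """
    tss = rss + ess                 # TSS = RSS + ESS
    msr = rss / k                   # mean square due to regression
    mse = ess / (n - k - 1)         # mean square due to error
    return {
        "TSS": tss,
        "MSR": msr,
        "MSE": mse,
        "R2": rss / tss,            # goodness of fit
        "F": msr / mse,             # calculated F value
        "t": coef / se_coef,        # t statistic for the slope
    }


# Inputs copied from the Minitab output for the 28-observation sample.
stats = summarize_simple_regression(
    coef=0.00572, se_coef=0.01301, rss=2342.0, ess=314505.0, n=28
)
for name, value in stats.items():
    print(f"{name} = {value:.4f}")

# p-value approach: compare the reported p-value with the chosen alpha.
reported_p = 0.664   # P for DJIA as printed by Minitab
alpha = 0.05
significant = reported_p < alpha
print("DJIA coefficient significant at 5%:", significant)
```

For this sample the recomputed values should match the printed output to its displayed precision, and the reported p-value is far above 5%, so the slope is not statistically significant.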