Question

1 Approved Answer

Posted on Oct 19, 2024

OPRE 3360: Managerial Methods in Decision Making Under Uncertainty Assessment Assignment

[Part A: Module II - Hypothesis Testing and Statistical Inference]

[Metacritic and The Big Short]

Metacritic is a website that aggregates reviews of music, games, and movies. For each product, a numerical score is obtained from each review, and the website posts the average score as well as the individual reviews. The website is somewhat similar to Rotten Tomatoes, but Metacritic uses a different scoring method that converts each review into a score on a 100-point scale. In addition to using the reviewer's quantitative ratings, Metacritic manually assesses the tone of the review before scoring. Historical data shows that these converted scores are normally distributed.

One of the movies that Metacritic rated was The Big Short. The first review came out on November 13 (Charlie Schmidlin from the Playlist), and other reviews started to trickle in before the movie was released on December 11, 2015. The "The Big Short" sheet of "Part A - Data Sets for Assessment Assignment.xslx" contains a sample of the media outlets, reviewers, scores and review dates for all reviews that Metacritic collected until the release date. Use this data set to answer questions 1 to 5.

1. [1 pt] What are the sample mean and sample standard deviation of the scores in this sample?
(a) sample mean = 77.50, sample standard deviation = 10.86
(b) sample mean = 69.71, sample standard deviation = 22.87
(c) sample mean = 69.71, sample standard deviation = 10.86
(d) sample mean = 77.50, sample standard deviation = 22.87

2. [1 pt] A 95% confidence interval for the true average score (μ) of The Big Short is:
(a) [65.32, 74.10]
(b) [59.56, 95.44]
(c) [68.26, 86.74]
(d) [68.71, 86.29]

3. [1 pt] A 90% confidence interval for the true average score (μ) of The Big Short is:
(a) [69.84, 85.16]
(b) [58.96, 96.04]
(c) [65.54, 73.88]
(d) [66.07, 73.35]

4.
[1 pt] Between the 95% and 90% intervals, which one is wider and why?
(a) The 90% interval because it is a less accurate interval.
(b) The 95% interval because it is a more useful interval.
(c) The 90% interval since, in order to decrease the confidence from 95% to 90%, the margin of error should increase so that the intervals constructed using sampling will capture the true mean more often.
(d) The 95% interval since, in order to increase the confidence from 90% to 95%, the margin of error should increase so that the intervals constructed using sampling will capture the true mean more often.

5. [1 pt] Let p denote the population proportion of reviewers who rate The Big Short with a score higher than 85. A 90% confidence interval for p is:
(a) [0.291, 0.626]
(b) [0.374, 0.709]
(c) [0.259, 0.658]
(d) A confidence interval using the Central Limit Theorem should not be used because the sample size to build a confidence interval for the proportion is too small.

[Google's free Internet project]

According to the Pew Internet & American Life Project, many American adults spend a significant amount of time on the Internet every day (Pew Internet Project, 2008). Google has a project that provides free wireless Internet in a medium-sized city. Among several factors that affect Google's decision, one factor is the percentage of adult smartphone users who are actively using social network apps. Google picked a random sample of 400 residents in one of the candidate cities, Rich-Addison, and collected the data in the "Google" sheet of "Part A - Data Sets for Assessment Assignment.xslx". Use this data set to answer questions 6 to 9.

6. [2 pt] Construct a 95% confidence interval for the true proportion of Rich-Addison residents who are actively using Facebook.
95% confidence interval:

7. [2 pt] If everything else remains the same, what is the margin of error in a 90% confidence interval?
Margin of error:

8.
[3 pts] Google wants to know what percentage of users is actively using all three media: Facebook, Twitter, and an online chat/video tool (Skype, Facetime, etc.). What is a 99% confidence interval for the true proportion of smartphone users who are actively using all three?
99% confidence interval:

9. [6 pts] Google wants to know if there is strong evidence that more than 35% of smartphone users actively use Twitter. Use the sample data to perform the hypothesis test at the 5% significance level. For full credit, you need to specify the null and alternative hypotheses, the appropriate test statistic value, the p-value, and the conclusion.
Parameter to be tested:
H0:
Ha:
Test statistic:
p-value:
Conclusion:

[Should we build a wind farm?]

Wind power has been the most popular alternative energy choice in recent years. One of the key determinants for choosing a location for a wind farm is whether the site has enough wind. To produce enough energy using current technology, the site should have an annual average wind speed exceeding 7 miles per hour, according to the Wind Energy Association. One candidate site in southern California was monitored for a year, with wind speeds recorded every 6 hours. A total sample of 1082 readings of wind speed averaged 7.228 mph with a sample standard deviation of 3.716 mph. The histogram of these 1082 readings is shown below. You are asked to perform statistical analysis to help the developer decide whether to place a wind turbine at this site. Use this information to answer questions 10 to 13.

Figure 1: Histogram of Wind Speed at a candidate location

10. [2 pts] Describe the parameter for which you will be performing a hypothesis test, and state the appropriate null and alternative hypotheses for the purpose of your analysis.
Parameter:
H0:
Ha:

11. [4 pts] Determine the value of the appropriate test statistic and the p-value of the test.
Test statistic:
p-value:

12.
[2 pts] What are the appropriate conclusion and interpretation based on the outcome of the hypothesis test at the 5% significance level? Should you recommend building a wind farm at this site?
Conclusion of the test:
Interpretation:

13. [2 pts] The board feels that the 5% significance level might be too lenient and wishes to ensure that the claim is supported by stronger evidence. Your manager asked you to change the significance level to either α = 0.1 or α = 0.01. Which one would you choose to make the null hypothesis more difficult to reject? With the new significance level, would you recommend building a wind farm at this site?
New significance level:
Your recommendation:

[Part B: Module III - Regression]

[Concert Nation]

Concert Nation, INC. is a nationwide promoter of rock concerts. The president of the company wants to develop a model to estimate the revenue of a major concert event at large venues (such as Ford Field, Madison Square Gardens) for planning marketing strategies. The company has collected revenue data of 34 recent large concert events. For each concert, they have also recorded the attendance, the number of concession stands in the venue, and the Billboard chart position of the artist in the week of each event. This data is available in "Part B - Data Set for Assessment Assignment.xslx". They have two potential models that could explain the revenue. The two competing models are:

Model A: y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
Model B: y = β0 + β1 x1 + β2 x2 + ε

Run regression on both models. Use only the regression outputs of the two models and the original data to answer questions 14 to 20 below.

14. [2 pt] Let's consider Model A first. What does the result of the F-test indicate?
(a) The p-value of F-test is 110.56. Thus, the model does not significantly explain the revenue.
(b) The p-value of F-test is close to zero. Thus, all independent variables in the regression model are statistically significant.
(c) The p-value of F-test is close to zero.
This indicates that at least some independent variables in the regression model significantly explain the revenue.
(d) This indicates weak evidence of a linear relationship, because the p-value is very low.

15. [2 pt] If we use Model A for prediction, what is the point estimate for the revenue of a concert that has an attendance of 50,000 people, 5 concession stands, and the song ranked no. 15 in the Billboard ranking?
(a) $3.254M
(b) $2.855M
(c) $3.148M
(d) $340K

16. [2 pt] What is an approximate 95% prediction interval for the concert revenue listed in the previous question?
(a) [$2.380M, $3.916M]
(b) [$2.463M, $3.239M]
(c) [$2.757M, $3.533M]
(d) [$2.074M, $3.628M]

17. [2 pt] Which of the following statements is correct?
(a) The estimated slope for the attendance is only $57.94. This means that, when keeping everything else the same, the revenue does not depend much on the attendance.
(b) The t-statistic associated with the slope for the attendance variable is 17.62. This means that there is too much noise to determine if the slope is definitely positive.
(c) The p-value for the concession variable is 0.687. This means that the number of concession stands is not a statistically significant variable to determine the revenue.
(d) The p-value for the concession variable is 0.687. This means that the number of concession stands is a statistically significant variable to determine the revenue.

18. [2 pt] Is it appropriate to use Model A as a final model to estimate the revenue of a concert?
(a) Yes. All independent variables are statistically significant.
(b) Yes, because the analysis indicates a linear relationship between revenue and attendance.
(c) No, because not all independent variables are statistically important. Thus, revision is necessary.
(d) No, because some of the slopes were negative. Thus, revision is necessary.

19. [2 pt] Now, consider model B.
According to model B, what is a point estimate for the revenue of a concert that has an attendance of 50,000 people, 5 concession stands, and the song ranked no. 15 in the Billboard ranking?
(a) $3.159M
(b) $2.869M
(c) $7.739M
(d) $13.167M

20. [2 pt] Based on the regression outputs, which of the two models, Model A or Model B, would you consider more suitable for predicting the revenue?
(a) Model A is more suitable, because it has a higher R², lower standard error of the estimates, and lower F-test p-value.
(b) Model A is more suitable because the fraction of SST accounted for by the residuals is higher than for model B.
(c) Model B is more suitable, because, while both models have similar R² and F-test p-value, model B has lower standard error of the estimates and all independent variables are statistically significant.
(d) Model B is more suitable, because the slope coefficient is larger in magnitude.

Attendance  # of concessions  Billboard Charts  Concert Revenue
30650       8                 56                1531762
80997       1                 87                4047180
93686       8                 24                5805972
44405       4                 99                2516538
77767       4                 39                4197208
95780       7                 35                6226065
82701       7                 86                4123048
50165       8                 29                3465110
50619       5                 93                2843474
36259       7                 86                1866318
52013       5                 35                2670798
97447       7                 71                5756817
69982       7                 97                3681670
31789       10                72                2072149
39787       6                 89                1964361
63596       5                 65                3150802
73159       5                 41                5064323
51172       8                 1                 2901564
54187       9                 17                3170058
56681       7                 1                 3316764
78466       7                 86                3825369
65132       8                 86                2983563
52866       4                 8                 3091641
39536       2                 20                3068049
32541       1                 53                1796727
36441       1                 60                2011990
74987       6                 58                4389931
33791       8                 81                1545359
64961       6                 94                3792136
61429       3                 86                2695672
68178       4                 50                4147528
85701       5                 52                5335423
31471       11                73                1989263
37002       5                 91                2101866

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.9551742565
R Square           0.9123578602
Adjusted R Square  0.9035936462
Standard Error     6329.7556582937
Observations       34

ANOVA
            df  SS               MS                F            Significance F
Regression  3   12512595109.424  4170865036.47476  104.1003634  5.93E-016
Residual    30  1201974200.811   40065806.6937004
Total       33  13714569310.235

                  Coefficients     Standard Error   t Stat         P-value      Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept         678.1812654305   5056.2894890712  0.1341262732   0.894198501  -9648.139  11004.5    -9648.139    11004.5
# of concessions  -189.7234817524  445.3686140343   -0.425992034   0.673154564  -1099.288  719.8406   -1099.288    719.8406
Billboard Charts  113.4617116429   37.9517064099    2.9896339948   0.005532258  35.95399   190.9694   35.95399     190.9694
Concert Revenue   0.0157383103     0.0008932719     17.6187226033  2.2651E-017  0.013914   0.017563   0.013914     0.017563

RESIDUAL OUTPUT

Observation  Predicted Attendance  Residuals
1            29621.594867376       1028.4051326236
2            74055.401223882       6941.5987761179
3            93259.663200915       426.3367990853
4            50758.052621873       -6353.052621873
5            70401.255831792       7365.7441682079
6            101309.01948375       -5529.019483749
7            73997.632744703       8703.3672552972
8            56985.759321963       -6820.759321963
9            55032.97907417        -4413.97907417
10           38480.515826522       -2221.515826522
11           45734.571336031       6278.428663969
12           98008.470488943       -561.4704889431
13           68299.167665833       1682.8323341667
14           39562.313557858       -7773.313557858
15           40553.655596248       -766.6555962475
16           56692.874564452       6903.1254355485
17           84085.380676397       -10926.3806764
18           44939.569600898       6232.4303991024
19           50790.875380991       3396.1246190089
20           51663.839503546       5017.160496454
21           69312.66828414        9153.3317158606
22           55874.340793744       9257.6592062565
23           49484.18630913        3381.8136908699
24           50853.875596643       -11317.87559664
25           34779.375483358       -2238.375483358
26           38961.483346846       -2520.483346846
27           75210.715758179       -223.7157581793
28           32672.131463086       1118.868536914
29           69887.054194089       -4926.054194089
30           52292.04032279        9136.9596772105
31           70867.455406035       -2689.455406035
32           89600.115416572       -3899.115416572
33           38181.56620335        -6710.566203349
34           43134.3988539         -6132.3988539
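
For the wind-farm questions (10 to 13), the test statistic and p-value can be sketched from the summary figures quoted in the problem (n = 1082, sample mean 7.228 mph, sample standard deviation 3.716 mph, threshold 7 mph). This is a minimal sketch, not the assignment's official solution. It assumes an upper-tailed test of H0: μ ≤ 7 against Ha: μ > 7, uses the standard normal distribution in place of the t distribution (reasonable for a sample this large), and treats the readings as independent, which the problem does not state. The function name `one_sample_z_test` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, sample_sd, n, mu0):
    """Upper-tailed one-sample z-test of H0: mu <= mu0 vs Ha: mu > mu0.

    Assumption: n is large enough that the normal approximation
    can stand in for the t distribution.
    """
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Upper-tail probability under the standard normal distribution.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Summary statistics quoted in the wind-farm problem.
z, p_value = one_sample_z_test(sample_mean=7.228, sample_sd=3.716, n=1082, mu0=7.0)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")

# Compare the same p-value against the significance levels named in questions 12 and 13.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The loop shows why the choice of significance level in question 13 matters: the same p-value can lead to different decisions depending on the α it is compared against.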