Question
AP Statistics 4.01 Transforming Data

Directions: Complete the assignment. Clearly label each answer. The last page contains a table of common transformations. (27 points)

1. Consider the following set of observations:

Obs.     1   2   3   4   5   6   7   8   9   10   11   12   13   14
input    1   2   3   4   5   6   7   8   9   10   11   12   13   14
result   1   2   3   5   8   13  21  34  55  89   144  233  377  610

a. Enter the data in L1 and L2 in your TI calculator, find the regression line, and construct a scatterplot with the regression line included. Does a line appear to be a good model for these data? Be sure to check your residuals plot. (7 points: 2 points regression line, 2 points scatterplot, 2 points residual plot, 1 point comment)
b. What is r²? (1 point)
c. What type of relationship do the data appear to have (linear, logarithmic, exponential, etc.)? (1 point)
d. What type of re-expression would work in this case? (1 point)
e. Find the natural logarithm of the y-values. (1 point)
f. Draw a scatterplot of x vs. ln y. Find the regression equation of ln y on x and include it on the graph. Does it appear to be a better fit than the fit in part (a)? Be sure to check your residuals plot. (7 points: 2 points regression line, 2 points scatterplot, 2 points residual plot, 1 point comment)
g. Write a prediction (regression) equation for your re-expressed data. (2 points)
h. Use the regression equation you found in part (f) to predict the value of y when x = 10.5. (2 points)
i. Does your answer for part (h) seem reasonable? Why or why not? (3 points)
j. Explain the importance of checking the residuals plot before re-expressing data and then again after re-expressing data. (2 points)

Table of common transformations

There are many types of associations that you may encounter. This table lists the most common and summarizes the way each variable may be transformed. The list is not exhaustive; other transformations are possible.
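Parts (a), (b), and (f) can also be checked outside the calculator. The following is a minimal sketch, assuming NumPy is available; the helper name fit_line is hypothetical and is not part of the assignment. It fits a least-squares line to the raw data and to ln y, and reports r² for each so the two fits can be compared.

```python
import numpy as np

# Data from the assignment: input (x) and result (y).
x = np.arange(1, 15, dtype=float)
y = np.array([1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610], dtype=float)


def fit_line(x, y):
    """Hypothetical helper: least-squares line y = b0 + b1*x.

    Returns (intercept, slope, r_squared, residuals).
    """
    slope, intercept = np.polyfit(x, y, 1)  # highest power first
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return intercept, slope, 1.0 - ss_res / ss_tot, residuals


# Part (a)/(b): line fitted to the raw data.
lin_b0, lin_b1, lin_r2, lin_resid = fit_line(x, y)

# Part (e)/(f): line fitted to ln(y) against x.
log_b0, log_b1, log_r2, log_resid = fit_line(x, np.log(y))

print(f"raw:  result     = {lin_b0:.4f} + {lin_b1:.4f} * input, r^2 = {lin_r2:.6f}")
print(f"log:  ln(result) = {log_b0:.4f} + {log_b1:.4f} * input, r^2 = {log_r2:.6f}")
```

A curved pattern in lin_resid (positive, then negative, then positive again) is the signal that a straight line is the wrong model even before r² is compared.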
Type          General algebraic equation   Transformation
Logarithmic   y = ln x                     Take the log of the explanatory variable, x.
Exponential   y = ae^x                     Take the log (natural or base 10) of the response variable, y.
Quadratic     y = ax^2 + bx + c            Take the square root of the response variable, y.
Power         y = ax^b                     Take the log of both variables.
Complex       More than one type of equation may be used to describe the association.
              Break the data into two or more functions and apply the appropriate transformations.

Notice that the exponential, quadratic, and power forms look very similar. Check the coefficient of determination after transformation to see which may be the best model.

AP Statistics 4.04 Two-Way Tables

Directions: Complete the assignment. Your answers for this assignment must include reasons; simply stating the answer without justification will earn partial credit. (18 points)

1. A researcher suspected a relationship between people's preferences in music and preference in sports. A random sample of 100 people produced the following two-way table:

                  Favorite Music
Favorite Sport    Hip Hop   Classic Rock   Country
Basketball        35        5              3
Football          13        24             2
Softball          5         6              7

a. Calculate the overall (marginal) distributions for the table. (2 points)
b. Compute (in percents) the conditional distribution of favorite music among those who prefer football. Show the distribution in a table. (2 points)
c. Briefly describe your finding in words. (2 points)
d. Compute (in percents) the conditional distribution of sport among those who chose Hip Hop as their favorite music. Show the distribution in a table. (2 points)
e. Briefly describe your finding in words. (2 points)

2. Let's look at the voting record of the Civil Rights Act of 1964.

                                Voted yes   Voted no   Total
Northern States   Democrats     145         9          154
                  Republicans   138         24         162
Southern States   Democrats     7           87         94
                  Republicans   0           10         10

a. Show the percentages of Democrats and Republicans for each region that voted in favor of the act. (2 points)
b. Show the overall percentage of Republicans that voted in favor of the act and then the overall percentage of the Democrats that voted in favor of the act. (2 points)
c. What is the name for this apparent contradiction? (2 points)
d. Explain the phenomenon. (2 points)

4.01 Transforming Data: worksheet

Obs.   input   result   ln(input)   ln(result)
1      1       1        0           0
2      2       2        0.693147    0.693147
3      3       3        1.098612    1.098612
4      4       5        1.386294    1.609438
5      5       8        1.609438    2.079442
6      6       13       1.791759    2.564949
7      7       21       1.94591     3.044522
8      8       34       2.079442    3.526361
9      9       55       2.197225    4.007333
10     10      89       2.302585    4.488636
11     11      144      2.397895    4.969813
12     12      233      2.484907    5.451038
13     13      377      2.564949    5.932245
14     14      610      2.639057    6.413459

SUMMARY OUTPUT (result on input)

Regression Statistics
Multiple R          0.799104
R Square            0.638568
Adjusted R Square   0.608448
Standard Error      112.5201
Observations        14

ANOVA
             df   SS         MS         F          Significance F
Regression   1    268423.8   268423.8   21.20123   0.000606
Residual     12   151929.1   12660.76
Total        13   420352.9

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -143.6923      63.51967         -2.26217   0.043042   -282.0898   -5.294837   -282.0898     -5.294837
input       34.34945       7.460007         4.60448    0.000606   18.09549    50.60341    18.09549      50.60341

RESIDUAL OUTPUT
Observation   Predicted result   Residuals
1             -109.3429          110.3429
2             -74.99341          76.99341
3             -40.64396          43.64396
4             -6.294505          11.29451
5             28.05495           -20.05495
6             62.4044            -49.4044
7             96.75385           -75.75385
8             131.1033           -97.1033
9             165.4527           -110.4527
10            199.8022           -110.8022
11            234.1516           -90.15165
12            268.5011           -35.5011
13            302.8505           74.14945
14            337.2              272.8

PROBABILITY OUTPUT
Percentile   result
3.571429     1
10.71429     2
17.85714     3
25           5
32.14286     8
39.28571     13
46.42857     21
53.57143     34
60.71429     55
67.85714     89
75           144
82.14286     233
89.28571     377
96.42857     610

[Chart: scatterplot of result vs. input with fitted line f(x) = 34.3494505495x - 143.6923076923, R² = 0.6385676489]
[Chart: input Residual Plot (Residuals vs. input)]
[Chart: input Line Fit Plot (result and Predicted result vs. input)]
[Chart: Normal Probability Plot (result vs. Sample Percentile)]

SUMMARY OUTPUT (ln(result) on input)

Regression Statistics
Multiple R          0.9997707886
R Square            0.9995416297
Adjusted R Square   0.9995034322
Standard Error      0.0451972852
Observations        14

ANOVA
             df   SS         MS         F          Significance F
Regression   1    53.45525   53.45525   26167.71   2.09E-021
Residual     12   0.024514   0.002043
Total        13   53.47976

            Coefficients    Standard Error   t Stat      P-value     Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   -0.3584420727   0.025515         -14.04845   8.20E-009   -0.414034   -0.30285    -0.414034     -0.30285
input       0.484735102     0.002997         161.7644    2.09E-021   0.478206    0.491264    0.478206      0.491264

Unlabeled cells in the worksheet: 108; 0.698764

RESIDUAL OUTPUT
Observation   Predicted ln(result)   Residuals
1             0.1262930294           -0.126293
2             0.6110281314           0.082119
3             1.0957632334           0.002849
4             1.5804983355           0.02894
5             2.0652334375           0.014208
6             2.5499685395           0.014981
7             3.0347036415           0.009819
8             3.5194387436           0.006922
9             4.0041738456           0.003159
10            4.4889089476           -0.000273
11            4.9736440497           -0.003831
12            5.4583791517           -0.007341
13            5.9431142537           -0.010869
14            6.4278493558           -0.01439

PROBABILITY OUTPUT
Percentile   ln(result)
3.571429     0
10.71429     0.693147
17.85714     1.098612
25           1.609438
32.14286     2.079442
39.28571     2.564949
46.42857     3.044522
53.57143     3.526361
60.71429     4.007333
67.85714     4.488636
75           4.969813
82.14286     5.451038
89.28571     5.932245
96.42857     6.413459

[Chart: scatterplot of ln y vs. x with fitted line f(x) = 0.484735102x - 0.3584420727, R² = 0.9995416297]
[Chart: input Residual Plot (Residuals vs. input)]
[Chart: input Line Fit Plot (ln(result) and Predicted ln(result) vs. input)]
[Chart: Normal Probability Plot (ln(result) vs. Sample Percentile)]
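Parts (g) and (h) of 4.01 ask for a prediction from the re-expressed model. Here is a short sketch using the coefficients reported in the ln(result) regression output; the variable names are illustrative only. Because the regression predicts ln y, the prediction has to be exponentiated to get back to the original units.

```python
import math

# Coefficients taken from the ln(result) regression output.
intercept = -0.3584420727
slope = 0.484735102

x_new = 10.5

# The fitted model is ln(y-hat) = intercept + slope * x.
ln_y_hat = intercept + slope * x_new

# Back-transform to the original scale: y-hat = e^(ln y-hat).
y_hat = math.exp(ln_y_hat)

print(f"ln(y-hat) at x = {x_new}: {ln_y_hat:.4f}")
print(f"y-hat at x = {x_new}: {y_hat:.1f}")
```

The result is roughly 113, which falls between the observed values at x = 10 (89) and x = 11 (144), so the prediction is plausible for part (i).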
4.04 Two-Way Tables: worksheet

1. Favorite music by favorite sport, with marginal totals

Favorite Sport   Hip Hop   Classic Rock   Country   sum
Basketball       35        5              3         43
Football         13        24             2         39
Softball         5         6              7         18
sum              53        35             12        100

Favorite music among those who prefer football:

Hip Hop   Classic Rock   Country
=13/39    =24/39         =2/39
33%       62%            5%

Favorite sport among those who chose Hip Hop:

Basketball   =35/53   66.04%
Football     =13/53   24.53%
Softball     =5/53    9.43%

2. Civil Rights Act of 1964

Northern states   voted yes   voted no   sum
Democrats         145         9          154
Republicans       138         24         162
sum               283         33         316

Southern states   voted yes   voted no   sum
Democrats         7           87         94
Republicans       0           10         10
sum               7           97         104

a) Percentage voting yes, by region

Northern states   Democrats     94.16%   =145/154
                  Republicans   85.19%   =138/162
Southern states   Democrats     7.45%    =7/94
                  Republicans   0.00%    =0/10

b) Overall percentage voting yes

Democrats     =(145+7)/(154+94)    61%
Republicans   =(138+0)/(162+10)    80%
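The marginal and conditional distributions for the music and sport table in 4.04 question 1 can be reproduced with a few array operations. This is a sketch under the assumption that NumPy is available; the variable names are illustrative and not part of the assignment.

```python
import numpy as np

sports = ["Basketball", "Football", "Softball"]   # rows
music = ["Hip Hop", "Classic Rock", "Country"]    # columns

# Counts from the two-way table (rows = sport, columns = music).
counts = np.array([
    [35, 5, 3],
    [13, 24, 2],
    [5, 6, 7],
])

# Marginal totals.
row_totals = counts.sum(axis=1)   # one total per sport
col_totals = counts.sum(axis=0)   # one total per music type
grand_total = counts.sum()

# Marginal distributions in percent.
sport_marginal = 100 * row_totals / grand_total
music_marginal = 100 * col_totals / grand_total

# Conditional distribution of music among football fans (row 1).
music_given_football = 100 * counts[1] / row_totals[1]

# Conditional distribution of sport among Hip Hop fans (column 0).
sport_given_hiphop = 100 * counts[:, 0] / col_totals[0]

for name, pct in zip(music, music_given_football):
    print(f"{name} | Football: {pct:.1f}%")
for name, pct in zip(sports, sport_given_hiphop):
    print(f"{name} | Hip Hop: {pct:.2f}%")
```

Each conditional distribution divides by the total of the group being conditioned on (39 football fans, 53 Hip Hop fans), not by the grand total of 100.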
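For question 2 of 4.04, the regional and overall percentages can be computed side by side. The sketch below uses only the counts from the voting table; the dictionary layout and the helper pct_yes are hypothetical. The overall figures pool both regions, so the denominators are 154 + 94 = 248 Democrats and 162 + 10 = 172 Republicans.

```python
# (yes votes, total votes) for each region and party, from the voting table.
votes = {
    ("Northern", "Democrats"): (145, 154),
    ("Northern", "Republicans"): (138, 162),
    ("Southern", "Democrats"): (7, 94),
    ("Southern", "Republicans"): (0, 10),
}


def pct_yes(yes, total):
    """Hypothetical helper: percentage of yes votes."""
    return 100.0 * yes / total


# Part (a): percentage voting yes within each region.
by_region = {key: pct_yes(*pair) for key, pair in votes.items()}

# Part (b): pool the two regions for each party.
overall = {}
for party in ("Democrats", "Republicans"):
    yes = sum(v[0] for (region, p), v in votes.items() if p == party)
    total = sum(v[1] for (region, p), v in votes.items() if p == party)
    overall[party] = pct_yes(yes, total)

for (region, party), pct in by_region.items():
    print(f"{region} {party}: {pct:.2f}%")
for party, pct in overall.items():
    print(f"Overall {party}: {pct:.2f}%")
```

Democrats have the higher yes percentage within each region, yet Republicans have the higher yes percentage overall. This reversal is known as Simpson's paradox; here it arises because most Democrats, but very few Republicans, represented Southern states, where support was low for both parties.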