Question

1 Approved Answer

Table of Common Transformations There are many types of associations that you may encounter. This table lists the most common, and summarizes the way each

Table of Common Transformations

There are many types of associations that you may encounter. This table lists the most common and summarizes the way each is transformed. (Warning: This is by no means a complete list of every possible transformation!)

Possible Association: Exponential*
General algebra equation: $y = ab^x$
Type of Transformation: Take the log (natural or base 10) of the dependent (y) variable.

Possible Association: Logarithmic
General algebra equation: $y = \ln x$
Type of Transformation: Take the log of the independent (x) variable.

Possible Association: Quadratic*
General algebra equation: $y = ax^2 + bx + c$
Type of Transformation: Take the square root of the dependent (y) variable.

Possible Association: Power*
General algebra equation: $y = ax^b$
Type of Transformation: Take the log of both variables.

Possible Association: Complex
General algebra equation: More than one type of curve put together. Equations for different sections may vary.
Type of Transformation: Partition the data where the plot changes. Treat each part as a separate association.

General Procedure for Transforming Data

You must always start by looking at a scatterplot of your original data and examining the pattern. Are there outliers or influential points? What is the shape of the curve? The shape is a guide to choosing a likely transformation. If the points seem to lie in a straight line, you may not need to transform the data at all; you may have a linear relationship.

If the explanatory variable involves time, particularly in years, you may want to change the variable to a form that is easier to work with. For example, if you were studying the U.S. population during the 1800s, looking for the effect of the Civil War, you might pick 1800 to be "zero." Then figure your explanatory variable in terms of years elapsed since 1800: 1820 becomes 20.

Remember that, whether you work with your calculator or a spreadsheet, you'll have results expressed simply in x and y. The correct variable for prediction is ŷ. And either variable may actually be transformed (ln, exponential, square, square root, and so on).
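Why taking logs straightens these curves can be seen by working through the algebra. The short derivation below is a sketch that assumes the exponential and power forms are $y = ab^x$ and $y = ax^b$ (an assumption about the table's general equations), with $a, b > 0$ and, for the power case, $x > 0$.

```latex
\begin{align*}
\text{Exponential: } y = ab^{x}
  &\;\Longrightarrow\; \ln y = \ln a + (\ln b)\,x
  && \text{(linear in } x\text{)} \\
\text{Power: } y = ax^{b}
  &\;\Longrightarrow\; \ln y = \ln a + b\,\ln x
  && \text{(linear in } \ln x\text{)}
\end{align*}
```

In both cases the fitted intercept estimates $\ln a$, so $a$ is recovered by exponentiating the intercept. For the exponential form the slope estimates $\ln b$; for the power form the slope is $b$ itself.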
Things to Avoid

If your transformation involves taking a logarithm, remember that logarithms are undefined for zero and negative numbers.

Questions 1 through 6 work with the length of the sidereal year vs. distance from the sun. The table of data is shown below.

Planet    Distance from sun         Years (as a fraction   ln(Dist)   ln(Year)
          (in millions of miles)    of Earth years)
Mercury   36.19                     0.2410                 3.5889
Venus     67.63                     0.6156                 4.2140
Earth     93.50                     1.0007                 4.5380
Mars      142.46                    1.8821                 4.9591
Jupiter   486.46                    11.8704                6.1871
Saturn    893.38                    29.4580                6.7950
Uranus    1,794.37                  84.0100                7.4924
Neptune   2,815.19                  164.7800               7.9428
Pluto     3,695.95                  248.5400               8.2150     5.5156

Enter the original data in L1 and L2 (that is, the Distance from the Sun and Years). Make L3 = ln(L1) and L4 = ln(L2). Verify that this matches the columns given above. Don't worry about the small discrepancies you may find due to rounding and the number of decimal places shown on your calculator. If your results differ from the values above, double-check your original entries!

1. Draw a scatterplot of Distance vs. Year (using the untransformed data) with the least-squares regression line. Does the line seem to model the relationship well? (2 points)

2. On your calculator, do a linear regression (CALC) for these different combinations:
- Distance vs. ln(Year) (L1 vs. L4, if you entered the data as directed above)
- ln(Distance) vs. Year (L3 vs. L2, if you entered the data as directed above)
- ln(Distance) vs. ln(Year) (L3 vs. L4, if you entered the data as directed above)
(Note that the explanatory variable is always some form of "Distance.") To get the most out of this assignment, look at a scatterplot of each of these combinations. Which transformation yields the highest correlation coefficient (Pearson's r)? Sketch a scatterplot of this transformation and show the least-squares line. What are the values of r and r² for that transformation, and what regression equation does it yield? (3 points) (Hint: Remember to include "ln" on the variables in your regression equation that have been transformed.)

3.
Using the regression equation from the previous question that best fits the data, place the values of the residuals into L5. In case you forgot how to do this: in the data list window, highlight L5 and press [ENTER], then press [LIST], select RESID, and press [ENTER] [ENTER]. Create a residual plot on your calculator and interpret it; you don't need to draw the plot. (Note: You'll probably need to turn off the plot in Y1 to display the scatterplot correctly.) (2 points)

4. Using algebra, convert your regression equation to a power equation (show your work below). Enter this equation in Y2 (press [Y=] and enter the equation) and make a scatterplot of L1, L2 with Y2, verifying that the power equation is a good fit for this data. As you set up your regression equation, keep in mind that the variables are ln y and ln x. Here's what the graph of the scatterplot and power equation will look like. (It's up to you to derive the power equation.) Finally, summarize, in plain English, what you've done in questions 1-4. (3 points)

The purpose of the transformations you're studying is to find a simple model to describe the relationship in a data set. The model can be used to predict a response value (called interpolation for values within the range of the data set and extrapolation for values outside the range of the data set). Recall that extrapolation is usually not a valid way to predict y-values.

5. A well-known feature of our solar system is the asteroid belt between Mars and Jupiter. One theory about the asteroid belt is that it's made of primordial material that was prevented from forming another planet by the gravitational pull of Jupiter when the solar system was formed. One of the largest asteroids is 951 Gaspra. Its distance from the Sun is 207.16 million miles. Use your linear regression equation to interpolate the length of its sidereal year.
(1 point) Remember that you need to take the natural log of Distance before you plug it in, and that your first result will be the natural log of Year. Show your work.

6. Finally, calculate the length of the year for 951 Gaspra from the power function you developed in Question 4. (Show all your work.) (1 point)

Note: Theoretically, the answers from 5 and 6 should be the same, but they'll probably come out differently due to rounding between steps. The more digits you carry throughout the calculations, the closer the two answers will be.

Questions 7 through 9 involve the following data set.

Increase in Life Expectancy in the United States during the 20th Century

Year   Life Span     Year   Life Span
1920   54.1          1975   72.6
1930   59.7          1980   73.7
1940   62.9          1985   74.7
1950   68.2          1990   75.4
1960   69.7          1995   75.8
1970   70.8

Source: National Center for Health Statistics, published in the 1998 World Almanac

7. Make a scatterplot of the untransformed data and tell which kind of relationship the points seem to follow. Also name the best type of transformation needed to "straighten" the plot. (2 points)
Note: Part of the transformation should involve subtracting 1900 from the year so you're working with more manageable numbers.

8. Now try some transformations to get the data as close to linear as possible. (Use your calculator to transform the data, and try scatterplots of the different transformations.) Then find the regression line, r, and r². (4 points)
Tell which transformation worked the best and back it up by showing:
- A scatterplot of the transformed data with the least-squares regression equation and line, r, and r².
- A plot of the original data with the regression equation converted to a non-linear equation (similar to what you did for question 6).
(You'll get a chance to do a residual plot in the next question.)

Here are some hints: Look at the curve and think about what kind of relationship (equation) could have made such a curve (this is what you already did in question 7).
Then try the kind of transformation that should work for that kind of curve. (See the Table of Common Transformations at the beginning of this document.) If your first guess doesn't work, try others. More than one transformation will yield a good model; choose the one with the strongest value of r. If your transformation (the one that seems to work the best) doesn't match your answer for question 7, you may want to revise your answer for question 7!

Type of transformation:
Linear regression equation for transformed data:
r and r²:
Scatterplot of the transformed data with the least-squares regression equation and line, r, and r²:
Plot of the original data with the regression equation converted to a non-linear equation (similar to what you did for question 6):

9. Using the transformed data and the regression equation for it, create a plot of residuals vs. x-values. Sketch the plot and interpret it. (3 points)

10. The data below represents Medicare expenditures from 1970 to 1996, in billions of dollars. (4 points)

Year   Medicare Expenditures (billions of dollars)
1970   7.6
1980   37.5
1985   72.1
1990   112.1
1991   124.4
1992   141.4
1993   153
1994   169.8
1995   187.9
1996   203.1

For this data set, use your TI-83/TI-84 to:
- Create a scatterplot of the data.
- Assume that the relationship of this data is exponential. Transform the data, find the regression equation, r, and r². Based only on the value of r, would you consider this a good model for extrapolating increases in Medicare spending?
- Create a residual plot of the transformed data. Does the residual plot change your mind about the usefulness of this model to extrapolate increases in Medicare spending?

(Note: Remember that many trends don't have a perfect mathematical model to predict them because there are too many complicating factors to yield a consistent curve. Sometimes in cases like these, a rough model will work as a rough estimator when used with appropriate caution.)
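For Questions 1 through 6, the calculator steps (transform, regress, compare r, convert to a power equation, interpolate) can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the helper `fit_line` and all variable names are illustrative, the stray "B" characters in the scanned planet table are treated as OCR noise (the ln(Dist) column supports that reading), and plain least squares stands in for the TI-83/TI-84 linear regression command.

```python
import math

# Planet data from the assignment table, ordered by distance from the sun.
distance = [36.19, 67.63, 93.50, 142.46, 486.46,
            893.38, 1794.37, 2815.19, 3695.95]      # millions of miles (L1)
year = [0.2410, 0.6156, 1.0007, 1.8821, 11.8704,
        29.4580, 84.0100, 164.7800, 248.5400]       # Earth years (L2)


def fit_line(xs, ys):
    """Least-squares fit ys = intercept + slope * xs; returns (intercept, slope, r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


ln_dist = [math.log(d) for d in distance]           # L3 = ln(L1)
ln_year = [math.log(t) for t in year]               # L4 = ln(L2)

# Questions 1-2: fit each combination and keep the one with the largest r.
candidates = {
    "Distance vs. Year": fit_line(distance, year),
    "Distance vs. ln(Year)": fit_line(distance, ln_year),
    "ln(Distance) vs. Year": fit_line(ln_dist, year),
    "ln(Distance) vs. ln(Year)": fit_line(ln_dist, ln_year),
}
best = max(candidates, key=lambda name: candidates[name][2])
intercept, slope, r = candidates["ln(Distance) vs. ln(Year)"]

# Question 3: residuals of the log-log fit (what the calculator stores in RESID).
residuals = [ly - (intercept + slope * lx) for lx, ly in zip(ln_dist, ln_year)]

# Question 4: ln(Year) = intercept + slope * ln(Distance)
#         =>  Year = e^intercept * Distance^slope
a = math.exp(intercept)

# Questions 5-6: interpolate the year of 951 Gaspra both ways.
gaspra_distance = 207.16                            # millions of miles
gaspra_year_linear = math.exp(intercept + slope * math.log(gaspra_distance))
gaspra_year_power = a * gaspra_distance ** slope

for name, (b0, b1, corr) in candidates.items():
    print(f"{name}: r = {corr:.5f}")
print(f"best: {best}; Year = {a:.6g} * Distance^{slope:.4f}")
print(f"Gaspra: {gaspra_year_linear:.4f} (linear form), {gaspra_year_power:.4f} (power form)")
```

Because no rounding happens between steps here, the two Gaspra predictions agree to floating-point precision, which illustrates the note under Question 6 about carrying more digits.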
Answer

Solution: Question 10

(a) Scatter plot

[Scatterplot: Medicare expenditures (billions of dollars) vs. Year]

(b) The input data for the exponential regression is as follows (Year is measured in years since 1970):

Year   LN_Medicare
0      2.028148
10     3.624341
15     4.278054
20     4.719391
21     4.823502
22     4.951593
23     5.030438
24     5.134621
25     5.23591
26     5.313698

Fitting the exponential regression to the data:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.992833
R Square            0.985718
Adjusted R Square   0.983933
Standard Error      0.127939
Observations        10

ANOVA
             df   SS            MS         F          Significance F
Regression   1    9.037691364   9.037691   552.1461   1.14E-08
Residual     8    0.130946375   0.016368
Total        9    9.168637739

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   2.2243         0.105507153      21.08198   2.69E-08   1.981       2.4676
Year        0.123101       0.005238813      23.49779   1.14E-08   0.11102     0.135181

$\ln(\text{Medicare}) = 2.22 + 0.123 \cdot \text{Year}$

Exponentiating both sides (with the unrounded coefficients 2.2243 and 0.123101), we get

$\text{Medicare} = e^{2.2243} \cdot e^{0.123101 \cdot \text{Year}} = 9.247 \cdot (1.131)^{\text{Year}}$

Correlation coefficient (r) = 0.9928
Coefficient of determination (r²) = 0.9857

Since r = 0.9928, there is a strong positive linear correlation between Year and ln(Medicare). Because the magnitude of r is so large, the predicted values are very close to the observed values, and based on r alone the model seems to be a good fit to the data.

(c) Fitted vs. residual plot

[Plot: Fitted values vs. residuals]

From the fitted vs. residual plot, we can see that there is a linear pattern in the residuals, and hence the homogeneity of variances assumption is violated. The errors are not randomly distributed. Hence we cannot say that the exponential regression model is a good fit for the data.
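The regression output in the answer can be checked independently. The sketch below refits ln(Medicare) against years since 1970 with a hand-written least-squares helper (`fit_line` is an illustrative name, not part of the original solution). The expenditure values are taken to be those whose natural logs match the LN_Medicare column, which is an assumption about the scanned question table.

```python
import math

# Years since 1970 and Medicare expenditures in billions of dollars.
years_since_1970 = [0, 10, 15, 20, 21, 22, 23, 24, 25, 26]
medicare = [7.6, 37.5, 72.1, 112.1, 124.4, 141.4, 153.0, 169.8, 187.9, 203.1]


def fit_line(xs, ys):
    """Least-squares fit ys = intercept + slope * xs; returns (intercept, slope, r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


# Exponential model: regress ln(Medicare) on Year.
ln_medicare = [math.log(m) for m in medicare]
intercept, slope, r = fit_line(years_since_1970, ln_medicare)

# Back-transform: Medicare = a * b^Year with a = e^intercept, b = e^slope.
a = math.exp(intercept)
b = math.exp(slope)

# Residuals on the transformed scale, in year order (used for the residual plot).
residuals = [ly - (intercept + slope * x)
             for x, ly in zip(years_since_1970, ln_medicare)]

print(f"ln(Medicare) = {intercept:.4f} + {slope:.6f} * Year")
print(f"r = {r:.6f}, r^2 = {r * r:.6f}")
print(f"Medicare = {a:.3f} * {b:.3f}^Year")
for x, e in zip(years_since_1970, residuals):
    print(f"Year {x:>2}: residual {e:+.4f}")
```

Listing the residuals in year order makes it easy to see whether they scatter randomly around zero or follow a pattern, which is the point of part (c).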
