Question
Linear Regression and Prediction Assignment [ADD IN YOUR CLAIM]

Step 1. Collect Data Sets. FIND THE DATA SET(S) YOU NEED TO SUPPORT YOUR CLAIM.
2. Examine the data & run summary statistics. Discuss these BRIEFLY (review chapter 4 if necessary).
3. Determine relationships & format data set
   a) What is your dependent variable? Format your data set appropriately (e.g. think "wins" or "win %" from the basketball discussion).
   b) What are your potential explanatory variables? Do you have (need) squares, interactions or binary variables? Format your data set appropriately.
4. Do some correlation analysis
   a) Do a scatter plot.
   b) Does the evidence suggest that there is a correlation?
   c) Are there any 'weird' patterns?
   d) Drop any variables that seemingly lack explanatory value.
5. Run a regression on data set
   - Look at R-squared & Adjusted R-squared. Comment on them. Do you have evidence of unnecessary variables? Do your explanatory variables seemingly capture much of the variation?
   - Estimate of intercept: is it statistically different from zero? If not, could you justify an RTO (i.e. regression through the origin)? Re-run if necessary.
   - Coefficient estimates: rank by p-value. If there are very high p-values, consider dropping those explanatory variables and re-running the regression BEFORE continuing.
   - Do the signs of the coefficients make sense? What about magnitudes?
   - Redo step 4 every time you re-run your regression. Stop when you feel like you have a 'good fit'.
6. Fitted equation
   a) Write out the estimated model.
   b) Add the trend line to your scatter plot. Make sure that the fitted regression line equation is consistent with your equation in (a).
7. Plot the standard residuals vs. the fitted values. Discuss any issues that arise (e.g. heteroskedasticity? Non-normal residuals? Non-constant variance?)
8. PREDICT THE NEXT ONE (OR MORE) VALUES USING YOUR REGRESSION MODEL. Do your predictions hold up? Do they make sense?
9. COMMENT ON THE REGRESSION. What else might you do? Are you satisfied with the model? Does it lack something? If so, what might that be? Are there possible non-independence issues that you'd like to resolve? Etc.

FTIAC data, 1991-2016 (source: https://www.oakland.edu/oira/enrollment/new-student-application-admissions-enrollment/)

| Year | Applications | Admissions | % Offered | Enrollment | AVG HS GPA | AVG ACT SCORE | % Enrolled (Yield) |
|---|---|---|---|---|---|---|---|
| 1991 | 2,786 | 2,228 | 80% | 1,145 | 3.0 | 21.6 | 51% |
| 1992 | 3,187 | 2,581 | 81% | 1,308 | 3.0 | 21.6 | 51% |
| 1993 | 3,069 | 2,577 | 84% | 1,242 | 3.0 | 21.7 | 48% |
| 1994 | 3,206 | 2,661 | 83% | 1,297 | 3.0 | 21.7 | 49% |
| 1995 | 3,535 | 2,969 | 84% | 1,417 | 3.0 | 21.2 | 48% |
| 1996 | 3,644 | 3,061 | 84% | 1,411 | 3.0 | 21.3 | 46% |
| 1997 | 3,812 | 3,279 | 86% | 1,556 | 3.0 | 21.3 | 47% |
| 1998 | 3,928 | 3,157 | 80% | 1,530 | 3.0 | 21.8 | 48% |
| 1999 | 4,966 | 3,722 | 75% | 1,789 | 3.0 | 21.7 | 49% |
| 2000 | 5,221 | 4,072 | 78% | 1,863 | 3.1 | 21.6 | 46% |
| 2001 | 5,519 | 4,246 | 77% | 1,880 | 3.1 | 21.3 | 44% |
| 2002 | 5,740 | 4,465 | 78% | 1,846 | 3.1 | 21.6 | 41% |
| 2003 | 6,356 | 5,089 | 80% | 2,085 | 3.1 | 21.2 | 41% |
| 2004 | 6,293 | 5,119 | 81% | 2,022 | 3.2 | 21.2 | 39% |
| 2005 | 6,014 | 4,930 | 82% | 2,182 | 3.2 | 21.5 | 44% |
| 2006 | 6,571 | 5,217 | 79% | 2,251 | 3.2 | 21.6 | 43% |
| 2007 | 6,914 | 5,592 | 81% | 2,294 | 3.2 | 21.8 | 41% |
| 2008 | 7,708 | 6,028 | 78% | 2,323 | 3.3 | 22.0 | 39% |
| 2009 | 10,105 | 6,925 | 69% | 2,427 | 3.3 | 22.2 | 35% |
| 2010 | 9,907 | 6,612 | 67% | 2,283 | 3.3 | 22.4 | 35% |
| 2011 | 10,360 | 6,975 | 67% | 2,319 | 3.3 | 22.4 | 33% |
| 2012 | 12,152 | 7,551 | 62% | 2,440 | 3.4 | 23.2 | 32% |
| 2013 | 12,164 | 7,985 | 66% | 2,547 | 3.4 | 23.3 | 32% |
| 2014 | 12,403 | 8,354 | 67% | 2,502 | 3.4 | 23.2 | 30% |
| 2015 | 12,486 | 8,169 | 65% | 2,658 | 3.4 | 23.2 | 33% |
| 2016 | 12,747 | 9,296 | 73% | 2,595 | 3.4 | 23.6 | 28% |

Linear Regression and Prediction Assignment
Claim: Is enrollment percentage (yield) dependent on applications, admissions, % offered, enrollment, AVG HS GPA and AVG ACT SCORE?

Step 1. Collect Data Sets. The data set is the FTIAC table above (26 years, 1991-2016).
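The statistics in step 2 appear to come from Excel's Descriptive Statistics tool, because the #N/A entries are Excel's. The sketch below recomputes the main rows as a cross-check. It is not the author's method. It uses only the Python standard library, plus the bias-adjusted formulas that Excel uses for SKEW and KURT.

The sketch works from the rounded percentages in the table. As a result, % Offered will differ slightly from the Excel output, which evidently used unrounded values (its sum is 19.87352 rather than 19.87).

```python
import statistics

# Oakland University FTIAC data, 1991-2016, copied from the table (percentages as proportions).
data = {
    "Applications": [2786, 3187, 3069, 3206, 3535, 3644, 3812, 3928, 4966, 5221, 5519, 5740, 6356,
                     6293, 6014, 6571, 6914, 7708, 10105, 9907, 10360, 12152, 12164, 12403, 12486, 12747],
    "Admissions": [2228, 2581, 2577, 2661, 2969, 3061, 3279, 3157, 3722, 4072, 4246, 4465, 5089,
                   5119, 4930, 5217, 5592, 6028, 6925, 6612, 6975, 7551, 7985, 8354, 8169, 9296],
    "% Offered": [0.80, 0.81, 0.84, 0.83, 0.84, 0.84, 0.86, 0.80, 0.75, 0.78, 0.77, 0.78, 0.80,
                  0.81, 0.82, 0.79, 0.81, 0.78, 0.69, 0.67, 0.67, 0.62, 0.66, 0.67, 0.65, 0.73],
    "Enrollment": [1145, 1308, 1242, 1297, 1417, 1411, 1556, 1530, 1789, 1863, 1880, 1846, 2085,
                   2022, 2182, 2251, 2294, 2323, 2427, 2283, 2319, 2440, 2547, 2502, 2658, 2595],
    "AVG HS GPA": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.1, 3.1, 3.1, 3.1,
                   3.2, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4],
    "AVG ACT SCORE": [21.6, 21.6, 21.7, 21.7, 21.2, 21.3, 21.3, 21.8, 21.7, 21.6, 21.3, 21.6, 21.2,
                      21.2, 21.5, 21.6, 21.8, 22.0, 22.2, 22.4, 22.4, 23.2, 23.3, 23.2, 23.2, 23.6],
    "% Enrolled (Yield)": [0.51, 0.51, 0.48, 0.49, 0.48, 0.46, 0.47, 0.48, 0.49, 0.46, 0.44, 0.41, 0.41,
                           0.39, 0.44, 0.43, 0.41, 0.39, 0.35, 0.35, 0.33, 0.32, 0.32, 0.30, 0.33, 0.28],
}


def excel_skew(xs):
    """Sample skewness with the small-sample adjustment used by Excel's SKEW."""
    n, mean, sd = len(xs), statistics.mean(xs), statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis with the small-sample adjustment used by Excel's KURT."""
    n, mean, sd = len(xs), statistics.mean(xs), statistics.stdev(xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    shift = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(((x - mean) / sd) ** 4 for x in xs) - shift


summary = {}
for name, xs in data.items():
    summary[name] = {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "sd": statistics.stdev(xs),  # sample (n - 1) standard deviation, as in Excel
        "skew": excel_skew(xs),
        "kurt": excel_kurt(xs),
    }
    s = summary[name]
    print(f"{name:>18}: mean={s['mean']:.4f} median={s['median']:.4f} "
          f"sd={s['sd']:.4f} skew={s['skew']:+.3f} kurt={s['kurt']:+.3f}")
```

For % Enrolled (Yield), these values should match the Excel column: mean 0.412692, standard deviation 0.070459, skewness -0.35678 and kurtosis -1.1713.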
2. Examine the data & run summary statistics. Discuss these BRIEFLY.

| Statistic | YEAR | Applications | Admissions | % Offered | Enrollment | AVG HS GPA | AVG ACT SCORE | % Enrolled (Yield) |
|---|---|---|---|---|---|---|---|---|
| Mean | 2003.5 | 6953.577 | 5110 | 0.764366 | 1969.692 | 3.169231 | 21.96923 | 0.412692 |
| Standard Error | 1.5 | 671.9979 | 409.5149 | 0.013741 | 93.24584 | 0.030769 | 0.145154 | 0.013818 |
| Median | 2003.5 | 6153.5 | 5009.5 | 0.785 | 2053.5 | 3.15 | 21.7 | 0.42 |
| Mode | #N/A | #N/A | #N/A | 0.8 | #N/A | 3 | 21.6 | 0.48 |
| Standard Deviation | 7.648529 | 3426.53 | 2088.125 | 0.070066 | 475.4624 | 0.156893 | 0.740146 | 0.070459 |
| Sample Variance | 58.5 | 11741110 | 4360264 | 0.004909 | 226064.5 | 0.024615 | 0.547815 | 0.004964 |
| Kurtosis | -1.2 | -1.13838 | -1.00191 | -0.86709 | -1.28715 | -1.4803 | -0.1803 | -1.1713 |
| Skewness | 0 | 0.552993 | 0.396903 | -0.67806 | -0.30257 | 0.28682 | 1.044257 | -0.35678 |
| Range | 25 | 9961 | 7068 | 0.24 | 1513 | 0.4 | 2.4 | 0.23 |
| Minimum | 1991 | 2786 | 2228 | 0.62 | 1145 | 3 | 21.2 | 0.28 |
| Maximum | 2016 | 12747 | 9296 | 0.86 | 2658 | 3.4 | 23.6 | 0.51 |
| Sum | 52091 | 180793 | 132860 | 19.87352 | 51212 | 82.4 | 571.2 | 10.73 |
| Count | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |

Applications, Admissions, AVG HS GPA and AVG ACT SCORE are skewed to the right (positive skewness). % Offered, Enrollment and % Enrolled (Yield) are skewed to the left (negative skewness).

For skewed data, the median is a better measure of central tendency than the mean. The medians are as follows:
- Applications: 6153.5
- Admissions: 5009.5
- % Offered: 0.785
- Enrollment: 2053.5
- AVG HS GPA: 3.15
- AVG ACT SCORE: 21.7
- % Enrolled (Yield): 0.42

3. Determine relationships & format data set
a) What is your dependent variable? % Enrolled (Yield) is my dependent variable.
b) What are your potential explanatory variables? Do you need squares, interactions or binary variables? The explanatory variables are Applications, Admissions, % Offered, Enrollment, AVG HS GPA and AVG ACT SCORE. No squares, interactions or binary variables are used.

4. Do some correlation analysis
a) Do a scatter plot.
[Figure: Scatterplot of % Enrolled (Yield) vs Applications, Admissions, % Offered, Enrollment, AVG HS GPA, AVG ACT SCORE]
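The scatter plots can be backed with numbers by computing the Pearson correlation of each explanatory variable with yield. The sketch below does this with the standard library only, using the rounded values from the FTIAC table. It is an illustration, not the tool used in the original answer.

```python
import math

# FTIAC data, 1991-2016 (percentages as proportions, rounded as in the table).
data = {
    "Applications": [2786, 3187, 3069, 3206, 3535, 3644, 3812, 3928, 4966, 5221, 5519, 5740, 6356,
                     6293, 6014, 6571, 6914, 7708, 10105, 9907, 10360, 12152, 12164, 12403, 12486, 12747],
    "Admissions": [2228, 2581, 2577, 2661, 2969, 3061, 3279, 3157, 3722, 4072, 4246, 4465, 5089,
                   5119, 4930, 5217, 5592, 6028, 6925, 6612, 6975, 7551, 7985, 8354, 8169, 9296],
    "% Offered": [0.80, 0.81, 0.84, 0.83, 0.84, 0.84, 0.86, 0.80, 0.75, 0.78, 0.77, 0.78, 0.80,
                  0.81, 0.82, 0.79, 0.81, 0.78, 0.69, 0.67, 0.67, 0.62, 0.66, 0.67, 0.65, 0.73],
    "Enrollment": [1145, 1308, 1242, 1297, 1417, 1411, 1556, 1530, 1789, 1863, 1880, 1846, 2085,
                   2022, 2182, 2251, 2294, 2323, 2427, 2283, 2319, 2440, 2547, 2502, 2658, 2595],
    "AVG HS GPA": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.1, 3.1, 3.1, 3.1,
                   3.2, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4],
    "AVG ACT SCORE": [21.6, 21.6, 21.7, 21.7, 21.2, 21.3, 21.3, 21.8, 21.7, 21.6, 21.3, 21.6, 21.2,
                      21.2, 21.5, 21.6, 21.8, 22.0, 22.2, 22.4, 22.4, 23.2, 23.3, 23.2, 23.2, 23.6],
    "% Enrolled (Yield)": [0.51, 0.51, 0.48, 0.49, 0.48, 0.46, 0.47, 0.48, 0.49, 0.46, 0.44, 0.41, 0.41,
                           0.39, 0.44, 0.43, 0.41, 0.39, 0.35, 0.35, 0.33, 0.32, 0.32, 0.30, 0.33, 0.28],
}


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


target = data.pop("% Enrolled (Yield)")
correlations = {name: pearson(xs, target) for name, xs in data.items()}

# Print the variables from strongest to weakest linear association with yield.
for name, r in sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"r(% Enrolled (Yield), {name}) = {r:+.3f}")
```

A pairwise correlation only measures linear association between two variables. Applications, Admissions and Enrollment all rise over time in this table, so they can be strongly correlated with yield yet still be redundant with one another in a multiple regression. Step 5 tests exactly that.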
b) Does the evidence suggest that there is a correlation? Yes.
- There is a negative linear relationship between % Enrolled (Yield) and Applications, Admissions, Enrollment, AVG HS GPA and AVG ACT SCORE.
- There is a positive linear relationship between % Enrolled (Yield) and % Offered.

c) Are there any 'weird' patterns? There are no obvious outliers in the data.

d) Drop any variables that seemingly lack explanatory value. All explanatory variables appear linearly related to the dependent variable, so none is dropped at this stage.

5. Run a regression on data set

Full model (all six explanatory variables):
- R^2 = 98%, so the six variables together explain about 98% of the variation in % Enrolled (Yield).
- Adjusted R^2 = 97%. Adjusted R^2 penalizes R^2 for the number of explanatory variables. The gap here is small, so the model as a whole is not badly over-fitted. However, the individual p-values below show that some variables are unnecessary.
- The intercept is -0.093. It is the predicted yield when every explanatory variable equals zero, a point far outside the data: no year has zero applications or an average ACT score of 0. It therefore has no practical meaning, and in particular it is not an "initial" yield of -9.3%.

For each coefficient, H0: β_i = 0 versus H1: β_i ≠ 0.
- Admissions, Enrollment and AVG ACT SCORE have p-values below 0.05 (α), so I reject H0 for them. They are the only significant explanatory variables.
- Applications, % Offered and AVG HS GPA have p-values above 0.05, so I removed them and re-ran the regression.

Reduced model (Admissions, Enrollment, AVG ACT SCORE):
- R^2 = 97.7%, so these three variables explain about 97.7% of the variation in yield.
- Adjusted R^2 = 97.4%, almost the same as for the full model. Dropping the three insignificant variables therefore cost essentially no explanatory power.
- The intercept is -0.076, as in the fitted equation of step 6a. As before, it has no practical interpretation, because all-zero inputs lie far outside the data.
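The re-run regression can be reproduced with ordinary least squares. The sketch below is a pure-Python fit of the reduced model, assuming OLS on the table's rounded yields. The original output presumably came from a statistics package, so the last digits may differ, but R^2 and adjusted R^2 should come out close to the 97.7% and 97.4% reported above.

One design choice: the predictors are centred before the normal equations are solved. This keeps the small linear system well conditioned, since Admissions is in the thousands while AVG ACT SCORE is near 22. The helper names `solve` and `ols` are illustrative.

```python
# Reduced model from step 5: % Enrolled (Yield) ~ Admissions + Enrollment + AVG ACT SCORE
admissions = [2228, 2581, 2577, 2661, 2969, 3061, 3279, 3157, 3722, 4072, 4246, 4465, 5089,
              5119, 4930, 5217, 5592, 6028, 6925, 6612, 6975, 7551, 7985, 8354, 8169, 9296]
enrollment = [1145, 1308, 1242, 1297, 1417, 1411, 1556, 1530, 1789, 1863, 1880, 1846, 2085,
              2022, 2182, 2251, 2294, 2323, 2427, 2283, 2319, 2440, 2547, 2502, 2658, 2595]
act = [21.6, 21.6, 21.7, 21.7, 21.2, 21.3, 21.3, 21.8, 21.7, 21.6, 21.3, 21.6, 21.2,
       21.2, 21.5, 21.6, 21.8, 22.0, 22.2, 22.4, 22.4, 23.2, 23.3, 23.2, 23.2, 23.6]
yield_ = [0.51, 0.51, 0.48, 0.49, 0.48, 0.46, 0.47, 0.48, 0.49, 0.46, 0.44, 0.41, 0.41,
          0.39, 0.44, 0.43, 0.41, 0.39, 0.35, 0.35, 0.33, 0.32, 0.32, 0.30, 0.33, 0.28]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    Returns (b0, [b1, ..., bp], R^2, adjusted R^2).
    """
    n, p = len(y), len(columns)
    means = [sum(col) / n for col in columns]
    y_mean = sum(y) / n
    centred = [[v - mu for v in col] for col, mu in zip(columns, means)]
    sxx = [[sum(u * v for u, v in zip(ci, cj)) for cj in centred] for ci in centred]
    sxy = [sum(u * (yi - y_mean) for u, yi in zip(ci, y)) for ci in centred]
    slopes = solve(sxx, sxy)
    intercept = y_mean - sum(b * mu for b, mu in zip(slopes, means))
    fitted = [intercept + sum(b * col[i] for b, col in zip(slopes, columns)) for i in range(n)]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return intercept, slopes, r2, adj_r2


intercept, slopes, r2, adj_r2 = ols([admissions, enrollment, act], yield_)
print(f"intercept = {intercept:.4f}")
for name, b in zip(["Admissions", "Enrollment", "AVG ACT SCORE"], slopes):
    print(f"{name:>14}: {b:.6f}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

The sketch stops at point estimates. The p-values used in the text would also need coefficient standard errors and the t distribution, which a statistics package provides.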
For each coefficient, H0: β_i = 0 versus H1: β_i ≠ 0. All three p-values are below 0.05 (α), so I reject H0. Admissions, Enrollment and AVG ACT SCORE are each significant explanatory variables.

Coefficient interpretation (yield as a proportion), holding the other variables fixed:
- A one-unit increase in Admissions decreases % Enrolled (Yield) by 0.000063 (0.0063 percentage points).
- A one-unit increase in Enrollment increases % Enrolled (Yield) by 0.000100 (0.01 percentage points).
- A one-point increase in AVG ACT SCORE increases % Enrolled (Yield) by about 0.028 (2.8 percentage points).

6. Fitted equation
a) Estimated model:
% Enrolled (Yield) = -0.076 - 0.000063 Admissions + 0.000100 Enrollment + 0.02796 AVG ACT SCORE
b) Trend lines on the scatter plot:
[Figure: Scatterplot of % Enrolled (Yield) vs Admissions, Enrollment, AVG ACT SCORE]

7. Plot the standard residuals vs. the fitted values. Discuss any issues that arise.
[Figures: residual plots of the residuals against Admissions, Enrollment and AVG ACT SCORE (vertical axis from -0.03 to 0.03), and a Normal Probability Plot of % Enrolled (Yield) against sample percentile]

In all three residual plots, the residuals are scattered randomly around zero. The assumption of constant error variance (no heteroskedasticity) therefore looks reasonable. These plots show residuals against each explanatory variable; adding the plot of standard residuals against fitted values would complete the check.

The normal probability plot, however, has % Enrolled (Yield) on its vertical axis. It therefore describes the dependent variable rather than the residuals, and cannot confirm that the errors are normal. In any case, an S-shaped pattern signals a departure from normality; points lying close to a straight line are what support it. A normal probability plot of the residuals is the appropriate check.

8. Predict THE NEXT ONE (OR MORE) VALUES USING YOUR REGRESSION MODEL. Do your predictions hold up? Do they make sense?
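The step 8 prediction can be reproduced directly from the fitted equation in 6a. The helpers `predict_yield` and `sanity_check` below are hypothetical names used only for illustration.

The sketch uses the coefficients rounded as printed in 6a. It therefore returns 0.66588 rather than the reported 0.666166, which presumably came from unrounded coefficients. It also compares the prediction with the 1991-2016 yield range and with enrollment / admissions for the same inputs.

```python
# Coefficients as printed in the fitted equation of step 6a (rounded there).
INTERCEPT = -0.076
COEF_ADMISSIONS = -0.000063
COEF_ENROLLMENT = 0.000100
COEF_ACT = 0.02796

OBSERVED_YIELD_RANGE = (0.28, 0.51)  # minimum and maximum yield in the 1991-2016 data


def predict_yield(admissions, enrollment, act_score):
    """Point prediction of % Enrolled (Yield), as a proportion, from the fitted equation."""
    return (INTERCEPT + COEF_ADMISSIONS * admissions
            + COEF_ENROLLMENT * enrollment + COEF_ACT * act_score)


def sanity_check(admissions, enrollment, act_score):
    """Compare a prediction with the observed yield range and with enrollment / admissions."""
    pred = predict_yield(admissions, enrollment, act_score)
    low, high = OBSERVED_YIELD_RANGE
    return {
        "prediction": pred,
        "within_observed_range": low <= pred <= high,
        "enrollment_over_admissions": enrollment / admissions,
    }


check = sanity_check(2400, 2500, 23)
print(check)
```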
When Admissions = 2,400, Enrollment = 2,500 and AVG ACT SCORE = 23, the predicted % Enrolled (Yield) is 0.666166 (about 66.6%).

This prediction does not hold up, for two reasons:
- It is far above every yield observed from 1991 to 2016 (28% to 51%). The inputs lie outside the range of the data: admissions were never below 2,228, and enrollment reached 2,500 only in 2013-2016, when admissions were 7,985 or more.
- It cannot make sense. Enrollment cannot exceed admissions, so 2,500 enrolled out of 2,400 admitted is impossible.

A more sensible prediction for the next year would use inputs close to the 2016 values.

9. COMMENT ON THE REGRESSION. What else might you do? Are you satisfied with the model? Does it lack something? If so, what might that be? Are there possible non-independence issues that you'd like to resolve? Etc.

The model fits the 1991-2016 data closely (adjusted R^2 = 97.4%), and the residual plots show no obvious pattern, so it describes the past data well. I am less satisfied with it as a forecasting tool, for two reasons:
- The observations are consecutive years, so the errors may not be independent (autocorrelation). A plot of the residuals against year, or a Durbin-Watson test, would check this.
- The normality check should be redone on the residuals rather than on % Enrolled (Yield).
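One quick check bears on what the model may lack. The sketch below compares % Offered with Admissions / Applications, and % Enrolled (Yield) with Enrollment / Admissions, using the rounded percentages from the table.

If both gaps stay within about one percentage point, the two percentage columns are derived from the count columns. That would make part of the fit definitional rather than explanatory. The helper `max_gap` is an illustrative name.

```python
applications = [2786, 3187, 3069, 3206, 3535, 3644, 3812, 3928, 4966, 5221, 5519, 5740, 6356,
                6293, 6014, 6571, 6914, 7708, 10105, 9907, 10360, 12152, 12164, 12403, 12486, 12747]
admissions = [2228, 2581, 2577, 2661, 2969, 3061, 3279, 3157, 3722, 4072, 4246, 4465, 5089,
              5119, 4930, 5217, 5592, 6028, 6925, 6612, 6975, 7551, 7985, 8354, 8169, 9296]
enrollment = [1145, 1308, 1242, 1297, 1417, 1411, 1556, 1530, 1789, 1863, 1880, 1846, 2085,
              2022, 2182, 2251, 2294, 2323, 2427, 2283, 2319, 2440, 2547, 2502, 2658, 2595]
pct_offered = [0.80, 0.81, 0.84, 0.83, 0.84, 0.84, 0.86, 0.80, 0.75, 0.78, 0.77, 0.78, 0.80,
               0.81, 0.82, 0.79, 0.81, 0.78, 0.69, 0.67, 0.67, 0.62, 0.66, 0.67, 0.65, 0.73]
yield_pct = [0.51, 0.51, 0.48, 0.49, 0.48, 0.46, 0.47, 0.48, 0.49, 0.46, 0.44, 0.41, 0.41,
             0.39, 0.44, 0.43, 0.41, 0.39, 0.35, 0.35, 0.33, 0.32, 0.32, 0.30, 0.33, 0.28]


def max_gap(reported, numerators, denominators):
    """Largest absolute difference between a reported share and numerator / denominator."""
    return max(abs(r - num / den) for r, num, den in zip(reported, numerators, denominators))


offered_gap = max_gap(pct_offered, admissions, applications)
yield_gap = max_gap(yield_pct, enrollment, admissions)
print(f"max |% Offered - Admissions/Applications| = {offered_gap:.4f}")
print(f"max |Yield - Enrollment/Admissions|       = {yield_gap:.4f}")
```

On the table above, the largest yield gap is 1999: 49% is reported, against 1,789 / 3,722 ≈ 48.1%. Every other year agrees to within half a percentage point.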