
Question


Abstract

In this paper, the analysis will reveal a significant correlation between body weight and brain weight in various mammals.

Introduction

Talk about your other resources here. What did their research find? What conclusions?

Research Question

Is there a correlation between body weight and brain weight in mammals? The predictor is the x variable, which in this case is body weight in pounds. The dependent variable is the y variable, which is brain weight in grams.

Data Collection Method

Summary of Data

The Measures of Correlation

Conclusion

Simple Regression Analysis Project

The Simple Regression Analysis Project is a required assignment for STAT220: Introduction to Statistics. It represents 15% of the student's course grade. The project requires that students not only perform the statistical procedures but also discuss the implications of these procedures for the underlying problem. Students demonstrate not just the mere mechanics of statistics, but an understanding of the significance of statistical data. The project may be an individual or group project.

POSE THE QUESTION

Before you even start thinking of the relationship to investigate, you need to make sure you understand the concept of a linear relationship from MATH 125 well. Then you should study the examples in the Course Notes and on Blackboard. Once you understand what kinds of relationships can be investigated using linear regression, you need to look for the availability of the data and search for prior studies on this relationship. During the process it is a good idea to confer with your instructors on the appropriateness of the topic and the data.

DATA COLLECTION

One component of the Simple Regression Analysis Project involves data collection. Instructors may allow students to use existing data sets or require students to gather data by developing a survey that they create, on a topic of their choosing.
Students may be encouraged to survey their place of work, or choose a topic related to their majors, if possible. Instructors also have the option of requiring that students obtain data from federal or state government websites, e.g., census or labor data.

The data set should have the following properties:
- The sample size should be at least 30 paired observations.
- There are at least two quantitative variables to be used for the simple linear regression. Additional quantitative variables can be gathered if the student wishes to investigate which independent variable may be a better predictor of the dependent variable.

Web Sites with Data Sets:
http://www.keycurriculum.com/resources/fathom-resources/freeactivities-and-resources/statistics-activities
http://nilesonline.com/data/
http://www.dartmouth.edu/~chance/teaching_aids/data.html
http://lib.stat.cmu.edu/DASL/ and http://lib.stat.cmu.edu/datasets/
http://www.amstat.org/publications/jse/
http://www.fedstats.gov/
http://www.rossmanchance.com/ws2/mtw/index.htm
http://it.stlawu.edu/~rlock/datasurf.html#Individual
http://faculty.babson.edu/turner/fish.html
http://sunny.moorparkcollege.edu/~kfink/statlinks.htm

After collecting, enter the data into Minitab and send the Minitab data set to the Instructor for the final approval.
DATA ANALYSIS

The Simple Regression Analysis Project requires students to perform each of the following:
- Use histograms, box plots, dot plots, and descriptive statistics to study the center, variation, distribution, outliers (if any), and trend (if applicable) for each of the quantitative variables (students should decide not only whether the data are symmetric or skewed, but whether the data are sufficiently symmetric to make the assumption of normality).
- Use a scatter plot, numeric measures of correlation, and simple linear regression to examine whether evidence of a relationship between two variables exists, how strong that relationship is, and whether the regression equation should be used for predictive purposes.

Requirements of written report:
- Three to five pages of written text, not counting appropriate front and end matter, tables and graphics.
- Typed, double-spaced with 12-point font, one inch margins.
- Formatted according to the Publication Manual of the American Psychological Association (APA).
- Free from spelling, punctuation, or usage errors.

The data records the average weight of the brain and body for a number of mammal species.
Data set

 No.   Body Weight   Brain Weight
   1         3.385         44.500
   2         0.480         15.500
   3         1.350          8.100
   4       465.000        423.000
   5        36.330        119.500
   6        27.660        115.000
   7        14.830         98.200
   8         1.040          5.500
   9         4.190         58.000
  10         0.425          6.400
  11         0.101          4.000
  12         0.920          5.700
  13         1.000          6.600
  14         0.005          0.140
  15         0.060          1.000
  16         3.500         10.800
  17         2.000         12.300
  18         1.700          6.300
  19      2547.000       4603.000
  20         0.023          0.300
  21       187.100        419.000
  22       521.000        655.000
  23         0.785          3.500
  24        10.000        115.000
  25         3.300         25.600
  26         0.200          5.000
  27         1.410         17.500
  28       529.000        680.000
  29       207.000        406.000
  30        85.000        325.000
  31         0.750         12.300
  32        62.000       1320.000
  33      6654.000       5712.000
  34         3.500          3.900
  35         6.800        179.000
  36        35.000         56.000
  37         4.050         17.000
  38         0.120          1.000
  39         0.023          0.400
  40         0.010          0.250
  41         1.400         12.500
  42       250.000        490.000
  43         2.500         12.100
  44        55.500        175.000
  45       100.000        157.000
  46        52.160        440.000
  47        10.550        179.500
  48         0.550          2.400
  49        60.000         81.000
  50         3.600         21.000
  51         4.288         39.200
  52         0.280          1.900
  53         0.075          1.200
  54         0.122          3.000
  55         0.048          0.330
  56       192.000        180.000
  57         3.000         25.000
  58       160.000        169.000
  59         0.900          2.600
  60         1.620         11.400
  61         0.104          2.500
  62         4.235         50.400

The Correlation between Starting Salary and Years of Education

Abstract: In this paper we show there is a significant, though not strong, correlation (adjusted R-Squared = 54.7%) between the Years of Education Achieved and the Starting Salary of a student. The positive slope of the regression line of 3.33 implies that for each additional year of education one should be awarded $3,330 more in annual starting salary. Thus, even though it is expensive, education does pay back.

Introduction. In 2012, there was a great debate in the Wall Street Journal as to whether college pays off or not. [You as a student need to use APA style to refer to the paper that stirred the discussion.] Our goal in this paper is to show that education pays off in the form of a higher salary. Academic advisors usually show a student tables with average starting salary for those with high school, with some college, with associate degree, with a bachelor's degree, with master's and Ph.D.
degree. And those tables show a difference in mean starting salary between the groups, in the direction that those with more education tend to earn a higher starting salary. In this paper we will investigate whether there is a significant positive correlation between the years of education achieved and the starting salary of a subject.

Research Question and Literature Review. Based on the Bureau of Labor Statistics report [APA citation], education seems to be a gateway to higher earnings. However, they did not perform regression analysis; they usually summarize their results in the form of a graph like the one shown below. The graph is simple and clearly shows that highly educated people are not only paid more, but also that the unemployment rate is lower among them.

Research Question. In this study we are going to use regression analysis to show there is a significant positive correlation between the variables X = Years of Education Achieved, the predictor variable, and Y = Starting Salary, the dependent variable. The positive slope of the regression line implies the Starting Salary is higher with more Years of Education Achieved. Based on the data, we will be able to predict the starting salary for subjects with a certain number of years of education achieved.

The Data and Collection Method. The data is presented in Appendix A. Since this is not a scientific paper, only a modest research project, I used the data I was able to access, a convenience sample of size n = 14, and consequently any conclusion I may reach will lack the power of scientific backing. Yet I still believe the sample is representative of the area where I live, Southeastern Michigan. If nothing else, the conclusions of this paper may be generalized to 'my social circle,' if not to any real population. I made sure that different levels of Years of Education, which has been chosen as the Predictor Variable, were represented, to alleviate the fact that a convenience sample has been used.

Summary of the Data.
There are 14 subjects in this study, and for each subject I have recorded the number of years of education achieved, as well as their starting salary after graduation. I used the Minitab software [please check APA for reference] for data processing. We provide a statistical summary for both variables, and start with the variable Years of Education. The graphical summary is provided below:

This Minitab procedure also provides a very useful number, the p-value for the Anderson-Darling normality test. The p-value of 0.238 is well above 0.05, so we have no evidence against the hypothesis that this data comes from a normally distributed population. Based on the fitted histogram we assume it has roughly a normal distribution. The values 12, 14, 16, 18 and 21 repeat more often than others since they represent a high school diploma, Associate Degree, Bachelor's Degree, Master's Degree and Ph.D. respectively. The data centers around 16, which implies that predictions with the value of the predictor close to 16 are the most reliable. Predictions outside the range of 10 to 21 Years of Education Achieved are not reliable. The median of 15 years indicates the data is fairly symmetric, since there is no large difference between the mean and median. The boxplot does not indicate the presence of outliers in the data. The Stem and Leaf Plot summarizes the variable Years of Education well, since the sample size is fairly small:

Stem-and-Leaf Display: Years of Education
Stem-and-leaf of Years of Education  N = 14
Leaf Unit = 1.0
 1 | 02234446889
 2 | 111

For the variable Starting Salary, the graphical summary procedure provides a first view into the data. The Anderson-Darling test as well as the fitted histogram point to a strong possibility that the data may come from a normally distributed population. The annual starting salaries range from 16 to 68 thousand dollars, with a mean of 40 thousand. The median is 39 thousand, which together with the box plot points to a fairly small skew to the left.
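The summary figures quoted above can be cross-checked outside Minitab. The following is a minimal Python sketch, not part of the original Minitab workflow; it assumes the 14 pairs listed in Appendix A of this example paper, and `stem_and_leaf` is a hypothetical helper written only for this illustration.

```python
from statistics import mean, median

# Appendix A data, subjects A..N:
# years of education and starting salary (thousands of dollars)
years = [12, 18, 10, 21, 12, 14, 16, 21, 21, 14, 18, 19, 14, 13]
salary = [23, 48, 16, 42, 17, 32, 38, 68, 52, 29, 35, 65, 55, 40]


def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem) and ones digit (leaf).

    Hypothetical helper: assumes a leaf unit of 1, as in the display above.
    """
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem] = rows.get(stem, "") + str(leaf)
    return rows


# Center and range of each variable
years_summary = (min(years), max(years), mean(years), median(years))
salary_summary = (min(salary), max(salary), mean(salary), median(salary))

for stem, leaves in stem_and_leaf(years).items():
    print(f"{stem} | {leaves}")
print("years  (min, max, mean, median):", years_summary)
print("salary (min, max, mean, median):", salary_summary)
```

The sketch reproduces only the numeric summaries; the Anderson-Darling p-value and the box plot still come from Minitab.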
Due to the fairly small sample size, the Stem and Leaf plot is a good way to present the data, but this time I have selected a dot plot.

[Figure: Dotplot of Starting Salary]

The dot plot reveals starting annual salaries almost evenly spread over the range from 16 to 68. The data is fairly symmetric around the mean, which is 40 ($40,000 starting salary).

The Measures of Correlation. The scatter plot is the simplest and most of the time the best indicator as to whether there is a linear relationship between the variables.

[Figure: scatter plot of Years of Education vs. Starting Salary, with Years of Education on the horizontal axis, Starting Salary on the vertical axis, and points labeled by Subject ID A to N]

The scatter plot reveals a linear relationship between the variables, with a positive slope. It is definitely not a strong one, but there is a statistically significant straight-line pattern. The subject labeled M deviates from the trend the most. Subject M has 14 years of education with a starting salary of $55,000, which is more than the low $30K one would expect to start at with an associate degree. Looking back at the data, I recall that this subject had already been working as an intern and was offered a full-time job upon graduation. Subjects L and H started higher than the trend suggests, and subject D should have a larger starting salary for his 21 years of education, as should subject K, whose starting salary is lower than expected for 18 years of education. To investigate the relationship between the variables Years of Education and Starting Salary further we look into numeric measures of correlation. The output of the correlation and regression Minitab procedures is shown below.
Correlations: Years of Education, Starting Salary
Pearson correlation of Years of Education and Starting Salary = 0.763
P-Value = 0.002

Regression Analysis: Starting Salary versus Years of Education
The regression equation is
Starting Salary = - 13.1 + 3.33 Years of Education
S = 10.9709   R-Sq = 58.2%   R-Sq(adj) = 54.7%

A small p-value of .002 indicates a statistically significant correlation between the variables, meaning the pattern we have observed would be very unlikely to arise by chance if the variables were not correlated. The Pearson Correlation Coefficient of .763 confirms a positive correlation, perhaps not large enough to call it strong. The R-Squared Adjusted of 54.7% tells us that 54.7% of the variation in the variable Starting Salary is explained by the Years of Education, which leaves 45.3% of the variation caused by other sources, for example, experience gained during college. It is not great, but it is a significant portion of the variation. The slope of the regression line being 3.33 means that for each additional year of education attained one is expected to be awarded $3,330 more in annual starting salary. The y-intercept of the regression line of -13.1 does not make practical sense in this case.

Using the equation of the regression line, y = -13.1 + 3.33x, we may answer the question: What is the annual starting salary for one with 18 years of education (Master's Degree)? Answer: We replace x by 18 in the regression equation to get y = -13.1 + 3.33 * 18 = 46.84, or about 46.8. It means that one who has earned a Master's Degree is expected to start at about $46,800.

Conclusion. The data provides some evidence that education pays off in terms of a higher starting salary. From the Scatter Plot and numeric measures of correlation, like Adjusted R-Squared, we understand there are many other factors that contribute to a higher starting salary.
For example: during the studies, has the student got experience in the field in the form of an internship; has the student earned awards or scholarships; how diligently has the student conducted his/her job search; interviewing skills, initiative, energy, connections... So, these factors must account for the 45% of variation in starting salary not explained by the years of education. On the other hand, there are factors that cannot be measured, that we could not touch in this project: many jobs require an appropriate degree, so one cannot be considered for such a job without the degree; a good college education helps students become independent learners, so the students who learn expand their horizons as well as their possibilities; also, let us not miss a parent's perspective of having a late-teen at home doing nothing, not developing job skills that will be useful later in life. So, overall, I would recommend attending college.

References
[on a fresh page, list your references here, include relevant websites, Minitab and textbook]

Appendix A: Data

Subject ID   Years of Education   Starting Salary
A            12                   23
B            18                   48
C            10                   16
D            21                   42
E            12                   17
F            14                   32
G            16                   38
H            21                   68
I            21                   52
J            14                   29
K            18                   35
L            19                   65
M            14                   55
N            13                   40

Project on Regression Analysis

Introduction

The technique of regression analysis is very useful for the estimation of dependent variables. If a statistically significant linear association exists between the dependent variable and the independent variable, then we can use regression analysis for the prediction of the dependent variable. For this project we have to estimate or predict the values of the dependent variable, weight of the brain, based on the weight of the body for the different animals. Also, we have to check the amount of linear association between the given two variables. Let us see this simple regression analysis in detail.

Research Question

For any research project, it is very important to establish the research question or hypothesis.
For this research project the research question is established as below: Does a statistically significant linear relationship exist between the dependent variable, weight of brain, and the independent variable, weight of body?

Data Collection

For this research project, data is collected for 62 animals. Data is collected for only two variables, weight of body and weight of brain. The scientific laboratory method is used for measuring the weights of body and brain. For this project, the dependent variable is selected as weight of brain and the independent variable as weight of body. The weight of the body is easy to obtain, whereas the weight of the brain cannot be measured so easily.

Statistical Analysis and Results

First of all we have to see the descriptive statistics for the given data, which are summarised in the following table. It is observed that the average weight of the brain is given as 198.6448 grams with a standard deviation of 899.18971 grams. The average weight of the body is given as 283.1342 grams with a standard deviation of 930.27894 grams.

Descriptive Statistics
                   Mean        Std. Deviation   N
Weight of brain    198.6448    899.18971        62
Weight of body     283.1342    930.27894        62

Now, we have to see the correlation coefficient between the given two variables:

Correlations
                                        Weight of brain   Weight of body
Pearson Correlation   Weight of brain   1.000             .934
                      Weight of body    .934              1.000
Sig. (1-tailed)       Weight of brain   .                 .000
                      Weight of body    .000              .
N                     Weight of brain   62                62
                      Weight of body    62                62

The correlation coefficient between the two variables weight of brain and weight of body is given as 0.934, which means a very strong positive linear association exists between the weight of the brain and the weight of the body.
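To make the correlation figure less of a black box, here is a minimal Python sketch of how a Pearson coefficient is computed from paired data. It is an illustration, not the SPSS procedure used above. Because the 62-row mammal table is long, the sketch is run on the 14 pairs from Appendix A of the example paper, where Minitab reports 0.763; the function names `pearson_r` and `t_from_r` are hypothetical helpers, and the mammal columns can be passed in the same way.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing zero correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Appendix A data from the example paper (small enough to inline)
years = [12, 18, 10, 21, 12, 14, 16, 21, 21, 14, 18, 19, 14, 13]
salary = [23, 48, 16, 42, 17, 32, 38, 68, 52, 29, 35, 65, 55, 40]

r_salary = pearson_r(years, salary)
print(f"r (Appendix A data) = {r_salary:.3f}")

# Using the rounded value reported for the mammal data: r = 0.934, n = 62
t_mammals = t_from_r(0.934, 62)
print(f"t implied by r = 0.934, n = 62: {t_mammals:.2f}")
```

The second call only shows how a reported coefficient and sample size translate into a test statistic; since it starts from the rounded value 0.934, it will differ slightly from a statistic computed on unrounded data.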
Now, we have to construct the regression model for the prediction of the dependent (response) variable weight of brain based on the independent (predictor or explanatory) variable weight of body. The regression model is given as below:

Variables Entered/Removed (b)
Model   Variables Entered     Variables Removed   Method
1       Weight of body (a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: Weight of brain

Model summary for the regression model is given as below:

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .934 (a)   .873       .871                323.54146
a. Predictors: (Constant), Weight of body

The correlation coefficient between the dependent variable weight of brain and the independent variable weight of body is given as 0.934, which indicates a strong positive linear association between the two variables. The coefficient of determination, or the value of R square, is given as 0.873, which means about 87.3% of the variation in the dependent variable weight of brain is explained by the independent variable weight of body. The ANOVA table for this regression model is given as below:

ANOVA (b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   4.304E7          1    4.304E7       411.165   .000 (a)
Residual     6280744.755      60   104679.079
Total        4.932E7          61
a. Predictors: (Constant), Weight of body
b. Dependent Variable: Weight of brain

For this ANOVA, the test statistic value F is given as 411.165 with the p-value reported as .000, that is, less than 0.001. This p-value is less than the level of significance, alpha = 0.05 or 5%, so we reject the null hypothesis that there is no linear relationship between the dependent variable weight of brain and the independent variable weight of body. The regression coefficients for the regression equation are summarised as below:

Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model 1          B          Std. Error         Beta                        t        Sig.
(Constant)       -57.009    42.981                                         -1.326   .190
Weight of body   .903       .045               .934                        20.277   .000
a. Dependent Variable: Weight of brain

The regression equation for the prediction or estimation of the dependent variable weight of brain is given as below:

Y = -57.009 + 0.903*X
Weight of brain = -57.009 + 0.903*Weight of body

The y-intercept for this regression equation is given as -57.009, which is not statistically significant because the corresponding p-value is given as 0.19, which is greater than alpha 0.05. The slope for the regression equation is given as 0.903, which is statistically significant because the corresponding p-value is reported as .000, which is less than the alpha value 0.05. By using the above regression equation, we can estimate the weight of the brain based on the weight of the body.
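The reported output can be sanity-checked and put to use with a few lines of code. The sketch below is a hedged illustration in Python, not part of the SPSS analysis: every constant is copied from the tables above, the function name `predict_brain_weight` is hypothetical, and the input value 100 is an arbitrary example rather than a row of the data set.

```python
import math

# Values copied from the SPSS tables reported above
INTERCEPT = -57.009          # (Constant), unstandardized B
SLOPE = 0.903                # Weight of body, unstandardized B
SS_REGRESSION = 4.304e7      # ANOVA: regression sum of squares
SS_RESIDUAL = 6280744.755    # ANOVA: residual sum of squares
SS_TOTAL = 4.932e7           # ANOVA: total sum of squares
DF_RESIDUAL = 60             # n - 2 with n = 62


def predict_brain_weight(body_weight):
    """Point prediction from the fitted line, using the answer's labels."""
    return INTERCEPT + SLOPE * body_weight


# Internal consistency checks between the tables
r_squared = SS_REGRESSION / SS_TOTAL            # compare with R Square
mean_square_error = SS_RESIDUAL / DF_RESIDUAL   # compare with Mean Square
std_error = math.sqrt(mean_square_error)        # compare with Std. Error
f_statistic = SS_REGRESSION / mean_square_error # compare with F

print(f"R square     = {r_squared:.3f}")
print(f"Std. error   = {std_error:.3f}")
print(f"F            = {f_statistic:.2f}")
print(f"Prediction for a body weight of 100: {predict_brain_weight(100):.3f}")
```

In a simple regression with one predictor, F equals the square of the slope's t statistic, so the reported t of 20.277 and F of 411.165 can be compared in the same way. Predictions are only meaningful within the range of body weights present in the data.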
