amount
200 10 150 115 35 200 231 150 100 120 90 100 85 250 60 90 90 50 100 175 100 100 60 60 60 85 150 105 25 100 120 50 150 75 30 259 75 120 117 99 300 100 110 60 150 150 120 60 450 100 150 100 150 150 35 60 85 75 75 50 200 752416 45 5 125 68 120 50 150 50 150 150 150 175 100 80 112 96 376 100 120 50 100 150 150 150 200 150 100 200 200 300 300 185 250 100 200 71 25 150 110 40 87 65 300 80

Exploring the Sample Data

One reason for exploring the sample data is to determine if it is appropriate to use the t methods to perform inference. There are two components to exploring the sample data:

1) obtain the summary (or "descriptive") statistics
2) obtain a graphical display

Let's start with the summary statistics.

Summary statistics using the MOSAIC package

Obtain relevant summary statistics using the favstats() command in the MOSAIC package. See the Introductory R Tutorial for information on installing the MOSAIC package on your computer if you do not already have it installed. (Information is also given in the "R Tutorial Assignment 1" R script.)

Note: the MOSAIC package is not compatible with some computers/laptops. If you are unable to install the MOSAIC package on your device, do not worry. Alternate code will always be given so that you will be able to obtain the output necessary to answer questions on the assignment.

Once installed, the next step is to load the MOSAIC package for this session (remember, the MOSAIC package must be loaded each time you open up R if you plan on using the package).

    require(mosaic)

The general code for the favstats() command for a single quantitative variable is:

    favstats(zzz$yyy)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.

Alternate code if you do not have the MOSAIC package

    summary(zzz$yyy)
    length(zzz$yyy)
    sd(zzz$yyy)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.

Note: the summary() command gives all the information from the favstats() command except the sample size and the standard deviation. These two are obtained by using the length() command and the sd() command.

4. (2 points) Provide a table of summary statistics in your document for Question 4. Note: do not copy and paste the table obtained in R. Rather, create your own table of summary statistics similar to the one provided below (don't forget to include units).

                    Sample size | Minimum | 1st quartile | Mean | Median | 3rd quartile | Maximum | Standard deviation
    (variable name)

Graphical Display: the histogram

The general code for a histogram is as follows:

    hist(zzz$yyy, col = "", main = "", xlab = "")

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
Give the bars in the histogram a color by typing the name of a color between the quotes in the col = "" argument.
Give the histogram a title by typing a title between the quotes in the main = "" argument.
Label the x axis of the histogram by typing the label between the quotes in the xlab = "" argument.

5. (1 point) Copy and paste your properly labeled histogram into your document for Question 5. You do not need to comment about your histogram at this point.

Graphical Display: the normal probability plot

You will be asked to discuss the shape of the sample data in a question below. In some situations (for example, small sample sizes), it may be hard to assess if the sample data are approximately normal. Another graph that can better assess if the sample data are normally distributed is the normal probability plot. The general code to obtain a normal probability plot is as follows:

    qqnorm(zzz$yyy, pch = , main = "Normal Probability Plot")

Then add a reference line to the plot:

    qqline(zzz$yyy)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
If desired, give the graph a different plotting character than the default (which is an open circle) by putting a number next to pch =. For example, pch = 19 will put filled-in circles in the plot.
Title the plot "Normal Probability Plot" as shown in the main = argument above.

The qqline() command adds a reference line to help us determine if the data are normally distributed: if the points fall right on the line, the data are normally distributed.

6. (1 point) Copy and paste your normal probability plot into your document for Question 6.

7. (2 points) Based on the summary statistics and either the histogram or normal probability plot, do you feel the t methods can be used for inference? Briefly explain why or why not. (Note: recall that the t methods should only be used if the distribution of sample means is approximately normal. Therefore, the answer to this question should focus on whether you believe the distribution of sample means will be approximately normal and why or why not. Make sure to specify what graph and what feature of that graph you are using to support your answer.)

Regardless of how you answered Question 7, use the t methods to construct the confidence interval.

    t.test(zzz$yyy, conf.level = ccc)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
Replace ccc with the desired level of confidence as a proportion (i.e., 0.95 for a 95% confidence level). By default, the confidence level is 0.95.

Note: even though you are only being asked for a confidence interval, the t.test() command is used. The top part of the output will provide information for a hypothesis test; ignore that part if all you want is a confidence interval. About halfway down the output, you will find the bounds of the confidence interval.

As an illustration, here is the code and output for a non-related problem. The problem involves a random sample of 34 students; each was asked the amount of minutes per day spent exercising.

    t.test(exercise$time, conf.level = 0.95)

            One Sample t-test

    data:  exercise$time
    t = 8.9529, df = 33, p-value = 2.394e-10                  <- Ignore for confidence interval
    alternative hypothesis: true mean is not equal to 0       <- Ignore for confidence interval
    95 percent confidence interval:                           <- This is what to look for
     32.11473 51.00292
    sample estimates:
    mean of x
     41.55882

Now, try it for the food data set. Answer the next two questions based on the food data set (and not the output given above).

8. (4 points) In proper syntax, report the 95% confidence interval for the mean amount spent on food per week for college students. Then interpret the confidence interval in the context of the problem.

Suppose Adam's mother wondered if saving $100 a week was appropriate to cover Adam's expenses for a year. Based on the sample, is there evidence to indicate that the average amount college students spend on food each week is different than $100?

9. (3 points) State the null and alternative hypotheses in notation. Define the notation used in the context of the problem. (That is, define the parameter used in the notation in the context of the problem.) The last sentence in the paragraph above will be helpful in determining the alternative hypothesis.

10. (2 points) Using only the confidence interval constructed in Question 8, what decision would you make and at what significance level (i.e., "reject the null hypothesis" or "fail to reject the null hypothesis")? Explain why you are making this decision. Again, your support should only reference the confidence interval (i.e., no p-value here).

To determine the p-value using the t methods, the same code is used as above. We'll add two additional arguments and remove the conf.level argument (although it's okay to leave conf.level = 0.95 in):

- the direction of the alternative hypothesis
- the value of the mean if the null hypothesis is true

Here is the code for a non-related problem where H0: μ = 50 minutes/day and HA: μ ≠ 50 minutes/day.

    t.test(exercise$time, alternative = "two.sided", mu = 50)

            One Sample t-test

    data:  exercise$time
    t = -1.8185, df = 33, p-value = 0.07808
    alternative hypothesis: true mean is not equal to 50
    95 percent confidence interval:
     32.11473 51.00292
    sample estimates:
    mean of x
     41.55882

The information for the hypothesis test (t statistic, degrees of freedom, p-value) is given in the line of the output beginning with "t =" (boxed in the original handout).

Now, return to the food data set and use the above code to determine the p-value from the t methods. Using the output from the food data set, answer the following questions. (Again, do not use the output given above as that was for illustration purposes only and not related to the food data set.)

11. (2 points) Report the value of the t statistic with degrees of freedom.

12. (3 points) Based on the p-value, state a conclusion in the context of the problem. (Report the p-value in your conclusion. Remember, a conclusion is written in terms of evidence to say the alternative hypothesis is true.)

One final note: Suppose you were performing a one-sided test. The confidence interval reported in the output will only provide one bound. For example, in the non-related problem, suppose H0: μ = 50 minutes/day and HA: μ < 50 minutes/day. The code is as follows (I've changed the confidence level to 90% to illustrate how to do this in R):

    t.test(exercise$time, alternative = "less", mu = 50, conf.level = 0.90)

Below is the output. Note how the lower bound of the confidence interval is -Inf (which stands for "negative infinity"). Of course, this is not a legitimate lower bound.

            One Sample t-test

    data:  exercise$time
    t = -1.8185, df = 33, p-value = 0.03904
    alternative hypothesis: true mean is less than 50
    90 percent confidence interval:
         -Inf 47.62926
    sample estimates:
    mean of x
     41.55882

If you are performing a one-sided hypothesis test, you MUST re-do the t.test() command to obtain both bounds of the confidence interval, using the code to construct a confidence interval only:

    t.test(zzz$yyy, conf.level = ccc)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
Replace ccc with the desired level of confidence as a proportion.
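The commands described in the assignment can be strung together roughly as follows. This is only a sketch under stated assumptions: the data frame name `food` and the object names `ci`, `two_sided`, and `one_sided` are hypothetical, and the data frame holds just the first ten values from the listing so that the code runs on its own. It is not the full data set and does not answer the assignment questions.

```r
# Hypothetical data frame: only the first ten listed values, used to make
# the sketch runnable. The real assignment uses the full food data set.
food <- data.frame(amount = c(200, 10, 150, 115, 35, 200, 231, 150, 100, 120))

# Summary statistics without the mosaic package
summary(food$amount)   # min, quartiles, median, mean, max
length(food$amount)    # sample size
sd(food$amount)        # standard deviation

# Two-sided confidence interval only (conf.level defaults to 0.95)
ci <- t.test(food$amount, conf.level = 0.95)
ci$conf.int            # lower and upper bounds

# Two-sided test of H0: mu = 100 against HA: mu != 100
two_sided <- t.test(food$amount, alternative = "two.sided", mu = 100)
two_sided$statistic    # t statistic
two_sided$parameter    # degrees of freedom (n - 1)
two_sided$p.value      # p-value

# One-sided test: the reported interval has -Inf as its lower bound,
# so a two-sided t.test() call is still needed for a usable interval.
one_sided <- t.test(food$amount, alternative = "less", mu = 100,
                    conf.level = 0.90)
one_sided$conf.int
```

Storing the result of t.test() in an object, rather than only printing it, makes it possible to pull out the interval, t statistic, degrees of freedom, and p-value by name instead of reading them off the printed output.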


