Question
Sample data (variable: amount):

200 10 150 115 35 200 231 150 100 120 90 100 85 250 60 90 90 50 100 175
100 100 60 60 60 85 150 105.25 100 120 50 150.75 30 259.75 120 117.99
300 100 110 60 150 150 120 60 450 100 150 100 150 150 35 60
85 75 75 50 200 752416 45.5 125 68 120 50 150 50 150 150 150 175 100 80
112.96 376 100 120 50 100 150 150 150 200 150 100 200 200 300 300 185
250 100 200 71.25 150 110 40.87 65 300 80

Exploring the Sample Data

One reason for exploring the sample data is to determine if it is appropriate to use the t-methods to perform inference. There are two components to exploring the sample data:
1) obtain the summary (or "descriptive") statistics
2) obtain a graphical display

Let's start with the summary statistics.

Note: the summary() command gives all the information from the favstats() command except the sample size and the standard deviation. These two are obtained by using the length() command and the sd() command.

4. (2 points) Provide a table of summary statistics in your document for Question 4. Note: do not copy and paste the table obtained in R. Rather, create your own table of summary statistics similar to the one provided below (don't forget to include units!):

                   Sample size   Minimum   1st quartile   Mean   Median   3rd quartile   Maximum   Standard deviation
  (variable name)

Summary statistics using the MOSAIC package

Obtain relevant summary statistics using the favstats() command in the MOSAIC package. See the Introductory R Tutorial for information on installing the MOSAIC package on your computer if you do not already have it installed. (Information is also given in the "R Tutorial Assignment 1 R script".)

Note: the MOSAIC package is not compatible with some computers/laptops. If you are unable to install the MOSAIC package on your device, do not worry. Alternate code will always be given so that you will be able to obtain the output necessary to answer questions on the assignment.
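If MOSAIC is not available, the base-R commands named in the note above (summary(), length(), sd()) cover the same ground. The following is a minimal sketch, not the assignment's own code: the data frame name `food`, the column name `amount`, and the helper `summary_table()` are assumptions made for illustration, and only the first ten values of the listing are used, so the numbers it prints are not the answer to Question 4.

```r
# Illustrative subset only: the first ten values from the data listing.
# The names `food` and `amount` are assumptions, not given by the assignment.
food <- data.frame(amount = c(200, 10, 150, 115, 35, 200, 231, 150, 100, 120))

# Hypothetical helper: collects the same quantities favstats() reports,
# using only base R (quantile() with its default method, as summary() does).
summary_table <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(n = length(x), min = min(x), Q1 = q[1], mean = mean(x),
    median = q[2], Q3 = q[3], max = max(x), sd = sd(x))
}

stats <- summary_table(food$amount)
print(round(stats, 2))
```

Collecting everything in one named vector makes it easier to copy the values into a hand-built table with units.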
Once installed, the next step is to load the MOSAIC package for this session (remember, the MOSAIC package must be loaded each time you open R if you plan on using the package):

    require(mosaic)

The general code for the favstats() command for a single quantitative variable is

    favstats(zzz$yyy)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.

Alternate code if you do not have the MOSAIC package:

    summary(zzz$yyy)
    length(zzz$yyy)
    sd(zzz$yyy)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.

Graphical Display: the histogram

The general code for a histogram is as follows:

    hist(zzz$yyy, col = " ", main = " ", xlab = " ")

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
Give the bars in the histogram a color by typing the name of a color between the quotes in the col = argument.
Give the histogram a title by typing a title between the quotes in the main = argument.
Label the x-axis of the histogram by typing the label between the quotes in the xlab = argument.

5. (1 point) Copy and paste your properly-labeled histogram into your document for Question 5. You do not need to comment about your histogram at this point.

Graphical Display: the normal probability plot

You will be asked to discuss the shape of the sample data in a question below. In some situations (for example, small sample sizes), it may be hard to assess if the sample data are approximately normal. Another graph that can better assess if the sample data are normally distributed is the normal probability plot.
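Before moving on to the normal probability plot code, here is one way the histogram template above might be filled in. This is a sketch under stated assumptions: `food` and `amount` are assumed names, the color, title, and axis label are arbitrary choices, and the data are only the first ten values of the listing, so the picture is not the one Question 5 asks for.

```r
# Illustrative subset only: the first ten values from the data listing.
# The names `food` and `amount` are assumptions, not given by the assignment.
food <- data.frame(amount = c(200, 10, 150, 115, 35, 200, 231, 150, 100, 120))

# Fill in the three quoted arguments of the template:
# col = bar color, main = plot title, xlab = x-axis label (with units).
h <- hist(food$amount,
          col  = "steelblue",
          main = "Weekly Food Spending (illustrative subset)",
          xlab = "Amount spent on food per week (dollars)")

# hist() also returns the bin counts it drew, which is handy for a quick check.
print(h$counts)
```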
The general code to obtain a normal probability plot is as follows:

    qqnorm(zzz$yyy, pch = , main = "Normal Probability Plot")

Then add a reference line to the plot:

    qqline(zzz$yyy)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
If desired, give the graph a different plotting character than the default (which is an open circle) by putting a number next to pch = . For example, pch = 19 will put filled-in circles in the plot.
Title the plot "Normal Probability Plot" as shown in the main = argument above.

The qqline() command adds a reference line to help us determine if the data are normally distributed - if the points fall close to the line, the data are approximately normally distributed.

6. (1 point) Copy and paste your normal probability plot into your document for Question 6.

7. (2 points) Based on the summary statistics and either the histogram or normal probability plot, do you feel the t-methods can be used for inference? Briefly explain why or why not. (Note: recall that the t-methods should only be used if the distribution of sample means is approximately normal. Therefore, the answer to this question should focus on whether you believe the distribution of sample means will be approximately normal and why or why not. Make sure to specify what graph and what feature of that graph you are using to support your answer!)

Regardless of how you answered question 7, use the t-methods to construct the confidence interval:

    t.test(zzz$yyy, conf.level = ccc)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
Replace ccc with the desired level of confidence as a proportion (i.e. 0.95 for a 95% confidence level). By default, the confidence level is 0.95.

Note: even though you are only being asked for a confidence interval, the t.test() command is used. The top part of the output will provide information for a hypothesis test - ignore that part if all you want is a confidence interval.
About halfway down the output, you will find the bounds of the confidence interval. As an illustration, here is the code and output for a non-related problem. The problem involves a random sample of 34 students - each was asked the number of minutes per day spent exercising.

    > t.test(exercise$time, conf.level = 0.95)

            One Sample t-test

    data:  exercise$time
    t = 8.9529, df = 33, p-value = 2.394e-10             <- Ignore for confidence interval
    alternative hypothesis: true mean is not equal to 0
    95 percent confidence interval:
     32.11473 51.00292                                   <- This is what to look for!!
    sample estimates:
    mean of x
     41.55882

Now, try it for the food data set. Answer the next two questions based on the food data set (and not the output given above!).

8. (4 points) In proper syntax, report the 95% confidence interval for the mean amount spent on food per week for college students. Then interpret the confidence interval in the context of the problem.

Suppose Adam's mother wondered if saving $100 a week was appropriate to cover Adam's expenses for a year. Based on the sample, is there evidence to indicate that the average amount college students spend on food each week is different from $100?

9. (3 points) State the null and alternative hypotheses in notation. Define the notation used in the context of the problem. (That is, define the parameter used in the notation in the context of the problem.) The last sentence in the paragraph above will be helpful in determining the alternative hypothesis.

10. (2 points) Using only the confidence interval constructed in #8, what decision would you make and at what significance level? (i.e. "reject the null hypothesis" or "fail to reject the null hypothesis.") Explain why you are making this decision. Again, your support should only reference the confidence interval (i.e. no p-value here!!).

To determine the p-value using the t-methods, the same code is used as above.
We'll add two additional arguments and remove the conf.level argument (although it's okay to leave conf.level = 0.95 in):
  the direction of the alternative hypothesis
  the value of the mean if the null hypothesis is true.

Here is the code for a non-related problem where
  H0: μ = 50 minutes/day
  HA: μ ≠ 50 minutes/day

    > t.test(exercise$time, alternative = "two.sided", mu = 50)

            One Sample t-test

    data:  exercise$time
    t = -1.8185, df = 33, p-value = 0.07808
    alternative hypothesis: true mean is not equal to 50
    95 percent confidence interval:
     32.11473 51.00292
    sample estimates:
    mean of x
     41.55882

The information for the hypothesis test (t-statistic, degrees of freedom, p-value) is given in the line of the output beginning with t = (boxed in the original handout).

Now, return to the food data set and use the above code to determine the p-value from the t-methods. Using the output from the food data set, answer the following questions. (Again, do not use the output given above as that was for illustration purposes only and not related to the food data set.)

11. (2 points) Report the value of the t-statistic with degrees of freedom.

12. (3 points) Based on the p-value, state a conclusion in the context of the problem. (Report the p-value in your conclusion. Remember, a conclusion is written in terms of evidence to say the alternative hypothesis is true.)

If you are performing a one-sided hypothesis test, you MUST re-do the t.test command to obtain both bounds of the confidence interval using the code to construct a confidence interval only:

    t.test(zzz$yyy, conf.level = ccc)

Replace zzz with the name of the data set.
Replace yyy with the name of the variable.
Replace ccc with the desired level of confidence as a proportion.

One final note. Suppose you were performing a one-sided test. The confidence interval reported in the output will only provide one bound. For example, in the non-related problem, suppose H0: μ = 50 minutes/day HA:
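The two t.test() calls described above (one for the confidence interval, one for the two-sided test against $100) can be sketched as follows. This is a hedged illustration rather than a worked answer: `food` and `amount` are assumed names, and the data are only the first ten values of the listing, so none of the printed numbers are the answers to Questions 8 through 12. The by-hand lines are included only to show where the interval and the t-statistic come from.

```r
# Illustrative subset only: the first ten values from the data listing.
# The names `food` and `amount` are assumptions, not given by the assignment.
food <- data.frame(amount = c(200, 10, 150, 115, 35, 200, 231, 150, 100, 120))

# Confidence interval only (the test lines of the output are ignored).
ci_fit <- t.test(food$amount, conf.level = 0.95)
ci <- as.numeric(ci_fit$conf.int)

# Two-sided test of H0: mu = 100 against HA: mu != 100.
test_fit <- t.test(food$amount, alternative = "two.sided", mu = 100)
t_stat  <- unname(test_fit$statistic)   # t-statistic
df      <- unname(test_fit$parameter)   # degrees of freedom (n - 1)
p_value <- test_fit$p.value

# The same quantities computed by hand from the one-sample t formulas.
n  <- length(food$amount)
se <- sd(food$amount) / sqrt(n)
manual_ci <- mean(food$amount) + c(-1, 1) * qt(0.975, df = n - 1) * se
manual_t  <- (mean(food$amount) - 100) / se

# A 95% interval and a two-sided test at the 5% level always agree:
# 100 lies outside the interval exactly when the p-value is below 0.05.
outside_ci     <- 100 < ci[1] || 100 > ci[2]
reject_at_5pct <- p_value < 0.05

print(ci)
print(c(t = t_stat, df = df, p = p_value))
print(c(outside_ci = outside_ci, reject_at_5pct = reject_at_5pct))
```

Pulling the pieces out of the object returned by t.test() (conf.int, statistic, parameter, p.value) avoids retyping numbers from the printed output.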