Data
https://drive.google.com/file/d/1sz32tfzRPM1DgkDpm57ZL2pwVHYrG0oN/view?usp=sharing
# Load the EPA data set
# call the dataset cardata
cardata = read.csv(file.choose(), header = TRUE)
View(cardata)
# Create a side by side boxplot
boxplot(FiveYearEstimatedCost~Guzzler, data = cardata,
main = "add title",
xlab = "add horizontal axis name",
col = c("red", "green"), ### add different color names
horizontal = TRUE)
# Calculate the mean, sd and sample size of five year fuel cost for each group (Guzzler and Non-Guzzler)
# the command aggregate() will perform the given function on the specified groups
# aggregate(response~treatment, data = datasetname, function)
# Make a table of values with the results.
aggregate(FiveYearEstimatedCost~Guzzler, data = cardata, mean)
aggregate(FiveYearEstimatedCost~Guzzler, data = cardata, sd)
aggregate(FiveYearEstimatedCost~Guzzler, data = cardata, length)
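The three aggregate() calls above can also be combined into a single table. The following is only a sketch: `toydata` is a made-up stand-in for `cardata` (its values are invented so the block runs on its own), and `summary_stats` and `summary_table` are illustrative names, not part of the assignment.

```r
# Toy stand-in for cardata: values are invented for illustration only.
toydata = data.frame(
  Guzzler = rep(c("Guzzler", "Non-Guzzler"), each = 5),
  FiveYearEstimatedCost = c(8000, 9500, 7250, 10000, 8750,
                            1500, 3000, 500, 4250, 2000)
)

# One aggregate() call that returns mean, sd and sample size per group.
summary_stats = aggregate(FiveYearEstimatedCost~Guzzler, data = toydata,
                          FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

# aggregate() stores the three values in a matrix column; flatten it
# into ordinary columns and round to the nearest whole number.
summary_table = do.call(data.frame, summary_stats)
summary_table[-1] = round(summary_table[-1])
print(summary_table)
```

Replacing `toydata` with `cardata` would give the same layout for the real data.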
# What type of vehicles are Guzzlers? Find out by subsetting the dataset to only include Guzzlers.
GuzzlersOnly = subset(cardata, Guzzler == "Guzzler")
View(GuzzlersOnly)
# Perform a two sample t test
# Use the same response~treatment format as the functions above.
# t.test(response~treatment, data = datasetname, conf.level = enterconfidencelevel, alternative = "two.sided")
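One way the template above might be filled in is sketched here. It assumes a 95% confidence level and uses a small made-up data frame, `toydata`, in place of `cardata` so the block runs without the CSV file; the values are invented and `result` is just an illustrative name.

```r
# Toy stand-in for cardata: values are invented for illustration only.
toydata = data.frame(
  Guzzler = rep(c("Guzzler", "Non-Guzzler"), each = 5),
  FiveYearEstimatedCost = c(8000, 9500, 7250, 10000, 8750,
                            1500, 3000, 500, 4250, 2000)
)

# Two sample t test. By default t.test() does not pool the variances
# (var.equal = FALSE), so it reports Satterthwaite degrees of freedom.
result = t.test(FiveYearEstimatedCost~Guzzler, data = toydata,
                conf.level = 0.95, alternative = "two.sided")
print(result)

# Individual pieces of the output
result$statistic   # t statistic
result$parameter   # degrees of freedom
result$p.value     # two-sided p-value
result$conf.int    # confidence interval for mu1 - mu2
```

With the real data, `data = toydata` would become `data = cardata`. The groups are ordered alphabetically, so the difference is reported as Guzzler minus Non-Guzzler.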
This is the output for the data set.

> # Calculate the mean, sd and sample sizes for five year fuel cost split among Guzzler and no Guzzler
> # the command aggregate() will perform the given function on the specified groups
> # aggregate(response~treatment, data = datasetname, function)
> # Make a table of values with the results.
>
> aggregate(FiveYearEstimatedCost~Guzzler, data = cardata, mean)
      Guzzler FiveYearEstimatedCost
1     Guzzler              8666.667
2 Non-Guzzler              2304.328
> aggregate(FiveYearEstimatedCost~Guzzler, data = cardata, sd)
      Guzzler FiveYearEstimatedCost
1     Guzzler              1437.391
2 Non-Guzzler              2590.439
> aggregate(FiveYearEstimatedCost~Guzzler, data = cardata, length)
      Guzzler FiveYearEstimatedCost
1     Guzzler                    30
2 Non-Guzzler                   543

2. Describe the distribution in context. Is there visual evidence the average five-year fuel cost is different between the Guzzler and Non-Guzzler vehicles? Explain.

3. Provide an organized table of the summary statistics. Include the sample means, standard deviations and sample sizes for each group. Round to nearest whole number. Fill this table.

               Mean    Standard Deviation    Sample Size
Guzzler
Non-Guzzler

What type of vehicles are guzzlers?

Hypotheses:
The null and alternative hypotheses are as follows, where we assume Population 1 vehicles are guzzlers and Population 2 are non-guzzlers.

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0

Checking Conditions:
The sampling method is assumed to be an attempt at a census rather than a random sample. This means that the data is not random; it is most likely representative of guzzler and non-guzzler vehicles sold in the US. The sample sizes are large enough so that the sampling distributions for x̄1 and x̄2 are both approximately normal according to the central limit theorem. Lastly, the populations are independent. There is no repeated measurement nor is there any dependence between the two groups. Overall, the conditions are somewhat met. The sampling method is unknown so we should consider this in our conclusions.

Calculate:
a. (2 points) From the summary statistics, calculate the test statistic "by hand". Show work.
b. (1 point) State the degrees of freedom. You choose conservative or Satterthwaite. Either is okay.
c. (1 point) Obtain a p-value based on your calculated test statistic and degrees of freedom from a t table. Show work.
d. (2 points) From the summary statistics, calculate the 95% Confidence Interval "by hand". Show work.
e. (2 points) Obtain a p-value from the t test and confidence interval using R. Paste the output. Are your answers different? Why, yes/no?

Conclude:
f. From the R output, write a four-part conclusion describing the results.
   - (1 point) Provide a statement in terms of the alternative hypothesis.
   - (1 point) State whether (or not) to reject the null.
   - (3 points) Give in context an interpretation of the point and interval estimate.
     - Make sure to provide a direction to your interval, for example, one group had a smaller (or larger) mean than the other; include this relationship in your point and interval estimate.
     - Include any other information you might feel to be relevant.
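
The "by hand" steps in parts a to d can be checked numerically. The sketch below plugs in the summary statistics as they appear in the aggregate() output above (unrounded; the assignment asks for rounded values, so hand answers may differ slightly). It assumes the unpooled two sample t statistic and uses the conservative degrees of freedom for the p-value and interval. All variable names are illustrative.

```r
# Summary statistics as they appear in the aggregate() output
xbar1 = 8666.667; s1 = 1437.391; n1 = 30    # Population 1: Guzzler
xbar2 = 2304.328; s2 = 2590.439; n2 = 543   # Population 2: Non-Guzzler

# Standard error of xbar1 - xbar2 (variances not pooled)
v1 = s1^2 / n1
v2 = s2^2 / n2
se = sqrt(v1 + v2)

# a. Test statistic for H0: mu1 - mu2 = 0
t_stat = (xbar1 - xbar2) / se

# b. Degrees of freedom: conservative and Satterthwaite
df_conservative = min(n1, n2) - 1
df_satterthwaite = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# c. Two-sided p-value using the conservative df
p_value = 2 * pt(-abs(t_stat), df = df_conservative)

# d. 95% confidence interval for mu1 - mu2 using the conservative df
t_crit = qt(0.975, df = df_conservative)
ci = (xbar1 - xbar2) + c(-1, 1) * t_crit * se

print(c(t = t_stat, se = se, df_cons = df_conservative, df_satt = df_satterthwaite))
print(p_value)
print(ci)
```

Swapping `df_conservative` for `df_satterthwaite` in parts c and d should bring the p-value and interval in line with what t.test() reports, since t.test() uses the Satterthwaite approximation by default.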