
Question

You may not upload this file to online homework help sites. Please see our course syllabus for honor code rules. Thank you.

Your solutions document should include the following items. Points will be deducted if the following are not included.

1. Type your Name and STAT 250 with your correct section number (e.g., STAT 250-100) right justified, and then Data Analysis Assignment #3 centered on the top of page 1 below your name, to begin your solutions document.
2. Number your pages across your entire solutions document.
3. Your solutions document should include the ANSWERS ONLY with each answer labeled by its corresponding number and subpart. Keep the answers in order.
4. Generate all requested graphs and tables using StatKey or Rguroo, where stated.
5. Upload your solutions document onto Blackboard as a pdf file using the link provided by your instructor. It is your responsibility to upload a readable file.
6. You may not work with other individuals on this assignment. It is an honor code violation if you do.

Please note: all StatKey or Rguroo instructions provided in the parts of the problems will be presented in italics.

Elements of good technical writing: Use complete and coherent sentences to answer the questions. Graphs must be appropriately titled and should refer to the context of the question. Graphical displays must include labels with units if appropriate for each axis. Units should always be included when referring to numerical values. When making a comparison you must use comparative language, such as "greater than", "less than" ...

Investigation 1: Appropriateness of Inference: Price of a Haircut

For the following scenario, answer the questions below. Please note, do not conduct inference in this problem; just answer each question. A random sample of 24 Mason students was collected and each student was asked, among other things, the total cost of their last hair service (including cuts, styling, etc.).
The researcher does not have any information about the population from which the sample was collected. The data set is called Hair-Price.

a) If we attempt to conduct statistical inference using the collected sample, what is the parameter of interest? Use the correct symbol and describe the parameter in context in one sentence.
b) Check the specific conditions necessary to consider conducting inference using theory-based methods using the t-distribution. There are three to consider: (1) Was a random sample collected; (2) Is the population where the sample comes from normal; and (3) Is the sample size greater than or equal to 30? Answer each of these questions in one sentence.
c) What could be checked using the sample data if our sample size is less than 30? Answer this question in one sentence.
d) Depending on your answer to part (a), construct one or two frequency histograms in Rguroo. Remember to properly title and label the graph(s). Copy and paste this graph (or these graphs) into your document.
e) Describe the shape of the histogram(s) in one sentence.
f) Depending on your answer to part (a), construct one or two horizontal boxplots in Rguroo. Remember to properly title and label the graph(s). Copy and paste this graph (or these graphs) into your document.
g) Does the boxplot (or do the boxplots) show any outliers? Answer this question in one sentence and identify any outliers if they are present.
h) Considering your answers to parts (e) and (g), is theory-based inference appropriate in this case? If you respond "yes," provide a reason for your response. If you respond "no," state the reason why not and present another possibility if the researcher still wanted to conduct statistical inference. Use one or two complete sentences in your response.

Investigation 2: Earnings among Voters

A political scientist wondered if there is a significant difference between the proportion of Democrats and Republicans earning over $100,000.
To obtain the data, she used the National Election Pool. The NEP is a consortium of major news networks (ABC, CBS, CNN, and NBC) that pools together resources to gather voting and exit poll data from a random sample of voters. On Election Day, November 8, 2022, exit poll data showed that among a random sample of 774 voters, 408 were registered as Republican and 366 were registered as Democrat. Of the 408 Republican voters, 216 earn over $100,000 a year. Of the 366 Democrat voters, 162 earn over $100,000 a year. Use α = 0.05.

a) Define the population parameter using context and symbols in one complete sentence.
b) State the hypotheses using the political scientist's claim.
c) Check the specific conditions necessary to consider conducting inference using theory-based (or distribution-based) methods. There are two to consider: (1) Was the data collected randomly from the population; and (2) Are there at least ten successes and failures in each group? Answer each of these questions in one sentence and show that condition (2) is true or false using calculation to obtain the failures (note the successes are given).
d) Calculate and label the two sample proportions separately and round the values to four decimal places. Next, calculate the difference between these sample proportions by subtracting (Republican - Democrat). Type all of these calculations and label each of them.
e) Calculate the pooled proportion estimate needed in the calculation of the standard error of the test statistic. Type this calculation and round to four decimal places.
f) Calculate the test statistic value using your proportions obtained in parts (d) and (e) and type your work. Round your test statistic value to three decimal places.
g) Obtain your p-value using your test statistic calculated in part (f) in StatKey using Theoretical Distributions > Normal. Copy and paste the image of the standard normal distribution and type the value of the p-value below the image.
h) Verify your test statistic and p-value using Rguroo. Go to Analytics > Analysis > Proportion Inference > Two Populations. See the image below to fill in each box correctly. Then, click the Test of Hypothesis tab and choose Large Sample Z under Method. Finally, correctly set your alternative hypothesis and significance level and click Preview. Copy and paste only the output and table displayed under the title "Two Population Proportion Test of Hypothesis."

[Image: Rguroo "Two Population Proportion Inference" dialog with Dataset, Response, Success, and Failure fields]

[...] Analysis > Mean Inference > One & Two Populations. In the Dataset dropdown, select Diet1. In the Variable 1 dropdown, select Fasting. In the Variable 2 dropdown, select Keto. Below the variables, select the fourth tab Population 1-2 and note that the Confidence Interval tab is selected. Then, select t-statistic under Method. Leave the Assumptions box on the right as is and click Preview. Provide either a screenshot or a copy of your output table and state the confidence interval.
d) Make a decision using the confidence interval by checking to see if the null value is captured by the confidence interval. State the decision and explanation in one sentence.
e) Draw a conclusion about the claim using one or two sentences in context of the problem.
f) Verify your decision and conclusion by obtaining a test statistic and p-value in Rguroo. Follow the directions presented in part (c). Rather than staying on the Confidence Interval tab, select the Test of Hypothesis tab. In this tab, keep the significance level at 0.05 and update the alternative hypothesis. Select t-statistic under Method. Leave the Assumptions box on the right as is. Click Preview. Provide either a screenshot or a copy of your output table and verify your decision by comparing your p-value to the significance level in one complete sentence.
g) Provide at least one confounding variable that may have had an effect on this study's results in one sentence.

Investigation 3.2: Diet Comparison (Paired)

The doctor continued the research study after obtaining additional information from the participants. During the 12-week study, each of the participants tracked the amount of exercise they completed each week. Using this additional information, the doctor paired the adult who exercised the least in the fasting group with the adult who exercised the least in the Keto group. She continued to pair the adults until the adult who exercised the most in the fasting group was paired with the adult who exercised the most in the Keto group. The data set is called Diet2. The data set includes columns for the minutes of exercise, the paired data, and the differences. Again, assume the Central Limit Theorem conditions hold.

a) Define the population parameter for this investigation in context in one sentence.
b) State the null and alternative hypothesis to test the claim that a difference exists in the mean BMI reduction between Fasting and Keto. Please consider the new data design.
c) Again, we will obtain a 95% confidence interval to make the hypothesis test decision. In Rguroo, go to Analytics > Analysis > Mean Inference > One & Two Populations. In the Dataset dropdown, select Diet2. In the Variable 1 dropdown, select Fasting. In the Variable 2 dropdown, select Keto. Under the Summary tab, check the box to the left of Paired Data. Next, select the fourth tab Population 1-2 and note that the Confidence Interval tab is selected. Then, select t-statistic under Method. Leave the Assumptions box on the right as is and click Preview. Provide either a screenshot or a copy of your output table and state the confidence interval.
d) Construct the 95% confidence interval again by treating the column of differences as if it were data from one sample. Go to Analytics > Analysis > Mean Inference > One Population.
In the Dataset dropdown, select Diet2. In the Variable dropdown, select Differences. Then, select t-statistic under Method. Leave the Assumptions box on the right as is and click Preview. Provide either a screenshot or a copy of your output table and state the confidence interval.
e) Make a decision using the confidence interval by checking to see if the null value is captured by the confidence interval. State the decision and explanation in one sentence.
f) Draw a conclusion about the claim using one or two sentences in context of the problem.
g) Comment on the differences between the standard errors calculated in 3.1(c) and 3.2(c) and the decisions made in each test. Why do you believe these standard errors differ?
h) Can we generalize these results to a larger group (i.e., a population)? Answer this question in one sentence and please provide a reason for your answer.
i) Can we determine if a cause and effect relationship exists between the variables? Answer this question in one sentence and please provide a reason for your answer.

Investigation 4: Predicting a 5K Race Time

A running coach was interested in predicting a runner's 5K (3.1 miles) finishing time (in minutes). Two variables the coach considered were the amount of oxygen runners could utilize during training (known as VO2 max (ml/kg/min)) and a runner's age (in years). The data were collected from a random sample of Garmin watch users who ran a popular 5K (e.g., a Thanksgiving Turkey Trot) so each runner ran the same course. A runner's 5K finishing time (Time), their VO2 Max (measured on their Garmin watch), and their Age were collected. The data set is called RunningTime. We will use Rguroo and StatKey for this investigation.

a) Make two separate scatterplots where each scatterplot will present one of the explanatory variables graphed with the response variable Time. Go to Plots > Create Plot > Scatterplot. In the Dataset dropdown, select RunningTime.
Change the predictor (explanatory) and response variables accordingly. Properly title and label your graph and axes. Copy and paste your scatterplots into your solutions document.
b) Interpret the scatterplot of VO2 Max and Time using trend, strength, and shape (form) in one complete sentence.
c) Interpret the scatterplot of Age and Time using trend, strength, and shape (form) in one complete sentence.
d) Provide both correlation coefficients. Go to Analytics > Analysis > Linear Regression > Simple Regression. In the Dataset dropdown, select RunningTime. Change the predictor (explanatory) and response variables accordingly. Click Preview. You will need to complete this twice, once for each explanatory variable. The correlation is presented as "Pearson Correlation Coefficient (r)." Please state both correlation values in your solutions.
e) Use StatKey to create a bootstrap distribution of correlations. From the main page, select CI for Slope, Correlation. Edit the data by copying only two columns (explanatory and response) into the box (repeat for each explanatory variable). Then, generate 10,000 samples and use each standard error to produce 2SE confidence intervals to estimate the population correlation. Present each confidence interval and comment on whether 0 is captured in each interval.
f) Which of the two explanatory variables would be the better predictor of Time? Base your answer on the scatterplots, the correlation coefficients, and their confidence intervals. State your answer in one or two complete sentences including an explanation for your variable choice.
g) To find the fitted line for the explanatory variable VO2 Max and the response variable Time, run a simple linear regression analysis. You may use the same output as in part (d) but look at "Equation of Least Squares Line" to help you state the fitted line equation in your solutions document.
h) Produce the fitted line plot for VO2 Max and Time and copy it into your solutions document.
Scroll down the page in your output from part (d) and copy and paste the graph labelled "Response Versus Numerical Predictor."
i) Interpret the slope of the regression line for VO2 Max and Time in context of the problem.
j) Would it be meaningful to interpret the y-intercept for VO2 Max and Time? Explain why or why not in one sentence.
k) Provide r² for VO2 Max and Time and explain what this value means in context of the problem. Again, refer to the output from part (d), but look at "Coefficient of Determination (R-Squared)."
l) Test whether the slope is significant using theory-based inference (assuming all conditions hold). Go to Analytics > Analysis > Linear Regression > Simple Regression. In the Dataset dropdown, select RunningTime. Change the predictor (explanatory) and response variables accordingly. Under the tab "Test of Association," check Slope under Alternative Hypothesis and leave Not Zero selected. Under "Methods" choose Theoretical t-statistic. Keep the Significance Level at 0.05. State the hypotheses, show work to obtain the t-test statistic using the output, and use the p-value provided in the output to make your decision. Finally, draw a conclusion in one complete sentence.
m) If a randomly selected runner had a VO2 Max of 53, predict their 5K finishing time. Use the regression equation from part (g) and show all work and calculations.
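
The hand calculations described in Investigation 2, parts (c) through (g), can be cross-checked with a short script. What follows is a minimal sketch, not the StatKey or Rguroo workflow the assignment requires. It assumes a two-sided alternative (read from the phrase "significant difference") and a pooled standard error, and it uses only the Python standard library. The function name `two_proportion_z_test` and the dictionary keys are hypothetical names chosen for illustration.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test with a two-sided alternative.

    x1, x2 are the success counts and n1, n2 the group sizes.
    Assumption: H0 is p1 - p2 = 0, so the pooled proportion is used
    in the standard error.
    """
    # Success/failure condition: at least 10 of each in both groups.
    counts = [x1, n1 - x1, x2, n2 - x2]
    conditions_met = all(c >= 10 for c in counts)

    p1 = x1 / n1                      # sample proportion, group 1
    p2 = x2 / n2                      # sample proportion, group 2
    diff = p1 - p2                    # group 1 minus group 2
    pooled = (x1 + x2) / (n1 + n2)    # pooled proportion estimate

    # Standard error of the difference under H0.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se

    # Two-sided p-value from the standard normal:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "conditions_met": conditions_met,
        "p1": p1,
        "p2": p2,
        "diff": diff,
        "pooled": pooled,
        "z": z,
        "p_value": p_value,
    }


# Counts from the scenario: Republicans first, then Democrats.
result = two_proportion_z_test(216, 408, 162, 366)
for name, value in result.items():
    if isinstance(value, bool):
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.4f}")
```

Comparing `result["p_value"]` with α = 0.05 mirrors the decision step. The script rounds only when printing, so a value typed from hand calculations that rounded the intermediate proportions to four decimal places may differ slightly in the last digit.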
