1. [10 marks] Researchers developed a linear regression model to predict the fuel consumption (mpg; in miles per gallon) from the weight (wt; in 1,000 pounds) of automobiles designed in the year 2018. The results below were obtained from a statistical analysis of data from a random sample of 30 cars. Use this summary information to answer the following questions.

mpg = miles per (US) gallon
wt = weight of a car (in 1,000 lbs)

Dependent variable: mpg
Independent variable: wt
Linear model: Y = a + b*X

Parameter    Least Squares Estimate    Standard Error    T Statistic    P-Value
Intercept    37.2851                   1.87763           19.8576        0.0000
Slope        -5.34447                  0.559101          -9.55904       0.0000

Correlation Coefficient = -0.867659
Standard Error of Est. = 3.04588

[Figure: Plot of Fitted Model, mpg = 37.2851 - 5.34447*wt]

a) [1 mark] Identify the elements (subjects) of interest in this study.
b) [2 marks] For the variable(s) described in this study, specify the type of data and the scale of measurement (for each of the variables).
c) [3 marks] Is there a significant linear relationship between the fuel consumption and the weight of the car? Use an appropriate decision point and show all necessary steps to answer this question.
d) [1 mark] Write down the equation of the estimated regression line.
e) [2 marks] What is the value of the slope of the regression line? Explain what the number means in terms of the variables, fuel consumption and weight of a car, paying special attention to the units of measurement for these data.
f) [1 mark] Provide an estimate of the fuel consumption when the weight of the car is 3,000 pounds.

2. [6 marks] For each of the following situations, name (i) a graphical display and (ii) a statistic (for example, sample mean, sample proportion, sample correlation, Chi-square statistic, etc.) which are the most appropriate to summarize or to describe the data. No explanations and no drawing of graphs are needed.
a) [2 marks] An employer wants to investigate the relationship between his employees' daily commute time to work (in minutes) and their annual salary (in dollars).
Graphical display:
Statistic:
b) [2 marks] In order to estimate the average household income in Vancouver, a city councillor will survey 300 households. Each household will be asked to provide the household's total annual income (in dollars) in 2018.
Graphical display:
Statistic:
c) [2 marks] A marketer wants to investigate if consumers from different geographic regions (West, East, South, or North) prefer different types of automobile (Sedan, Coupe, or SUV).
Graphical display:
Statistic:

3. [6 marks] The Biology Department at a university plans to recruit a new faculty member. Data collected by a different university on the 400 possible candidates were available. The Biology Department is debating whether to put a requirement of 10 years of teaching experience in the job advertisement. The available data on the candidates are shown below:

[Table: number of candidates by gender and Work Experience (Less than 10 years / 10 or more years); cell counts not legible]

a) [2 marks] What percentage of candidates is male or has 10 or more years of experience?
b) [1 mark] We randomly select a male candidate. What is the probability that the selected male candidate has 10 or more years of experience?
c) [3 marks] A candidate is randomly selected. Are the events A = {a candidate is male} and B = {a candidate has 10 or more years of experience} independent? Support your answer by using probability calculations.

4. [6 marks] Answer each of the following questions. To get full marks, you must show sufficient justification for your answer.
a) [2 marks] When the distribution of data values is highly skewed, which measure of variability/spread is better: the interquartile range or the standard deviation? Briefly explain your answer.
b) [2 marks] When surveying shoppers at a supermarket, is it appropriate to use a simple random sampling method to select a sample? Briefly explain your answer.
c) [2 marks] Suppose that a 90% confidence interval for the population mean grade (in percentage) of all Stat 4600 students at Langara College is (65.6, 70.2). Explain why the following definition of this interval is wrong: "90% of all Langara students have grades that are between 65.6 and 70.2".

5. [7 marks] A sample of 1081 hospitalized patients in China, with laboratory-confirmed Covid-19, was obtained. Fever was present in 473 of these patients when they were first admitted to the hospital (Source: New England Journal of Medicine, February 28, 2020).
a) [4 marks] Calculate a 90% confidence interval for the population proportion (p) of Covid-19 patients that have a fever when they are admitted to the hospital.
b) [1 mark] Provide an interpretation (in context) of your interval estimate in part (a).
c) [2 marks] State the assumptions that are required to assure the validity of your interval estimate in part (a). Are they satisfied in this situation?

6. [4 marks] Assume that the average time for a college student to learn a certain computer software package is 20 hours with a standard deviation of 5 hours. Assume that the learning time for a college student is normally distributed.
a) [2 marks] Suppose we randomly select one college student. What is the probability that the selected college student spends more than 29 hours learning the computer software package?
b) [2 marks] In learning the computer software package, eighty-five percent (i.e. 85%) of college students spend more than how many hours?

7. [6 marks] An employer was interested in the relationship between his employees' age and their annual salary. He divided all his employees into three age groups and then selected a random sample of employees from each of the three age groups: "Under 30", "Between 30 and 50", and "Over 50" in years. The sample of 100 employees provided the following summary data.
                           ANNUAL SALARY
AGE GROUP                  Under $60,000    Over $60,000
Under 30 years             __               __
Between 30 and 50 years    __               __
Over 50 years              __               __

a) [1 mark] What graph would you use to represent the data in the above crosstabulation? Do NOT draw the graph.
b) [2 marks] State the null and alternative hypotheses in this problem.
c) [2 marks] Obtain the expected count (i.e. expected frequency) for the cell "Under 50 and Under $60,000" and explain what it represents in the context of the question.
d) [1 mark] The employer calculated the Chi-square statistic by hand, and found a Chi-square statistic of -2.38. He knew immediately that this answer was not correct. Explain (in one sentence) what is wrong with this Chi-square statistic.

8. [7 marks] Five years ago the average university student owed $19,000 in student-loan debt at the time of graduation. With all the cuts in funding, it is suspected that this amount has gone up. A survey of 45 recent university graduates revealed an average student-loan debt of $25,000. Assume that the population standard deviation is $4,000.
a) [3 marks] Define the parameter of interest (in words), and then formulate the null and alternative hypotheses.
b) [4 marks] Find the p-value and make a conclusion in the context of the question. Use a level of significance of 5% (i.e. α = 0.05).

9. [6 marks] Consider a casino game, "Quick Draw". In this game, a player pays $10 to play. The player picks one card from a standard pack of 52 cards (i.e. there are four A's and four K's in a standard pack of 52 cards). If the player gets an Ace, they win $55 (i.e. the profit is $45); if the player selects a King, they win $35 (i.e. the profit is $25). Otherwise, the player wins nothing and also loses the bet of $10. Let the random variable X represent the player's profit on a single play of "Quick Draw".
a) [3 marks] Construct the probability distribution for the random variable X.
b) [3 marks] Calculate the expected value of the random variable X and provide an interpretation of its value, in context of this