Question 1. [12.5 pts, 5 parts] Please download the dataset "economic dashboard" in the desired format and answer the following questions using your preferred software.

Statistics Canada releases monthly data to measure Canada's economic health. The numbers in the dataset are from Tables 36-10-0434-01, 18-10-0004-01, 14-10-0289-01, and 20-10-0008-01. The data show the monthly Gross Domestic Product (GDP) in million CAD (CAD = Canadian dollars), the Consumer Price Index, actual hours worked at the main job (note 1) in thousands of hours (note 2), and sales in the retail sector in thousand CAD:

Variable name   Definition
Date            Month-year
CPI             Consumer Price Index, monthly
Hours           Actual hours worked at main job, monthly (x 1,000)
GDP             Gross domestic product (GDP) at basic prices, monthly (x 1,000,000 CAD)
Retail          Retail trade sales, monthly (x 1,000 CAD)

Note: figures for GDP and sales in the retail sector are not yet reported for the month of September 2020. These are missing values. Missing values do not appear in your graphs and calculations.

1 Number of hours actually worked by the respondent during the reference week, including paid and unpaid hours.
2 A value of 1 represents 1,000 hours. For example, the value of 621,884 in the first row represents more than 621 million hours.
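The unit convention and the missing-value rule above can be illustrated with a small sketch. It is only one possible way to handle them, written with the Python standard library; the helper names `to_hours` and `drop_missing` are hypothetical, and the three GDP values are the last ones listed in the dataset (July, August, and the unreported September 2020).

```python
import math
import statistics

HOURS_UNIT = 1_000  # the Hours column is reported in thousands of hours


def to_hours(value_in_thousands):
    """Convert a value from the Hours column to plain hours."""
    return value_in_thousands * HOURS_UNIT


def drop_missing(values):
    """Keep only reported values (hypothetical helper; missing = None or NaN)."""
    return [v for v in values if v is not None and not math.isnan(v)]


# First row of the Hours column: 621,884 thousand hours.
first_row_hours = to_hours(621_884)

# Last three GDP entries (M. CAD); September 2020 is not yet reported.
gdp_tail = [1881709.0, 1904249.0, math.nan]
gdp_observed = drop_missing(gdp_tail)

print(first_row_hours)                # hours, not thousands of hours
print(len(gdp_observed))              # number of reported months in this slice
print(statistics.mean(gdp_observed))  # mean over reported months only
```

Dropping the missing month before any calculation keeps it out of both the plots and the summary statistics, which is what the note asks for.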
date     CPI   Hours     GDP         Retail
19-Jan   133   621,884   1950955     50195398
19-Feb   134   618,425   1947760     50626765
19-Mar   135   624,112   1960078     51538062
19-Apr   136   625,992   1966041     51212247
19-May   136   624,240   1972475     51203886
19-Jun   136   627,873   1976566     51028664
19-Jul   137   624,202   1977923     51488286
19-Aug   136   627,826   1978615     51375824
19-Sep   136   626,077   1981101     51496503
19-Oct   136   626,667   1980913     50799759
19-Nov   136   625,557   1981630     51333176
19-Dec   136   625,458   1988279     51681463
20-Jan   136   624,926   1989963     51891454
20-Feb   137   632,366   1995120     52262166
20-Mar   136   536,710   1849437     47028013
20-Apr   135   456,984   1634708     35352146
20-May   136   485,951   1713647     42852368
20-Jun   137   533,462   1825876     52488186
20-Jul   137   561,706   1881709     52996949
20-Aug   137   577,782   1904249     53189033
20-Sep   136   588,655   (missing)   (missing)

Q1.a. [3 points] A student wants to use this dataset to understand the association between CPI and GDP and the association between GDP and hours worked at the main job. Prepare two scatter plots to show these associations (copy them into your solution and label them Figure 1 and Figure 2), and then describe the association between the variables in each plot. (Note: in the marking scheme, points will be deducted for plots with no caption and/or no axis labels.)

Q1.b. [2 points] The graph below shows the association between GDP and sales in the retail sector.

[Figure 3: Scatterplot of GDP vs sales in retail sector. Vertical axis: GDP in M. CAD. Horizontal axis: Sales in retail sector, K CAD.]

Using software, calculate the correlation coefficient between GDP and sales in the retail sector and report it. Then comment on how the association between these two variables is reflected in the correlation coefficient (discuss strength and direction).

Q1.c. [2.5 points] Prepare a histogram and a boxplot for hours of work at the main job. Copy your plots and label them Figure 4 and Figure 5. Then comment on the data distribution. Also, explain whether you see any unusual observations.

Q1.d. [4 points] Using software, complete the following table (note that M. CAD denotes millions of CAD, and K hours denotes thousands of hours):

Table 1 - Summary statistics
                     GDP (M. CAD)   Hours of work at the main job (K hours)
Mean
Median
Standard deviation
IQR

Q1.e. [1 point] Based on Figures 4 and 5, which summary statistics are more suitable to describe the data distribution of hours of work at the main job? Explain why.

Question 2. [6.5 pts, 4 parts] Suppose an analyst at Ottawa Hospital wants to compare the mean per-patient costs of diagnostic imaging (DI) between two campuses: Civic and General. This cost varies from patient to patient because the equipment and the time a patient needs at the hospital vary. The analyst looks at the distribution of per-patient cost at each campus and finds both distributions slightly skewed to the right. The analyst randomly selects the costs for 50 patients from DI at the Civic campus and 30 patients from DI at the General campus.

Q2.a.
[1 point] What do you think the data distribution looks like for the histogram of the mean costs of imaging at Civic campus? Justify your answer.

Q2.b. [2 points] Suppose the analyst's calculations show that the mean cost of DI per patient at Civic campus is 213 CAD with a standard deviation of 25 CAD. What is the probability that the mean cost of DI for the 50 randomly selected patients is more than 205 CAD?

Q2.c. [2.5 points] Suppose the analyst uses the data from 30 patients randomly selected from General campus and sees that the mean DI cost is 220 CAD with a standard error of 28 CAD. What is the probability that the mean cost of DI for patients at General hospital is more than the mean cost of DI at Civic hospital (213 CAD)? For the purposes of answering this question you may assume that 213 CAD is a constant.

Q2.d. [1 point] What assumptions did you make about the data distribution in Q2.c to be able to solve the question?

Question 3. [6 pts, 3 parts] In the recent US presidential election, 49.9% of voters in Pennsylvania voted for Joe Biden (as of Saturday, Nov 14th, 2020; this figure might change once the ballot counting completes).

Q3.a. [2 points] What is the probability that in a group of 1000 randomly selected voters from Pennsylvania, we find at least 500 voters who voted for Joe Biden?

Q3.b. [1 point] What assumptions did you make in order to proceed with the calculations in Q3.a?

Q3.c. [3 points] What is the probability that in a group of 1000 randomly selected voters from Pennsylvania, the proportion of individuals who voted for Joe Biden is between 0.5 and 0.51?
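
For readers working in Python, the kind of probability asked for in Questions 2 and 3 might be set up along the following lines. This is a minimal sketch and not the marking scheme's solution. It assumes that 213 CAD and 25 CAD can be treated as the population mean and standard deviation, that the sample mean is approximately normal, and that voters are sampled independently. The function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def prob_sample_mean_above(threshold, mu, sigma, n):
    """P(sample mean > threshold), assuming the sample mean is
    approximately normal with standard error sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)


def prob_at_least_exact(k_min, n, p):
    """Exact binomial P(X >= k_min), assuming independent draws."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


def prob_at_least_normal(k_min, n, p):
    """Normal approximation to P(X >= k_min) with continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return 1.0 - NormalDist(mean, sd).cdf(k_min - 0.5)


# Numbers taken from Q2.b and Q3.a; treating them as population values
# is an assumption of this sketch.
p_mean = prob_sample_mean_above(205, mu=213, sigma=25, n=50)
p_exact = prob_at_least_exact(500, n=1000, p=0.499)
p_approx = prob_at_least_normal(500, n=1000, p=0.499)

print(round(p_mean, 4), round(p_exact, 4), round(p_approx, 4))
```

Comparing the exact binomial value with the normal approximation is one way to check whether the approximation is reasonable for a sample of this size.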