The government is always interested in how household expenditure differs across regions. According to theoretical and empirical analysis, personal income is treated as the main factor behind this difference. Here, we have data on average personal annual consumption expenditure (Y) and average personal annual income (X) for 32 cities in China in 2002. Our objective is to study the relationship between expenditure and income.

1. Draw a scatter plot of Y (= expenditure) versus X (= income). What can you learn from the plot?
2. Please build a simple linear regression using Python. Report the fitted model, and interpret the coefficient beta_1 of X.
3. Now we are interested in whether X really affects Y. Please test the null hypothesis beta_1 = 0, where beta_1 is the coefficient of X, and report the p-value. What can you conclude from this result at the 0.05 significance level?
4. Please give a 95% confidence interval for the parameter beta_1.
5. Which of the following statements about the confidence interval is valid?
   - If the data collection and model estimation procedure were repeated many times, 95% of the estimated beta_1 values would be within the confidence interval from step 4.
   - If the data collection and model estimation procedure were repeated many times, 95% of the estimated confidence intervals would contain the true beta_1.
   - For 95% of the cities in the sample, the marginal effect of X (= income) on Y (= expenditure) would be within the confidence interval from step 4.
6. What is the value of the R-squared of the regression model? What can you tell from it?

A study was conducted on the relationship between electricity consumption per hour during the peak period and monthly electricity consumption. The electricity.csv dataset contains the hourly consumption during the peak period (Y) and the monthly consumption (X) for 53 families.

1. Draw a scatter plot of Y (= hourly electricity consumption during the peak period) versus X (= monthly electricity consumption). What can you learn from the plot?
2. Please build a simple linear regression using Python. Report the fitted model, and interpret the coefficient beta_1 of X.
3. Now we are interested in whether X is significant in determining Y. Please test the null hypothesis beta_1 = 0, where beta_1 is the coefficient of X, and report the p-value. What can you conclude from this result at the 0.05 significance level?
4. What is the value of the R-squared of the regression model? What can you tell from it?

Please submit your homework on Canvas by the end of next Sunday, Feb. 5. You should submit both an analysis report (as a .doc or .pdf file) and your code (as an .ipynb file). Please let me know if you have any questions.
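The regression steps asked for in both problems (fitted model, test of beta_1 = 0, 95% confidence interval, R-squared) might be sketched as follows. This is only one possible approach, using `scipy.stats.linregress`. The helper names `fit_simple_ols` and `plot_scatter` are hypothetical, and the data are synthetic placeholders generated from a seed, not the assignment's data.

```python
import numpy as np
from scipy import stats


def fit_simple_ols(x, y, alpha=0.05):
    """Fit y = beta_0 + beta_1 * x by least squares and summarize the slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    fit = stats.linregress(x, y)
    # Two-sided critical value of the t distribution with n - 2 degrees of freedom.
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    half_width = t_crit * fit.stderr
    return {
        "beta_0": fit.intercept,
        "beta_1": fit.slope,
        "p_value": fit.pvalue,  # two-sided test of H0: beta_1 = 0
        "ci": (fit.slope - half_width, fit.slope + half_width),
        "r_squared": fit.rvalue ** 2,
    }


def plot_scatter(x, y, xlabel="X", ylabel="Y"):
    """Scatter plot of y against x; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


# Synthetic placeholder data (assumption): 32 points with a known linear trend.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=32)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1, size=32)

result = fit_simple_ols(x, y)
print(f"fitted model: Y = {result['beta_0']:.3f} + {result['beta_1']:.3f} * X")
print(f"p-value for beta_1 = 0: {result['p_value']:.3g}")
print(f"95% CI for beta_1: ({result['ci'][0]:.3f}, {result['ci'][1]:.3f})")
print(f"R-squared: {result['r_squared']:.3f}")
```

With the actual data, `x` and `y` would instead be read from the file, for example with `pandas.read_csv("electricity.csv")`. The column names are not given in the assignment, so they would need to be checked before selecting X and Y.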