Question

Posted on Oct 14, 2024

Problem set 1
Stats 413: Applied Regression Analysis
Due Sep 29, 2017

Problem sets are due in lab on the due date. For problems that require programming, please properly comment your code and submit it together with any output. You are encouraged to collaborate on problem sets with classmates, but the final write-up (including any code) must be your own.

1. Multi-task regression (by Andrew Ng)

Thus far, we have only considered regression with scalar-valued responses. In some applications, the response is itself a vector: $y_i \in \mathbb{R}^p$. We posit that the relationship between the features and the vector-valued response is linear: $y_i^T \approx x_i^T B$, where $B \in \mathbb{R}^{d \times p}$ is a matrix of regression coefficients.

(a) Express the sum of squared residuals (SSR) in matrix notation (i.e. without using any summations). Hint: work out how to express the SSR in terms of

$$X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} \in \mathbb{R}^{n \times d}, \qquad Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_n^T \end{bmatrix} \in \mathbb{R}^{n \times p}.$$

(b) Find the matrix of regression coefficients that minimizes the SSR.

(c) Instead of minimizing the SSR, we break up the problem into p regression problems with scalar-valued responses. That is, we fit p linear models of the form $(y_i)_k \approx x_i^T \beta_k$, where $\beta_k \in \mathbb{R}^d$. How do the regression coefficients from the p separate regressions compare to the matrix of regression coefficients that minimizes the SSR?

2. Predicting crime rate

Download the Boston dataset, which we saw in lab, from the course website. In this problem, we will predict the per capita crime rate using the other variables in the dataset.

(a) For each predictor, fit a simple linear regression model to predict the response. In which of the simple linear models is there a statistically significant association between the predictor and the response?

(b) Fit a (multiple) regression model to predict the response using all the other features in the dataset. For which features can we reject the null $H_0 : \beta_j = 0$?

(c) How do the results from (a) and (b) compare?
Create a scatterplot displaying the simple regression coefficient of each predictor from (a) on the x-axis and the multiple regression coefficient from (b) on the y-axis. That is, each predictor is displayed as a point on the plot.

(d) Is there evidence of a non-linear relationship between any of the features and the response? For each predictor $x_j$, look at the fit of the cubic model $y \approx \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \beta_3 x_j^3$.

3. Effective bootstrap sample sizes

Consider drawing a bootstrap sample of size B from a dataset that consists of n samples. In other words, this is sampling B observations with replacement from the original dataset. What is the expected number of samples that do not appear in the bootstrap sample? Hint: first calculate the probability that a particular sample does not appear in the bootstrap sample.

4. Red brain, blue brain (by Cosma Shalizi)

The dataset n88pol.csv contains data on 88 university students who participated in a psychological experiment on the relationships between the size of different regions of the brain and political views. The variables amygdala and acc are the volumes of the amygdala and the anterior cingulate cortex, respectively. The variable orientation is the subject's political views on a five-point scale from 1 (very conservative) to 5 (very liberal). orientation is an ordinal variable, so scores of 1 and 2 are not necessarily as far apart as scores of 2 and 3.

(a) What is the correlation between orientation and amygdala? Between orientation and acc?

(b) Give 95% bootstrap confidence intervals for the correlations.

(c) The function rank accepts a vector $x \in \mathbb{R}^n$ and returns a vector of ranks; i.e. it returns a vector whose i-th component is j if $x_i$ is the j-th smallest component of x. What are the correlations between the ranks of orientation and the ranks of amygdala? Between the ranks of orientation and the ranks of acc?

(d) Give 95% bootstrap confidence intervals for the rank correlations.
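
For parts (b) and (d) of Problem 4, one common way to build a bootstrap confidence interval is the percentile method: resample the observation pairs with replacement, recompute the correlation on each resample, and read off the 2.5% and 97.5% quantiles of the replicates. The Python sketch below illustrates the idea on synthetic data, not on n88pol.csv. The problem set does not prescribe a language or a specific interval construction, so the choice of the percentile method, the helper names (`pearson`, `average_ranks`, `rank_correlation`, `bootstrap_ci`), and the average-rank treatment of ties are all assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(x):
    """Ranks starting at 1; tied values share the average of their ranks.

    Assumption: ties are averaged. The problem statement's definition of
    rank does not say how ties are handled.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with x[order[i]].
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_correlation(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat(x, y), resampling (x_i, y_i) pairs."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        reps.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    reps.sort()
    alpha = 1 - level
    lower = reps[int((alpha / 2) * (n_boot - 1))]
    upper = reps[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Synthetic stand-in data (hypothetical, NOT the n88pol.csv measurements):
# a five-point ordinal score and a noisy continuous variable for 88 subjects.
data_rng = random.Random(1)
orientation = [data_rng.randint(1, 5) for _ in range(88)]
volume = [0.3 * o + data_rng.gauss(0, 1) for o in orientation]

print("correlation:", pearson(orientation, volume))
print("95% CI:", bootstrap_ci(orientation, volume, pearson))
print("rank correlation:", rank_correlation(orientation, volume))
print("95% CI:", bootstrap_ci(orientation, volume, rank_correlation))
```

Resampling whole pairs, rather than each variable separately, preserves the dependence between the two variables within each subject. For the rank correlation, the ranks are recomputed inside every resample, so the interval reflects the variability of the full ranking-plus-correlation procedure.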