Question
The quality of each wine is scored as an integer between 0 and 10. We label wines with quality score 7 as premium. We want to build logistic regression models to predict whether each wine is premium, given the feature values. Let X be the vector of independent variables excluding the type, i.e., [fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, bound dioxide, density, pH, sulphates], and let Z be a binary variable such that Z = 1 if the wine is red and Z = 0 if the wine is white. Let Y = 1 indicate that the wine is premium, and Y = 0 otherwise.

a) (15 points) Train a logistic regression model to predict the wine quality score:

$$\Pr(Y = 1 \mid X, Z) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1^\top X + \beta_2 Z)\right)}$$

Use the same training data and test data as in Part 1. Perform feature selection and give a reasonable explanation for your choice of features. Note that not using the i-th feature of X, $X_i$, means setting the i-th entry of $\beta_1$ to zero, i.e., $\beta_{1,i} = 0$.
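The closing note about dropping a feature can be checked with a small sketch. It assumes the model has the usual logistic form with an intercept, a coefficient vector for X, and a single coefficient for Z. The names `beta0`, `beta1`, `beta2`, and both helper functions are hypothetical, and the numbers are made up for illustration rather than taken from the wine data.

```python
import math


def premium_probability(x, z, beta0, beta1, beta2):
    """Pr(Y = 1 | X, Z) for the linear predictor beta0 + beta1 . x + beta2 * z."""
    linear = beta0 + sum(b * xi for b, xi in zip(beta1, x)) + beta2 * z
    return 1.0 / (1.0 + math.exp(-linear))


def drop_features(beta1, dropped):
    """Return a copy of beta1 with the coefficients at the dropped indices set to zero."""
    return [0.0 if i in dropped else b for i, b in enumerate(beta1)]


# Hypothetical values for three features (not real wine measurements).
x = [7.4, 0.70, 0.0]
beta1 = [0.1, -2.0, 0.5]
beta0, beta2, z = -1.0, 0.3, 1

# Option 1: keep all three inputs but zero out the coefficient of feature 1.
masked = drop_features(beta1, {1})
p_masked = premium_probability(x, z, beta0, masked, beta2)

# Option 2: remove feature 1 from both the input and the coefficient vector.
p_reduced = premium_probability([x[0], x[2]], z, beta0, [beta1[0], beta1[2]], beta2)

print(p_masked, p_reduced)
```

Both options give the same probability, which is why excluding a feature can be described as constraining its coefficient to zero. In an actual solution the coefficients would be estimated from the training data, and the reduced model would be refit after each feature is removed rather than reusing the old coefficients.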