Question
Exercise 2

Spectrography techniques are often employed in food science and chemometrics for food type classification and detection, a task that has important applications in food safety, authenticity assessment, and quality assurance. The file data_hw1_strawberry.RData contains a collection of 983 mid-infrared spectra collected from different fruit purees. Each spectrum is assigned to one of two classes: strawberry, i.e. purees prepared from fresh whole strawberries by the food scientists, and adulterated, i.e. a diverse collection of other purees (raspberry, apple, blackcurrant, blackberry, plum, cherry, apricot, grape juice, and mixtures of these) or strawberry adulterated with other fruits and sugar solutions. The data are described in more detail in this scientific journal article.

The researchers wish to build a supervised learning system capable of discriminating pure strawberry purees from adulterated or fake purees. The data are divided into training and test sets, and the file includes the two data matrices data_train and data_test. The data matrix data_train is to be used for training, validation, and comparison of the models, while the data matrix data_test is to be used for testing.

The following classifiers are considered and compared in order to predict whether a sample is pure strawberry or not using the input spectrum features:
1. Standard logistic regression classifier + PCA dimension reduction with coordinate vectors.
2. Regularized logistic regression model with L1 penalty function.

Main tasks:
Use the available training data data_train to implement an appropriate cross-validation procedure for comparing and tuning the classifiers, and select the best one. For classifier 1, consider a range of values of the number of coordinate vectors in the interval (2, 10). (50 marks)
Using the available test data, evaluate the test classification performance of the selected best classifier. Comment briefly on its predictive performance, especially in relation to its ability to correctly identify pure strawberry samples. (20 marks)

Instructions and hints:
Provide a concise discussion/justification of the different data analysis choices made.
Using the caret R package (or similar packages) is not allowed.
Consider a transformation of the input data.
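Since caret is not allowed, the cross-validation folds and the performance measures have to be built by hand. The following is a minimal sketch of those two pieces only, written in Python with the standard library as an illustration rather than as a solution to the exercise. The function names, the toy labels, and the label strings "strawberry" and "adulterated" are assumptions made here; the actual class coding in data_train may differ, and the model fitting itself (PCA plus logistic regression, or the L1-penalized model) is left out.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign every sample index to one of k folds, class by class,
    so that each fold has roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of


def sensitivity_specificity(y_true, y_pred, positive="strawberry"):
    """Sensitivity: share of true positives among actual positives.
    Specificity: share of true negatives among actual negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy labels, not the real data.
    labels = ["strawberry"] * 6 + ["adulterated"] * 9
    folds = stratified_folds(labels, k=3)
    for f in range(3):
        held_out = [labels[i] for i, g in enumerate(folds) if g == f]
        print(f, held_out.count("strawberry"), held_out.count("adulterated"))
```

In a full analysis, each fold would in turn serve as the validation set while the remaining folds are used for fitting. Any input transformation and the PCA projection should be estimated on the fitting folds only and then applied to the held-out fold, so that the validation estimate is not contaminated. Sensitivity with "strawberry" as the positive class is the quantity that speaks to the second task, the ability to correctly identify pure strawberry samples.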