Question


Exercise 2

Spectrography techniques are often employed in food science and chemometrics for the classification and detection of food types, a task that has important applications in food safety, authenticity assessment, and quality assurance. The file data_hw1_strawberry.RData contains a collection of 983 mid-infrared spectra collected from different fruit purees. Each spectrum is assigned to one of two classes: strawberry, purees prepared from fresh whole strawberries by the food scientists, and adulterated, a diverse collection of other purees (raspberry, apple, blackcurrant, blackberry, plum, cherry, apricot, grape juice, and mixtures of these) or strawberry adulterated with other fruits and sugar solutions. The data are described in more detail in this scientific journal article.

The researchers wish to build a supervised learning system capable of discriminating pure strawberry purees from adulterated or fake purees. The data are divided into training and test sets, and the file includes the two data matrices data_train and data_test. The data matrix data_train is to be used for training, validation, and comparison of the models, while the data matrix data_test is to be used for testing.

The following classifiers are considered and compared in order to predict whether a sample is pure strawberry or not using the input spectrum features:

1. Standard logistic regression classifier + PCA dimension reduction with coordinate vectors.
2. Regularized logistic regression model with L1 penalty function.

Main tasks:

Use the available training data data_train to implement an appropriate cross-validation procedure for comparing and tuning the classifiers, and select the best one. For classifier 1, consider a range of values of the number of coordinate vectors in the interval (2, 10). (50 marks)

Using the available test data, evaluate the test classification performance of the selected best classifier. Comment briefly on its predictive performance, especially in relation to its ability to correctly identify pure strawberry samples. (20 marks)

Instructions and hints:

Provide a concise discussion/justification of the different data analysis choices made.
Using the caret R package (or similar packages) is not allowed.
Consider a transformation of the input data.
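As a rough illustration of the cross-validation step for the PCA + logistic regression classifier, the base R sketch below runs stratified K-fold cross-validation over 2 to 10 coordinate vectors. It is not the expected solution. Everything about the data is an assumption: the matrix x and labels y are simulated stand-ins, since the layout of data_train is not given here. Standardising inside prcomp, using 5 folds, and scoring by accuracy are also illustrative choices.

```r
set.seed(1)

# Simulated stand-in for data_train (hypothetical layout: the real column
# names and class coding are not given in the exercise text).
n <- 200
p <- 50
y <- factor(rep(c("adulterated", "strawberry"), each = n / 2))
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("w", seq_len(p))
x[y == "strawberry", 1:5] <- x[y == "strawberry", 1:5] + 1.5

# Stratified fold assignment: shuffle the fold labels within each class.
K <- 5
fold <- integer(n)
for (cl in levels(y)) {
  idx <- which(y == cl)
  fold[idx] <- sample(rep_len(1:K, length(idx)))
}

n_comp <- 2:10
acc <- matrix(NA_real_, K, length(n_comp), dimnames = list(NULL, n_comp))

for (k in 1:K) {
  tr <- fold != k
  # Fit the PCA on the training folds only, so the held-out fold does not
  # leak into the centring, scaling, or loadings.
  pca <- prcomp(x[tr, ], center = TRUE, scale. = TRUE)
  z_tr <- pca$x
  z_va <- predict(pca, newdata = x[!tr, ])
  for (j in seq_along(n_comp)) {
    q <- n_comp[j]
    d_tr <- data.frame(y = y[tr], z_tr[, 1:q, drop = FALSE])
    d_va <- data.frame(z_va[, 1:q, drop = FALSE])
    fit <- glm(y ~ ., data = d_tr, family = binomial)
    # With a factor response, glm models the probability of the second level.
    prob <- predict(fit, newdata = d_va, type = "response")
    pred <- ifelse(prob > 0.5, levels(y)[2], levels(y)[1])
    acc[k, j] <- mean(pred == y[!tr])
  }
}

cv_acc <- colMeans(acc)                # mean validation accuracy per q
best_q <- n_comp[which.max(cv_acc)]    # number of components selected
```

The same fold vector could be reused when tuning the L1-penalised model, so that both classifiers are compared on identical splits. Because the task stresses correct identification of pure strawberry samples, sensitivity for the strawberry class may be a more relevant score than overall accuracy.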
