Question
You will work with the Thyloid.csv file, which contains gene data from each patient. The dataset includes various gene expression measurements (features) and a label indicating the stage information.
Part 1: Principal Component Analysis (PCA)
Implement PCA from Scratch:
(a) Write Python code to implement PCA from scratch. Include functions to compute the covariance matrix, eigenvalues, and eigenvectors.
(b) Apply your PCA implementation to reduce the dimensionality of the features in Thyloid.csv.
(c) Choose an appropriate number of principal components to retain a significant amount of variance (e.g. …).
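The three steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the function names are hypothetical, the data is synthetic because the columns of Thyloid.csv are not described, and the 0.95 variance threshold is a placeholder chosen here, not a value taken from the assignment.

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance of the columns of X (rows are observations)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

def eigen_decomposition(C):
    """Eigenvalues in descending order with matching eigenvectors (C is symmetric)."""
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def pca_fit(X, variance_threshold=0.95):
    """Keep the smallest number of components whose cumulative variance ratio reaches the threshold."""
    mean = X.mean(axis=0)
    vals, vecs = eigen_decomposition(covariance_matrix(X))
    vals = np.clip(vals, 0.0, None)          # guard against tiny negative round-off
    ratio = vals / vals.sum()
    k = int(np.searchsorted(np.cumsum(ratio), variance_threshold)) + 1
    k = min(k, len(vals))
    return mean, vecs[:, :k], ratio[:k]

def pca_transform(X, mean, components):
    """Project observations onto the retained components."""
    return (X - mean) @ components

# Synthetic stand-in for the feature matrix (assumption: 60 patients, 8 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))

mean, components, ratio = pca_fit(X, variance_threshold=0.95)
Z = pca_transform(X, mean, components)
print("components kept:", components.shape[1], "variance retained:", ratio.sum())
```

With gene expression data the number of features can far exceed the number of patients, in which case the covariance matrix is large and an SVD of the centered data is usually a cheaper route to the same components.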
PCA using scikit-learn:
(a) Use the PCA module from sklearn to perform dimensionality reduction on the dataset.
(b) Compare the results with your from-scratch implementation in terms of explained variance and the reduced feature set.
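One way to run the comparison is sketched below, assuming synthetic data and a fixed number of components (k = 3, chosen only for illustration). Principal components are defined only up to sign, so the projected features are compared by absolute value.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the feature matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))
k = 3

# From-scratch route: eigendecomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
order = np.argsort(vals)[::-1][:k]
ratio_scratch = vals[order] / vals.sum()
Z_scratch = Xc @ vecs[:, order]

# scikit-learn route.
pca = PCA(n_components=k, svd_solver="full")
Z_sklearn = pca.fit_transform(X)

# Explained variance ratios should agree; scores agree up to a per-column sign flip.
ratio_match = np.allclose(ratio_scratch, pca.explained_variance_ratio_)
scores_match = np.allclose(np.abs(Z_scratch), np.abs(Z_sklearn))
print("ratios match:", ratio_match, "scores match up to sign:", scores_match)
```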
Part 2: Kernel PCA (KPCA)
KPCA with RBF Kernel:
(a) Implement Kernel PCA with the Radial Basis Function (RBF) kernel from scratch.
(b) Apply your KPCA implementation to the dataset.
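A minimal sketch of RBF kernel PCA follows: build the kernel matrix, center it in feature space, take the leading eigenvectors, and scale each by the square root of its eigenvalue to get the projections of the training points. The function names, the synthetic data, and the choice gamma = 1 / n_features are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.clip(sq_dist, 0.0, None))

def center_kernel(K):
    """Center the kernel matrix, equivalent to centering the data in feature space."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

def kernel_pca(K, n_components):
    """Return projections of the training points and the leading eigenvalues."""
    vals, vecs = np.linalg.eigh(center_kernel(K))
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[order], 0.0, None), vecs[:, order]
    return vecs * np.sqrt(vals), vals

# Synthetic stand-in for the feature matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
gamma = 1.0 / X.shape[1]   # illustrative default, should be tuned

Z_rbf, eigvals = kernel_pca(rbf_kernel(X, X, gamma), n_components=2)
print(Z_rbf.shape, eigvals)
```

The RBF kernel is sensitive to feature scale, so standardizing the features before computing distances is usually worth considering.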
KPCA with Polynomial Kernel:
(a) Implement Kernel PCA with a Polynomial kernel from scratch.
(b) Apply your KPCA implementation to the dataset.
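For the polynomial case only the kernel function changes; the centering and eigendecomposition are the same as for any other kernel. The sketch below assumes the common form (gamma * <x, y> + coef0)^degree with illustrative parameter values, and standardizes the synthetic features first so that the powers do not blow up.

```python
import numpy as np

def polynomial_kernel(A, B, degree=3, gamma=1.0, coef0=1.0):
    """K[i, j] = (gamma * <A[i], B[j]> + coef0) ** degree."""
    return (gamma * (A @ B.T) + coef0) ** degree

def kernel_pca(K, n_components):
    """Center K, then project the training points onto the leading components."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[order], 0.0, None), vecs[:, order]
    return vecs * np.sqrt(vals), vals

# Synthetic stand-in for the feature matrix (assumption), standardized per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

K_poly = polynomial_kernel(X_std, X_std, degree=3, gamma=1.0 / X.shape[1], coef0=1.0)
Z_poly, eigvals = kernel_pca(K_poly, n_components=2)
print(Z_poly.shape, eigvals)
```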
KPCA with Linear Kernel:
(a) Implement Kernel PCA with a Linear kernel from scratch.
(b) Apply your KPCA implementation to the dataset.
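Kernel PCA with a linear kernel should reproduce ordinary PCA, which makes it a useful sanity check for a from-scratch implementation. The sketch below, on synthetic data with hypothetical function names, compares the linear KPCA projections with PCA scores obtained from an SVD of the centered data; the two agree up to the sign of each column.

```python
import numpy as np

def linear_kernel(A, B):
    """K[i, j] = <A[i], B[j]>."""
    return A @ B.T

def kernel_pca(K, n_components):
    """Center K, then project the training points onto the leading components."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[order], 0.0, None), vecs[:, order]
    return vecs * np.sqrt(vals), vals

# Synthetic stand-in for the feature matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))

Z_lin, _ = kernel_pca(linear_kernel(X, X), n_components=2)

# Ordinary PCA scores via SVD of the centered data, for comparison.
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
Z_pca = U[:, :2] * S[:2]

same_up_to_sign = np.allclose(np.abs(Z_lin), np.abs(Z_pca), atol=1e-8)
print("linear KPCA matches PCA up to sign:", same_up_to_sign)
```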
Combining Kernels:
(a) Combine two different kernels (e.g., RBF and Polynomial) and apply the combined KPCA to the dataset.
(b) Evaluate the classification performance using accuracy metrics for the combined kernels.
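The assignment does not say how the kernels should be combined. One common option, assumed here, is a convex combination: a weighted sum of two positive semidefinite kernels is again a valid kernel. The weight, gamma, degree, and coef0 values below are illustrative, and the data is synthetic.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.clip(sq_dist, 0.0, None))

def polynomial_kernel(A, B, degree, gamma, coef0):
    return (gamma * (A @ B.T) + coef0) ** degree

def combined_kernel(A, B, weight=0.5, gamma=0.1, degree=2, coef0=1.0):
    """Convex combination: weight * RBF + (1 - weight) * polynomial."""
    return (weight * rbf_kernel(A, B, gamma)
            + (1.0 - weight) * polynomial_kernel(A, B, degree, gamma, coef0))

def kernel_pca(K, n_components):
    """Center K, then project the training points onto the leading components."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = np.clip(vals[order], 0.0, None), vecs[:, order]
    return vecs * np.sqrt(vals), vals

# Synthetic stand-in for the feature matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

K_comb = combined_kernel(X, X, weight=0.5)
Z_comb, eigvals = kernel_pca(K_comb, n_components=2)
print(Z_comb.shape, eigvals)
```

Because the two kernels can live on very different numeric scales, the weight may need tuning (or the kernels may need rescaling) before the sum is meaningful. The projected features in Z_comb can then be fed to a classifier and scored by accuracy.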
Part 3: Testing and Evaluation
Split your Thyloid.csv into Train and Test datasets.
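A stratified split keeps the proportion of each stage roughly equal in both sets. The sketch below is one possible way to do it with NumPy alone; the 30% test fraction, the stage labels "I", "II", "III", and the synthetic data are assumptions, since the assignment gives neither the split ratio nor the label column name.

```python
import numpy as np

def stratified_split(X, y, test_fraction=0.3, seed=0):
    """Randomly assign a fixed fraction of each class to the test set."""
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = max(1, int(round(test_fraction * len(idx))))
        test_mask[idx[:n_test]] = True
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]

# Synthetic stand-in: 60 patients, 5 features, three hypothetical stage labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = np.repeat(np.array(["I", "II", "III"]), 20)

X_train, X_test, y_train, y_test = stratified_split(X, y, test_fraction=0.3)
print(X_train.shape, X_test.shape)
```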
Applying PCA and KPCA to the Test Dataset:
(a) Use the PCA and KPCA models (RBF, Polynomial, Linear, and combined kernels) trained on the Train dataset to transform the Test dataset.
(b) Ensure the dimensionality reduction is consistent with what was performed on the training data.
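Consistency here means that nothing is re-estimated on the test data: PCA reuses the training mean and components, and KPCA evaluates the kernel between test and training points and centers it with the training kernel statistics. The following is a sketch under those assumptions, with hypothetical function names, synthetic data, and an RBF kernel standing in for any of the kernels.

```python
import numpy as np

def pca_fit(X_train, n_components):
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def pca_transform(X, mean, components):
    return (X - mean) @ components           # always uses the training mean

def rbf_kernel(A, B, gamma=0.2):
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.clip(sq_dist, 0.0, None))

def kpca_fit(X_train, kernel, n_components):
    K = kernel(X_train, X_train)
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    # Assumes the retained eigenvalues are strictly positive.
    alphas = vecs[:, order] / np.sqrt(vals[order])
    return {"X": X_train, "K": K, "alphas": alphas, "kernel": kernel}

def kpca_transform(model, X_new):
    K, n = model["K"], model["K"].shape[0]
    K_new = model["kernel"](X_new, model["X"])        # shape (n_new, n_train)
    one_n = np.full((n, n), 1.0 / n)
    one_m = np.full((K_new.shape[0], n), 1.0 / n)
    K_new_c = K_new - one_m @ K - K_new @ one_n + one_m @ K @ one_n
    return K_new_c @ model["alphas"]

# Synthetic stand-ins for the train and test feature matrices (assumption).
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(40, 5)), rng.normal(size=(10, 5))

mean, components = pca_fit(X_train, n_components=2)
Z_test_pca = pca_transform(X_test, mean, components)

model = kpca_fit(X_train, rbf_kernel, n_components=2)
Z_train_kpca = kpca_transform(model, X_train)
Z_test_kpca = kpca_transform(model, X_test)
print(Z_test_pca.shape, Z_test_kpca.shape)
```

Swapping `rbf_kernel` for another kernel function with the same two-argument signature gives the corresponding transform for that kernel.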
Covariance Matrix Analysis:
(a) Calculate the covariance matrix of the dataset.
(b) Identify the top features with the highest covariance values.
(c) Extract these top features and evaluate the classification performance using accuracy metrics.
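"Highest covariance values" can be read in more than one way. The sketch below assumes the simplest reading, ranking features by their variance (the diagonal of the covariance matrix), and selects the features on the training data only before applying the same column indices to the test data. The function name, data, and number of features kept are illustrative.

```python
import numpy as np

def top_variance_features(X_train, n_top):
    """Rank features by the diagonal of the covariance matrix, largest first."""
    C = np.cov(X_train, rowvar=False)
    order = np.argsort(np.diag(C))[::-1]
    return order[:n_top], C

# Synthetic stand-in with deliberately different feature scales (assumption).
rng = np.random.default_rng(0)
scales = np.array([1.0, 10.0, 0.1, 5.0, 2.0])
X_train = rng.normal(size=(50, 5)) * scales
X_test = rng.normal(size=(20, 5)) * scales

top_idx, C = top_variance_features(X_train, n_top=2)
X_train_top, X_test_top = X_train[:, top_idx], X_test[:, top_idx]
print("selected feature indices:", top_idx)
```

An alternative reading would score each feature by the sum of its absolute off-diagonal covariances; only the scoring line would change. Either way, the reduced matrices can be passed to the classifier in place of the PCA or KPCA features.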
Classification Experiment:
(a) Choose a classifier (minimum distance classifier: provided at the end of this assignment) to classify the observations in the Test dataset.
(b) Evaluate the classification performance using accuracy metrics and compare the effectiveness of PCA and KPCA features.
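The classifier supplied with the assignment is not reproduced in this text, so the sketch below assumes the usual meaning of a minimum distance classifier: each test observation is assigned to the class whose training centroid is nearest in Euclidean distance. The function names and the synthetic two-dimensional features are hypothetical stand-ins for the reduced PCA or KPCA features.

```python
import numpy as np

def fit_min_distance(X_train, y_train):
    """Store one centroid (mean vector) per class."""
    classes = np.unique(y_train)
    centroids = np.vstack([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_min_distance(X, classes, centroids):
    """Assign each row of X to the class with the nearest centroid."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]

def accuracy(y_true, y_pred):
    """Fraction of correctly classified observations."""
    return float(np.mean(y_true == y_pred))

# Synthetic, well-separated stand-in for reduced features (assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
y_train = np.repeat([0, 1, 2], 30)
y_test = np.repeat([0, 1, 2], 10)
Z_train = centers[y_train] + rng.normal(size=(90, 2))
Z_test = centers[y_test] + rng.normal(size=(30, 2))

classes, centroids = fit_min_distance(Z_train, y_train)
acc = accuracy(y_test, predict_min_distance(Z_test, classes, centroids))
print("test accuracy:", acc)
```

To compare methods, the same three functions can be run once per feature set (PCA, each KPCA kernel, the combined kernel, and the covariance-selected features), keeping the train/test split fixed so the accuracies are comparable.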