This coursework takes the form of a mini project. You are asked to apply at least two machine learning algorithms to a dataset or datasets of your choosing and to write up your findings in a report. The report, written as a document in a formal style, and your Jupyter notebook together constitute the hand-in.

Aim

You must decide what you wish to achieve. Possibilities include:

- finding the best ML model for a particular dataset;
- comparing algorithms across differing datasets;
- systematically examining variations of a particular algorithm (for example, k-means with naive vs PCA initialisation);
- comparing different algorithms on a dataset or datasets (for example, k-means vs other clustering algorithms).

Choosing a dataset/datasets

You can either choose a dataset or datasets packaged with a machine learning library, or pick one that interests you from a public repository such as kaggle.com. For example, scikit-learn contains several standard, classic datasets such as Iris, Wine and Handwritten digits. These are perfect for this project. You might wish to browse a repository such as Kaggle for an interesting dataset, but please ensure that you can vectorise the dataset into a suitable form for input into a machine learning algorithm. You will not receive any credit for manipulating the data prior to analysis.

Algorithms

You should apply at least two machine learning algorithms from the first part of this module to your chosen problem. Specifically, choose at least two from: kNN, decision trees, linear regression, gradient descent, polynomial regression, Bayesian classification, k-means and PCA. You should implement at least one of the ML algorithms from scratch. Any such implementation must be written in standard Python code and should not use any machine learning libraries. The use of numpy and matplotlib is permissible (and expected).
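As an illustration of what a from-scratch implementation using only numpy might look like, here is a minimal sketch of a kNN classifier. The function name `knn_predict`, the choice of squared Euclidean distance, the tie-breaking rule and the toy data are all assumptions made for this example; they are not requirements of the coursework, and your own implementation may reasonably differ.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Label each query row by majority vote among its k nearest training points.

    Hypothetical helper for illustration; not a prescribed interface.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Indices of the k closest training points for each query row.
    nearest = np.argsort(dists, axis=1)[:, :k]

    predictions = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        # np.unique returns sorted labels, so ties go to the smallest label.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Toy data, made up purely for illustration.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_query = np.array([[0.05, 0.1], [1.0, 0.9]])

predictions = knn_predict(X_train, y_train, X_query, k=3)
print(predictions)
```

The broadcasting step builds the full distance matrix in memory, which is convenient for the small classic datasets mentioned above but may need rethinking for larger ones.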

Methodology, Analysis and Evaluation

The first half of this module (Topics 1-6) introduced several important ML techniques, such as training/test set splitting, classifier evaluation metrics (precision, accuracy, ...), data scaling, overfitting and underfitting, regularisation and cross-validation. You should utilise these techniques wherever appropriate.
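
Several of the techniques listed above can be sketched with numpy alone. The example below is one possible arrangement, not a prescribed one: the helper names (`train_test_split`, `standardise`, `accuracy`, `precision`, `k_fold_indices`), the default split fraction, the number of folds and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the rows, then split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def standardise(X_train, X_test):
    """Scale features using training-set statistics only, so no test information leaks."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # Guard against constant features.
    return (X_train - mean) / std, (X_test - mean) / std


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


def precision(y_true, y_pred, positive=1):
    """Fraction of positive predictions that are truly positive."""
    predicted_positive = y_pred == positive
    if not predicted_positive.any():
        return 0.0  # Convention assumed here when nothing is predicted positive.
    return float(np.mean(y_true[predicted_positive] == positive))


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]


# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.25)
X_train_s, X_test_s = standardise(X_train, X_test)

# Hand-written labels to show the metrics on a known case.
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])
print(accuracy(y_true, y_pred), precision(y_true, y_pred))

for train_idx, val_idx in k_fold_indices(len(X), n_folds=5):
    print(len(train_idx), len(val_idx))
```

Computing the scaling statistics on the training rows only, and then applying them to the test rows, is a deliberate choice: fitting the scaler on the full dataset would let test information influence training.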