In this homework, you are expected to create a synthetic 2-dimensional toy machine learning problem and conduct several experiments on it. You are required to implement a simple classifier and then demonstrate how its classification performance is affected by changing hyper-parameters, introducing class-label noise, and applying a noise removal technique. You may only use Python and the libraries numpy and matplotlib. Please follow the steps below:
Note: the data points can be any points of your choice.
Note: the data set can be, for example, an array, which can be stored in an Excel file or any other format.
1) Create a synthetic 2-dimensional "toy" data set pre-labeled with binary class labels. The data set should follow these rules:
   a. Number of examples: 1000
   b. Number of features: 2
   c. Type of features: continuous (numerical) values only
   d. Class labels: binary
   e. Proportion of examples (roughly): 50% positive and 50% negative
   f. Geometrical shape: You can choose any geometrical shape you like, so long as you keep the rough proportions of positive and negative examples. Be creative! Below are some shape examples:
   [Figure: example shapes]
2) Implement and test the K-NN classifier using different K values:
   a. Make sure that you can easily adjust the K parameter.
   b. Show that you are testing the classifier properly by following the correct testing process.
3) Introduce noise to the problem by randomly flipping some of the examples from positive to negative, and from negative to positive. Then, test the classifier again and show the accuracies.
   a. Test your classifier again with the following noise (flipping) rates: 10%, 20%, 30%, 40%, and 50%.
   b. Show in your report what the data looks like after the introduction of noise.
4) As you learned in class, using Tomek links is one way to remove some of these noisy examples. Identify and remove the examples that form a Tomek link in your data, then re-test your classifier once more. How effective is the removal of Tomek links?
[Figure: scatter plot of positive and negative examples; both axes range from 0.0 to 1.0]
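One possible way to approach step 1 is sketched below. The disc shape, the function name make_disc_dataset, and the seed are illustrative assumptions, not part of the assignment. Points are drawn uniformly from the unit square, and a centred disc of area 0.5 is labeled positive, so the two classes come out roughly balanced.

```python
import numpy as np

def make_disc_dataset(n=1000, seed=0):
    """Sample n points uniformly in the unit square; label points inside a
    centred disc as positive (1) and all others as negative (0).
    The disc is an assumed shape chosen only for illustration."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, 2))                 # two continuous features in [0, 1)
    radius = np.sqrt(0.5 / np.pi)          # disc area = 0.5, i.e. half of the square
    y = (np.linalg.norm(X - 0.5, axis=1) < radius).astype(int)
    return X, y

X, y = make_disc_dataset()
print(X.shape, "fraction positive:", y.mean())
```

Any other shape works the same way, as long as the positive region covers about half of the sampled area.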
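For step 2, a minimal K-NN sketch using only numpy might look like the following. The function names, the half-plane toy data, and the 70/30 hold-out split are assumptions made for this example. The assignment only asks for "the correct testing process", and a hold-out split is one common reading of that: the classifier is evaluated on examples it never saw during training.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict binary labels (0/1) by majority vote among the k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k closest training points
    votes = y_train[nearest].mean(axis=1)      # fraction of positive neighbours
    return (votes > 0.5).astype(int)           # exact ties (even k) go to the negative class

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Assumed toy data: points in the unit square, positive when the first feature exceeds 0.5
rng = np.random.default_rng(0)
X = rng.random((1000, 2))
y = (X[:, 0] > 0.5).astype(int)

# Hold out 30% of the examples for testing (assumed split ratio)
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
train, test = idx[:split], idx[split:]

for k in (1, 3, 5, 7):
    acc = accuracy(y[test], knn_predict(X[train], y[train], X[test], k=k))
    print(f"k={k}: test accuracy = {acc:.3f}")
```

Keeping k as a keyword argument makes it easy to adjust, and using odd values of k avoids voting ties in a binary problem.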
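The label flipping in step 3 can be sketched as below. The helper name flip_labels and its seed argument are hypothetical. The sketch assumes labels are encoded as 0/1 and flips an exact fraction of them, each chosen example being flipped once.

```python
import numpy as np

def flip_labels(y, rate, seed=0):
    """Return a copy of the binary labels y with a fraction `rate` of entries flipped."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()                          # leave the clean labels untouched
    n_flip = int(round(rate * len(y)))
    # Sample without replacement so no example is flipped twice (and thereby restored)
    flip_idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[flip_idx] = 1 - y_noisy[flip_idx]   # 0 -> 1 and 1 -> 0
    return y_noisy

y = np.array([0, 1] * 500)                      # placeholder labels for illustration
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    y_noisy = flip_labels(y, rate)
    print(f"rate={rate:.0%}: {np.sum(y_noisy != y)} of {len(y)} labels changed")
```

The assignment does not say whether noise should be applied to all examples or only to the training portion, so that choice is left to the caller.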
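For step 4, the sketch below uses the usual definition of a Tomek link: two examples with different labels that are each other's nearest neighbour. It removes both members of every link, which is how the assignment's wording reads, though other variants remove only one member. The function name tomek_link_mask and the five-point example are assumptions made for illustration.

```python
import numpy as np

def tomek_link_mask(X, y):
    """Boolean mask marking every example that is a member of a Tomek link."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # (n, n) pairwise distances
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = np.argmin(d, axis=1)            # index of each point's nearest neighbour
    i = np.arange(len(X))
    mutual = nn[nn] == i                 # i and nn[i] choose each other
    return mutual & (y != y[nn])         # and the two carry different labels

# Tiny hand-made example: points 0 and 1 are mutual nearest neighbours with opposite labels
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [2.0, 2.0]])
y = np.array([0, 1, 1, 1, 0])

mask = tomek_link_mask(X, y)
X_clean, y_clean = X[~mask], y[~mask]    # drop both members of every link
print("removed", int(mask.sum()), "examples;", len(X_clean), "remain")
```

Comparing test accuracy before and after the removal, at each noise rate, is one way to judge how effective the cleaning is.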