Question

Posted on Sep 07, 2024

I need the train and test accuracies in one variable, to show just one line on the run screen.

Task 2: Perceptron for binary classification

Perceptron is a supervised learning algorithm for classification or regression. In supervised learning, you are given a data set of pairs, where the first element of each pair is a list of features $x \in \mathbb{R}^d$, and the second element is a label, $y \in \mathcal{R} = \{0, 1\}$ in a (binary) classification problem, or $y \in \mathcal{R} = \mathbb{R}$ in a regression problem. The underlying assumption is that those data represent a function $f: \mathbb{R}^d \to \mathcal{R}$, and the goal is to recover $f$ from the data. A good predictor (classifier or regressor) makes few mistakes in the case of classification, or makes predictions that are not too far off from the true label in the case of regression. It is necessary to hold out some data, as a test set, to see whether the predictor generalizes well.

Tasks/Questions:
- Your main task is to implement the perceptron algorithm for binary classification. In particular, your task is to train a single-layer perceptron to predict whether a breast mass is benign or malignant based on a set of features. The breast cancer diagnostic dataset contains features calculated from a digital image of a fine needle aspirate (FNA) of a breast mass, and a label representing the diagnosis, i.e., benign or malignant. The features describe the characteristics of the cell nuclei present in the image. There are 569 data points and 30 features, which consist of the mean, standard error, and "worst" (i.e., the mean of the three largest values) of ten measurements, such as radius, perimeter, area, etc. The dataset is available in the UCI Machine Learning Repository (you can access it here). The file name is wdbc.data. Each line represents an instance (i.e., a patient): the first number is the patient's ID, then comes the class label (B for benign or M for malignant), and then the list of 30 features. These values are separated by commas (CSV file). The first instances are shown in the following screenshot.
842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.066399,0.04904,0.05373,0.0158,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
842517,M,20.57,17.77,132.9,1326,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,0.5435,0.7339,3.398,74.08,0.005225,0.01308,0.0186,0.034,0.01389,0.003532,24.99,23.41,158.8,1956,0.1238,0.1866,0.2416,0.186,0.275,0.08902
84300903,M,19.69,21.25,136,1203,0.1096,0.1599,0.1974,0.1279,0.2669,0.05999,0.7456,0.7869,4.585,94.03,0.06615,0.04806,0.03832,0.0205,0.0225,0.004571,23.57,25.53,152.5,1709,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,0.4956,1.156,3.445,27.23,0.00911,0.07458,0.05661,0.0167,0.05963,0.009208,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
84358402,M,20.29,14.34,135.1,1297,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,0.7572,0.7813,5.438,94.44,0.01149,0.02461,0.05688,0.0185,0.01756,0.005115,22.54,16.67,152.2,1575,0.1374,0.205,0.4,0.1625,0.2364,0.07678
843786,M,12.45,15.7,82.57,477.1,0.1278,0.17,0.1578,0.08089,0.2087,0.07613,0.3345,0.8902,2.217,27.19,0.00751,0.03345,0.03672,0.01137,0.02165,0.005082,15.47,23.75,103.4,741.6,0.1791,0.5249,0.5355,0.1741,0.3985,0.1244

- Write a program to load the instances from the training file wdbc.data.
- Split the dataset into a train set (80%) and a test set (20%), and use the train set to train the model. Note: you can use scikit-learn built-in tools for this task; make sure the two sets keep the same balance of class labels.
- Implement a binary perceptron classifier and measure its performance (i.e., accuracy) on the train and test sets during the training epochs, where accuracy is the number of correct predictions divided by the total number of predictions. Plot the train and test accuracies against the number of epochs. According to your plot, what would be the ideal number of iterations at which to terminate the training?
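The first two bullets (loading wdbc.data and the 80/20 split) could be handled along the lines of the sketch below. It assumes the file is laid out exactly as described above (ID, M/B label, 30 comma-separated features), encodes malignant as 1 and benign as 0 (an arbitrary choice), and uses scikit-learn's `train_test_split` with `stratify=y` so both sets keep the same class ratio. The helper names `load_wdbc` and `split_80_20` are hypothetical, not part of any library.

```python
import csv

import numpy as np
from sklearn.model_selection import train_test_split


def load_wdbc(path="wdbc.data"):
    """Load wdbc.data: each line is ID, diagnosis (M/B), then 30 real-valued features."""
    features, labels = [], []
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue  # skip blank lines
            # row[0] is the patient ID: an identifier, not a feature, so it is dropped
            labels.append(1 if row[1].strip() == "M" else 0)  # M (malignant) -> 1, B (benign) -> 0
            features.append([float(v) for v in row[2:]])
    return np.array(features), np.array(labels)


def split_80_20(X, y, seed=42):
    """80/20 train/test split; stratify=y keeps the M/B ratio the same in both sets."""
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)
```

Usage: `X, y = load_wdbc("wdbc.data")`, then `X_train, X_test, y_train, y_test = split_80_20(X, y)`. Fixing `random_state` makes the split reproducible from run to run.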
Discuss the obtained results.

Relevant information about the dataset:
- Number of instances: 569
- Number of attributes: 32 (ID, diagnosis, 30 real-valued input features)

Attribute information:
- ID number
- Diagnosis (M = malignant, B = benign)

Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from centre to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
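For the perceptron bullet and the request at the top (train and test accuracy in one variable, shown as one line), one possible sketch is a from-scratch perceptron that appends a `(train_acc, test_acc)` tuple to a single list after every epoch and prints both numbers on one line. The names `Perceptron`, `fit_with_history`, `standardize` and `plot_history`, the learning rate, the small random initialisation and the feature standardisation (using training-set statistics only) are all assumptions of this sketch, not requirements of the task. Standardisation is included because the raw feature values in the sample lines range from below 0.01 to above 2000.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale features with the TRAIN mean/std only (assumed preprocessing step)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std


class Perceptron:
    """Single-layer perceptron with a step activation for labels in {0, 1}."""

    def __init__(self, n_features, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = self.rng.normal(scale=0.01, size=n_features)  # small random weights
        self.b = 0.0
        self.lr = lr

    def predict(self, X):
        # Step function: 1 if w.x + b > 0, else 0
        return (X @ self.w + self.b > 0).astype(int)

    def accuracy(self, X, y):
        # Number of correct predictions divided by the total number of predictions
        return float(np.mean(self.predict(X) == y))

    def train_epoch(self, X, y):
        # One pass over the shuffled training set with the perceptron update rule
        for i in self.rng.permutation(len(y)):
            error = y[i] - int(X[i] @ self.w + self.b > 0)  # -1, 0 or +1
            self.w += self.lr * error * X[i]
            self.b += self.lr * error


def fit_with_history(X_train, y_train, X_test, y_test, epochs=50, lr=0.01, seed=0):
    """Train for `epochs` passes; return the model and ONE list of (train, test) accuracies."""
    model = Perceptron(X_train.shape[1], lr=lr, seed=seed)
    history = []  # history[k] == (train_acc, test_acc) after epoch k + 1
    for epoch in range(1, epochs + 1):
        model.train_epoch(X_train, y_train)
        accs = (model.accuracy(X_train, y_train), model.accuracy(X_test, y_test))
        history.append(accs)
        # Both accuracies reported on a single line per epoch
        print(f"epoch {epoch:3d}  train_acc={accs[0]:.4f}  test_acc={accs[1]:.4f}")
    return model, history


def plot_history(history):
    """Plot train and test accuracy against the epoch number."""
    import matplotlib.pyplot as plt  # imported lazily so training works without matplotlib

    train_acc, test_acc = zip(*history)
    epochs = range(1, len(history) + 1)
    plt.plot(epochs, train_acc, label="train")
    plt.plot(epochs, test_acc, label="test")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()
```

Given a stratified 80/20 split `X_train, X_test, y_train, y_test` (for example from scikit-learn's `train_test_split`), the calls would be `X_tr, X_te = standardize(X_train, X_test)`, `model, history = fit_with_history(X_tr, y_train, X_te, y_test, epochs=100)` and `plot_history(history)`. `np.array(history)` turns the single variable into an `(epochs, 2)` array if that is more convenient. If literally one line on screen is wanted for the whole run, printing with `end="\r"` instead of the default newline overwrites the same console line in most terminals. The epoch after which the test curve stops improving on the plot is a natural candidate for the stopping point the task asks about.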