Question

Posted on Sep 26, 2024

Python 2.7 only. The SVM must be written as one program. Do not use sklearn packages for the SVM; you must write your own SVM. Thank you!

Please follow the instructions as stated in the description. No quick linear regression: it must be a full SVM (with a kernel) and a decision tree, with everything the description asks for.

The data set is at https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

There are question marks (?) in the data set that must be changed to 1's. Thank you.
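
The question-mark rule can be sketched as below. This is a minimal sketch, assuming each record is one comma-separated line of integers; `parse_rows` is a hypothetical helper name, and the two sample lines are illustrative records in that layout, not quoted from the file.

```python
def parse_rows(lines):
    """Parse comma-separated records, replacing every '?' field with 1.

    Assumption: all other fields are integers, one record per line.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = []
        for field in line.split(','):
            field = field.strip()
            fields.append(1 if field == '?' else int(field))
        rows.append(fields)
    return rows


# Illustrative lines only (id, nine attributes, class); not real records.
sample_lines = [
    "1000001,5,1,1,1,2,1,3,1,1,2",
    "1000002,8,4,5,1,2,?,7,3,1,4",
]
rows = parse_rows(sample_lines)
print(rows[1])
```

Reading the real file would only change where `lines` comes from, for example an open file object iterated line by line.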

Implement an SVM classifier using the Breast Cancer Wisconsin data set from the University of California Irvine Machine Learning Data Repository at archive.ics.uci.edu/ml. In the data, attributes 2 through 10 are used to represent instances. Each instance has one of 2 possible classes: benign or malignant. More precisely, the sample data have 10 input attributes (Sample code number: id number; Clump Thickness: 1 - 10; Uniformity of Cell Size: 1 - 10; Uniformity of Cell Shape: 1 - 10; Marginal Adhesion: 1 - 10; Single Epithelial Cell Size: 1 - 10; Bare Nuclei: 1 - 10; Bland Chromatin: 1 - 10; Normal Nucleoli: 1 - 10; Mitoses: 1 - 10) plus the single output class attribute (2 for benign, 4 for malignant).

Partition the data into training (model learning) and test sets.

For the tree classifier, use a top-down greedy algorithm with either the Gini or the Information Gain/Entropy measure for node splitting. It would be more elegant (but is not required) to avoid model overfitting by using the pessimistic error formula to decide whether or not to prune leaf nodes.

For the SVM, you can use either a linear SVM (at the risk that both the training and the generalization classification errors will be large) or, preferably, a nonlinear SVM using, e.g., a polynomial, Gaussian radial, or sigmoid kernel. Of course, the output class attribute should be modified: use +1 instead of 2 for the benign class, and -1 instead of 4 for the malignant class.

For the implementation, use Python or R. You can be inspired by existing code, but you are not allowed to use it; in other words, you write your own programs. You can use standard or other language libraries, including libraries for linear algebra, matrices, and Lagrangian nonlinear optimization with constraints (excluding libraries/software packages for data mining or machine learning that implement complete algorithms). Please include both the sources and sample output from running your programs.
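
Three building blocks named in the description can be sketched in plain Python: the class relabeling (2 to +1, 4 to -1), a greedy Gini split search for one tree node, and a Gaussian radial kernel. This is a rough sketch under stated assumptions, not a full tree or SVM trainer: the function names, the `x[feature] <= threshold` split form, and the default `gamma=0.1` are all choices made here for illustration, and the toy data is made up.

```python
import math


def to_svm_label(cls):
    """Map class codes to SVM labels: 2 (benign) -> +1, 4 (malignant) -> -1."""
    return 1 if cls == 2 else -1


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = float(len(labels))
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Greedy search over every feature and threshold for one node.

    Returns (feature_index, threshold, weighted_gini) for the split
    x[feature] <= threshold with the lowest weighted Gini impurity,
    or (None, None, gini(y)) if no split improves on the node itself.
    """
    best = (None, None, gini(y))
    n = float(len(y))
    for j in range(len(X[0])):
        # The largest value is skipped: it would leave the right side empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
            if score < best[2]:
                best = (j, t, score)
    return best


def rbf_kernel(a, b, gamma=0.1):
    """Gaussian radial kernel exp(-gamma * ||a - b||^2); gamma is an assumed default."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Made-up toy data: two features per instance, class codes 2 and 4.
X_toy = [[1, 1], [2, 1], [8, 7], [9, 3]]
y_toy = [to_svm_label(c) for c in [2, 2, 4, 4]]
split = best_split(X_toy, y_toy)
k_same = rbf_kernel(X_toy[0], X_toy[0])
```

A full tree would apply `best_split` recursively to the left and right subsets until a stopping rule fires, and a kernel SVM would plug `rbf_kernel` into whichever constrained optimizer is used for the dual problem.
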
Compare the performance of both classifiers, i.e., it is sufficient to provide both training accuracy and test/generalization accuracy for both of your programs (using, of course, the same training and test data). Based on that, state which classifier seems to be performing better for your programs and data. Comment: a more elegant approach would be to test, e.g., the confidence interval for the true accuracy (based on the test accuracy) at the (1 - α) confidence level, or the hypothesis that the performance difference, for the stochastic variable d = e1 - e2 (where e1 is the misclassification error of the tree classifier and e2 is the misclassification error of the SVM classifier), is statistically significant at the (1 - α) confidence level.
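
The two statistical checks mentioned in the comment could look roughly like this. It is a sketch under assumptions: the accuracy interval uses the normal approximation to the binomial (the Wilson score form), the difference interval treats the two error estimates as independent (a simplification when both classifiers share one test set), and `z` is the standard normal quantile for the chosen confidence level (1.96 for 95%). The function names and the numbers in the usage lines are illustrative, not results.

```python
import math


def accuracy_interval(acc, n, z=1.96):
    """Confidence interval for the true accuracy given test accuracy acc on n records.

    Normal approximation to the binomial (Wilson score interval).
    """
    denom = 2.0 * (n + z * z)
    center = 2.0 * n * acc + z * z
    spread = z * math.sqrt(z * z + 4.0 * n * acc - 4.0 * n * acc * acc)
    return ((center - spread) / denom, (center + spread) / denom)


def error_difference_interval(e1, n1, e2, n2, z=1.96):
    """Confidence interval for d = e1 - e2, assuming independent error estimates.

    If the interval contains 0, the difference is not statistically
    significant at the confidence level implied by z.
    """
    d = e1 - e2
    variance = e1 * (1.0 - e1) / n1 + e2 * (1.0 - e2) / n2
    half_width = z * math.sqrt(variance)
    return (d - half_width, d + half_width)


# Illustrative numbers only, not measured results.
low, high = accuracy_interval(0.8, 100)
d_low, d_high = error_difference_interval(0.10, 200, 0.05, 200)
significant = not (d_low <= 0.0 <= d_high)
```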