Question
Help with Exercise 2
Exercise 1 for Reference:
Exercise 1: a) Use the machine learning algorithms k-NN and Naive Bayes to classify multiphase flow patterns, using the database BDOShohamIML.csv, and evaluate the performance. b) Apply parameter optimization to (a) and evaluate the performance. c) Explain the confusion matrix and metrics obtained in (a) and (b), that is, before and after parameter optimization.
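Since BDOShohamIML.csv is not attached here, the following is a minimal, runnable sketch of the pipeline the exercise asks for, with a synthetic dataset from sklearn's make_classification standing in for the real file. The sample count, feature count, and five-class setup are placeholder assumptions; with the real file, X and y would come from its feature columns and flow-pattern label column. Unlike the notebook below, this sketch evaluates on a held-out test set rather than on the training data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for BDOShohamIML.csv (placeholder sizes and classes):
# five flow-pattern classes, six numeric features.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Hold out a test set so accuracy is not measured on the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

for name, clf in [('k-NN', KNeighborsClassifier(n_neighbors=3)),
                  ('Naive Bayes', GaussianNB())]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name, 'accuracy:', accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
```

The same train/test split should be reused for both classifiers so their accuracies are directly comparable.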
Code for Exercise 1:
In [14]:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix

# Note: the notebook loads a glioblastoma expression dataset rather than
# the BDOShohamIML.csv file named in the exercise statement.
data = pd.read_csv('Data_Glioblastoma5Patients_SC.csv')
print('Shape:', data.shape)
data.head()
```

```
Shape: (430, 5949)
Out[14]:
        A2M      AAAS      AAK1      AAMP      AARS    AARSD1     AASDH  AASDHPPT  ...
0  -3.80147 -3.889900 -3.985616  2.651558  2.170748 -2.550822  4.807330  3.961170  ...
1  -3.80147 -3.889900 -3.158708  2.358992 -6.041792 -0.056092  3.606735 -2.632250  ...
2  -3.80147 -3.889900  1.733125 -5.820241 -6.041792 -0.576957 -2.473517 -4.354127  ...
3  -3.80147 -3.889900 -1.665669  3.514271 -6.041792 -3.699171  4.509461 -4.354127  ...
4  -3.80147  3.742495 -2.166992 -5.820241  2.094729  4.021873  5.535007  4.019633  ...

5 rows x 5949 columns
```

In [22]:

```python
# a) k-NN: fit on all rows, using the last column as the class label
clf1 = KNeighborsClassifier(n_neighbors=3).fit(
    data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
y_pred = clf1.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

```
Accuracy score: 0.9506607929515418
Out[22]:
array([[ 973,    0,   26,    0,    0,   34],
       [   0,  121,    1,    3,    0,    0],
       [   0,    1,  550,   41,    0,    2],
       [  67,    9,   38, 2768,    4,   19],
       [   0,    0,    0,    2,  136,    2],
       [   0,   20,    2,    1,    8,  847]], dtype=int64)
```

In [15]:

```python
# b) Grid search over k-NN hyperparameters
leaf_size = list(range(1, 10))
n_neighbors = list(range(1, 5))
hyperparameters = dict(leaf_size=leaf_size, n_neighbors=n_neighbors)
clf2 = KNeighborsClassifier()
clf3 = GridSearchCV(clf2, hyperparameters, cv=5)
best_model = clf3.fit(data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
print('Best leaf_size:', best_model.best_estimator_.get_params()['leaf_size'])
print('Best n_neighbors:', best_model.best_estimator_.get_params()['n_neighbors'])
```

In [ ]:

```python
# Refit with the tuned hyperparameters and re-evaluate
clf4 = KNeighborsClassifier(n_neighbors=3, leaf_size=1).fit(
    data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
y_pred = clf4.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
print('Confusion Matrix:')
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

In [26]:

```python
# a) Gaussian Naive Bayes with default parameters
clf1 = GaussianNB().fit(data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
y_pred = clf1.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

```
Accuracy score: 0.6754185022026432
Out[26]:
array([[ 879,    0,    0,  143,    1,   10],
       [   0,  121,    0,    4,    0,    0],
       [   1,    3,  471,  115,    4,    0],
       [ 124,   53,  192, 2228,  240,   68],
       [   0,    0,    0,   11,  129,    0],
       [ 307,    0,    9,  488,   69,    5]], dtype=int64)
```

In [27]:

```python
# b) Grid search over the var_smoothing parameter
hyperparameters = {'var_smoothing': np.logspace(0, -9, num=100)}
clf2 = GaussianNB()
clf3 = GridSearchCV(clf2, hyperparameters, cv=5)
best_model = clf3.fit(data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
print('Best var_smoothing:', best_model.best_estimator_.get_params()['var_smoothing'])
```

```
Best var_smoothing: 1.873817422860387e-09
```

In [20]:

```python
# Refit with the tuned var_smoothing and re-evaluate
clf4 = GaussianNB(var_smoothing=1.2328467394420635e-09).fit(
    data.iloc[:, :-1].values, data.iloc[:, -1:].values.ravel())
y_pred = clf4.predict(data.iloc[:, :-1].values)
print('Accuracy score:', accuracy_score(data.iloc[:, -1:].values, y_pred))
print('Confusion Matrix:')
confusion_matrix(data.iloc[:, -1:].values, y_pred)
```

```
Accuracy score: 0.6755947136563877
Confusion Matrix:
Out[20]:
array([[ 879,    0,    0,  143,    1,   10],
       [   0,  121,    0,    4,    0,    0],
       [   1,    2,  471,  116,    4,    0],
       [ 124,   52,  192, 2229,  240,   68],
       [   0,    0,    0,   11,  129,    0],
       [ 307,    0,    9,  488,   69,    5]], dtype=int64)
```

c) Explain the confusion matrix and metrics before and after parameter optimization

For k-NN, the accuracy was identical before and after parameter optimization, a high 0.95: the tuned model kept n_neighbors=3, and leaf_size only affects the speed of the neighbor search, not its result. For Naive Bayes, accuracy was essentially unchanged by tuning var_smoothing (0.6754 before, 0.6756 after). In both confusion matrices the rows are the true classes and the columns the predicted classes, so the diagonal counts the correct predictions; k-NN's much heavier diagonal confirms that it is the more accurate of the two classifiers on this dataset.
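To make the part (c) explanation concrete, per-class precision and recall can be read directly off a confusion matrix: for class i, recall is the diagonal entry divided by the row-i sum (how much of the true class was found), and precision is the diagonal entry divided by the column-i sum (how much of what was predicted as class i really was). A short sketch using the k-NN confusion matrix from part (a):

```python
import numpy as np

# k-NN confusion matrix from part (a); rows = true class, columns = predicted.
cm = np.array([[ 973,    0,   26,    0,    0,   34],
               [   0,  121,    1,    3,    0,    0],
               [   0,    1,  550,   41,    0,    2],
               [  67,    9,   38, 2768,    4,   19],
               [   0,    0,    0,    2,  136,    2],
               [   0,   20,    2,    1,    8,  847]])

diag = np.diag(cm)
accuracy = diag.sum() / cm.sum()        # overall accuracy = correct / total
recall = diag / cm.sum(axis=1)          # per-class recall (row-wise)
precision = diag / cm.sum(axis=0)       # per-class precision (column-wise)

print('accuracy :', round(accuracy, 4))  # ~0.9507, matching the notebook
print('recall   :', np.round(recall, 3))
print('precision:', np.round(precision, 3))
```

The same computation applied to the Naive Bayes matrices would show which classes drive its lower overall accuracy; sklearn's classification_report produces the same per-class figures directly from labels and predictions.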