Question


Help with Exercise 2

Exercise 1 for Reference:

Exercise 1: a) Use the Machine Learning algorithms k-NN and Naive Bayes to classify multiphase flow patterns, using the database BDOShohamIML.csv, and evaluate the performance. b) Apply parameter optimization to (a) and evaluate the performance. c) Explain the Confusion Matrix and metrics obtained in (a) and (b), that is, before and after parameter optimization.
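Before the reference code, a minimal sketch of part (a) with a held-out test set. BDOShohamIML.csv is not available here, so `make_classification` stands in for the flow-pattern features and labels (an assumption); with the real CSV you would read the feature columns and the label column instead.

```python
# Minimal sketch of Exercise 1(a) with a held-out test split.
# Synthetic data stands in for BDOShohamIML.csv (assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=6, n_classes=3, random_state=0)
# With the real file (label assumed in the last column):
# df = pd.read_csv('BDOShohamIML.csv'); X = df.iloc[:, :-1]; y = df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

knn_acc = accuracy_score(
    y_test, KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test))
nb_acc = accuracy_score(
    y_test, GaussianNB().fit(X_train, y_train).predict(X_test))
print(f'k-NN test accuracy: {knn_acc:.3f}')
print(f'Naive Bayes test accuracy: {nb_acc:.3f}')
```

Evaluating on a held-out split avoids the optimistic bias of scoring a model on the same data it was fit on, which is what the reference code below does.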

Code for Exercise 1:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix

data = pd.read_csv('Data_Glioblastoma5Patients_SC.csv')
print('Shape:', data.shape)
data.head()
# Shape: (430, 5949)

X = data.iloc[:, :-1].values
y = data.iloc[:, -1:].values.ravel()

# k-NN
# a)
clf1 = KNeighborsClassifier(n_neighbors=3).fit(X, y)
y_pred = clf1.predict(X)
print('Accuracy score:', accuracy_score(y, y_pred))
confusion_matrix(y, y_pred)
# Accuracy score: 0.9506607929515418
# 6x6 confusion matrix with diagonal 973, 121, 550, 2768, 136, 847

# b)
leaf_size = list(range(1, 10))
n_neighbors = list(range(1, 5))
hyperparameters = dict(leaf_size=leaf_size, n_neighbors=n_neighbors)
clf2 = KNeighborsClassifier()
clf3 = GridSearchCV(clf2, hyperparameters, cv=5)
best_model = clf3.fit(X, y)
print('Best leaf_size:', best_model.best_estimator_.get_params()['leaf_size'])
print('Best n_neighbors:', best_model.best_estimator_.get_params()['n_neighbors'])

clf4 = KNeighborsClassifier(n_neighbors=3, leaf_size=1).fit(X, y)
y_pred = clf4.predict(X)
print('Accuracy score:', accuracy_score(y, y_pred))
print('Confusion Matrix:')
confusion_matrix(y, y_pred)

# Naive Bayes
# a)
clf1 = GaussianNB().fit(X, y)
y_pred = clf1.predict(X)
print('Accuracy score:', accuracy_score(y, y_pred))
confusion_matrix(y, y_pred)
# Accuracy score: 0.6754185022026432
# array([[ 879,    0,    0,  143,    1,   10],
#        [   0,  121,    0,    4,    0,    0],
#        [   1,    3,  471,  115,    4,    0],
#        [ 124,   53,  192, 2228,  240,   68],
#        [   0,    0,    0,   11,  129,    0],
#        [ 307,    0,    9,  488,   69,    5]], dtype=int64)

# b)
hyperparameters = {'var_smoothing': np.logspace(0, -9, num=100)}
clf2 = GaussianNB()
clf3 = GridSearchCV(clf2, hyperparameters, cv=5)
best_model = clf3.fit(X, y)
print('Best var_smoothing:', best_model.best_estimator_.get_params()['var_smoothing'])
# Best var_smoothing: 1.873817422860387e-09

clf4 = GaussianNB(var_smoothing=1.2328467394420635e-09).fit(X, y)
y_pred = clf4.predict(X)
print('Accuracy score:', accuracy_score(y, y_pred))
print('Confusion Matrix:')
confusion_matrix(y, y_pred)
# Accuracy score: 0.6755947136563877
# Confusion matrix essentially identical to the one in (a)
```

c) Explain the Confusion Matrix and metrics before and after (a) and (b):

The accuracy score associated with the confusion matrix for k-NN was identical before and after parameter optimization, a high 0.95. The accuracy score associated with the confusion matrix for Naive Bayes was very close before and after parameter optimization, around 0.68 in both cases. These results indicate that the higher prediction accuracy on this dataset is obtained with the k-NN method.
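To make the part (c) explanation concrete, a small sketch of how per-class metrics follow from a confusion matrix. The label vectors below are hypothetical stand-ins; in the exercise they would be the `y`/`y_pred` pairs from the models above.

```python
# Sketch for part (c): per-class metrics derived from a confusion matrix.
# y_true / y_pred here are small hypothetical stand-ins.
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Rows are the true classes, columns the predicted classes: the diagonal
# holds correct predictions, off-diagonal cells are misclassifications.

# Precision, recall, and F1 per class follow directly from the matrix.
print(classification_report(y_true, y_pred))

# Accuracy is the diagonal sum over the total count.
acc = np.trace(cm) / cm.sum()
```

Reading the matrices above this way explains why the two Naive Bayes runs score alike: their confusion matrices are nearly identical, so every derived metric is too.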

