Question
# Train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=..., random_state=...)
# The fraction of the data given by test_size is used as testing data; the remainder is training data. The selection is made randomly.
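A minimal sketch of the random hold-out split described in the comment above. The toy arrays and the values `test_size=0.2` and `random_state=42` are assumptions chosen for illustration; the values used in the original listing are not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption): 100 samples, 3 features, binary labels.
X_demo = np.arange(300).reshape(100, 3)
y_demo = np.array([0, 1] * 50)

# Hold out 20% of the rows at random; random_state makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
```

With 100 rows and `test_size=0.2`, 80 rows go to training and 20 to testing.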
#
from sklearn.ensemble import RandomForestClassifier
rf_default = RandomForestClassifier(random_state=...)
rf_default.fit(X_train, y_train)
y_predict_rf = rf_default.predict(X_test)
#
import seaborn as sns
from sklearn import metrics

def evaluate_model(y_predict, y_test):
    # Evaluate the performance of the model using the test data.
    # Use accuracy, precision, recall and the confusion matrix as performance metrics.
    confusion_matrix = metrics.confusion_matrix(y_test, y_predict)
    sns.heatmap(confusion_matrix, annot=True, fmt="d")
    print("Accuracy: {:f}".format(metrics.accuracy_score(y_test, y_predict)))
    print("Precision: {:f}".format(metrics.precision_score(y_test, y_predict)))
    print("Recall: {:f}".format(metrics.recall_score(y_test, y_predict)))
    print("Confusion Matrix:")
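To make the four metrics named in the function concrete, here is a small sketch that derives accuracy, precision and recall directly from the confusion-matrix counts for a binary problem. The helper name `binary_metrics` and the toy labels are hypothetical and are not part of the original code, which relies on `sklearn.metrics` instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus accuracy, precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "tn": tn, "fp": fp, "fn": fn, "tp": tp,
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs),
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy labels (assumption): 1 is the positive class.
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 0, 0, 1]
result = binary_metrics(y_true_demo, y_pred_demo)
print(result)
```

Here the counts are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all three scores come out at 0.75.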
#
evaluate_model(y_predict_rf, y_test)
# We obtain the highest accuracy, precision and recall. However, we can use grid-search cross-validation to check our model's performance again.
# According to Breiman, who proposed Random Forest, max_features and n_estimators are the most important parameters of Random Forest. We can try to optimize them.
# In addition, we may try to balance the class weights to overcome the imbalanced-data problem.
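The idea in the comments above (tune `max_features` and `n_estimators` with cross-validated grid search while balancing class weights) can be sketched as follows. The synthetic data, the grid values, the number of folds and the `random_state` values are all assumptions for illustration and are not taken from the original listing.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced data (assumption): roughly an 80% / 20% class split.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Small illustrative grid over the two parameters named above.
param_grid = {"max_features": ["sqrt", "log2"], "n_estimators": [10, 25]}

# "balanced_subsample" recomputes class weights on each tree's bootstrap sample.
forest = RandomForestClassifier(class_weight="balanced_subsample", random_state=0)

# Stratified folds keep the class ratio similar in every fold.
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(forest, param_grid, cv=folds, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```

Every combination in the grid (four here) is scored on each fold, and `best_params_` reports the combination with the highest mean score.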
#
params = {
    # "auto" (an alias of "sqrt" for classifiers) was removed in scikit-learn 1.3.
    "max_features": ["sqrt", "log2"],
    "n_estimators": [...],
}
#
from sklearn.model_selection import GridSearchCV, StratifiedKFold
rf_default = RandomForestClassifier(class_weight="balanced_subsample", random_state=...)
stratified_kfold = StratifiedKFold(n_splits=..., shuffle=True, random_state=...)
grid_search = GridSearchCV(rf_default, params, n_jobs=..., cv=stratified_kfold, verbose=...)
grid_search_results = grid_search.fit(X_new, y.values.ravel())
#
target = "class"
X = mushroom_data.drop(columns=target)
y = mushroom_data[target]
print(f"Y shape {y.shape}")
print(f"X shape {X.shape}")
#
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=..., random_state=...)
print(f"shape of X Train {X_train.shape}")
print(f"shape of X Test {X_test.shape}")
print(f"shape of Y Train {y_train.shape}")
print(f"shape of Y Test {y_test.shape}")
#
acc_baseline = y_train.value_counts(normalize=True).max()
print(f"Accuracy of baseline {acc_baseline}")
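The baseline above is the accuracy obtained by always predicting the most frequent class, which is what `value_counts(normalize=True).max()` computes on a pandas Series. A dependency-free sketch of the same calculation is given below; the helper name and the toy labels are hypothetical.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Toy labels (assumption): 6 samples of one class and 4 of the other.
labels_demo = ["e"] * 6 + ["p"] * 4
baseline = majority_baseline_accuracy(labels_demo)
print(baseline)
```

Any trained model should beat this number; a classifier that scores close to it has learned little beyond the class frequencies.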
#
from sklearn.preprocessing import OrdinalEncoder
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
clf = make_pipeline(
    OrdinalEncoder(),
    RandomForestClassifier(random_state=...),
)
params = {
    "randomforestclassifier__n_estimators": range(...),
    "randomforestclassifier__max_depth": range(...),
}
params
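The grid keys above follow scikit-learn's `<step name>__<parameter>` convention, where `make_pipeline` names each step after its lowercased class name. A short sketch of that naming is shown below; the values `random_state=0` and `n_estimators=20` are assumptions used only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

pipe = make_pipeline(OrdinalEncoder(), RandomForestClassifier(random_state=0))

# make_pipeline derives each step name from the lowercased class name.
step_names = list(pipe.named_steps)
print(step_names)

# A step's parameter is addressed as "<step name>__<parameter>".
pipe.set_params(randomforestclassifier__n_estimators=20)
print(pipe.named_steps["randomforestclassifier"].n_estimators)
```

This is why `GridSearchCV` can tune the forest inside the pipeline: it passes the same prefixed keys to `set_params` for every candidate.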
#
# summarize results
print("Best: %f using %s" % (grid_search_results.best_score_, grid_search_results.best_params_))
#
from sklearn.model_selection import GridSearchCV
model = GridSearchCV(
    clf,
    param_grid=params,
    cv=...,
    n_jobs=...,
    verbose=...,
)
model
#
model.fit(X_train, y_train)
#
import pandas as pd
cv_results = pd.DataFrame(model.cv_results_)
cv_results.sort_values(by="rank_test_score")
#
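# evaluate_model(y_pred, y)
cv_results.sort_values(by="rank_test_score")
# "sqrt" replaces "auto", which meant the same for classifiers and was removed in scikit-learn 1.3.
rf_model = RandomForestClassifier(class_weight="balanced_subsample", max_features="sqrt", n_estimators=...,
                                  random_state=...)
# Fit the newly configured model (the listing fitted rf_default here); predictions are on the training data.
rf_model.fit(X, y)
y_pred = rf_model.predict(X)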
#
import matplotlib.pyplot as plt
features = X_test.columns
importances = model.best_estimator_.named_steps["randomforestclassifier"].feature_importances_
feat_imp = pd.Series(importances, index=features).sort_values()
feat_imp.tail().plot(kind="barh")
plt.xlabel("Gini Importance")
plt.ylabel("Feature")
plt.title("Feature Importance");

The Python code given above relates to random forest classification in a data science course.
Please interpret this code and prepare a report and a presentation according to the topics and code it covers.