Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

#Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt
sns.barplot(x="class", y=data["class"].index, ...)

#Data Visualization
import seaborn as sns
import matplotlib.pyplot as plt
# NOTE(review): x="class" with y set to the frame's row index is unusual —
# a per-class count (sns.countplot) is probably what was intended; also
# `data` and `mushroom_data` look like two names for the same frame — confirm.
sns.barplot(x="class", y=data["class"].index, palette='mako', data=mushroom_data)
#The number of poisonous mushrooms is almost twice the number of normal mushrooms. There is an imbalanced-data problem.
#We will be using Matplotlib pyplot and Seaborn to plot our data.
#%%
from sklearn import preprocessing
#Label encoding is used to convert categorical features to numerical values.
def label_encode_fit(mushroom_data, columns):
    """Label-encode the given columns of a DataFrame.

    Parameters
    ----------
    mushroom_data : pandas.DataFrame whose listed columns are categorical.
    columns : iterable of column names to encode.

    Returns
    -------
    (result, encoders) : the encoded copy of the frame, and a dict mapping
    each column name to its fitted LabelEncoder so the integer codes can be
    inverted or reused on new data later.
    """
    # Work on a copy so the caller's frame is left untouched.
    result = mushroom_data.copy()
    encoders = {}
    for column in columns:
        encoder = preprocessing.LabelEncoder()
        # fit_transform maps each distinct category to an integer 0..k-1.
        result[column] = encoder.fit_transform(result[column])
        encoders[column] = encoder
    return result, encoders
#%%
# Encode every column of the raw frame; keep the fitted encoders so the
# integer labels can be mapped back to the original categories later.
data1, encoders1= label_encode_fit(data,data.columns)
# Quick visual check of the encoded values.
data1.head(10)
#%%
def correlation_map(mushroom_data, method):
    """Plot a correlation heatmap of the frame, with columns ordered by
    their correlation with the 'class' target (strongest first).

    Parameters
    ----------
    mushroom_data : pandas.DataFrame of numeric (label-encoded) columns,
        including a 'class' column.
    method : correlation method forwarded to DataFrame.corr
        (e.g. 'pearson', 'spearman', 'kendall').
    """
    corr = mushroom_data.corr(method)
    # Order columns by their correlation with the target so the strongest
    # predictors appear first along both heatmap axes.
    ix = corr.sort_values('class', ascending=False).index
    df_sorted_by_correlation = mushroom_data.loc[:, ix]
    corr = df_sorted_by_correlation.corr(method)
    plt.subplots(figsize=(18, 14))
    with sns.axes_style("white"):
        # Display a correlation heatmap with the coefficients annotated.
        ax = sns.heatmap(corr, annot=True)
    plt.show()
#%%
# Spearman suits the integer-encoded (ordinal-ish) features better than Pearson.
correlation_map(data1, method="spearman")
#Gill_size has the highest correlation with class. It should be included in the model.
#There are some highly correlated variable pairs, such as gill-color & ring-type, gill-color & bruises, and bruises & stalk-surface-below-ring. These highly correlated variables should be discarded from the model to obtain more accurate results.
#%%
y = data1[['class']] # contains only "class", the target variable
X = data1.iloc[:,1:] # contains the independent variables (everything after column 0)
#%%
from sklearn.feature_selection import SelectKBest
import numpy as np
def SelectKBestCustomized(mushroom_data, k, score_func, target="class"):
    """Return the names of the k highest-scoring features.

    Parameters
    ----------
    mushroom_data : pandas.DataFrame containing the target column.
    k : number of features to keep.
    score_func : scoring function forwarded to sklearn's SelectKBest
        (e.g. mutual_info_classif).
    target : name of the target column (dropped from the feature matrix).

    Returns
    -------
    list of the selected column names, in their original column order.
    """
    X = mushroom_data.drop(columns=target)
    y = mushroom_data[target]
    np.random.seed(123)  # make mutual-information scores reproducible
    fs = SelectKBest(score_func=score_func, k=k)
    fs.fit(X, y)
    # Boolean mask aligned with X.columns: True for the selected features.
    mask = fs.get_support()
    # `keep` instead of the original `bool`, which shadowed the builtin.
    selected_features = [feature for keep, feature in zip(mask, X.columns) if keep]
    return selected_features
#%%
from sklearn.feature_selection import mutual_info_classif
# Mutual information between each feature and the target; higher scores
# mean the feature carries more information about 'class'.
mutual_info_classif(X, y, random_state=123)
#%%
# Keep the 9 features with the highest mutual-information scores.
mutual_info_selection = SelectKBestCustomized(data1,9, mutual_info_classif)
#%%
# Display the selected feature names.
mutual_info_selection
#%%
# Hand-written list of the 9 features chosen above.
# NOTE(review): this hard-coded list should match mutual_info_selection —
# re-check the two agree if the data or random seed changes.
X_new = X[['odor','gill-size',
'gill-color',
'stalk-surface-above-ring',
'stalk-surface-below-ring',
'stalk-color-above-ring',
'stalk-color-below-ring',
'ring-type',
'spore-print-color']]
#%%
# Same selected features plus the target, kept together for plotting below.
data_selected_features = data1[['odor',
'gill-size',
'gill-color',
'stalk-surface-above-ring',
'stalk-surface-below-ring',
'stalk-color-above-ring',
'stalk-color-below-ring',
'ring-type',
'spore-print-color',
'class']]
#%%
# One barplot per selected feature (and the target), laid out on a 5x3 grid.
a = 5  # number of rows in the subplot grid
b = 3  # number of columns in the subplot grid
c = 1  # subplot counter (1-based, as plt.subplot expects)
fig = plt.figure(figsize=(14, 22))
for i in data_selected_features:
    plt.subplot(a, b, c)
    plt.xlabel(i)
    # NOTE(review): y is the frame's row index rather than a count or
    # aggregate — a countplot per feature is probably what was intended;
    # confirm against the data before relying on these charts.
    sns.barplot(x=i, y=data_selected_features[i].index, palette='Set3_r', hue="class", data=data_selected_features)
    c = c + 1
plt.show()

The Python code given above is related to random forest classification in the data science course.
Please interpret this code and prepare a report according to the topics covered and the code shown.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Graph Database Modeling With Neo4j

Authors: Ajit Singh

2nd Edition

B0BDWT2XLR, 979-8351798783

More Books

Students also viewed these Databases questions