

Posted on Sep 06, 2024


SMS Spam Classification: Detecting Unwanted Messages

Life Cycle of the Project

Steps to be Performed

1. Introduction
2. Problem Statement
3. Data Checks to Perform
4. Data Cleaning
5. EDA
6. Text Preprocessing
7. Model Training
8. Evaluation
9. Conclusion
10. Author Message

1. Introduction

This Kaggle notebook presents a step-by-step guide to building an efficient SMS spam classification model using the SMS Spam Collection dataset. By the end of the notebook, you will have a working classifier that helps filter out unwanted messages and makes text messaging smoother and safer.

2. Problem Statement

The primary goal of this notebook is to develop a predictive model that accurately classifies incoming SMS messages as either ham (legitimate) or spam. We will use the SMS Spam Collection dataset, which consists of 5,574 SMS messages tagged with their respective labels. Note that the CSV file loaded in this notebook reports 5,572 rows in the df.info() output of Section 4.1.
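To make the goal concrete, here is a minimal pure-Python sketch of one possible approach: a multinomial Naive Bayes classifier with add-one smoothing. The excerpt above does not show which model the notebook trains, so the algorithm, the function names (tokenize, train_naive_bayes, predict), and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def predict(text, word_counts, class_counts, vocabulary):
    """Return the class with the highest log-posterior under add-one smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy messages, not rows from the dataset
train_texts = [
    "win a free prize now",
    "free entry to win cash",
    "are you coming home now",
    "see you at lunch today",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_texts, train_labels)
prediction = predict("free cash prize", *model)
print(prediction)
```

A real notebook would more likely rely on a library implementation and a proper train/test split; the sketch only shows the shape of the task, which is text in and a ham or spam label out.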

3. Data Checks to Perform

3.1 Import Necessary Libraries

# Importing necessary libraries
import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation and analysis
import matplotlib.pyplot as plt  # For data visualization
%matplotlib inline

# Importing WordCloud for text visualization
from wordcloud import WordCloud

# Importing NLTK for natural language processing
import nltk
from nltk.corpus import stopwords  # For stopwords

# Downloading NLTK data
nltk.download('stopwords')  # Downloading stopwords data
nltk.download('punkt')  # Downloading tokenizer data

/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
True
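The stopwords and punkt downloads suggest that later steps tokenize each message and drop common words. The sketch below shows that idea in plain Python so it runs without any downloads. The regex tokenizer, the tiny STOPWORDS set, and the preprocess function are stand-ins chosen for illustration; they are not NLTK's tokenizer or its English stopword list, and the notebook's actual preprocessing is not shown in this excerpt.

```python
import re

# Hypothetical mini stopword list; NLTK's English list is much longer
# and does not necessarily contain these exact entries.
STOPWORDS = {"a", "the", "to", "in", "u", "is", "and", "so"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


# Example message taken from the data preview in this post
tokens = preprocess("Ok lar... Joking wif u oni...")
print(tokens)
```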


3.2 Load the Data

df = pd.read_csv('/kaggle/input/sms-spam-collection-dataset/spam.csv', encoding='latin1')
styled_df = df.head()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

| | v1 | v2 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat... | nan | nan | nan |
| 1 | ham | Ok lar... Joking wif u oni... | nan | nan | nan |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's | nan | nan | nan |
| 3 | ham | U dun say so early hor... U c already then say... | nan | nan | nan |
| 4 | ham | Nah I don't think he goes to usf, he lives around here though | nan | nan | nan |
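The encoding argument in the read_csv call is worth a closer look. A likely reason for passing a Latin-1 encoding is that the file contains bytes that are not valid UTF-8, which is the default pandas assumes. The standard-library sketch below reproduces that behavior on a hypothetical message containing a pound sign; the sample text is an assumption and is not taken from the dataset.

```python
# Hypothetical message; in Latin-1 the pound sign is the single byte 0xA3.
raw_bytes = "Win £1000 cash".encode("latin-1")

try:
    raw_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xA3 on its own is not a valid UTF-8 sequence, so decoding fails.
    utf8_ok = False

# Latin-1 maps every byte value to a character, so this decode cannot fail.
decoded = raw_bytes.decode("latin-1")

print(utf8_ok, decoded)
```

Because Latin-1 never raises a decode error, it will also accept a file that is really in some other encoding and silently produce wrong characters, so it is worth spot-checking a few messages after loading.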


4. Data Cleaning

4.1 | Data Info

df.info()

RangeIndex: 5572 entries, 0 to 5571
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   v1          5572 non-null   object
 1   v2          5572 non-null   object
 2   Unnamed: 2  50 non-null     object
 3   Unnamed: 3  12 non-null     object
 4   Unnamed: 4  6 non-null      object
dtypes: object(5)
memory usage: 217.8+ KB

4.2 | Drop the Columns

df.drop(columns=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], inplace=True)
styled_df = df.head(5).style

# Modify the color and background color of the table headers (th)
styled_df.set_table_styles([
    {"selector": "th", "props": [("color", 'Black'), ("background-color", "#FF00CC"), ('font-weight', 'bold')]}
])

| | v1 | v2 |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives around here though |
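The reasoning behind sections 4.1 and 4.2 is that df.info() shows three columns that are almost entirely empty (50, 12, and 6 non-null values out of 5,572 rows), so they are dropped. The sketch below replays that check-then-drop logic with the standard csv module on a hypothetical three-row sample. The sample data, the function names, and the 0.5 threshold are assumptions for illustration; the notebook itself simply names the columns to drop.

```python
import csv
import io

# Hypothetical sample mimicking the layout of spam.csv: two real columns
# followed by three mostly empty trailing columns.
SAMPLE_CSV = """v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
ham,Ok lar... Joking wif u oni...,,,
spam,Free entry in 2 a wkly comp,,,
ham,U dun say so early hor...,extra text,,
"""


def non_empty_counts(csv_text):
    """Count non-empty cells per column, similar in spirit to df.info()."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value:
                counts[name] += 1
    return counts


def drop_sparse_columns(csv_text, min_fraction=0.5):
    """Keep only columns filled in at least min_fraction of the rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = non_empty_counts(csv_text)
    keep = [name for name, n in counts.items() if n / len(rows) >= min_fraction]
    return [{name: row[name] for name in keep} for row in rows]


counts = non_empty_counts(SAMPLE_CSV)
cleaned = drop_sparse_columns(SAMPLE_CSV)
print(counts)
print(cleaned[0])
```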

4.3 | Rename the Column

# Rename the columns
df.rename(columns={'v1': 'target', 'v2': 'text'}, inplace=True)

4.4 | Convert the target variable

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['target'] = encoder.fit_transform(df['target'])
styled_df = df.head().style

# Modify the color and background color of the table headers (th)
styled_df.set_table_styles([
    {"selector": "th", "props": [("color", 'Black'), ("background-color", "#FF00CC"), ('font-weight', 'bold')]}
])

| | target | text |
|---|---|---|
| 0 | 0 | Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat... |
| 1 | 0 | Ok lar... Joking wif u oni... |
| 2 | 1 | Free entry in 2 a wkly comp to win FA Cup final t |
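The output above shows ham mapped to 0 and spam mapped to 1. LabelEncoder assigns integer codes to the sorted unique class labels, which is why "ham" comes first. The pure-Python sketch below mirrors that mapping; the helper names (fit_label_mapping, transform, inverse_transform) are hypothetical stand-ins and not part of scikit-learn.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, in sorted order of the labels."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def transform(labels, mapping):
    """Replace each label with its integer code."""
    return [mapping[label] for label in labels]


def inverse_transform(codes, mapping):
    """Recover the original labels from their integer codes."""
    reverse = {index: label for label, index in mapping.items()}
    return [reverse[code] for code in codes]


# Labels of the first five rows shown in the data preview
targets = ["ham", "ham", "spam", "ham", "ham"]

mapping = fit_label_mapping(targets)
encoded = transform(targets, mapping)
print(mapping, encoded)
```

Keeping the mapping around, as the fitted encoder does, makes it possible to turn model predictions back into readable labels later.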

Can you explain this code?
