

Question

1 Approved Answer

Posted on Sep 30, 2024

Problem 3: Tokenization

If your regular expression is right, all the tests defined below should pass:

    init_test()

    TEST_EXAMPLES = [
        ('This is a test!', ['This', 'is', 'a', 'test', '!']),
        ('Is this a test?', ['Is', 'this', 'a', 'test', '?']),
        ("I don't think this is a test",
         ['I', 'do', "n't", 'think', 'this', 'is', 'a', 'test']),
        # (non-ASCII example; its accented characters did not survive extraction)
        # ("Th y phi c c a t i l y c a l n",
        #  ['Th y', 'phi', 'c', 'c a', 't i', 'l', ' y', '', 'c a', 'l n']),
        ("Is it legal to shout ``Fire!'' in a crowded theater?",
         ['Is', 'it', 'legal', 'to', 'shout', '``', 'Fire', '!', "''",
          'in', 'a', 'crowded', 'theater', '?']),
        ("The word 'very' is very over-used",
         ['The', 'word', "'", 'very', "'", 'is', 'very', 'over', '-', 'used']),
        ("I don't think we'll've been there yet",
         ['I', 'do', "n't", 'think', 'we', "'ll", "'ve", 'been', 'there', 'yet']),
        ("Give me 12 apples, please",
         ['Give', 'me', '12', 'apples', ',', 'please']),
        ("A 20% tip on a $30 tab is 6 dollars",
         ['A', '20%', 'tip', 'on', 'a', '$30', 'tab', 'is', '6', 'dollars']),
        ("They're going to pay us 10% of $120,000 by Jun 4, 2021",
         ['They', "'re", 'going', 'to', 'pay', 'us', '10%', 'of', '$120,000',
          'by', 'Jun', '4', ',', '2021']),
    ]

    @pytest.mark.parametrize('text, toks', TEST_EXAMPLES)
    def test_tokenizer(text, toks):
        assert tokenize(text) == toks

    run_test()

Modify this expression so that it meets the following additional requirements:

- the punctuation marks `` and '' (left double apostrophe and right double apostrophe) should be single tokens
- like n't, the contractions 've, 'll, 're, and 's should be separate tokens
- numbers should be separate tokens, where:
  - a number may start with $ or end with %
  - a number may start with or contain a comma but may not end with one (technically, number tokens shouldn't start with a comma, but it's okay if your transducer allows it)
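The number requirement above can be checked on its own before it is encoded as a transducer. The following is a minimal sketch using Python's standard `re` module rather than the FST library the assignment uses; `NUMBER` and `find_numbers` are hypothetical names introduced only for illustration, and the pattern takes the stricter reading in which a number does not start with a comma.

```python
import re

# Hypothetical helper, not part of the assignment:
# optional leading "$", digits with optional interior commas, optional trailing "%".
# Every comma must be followed by a digit, so a match can never end with a comma.
NUMBER = re.compile(r"\$?\d+(?:,\d+)*%?")

def find_numbers(text):
    """Return every substring of text that matches the number pattern."""
    return NUMBER.findall(text)

print(find_numbers("They're going to pay us 10% of $120,000 by Jun 4, 2021"))
# → ['10%', '$120,000', '4', '2021']
```

Note that the comma after "4" is left out of the number because it is followed by a space, which matches the expected tokens '4', ',', '2021' in the test examples.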

    tok_patterns = {}

    # insert spaces before and after punctuation
    tok_patterns['punct'] = FST.re(r"$^rewrite('':' ' [!?,] '':' ')")

    # insert space before n't
    tok_patterns['contract'] = FST.re(r"$^rewrite('':' ' n\'t)")

    tokenizer = FST.re("$punct @ $contract", tok_patterns)

    def tokenize(s):
        s = list(tokenizer.generate(s))
        if len(s) == 1:
            return s[0].split()
        else:
            return None
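To see why the starter expression needs extending, it can help to emulate its two rewrite rules outside the FST library. The sketch below approximates them with `re.sub` from Python's standard library; `baseline_tokenize` is a hypothetical name, and the emulation is an assumption about how the two rules compose, not the transducer itself.

```python
import re

def baseline_tokenize(text):
    """Approximate the two starter rules with re.sub, then split on whitespace."""
    # Rule 1 (punct): put a space on both sides of "!", "?" and ","
    spaced = re.sub(r"([!?,])", r" \1 ", text)
    # Rule 2 (contract): put a space before "n't"
    spaced = re.sub(r"n't", " n't", spaced)
    return spaced.split()

print(baseline_tokenize("I don't think this is a test"))
# → ['I', 'do', "n't", 'think', 'this', 'is', 'a', 'test']

print(baseline_tokenize("pay us $120,000"))
# → ['pay', 'us', '$120', ',', '000']
```

The second call shows the gap the additional requirements address: a rule that splits on every comma breaks "$120,000" apart, so the punctuation rule has to leave commas inside numbers alone.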
