Question

The following cell defines a regular expression for a simple tokenizer. As written, it divides tokens up at spaces with two exceptions: punctuation marks are tokens by themselves, and the contraction n't is treated as a separate token. Modify this expression so that it meets the following additional requirements:

- the punctuation marks ``, '', and ... (left double apostrophe, right double apostrophe, and ellipsis) should be single tokens
- like n't, the contractions 've, 'll, 're, and 's should be separate tokens
- numbers should be separate tokens, where:
  - a number may start with $ or end with %
  - a number may start with or contain a comma or decimal point but may not end with one (technically, number tokens shouldn't start with a comma, but it's okay if your regular expression allows it)

Since we're using re.findall to separate tokens, be sure to use only non-capturing groups, (?:...).
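The original cell is not reproduced here, so the starting expression is unknown. What follows is only a minimal sketch of one pattern that appears to satisfy the requirements listed above, not the assignment's reference answer. The names TOKEN_RE and tokenize are hypothetical, and the ordering of the alternatives is a design assumption: more specific tokens are listed before the general word and punctuation rules.

```python
import re

# Hypothetical pattern: alternatives are tried left to right, so the more
# specific tokens come first. Only non-capturing groups are used, so that
# re.findall returns whole matches rather than group contents.
TOKEN_RE = re.compile(
    r"""
      \.\.\.                         # ellipsis as a single token
    | ``|''                          # left / right double apostrophes
    | \$?[.,]?\d+(?:[.,]\d+)*%?      # number: optional $, optional %, may not end in . or ,
    | n't                            # the contraction n't
    | '(?:ve|ll|re|s)\b              # the contractions 've, 'll, 're, 's
    | \w+?(?=n't\b)                  # word stem directly before n't (e.g. "did" in "didn't")
    | \w+                            # any other word
    | [^\w\s]                        # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text (hypothetical helper)."""
    return TOKEN_RE.findall(text)


print(tokenize("I didn't pay $1,000.50, but they've said ``wait...'' for 3.5% more."))
# ['I', 'did', "n't", 'pay', '$1,000.50', ',', 'but', 'they', "'ve", 'said',
#  '``', 'wait', '...', "''", 'for', '3.5%', 'more', '.']
```

The number rule permits a leading comma, which the question explicitly tolerates. A trailing comma or period is left for the punctuation rule, so "100." yields the two tokens 100 and a period.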
