Question


The Applied Sciences team is seeking to create a sub-word tokeniser to preprocess free-text fields for downstream analysis. You've been asked to write a function tokenise that does the following:

1. Accepts two arguments, raw_text and window_size, and returns a list of tuples, where each tuple contains the sub-word tokens for one word in raw_text.
2. Splits raw_text into individual words on spaces and punctuation (except apostrophes).
3. Adds angle brackets to both ends of a word to delimit its start and end (e.g. <hello>).
4. Creates sub-word tokens of window_size length. If a word (including the angle brackets) is shorter than window_size, then no sub-word tokens other than the special token (step 5) should be generated.
5. Appends a single special token, containing the entire word with angle brackets, to the end of the sub-word token list.

NB. The implementation and partial solutions will be assessed too. Please don't panic if you can't pass all of the test cases!

Example

    ### Running your function
    >>> tokenise(raw_text="hello, world", window_size=3)
    ### Returns
    ### NOTE: the output below has been formatted for readability.
    ### Your function just needs to output the list of tuples
    [
        ('<he', 'hel', 'ell', 'llo', 'lo>', '<hello>'),
        ('<wo', 'wor', 'orl', 'rld', 'ld>', '<world>'),
    ]

    from typing import List, Tuple

    # Complete the 'tokenise' function below.
    #
    # The type signatures have been completed for you
    # You may use helper functions to modularise your code
    def tokenise(raw_text: str, window_size: int) -> List[Tuple[str, ...]]:
        # Write your code here
        pass

    if __name__ == '__main__':
        ...  # the rest of the test harness is cut off in the original
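Steps 3 to 5 operate on one word at a time, so they can be sketched in isolation. The helper name `word_to_subwords` is hypothetical (it is not part of the assignment), and the sketch assumes the word has already been split out of raw_text and that `window_size` is a positive integer:

```python
from typing import Tuple


def word_to_subwords(word: str, window_size: int) -> Tuple[str, ...]:
    """Steps 3-5 for a single, already-split word (hypothetical helper).

    Assumes window_size >= 1; the specification does not cover other values.
    """
    bracketed = f"<{word}>"  # step 3: delimit the start and end of the word
    # Step 4: every window_size-length slice. The range is empty when the
    # bracketed word is shorter than window_size, so no sub-words are produced.
    windows = tuple(
        bracketed[i:i + window_size]
        for i in range(len(bracketed) - window_size + 1)
    )
    return windows + (bracketed,)  # step 5: append the whole-word token


if __name__ == "__main__":
    print(word_to_subwords("hello", 3))
    # ('<he', 'hel', 'ell', 'llo', 'lo>', '<hello>')
    print(word_to_subwords("a", 4))
    # ('<a>',)
```

One case the specification leaves open: when the bracketed word is exactly window_size characters long, this sketch emits one window identical to the special token (e.g. "hi" with a window of 4 gives ('<hi>', '<hi>')), because only strictly shorter words are excluded.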

Step by Step Solution

Step: 1

To create the tokenise function based on the requirements you've pro...
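A minimal end-to-end sketch of the whole specification follows. It is not the paywalled expert answer. It assumes that "punctuation" means ASCII `string.punctuation`, that any whitespace counts as a space, that letter case is left unchanged, and that a non-positive `window_size` should be rejected; the question states none of these explicitly:

```python
import re
import string
from typing import List, Tuple

# Assumption: separators are any whitespace plus every ASCII punctuation
# character except the apostrophe, so contractions like "don't" stay whole.
_SEPARATORS = re.compile(
    "[\\s" + re.escape(string.punctuation.replace("'", "")) + "]+"
)


def tokenise(raw_text: str, window_size: int) -> List[Tuple[str, ...]]:
    """Return one tuple of sub-word tokens per word in raw_text."""
    if window_size < 1:
        # Assumption: the spec does not define non-positive windows.
        raise ValueError("window_size must be a positive integer")

    result: List[Tuple[str, ...]] = []
    for word in _SEPARATORS.split(raw_text):  # step 2: split into words
        if not word:
            # Leading/trailing separators yield empty strings; skip them.
            continue
        bracketed = f"<{word}>"  # step 3: mark the word boundaries
        # Step 4: slide a window of window_size characters across the word;
        # range() is empty when the bracketed word is shorter than the window.
        windows = [
            bracketed[i:i + window_size]
            for i in range(len(bracketed) - window_size + 1)
        ]
        # Step 5: finish each tuple with the special whole-word token.
        result.append(tuple(windows) + (bracketed,))
    return result


if __name__ == "__main__":
    print(tokenise(raw_text="hello, world", window_size=3))
    print(tokenise(raw_text="don't stop", window_size=4))
```

Compiling one regular expression for the separators keeps step 2 to a single `re.split` call. Punctuation outside ASCII (for example curly quotes) is not in `string.punctuation`, so it would have to be added to the separator class if the free-text fields contain it.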