Question
The Applied Sciences team is seeking to create a sub-word tokeniser to preprocess free-text fields for downstream analysis. You've been asked to write a function tokenise that does the following:

1. Accepts two arguments, raw_text and window_size, and returns a list of tuples, where each tuple contains the sub-word tokens for one word in raw_text.
2. Splits raw_text into individual words on spaces and punctuation (except apostrophes).
3. Adds angle brackets to either end of a word to delimit the start and end of the word (e.g. <hello>).
4. Creates sub-word tokens of window_size length. If a word (including the angle brackets) is shorter than window_size, then no sub-word tokens other than the special token (step 5) should be generated.
5. Appends a single, special token at the end of the sub-word token list containing the entire word with angle brackets.

NB. The implementation and partial solutions will be assessed too. Please don't panic if you can't pass all of the test cases!

Example

### Running your function
>>> tokenise(raw_text="hello, world", window_size=3)
### Returns
### NOTE: the output below has been formatted for readability.
### Your function just needs to output the list of tuples
[
    ('<he', 'hel', 'ell', 'llo', 'lo>', '<hello>'),
    ('<wo', 'wor', 'orl', 'rld', 'ld>', '<world>')
]

    from typing import List, Tuple

    #
    # Complete the 'tokenise' function below.
    #
    # The type signatures have been completed for you.
    # You may use helper functions to modularise your code.
    #

    def tokenise(raw_text: str, window_size: int) -> List[Tuple[str]]:
        # Write your code here
        pass

    if __name__ == '__main__':
        ...
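Steps 3 to 5 for a single word amount to a sliding window over the bracketed word, followed by the whole bracketed word as the special token. The sketch below is illustrative only: subword_tokens is a hypothetical helper name (the stub only requires tokenise), and a window_size below 1 is assumed to be invalid input.

```python
from typing import Tuple


def subword_tokens(word: str, window_size: int) -> Tuple[str, ...]:
    """Sub-word tokens for one word, ending with the whole-word token.

    subword_tokens is a hypothetical helper, not part of the original stub.
    """
    # Assumption: a non-positive window size is treated as an error.
    if window_size < 1:
        raise ValueError("window_size must be a positive integer")
    # Step 3: delimit the word with angle brackets.
    bracketed = f"<{word}>"
    # Step 4: every contiguous slice of length window_size. If the bracketed
    # word is shorter than window_size, the range is empty and no windows appear.
    windows = [
        bracketed[i:i + window_size]
        for i in range(len(bracketed) - window_size + 1)
    ]
    # Step 5: always finish with the special whole-word token.
    return tuple(windows) + (bracketed,)


print(subword_tokens("hello", 3))
# ('<he', 'hel', 'ell', 'llo', 'lo>', '<hello>')
print(subword_tokens("hi", 5))
# ('<hi>',)
```

When the bracketed word is exactly window_size characters long, its only window equals the special token, so that string appears twice; the task text only rules out windows for words that are shorter than window_size.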
Step by Step Solution
Step 1
To create the tokenise function based on the requirements you've pro...
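A minimal sketch of the complete function, written independently of the approved answer, could look like the following. Assumptions: "punctuation" means ASCII string.punctuation (so underscores and hyphens split words, but characters such as an em dash do not), a window_size below 1 raises ValueError, and the return type is written as List[Tuple[str, ...]] because each tuple holds a variable number of tokens (the stub's Tuple[str] strictly denotes a one-element tuple).

```python
import re
import string
from typing import List, Tuple

# Assumption: "punctuation" is ASCII string.punctuation with the apostrophe
# removed, so contractions such as "don't" stay a single word.
_DELIMITERS = re.compile(
    r"[\s" + re.escape(string.punctuation.replace("'", "")) + r"]+"
)


def tokenise(raw_text: str, window_size: int) -> List[Tuple[str, ...]]:
    """Return one tuple of sub-word tokens per word in raw_text."""
    # Assumption: a non-positive window size is treated as an error.
    if window_size < 1:
        raise ValueError("window_size must be a positive integer")

    result: List[Tuple[str, ...]] = []
    # Step 2: split on whitespace and punctuation (apostrophes excluded);
    # filter(None, ...) drops empty strings that re.split leaves at the edges.
    for word in filter(None, _DELIMITERS.split(raw_text)):
        # Step 3: mark the start and end of the word.
        bracketed = f"<{word}>"
        # Step 4: sliding windows; none when bracketed is shorter than window_size.
        windows = [
            bracketed[i:i + window_size]
            for i in range(len(bracketed) - window_size + 1)
        ]
        # Step 5: append the special whole-word token.
        result.append(tuple(windows) + (bracketed,))
    return result


print(tokenise(raw_text="hello, world", window_size=3))
# [('<he', 'hel', 'ell', 'llo', 'lo>', '<hello>'), ('<wo', 'wor', 'orl', 'rld', 'ld>', '<world>')]
```

Splitting with one precompiled character class keeps step 2 in a single place; if the assessment treats other characters (for example Unicode dashes) as punctuation, only _DELIMITERS needs to change.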