Question
Note: Do not filter out punctuation, since those tokens will be exactly the ones we want to consider as potential sentence boundaries!

The data/brown directory includes three English-language text files taken from the Brown Corpus:
- editorial.txt
- fiction.txt
- lore.txt

Each file is one large string of natural language text, with no line breaks or other special symbols to mark where sentence splits occur. In the data set you are working with, sentences can only end with one of 5 characters: period, colon, semi-colon, exclamation point and question mark.

However, there is a catch: not every period represents the end of a sentence. Many abbreviations (U.S.A., Dr., Mon., etc.) can appear in the middle of a sentence, where the period does not indicate the end of the sentence. (If you have a phone that uses autocomplete to type, you may already have had annoying experiences where it automatically capitalized words after these abbreviations!) These texts also have many examples where a colon is not the end of the sentence. The other three punctuation marks are all nearly unambiguously the ends of a sentence (yes, even semi-colons).

For each of the above files, I have also provided a file in the same directory containing the character index (counting from 0 for the first character) of each of the actual sentence ends:
- editorial-eos.txt
- fiction-eos.txt
- lore-eos.txt

Your job is to write a sentence segmenter, and to output the predicted token number of each sentence boundary