
Question


Note: Do not filter out punctuation, since those tokens will be exactly the ones we want to consider as potential sentence boundaries!

The data/brown directory includes three English-language text files taken from the Brown Corpus:

- editorial.txt
- fiction.txt
- lore.txt

These files represent large strings of natural language text, with no line breaks or other special symbols to annotate where sentence splits occur. In the data set you are working with, sentences can only end with one of 5 characters: period, colon, semi-colon, exclamation point and question mark.

However, there is a catch: not every period represents the end of a sentence. Many abbreviations (U.S.A., Dr., Mon., etc., etc.) can appear in the middle of a sentence, where the period does not indicate the end of the sentence. (If you have a phone that uses autocomplete to type, you may already have had annoying experiences where it automatically capitalized words after these abbreviations!) These texts also have many examples where a colon is not the end of the sentence. The other three punctuation marks are all nearly unambiguously the ends of a sentence (yes, even semi-colons).

For each of the above files, I have also provided a file in the same directory containing the character index (counting from 0 for the first character) of each of the actual locations of the ends of sentences:

- editorial-eos.txt
- fiction-eos.txt
- lore-eos.txt

Your job is to write a sentence segmenter, and to output the predicted token number of each sentence boundary
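The rules described in the question can be sketched as a small rule-based baseline. This is only a minimal illustration under stated assumptions, not the expected solution: it assumes the text has already been split on whitespace into tokens, the abbreviation list is a hypothetical and deliberately incomplete sample, and the colon rule (treat a colon as a boundary only when the next token is capitalized) is an assumed heuristic that the assignment does not specify. The names `ABBREVIATIONS`, `is_boundary`, and `segment` are invented for this example.

```python
import re

# The five characters that can end a sentence, per the problem statement.
CANDIDATES = {".", ":", ";", "!", "?"}

# Hypothetical, deliberately incomplete abbreviation list (an assumption).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "mon.", "etc.", "u.s.a."}

# Dotted initialisms such as "U.S." or "U.S.A.".
INITIALISM = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def is_boundary(tokens, i):
    """Return True if tokens[i] is predicted to end a sentence."""
    tok = tokens[i]
    if not tok or tok[-1] not in CANDIDATES:
        return False
    last = tok[-1]
    # '!', '?' and ';' are treated as nearly unambiguous sentence ends.
    if last in {"!", "?", ";"}:
        return True
    if last == ".":
        # A period is a boundary unless the token looks like an abbreviation.
        if tok.lower() in ABBREVIATIONS or INITIALISM.match(tok):
            return False
        return True
    # Colon (assumed heuristic): boundary only at the end of the text
    # or when the following token starts with an uppercase letter.
    if i + 1 >= len(tokens):
        return True
    return tokens[i + 1][0].isupper()


def segment(tokens):
    """Return the token numbers (0-based) of predicted sentence boundaries."""
    return [i for i in range(len(tokens)) if is_boundary(tokens, i)]


if __name__ == "__main__":
    sample = "Dr. Smith lives in the U.S.A. with his dog . He said : hello ; goodbye !"
    print(segment(sample.split()))
```

A baseline like this will miss cases such as an abbreviation that really does end a sentence, so in practice the predictions would be scored against the provided -eos.txt files and the rules refined from the errors.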

