Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

As some of you know well, and others of you may be interested to learn, a number of languages (including Chinese and Japanese) are written

As some of you know well, and others of you may be interested to learn, a number of languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation probleminferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like meetateight and deciding that the best segmentation is meet at eight (and not me et at eight or meet ate ight 1 or any of a huge number of even less plausible alternatives). How could we automate this process? A simple approach that is at least reasonably effective is to find a segmentation that simply maximizes the cumulative quality of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters x = x1x2 xk, will return a number quality(x). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality("me") would be positive while quality("ight") would be negative.) Given a long string of letters y = y1y2 yn, a segmentation of y is a partition of its letters into contiguous blocks of letters, each block corresponding to a word in the segmentation. The total quality of a segmentation is determined by adding up the qualities of each of its blocks. (So we would get the right answer above provided that quality("meet") + quality("at") + quality("eight") was greater than the total quality of any other segmentation of the string.) Give an efficient algorithm that takes a string y and computes a segmentation of maximum total quality. You can treat a single call to the black box computing quality(x) as a single computational step. Prove the correctness of your algorithm and analyze its time complexity.

image text in transcribed

As some of you know well, and others of you may be interested to learn, a number ot languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation problem -inferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like "meetateight" and deciding that the best segmentation is "meet at eight" (and not "me et at eight" or "meet ate ight" or any of a huge number of even less plausible alternatives). How could we automate this process? A simple approach that is at least reasonably effective is to find a segmentation that simply max imizes the cumulative "quality" of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters z-zJZ2..Zk, will return a number quality(x). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality("me") would be positive while quality("ight") would be negative.) Given a long string of letters y-MW2yn, a segmentation of y is a partition of its letters into contiguous blocks of letters, each block corresponding to a word in the segmentation. The total quality of a segmentation is determined by adding up the qualities of each of its blocks. (So we would get the right answer above provided that quality("meet") +quality("at" quality("eight") was greater than the total quality of any other segmentation of the string.) Give an efficient algorithm that takes a string y and computes a segmentation of maximum total quality. You can treat a single call to the black box computing quality(z) as a single computational step. Prove the correctness of your algorithm and analyze its time complexity. As some of you know well, and others of you may be interested to learn, a number ot languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation problem -inferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like "meetateight" and deciding that the best segmentation is "meet at eight" (and not "me et at eight" or "meet ate ight" or any of a huge number of even less plausible alternatives). How could we automate this process? A simple approach that is at least reasonably effective is to find a segmentation that simply max imizes the cumulative "quality" of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters z-zJZ2..Zk, will return a number quality(x). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality("me") would be positive while quality("ight") would be negative.) Given a long string of letters y-MW2yn, a segmentation of y is a partition of its letters into contiguous blocks of letters, each block corresponding to a word in the segmentation. The total quality of a segmentation is determined by adding up the qualities of each of its blocks. (So we would get the right answer above provided that quality("meet") +quality("at" quality("eight") was greater than the total quality of any other segmentation of the string.) Give an efficient algorithm that takes a string y and computes a segmentation of maximum total quality. You can treat a single call to the black box computing quality(z) as a single computational step. Prove the correctness of your algorithm and analyze its time complexity

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Probabilistic Databases

Authors: Dan Suciu, Dan Olteanu, Christopher Re, Christoph Koch

1st Edition

3031007514, 978-3031007514

More Books

Students also viewed these Databases questions

Question

Explain the steps involved in training programmes.

Answered: 1 week ago

Question

4. Who would lead the group?

Answered: 1 week ago

Question

2. What type of team would you recommend?

Answered: 1 week ago