Question

A number of languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation problem: inferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like meetateight and deciding that the best segmentation is meet at eight (and not me et at eight or meet ate ight or any of a huge number of even less plausible alternatives). How could we automate this process?

A simple approach that is at least reasonably effective is to find a segmentation that simply maximizes the cumulative quality of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters x = x1x2...xk, returns a number quality(x) in constant time, O(1). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality("me") would be positive while quality("ight") would be negative.)

Given a long string of letters y = y1y2...yn, a segmentation of y is a partition of its letters into contiguous blocks of letters, each block corresponding to a word in the segmentation. The total quality of a segmentation is determined by adding up the qualities of each of its blocks. (So we would get the right answer above provided that quality(meet) + quality(at) + quality(eight) was greater than the total quality of any other segmentation of the string.) Give an efficient algorithm that takes a string y and computes a segmentation of maximum total quality. Prove the correctness of your algorithm and analyze its time complexity.
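As an illustration only, and not the hidden expert solution, here is a minimal dynamic-programming sketch in Python. It assumes the black box is passed in as a function `quality`; the names `segment`, `opt`, `back`, `TOY_SCORES`, and `toy_quality` are hypothetical and the toy scores are invented purely for the demo. The table entry `opt[j]` holds the best total quality over all segmentations of the first `j` letters, and `back[j]` records where the last word of one such best segmentation starts.

```python
from typing import Callable, List, Tuple


def segment(y: str, quality: Callable[[str], float]) -> Tuple[float, List[str]]:
    """Return (best total quality, words) for the string y."""
    n = len(y)
    # opt[j]: best total quality of any segmentation of the prefix y[:j].
    # The empty prefix has total quality 0.
    opt = [0.0] + [float("-inf")] * n
    # back[j]: start index of the last word in a best segmentation of y[:j].
    back = [0] * (n + 1)

    for j in range(1, n + 1):
        for i in range(j):
            # Try y[i:j] as the last word, preceded by a best
            # segmentation of y[:i].
            candidate = opt[i] + quality(y[i:j])
            if candidate > opt[j]:
                opt[j] = candidate
                back[j] = i

    # Walk the back-pointers from the end to recover the words.
    words: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        words.append(y[i:j])
        j = i
    words.reverse()
    return opt[n], words


# Hypothetical stand-in for the black box: a few hand-picked scores,
# and a penalty proportional to length for anything unknown.
TOY_SCORES = {"meet": 4.0, "at": 2.0, "eight": 5.0, "me": 2.0, "ate": 3.0}


def toy_quality(x: str) -> float:
    return TOY_SCORES.get(x, -float(len(x)))


print(segment("meetateight", toy_quality))
# → (11.0, ['meet', 'at', 'eight'])
```

The two nested loops make O(n^2) calls to `quality`, so with a constant-time black box the table is filled in O(n^2) time; note that Python's slicing `y[i:j]` copies characters, which a real implementation might avoid by passing index pairs to the black box instead.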

Derive the Bellman equation for this, and prove that it is correct.
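One way the recurrence could be written, offered as a sketch under stated assumptions rather than as the official answer: let OPT(j) (notation introduced here, not in the question) denote the maximum total quality over all segmentations of the prefix y1y2...yj, with the empty prefix having quality 0.

```latex
\begin{aligned}
\mathrm{OPT}(0) &= 0, \\
\mathrm{OPT}(j) &= \max_{0 \le i < j}
  \bigl[\, \mathrm{OPT}(i) + \mathrm{quality}(y_{i+1} y_{i+2} \cdots y_j) \,\bigr],
  \qquad 1 \le j \le n.
\end{aligned}
```

A proof sketch by induction on j: every segmentation of y1...yj ends in some last word y(i+1)...yj with 0 <= i < j, and the blocks before it form a segmentation of y1...yi. If those earlier blocks were not of maximum total quality, swapping in a better segmentation of y1...yi would raise the total, so the right-hand side is at least OPT(j). Conversely, each term on the right-hand side is the total quality of a valid segmentation of y1...yj, so the right-hand side is at most OPT(j). The answer to the original problem is OPT(n). There are n subproblems, each taking the maximum over at most n candidates with a constant-time quality lookup, which gives O(n^2) time overall.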

[Hints: Remember that you are studying Chapters 9, 12, 14, 15, and 16 during Modules 7 and 8. This problem is solved by something you study in those chapters. You do not need to know the algorithm in the black box.]

