Question
A number of languages (including Chinese and Japanese) are written without spaces between the words. Consequently, software that works with text written in these languages must address the word segmentation problem: inferring likely boundaries between consecutive words in the text. If English were written without spaces, the analogous problem would consist of taking a string like meetateight and deciding that the best segmentation is meet at eight (and not me et at eight or meet ate ight or any of a huge number of even less plausible alternatives). How could we automate this process?
A simple approach that is at least reasonably effective is to find a segmentation that maximizes the cumulative quality of its individual constituent words. Thus, suppose you are given a black box that, for any string of letters x = x1x2...xk, returns a number quality(x) in constant time, O(1). This number can be either positive or negative; larger numbers correspond to more plausible English words. (So quality(me) would be positive while quality(ight) would be negative.)
Given a long string of letters y = y1y2...yn, a segmentation of y is a partition of its letters into contiguous blocks of letters, each block corresponding to a word in the segmentation. The total quality of a segmentation is determined by adding up the qualities of each of its blocks. (So we would get the right answer above provided that quality(meet) + quality(at) + quality(eight) was greater than the total quality of any other segmentation of the string.) Give an efficient algorithm that takes a string y and computes a segmentation of maximum total quality. Prove the correctness of your algorithm and analyze its time complexity.
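One possible approach is dynamic programming over prefixes of y. The sketch below is an illustration, not a definitive solution. The names best_segmentation, opt, back, and toy_quality are hypothetical, and toy_quality is a made-up stand-in for the black box (it scores a few hand-picked words by their length and penalizes everything else).

```python
from typing import Callable, List, Tuple


def best_segmentation(
    y: str, quality: Callable[[str], float]
) -> Tuple[float, List[str]]:
    """Return (maximum total quality, list of words) for the string y."""
    n = len(y)
    # opt[j] = best total quality over all segmentations of the prefix y[:j].
    # The empty prefix has exactly one (empty) segmentation, of quality 0.
    opt = [float("-inf")] * (n + 1)
    opt[0] = 0.0
    # back[j] = start index of the last word in a best segmentation of y[:j].
    back = [0] * (n + 1)

    for j in range(1, n + 1):
        # Try every possible last word y[i:j] of the prefix y[:j].
        for i in range(j):
            candidate = opt[i] + quality(y[i:j])
            if candidate > opt[j]:
                opt[j] = candidate
                back[j] = i

    # Walk the back-pointers from the end to recover the words.
    words: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        words.append(y[i:j])
        j = i
    words.reverse()
    return opt[n], words


# Hypothetical stand-in for the black box: known words score their length,
# any other block is penalized by its length.
_TOY_SCORES = {"me": 2, "meet": 4, "at": 2, "ate": 3, "eight": 5, "a": 1}


def toy_quality(x: str) -> float:
    return _TOY_SCORES.get(x, -len(x))


total, words = best_segmentation("meetateight", toy_quality)
print(total, words)  # 11.0 ['meet', 'at', 'eight']
```

The two nested loops make O(n^2) calls to quality, so under the problem's assumption that each call takes constant time the running time is O(n^2). In this Python sketch the slice y[i:j] itself copies characters, so a black box that accepted index pairs would be needed to match that bound exactly.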
Derive the Bellman equation for this, and prove that it is correct.
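One way the recurrence might be written is shown below. OPT is a name introduced here for illustration and does not appear in the problem statement: OPT(j) denotes the maximum total quality of a segmentation of the prefix y1y2...yj, with the empty prefix having quality 0.

```latex
\mathrm{OPT}(0) = 0, \qquad
\mathrm{OPT}(j) = \max_{1 \le i \le j}
  \bigl\{ \mathrm{OPT}(i-1) + \mathrm{quality}(y_i y_{i+1} \cdots y_j) \bigr\}
  \quad \text{for } 1 \le j \le n .
```

A sketch of why this holds, by induction on j: every segmentation of y1...yj ends in some last word yi...yj with 1 <= i <= j, and the blocks before it form a segmentation of y1...y(i-1). If that earlier part were not of maximum quality, swapping in a better one would raise the total, so for each fixed i the best achievable value is OPT(i-1) + quality(yi...yj). Taking the maximum over all i covers every possible last word. The answer to the problem is OPT(n), and evaluating the recurrence for j = 1, ..., n takes O(n^2) calls to the black box.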
[Hints: Remember that you are studying Chapters 9, 12, 14, 15, and 16 during Modules 7 and 8. This problem is solved by something you study in those chapters. You do not need to know the algorithm in the black box.]