
Question

The task is to segment a string of characters into valid English words (by inserting spaces) in such a way that the resulting phrase makes the most sense. For example, consider the string:

"breakfastinside"

If we consider only valid English words, some of the segmentations include:

- "break fast in side"
- "breakfast in side"
- "break fast inside"

However, the most logical way to segment this phrase is "breakfast inside".

In natural language processing, one way to tell whether a phrase makes sense is to use an n-gram model. In this homework, we will use a bigram model (n = 2). Given two words w1 and w2, the bigram function b(w1, w2) gives you a score representing the likelihood that the word w2 appears after w1 in actual text. A lower score means that the likelihood is higher. For example, "red apple" will have a lower score than "dog apple". You might be wondering how a bigram function knows this likelihood. Well, one way is to learn these scores by scanning real-world newspapers, articles, etc.

Given a phrase with words [w1, w2, w3, w4], the total bigram score is computed as:

b(w0, w1) + b(w1, w2) + b(w2, w3) + b(w3, w4)

where w0 is a special token denoting the beginning of the phrase. The bigram score b(w0, w1) can be interpreted as the likelihood that w1 is the first word of a phrase. A lower total score indicates a more sensible phrase.

For this homework, assume that:

- The only valid English words are "break", "fast", "breakfast", "in", "side", and "inside". Again, assume that w0 is a special token denoting the beginning of the string.
- The bigram scores are as follows (the likelihood of the first word being immediately followed by the second word*): [the score table was given as an image and is not transcribed here]

*This is just a simplified example containing synthetic values to illustrate the intuition behind bigrams. In practice, bigram scores are usually log-likelihoods of actual word-order frequencies and contain a lot of decimal values.

Given the example above, the score of "breakfast inside" should be 3 + 3 = 6, while the score of "breakfast in side" should be 3 + 1 + 5 = 9.

1. Formulate the word segmentation task as a search problem. Make sure that your formulation works not only for the specific example above ("breakfastinside") but also for any other string (assuming you always have access to a dictionary of valid words and bigram scores).
   a. What information should a state contain?
   b. What is the initial state?
   c. Given a state, how do you get the list of actions?
   d. Given a state and an action, how do you compute the cost?
   e. What are the goal states?
2. Draw the entire search tree for the specific problem "breakfastinside" based on your formulation of the search problem. Note that since this is a short string, the search tree should not have too many nodes. If your tree has a depth of more than 7 or has more than 15 nodes, you probably need to improve your formulation of the problem. Clearly label the states, actions, action costs, and the goal state. Make sure the states and actions have descriptive labels.
3. Which algorithm (breadth-first search, depth-first search, or uniform-cost search) is most appropriate for finding a solution to this problem? Justify briefly. Again, think about the problem in general and not just the specific example above.
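
To make the bigram scoring above concrete, here is a minimal Python sketch; it is not part of the original question. The only scores it hard-codes are the ones recoverable from the worked example ("breakfast inside" = 3 + 3 and "breakfast in side" = 3 + 1 + 5); every other word pair falls back to a hypothetical default, since the full score table appeared only as an untranscribed image. The helper also enumerates every segmentation of "breakfastinside" over the six-word dictionary, which are exactly the candidate phrases a search over this problem would need to consider.

```python
# Sketch of the bigram scoring and segmentation enumeration described above.
# Only the scores visible in the worked example are real; every other pair
# uses a hypothetical DEFAULT_SCORE because the full table was an image.

START = "<w0>"  # special token marking the beginning of the phrase

WORDS = {"break", "fast", "breakfast", "in", "side", "inside"}

BIGRAM_SCORES = {
    (START, "breakfast"): 3,   # from "breakfast inside" = 3 + 3
    ("breakfast", "inside"): 3,
    ("breakfast", "in"): 1,    # from "breakfast in side" = 3 + 1 + 5
    ("in", "side"): 5,
}
DEFAULT_SCORE = 10  # hypothetical placeholder for pairs not shown in the text


def bigram(w1, w2):
    """b(w1, w2): a lower score means w2 is more likely to follow w1."""
    return BIGRAM_SCORES.get((w1, w2), DEFAULT_SCORE)


def phrase_score(words):
    """Total bigram score b(w0,w1) + b(w1,w2) + ... of a segmented phrase."""
    total, prev = 0, START
    for w in words:
        total += bigram(prev, w)
        prev = w
    return total


def segmentations(s):
    """Yield every way to split s into words from the dictionary WORDS."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        if s[:i] in WORDS:
            for rest in segmentations(s[i:]):
                yield [s[:i]] + rest


if __name__ == "__main__":
    for seg in segmentations("breakfastinside"):
        print(" ".join(seg), "->", phrase_score(seg))
    # "breakfast inside" scores 3 + 3 = 6 and "breakfast in side" scores
    # 3 + 1 + 5 = 9, matching the worked example; the other two segmentations
    # are scored with the hypothetical default for their unknown pairs.
```

Brute-force enumeration is fine at this scale; the point of parts 1-3 is to formulate the same minimization as a search problem so that an algorithm can find the lowest-scoring segmentation without listing every candidate up front.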

