Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Exercise 3.5.7 from Mining of Massive Datasets: Find the edit distances (using only insertions and deletions) between the following pairs of strings. abccdabc and acbdcab.

Exercise 3.5.7 from Mining of Massive Datasets: Find the edit distances (using only insertions and deletions) between the following pairs of strings. abccdabc and acbdcab.

Exercise 3.5.8 : There are a number of other notions of edit distance available. For instance, we can allow, in addition to insertions and deletions, the following operations:

i. Mutation, where one symbol is replaced by another symbol. Note that a mutation can always be performed by an insertion followed by a deletion, but if we allow mutations, then this change counts for only 1, not 2, when computing the edit distance.

ii. Transposition, where two adjacent symbols have their positions swapped. Like a mutation, we can simulate a transposition by one insertion followed by one deletion, but here we count only 1 for these two steps.

Repeat Exercise 3.5.7 if edit distance is defined to be the number of insertions, deletions, mutations, and transpositions needed to transform one string into another.

please answer 3.5.7 and 3.5.8
image text in transcribed
100 CHAPTER 3. FINDING SIMILAR ITEMS 3.5.5 Edit Distance This distance makes sense when points are strings. The distance between two strings 112 and y=Vi Y m is the smallest number of insertions and deletions of single characters that will convert to y. Example 3.15: The edit distance between the strings I = abcde and y acfdeg is 3. To convert to y: 1. Delete b. 2. Insert f after c. 3. Insert g after e. to y. No sequence of fewer than three insertions and/or deletions will convert Thus, d(x,y) = 3. 0 Another way to define and calculate the edit distance dx,y) is to compute a longest common subsequence (LCS) of r and y. An LCS of u and y is a string that is constructed by deleting positions from x and y, and that is as long as any string that can be constructed that way. The edit distance dz. y) can be calculated as the length of r plus the length of y minus twice the length of their LCS. Example 3.16: The strings r = abcde and y = acfdeg from Example 3.15 have a unique LCS, which is acde. We can be sure it is the longest possible. because it contains every symbol appearing in both x and y. Fortunately, these common symbols appear in the same order in both strings, so we are able to use them all in an LCS. Note that the length of ris 5. the length of y is 6, and the length of their LCS is 4. The edit distance is thus 5+6 - 2 x 4 = 3. which agrees with the direct calculation in Example 3.15. For another example, consider a ba and bab. Their edit distance is 2. For example, we can convert to y by deleting the first a and then inserting b at the end. There are two LCS's ab and ba. Each can be obtained by deleting one symbol from each string. As must be the case for multiple L.CSS of the same pair of strings. both LCS's live the same length. Therefore, we may compute the exit distance is 33-222 Edit distance is a distance measure Surely no edit distance can be negative 100 CHAPTER 3. FINDING SIMILAR ITEMS 3.5.5 Edit Distance This distance makes sense when points are strings. The distance between two strings 112 and y=Vi Y m is the smallest number of insertions and deletions of single characters that will convert to y. Example 3.15: The edit distance between the strings I = abcde and y acfdeg is 3. To convert to y: 1. Delete b. 2. Insert f after c. 3. Insert g after e. to y. No sequence of fewer than three insertions and/or deletions will convert Thus, d(x,y) = 3. 0 Another way to define and calculate the edit distance dx,y) is to compute a longest common subsequence (LCS) of r and y. An LCS of u and y is a string that is constructed by deleting positions from x and y, and that is as long as any string that can be constructed that way. The edit distance dz. y) can be calculated as the length of r plus the length of y minus twice the length of their LCS. Example 3.16: The strings r = abcde and y = acfdeg from Example 3.15 have a unique LCS, which is acde. We can be sure it is the longest possible. because it contains every symbol appearing in both x and y. Fortunately, these common symbols appear in the same order in both strings, so we are able to use them all in an LCS. Note that the length of ris 5. the length of y is 6, and the length of their LCS is 4. The edit distance is thus 5+6 - 2 x 4 = 3. which agrees with the direct calculation in Example 3.15. For another example, consider a ba and bab. Their edit distance is 2. For example, we can convert to y by deleting the first a and then inserting b at the end. There are two LCS's ab and ba. Each can be obtained by deleting one symbol from each string. As must be the case for multiple L.CSS of the same pair of strings. both LCS's live the same length. Therefore, we may compute the exit distance is 33-222 Edit distance is a distance measure Surely no edit distance can be negative

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Beyond Big Data Using Social MDM To Drive Deep Customer Insight

Authors: Martin Oberhofer, Eberhard Hechler

1st Edition

0133509796, 9780133509793

More Books

Students also viewed these Databases questions

Question

2. To compare the costs of alternative training programs.

Answered: 1 week ago

Question

1. The evaluation results can be used to change the program.

Answered: 1 week ago