Question
Longest Common Substring Humans have 23 pairs of chromosomes, while other primates like chimpanzees have 24 pairs. Biologists claim that human chromosome #2 is a
Longest Common Substring Humans have 23 pairs of chromosomes, while other primates like chimpanzees have 24 pairs. Biologists claim that human chromosome #2 is a fusion of two primate chromosomes that they call 2a and 2b. We wish to verify this claim by locating long nucleotide chains shared between the human and primate chromosomes. We define the longest common substring of two strings to be the longest contiguous string that is a substring of both strings e.g. the longest common substring of DEADBEEF and EA7BEEF is BEEF.1 If there is a tie for longest common substring, we just want to find one of them.
substring1.py :
def longest_substring(s, t): """Finds the longest substring that occurs in both s and t""" best = '' for s_start in range(0, len(s)): for s_end in range(s_start, len(s)+1): for t_start in range(0, len(t)): for t_end in range(t_start, len(t)+1): if s[s_start:s_end] == t[t_start:t_end]: current = s[s_start:s_end] if len(current) > len(best): best = current return best
substring2.py :
def longest_substring(s, t): """Finds the longest substring that occurs in both s and t""" best = '' for length in range(min(len(s),len(t))+1): ans = k_substring(s, t, length) if ans is None: return best best = ans return best
def k_substring(s, t, k): """Finds a substring of length k in both s and t if there is one, and returns it. Otherwise, returns None.""" s_substrings = set() # Put all substrings of s of length k into a set: s_substrings for s_start in range(len(s)-k+1): current = s[s_start : s_start+k] s_substrings.add(current) # For every substring of t of length k, look for it in # s_substrings. If it's there, return it. for t_start in range(len(t)-k+1): current = t[t_start : t_start+k] if current in s_substrings: return current return None substring3.py :
def longest_substring(s, t): """Finds the longest substring that occurs in both s and t""" best = '' return best
def k_substring(s, t, k): """Finds a substring of length k in both s and t if there is one, and returns it. Otherwise, returns None.""" s_substrings = set() # Put all substrings of s of length k into a set: s_substrings for s_start in range(len(s) - k + 1): current = s[s_start:s_start+k] s_substrings.add(current) # For every substring of t of length k, look for it in # s_substrings. If it's there, return it. for t_start in range(len(t)-k+1): current = t[t_start:t_start+k] if current in s_substrings: return current return None
Download ps2-dna.zip from the class website. (a) (1 point) Ben Bitdiddle wrote substring1. py. What is the asymptotic running time of his code? Assume s=t=n. (b) (1 point) Alyssa P Hacker realized that by only comparing substrings of the same length, and by saving substrings in a hash table (in this case, a Python set), she could vastly speed up Ben's code. Alyssa wrote substring2.py. What is the asymptotic running time of her code? (c) (6 points) Recall binary search from Problem Set 1. Using binary search on the length of the string, implement an O(n2logn) solution. You should be able to copy Alyssa's k_substring code without changing it, and just rewrite the outer loop longest_substring. Check that your code is faster than substring2.py for chr2_first_10000 and chr2a_first_10000. Put your solution in substring 3 . py, and submit it to the class website
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started