Question

(C) In C, write a program

Assume that we are implementing a program to help us study similarity among organisms. To do this, we will implement the dynamic programming (DP) version of the LCS (Longest Common Subsequence) algorithm in two different ways in the same program. The first part of the program will read a file containing a pair of character strings (say, "thetwostrings.txt") corresponding to the gene sequences, compute the LCS, and display the original strings and their longest common subsequence on the terminal. This part will use the entire "c" array but NO "b" array: you must reconstruct the subsequence itself from the "c" array. The second part will make no attempt to recover the actual subsequence. Instead, it will calculate measures of similarity among an arbitrary number of character strings in a file named "multiplesOfStrings.txt". This second part will keep only the 2 x M entries needed to compute the maximum value (the length) of EACH LCS.

It will produce as output a table that looks like this:

     1  2  3  4  5  6  7
  1  -  H  M  D  M  L  D
  2  -  -  H  H  D  D  H
  3  -  -  -  L  M  D  M
  4  -  -  -  -  M  L  M
  5  -  -  -  -  -  M  L
  6  -  -  -  -  -  -  M
  7  -  -  -  -  -  -  -

The 1, 2, 3, ... are labels for the strings, and the upper triangle holds a measure of similarity between the pairs of strings {1,2}, {1,3}, {1,4}, .... The possible entries in the table are:

H = high similarity between the strings
M = medium similarity between the strings
L = low similarity between the strings
D = the two strings are dissimilar

Similarity definitions:

High similarity exists if the length of the shorter string is within 10% of the length of the longer string and the longest common subsequence is at least 80% of the length of the shorter string.

Medium similarity exists if the criteria for High similarity are not met, but the length of the shorter string is within 20% of the length of the longer string and the longest common subsequence is at least 60% of the length of the shorter string.

Low similarity exists if the criteria for Medium similarity are not met, but the length of the shorter string is within 40% of the length of the longer string and the longest common subsequence is at least 50% of the length of the shorter string.

Dissimilar strings are any that meet none of the above criteria.

The file of strings will first contain an integer indicating how many strings are in the file, followed by that many character strings, each terminated by a newline. Recognize that you can run into memory problems, so you should not read all the strings into RAM at once. Use a direct access file! We are already economizing on RAM by not using the O(2*m*n) space. The reason for this approach is that these will be long strings and there might be many of them.
