
Question

The sequence needs to be 1 MB; when I tried to paste it here, it threw an error.

The subsequence is 10,240 bytes long. It is shorter than the sequence, but it still throws an error. If needed, I can provide it in the comments.

A sequence contains characters selected from {A, C, G, T}. A short sequence might be something like 'AAATGCGCGT', for example. Our task is to locate a sequence within another (much larger) sequence. If I search for the sequence "CGCG" in the above, it is a 100% match starting at location five (counting from 0).

I will have data files that are 1 Mbyte (1,048,576 bytes) long, which is the large string. Let's call this the "sequence". I will have other files that are 10,240 bytes long, which will contain the DNA we wish to locate, which we will call the "subsequence". I will tell you in advance that there is no guarantee of a 100% match on the subsequence. What is the starting byte position, from 0 to the 1,038,336th byte¹, that gives the highest count of matched bytes?

The slow method to determine this is a nested loop:

    for (each starting position 0..1,038,336)
        for (each byte 0..10,239)
            if there is a match, increment a counter
    the largest count wins.

Did I mention that this is the slow method? It is an O(N²) algorithm (more precisely O(N·M) for a sequence of length N and a subsequence of length M), which is terrible. For our purposes it will work, but real sequence searches use faster algorithms. But that's not our point. We want to execute this in parallel.

OK, so here's what I want. You must write this program in either C or C++. Divide the work into N processes. Each process will calculate 1/Nth of the work. The value for N will come from the command line, along with the name of the sequence file and the subsequence file, in that order. For example, suppose my program is named findDNA:

    $ ./findDNA 12 seqfile subseqfile
    Looking for string using 12 processes.
    Best match is at position 123456 with 9876/10240 correct.

As good programmers, you should be checking the command-line parameters for validity and not just blowing up if things are wrong.

¹ This is 1 Mbyte minus 10,240 bytes. You can "stop early" for this assignment when you get to this position.
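A minimal sketch in C of the slow nested-loop method described above, plus two helpers the assignment implies: splitting the start positions 0..1,038,336 into N slices, and validating N from the command line. The function names (score_at, best_in_range, slice_for, parse_nproc) and the MAX_PROCS cap of 1024 are illustrative assumptions, not part of the assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PROCS 1024   /* hypothetical sanity cap on N; the assignment gives none */

/* Count how many bytes of sub[0..m) equal seq[pos..pos+m). */
size_t score_at(const char *seq, const char *sub, size_t m, size_t pos)
{
    size_t matches = 0;
    for (size_t j = 0; j < m; j++)            /* inner loop: each byte of the subsequence */
        if (seq[pos + j] == sub[j])
            matches++;
    return matches;
}

/* Slow nested-loop search over start positions first..last (inclusive).
 * Reports the best start and its match count; ties keep the earliest start. */
void best_in_range(const char *seq, size_t n, const char *sub, size_t m,
                   size_t first, size_t last, size_t *best_pos, size_t *best_count)
{
    assert(m > 0 && m <= n && first <= last && last <= n - m);
    *best_pos = first;
    *best_count = 0;
    for (size_t pos = first; pos <= last; pos++) {   /* outer loop: each start */
        size_t c = score_at(seq, sub, m, pos);
        if (c > *best_count) {
            *best_count = c;
            *best_pos = pos;
        }
    }
}

/* Split start positions 0..last_start among nproc workers; worker k gets lo..hi.
 * The first (total % nproc) workers get one extra position.
 * Returns 0 when worker k has nothing to do (more workers than positions). */
int slice_for(size_t last_start, int nproc, int k, size_t *lo, size_t *hi)
{
    size_t total = last_start + 1;             /* number of candidate start positions */
    size_t base = total / (size_t)nproc;
    size_t extra = total % (size_t)nproc;
    size_t kk = (size_t)k;
    size_t len = base + (kk < extra ? 1 : 0);
    *lo = kk * base + (kk < extra ? kk : extra);
    if (len == 0)
        return 0;
    *hi = *lo + len - 1;
    return 1;
}

/* Parse N from argv[1]; rejects junk such as "12x", "", "0" or "-3". */
int parse_nproc(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > MAX_PROCS)
        return 0;
    *out = (int)v;
    return 1;
}
```

For example, best_in_range("AAATGCGCGT", 10, "CGCG", 4, 0, 6, &p, &c) reports p = 5 and c = 4, the 100% match at location five. With N = 12 and a last start of 1,038,336, slice_for gives worker 0 the positions 0..86,528 and worker 11 the positions 951,809..1,038,336, so the slices cover every start exactly once.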
Instead of using pthread_create(), use the fork() system call.
Instead of pthread_join(), use waitpid().
Be very careful about this. For example, how many processes do you think this code starts?

    pid[i] = fork();

Answer: The parent creates four children. Each child does i++ and since …
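One way to wire the search up with fork() and waitpid() is sketched below. Child processes do not share memory with the parent, so each child needs some channel to report its best (position, count); this sketch assumes a single pipe shared by all children, which is a design choice of the sketch rather than something the assignment specifies. The names parallel_search and struct result are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One worker's answer; children send it to the parent through a pipe
 * (the pipe is an assumption of this sketch; the assignment only names fork/waitpid). */
struct result { size_t pos; size_t count; };

/* Search start positions 0..n-m with nproc child processes.
 * Returns 0 on success (answer in *best), -1 on bad arguments or a failed child. */
int parallel_search(const char *seq, size_t n, const char *sub, size_t m,
                    int nproc, struct result *best)
{
    if (m == 0 || m > n || nproc < 1)
        return -1;
    size_t total = n - m + 1;                      /* number of candidate starts */
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t *pids = malloc((size_t)nproc * sizeof *pids);
    if (pids == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    int started = 0;
    for (int k = 0; k < nproc; k++) {
        pid_t pid = fork();                        /* replaces pthread_create() */
        if (pid == -1)
            break;                                 /* stop forking, but still reap the rest */
        if (pid == 0) {                            /* child: scan slice k, report, exit */
            close(fds[0]);
            size_t lo = total * (size_t)k / (size_t)nproc;
            size_t hi = total * (size_t)(k + 1) / (size_t)nproc;   /* exclusive */
            assert(lo <= hi && hi <= total);
            struct result r = { lo, 0 };
            for (size_t pos = lo; pos < hi; pos++) {
                size_t c = 0;
                for (size_t j = 0; j < m; j++)
                    c += (seq[pos + j] == sub[j]);
                if (c > r.count) {
                    r.count = c;
                    r.pos = pos;
                }
            }
            /* sizeof r is far below PIPE_BUF, so records from different children never interleave. */
            ssize_t w = write(fds[1], &r, sizeof r);
            _exit(w == (ssize_t)sizeof r ? 0 : 1); /* never fall back into the fork loop */
        }
        pids[started++] = pid;
    }
    close(fds[1]);                                 /* parent only reads; EOF once all children exit */

    int ok = (started == nproc);
    int have = 0;
    struct result r;
    while (read(fds[0], &r, sizeof r) == (ssize_t)sizeof r) {
        if (!have || r.count > best->count ||
            (r.count == best->count && r.pos < best->pos)) {
            *best = r;                             /* highest count wins; ties go to the earlier start */
            have = 1;
        }
    }
    close(fds[0]);
    for (int k = 0; k < started; k++) {            /* replaces pthread_join() */
        int status;
        if (waitpid(pids[k], &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    free(pids);
    return (ok && have) ? 0 : -1;
}
```

A main() for findDNA would check argc == 4 and the value of N, read both files into memory, call parallel_search, and print the "Best match is at position … with …/10240 correct." line. Each child ends with _exit() so that it never returns into the fork loop; a child that kept looping would fork children of its own, which is exactly the trap the question above warns about. _exit() also skips flushing stdio buffers the child inherited from the parent, so buffered output is not printed twice.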
