
Question


Write a Python program that will open a BLASTN (nucleotide-to-nucleotide search) output file, parse out specific information, and produce formatted output that will be written to STDOUT (i.e., standard output; the terminal window or command line). Before writing your program, copy the BLASTN output file, /home/jorvis1/example_blast.txt, to your home directory. Look through the file and explore the format.

Here are the first 10 alignment hits:

ALIGNMENTS

>ref|XM_005094338.1| PREDICTED: Aplysia californica uncharacterized LOC101860729 (LOC101860729),

mRNA

Length=2377

Score = 1098 bits (594), Expect = 0.0

Identities = 594/594 (100%), Gaps = 0/594 (0%)

>gb|EU829582.1| Linum usitatissimum clone LU0017G02 mRNA sequence

Length=858

Score =375 bits (203), Expect = 2e-100

Identities = 386/476 (81%), Gaps = 6/476 (1%)

Strand=Plus/Plus


>ref|XM_005023737.1| PREDICTED: Anas platyrhynchos ADP-ribosylation factor 1 (ARF1),

mRNA

Length=1939

Score =372 bits (201), Expect = 2e-99

Identities = 401/501 (80%), Gaps = 0/501 (0%)

Strand=Plus/Plus

>ref|XM_004088641.1| PREDICTED: Nomascus leucogenys ADP-ribosylation factor 3 (ARF3),

mRNA

Length=3923

Score =364 bits (197), Expect = 4e-97

Identities = 387/481 (80%), Gaps = 3/481 (1%)

Strand=Plus/Plus

>ref|XM_004088640.1| PREDICTED: Nomascus leucogenys ADP-ribosylation factor 3 (ARF3),

mRNA

Length=3898

Score =364 bits (197), Expect = 4e-97

Identities = 387/481 (80%), Gaps = 3/481 (1%)

Strand=Plus/Plus

>ref|XM_003252207.1| PREDICTED: Nomascus leucogenys ADP-ribosylation factor 3 (ARF3),

mRNA

Length=4128

Score =364 bits (197), Expect = 4e-97

Identities = 387/481 (80%), Gaps = 3/481 (1%)

Strand=Plus/Plus

>gb|EU829048.1| Linum usitatissimum clone LU0031C12 mRNA sequence

Length=750

Score =364 bits (197), Expect = 4e-97

Identities = 384/476 (81%), Gaps = 6/476 (1%)

Strand=Plus/Plus

>ref|NM_001133245.1| Pongo abelii ADP-ribosylation factor 3 (ARF3), mRNA

emb|CR860810.1| Pongo abelii mRNA; cDNA DKFZp469P1914 (from clone DKFZp469P1914)

Length=3605

Score =364 bits (197), Expect = 4e-97

Identities = 387/481 (80%), Gaps = 3/481 (1%)

Strand=Plus/Plus

>ref|XM_003939112.1| PREDICTED: Saimiri boliviensis boliviensis ADP-ribosylation factor

3, transcript variant 3 (ARF3), mRNA

Length=1417

Score =359 bits (194), Expect = 2e-95

Identities = 386/481 (80%), Gaps = 3/481 (1%)

Strand=Plus/Plus

Your program should start by opening the input file (you may hardcode the filename in this case), parsing and storing both the query sequence ID (from near the top of the file; look for the string following "Query=") and the query length (found on the line below the query sequence), and displaying them both to STDOUT. Add some additional characters and formatting to your output such that these two fields appear exactly like this in STDOUT:

Query ID: IREALLYLIKEPYTHON Query Length: 15
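A minimal sketch of this first step follows. It assumes the query length appears on a "Length=" line shortly after the "Query=" line, as it does for the subject lengths in the hits above; the header text in `sample_header` is a hypothetical stand-in for the top of the real file, and `parse_query` is an illustrative name. Printing the two fields on separate lines is also an assumption about the intended layout.

```python
import re

# Hypothetical stand-in for the top of the BLAST report. The exact header
# layout is an assumption; check it against the real file.
sample_header = """Query= IREALLYLIKEPYTHON

Length=15
"""

QUERY_ID = re.compile(r"^Query=\s*(\S+)", re.MULTILINE)
LENGTH = re.compile(r"^Length\s*=\s*(\d+)", re.MULTILINE)


def parse_query(text):
    """Return (query_id, query_length) parsed from the report text."""
    id_match = QUERY_ID.search(text)
    if id_match is None:
        raise ValueError("no 'Query=' line found")
    # Take the first Length= line that follows the Query= line.
    len_match = LENGTH.search(text, id_match.end())
    if len_match is None:
        raise ValueError("no 'Length=' line found after 'Query='")
    return id_match.group(1), int(len_match.group(1))


query_id, query_length = parse_query(sample_header)
print(f"Query ID: {query_id}")
print(f"Query Length: {query_length}")
```

For the real program, the text would come from reading the hardcoded input file rather than from the sample string.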

Then it is time to parse information about the significant alignments for this query. Each alignment begins with the ">" symbol. For just the first ten hits, parse out only the accession (located between the first set of pipe symbols, | |), length, and score. For each of these hits, these three fields should then be written to STDOUT in exactly this format, including capitalization, spacing, and punctuation (shown here using the real values for the first hit; study the file to understand exactly where these values came from):

Alignment #1: Accession = ref|XM_005094338.1| (Length = 2377, Score = 1098)
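One way to approach the alignment step is sketched below, using the first two hits from the excerpt above as test input. The names `parse_hits` and `format_hits` are illustrative, and the blank-line layout in `sample_hits` is an assumption about the real file. Note that the sample output keeps the database tag and both pipes (ref|XM_005094338.1|), so the sketch captures the tag and the accession separately and joins them again.

```python
import re

# First two hits from the excerpt above; blank-line layout is an assumption.
sample_hits = """>ref|XM_005094338.1| PREDICTED: Aplysia californica uncharacterized LOC101860729 (LOC101860729),
mRNA
Length=2377

 Score = 1098 bits (594), Expect = 0.0
 Identities = 594/594 (100%), Gaps = 0/594 (0%)

>gb|EU829582.1| Linum usitatissimum clone LU0017G02 mRNA sequence
Length=858

 Score =375 bits (203), Expect = 2e-100
 Identities = 386/476 (81%), Gaps = 6/476 (1%)
"""

HIT_START = re.compile(r"^>(\w+)\|([^|\s]+)\|")
LENGTH = re.compile(r"^\s*Length\s*=\s*(\d+)")
SCORE = re.compile(r"^\s*Score\s*=\s*([\d.]+)\s+bits")


def parse_hits(lines, limit=10):
    """Collect accession, length, and score for the first `limit` hits."""
    hits = []
    current = None
    for line in lines:
        m = HIT_START.match(line)
        if m:
            if len(hits) == limit:
                break  # the requested number of hits is already complete
            current = {"accession": f"{m.group(1)}|{m.group(2)}|",
                       "length": None, "score": None}
            hits.append(current)
            continue
        if current is None:
            continue  # still in the header, before the first hit
        m = LENGTH.match(line)
        if m and current["length"] is None:
            current["length"] = m.group(1)
            continue
        m = SCORE.match(line)
        if m and current["score"] is None:
            # Keep only the first score reported for each hit.
            current["score"] = m.group(1)
    return hits


def format_hits(hits):
    """Render each hit in the layout shown in the sample output."""
    return [
        f"Alignment #{i}: Accession = {h['accession']} "
        f"(Length = {h['length']}, Score = {h['score']})"
        for i, h in enumerate(hits, start=1)
    ]


for row in format_hits(parse_hits(sample_hits.splitlines())):
    print(row)
```

The patterns tolerate both "Score = 1098" and "Score =375", since the spacing around the equals sign varies in the excerpt.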

You must use regular expressions to pull out precisely the parts of the file that you want, which is the definition of parsing. Hint: you will very likely need to use parentheses to put some parts of those expressions into temporary memory (m.group(1), etc.) for later use.
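As a small illustration of the hint, the sketch below applies one pattern with three parenthesized groups to a Score line taken from the excerpt. The pattern itself is only one possible choice, not a required one.

```python
import re

line = "Score =375 bits (203), Expect = 2e-100"

# Each pair of parentheses stores the text it matched in a numbered group.
pattern = re.compile(r"Score\s*=\s*(\d+)\s+bits\s+\((\d+)\),\s+Expect\s*=\s*(\S+)")

m = pattern.search(line)
if m:
    bits = m.group(1)      # first group: the bit score
    raw = m.group(2)       # second group: the number in parentheses
    expect = m.group(3)    # third group: the Expect value
    print(bits, raw, expect)

# A line that does not fit the pattern yields None instead of a match object,
# so the result should be checked before calling group().
no_match = pattern.search("Strand=Plus/Plus")
```

Here m.group(0) would return the entire matched text, while m.groups() returns all captured pieces as a tuple.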

Do not have your regular expression search for hardcoded values; your program should be able to read another BLASTN output file and run successfully, not just this specific one.

Pay careful attention to the exact appearance of the sample output, above. Although it is a good start to be able to, at a minimum, report the requested values, your program must also strive to match the formats specified.

Provide the complete source code AND the output of the program as it runs.
