Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

LAB TASK: Implement a program in python that will open a FASTA file, concatenate its multiline sequences into single strings, store them in a dictionary

LAB TASK: Implement a program in python that will open a FASTA file, concatenate its multiline sequences into single strings, store them in a dictionary using the sequence ID from the sequence header (value between the | symbols) as a key, and then print the IDs and sequences as two columns in a new file.

OBJECTIVE(S):

1. Write your code in the block below. Download the file called myoglobin.fasta, and make sure to save it in the same location as your lab task script.

2. Create an empty dictionary to store sequence information.

3. Using the open function, open the FASTA file (myoglobin.fasta).

4. When you find a line beginning with the > character (a header) extract the ID code between the | symbols and start a new dictionary entry using the ID as a key.

5. If a line isnt a header (i.e. it is a sequence), strip off the newline character at the end and append the sequence to a growing string (to the growing sequence that is the dictionary value) stored within the most recent dictionary key.

6. Close the original file.

7. Open a new file for writing, e.g. myoglobin_processed.txt.

8. Loop through the dictionary and write the ID keys and their corresponding sequences to the new file, separating them with a tab (\t) to generate two columns.

9. Close the new file.

10. Run your script. Upload the script and output (myoglobin_processed.txt) for lab credit. Dont forget comments!

Your output for two sequences should look like this (note how the sequence now is a single string):

P02189 MGLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDLKKHGNTVLTALGGILKKKGHHEAELTPLAQSHATKHKIPVKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDMAAKYKELGFQG

P04247 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSEDLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRHSGDFGADAQGAMSKALELFRNDIAAKYKELGFQG

Thank you in advance!! The myoglobin.fasta file contains the following:

>sp|P02192|MYG_BOVIN Myoglobin OS=Bos taurus GN=MB PE=1 SV=3 MGLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASE DLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKIPVKYLEFISDAIIHVLHAKH PSDFGADAQAAMSKALELFRNDMAAQYKVLGFHG >sp|P02189|MYG_PIG Myoglobin OS=Sus scrofa GN=MB PE=1 SV=2 MGLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE DLKKHGNTVLTALGGILKKKGHHEAELTPLAQSHATKHKIPVKYLEFISEAIIQVLQSKH PGDFGADAQGAMSKALELFRNDMAAKYKELGFQG >sp|P02144|MYG_HUMAN Myoglobin OS=Homo sapiens GN=MB PE=1 SV=2 MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH PGDFGADAQGAMNKALELFRKDMASNYKELGFQG >sp|P68082|MYG_HORSE Myoglobin OS=Equus caballus GN=MB PE=1 SV=2 MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASE DLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKH PGDFGADAQGAMTKALELFRNDIAAKYKELGFQG >sp|P04247|MYG_MOUSE Myoglobin OS=Mus musculus GN=Mb PE=1 SV=3 MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH SGDFGADAQGAMSKALELFRNDIAAKYKELGFQG >sp|P02197|MYG_CHICK Myoglobin OS=Gallus gallus GN=MB PE=1 SV=4 MGLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPDQMKGSE DLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKIPVKYLEFISEVIIKVIAEKH AADFGADSQAAMKKALELFRNDMASKYKEFGFQG

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

The Database Experts Guide To SQL

Authors: Frank Lusardi

1st Edition

0070390029, 978-0070390027

More Books

Students also viewed these Databases questions

Question

=+1 Why are local employment laws important to IHRM?

Answered: 1 week ago

Question

=+ Are some laws more important than others? If so, which ones?

Answered: 1 week ago