Question

Hello, I am stuck on the following assignment. I cannot get the matrix to align. The code I have so far is:

#!/usr/bin/python
import sys
import os

# create the intermediate file names from the input file name
infname = sys.argv[1]
fastaname = infname + ".fasta"
mafftfname = fastaname + ".mafft"
stockname = mafftfname + ".stock"

# convert the 'simple' format (one "id sequence" pair per line) to FASTA
handle = open(infname, "r")
outf = open(fastaname, "w")
for line in handle:
    linearr = line.split()
    if len(linearr) < 2:
        continue  # skip blank or malformed lines
    seqid = linearr[0]
    seq = linearr[1]
    # FASTA needs the header and the sequence on separate lines
    outf.write(">%s\n%s\n" % (seqid, seq))
handle.close()
outf.close()

# align using mafft
cmd = "mafft %s > %s" % (fastaname, mafftfname)
sys.stderr.write("command: %s\n" % cmd)
os.system(cmd)
sys.stderr.write("command done\n")

# convert the FASTA mafft alignment to Stockholm format
cmd = "fasta_to_stockholm %s > %s" % (mafftfname, stockname)
sys.stderr.write("command: %s\n" % cmd)
os.system(cmd)
sys.stderr.write("command done\n")

# run quicktree to get the distance matrix (its output goes to stdout)
cmd = "quicktree -out m %s" % stockname
#sys.stderr.write("command: %s\n" % cmd)
os.system(cmd)
#sys.stderr.write("command done\n")

Instructions:

Develop a miniature bioinformatics analysis 'pipeline.'

To complete the assignment, create an executable Python script:

~/assignments/assignment09/assignment09.py

The script should take exactly one command-line argument, the name of a file containing unaligned DNA sequences in our 'simple' sequence-file format. An example input file is available:

~/assignment09_data.simpleseqs

I would recommend copying this file into your ~/assignments/assignment09 directory, to make developing your script easier.
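The assignment text does not spell out the 'simple' format. Judging from the code in the question, each line holds a sequence ID, whitespace, and then the sequence. Under that assumption, the conversion to FASTA might be sketched as below; `simple_to_fasta` and the two demo records are made up for illustration and are not part of the course material.

```python
import io

def simple_to_fasta(in_handle, out_handle):
    """Copy 'simple' records (assumed: "<id> <sequence>" per line) to FASTA.

    Returns the number of records written.
    """
    count = 0
    for line in in_handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seqid, seq = fields[0], fields[1]
        # FASTA puts the header and the sequence on separate lines
        out_handle.write(">%s\n%s\n" % (seqid, seq))
        count += 1
    return count

# Tiny made-up records, only to show the shape of the output
demo_in = io.StringIO("seqA ACGT\n\nseqB ACGGT\n")
demo_out = io.StringIO()
n = simple_to_fasta(demo_in, demo_out)
fasta_text = demo_out.getvalue()
```

Working on file handles rather than file names makes it possible to check the conversion on a short in-memory string before running it on the real data file.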

Your Python script should print the pairwise distances among all pairs of sequences in the file to the screen (i.e., standard out, or "stdout"). You should not print any other information to standard out, although you may print additional information to the standard error stream, "stderr."
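A minimal sketch of keeping the two streams apart might look like the following. The function name `report` and the placeholder text are hypothetical; the point is only that results go to `sys.stdout` and everything else goes to `sys.stderr`, so a redirect such as `> test09.out` captures the results alone.

```python
import sys

def report(result_text, log_message):
    """Write the result to stdout and the diagnostic message to stderr."""
    sys.stdout.write(result_text)
    sys.stderr.write(log_message + "\n")

# Placeholder strings, not real program output
report("placeholder result line\n", "finished one step")
```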

If you copy the input file to ~/assignments/assignment09 and run:

./assignment09.py assignment09_data.simpleseqs > test09.out

I would strongly suggest using mafft to align the sequences, using fasta_to_stockholm to convert the mafft alignment to stockholm format, and using quicktree to calculate the distance matrix.

The only requirement is that your program should read the 'simple' sequence file (name provided on command-line) and print the resulting distance matrix to the screen (with nothing else being printed to standard out).

Note that mafft (and perhaps some other programs) does print a lot of information to the standard error stream; this is fine. For this assignment, we only care about standard out.
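As one possible way to chain the three suggested tools, the sketch below uses `subprocess.run` instead of `os.system`, assuming Python 3.5 or later. The command lines are copied from the code in the question: `fasta_to_stockholm` is assumed to take a file name and write to stdout, and `quicktree -out m` is assumed to print the distance matrix. The helper names `build_steps` and `run_steps` are hypothetical.

```python
import subprocess
import sys

def build_steps(infname):
    """Return (command, output file or None) pairs for the three tools."""
    fasta = infname + ".fasta"
    aligned = fasta + ".mafft"
    stock = aligned + ".stock"
    return [
        (["mafft", fasta], aligned),
        (["fasta_to_stockholm", aligned], stock),
        (["quicktree", "-out", "m", stock], None),  # None: leave on stdout
    ]

def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for cmd, outname in steps:
        sys.stderr.write("command: %s\n" % " ".join(cmd))
        if outname is None:
            # The last step inherits our stdout, so the matrix reaches the screen
            subprocess.run(cmd, check=True)
        else:
            with open(outname, "w") as outf:
                subprocess.run(cmd, stdout=outf, check=True)

# Usage, once the FASTA file has been written:
#   run_steps(build_steps(sys.argv[1]))
```

With `check=True`, a failing tool raises `subprocess.CalledProcessError` instead of silently passing an empty file to the next step, which can make a misbehaving step easier to spot.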

EDIT: There is no place for me to attach the example file. The contents of assignment09_data.simpleseqs are:

Fc_RIG1 GAAAATAAAAAACTGCTCTGCAGAAAGTGCAAAGCCTTTGCATGTTACACTGTTGATATCAGAGTGGTGGAGGAATGCCATTACACTGTGGTTGGAGATGCTTTCAGGAAGTGCTTTGTGAGTAAACTACACCCCAAACCAAAGAGCTTTGGATATTTTGAGAAGAGAGCAAAGATCTTCTGTGCCAGACCAAACTGCAGCCATGACTGGGGAATCCATGTGAAGTATAAGATATTTGAGATTCCAGTTATAAAAATAGAAAGTTTTGTGGTGGAGGATATTGCAACTGGAGCTCAGAAACTATATGCAAAGTGGAAGGACTTTCACTTTGAGAAGATACCATTTGATGCTAAGGAAATG
Pt_RIG1 GAAAATAAAAAACTGCTCTGCAGAAAGTGCAAAGCCTTGGCATGTTACACAGCTGACGTAAGAGTGATAGAGGAATGCCATTACACTGTGCTTGGAGATGCTTTTAAGGAATGCTTTGTGAGTAGACCACATCCCAAGCCAAAGCAGTTTTCAAGTTTTGAAAAAAGAGCAAAGATATTCTGTGCCCGACAGAACTGCAGCCATGACTGGGGAATCCATGTGAAGTATAAGACATTTGAGATTCCAGTTATAAAAATTGAAAGTTTTGTGGTGGAGGATATTGCAACTGGAGTTCAGACACTGTACTCGAAGTGGAAGGACTTTCATTTTGAGAAGATACCATTTGATCCAGCAGAAATGTCC
Ac_MDA5 ATCAAGTTCCTCTGCAAAAACTGCACTAAGCTGATATGTTCAGGTGAAGATATTGAGGTCATTGAGAATATGCATCATGTCAATGTCAAAAAAGAATTTAAAGGCCTTTATGTTGTAAGAGAAAACAAGACACTGCAAGCAAAAGCCGCAGACTATCAAACAAATGGGGAAGTTATCTGCAAAGATTGTGGACAAGTGTGGGGAAGCATGATGGTACACCGAGGTCTAGACCTGCCTTGCCTAAAAATAAAAAACTTTGTGGTTGTATTCAATGAGAAGAAAACTACCCGAAAGGATATGTGCAAAAAATGGGCAGAGCTGCCCATTAGGTTTCCAGAGTTCAGTTATGCAAATAAT
Pt_MDA5 ATAACTTTCCTTTGCAAAAACTGCAGTGTGCTAGCCTGTTCTGGGGAAGATATCCACGTAATTGAGAAAATGCATCACGTCAATATGACCCCAGAATTCAAGGAACTTTACATTGTAAGAGAAAACAAAGCACTGCAAAAGAAGTGTGCCGACTATCAAATAAATGGTGAAATCATCTGCAAATGTGGCCAGGCTTGGGGAACAATGATGGTGCACAAAGGCTTAGATTTGCCTTGTCTCAAAATAAGGAATTTTGTAGTGGTTTTCAAAAATAATTCAACAAAGAAACAATACAAAAAGTGGGTAGAATTACCTATCACATTTCCCAATCTTGACTATTCAGAATGCTGT
Ss_LGP2 GAGCAAGTGCAGCTCCTGTGCATCAACTGCATGGTGGCCATGGGCTACGGGAGTGACCTGCGGAAGGTGGAGAGTGCCCACCATGTCAACGTGAACCCCAACTTCAAGATCTACTACAACGTCTCCCAGGAGCCTGTGGTCATTGACAGAGTCTTCAAGGACTGGAGGCCCGGGGGTGTCATTCGCTGCAGGAACTGTGGGGAGAGCTGGGGCATGCAGATAATCTACAAGTCCGTGAAGCTGCCAGTGCTCAAAGTCCGCAGTGTGCTTCTGGAGACGCCCAACGGGCGGATCCAGGTCAAGAAATGGTCCTGCGTGCCCTTCCCGGTGCCTGACTTCGATTACACGCAGTATTGCACCGAG
Pa_LGP2 GAGCACGTGCAGCTACTCTGCATCAACTGCATGGTGGCCGTGGGCCACGGCAGCGACCTGCGGAAGGTGGAGGGCACCCACCCATGTCCAACGATCTACTATAATGTCTCCAGGGATCCTGTGGTCATCAACAAAGTCTTCAAGGACTGGAAGCCTGGGGGTGTCATCAGCTGCAGGAACTGTGGGGAGGTCTGGGGTCTGCAGATGATCTACAAGTCAGTGAAGCTGCCAGCGCTCAAAGTCCGCAGCATGCTGCTGGAGACCCCTCAGGGGCGGATCCAGGCCAAAAAGGATATGAAGCGGCCA
Fc_LGP2 GAGCAGGTGCAGCTTCTCTGCATCAACTGCATGGTGGCCGTGGGCCACGGGAGTGACCTGCGGAAGGTGGAGGGCGCCCACCACGTCAACGTGAACCCCAACTTCTCGATCTACTACACTGTCTCCCGGGGGCCTGTGGTCATCGACAGAACCTTCAAGGACTGGAAGCCTGGGGGTGCCATTCACTGCAGGAACTGTGGGGAGGCCTGGGGTCTGCAGATGATCTACAAGTCAGTGAAGCTGCCAGCGCTCAAAGTTCGCAGCATGCTTCTAGAGACACCCCAAGGGAGAGTCCAGGCCAAGAAGTGGTCCCGCGTGCCCTTCCTCGTGCCTGACTTTGACTACCTGCAACACTGTACCCAG
Mg_LGP2 AAGGAAGCCAGGAGCATGGAGGCCATGCACCACGTGAACATCAACCCCAACTTCAGGTTTTATTATACAGTCTCACCTGGGAAAATACACTTCGAGCGGACGTTCAGGGACTGGGAGCCCGGGTGCCGCATTGTGTGCAGTGAGTGCAGGCAGGAGTGGGGAATGGAGATGATCTATCGGAACGTGACCTTACCCATCCTCAGCATCAAAAACTTTGTGGTGGTGACCCCGGATGAGAAGAAGAAGTACAAGAAGTGGAGCAGAGTGACGTTCCCCATCGAGGAGTTCAGCTACCTGGAGTACTGCTCC
Hs_LGP2 GAGCACGTGCAGCTACTCTGCATCAACTGCATGGTGGCTGTGGGCCATGGCAGCGACCTGCGGAAGGTGGAGGGCACCCACCATGTCAATGTGAACCCCAACTTCTCGAACTACTATAATGTCTCCAGGGATCCTGTGGTCATCAACAAAGTCTTCAAGGACTGGAAGCCTGGGGGTGTCATCAGCTGCAGGAACTGTGGGGAGGTCTGGGGTCTGCAGATGATCTACAAGTCAGTGAAGCTGCCAGTGCTCAAAGTCCGCAGCATGCTGCTGGAGACCCCTCAGGGGCGGATCCAGGCCAAAAAGTGGTCCCGCGTGCCCTTCTCCGTGCCTGACTTTGACTTCCTGCAGCATTGTGCCGAG
Hs_MDA5 ATAACTTTCCTTTGCAAAAACTGCAGTGTGCTAGCCTGTTCTGGGGAAGATATCCATGTAATTGAGAAAATGCATCACGTCAATATGACCCCAGAATTCAAGGAACTTTACATTGTAAGAGAAAACAAAACACTGCAAAAGAAGTGTGCCGACTATCAAATAAATGGTGAAATCATCTGCAAATGTGGCCAGGCTTGGGGAACAATGATGGTGCACAAAGGCTTAGATTTGCCTTGTCTCAAAATAAGGAATTTTGTAGTGGTTTTCAAAAATAATTCAACAAAGAAACAATACAAAAAGTGGGTAGAATTACCTATCACATTTCCCAATCTTGACTATTCAGAATGCTGT
