Question
Hello, I am stuck on the following assignment. I cannot get the matrix to align. The code I have so far is:
#!/usr/bin/python
import sys
import os

# the assignment requires exactly one argument: the 'simple' sequence file
if len(sys.argv) != 2:
    sys.exit("usage: %s SIMPLESEQFILE" % sys.argv[0])

# creating filenames
infname = sys.argv[1]
fastaname = infname + ".fasta"
mafftfname = fastaname + ".mafft"
stockname = mafftfname + ".stock"

# simple to fasta: each input line is "ID SEQUENCE"
with open(infname, "r") as handle, open(fastaname, "w") as outf:
    for line in handle:
        linearr = line.split()
        if len(linearr) < 2:
            continue  # skip blank or malformed lines
        seqid = linearr[0]
        seq = linearr[1]
        # FASTA needs the header and the sequence on separate lines
        outf.write(">%s\n%s\n" % (seqid, seq))

# align using mafft (mafft's progress messages go to stderr)
cmd = "mafft %s > %s" % (fastaname, mafftfname)
sys.stderr.write("command: %s\n" % cmd)
if os.system(cmd) != 0:
    sys.exit("mafft failed")
sys.stderr.write("command done\n")

# convert fasta mafft alignment to stockholm
cmd = "fasta_to_stockholm %s > %s" % (mafftfname, stockname)
sys.stderr.write("command: %s\n" % cmd)
if os.system(cmd) != 0:
    sys.exit("fasta_to_stockholm failed")
sys.stderr.write("command done\n")

# run quicktree to get distance matrix (the only thing printed to stdout)
cmd = "quicktree -out m %s" % stockname
sys.stderr.write("command: %s\n" % cmd)
if os.system(cmd) != 0:
    sys.exit("quicktree failed")
sys.stderr.write("command done\n")
Instructions:
Develop a miniature bioinformatics analysis 'pipeline.'
To complete the assignment, create an executable Python script:
~/assignments/assignment09/assignment09.py
The script should take exactly one command-line argument, the name of a file containing unaligned DNA sequences in our 'simple' sequence-file format. An example input file is available:
~/assignment09_data.simpleseqs
I would recommend copying this file into your ~/assignments/assignment09 directory, to make developing your script easier.
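The 'simple' format appears to be one record per line: an ID, whitespace, then the sequence. A minimal Python 3 sketch of converting such lines to FASTA, assuming exactly that layout (the function name `simple_to_fasta` is hypothetical, not part of the assignment):

```python
def simple_to_fasta(lines):
    """Convert 'simple' records into FASTA text.

    Assumes each record is "ID SEQUENCE" on one line (ID first,
    sequence second, separated by whitespace); blank lines are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        seqid, seq = fields[0], fields[1]
        # In FASTA the ">ID" header and the sequence sit on separate lines.
        records.append(">%s\n%s\n" % (seqid, seq))
    return "".join(records)


example = ["seqA ACGTACGT\n", "\n", "seqB ACGTTCGT\n"]
print(simple_to_fasta(example), end="")
```

If the newline characters are left out, each header and its sequence end up on one line and the file is no longer valid FASTA, so the aligner cannot read the sequences correctly.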
Your Python script should print the pairwise distances among all pairs of sequences in the file to the screen (i.e., standard output, or "stdout"). You should not print any other information to standard out, although you may print additional information to the standard error stream, "stderr."
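One way to keep stdout clean is to route every progress message through a small helper that only ever writes to stderr. A sketch, where `log` is a hypothetical helper name and the optional `stream` parameter exists only to make it easy to test:

```python
import sys


def log(message, stream=None):
    """Write a progress message to stderr (or to `stream` if given).

    Nothing here touches stdout, so "> test09.out" captures only the
    distance matrix.
    """
    if stream is None:
        stream = sys.stderr
    stream.write(message + "\n")
    stream.flush()


log("converting input to FASTA")
```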
If you copy the input file to ~/assignments/assignment09 and run:
./assignment09.py assignment09_data.simpleseqs > test09.out
I would strongly suggest using mafft to align the sequences, using fasta_to_stockholm to convert the mafft alignment to stockholm format, and using quicktree to calculate the distance matrix.
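The three suggested steps can be expressed as a list of commands and run in order. This sketch assumes the `.fasta` file has already been written, that `fasta_to_stockholm` is the course-provided converter invoked as in the question's script, and that quicktree's `-out m` option prints a distance matrix. The helper names `pipeline_commands` and `run_pipeline` are hypothetical:

```python
import subprocess
import sys


def pipeline_commands(infname):
    """Return (argv, output_file) pairs for the three suggested steps.

    File names follow the question's script. output_file is None when
    the step's stdout should go straight to our own stdout.
    """
    fastaname = infname + ".fasta"
    mafftfname = fastaname + ".mafft"
    stockname = mafftfname + ".stock"
    return [
        (["mafft", fastaname], mafftfname),
        # fasta_to_stockholm is assumed to be the course-provided converter
        (["fasta_to_stockholm", mafftfname], stockname),
        # quicktree "-out m" prints a distance matrix instead of a tree
        (["quicktree", "-out", "m", stockname], None),
    ]


def run_pipeline(infname):
    """Run each step in order; check_call raises if a tool exits non-zero."""
    for argv, outname in pipeline_commands(infname):
        sys.stderr.write("command: %s\n" % " ".join(argv))
        if outname is None:
            subprocess.check_call(argv)  # inherits this script's stdout
        else:
            with open(outname, "w") as out:
                subprocess.check_call(argv, stdout=out)
        sys.stderr.write("command done\n")
```

Usage, once the FASTA file exists: `run_pipeline(sys.argv[1])`. Unlike `os.system`, which just returns the exit status, `subprocess.check_call` raises `CalledProcessError` when a step fails, so a broken alignment does not silently feed into quicktree.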
The only requirement is that your program should read the 'simple' sequence file (name provided on command-line) and print the resulting distance matrix to the screen (with nothing else being printed to standard out).
Note that mafft (and perhaps some other programs) does print a lot of information to the standard error stream; this is fine. For this assignment, we only care about standard out.
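To see why mafft's stderr output is harmless, here is a small demonstration. It uses a throwaway Python child process as a stand-in for mafft (not mafft itself) that writes to both streams; capturing them separately shows they never mix, which is also what the shell's `>` relies on when it redirects only stdout:

```python
import subprocess
import sys

# Hypothetical stand-in for a noisy tool: chatter on stderr, result on stdout.
child_code = (
    "import sys\n"
    "sys.stderr.write('progress: aligning...\\n')\n"
    "sys.stdout.write('result\\n')\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()
print("stdout:", out.decode().strip())  # stdout: result
print("stderr:", err.decode().strip())  # stderr: progress: aligning...
```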
EDIT: There is no place for me to attach the example file. The contents of assignment09_data.simpleseqs are:
Fc_RIG1 GAAAATAAAAAACTGCTCTGCAGAAAGTGCAAAGCCTTTGCATGTTACACTGTTGATATCAGAGTGGTGGAGGAATGCCATTACACTGTGGTTGGAGATGCTTTCAGGAAGTGCTTTGTGAGTAAACTACACCCCAAACCAAAGAGCTTTGGATATTTTGAGAAGAGAGCAAAGATCTTCTGTGCCAGACCAAACTGCAGCCATGACTGGGGAATCCATGTGAAGTATAAGATATTTGAGATTCCAGTTATAAAAATAGAAAGTTTTGTGGTGGAGGATATTGCAACTGGAGCTCAGAAACTATATGCAAAGTGGAAGGACTTTCACTTTGAGAAGATACCATTTGATGCTAAGGAAATG
Pt_RIG1 GAAAATAAAAAACTGCTCTGCAGAAAGTGCAAAGCCTTGGCATGTTACACAGCTGACGTAAGAGTGATAGAGGAATGCCATTACACTGTGCTTGGAGATGCTTTTAAGGAATGCTTTGTGAGTAGACCACATCCCAAGCCAAAGCAGTTTTCAAGTTTTGAAAAAAGAGCAAAGATATTCTGTGCCCGACAGAACTGCAGCCATGACTGGGGAATCCATGTGAAGTATAAGACATTTGAGATTCCAGTTATAAAAATTGAAAGTTTTGTGGTGGAGGATATTGCAACTGGAGTTCAGACACTGTACTCGAAGTGGAAGGACTTTCATTTTGAGAAGATACCATTTGATCCAGCAGAAATGTCC
Ac_MDA5 ATCAAGTTCCTCTGCAAAAACTGCACTAAGCTGATATGTTCAGGTGAAGATATTGAGGTCATTGAGAATATGCATCATGTCAATGTCAAAAAAGAATTTAAAGGCCTTTATGTTGTAAGAGAAAACAAGACACTGCAAGCAAAAGCCGCAGACTATCAAACAAATGGGGAAGTTATCTGCAAAGATTGTGGACAAGTGTGGGGAAGCATGATGGTACACCGAGGTCTAGACCTGCCTTGCCTAAAAATAAAAAACTTTGTGGTTGTATTCAATGAGAAGAAAACTACCCGAAAGGATATGTGCAAAAAATGGGCAGAGCTGCCCATTAGGTTTCCAGAGTTCAGTTATGCAAATAAT
Pt_MDA5 ATAACTTTCCTTTGCAAAAACTGCAGTGTGCTAGCCTGTTCTGGGGAAGATATCCACGTAATTGAGAAAATGCATCACGTCAATATGACCCCAGAATTCAAGGAACTTTACATTGTAAGAGAAAACAAAGCACTGCAAAAGAAGTGTGCCGACTATCAAATAAATGGTGAAATCATCTGCAAATGTGGCCAGGCTTGGGGAACAATGATGGTGCACAAAGGCTTAGATTTGCCTTGTCTCAAAATAAGGAATTTTGTAGTGGTTTTCAAAAATAATTCAACAAAGAAACAATACAAAAAGTGGGTAGAATTACCTATCACATTTCCCAATCTTGACTATTCAGAATGCTGT
Ss_LGP2 GAGCAAGTGCAGCTCCTGTGCATCAACTGCATGGTGGCCATGGGCTACGGGAGTGACCTGCGGAAGGTGGAGAGTGCCCACCATGTCAACGTGAACCCCAACTTCAAGATCTACTACAACGTCTCCCAGGAGCCTGTGGTCATTGACAGAGTCTTCAAGGACTGGAGGCCCGGGGGTGTCATTCGCTGCAGGAACTGTGGGGAGAGCTGGGGCATGCAGATAATCTACAAGTCCGTGAAGCTGCCAGTGCTCAAAGTCCGCAGTGTGCTTCTGGAGACGCCCAACGGGCGGATCCAGGTCAAGAAATGGTCCTGCGTGCCCTTCCCGGTGCCTGACTTCGATTACACGCAGTATTGCACCGAG
Pa_LGP2 GAGCACGTGCAGCTACTCTGCATCAACTGCATGGTGGCCGTGGGCCACGGCAGCGACCTGCGGAAGGTGGAGGGCACCCACCCATGTCCAACGATCTACTATAATGTCTCCAGGGATCCTGTGGTCATCAACAAAGTCTTCAAGGACTGGAAGCCTGGGGGTGTCATCAGCTGCAGGAACTGTGGGGAGGTCTGGGGTCTGCAGATGATCTACAAGTCAGTGAAGCTGCCAGCGCTCAAAGTCCGCAGCATGCTGCTGGAGACCCCTCAGGGGCGGATCCAGGCCAAAAAGGATATGAAGCGGCCA
Fc_LGP2 GAGCAGGTGCAGCTTCTCTGCATCAACTGCATGGTGGCCGTGGGCCACGGGAGTGACCTGCGGAAGGTGGAGGGCGCCCACCACGTCAACGTGAACCCCAACTTCTCGATCTACTACACTGTCTCCCGGGGGCCTGTGGTCATCGACAGAACCTTCAAGGACTGGAAGCCTGGGGGTGCCATTCACTGCAGGAACTGTGGGGAGGCCTGGGGTCTGCAGATGATCTACAAGTCAGTGAAGCTGCCAGCGCTCAAAGTTCGCAGCATGCTTCTAGAGACACCCCAAGGGAGAGTCCAGGCCAAGAAGTGGTCCCGCGTGCCCTTCCTCGTGCCTGACTTTGACTACCTGCAACACTGTACCCAG
Mg_LGP2 AAGGAAGCCAGGAGCATGGAGGCCATGCACCACGTGAACATCAACCCCAACTTCAGGTTTTATTATACAGTCTCACCTGGGAAAATACACTTCGAGCGGACGTTCAGGGACTGGGAGCCCGGGTGCCGCATTGTGTGCAGTGAGTGCAGGCAGGAGTGGGGAATGGAGATGATCTATCGGAACGTGACCTTACCCATCCTCAGCATCAAAAACTTTGTGGTGGTGACCCCGGATGAGAAGAAGAAGTACAAGAAGTGGAGCAGAGTGACGTTCCCCATCGAGGAGTTCAGCTACCTGGAGTACTGCTCC
Hs_LGP2 GAGCACGTGCAGCTACTCTGCATCAACTGCATGGTGGCTGTGGGCCATGGCAGCGACCTGCGGAAGGTGGAGGGCACCCACCATGTCAATGTGAACCCCAACTTCTCGAACTACTATAATGTCTCCAGGGATCCTGTGGTCATCAACAAAGTCTTCAAGGACTGGAAGCCTGGGGGTGTCATCAGCTGCAGGAACTGTGGGGAGGTCTGGGGTCTGCAGATGATCTACAAGTCAGTGAAGCTGCCAGTGCTCAAAGTCCGCAGCATGCTGCTGGAGACCCCTCAGGGGCGGATCCAGGCCAAAAAGTGGTCCCGCGTGCCCTTCTCCGTGCCTGACTTTGACTTCCTGCAGCATTGTGCCGAG
Hs_MDA5 ATAACTTTCCTTTGCAAAAACTGCAGTGTGCTAGCCTGTTCTGGGGAAGATATCCATGTAATTGAGAAAATGCATCACGTCAATATGACCCCAGAATTCAAGGAACTTTACATTGTAAGAGAAAACAAAACACTGCAAAAGAAGTGTGCCGACTATCAAATAAATGGTGAAATCATCTGCAAATGTGGCCAGGCTTGGGGAACAATGATGGTGCACAAAGGCTTAGATTTGCCTTGTCTCAAAATAAGGAATTTTGTAGTGGTTTTCAAAAATAATTCAACAAAGAAACAATACAAAAAGTGGGTAGAATTACCTATCACATTTCCCAATCTTGACTATTCAGAATGCTGT