Question

1 Approved Answer

Posted on Sep 25, 2024


Replace all #TODO in the scripts with the required code.

Coding Spec - Translating the first ORF

Name of script: translate_first_orf.py

Functions:

find_first_orf: takes an RNA sequence and returns the first complete ORF as a Seq object. ORFs are defined here, as in the practice sections, as starting with an "AUG" start codon, continuing with any number of codons made of A, C, U, and G, ending in one of the stop codons (UAA, UAG, or UGA), and having no internal stop codons.

translate_first_orf: takes a DNA sequence, transcribes it into RNA, finds the first ORF, translates that ORF into a protein, and returns the protein.

if __name__ == "__main__": Use this area to handle user input, open FASTA files, decide which FASTA entries to process, and print the results to the command line. Each output line should consist of the record ID, a colon, a tab (\t), and the translated first ORF.
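The ORF definition in the spec maps fairly directly onto a regular expression. The following is a minimal sketch, not the course's reference solution: the names ORF_PATTERN and first_orf_string are made up for illustration, and it returns a plain string rather than the Seq object the assignment asks for. It assumes an uppercase RNA string.

```python
import re

# AUG, then as few in-frame codons as possible, then a stop codon.
# Because the codon repeat is lazy (*?), the match ends at the first
# in-frame stop, so no stop codon can sit inside the ORF body.
ORF_PATTERN = re.compile(r"AUG(?:[ACGU]{3})*?(?:UAA|UAG|UGA)")


def first_orf_string(rna):
    """Return the first complete ORF in rna as a string, or "" if there is none."""
    match = ORF_PATTERN.search(rna)
    return match.group() if match else ""


print(first_orf_string("AUGCCCUAG"))        # AUGCCCUAG
print(first_orf_string("AUGCUGUAACUGUAG"))  # AUGCUGUAA
print(repr(first_orf_string("AUGAUAA")))    # ''
```

An equivalent way to express "no internal stop codons" would be a negative lookahead before each codon; the lazy quantifier is used here only because it is shorter to read.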

Manual Tests

Make sure you can provide the Drosophila genome FASTA file ("/work/courses/BINF6308/data_BINF6308/Module4/dmel-all-chromosome-r6.17.fasta") and get the following result:

2L: MHDRGSRTDI*
2R: MSF*
3L: MIAYARVVPTYCAL*
3R: MGPSTEPSTEPSTGPVRDQYGTSTGPVRDQYGTSTGPVRDQYGTSTGPVRDQYGTSTGPSTEPSTGPVRDQYGTSTGPVRD*
4: MNGIIIGNSTIFNNLYQSTNLVNALLIYLT*
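Only five records appear in the expected output because the template's --pattern option defaults to ^\d{1}\D*$, which accepts an ID made of one digit followed by non-digits. A small sketch of that filter is below. The list of IDs is illustrative only: the last three entries are hypothetical stand-ins for other kinds of FASTA entries and are not taken from the question.

```python
import re

# Default value of the template's --pattern option.
DEFAULT_PATTERN = r"^\d{1}\D*$"

# Illustrative IDs; the last three are hypothetical examples of entries
# that the default pattern would skip.
record_ids = ["2L", "2R", "3L", "3R", "4",
              "X", "mitochondrion_genome", "211000022278279"]

# Keep only IDs that are exactly one digit followed by zero or more non-digits.
selected = [rid for rid in record_ids if re.match(DEFAULT_PATTERN, rid)]
print(selected)  # ['2L', '2R', '3L', '3R', '4']
```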

Automated Tests

You can find these in your repo for this week, but you can see them here as well: test_translate_first_orf.py

#!/usr/bin/env python3
"""Test behavior of translate_first_orf.py"""

from translate_first_orf import find_first_orf
from translate_first_orf import translate_first_orf
from Bio.Seq import Seq


def test_short_orf():
    """Identify short ORF"""
    assert find_first_orf("AUGCCCUAG") == "AUGCCCUAG", "expect three codon ORF"


def test_orf_in_orf():
    """Identify first ORF when two present"""
    assert find_first_orf("AUGCUGUAACUGUAG") == "AUGCUGUAA", "expect first complete ORF"


def test_missing_stop_codon():
    """Identify no ORF when missing stop codon"""
    assert find_first_orf("AUGCUG") == "", "expect no ORF in AUGCUG - lacks stop codon"


def test_out_of_frame_stop():
    """Identify no ORF when stop codon is out of frame"""
    assert find_first_orf("AUGAUAA") == "", "expect no ORF in AUGAUAA - stop codon out of frame"


def test_dna_sequence():
    """Identify protein sequence within DNA"""
    assert translate_first_orf(Seq("AAATGCCCTAG")) == "MP*", "expect MP protein within sequence"
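To see why test_dna_sequence expects "MP*", it can help to trace the transcription and translation steps by hand. The sketch below does that with the standard library only, as a stand-in for the Biopython calls the assignment would normally use; the helper names transcribe and translate, and the hand-built codon table, are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Turn a coding-strand DNA string into RNA by replacing T with U."""
    return dna.replace("T", "U")


def translate(orf):
    """Translate an RNA string codon by codon; stop codons become '*'."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


rna = transcribe("AAATGCCCTAG")
print(rna)                     # AAAUGCCCUAG
print(translate("AUGCCCUAG"))  # MP*
```

The ORF AUGCCCUAG starts at the third base of the transcribed sequence, which is why the leading "AA" does not contribute to the protein.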

Templates

You can find this template in your repo for this week, but you can see it here as well: assignment-4-template.py

#!/usr/bin/env python
"""TODO: Say what the code does

TODO: Elaborate on what the code does
"""

import argparse
#TODO import other libraries needed


def get_args():
    """Return parsed command-line arguments."""

    parser = argparse.ArgumentParser(
        description="TODO: say what the script does.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # get the FASTA file of sequences
    parser.add_argument('filename',  # variable to access this data later: args.filename
                        metavar='FASTA',  # shorthand to represent the input value
                        help='Provide name and path to FASTA file to process.',  # message to the user, it goes into the help menu
                        type=str)
    parser.add_argument('-p', '--pattern',  # access with args.pattern
                        help='Provide a regex pattern for filtering FASTA entries',
                        default=r'^\d{1}\D*$')  # default works for Drosophila chromosomes

    return(parser.parse_args())


def find_first_orf(rna):
    """Return first open-reading frame of RNA sequence as a Bio.Seq object.

    Must start with AUG
    Must end with UAA, UAG, or UGA
    Must have even multiple of 3 RNA bases between
    """
    try:
        #TODO update regex to find the ORF
        orf = re.search('TODO: your regex', str(rna)).group()
    except AttributeError:  # if no match found, orf should be empty
        orf = ""
    return(Seq(orf))


def translate_first_orf(dna):
    """TODO: what it does

    Assumes input sequences is a Bio.Seq object.
    """

    # TODO: transcribe the DNA, find the first ORF, translate said ORF

    return(translated_orf)


if __name__ == "__main__":
    #TODO: get command-line arguments

    #TODO: use SeqIO to get the records in the fasta file provided by the command-line input
    #TODO: if the FASTA record's ID matches the regex pattern,
    # then print out its record ID then a tab space then the translated first ORF
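One way the TODOs might be filled in is sketched below. This is a hedged example, not the official solution: it assumes Biopython is installed, it assumes uppercase sequence input, and the helper report_first_orfs is a hypothetical name introduced here to hold the logic that the template places under if __name__ == "__main__".

```python
import re

from Bio import SeqIO
from Bio.Seq import Seq

# AUG, the fewest possible in-frame codons, then the first stop codon.
ORF_REGEX = re.compile(r"AUG(?:[ACGU]{3})*?(?:UAA|UAG|UGA)")


def find_first_orf(rna):
    """Return the first complete ORF of an RNA sequence as a Seq (empty if none)."""
    match = ORF_REGEX.search(str(rna))
    return Seq(match.group() if match else "")


def translate_first_orf(dna):
    """Transcribe a DNA Seq, find its first ORF, and return the translated protein."""
    rna = dna.transcribe()                  # T -> U on the coding strand
    return find_first_orf(rna).translate()  # stop codon is rendered as '*'


def report_first_orfs(fasta_path, pattern):
    """Hypothetical helper: print 'ID:<tab>protein' for each record whose ID matches."""
    for record in SeqIO.parse(fasta_path, "fasta"):
        if re.match(pattern, record.id):
            print(f"{record.id}:\t{translate_first_orf(record.seq)}")
```

In the template, the main block would then reduce to calling get_args() and passing args.filename and args.pattern to report_first_orfs. Whether this reproduces the expected Drosophila output exactly has not been checked here.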