
Question


The input is the file oriC_nl.txt, and the output for each question is the Python command (program) needed to answer it.

There is no more information to give; the assignment reads exactly as worded here.

Write the Python commands for each of questions a, b, c, and d.

A typical kind of sequence question might be: In prokaryotes DnaA is a protein that activates initiation of DNA replication. There are multiple DnaA binding sites and they are typically 9-basepair repeats upstream of the oriC site. There is also a DNA Unwinding Element (DUE), which is a tandem array of three 13-basepair AT-rich sequences. Given the oriC sequence of a prokaryote, find potential DnaA boxes and the DUE.
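Since the DUE is described as AT-rich 13-base-pair sequences, one rough way to shortlist candidates is to score every 13-base window by its A/T fraction. The sketch below is only an illustration: the function names and the 0.75 threshold are assumptions, not part of the assignment.

```python
def at_fraction(window):
    """Fraction of bases in the window that are 'a' or 't'."""
    window = window.lower()
    return sum(base in "at" for base in window) / len(window)

def at_rich_windows(seq, width=13, threshold=0.75):
    """Return (start, window) pairs whose A/T fraction is at least threshold.

    width=13 follows the DUE description; threshold is an arbitrary assumption.
    Start positions are 0-based.
    """
    seq = seq.lower()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if at_fraction(window) >= threshold:
            hits.append((i, window))
    return hits
```

A window that passes this filter is only a candidate; the text above describes the DUE as a tandem array of three such sequences, so adjacent hits would still need to be checked by eye or with further code.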

The file oriC_nl.txt contains the 540 bases in the oriC region of Vibrio cholerae, broken up into 20bp chunks, with a newline character at the end of each line. Write programs to do the following:

a. Reverse Complement:

Input: The oriC_nl.txt file

Output: The complementary strand, written both 5' to 3' and 3' to 5'

Bonus: Use dictionaries and the get function.
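One possible approach to part a, using a dictionary and its get method as the bonus suggests. This is a sketch under stated assumptions: the function names are made up for illustration, the file is assumed to hold the strand written 5' to 3', and any character other than a, c, g, or t is mapped to 'n'.

```python
# Complement table for lowercase DNA bases.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def read_sequence(path):
    """Join all whitespace-separated chunks of the file into one lowercase string."""
    with open(path) as handle:
        return "".join(handle.read().split()).lower()

def complement(seq):
    """Complementary strand in the same left-to-right order (3' to 5')."""
    # dict.get supplies 'n' for any unexpected character (an assumption).
    return "".join(COMPLEMENT.get(base, "n") for base in seq.lower())

def reverse_complement(seq):
    """Complementary strand written 5' to 3'."""
    return complement(seq)[::-1]

# Example usage (requires oriC_nl.txt in the working directory):
#   seq = read_sequence("oriC_nl.txt")
#   print("3' to 5':", complement(seq))
#   print("5' to 3':", reverse_complement(seq))
```

Reading the complement in the same order as the input gives the 3' to 5' form; reversing it gives the 5' to 3' form.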

b. Sequence Frequency: Find the most frequent k-mers in a string.

Input: The file oriC_nl.txt and an integer k

Output: The most frequent k-mers in the input file
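A minimal sketch for part b, assuming the sequence has already been read into a single string with whitespace removed. The function name is hypothetical; overlapping k-mers are counted, and all k-mers tied for the highest count are returned.

```python
from collections import Counter

def most_frequent_kmers(seq, k):
    """Return (sorted list of the most frequent k-mers, their count)."""
    seq = seq.lower()
    # Slide a window of length k across the sequence and tally each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:
        return [], 0  # k is larger than the sequence
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best), best

# Example usage:
#   kmers, count = most_frequent_kmers(sequence, 9)
```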

c. Pattern Matching: Find all occurrences of a pattern in a string.

Input: The file oriC_nl.txt and a Pattern string

Output: All starting positions where Pattern appears as a substring in the file.
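Part c could be sketched with repeated calls to str.find, advancing one position at a time so that overlapping occurrences are included. The function name is an assumption, and positions are reported 0-based; add 1 to each if the assignment expects 1-based positions.

```python
def find_pattern(seq, pattern):
    """Return every 0-based start position of pattern in seq, overlaps included."""
    seq, pattern = seq.lower(), pattern.lower()
    positions = []
    if not pattern:
        return positions  # treat an empty pattern as having no matches
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are not skipped.
        start = seq.find(pattern, start + 1)
    return positions

# Example usage:
#   print(find_pattern(sequence, "atgatcaag"))
```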

d. Sequence Frequency with Gaps: Find the most frequent k-mers in a string with one allowed mismatch.

Input: The file oriC_nl.txt and an integer k

Output: The most frequent k-mer consensus sequences in the input file
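For part d, one common way to allow a single mismatch is to credit each window in the sequence to itself and to every k-mer that differs from it at exactly one position, then report the candidates with the highest total. The sketch below takes that approach; the function names are hypothetical, and it is assumed that a "consensus" k-mer need not itself occur in the sequence.

```python
from collections import Counter

def neighbors(kmer, alphabet="acgt"):
    """Return kmer plus every string differing from it at exactly one position."""
    result = {kmer}
    for i, base in enumerate(kmer):
        for other in alphabet:
            if other != base:
                result.add(kmer[:i] + other + kmer[i + 1:])
    return result

def most_frequent_kmers_one_mismatch(seq, k):
    """Return (sorted consensus k-mers, count) allowing at most one mismatch."""
    seq = seq.lower()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        # Each window votes once for every candidate within one mismatch of it.
        counts.update(neighbors(seq[i:i + k]))
    if not counts:
        return [], 0  # k is larger than the sequence
    best = max(counts.values())
    return sorted(c for c, n in counts.items() if n == best), best

# Example usage:
#   consensus, count = most_frequent_kmers_one_mismatch(sequence, 9)
```

Each window generates 3k + 1 candidates, so for a 540-base sequence and small k this stays fast without any further optimization.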

The oriC region of Vibrio cholerae (contents of oriC_nl.txt):

atcaatgatc aacgtaagct tctaagcatg atcaaggtgc tcacacagtt tatccacaac ctgagtggat gacatcaaga taggtcgttg tatctccttc ctctcgtact ctcatgacca cggaaagatg atcaagagag gatgatttct tggccatatc gcaatgaata cttgtgactt gtgcttccaa ttgacatctt cagcgccata ttgcgctggc caaggtgacg gagcgggatt acgaaagcat gatcatggct gttgttctgt ttatcttgtt ttgactgaga cttgttagga tagacggttt ttcatcactg actagccaaa gccttactct gcctgacatc gaccgtaaat tgataatgaa tttacatgct tccgcgacga tttacctctt gatcatcgat ccgattgaag atcttcaatt gttaattctc ttgcctcgac tcatagccat gatgagctct tgatcatgtt tccttaaccc tctatttttt acggaagaat gatcaagctg ctgctcttga tcatcgtttc

Starter code from the assignment (main.py), transcribed from the screenshot and updated to Python 3 (input replaces raw_input):

    # Ask for the search parameters.
    kmer = int(input("Size k-mer to search for? "))
    num_mismatch = int(input("Number of mismatches tolerated in the kmer? "))

    # Read the file and join its chunks into one lowercase string.
    sequence = ""
    with open('oriC_nl.txt', 'r') as sequence_file:
        for line in sequence_file:
            sequence += "".join(line.split()).lower()
