
Question



Assume that we have one long piece of genomic DNA that contains a single pathogenicity island somewhere within it. That DNA sequence can be "chunked" into pieces, and the GC content of each piece can be computed separately. By seeing which pieces have relatively high or low GC content, you'll be able to infer roughly where the pathogenicity island resides.

To get started, download the following file and put it in the same folder as your Python file: salGenomicRegion.py

Next, at the top of your file, include the line:

    from salGenomicRegion import *

This will import a DNA string called salDNA into your environment. This string comes from a larger region surrounding Salmonella pathogenicity island 1.

Your task is to write a function called gcWin(DNA,stepLen) that takes as input a DNA string and does the following: for each piece of DNA of length stepLen, the GC content is calculated (using your existing gcContent function) and that value is printed (using print rather than return).

For example, imagine that our DNA string has length 13 and stepLen is 5. Then the first region is DNA[0:5], the next one is DNA[5:10], and the last one is DNA[10:15]. Notice that the last slice may be shorter than stepLen because of the length of the DNA sequence, but that's fine: if the string has length 13, then DNA[10:15] simply provides a short slice of 3 bases.

We can use a for loop to go over the DNA such that after each iteration of the loop we skip forward stepLen bases. The following syntax of the range function will be useful here:

    >>> range(0,100,10)
    [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Note how the last number we gave the range function told it how big a step to use. You can loop over the DNA in this way, slicing out each region of size stepLen in turn. For each slice, calculate the GC content by calling gcContent on that slice (not on the whole DNA string), and print the result to the screen.
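The stepping and slicing described above can be sketched as follows. The 13-base string here is made up purely to mirror the length-13 example, and the names starts and pieces are illustrative, not part of the assignment. Note that in Python 3, range returns a lazy range object rather than a list, so list() is used to display the start positions.

```python
# Hypothetical 13-base string, chosen only to mirror the length-13 example.
DNA = "ATGCATGCATGCA"
stepLen = 5

# Start position of each piece: 0, then stepLen further along each time.
starts = list(range(0, len(DNA), stepLen))

# Slicing past the end of a string is allowed and simply yields a shorter piece.
pieces = [DNA[start:start + stepLen] for start in starts]

print(starts)  # [0, 5, 10]
print(pieces)  # ['ATGCA', 'TGCAT', 'GCA']
```

The last piece has only 3 bases because DNA[10:15] stops at the end of the string.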
An example of gcWin at work:

    >>> print ("gcWin of the string 'ATCCGAGGTC' with a steplen of 2 is:"), gcWin('ATCCGAGGTC',2)
    0.0
    1.0
    0.5
    1.0
    0.5

Once you have written this function, run it on salDNA with stepLen set to 10,000. Based on the results, where do you think the pathogenicity island begins and ends in this region? Include your answer as a comment at the top of your file.

The existing gcContent function:

    def gcContent(DNA):
        '''This function calculates the proportion of bases in a DNA string
        that are G's and C's. It is used to calculate GC content when
        looking for pathogenicity islands.'''
        gcCounter = 0
        lengthDNA = len(DNA)
        for nuc in DNA:
            if nuc == 'G' or nuc == 'C':
                gcCounter = gcCounter + 1  # increment the gcCounter
        return float(gcCounter) / lengthDNA
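One possible way to write gcWin is sketched below. It is not a definitive solution, only a minimal version under the assumptions stated in the question. The helper gcWindows is hypothetical (the assignment does not ask for it); it returns the start position of each window alongside its GC content, which may make it easier to map printed values back to positions in the sequence. gcContent is repeated so that the block runs on its own.

```python
def gcContent(DNA):
    """Proportion of bases in DNA that are G or C (as given in the question)."""
    gcCounter = 0
    for nuc in DNA:
        if nuc == 'G' or nuc == 'C':
            gcCounter = gcCounter + 1
    return float(gcCounter) / len(DNA)


def gcWindows(DNA, stepLen):
    """Hypothetical helper, not part of the assignment: return a list of
    (start, GC content) pairs, one per window of length stepLen."""
    results = []
    for start in range(0, len(DNA), stepLen):
        window = DNA[start:start + stepLen]  # the last window may be shorter
        results.append((start, gcContent(window)))
    return results


def gcWin(DNA, stepLen):
    """Print the GC content of each consecutive window of length stepLen."""
    for start, gc in gcWindows(DNA, stepLen):
        print(gc)


# Reproduces the example from the question, one value per line.
gcWin('ATCCGAGGTC', 2)
```

To run this on the real data, add the import line from the question at the top of the file and call gcWin(salDNA, 10000). Windows whose GC content stands out from their neighbours are the candidates for the island's boundaries.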

