Question
Your task is to code up a simple NGS aligner in Python. The expectation is that you will use straightforward Python logic, but for an extra challenge, you are welcome to code it using more sophisticated implementations to speed up the run time!
You are given:
a sample genome sequence, genome.fsa
a FASTQ file of NGS reads, reads.fastq
Download both of these files from Canvas.
A) Write an aligner that outputs a file containing the alignment coordinates, within the genome, of each read in the FASTQ file, in the format prescribed below. Note that reads may align to either the given genomic sequence (the + strand) or its reverse complement (the - strand), and may have up to 2 mismatches. You'll probably want to use BioPython in your program to parse the inputs and to reverse complement sequences, but the actual alignment logic should be implemented in your own code. Each read has a unique identifier (see the NGS lecture notes on FASTQ files or research this online). The output of your program should be a text file containing 4 lines for each read, as follows:
The read identifier. Note that BioPython (if you use it) will remove the "@" demarcating character at the start of the ID line.
The coordinates in the genome that the read aligns to. Start counting from 1, like in Rosalind HW Q14!
Whether it aligns to the + or - strand.
The number of mismatches in the alignment.
For example, for the third read in the FASTQ file, your output file should have:
HWI-ST1216:132:c2pb5acxx:7:2115:18814:33730
101-150
-
0
Check this to make sure!
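One way to approach the alignment logic is a brute-force scan, sketched below. This is only a minimal sketch, not a reference solution. The function names (reverse_complement, count_mismatches, align_read, format_record) are hypothetical, and it makes two assumptions the question does not spell out: coordinates for a - strand hit are reported relative to the given (+) sequence, and when several positions qualify, the hit with the fewest mismatches is reported (the first one found on a tie). A - strand hit is found by reverse complementing the read and scanning the given sequence, which is equivalent to aligning the read to the reverse complement of the genome.

```python
# Translation table for complementing DNA bases (other characters pass through).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(read, window, limit):
    """Count mismatching positions, stopping early once the count exceeds limit."""
    mismatches = 0
    for a, b in zip(read, window):
        if a != b:
            mismatches += 1
            if mismatches > limit:
                break
    return mismatches


def align_read(read, genome, max_mismatches=2):
    """Return (start, end, strand, mismatches) with 1-based coordinates, or None.

    Assumption: coordinates are given on the + strand for both strands.
    """
    best = None
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for i in range(len(genome) - len(query) + 1):
            window = genome[i:i + len(query)]
            mm = count_mismatches(query, window, max_mismatches)
            # Keep the hit only if it is allowed and strictly better than the current one.
            if mm <= max_mismatches and (best is None or mm < best[3]):
                best = (i + 1, i + len(query), strand, mm)
    return best


def format_record(read_id, hit):
    """Format one read as the four output lines described above."""
    start, end, strand, mm = hit
    return f"{read_id}\n{start}-{end}\n{strand}\n{mm}\n"
```

On a toy genome such as "ACGTTGCAAGGCTTAACCGGTATC", the read "TGCAAGGCTT" would be reported at 5-14 on the + strand with 0 mismatches. The scan is O(genome length x read length) per read, so for the extra challenge a seed index (for example, a dictionary of k-mer positions) would be the natural next step.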
I have attached a sample of each file below: first genome.fsa, then reads.fastq.
>tpg|BK006935.2| [organism=Saccharomyces cerevisiae S288c] [strain=S288c] [moltype=genomic] [chromosome=I] [note=R64-1-1]
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACA
CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTT
ACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAAC
CACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATC
CAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATAC
TGTTCTTCTACCCACCATATTGAAACGCTAACAAATGATCGTAAATAACACACACGTGCT
TACCCTACCACTTTATACCACCACCACATGCCATACTCACCCTCACTTGTATACTGATTT
TACGTACGCACACGGATGCTACAGTATATACCATCTCAAACTTACCCTACTCTCAGATTC
CACTTCACTCCATGGCCCATCTCTCACTGAATCAGTACCAAATGCACTCACATCATTATG
CACGGCAC
@HWI-ST1216:132:c2pb5acxx:7:1210:6768:34415
ACACCCACACACCCACACACCACACCACACACCACACCACACCCACACAC
+
????DDDDDDDDDI@:CEBE)@@DDIIIDCDDCDIIII?DCDCDDCDDDD
@HWI-ST1216:132:c2pb5acxx:7:2315:2096:80538
ACACCCACACACCCACACACCACACCACACACCACACCACACCCACACAC
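Files laid out like the samples above could be read with a few lines of plain Python, as in the sketch below. The helper names read_fasta and read_fastq are hypothetical, and the sketch assumes a single-record FASTA file and FASTQ records of exactly four lines each; an incomplete trailing record is skipped. BioPython's Bio.SeqIO.parse is the more robust choice the question itself suggests. As with BioPython, the leading "@" is dropped from the read identifier.

```python
def read_fasta(lines):
    """Return (header, sequence) for a single-record FASTA given as an iterable of lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]      # drop the ">" marker
        else:
            chunks.append(line)    # sequence may be wrapped over many lines
    return header, "".join(chunks)


def read_fastq(lines):
    """Yield (read_id, sequence, quality) for each complete 4-line FASTQ record."""
    stripped = [line.strip() for line in lines if line.strip()]
    # Step through the file four lines at a time; a truncated last record is ignored.
    for i in range(0, len(stripped) - 3, 4):
        id_line, seq, _plus, qual = stripped[i:i + 4]
        # Drop the "@" and keep only the first whitespace-delimited token as the ID.
        yield id_line[1:].split()[0], seq, qual
```

Both helpers accept any iterable of lines, so an open file handle can be passed directly, for example `with open("genome.fsa") as fh: header, genome = read_fasta(fh)`.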