Question

Your task is to code up a simple NGS aligner in Python. The expectation is that you will use straightforward Python logic, but for an extra challenge, you are welcome to code it using more sophisticated implementations to speed up the run time!

You are given

a sample genome sequence, genome.fsa

a fastq file of NGS reads, reads.fastq

Download both of these files from Canvas.

A) Write a program for an aligner that will output a file containing the alignment coordinates of each read in the fastq file within the genome, as prescribed below. Note that reads may align to either the given genomic sequence (the + strand) or its reverse complement (the - strand), and may have up to 2 mismatches. You'll probably want to use BioPython in your program to parse the inputs and for reverse complementing, but the actual logic for the alignments should be implemented using your own code. Each read has a unique identifier (see the NGS lecture notes on FASTQ files or research this online). The output of your program should be a text file containing 4 lines for each read, as follows:

The read identifier. Note that BioPython (if you use it) will remove the "@" demarcating character at the start of the ID line.

The coordinates in the genome that the read aligns to. Start counting from 1, like in Rosalind HW Q14!

Whether it aligns to the + or - strand.

The number of mismatches in the alignment.

For example, for the third read in the FASTQ file, your output file should have:

HWI-ST1216:132:c2pb5acxx:7:2115:18814:33730
101-150
-
0

Check this to make sure!
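A minimal sketch of the alignment logic described above, in plain Python. It assumes uppercase sequences, substitutions only (no gaps), and that hits on the - strand are reported in + strand coordinates, as the 101-150 example suggests. The function names (reverse_complement, count_mismatches, align_read, format_record) are illustrative and not part of the assignment. When several positions qualify, this sketch keeps the one with the fewest mismatches, which is also an assumption. The reverse complement is done with str.translate here; BioPython's Seq.reverse_complement() would serve equally well.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_mismatches(a, b, limit):
    """Count differing positions, stopping early once `limit` is exceeded."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                break
    return mismatches


def align_read(read, genome, max_mismatches=2):
    """Return (start, end, strand, mismatches) for the best hit, or None.

    Coordinates are 1-based and inclusive, measured on the given (+) strand.
    A read on the - strand is found by searching for its reverse complement.
    """
    best = None
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for i in range(len(genome) - len(query) + 1):
            window = genome[i:i + len(query)]
            mm = count_mismatches(query, window, max_mismatches)
            # Keep the first hit seen with the lowest mismatch count.
            if mm <= max_mismatches and (best is None or mm < best[3]):
                best = (i + 1, i + len(query), strand, mm)
    return best


def format_record(read_id, hit):
    """Format one read as the four output lines: ID, coordinates, strand, mismatches."""
    start, end, strand, mm = hit
    return f"{read_id}\n{start}-{end}\n{strand}\n{mm}\n"
```

Calling align_read(read, genome) for each read and writing format_record(read_id, hit) to the output file gives the four-line layout. The scan compares every read against every genome position on both strands, which is simple but slow for large inputs.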

I have attached a sample of each input file below.

>tpg|BK006935.2| [organism=Saccharomyces cerevisiae S288c] [strain=S288c] [moltype=genomic] [chromosome=I] [note=R64-1-1]
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACA
CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTT
ACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAAC
CACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATC
CAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATAC
TGTTCTTCTACCCACCATATTGAAACGCTAACAAATGATCGTAAATAACACACACGTGCT
TACCCTACCACTTTATACCACCACCACATGCCATACTCACCCTCACTTGTATACTGATTT
TACGTACGCACACGGATGCTACAGTATATACCATCTCAAACTTACCCTACTCTCAGATTC
CACTTCACTCCATGGCCCATCTCTCACTGAATCAGTACCAAATGCACTCACATCATTATG
CACGGCAC

@HWI-ST1216:132:c2pb5acxx:7:1210:6768:34415

ACACCCACACACCCACACACCACACCACACACCACACCACACCCACACAC

+

????DDDDDDDDDI@:CEBE)@@DDIIIDCDDCDIIII?DCDCDDCDDDD

@HWI-ST1216:132:c2pb5acxx:7:2315:2096:80538

ACACCCACACACCCACACACCACACCACACACCACACCACACCCACACAC
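For reading inputs shaped like the two samples above, here is a small standard-library sketch. It assumes a single-record FASTA file and FASTQ records of exactly four non-blank lines each (ID, sequence, "+", quality). The helper names read_fasta_sequence and read_fastq are hypothetical. BioPython's SeqIO.read(path, "fasta") and SeqIO.parse(path, "fastq") are the route the assignment suggests and could replace these helpers.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA file, skipping the '>' header."""
    return "".join(
        line.strip()
        for line in lines
        if line.strip() and not line.startswith(">")
    ).upper()


def read_fastq(lines):
    """Yield (read_id, sequence) pairs from FASTQ text, four lines per record.

    Blank lines are ignored; an incomplete trailing record is skipped.
    The leading '@' is dropped from the ID, matching what BioPython does.
    """
    rows = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        header, sequence, plus, _quality = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at non-blank line {i + 1}")
        # Keep only the identifier, not any description after whitespace.
        yield header[1:].split()[0], sequence.upper()
```

Both helpers accept any iterable of lines, so an open file handle for genome.fsa or reads.fastq can be passed directly.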
