Question
In last week's homework you wrote a script to read a FASTA file and report some basic statistics. Another important format is the FASTQ format, which stores both the sequence data and a quality score for each nucleotide in the file.
Your assignment this week is to extend your script to support both FASTQ and FASTA files. It should detect the file type automatically, either from the file name or from the file content. FASTQ files typically end in .fq or .fastq, or in their gzipped variants (.fq.gz, .fastq.gz).
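One way to do the automatic detection is to check the file name first and, if the extension is not recognized, fall back to the first non-blank character of the file: FASTQ records start with `@`, FASTA records with `>`. This is a minimal sketch, not the official solution. The helper names `open_text` and `detect_format` are hypothetical, the FASTA extension list (.fa, .fasta, .fna) is an assumption, and gzip is the only compression it handles.

```python
import gzip

# Assumed extension lists; adjust them to the file names you actually see.
FASTQ_EXTS = (".fq", ".fastq")
FASTA_EXTS = (".fa", ".fasta", ".fna")


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def detect_format(path):
    """Return 'fastq' or 'fasta', checking the file name first, then the content."""
    name = path.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # strip the compression suffix before checking
    if name.endswith(FASTQ_EXTS):
        return "fastq"
    if name.endswith(FASTA_EXTS):
        return "fasta"
    # Fall back to content: look at the first non-blank line only.
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            break
    raise ValueError(f"cannot determine the format of {path}")
```

Checking the name first avoids opening the file at all in the common case; the content check covers files with unusual or missing extensions.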
To test your script, run it on the FASTQ files from the Human Microbiome Project, which you can download with the commands below (this will take some time; the files are large):
$ wget http://downloads.hmpdacc.org/data/Illumina/PHASEII/anterior_nares/SRS077085.tar.bz2
$ tar -xjf SRS077085.tar.bz2
For example, a sequence read in FASTQ format looks like this:
@61JCNAAXX100503:5:100:10000:10232/1
CATGTAACATGTTCTATGTCCATAACTCCAGAATCATCAATACTTGATTTCTTCATTAGCATGTTCATAATAAATTCCCTTATTTTAAATGGTTTATAAGA
+61JCNAAXX100503:5:100:10000:10232/1
GGGGGGGGGGGGGGGGGGGGGGFGGGGGGFGGGGGGGGGFGFGGGGEGGGGGGGGGFGAGCGFDFEEGEFGGDFEFFEDEE@FFFCCBDFEBCF DEDCE5
Description:
Line 1: starts with an @, followed by the sequence read identifier and description
Line 2: the sequence line
Line 3: starts with a + symbol, optionally followed by a repeat of the read identifier line
Line 4: the quality line, which must have the same length as the sequence on line 2
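The four-line layout described above can be read with a small generator. This is a minimal sketch that assumes each record occupies exactly four lines (no wrapped sequence or quality lines, which is typical of Illumina output); `read_fastq` is a hypothetical name. It reads records in fixed groups of four rather than scanning for lines that start with `@`, because a quality line can also contain or begin with `@` (the quality line in the example above contains one). It also checks the rule from line 4, that quality and sequence have the same length.

```python
def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from an open FASTQ text stream.

    Assumes the common four-line layout: header, sequence, '+' line, quality.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        header = header.rstrip("\r\n")
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected an '@' header line, got {header[:30]!r}")
        if not plus.startswith("+"):
            raise ValueError(f"expected a '+' separator line after {header!r}")
        if len(qual) != len(seq):
            raise ValueError(f"quality/sequence length mismatch in {header!r}")
        # Drop the leading '@' so the identifier matches the FASTA convention.
        yield header[1:], seq, qual
```

Usage: `for ident, seq, qual in read_fastq(open("reads.fastq")): ...`, where `reads.fastq` is a placeholder file name.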
If you had trouble with last week's script or would just like a fresh start, you can copy the 'official' solution here and modify it for this assignment:
Course site on Canvas -> Modules -> Homework solutions -> M02 Sequence statistics
When you turn in your assignment you should include:
- Your script, attached as a file
- Instructions on how to run it
- Summary statistics for the downloaded FASTQ files: the number of sequences, the number of nucleotides, and the average length of the sequence reads
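The three requested statistics follow from the length of each read: the sequence count is the number of reads, the nucleotide count is the sum of their lengths, and the average read length is nucleotides divided by sequences. Below is a minimal, self-contained sketch of such a script, not the official solution. The names `open_text`, `sequence_lengths`, `summarize` and `seqstats.py` are hypothetical; support for `.bz2` files is an assumption in case some inputs are bzip2-compressed; and, as before, FASTQ records are assumed to be exactly four lines while FASTA sequences may wrap over several lines.

```python
import bz2
import gzip
import sys


def open_text(path):
    """Open plain, .gz or .bz2 files as text (compression chosen by suffix)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def sequence_lengths(path):
    """Yield the length of every sequence in a FASTA or FASTQ file.

    The format is taken from the first non-blank line: '@' means FASTQ
    (assumed four lines per record), '>' means FASTA (sequence may wrap).
    """
    with open_text(path) as handle:
        first = handle.readline()
        while first and not first.strip():
            first = handle.readline()
        if first.startswith("@"):
            header = first
            while header:
                if header.strip():
                    seq = handle.readline().strip()
                    handle.readline()  # '+' separator line
                    handle.readline()  # quality line (may itself start with '@')
                    yield len(seq)
                header = handle.readline()
        elif first.startswith(">"):
            length = 0
            for line in handle:
                if line.startswith(">"):
                    yield length  # a new header closes the previous record
                    length = 0
                else:
                    length += len(line.strip())
            yield length  # last record in the file
        elif first:
            raise ValueError(f"{path}: unrecognized format")


def summarize(paths):
    """Return (number of sequences, number of nucleotides, average length)."""
    n_seqs = 0
    n_nt = 0
    for path in paths:
        for length in sequence_lengths(path):
            n_seqs += 1
            n_nt += length
    average = n_nt / n_seqs if n_seqs else 0.0
    return n_seqs, n_nt, average


if __name__ == "__main__":
    # One summary line per file given on the command line.
    for path in sys.argv[1:]:
        seqs, nts, avg = summarize([path])
        print(f"{path}\tsequences={seqs}\tnucleotides={nts}\tavg_length={avg:.2f}")
```

Run it as `python seqstats.py FILE [FILE ...]`, passing the FASTQ files extracted from SRS077085.tar.bz2. The files are streamed record by record, so memory use stays small even for the large HMP files.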