Question

Posted on Sep 13, 2024

In last week's homework you wrote a script to read a FASTA file and report some basic statistics. Another important format is the FASTQ format, which stores both the sequence data and the quality scores for the nucleotides in the file. Your assignment this week is to expand your script to support both FASTQ and FASTA files. It should be able to detect the file type automatically, either from the file name or from the file content. FASTQ files typically end in either .fq or .fastq, along with the gzipped variants.

To test your script, run it on the FASTQ files you download from the Human Microbiome Project using the commands below (they will take some time; these are large files):

$ wget http://downloads.hmpdacc.org/data/Illumina/PHASEII/anterior_nares/SRS077085.tar.bz2
$ tar -xjf SRS077085.tar.bz2

For example, a sequence read in FASTQ format looks like:

@61JCNAAXX100503:5:100:10000:10232/1
CATGTAACATGTTCTATGTCCATAACTCCAGAATCATCAATACTTGATTTCTTCATTAGCATGTTCATAATAAATTCCCTTATTTTAAATGGTTTATAAGA
+61JCNAAXX100503:5:100:10000:10232/1
GGGGGGGGGGGGGGGGGGGGGGFGGGGGGFGGGGGGGGGFGFGGGGEGGGGGGGGGFGAGCGFDFEEGEFGGDFEFFEDEE@FFFCCBDFEBCF DEDCE5

Description:

Line 1: starts with an @ followed by the sequence read identifier and description
Line 2: sequence line
Line 3: starts with a + symbol followed by a repeat of the read identifier line
Line 4: quality line, which should have the same length as the corresponding sequence line (line 2)

If you had trouble with last week's script or would just like a fresh start, you can copy the 'official' solution here and modify it for this assignment: Course site on Canvas -> Modules -> Homework solutions -> M02 Sequence statistics

When you turn in your assignment you should include:

- Your script, attached as a file
- Instructions on how to run it
- Summary statistics for the downloaded FASTQ files: sequence and nucleotide count, and average sequence length of the sequence reads
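One possible way to approach the detection and counting described above is sketched here in Python. This is not the course's official solution. The function names (open_text, detect_format, fastq_stats) and the FASTA extensions listed are assumptions made for illustration, and the parser assumes each FASTQ record occupies exactly four lines, as in the example read.

```python
import gzip

# Extensions named in the assignment; the FASTA ones are an assumption.
FASTQ_EXTENSIONS = (".fq", ".fastq")
FASTA_EXTENSIONS = (".fa", ".fasta")


def open_text(path):
    """Open a plain or gzipped file in text mode (hypothetical helper)."""
    if path.lower().endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def detect_format(path):
    """Return 'fastq' or 'fasta', trying the file name first, then the content."""
    name = path.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    if name.endswith(FASTQ_EXTENSIONS):
        return "fastq"
    if name.endswith(FASTA_EXTENSIONS):
        return "fasta"
    # Fall back to the first non-blank line: '@' starts FASTQ, '>' starts FASTA.
    with open_text(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith("@"):
                return "fastq"
            if line.startswith(">"):
                return "fasta"
            break
    raise ValueError("could not determine the format of " + path)


def fastq_stats(path):
    """Return (sequence count, nucleotide count), assuming 4-line records."""
    n_seqs = 0
    n_bases = 0
    with open_text(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError("malformed FASTQ record near " + header.strip())
            if len(seq) != len(qual):
                raise ValueError("quality length differs for " + header.strip())
            n_seqs += 1
            n_bases += len(seq)
    return n_seqs, n_bases
```

The average read length asked for in the summary is then the nucleotide count divided by the sequence count. Reading four lines at a time, rather than testing each line for a leading @, avoids misreading a quality line that happens to begin with @ as a new header.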

Last week's script requirements: For this assignment, you should write a script which accepts the path to a FASTA file as an argument and outputs a report of a few basic statistics on the sequences found within the file. In this first HW assignment, this report should include the number of sequences found and the total number of residues (bases) that make them up.
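The FASTA counting described here could look roughly like the following Python sketch. The name fasta_stats is hypothetical, and the function assumes that every line beginning with > is a sequence header and that every other non-blank line is sequence data.

```python
def fasta_stats(lines):
    """Return (sequence count, residue count) for an iterable of FASTA lines."""
    n_seqs = 0
    n_residues = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header line marks one sequence
        else:
            n_residues += len(line)  # sequence data may span several lines
    return n_seqs, n_residues
```

Because the function takes any iterable of lines, it can be given an open file handle directly, for example `with open(path) as handle: fasta_stats(handle)`, and it never holds the whole file in memory.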

For full credit, compress the FASTA file you download like this:

$ gzip CAM_PROJ_SargassoSea.read_pep.fa

Then, your script should detect whether a file is compressed (based on the existence of the '.gz' extension) and open the file appropriately.
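A minimal Python sketch of the extension check, assuming the standard-library gzip module is acceptable for the assignment; the helper name open_maybe_gzipped is made up for illustration.

```python
import gzip


def open_maybe_gzipped(path):
    """Return a text-mode handle, decompressing when the name ends in .gz."""
    if path.endswith(".gz"):
        # "rt" makes gzip.open yield str lines, like the built-in open().
        return gzip.open(path, "rt")
    return open(path, "rt")
```

Both branches return a handle that yields text lines, so the rest of the script can iterate over it without knowing whether the input was compressed.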
