
Question

Write a Perl script that masks low-quality positions (nucleotides). The script must work on any FASTQ file, independent of the encoding used for quality scores.

Write a Perl script that masks poor-quality regions of the sequences stored in FASTQ files. Masked nucleotides are changed to 'n'. The script accepts two command-line arguments: the base ASCII code used by the quality score encoding scheme, followed by a lower-bound quality threshold. The script masks every nucleotide in the FASTQ file whose quality score is less than the threshold. The FASTQ file is provided on standard input (i.e. the script reads from standard input), and the script writes the resulting quality-filtered FASTQ file to standard output. Lines 1 and 3 of each FASTQ record are copied from standard input to standard output verbatim.

For example, suppose a BINF 200 student has a FASTQ file named seq1_raw.fastq that is encoded using 33 ('!') as a base. He or she wants to mask all portions that have a quality score less than 24, and store the output in a file named seq1_masked.fastq. That student might use the following command:

./q1.pl 33 24 < seq1_raw.fastq > seq1_masked.fastq
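To make the arithmetic behind that command concrete, here is a small sketch of how one quality character is decoded: the score is the character's ASCII code minus the base. The helper name `quality_score` is hypothetical and not part of the assignment; the base 33 and threshold 24 are the values from the example command.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one quality character for a given encoding base.
sub quality_score {
    my ($char, $base) = @_;
    return ord($char) - $base;
}

# With base 33 and threshold 24, a score below 24 means the nucleotide is masked.
for my $char ('9', '8', '+') {
    my $score = quality_score($char, 33);
    printf "%s -> %d (%s)\n", $char, $score, $score < 24 ? 'masked' : 'kept';
}
# prints:
# 9 -> 24 (kept)
# 8 -> 23 (masked)
# + -> 10 (masked)
```

A score equal to the threshold is kept, since only scores strictly less than the threshold are masked.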

For example, given the following seq1_raw.fastq:

@KXKW7:00006:00042
GCTCGCGGTTACTTTTCTTGGGTTGGTTTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
BCDBBBB=B=BBBBB4BB<@A+8-89>A=B>CDDFCBFFF398@<@<@<@<77-5<44-45
@KXKW7:00011:00047
GCTCGCCATTACTGGTCTTACG
+
??BCCC>BB>???A>A??:?65
@KXKW7:00012:00030
GCTCGCGGTTACTACTCTTGGGTTGGTTTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
CCCCCCD?D@CCCDEE@@>@A:@8>-77*7<@@@??7777)7?@?C@C?B<74-4;22*11
@KXKW7:00013:00049
GCCAGCAGCCGCGGTAATACGTAGGGAGCAAAGCGTTGTCCGGAATCATTGGGCGTAAAGCGCGCGTAGGCGGCCTCATAAGTCCGTTGTGAAAGTCAAAAGGCTCAACCATTTGAAAGCCGATGGATACTGTGAGGCTAGAGTCCGGAAGAGGCGAGTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAAGGAGGAACACCAATAGCGAAGGCAGCTCGCTGGGACGGTACTGAGCTAAGGCGC
+
25/1978682470/8:50,0====>3::.00&0600(+*+(*(*--++6277.65388.25<@>@@@>;38;1905990-(+/3+/9174//3)206698*94.++4.7(+.7*403+33..1.0068882522564;978867628393.++04222591503.3./2,3-+/4860.2(./44777:85444.+%-,15/5-51:1938885575<26860,0;;:::.948.485:<:7777.2-23+
@KXKW7:00014:00018
GCTCGCGGTTATTACTCTTGGGTTGGTTGATTGACTGGCAGACGGTGGAAGCACT
+
ACCEEDEEEAEEADCCAA<@@<@=B=A<@8848@@@@>A?7655754-4,44444
@KXKW7:00015:00018
GCTCGCCATTACTGGTTCTCCAGTTTTTTGGACTACTGGGGTCAAGGAACCTGGTCGACG
+
8CCBBCD877777&78>?@@77777)7>>8>8>8>874+321111
@KXKW7:00016:00023
GCTCGCGGTTACTACTCTTGGGTTGGATTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
DB@DD999:>>5555)4<<8=:=8<642*1111)11
@KXKW7:00017:00029
GCTCGCCATTTCCCGTACCATGCTGCTTTCGCTCCGTGGTCTTGGGGTGCTGGTGTTATTGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
KFKDEC?A>A:>>7>@BB?BDFAA@888+6>>88-4<;6;87777-43

the output file seq1_masked.fastq will be:

@KXKW7:00006:00042
GCTCGCGGTTACTTTnCTTGGnnnnGTTTGGACTACTGGGnTnAAGGAACCnnnnCnnnnn
+
BCDBBBB=B=BBBBB4BB<@A+8-89>A=B>CDDFCBFFF398@<@<@<@<77-5<44-45
@KXKW7:00011:00047
GCTCGCCATTACTGGTCTTAnn
+
??BCCC>BB>???A>A??:?65
@KXKW7:00012:00030
GCTCGCGGTTACTACTCTTGGGTnGnnnnnGACTACnnnnnnCAAGGAACCnnnnCnnnnn
+
CCCCCCD?D@CCCDEE@@>@A:@8>-77*7<@@@??7777)7?@?C@C?B<74-4;22*11
@KXKW7:00013:00049
nnnnGnnnnnnnnnnAnnnnGTAGGnAGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGCGCGTAGnnGnCnnCAnnnnnnnnTnnnnnnnnnnnAnnGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCTnnnnnnnnnAnnnnnnnnnnGnnnnnnnnnnnnnnnnnnnnnnnnnnnnAnnnnnnnnnnnnnnnnnCnAnnnnnnnnGnnnnnnnGCTGGnAnnnnnnTGAnnnnnnnnnn
+
25/1978682470/8:50,0====>3::.00&0600(+*+(*(*--++6277.65388.25<@>@@@>;38;1905990-(+/3+/9174//3)206698*94.++4.7(+.7*403+33..1.0068882522564;978867628393.++04222591503.3./2,3-+/4860.2(./44777:85444.+%-,15/5-51:1938885575<26860,0;;:::.948.485:<:7777.2-23+
@KXKW7:00014:00018
GCTCGCGGTTATTACTCTTGGGTTGGTTGnnnnACTGGCAnnnnnnnnnnnnnnn
+
ACCEEDEEEAEEADCCAA<@@<@=B=A<@8848@@@@>A?7655754-4,44444
@KXKW7:00015:00018
nCTCGCCATTACTGGTTCTCCAnnnnnnnnnACTAnnnnnnnCAnGnAnCnnnnnnnnnn
+
8CCBBCD877777&78>?@@77777)7>>8>8>8>874+321111
@KXKW7:00016:00023
GCTCGCGGTTACTACTCTnnnnnnGGATnnnnCTACnnnnnnCAnGGAnCnnnnnnnnnnn
+
DB@DD999:>>5555)4<<8=:=8<642*1111)11
@KXKW7:00017:00029
GCTCGCCATTTCCnGTACCATGCTGnnnnnGCnnnnTGnTCTnnnnnnGCTGGTGTTATTGACTACTGGGnTCAAGGAnnnnGnnnnnnnn
+
KFKDEC?A>A:>>7>@BB?BDFAA@888+6>>88-4<;6;87777-43
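One possible approach, offered as a minimal sketch rather than as the verified solution: read the input four lines at a time, decode each quality character as its ASCII code minus the base, and replace the matching nucleotide with 'n' when the score falls below the threshold. The subroutine names `mask_read` and `mask_stream` are hypothetical. The sketch assumes every record occupies exactly four lines, and it leaves unchanged any nucleotide that has no corresponding quality character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return $seq with every position whose quality score
# (ASCII code minus $base) is below $threshold replaced by 'n'.
sub mask_read {
    my ($seq, $qual, $base, $threshold) = @_;
    # Assumption: positions without a quality character are left as they are.
    my $n = length($seq) < length($qual) ? length($seq) : length($qual);
    for my $i (0 .. $n - 1) {
        my $score = ord(substr($qual, $i, 1)) - $base;
        substr($seq, $i, 1) = 'n' if $score < $threshold;
    }
    return $seq;
}

# Hypothetical helper: copy FASTQ records from $in to $out, masking line 2.
# Assumes each record is exactly four lines: header, sequence, '+', quality.
sub mask_stream {
    my ($in, $out, $base, $threshold) = @_;
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;    # stop at a truncated trailing record
        # Strip line endings from working copies so they are never scored.
        (my $s = $seq)  =~ s/\r?\n\z//;
        (my $q = $qual) =~ s/\r?\n\z//;
        # Header, '+' line, and quality line are written back verbatim.
        print {$out} $header, mask_read($s, $q, $base, $threshold), "\n",
                     $plus, $qual;
    }
}

# Run only when invoked directly, e.g. ./q1.pl 33 24 < in.fastq > out.fastq
unless (caller) {
    if (@ARGV == 2) {
        mask_stream(\*STDIN, \*STDOUT, @ARGV);
    } else {
        warn "usage: $0 BASE THRESHOLD < in.fastq > out.fastq\n";
    }
}
```

Saved as q1.pl and made executable, this would be run exactly as in the question: ./q1.pl 33 24 < seq1_raw.fastq > seq1_masked.fastq. Because the base is a command-line argument, the same code handles other encodings without change.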
