Question
Write a Perl script to mask the low-quality positions (nucleotides). Your script should work on any FASTQ file, independent of the encoding used for quality scores.
Write a Perl script that masks poor-quality regions of sequences stored in FASTQ files. Masked nucleotides are to be changed to 'n'. The script accepts two command-line arguments: the base ASCII code used in the quality score encoding scheme, followed by a lower-bound quality threshold. The script masks every position of a sequence read in the FASTQ file whose quality score is less than the threshold. The FASTQ file is provided to the script on standard input (i.e. the script reads from standard input), and the script writes the resulting quality-filtered FASTQ file to standard output. Lines 1 and 3 of each FASTQ record are copied from standard input to standard output verbatim.
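The per-read masking rule described above can be sketched as follows. This is a minimal sketch, not a reference solution: the subroutine name `mask_read` and the sample read are illustrative assumptions, and the quality score of a position is taken to be the ASCII code of its quality character minus the base.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not part of the assignment): mask one read.
# Positions that have no matching quality character are left unchanged.
sub mask_read {
    my ($seq, $qual, $base, $threshold) = @_;
    my @nt = split //, $seq;
    my @q  = split //, $qual;
    for my $i (0 .. $#nt) {
        last if $i > $#q;                      # ran out of quality characters
        my $score = ord($q[$i]) - $base;       # decode the quality score
        $nt[$i] = 'n' if $score < $threshold;  # mask low-quality nucleotides
    }
    return join '', @nt;
}

# Made-up read: with base 33 and threshold 24, every quality character
# below '9' (ASCII 57) marks a position to be masked.
print mask_read('ACGTAC', 'II5I+9', 33, 24), "\n";   # prints ACnTnC
```

Because the base is passed in as a parameter, the same subroutine should work for other encodings (for example a base of 64) without modification.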
For example, suppose a BINF 200 student has a FASTQ file named seq1_raw.fastq that is encoded using 33 ('!') as a base. He or she wants to mask all portions that have a quality score less than 24, and store the output in a file named seq1_masked.fastq. That student might use the following command:
./q1.pl 33 24 < seq1_raw.fastq > seq1_masked.fastq
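A command of this form implies a filter that reads four-line records from standard input and writes them to standard output. The following is a minimal sketch under stated assumptions: every record is exactly four lines (no wrapped sequences), the input uses Unix line endings, and `filter_fastq` is a hypothetical name chosen for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical subroutine (the name is an assumption): copy FASTQ records
# from $in to $out, masking nucleotides whose score is below $threshold.
# Assumes each record is exactly four lines with no line wrapping.
sub filter_fastq {
    my ($in, $out, $base, $threshold) = @_;
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;             # stop on a truncated final record
        chomp(my $s = $seq);
        chomp(my $q = $qual);
        my $n = length($s) < length($q) ? length($s) : length($q);
        for my $i (0 .. $n - 1) {
            # score = ASCII code of the quality character minus the base
            substr($s, $i, 1) = 'n'
                if ord(substr($q, $i, 1)) - $base < $threshold;
        }
        # Lines 1, 3 and 4 are copied verbatim; only line 2 is rewritten.
        print {$out} $header, "$s\n", $plus, $qual;
    }
}

# When run as a script: ./q1.pl BASE THRESHOLD < in.fastq > out.fastq
unless (caller) {
    if (@ARGV == 2) { filter_fastq(\*STDIN, \*STDOUT, @ARGV) }
    else            { warn "Usage: $0 BASE THRESHOLD < in.fastq > out.fastq\n" }
}
```

Passing the filehandles in as arguments, rather than hard-coding `STDIN` and `STDOUT`, makes it possible to exercise the subroutine on in-memory strings.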
For example, seq1_raw.fastq:
@KXKW7:00006:00042
GCTCGCGGTTACTTTTCTTGGGTTGGTTTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
BCDBBBB=B=BBBBB4BB<@A+8-89>A=B>CDDFCBFFF398@<@<@<@<77-5<44-45
@KXKW7:00011:00047
GCTCGCCATTACTGGTCTTACG
+
??BCCC>BB>???A>A??:?65
@KXKW7:00012:00030
GCTCGCGGTTACTACTCTTGGGTTGGTTTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
CCCCCCD?D@CCCDEE@@>@A:@8>-77*7<@@@??7777)7?@?C@C?B<74-4;22*11
@KXKW7:00013:00049
GCCAGCAGCCGCGGTAATACGTAGGGAGCAAAGCGTTGTCCGGAATCATTGGGCGTAAAGCGCGCGTAGGCGGCCTCATAAGTCCGTTGTGAAAGTCAAAAGGCTCAACCATTTGAAAGCCGATGGATACTGTGAGGCTAGAGTCCGGAAGAGGCGAGTGGAATTCCTGGTGTAGCGGTGGAATGCGCAGATATCAAGGAGGAACACCAATAGCGAAGGCAGCTCGCTGGGACGGTACTGAGCTAAGGCGC
+
25/1978682470/8:50,0====>3::.00&0600(+*+(*(*--++6277.65388.25<@>@@@>;38;1905990-(+/3+/9174//3)206698*94.++4.7(+.7*403+33..1.0068882522564;978867628393.++04222591503.3./2,3-+/4860.2(./44777:85444.+%-,15/5-51:1938885575<26860,0;;:::.948.485:<:7777.2-23+
@KXKW7:00014:00018
GCTCGCGGTTATTACTCTTGGGTTGGTTGATTGACTGGCAGACGGTGGAAGCACT
+
ACCEEDEEEAEEADCCAA<@@<@=B=A<@8848@@@@>A?7655754-4,44444
@KXKW7:00015:00018
GCTCGCCATTACTGGTTCTCCAGTTTTTTGGACTACTGGGGTCAAGGAACCTGGTCGACG
+
8CCBBCD877777&78>?@@77777)7>>8>8>8>874+321111
@KXKW7:00016:00023
GCTCGCGGTTACTACTCTTGGGTTGGATTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
DB@DD999:>>5555)4<<8=:=8<642*1111)11
@KXKW7:00017:00029
GCTCGCCATTTCCCGTACCATGCTGCTTTCGCTCCGTGGTCTTGGGGTGCTGGTGTTATTGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
KFKDEC?A>A:>>7>@BB?BDFAA@888+6>>88-4<;6;87777-43
The output file, seq1_masked.fastq, will be:
@KXKW7:00006:00042
GCTCGCGGTTACTTTnCTTGGnnnnGTTTGGACTACTGGGnTnAAGGAACCnnnnCnnnnn
+
BCDBBBB=B=BBBBB4BB<@A+8-89>A=B>CDDFCBFFF398@<@<@<@<77-5<44-45
@KXKW7:00011:00047
GCTCGCCATTACTGGTCTTAnn
+
??BCCC>BB>???A>A??:?65
@KXKW7:00012:00030
GCTCGCGGTTACTACTCTTGGGTnGnnnnnGACTACnnnnnnCAAGGAACCnnnnCnnnnn
+
CCCCCCD?D@CCCDEE@@>@A:@8>-77*7<@@@??7777)7?@?C@C?B<74-4;22*11
@KXKW7:00013:00049
nnnnGnnnnnnnnnnAnnnnGTAGGnAGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGCGCGTAGnnGnCnnCAnnnnnnnnTnnnnnnnnnnnAnnGnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnCTnnnnnnnnnAnnnnnnnnnnGnnnnnnnnnnnnnnnnnnnnnnnnnnnnAnnnnnnnnnnnnnnnnnCnAnnnnnnnnGnnnnnnnGCTGGnAnnnnnnTGAnnnnnnnnnn
+
25/1978682470/8:50,0====>3::.00&0600(+*+(*(*--++6277.65388.25<@>@@@>;38;1905990-(+/3+/9174//3)206698*94.++4.7(+.7*403+33..1.0068882522564;978867628393.++04222591503.3./2,3-+/4860.2(./44777:85444.+%-,15/5-51:1938885575<26860,0;;:::.948.485:<:7777.2-23+
@KXKW7:00014:00018
GCTCGCGGTTATTACTCTTGGGTTGGTTGnnnnACTGGCAnnnnnnnnnnnnnnn
+
ACCEEDEEEAEEADCCAA<@@<@=B=A<@8848@@@@>A?7655754-4,44444
@KXKW7:00015:00018
nCTCGCCATTACTGGTTCTCCAnnnnnnnnnACTAnnnnnnnCAnGnAnCnnnnnnnnnn
+
8CCBBCD877777&78>?@@77777)7>>8>8>8>874+321111
@KXKW7:00016:00023
GCTCGCGGTTACTACTCTnnnnnnGGATnnnnCTACnnnnnnCAnGGAnCnnnnnnnnnnn
+
DB@DD999:>>5555)4<<8=:=8<642*1111)11
@KXKW7:00017:00029
GCTCGCCATTTCCnGTACCATGCTGnnnnnGCnnnnTGnTCTnnnnnnGCTGGTGTTATTGACTACTGGGnTCAAGGAnnnnGnnnnnnnn
+
KFKDEC?A>A:>>7>@BB?BDFAA@888+6>>88-4<;6;87777-43