PART 1: SED - Regular Expressions
a. Write the command to read the first two lines of the pz_cDNA.fasta file. Look at these two lines. Notice that the first one starts with ">" while the second line contains a sequence. Each sequence in the file has a first "descriptive" line that begins with ">", followed by one or more lines that contain the sequence.
b. Read the contents of the file using UNIX. List all the ways you know to read a file.
c. How many lines are in the file?
d. From the pz_cDNA.fasta file, print only those lines that describe the sequences. (Recall that description lines in FASTA format begin with ">".)
e. How many sequences are in this file?
f. On each line that has a sequence on it, change all T to U and store the result in a new file called pz_cRNA.fasta.
g. First identify all the records that have an "_" in the sequence ID part of the description (so it matches PZ21537_A but not PZ832049).
h. Then remove the underscore and anything that comes after it, so that PZ21537_A becomes PZ21537.
i. There is another file called "pz_cDNA.Stats". First look at the first few lines to understand what each of the columns means. Then create a new file called pz_cDNA.cleanStats that does not have the commented lines in it (a commented line starts with #).
j. From your newly created file (pz_cDNA.cleanStats), look at the 4th column, which holds the most common 5mer.
   a. Print out all lines that have TTTTT as the most common 5mer.
   b. (Extra credit) Print out all lines whose 5mer consists of a single repeated nucleotide (e.g. AAAAA, CCCCC, GGGGG, TTTTT).
k. Sort the pz_cDNA.cleanStats file numerically by the Length column.
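Parts c through h all come down to one line-by-line test: does the line start with ">"? The Python sketch below mirrors that logic on a small, made-up sample rather than the real pz_cDNA.fasta. It assumes the sequence ID is the first whitespace-delimited token after ">", and every function name in it is hypothetical. The assignment itself asks for UNIX commands. For comparison, `grep -c '^>' pz_cDNA.fasta` counts description lines, and `sed '/^>/!s/T/U/g'` applies the T-to-U substitution only to lines that do not start with ">".

```python
import re

# Hypothetical sample in FASTA format; the real pz_cDNA.fasta is not reproduced here.
SAMPLE = """>PZ21537_A hypothetical description
ATGCTTGA
CCGTAT
>PZ832049 another hypothetical description
TTGACA
"""

# Assumption: the sequence ID is the first whitespace-delimited token after ">".
# Matches a header whose ID token contains an underscore (PZ21537_A, not PZ832049).
ID_WITH_UNDERSCORE = re.compile(r"^>[^\s_]+_\S*")
# Captures the part of the ID before the first underscore so the rest can be dropped.
UNDERSCORE_SUFFIX = re.compile(r"^(>[^\s_]+)_\S*")


def header_lines(lines):
    """Part d: description lines are the ones that start with '>'."""
    return [line for line in lines if line.startswith(">")]


def count_sequences(lines):
    """Part e: each record has exactly one header, so count the headers."""
    return len(header_lines(lines))


def dna_to_rna(lines):
    """Part f: replace T with U on sequence lines only; headers are left untouched."""
    return [line if line.startswith(">") else line.replace("T", "U") for line in lines]


def ids_with_underscore(lines):
    """Part g: headers whose ID token contains an underscore."""
    return [line for line in lines if ID_WITH_UNDERSCORE.match(line)]


def strip_id_suffix(lines):
    """Part h: drop the underscore and the rest of the ID token (PZ21537_A -> PZ21537)."""
    return [UNDERSCORE_SUFFIX.sub(r"\1", line) for line in lines]


if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    print(len(lines))  # part c: number of lines
    print(header_lines(lines))
    print(count_sequences(lines))
    print(dna_to_rna(lines))
    print(ids_with_underscore(lines))
    print(strip_id_suffix(lines))
```

The underscore patterns use `[^\s_]+` so they cannot run past the ID token into the description; an underscore that appears later in the description line is ignored. The sketch only converts uppercase T, as the assignment asks.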
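Parts i through k treat pz_cDNA.Stats as a table. The sketch below makes several assumptions: columns are whitespace-separated, comment lines begin with "#", the most common 5mer sits in column 4, and, in this made-up sample only, Length is column 3. Check these against the real file's header before relying on them. The back-reference pattern `([ACGT])\1{4}` is one way to express "the same nucleotide five times". Back-references are also available in the basic regular expressions that sed and grep use.

```python
import re

# Hypothetical stand-in for pz_cDNA.Stats; the real column layout is an assumption.
SAMPLE_STATS = """# Hypothetical header: ID  GC  Length  Top5mer
# another comment line
PZ1\t0.41\t812\tTTTTT
PZ2\t0.55\t95\tACGTA
PZ3\t0.38\t1530\tGGGGG
"""

# Part j(b): one nucleotide followed by four more copies of the same nucleotide.
HOMOPOLYMER_5MER = re.compile(r"^([ACGT])\1{4}$")


def drop_comments(lines):
    """Part i: keep only lines that do not start with '#'."""
    return [line for line in lines if not line.startswith("#")]


def column(line, n):
    """Return the n-th (1-based) whitespace-separated field of a line."""
    return line.split()[n - 1]


def top_5mer_is(lines, kmer="TTTTT"):
    """Part j(a): lines whose 4th column equals the given 5mer."""
    return [line for line in lines if column(line, 4) == kmer]


def homopolymer_rows(lines):
    """Part j(b): lines whose 4th column is AAAAA, CCCCC, GGGGG or TTTTT."""
    return [line for line in lines if HOMOPOLYMER_5MER.match(column(line, 4))]


def sort_by_column(lines, n):
    """Part k: sort rows numerically by column n (n is an assumed Length column)."""
    return sorted(lines, key=lambda line: float(column(line, n)))


if __name__ == "__main__":
    rows = drop_comments(SAMPLE_STATS.splitlines())
    print(top_5mer_is(rows))
    print(homopolymer_rows(rows))
    print(sort_by_column(rows, 3))
```

`sort_by_column` converts the field to a number before comparing, so 95 sorts before 812. A plain text sort would put "1530" before "812", which is why the UNIX `sort` command needs its `-n` option for this step.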