
Question

RT 1: SED - Regular Expressions

a. Write the command to read the first two lines of the pz_cDNA.fasta file. Look at these two lines. Notice that the first one starts with a ">" while the second line contains a sequence. Each sequence in the file has a first "descriptive" line that begins with ">", followed by numerous lines which contain the sequence.
b. Read the contents of the file using UNIX. List all the ways you know to read a file.
c. How many lines are in the file?
d. From the pz_cDNA.fasta file, print only those lines that describe the sequences (recall that description lines in FASTA format begin with ">").
e. How many sequences are in this file?
f. On each line that has a sequence on it, change all T to U and store the result in a new file called Pz_cRNA.fasta.
g. First identify all the records that have the pattern with an "_" in the sequence ID part of the description (so it matches PZ21537_A but not PZ832049).
h. Then remove the underscore and anything that comes after it, so that PZ21537_A becomes PZ21537.
i. There is another file called pz_cDNA.Stats. First look at the first few lines to understand what each of the columns means. Then create a new file called pz_cDNA.cleanStats that does not have the commented lines in it (a commented line starts with a "#").
j. From your newly created file (pz_cDNA.cleanStats), look at the 4th column, which has the most common 5mer.
   a. Print out all lines that have TTTTT as the most common 5mer.
   b. (Extra credit) Print out all lines that have a 5mer of all the same nucleotide (e.g. AAAAA, CCCCC, GGGGG, TTTTT).
k. Sort the pz_cDNA.cleanStats file by Length (3rd column).
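
One possible way to approach these steps with standard UNIX tools is sketched below. It is not the accepted answer. Several things in it are assumptions: the tiny sample files it creates are hypothetical stand-ins for the real data, the stats file is assumed to be tab-separated with Length in column 3 and the most common 5mer in column 4, the sequence ID is assumed to be the first whitespace-delimited word after ">", and step h is read as removing only the rest of the ID, not the rest of the line.

```shell
#!/bin/sh
# Hypothetical miniature inputs; the real course files are not reproduced here.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > pz_cDNA.fasta <<'EOF'
>PZ21537_A Test transcript
ATGCTTTTTGA
CCGTA
>PZ832049 another description
GGGTTTAAACC
EOF

# Assumed layout: tab-separated, Length in column 3, most common 5mer in column 4.
printf '# ID\tGC\tLength\tMost_common_5mer\n' > pz_cDNA.Stats
printf '%s\t%s\t%s\t%s\n' \
  PZ21537_A 0.41 300 TTTTT \
  PZ832049  0.52 120 ACGTA \
  PZ7180    0.38 950 AAAAA >> pz_cDNA.Stats

# a. First two lines of the FASTA file.
head -n 2 pz_cDNA.fasta

# b. Ways to read a file: cat prints it all; less and more page through it interactively.
cat pz_cDNA.fasta

# c. Number of lines (tr strips the padding some wc implementations add).
line_count=$(wc -l < pz_cDNA.fasta | tr -d ' ')
echo "lines: $line_count"

# d. Only the description lines, which begin with ">".
grep '^>' pz_cDNA.fasta

# e. Number of sequences = number of description lines.
seq_count=$(grep -c '^>' pz_cDNA.fasta)
echo "sequences: $seq_count"

# f. Change T to U on every line that is NOT a description line.
sed '/^>/!s/T/U/g' pz_cDNA.fasta > Pz_cRNA.fasta

# g. Description lines whose ID (first word after ">") contains an underscore.
grep '^>[^[:space:]]*_' pz_cDNA.fasta

# h. Remove the underscore and the rest of the ID, keeping the description text.
trimmed_header=$(sed 's/^\(>[^[:space:]_]*\)_[^[:space:]]*/\1/' pz_cDNA.fasta | head -n 1)
echo "$trimmed_header"

# i. Inspect the stats file, then drop the comment lines (those starting with "#").
head -n 3 pz_cDNA.Stats
grep -v '^#' pz_cDNA.Stats > pz_cDNA.cleanStats

# j.a. Lines whose 4th column is exactly TTTTT.
awk -F'\t' '$4 == "TTTTT"' pz_cDNA.cleanStats

# j.b. Lines whose 4th column is a 5mer made of a single nucleotide.
homopolymer_count=$(awk -F'\t' '$4 ~ /^(AAAAA|CCCCC|GGGGG|TTTTT)$/' pz_cDNA.cleanStats | wc -l | tr -d ' ')
awk -F'\t' '$4 ~ /^(AAAAA|CCCCC|GGGGG|TTTTT)$/' pz_cDNA.cleanStats

# k. Numeric sort on column 3 (Length), using a literal tab as the field separator.
tab=$(printf '\t')
shortest_id=$(sort -t "$tab" -k3,3n pz_cDNA.cleanStats | head -n 1 | cut -f1)
sort -t "$tab" -k3,3n pz_cDNA.cleanStats
```

If the real stats file uses spaces instead of tabs, or puts Length in a different column, the `-F`, `-t`, and `-k` arguments would need to be adjusted to match.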


