Question

SHELL SCRIPT PROGRAMMING

A researcher has a file containing information about the number of times particular k-mers (peptide sequences of length k, derived from actual protein sequences) occur in the human proteome. The information for each k-mer is on one line in the file and is divided into columns. The first column is the position of the start of the k-mer in its source protein. The next column is the k-mer itself. Then come two counts: the number of times that the k-mer occurs in the human proteome, and the number of proteins in the human proteome that contain the k-mer. The columns are delimited by tab characters. For example, a portion of the data file might look like:

110	DPRRR	18	18
111	PRRRS	58	54
112	RRRSR	173	112
113	RRSRN	12	12
114	RSRNL	13	13
115	SRNLG	14	14
116	RNLGK	22	22
117	NLGKV	9	9
118	LGKVI	23	23
119	GKVID	19	19
120	KVIDT	12	12
121	VIDTL	4	4
122	IDTLQ	0	0
123	DTLQE	4	3

The researcher is interested in those k-mers for which the counts in the last two columns are both 0; i.e. the researcher is interested in k-mers which do not occur in the human proteome. For instance, given the data above, the researcher would be interested in being informed of the k-mer IDTLQ.

Write a shell script that will output, on the standard output, the k-mers that do not occur in the human proteome assuming input as described above. Each k-mer is to be on a separate line. The script is to read from standard input. Assume that the input file contains nothing other than lines of k-mer information.

Hint: Start the design of your shell script by considering a shell command pipeline involving grep that will output, on the standard output, those lines from the standard input which have the pattern "0\t0$" in them.
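The filter the hint describes might be sketched as follows. This is only a sketch under two assumptions that are not part of the question: that a portable `grep` is wanted, and that a stricter pattern is acceptable. POSIX `grep` does not interpret `\t`, so a literal tab is built with `printf`. The pattern also gains a leading tab so that a count such as 10 followed by 0 cannot match. The function name `zero_count_lines` is hypothetical.

```shell
#!/bin/sh
# Keep only the lines whose last two tab-separated fields are both exactly 0.
# POSIX grep does not understand "\t", so store a literal tab in a variable.
tab=$(printf '\t')

zero_count_lines() {
    # Pattern: TAB 0 TAB 0 end-of-line. The leading tab anchors the first
    # count, so a line ending in "10<TAB>0" is not selected.
    grep "${tab}0${tab}0\$"
}

# Demo on three lines taken from the sample data; only the IDTLQ line
# has zero in both count columns.
printf '121\tVIDTL\t4\t4\n122\tIDTLQ\t0\t0\n123\tDTLQE\t4\t3\n' | zero_count_lines
```

This selects whole lines; extracting just the k-mer column is a separate step.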

Your scripts should be independent of the value of k (provided, of course, that k ≥ 1). That is, your scripts should work for data files of k-mers of any size. Further, k should not be a parameter to your scripts.

Step by Step Solution
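One possible solution, offered as a minimal sketch rather than the site's own answer, takes two steps: select the lines whose two counts are both 0 with `grep`, then keep only the second column with `cut`. Because `cut` splits on tabs by default and the pattern never mentions the k-mer's length, nothing depends on k. The function name `zero_kmers` is hypothetical, and the demo input is taken from the sample data in the question.

```shell
#!/bin/sh
# Print, one per line, the k-mers whose two count columns are both 0.
# A literal tab is needed because POSIX grep does not interpret "\t".
tab=$(printf '\t')

zero_kmers() {
    # Step 1: keep lines ending in TAB 0 TAB 0 (both counts exactly zero).
    # Step 2: cut uses tab as its default delimiter; field 2 is the k-mer.
    grep "${tab}0${tab}0\$" | cut -f 2
}

# Demo on three lines of the sample data; this prints: IDTLQ
printf '121\tVIDTL\t4\t4\n122\tIDTLQ\t0\t0\n123\tDTLQE\t4\t3\n' | zero_kmers
```

To use this as the requested script, the demo line would be replaced by a bare call to `zero_kmers`, which then reads standard input, for example `sh kmers.sh < data.txt` (both file names are placeholders). An `awk` one-liner such as `awk -F '\t' '$3 == 0 && $4 == 0 { print $2 }'` would be an alternative that skips `grep` entirely.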
