Question
Introduction
A PDB file contains experimentally obtained information about a macromolecule, usually determined by X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy. The PDB file describes the molecule in detail, providing the three-dimensional position of each atom in the structure, the bonds between atoms, the amino acids it contains if it is a protein or the nucleotides if it is a polynucleotide such as DNA or RNA, and much more. The information is not necessarily exact; some of it comes with confidence values or other measures that indicate how accurate the researchers believe it to be.
Sometimes the researchers who created the file were not sure which of several sets of measured atom positions to use, so rather than choosing one, they supplied several different models of the molecule's structure. PDB files with multiple models are easy to spot because they contain lines that begin with the word MODEL.
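As an illustration, the following Python sketch counts the models in a file by counting lines whose record name is MODEL. The function name `count_models` and the sample lines (including their coordinates) are hypothetical; only the MODEL and ENDMDL record names come from the format.

```python
def count_models(lines):
    """Return the number of MODEL records in an iterable of PDB lines."""
    # The record name starts in column 1, so a prefix test identifies MODEL lines.
    # ENDMDL lines, which close each model, do not start with "MODEL".
    return sum(1 for line in lines if line.startswith("MODEL"))


# Hypothetical two-model excerpt; the coordinates are made up.
sample = [
    "MODEL        1",
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1      11.203   6.001  -6.412",
    "ENDMDL",
]
print(count_models(sample))  # → 2
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.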
A protein can be made up of multiple chains, where a chain is a linear sequence of amino acid residues. Because the same kind of amino acid can occur many times within a single chain, each residue in a chain is given a sequence number in the PDB file that specifies its position in the chain, usually starting with 1 for the first position.
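The sketch below shows how the chain identifier (column 22) and residue sequence number (columns 23 through 26) from the ATOM record layout can be combined to list the residues of each chain. `residues_by_chain` is a hypothetical helper, and the sample lines, including their coordinates, are made up for illustration.

```python
from collections import defaultdict


def residues_by_chain(lines):
    """Map each chain ID to its residue sequence numbers, in file order.

    Columns follow the ATOM record layout: chainID is column 22 and
    resSeq is columns 23-26 (1-based, inclusive).
    """
    chains = defaultdict(list)
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]          # column 22
        res_seq = int(line[22:26])   # columns 23-26
        # Consecutive atoms of the same residue share a resSeq; record each residue once.
        if not chains[chain_id] or chains[chain_id][-1] != res_seq:
            chains[chain_id].append(res_seq)
    return dict(chains)


# Hypothetical excerpt: two residues in chain A, one in chain B.
sample = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  MET A   1      11.639   6.071  -5.147",
    "ATOM      3  N   GLN A   2      12.000   7.000  -4.000",
    "ATOM      4  N   LYS B   1      20.000   8.000  -3.000",
]
print(residues_by_chain(sample))  # → {'A': [1, 2], 'B': [1]}
```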
Some PDB files also contain multiple records for the same atom within a single model. Here, too, the researchers were not sure which of a few measured positions to use, but instead of creating separate models, they listed alternative positions for just those atoms. When an atom has more than one position, the alternate location indicator (column 17 of the ATOM record) identifies each alternative.
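The assignment does not require handling alternate locations, but as a sketch, the alternate location indicator in column 17 can be used to keep one position per atom. The helper `select_altloc` is hypothetical, and keeping records whose indicator is blank or `A` is an assumed policy, not something the format mandates.

```python
def select_altloc(lines, keep="A"):
    """Keep ATOM lines whose alternate location indicator is blank or equal to `keep`."""
    selected = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        alt_loc = line[16:17]  # column 17 (1-based); "" if the line is shorter
        if alt_loc in ("", " ", keep):
            selected.append(line)
    return selected


# Hypothetical excerpt: atom CA of residue 1 has two alternate positions, A and B.
sample = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA AMET A   1      11.639   6.071  -5.147",
    "ATOM      3  CA BMET A   1      11.700   6.100  -5.200",
]
print([line[6:11].strip() for line in select_altloc(sample)])  # → ['1', '2']
```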
There are two kinds of atom records in a PDB file: ATOM records and HETATM records.
ATOM records describe the atoms in the molecule itself.
HETATM records are used to describe atoms that are not part of the biological polymer, such as those in the surrounding solvent or in attached molecules.
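A record's kind is determined by its record name in columns 1 through 6. The hypothetical helper below sketches that check. Note that a simple test such as `line.startswith("ATOM")` does not match HETATM lines, because HETATM does not begin with the letters ATOM.

```python
def record_type(line):
    """Classify a PDB line by its record name (columns 1-6)."""
    name = line[0:6].rstrip()  # record names are left-justified and space-padded
    if name in ("ATOM", "HETATM"):
        return name
    return None  # any other record, e.g. HEADER, REMARK, MODEL


print(record_type("ATOM      1  N   MET A   1      11.104   6.134  -6.504"))  # → ATOM
print(record_type("HETATM  501  O   HOH A 201      10.000  10.000  10.000"))  # → HETATM
print(record_type("REMARK   2"))  # → None
```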
An ATOM record contains several fields, each occupying a fixed range of columns on the line, as given in the following excerpt of the PDB file format specification.
COLUMNS   DATA TYPE     FIELD     DEFINITION
--------------------------------------------------------------------------
 1 -  6   Record name   "ATOM  "
 7 - 11   Integer       serial    Atom serial number
13 - 16   Atom          name      Atom name
17        Character     altLoc    Alternate location indicator
18 - 20   Residue name  resName   Residue name
22        Character     chainID   Chain identifier
23 - 26   Integer       resSeq    Residue sequence number
31 - 38   Real(8.3)     x         Orthogonal coordinates for X in Angstroms
39 - 46   Real(8.3)     y         Orthogonal coordinates for Y in Angstroms
47 - 54   Real(8.3)     z         Orthogonal coordinates for Z in Angstroms
77 - 78   LString(2)    element   Element symbol, right-justified
The table specifies which columns each field occupies, its data type, and what it represents. For example, the serial number is up to five digits long and occupies columns 7 through 11. The atomic coordinates are fixed-point real numbers, each 8 characters wide with 3 digits after the decimal point, appearing on the line in the order x, y, z. Atom names can be up to four characters long. For example, the carbon atoms of an amino acid are named CA, CB, CG, and so on, for the alpha, beta, and gamma carbons, where the Greek letter labels successive carbons moving outward along the side chain from the backbone alpha carbon.
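To make the column table concrete, the sketch below turns each row into a Python slice. The table's columns are 1-based and inclusive, so columns a through b correspond to `line[a-1:b]`. The names `ATOM_FIELDS` and `parse_atom_line` are illustrative, not part of any library; the sample is the example ATOM record from the instructions, laid out in its fixed columns.

```python
# (field name, first column, last column), 1-based and inclusive as in the table
ATOM_FIELDS = [
    ("serial", 7, 11),
    ("name", 13, 16),
    ("altLoc", 17, 17),
    ("resName", 18, 20),
    ("chainID", 22, 22),
    ("resSeq", 23, 26),
    ("x", 31, 38),
    ("y", 39, 46),
    ("z", 47, 54),
    ("element", 77, 78),
]


def parse_atom_line(line):
    """Split one ATOM record into a dict of stripped strings using fixed columns."""
    # Column c (1-based) is index c - 1, so columns a..b are the slice [a-1:b].
    return {name: line[first - 1:last].strip() for name, first, last in ATOM_FIELDS}


EXAMPLE = "ATOM     18  CB  GLN A   3      83.556  52.126  45.080  1.00 26.06           C"
record = parse_atom_line(EXAMPLE)
print(record["serial"], record["x"], record["y"], record["z"])  # → 18 83.556 52.126 45.080
```

All values are returned as strings; converting `serial` with `int()` or the coordinates with `float()` is left to the caller.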
Instructions
Write a bash script and a Python program, both named atomcoordinates, that each accept the name of a PDB file as their only command-line argument.
Error checking
Both the bash script and the Python program should check that they were given exactly one command-line argument and that it names a file the program can read. If either condition does not hold, the script/program should print an appropriate error message followed by a usage message, and then exit. The shell script and Python program are not required to check that the file follows the PDB file format.
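A minimal Python sketch of these checks, assuming that messages go to standard error and that the program exits with status 1 on error (the assignment specifies neither). `check_args`, `USAGE`, and the wording of the messages are hypothetical.

```python
import os
import sys

USAGE = "Usage: atomcoordinates PDB_FILE"


def check_args(argv):
    """Return the PDB file path from argv, or print an error plus usage and exit."""
    if len(argv) != 2:
        print(f"atomcoordinates: expected exactly one argument, got {len(argv) - 1}",
              file=sys.stderr)
        print(USAGE, file=sys.stderr)
        sys.exit(1)
    path = argv[1]
    # isfile rejects directories and missing paths; os.access checks read permission.
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        print(f"atomcoordinates: cannot read file '{path}'", file=sys.stderr)
        print(USAGE, file=sys.stderr)
        sys.exit(1)
    return path
```

In a complete program, `check_args(sys.argv)` would be called first, and the returned path passed to the code that reads the file.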
Given such a PDB file, the programs must find all lines that start with the word ATOM and, for each line found, display a line of output containing the atom's serial number and coordinates. For example, a line in the PDB file that looks like this:
ATOM     18  CB  GLN A   3      83.556  52.126  45.080  1.00 26.06           C
would produce the following output line from both the bash script and the Python program:
Atom serial number: 18 X coordinates: 83.556 Y coordinates: 52.126 Z coordinates: 45.080
because the atom's serial number is 18 and its coordinates are 83.556, 52.126, and 45.080.
Hint: How do you know where this information is? In the PDB file, the data is in specific columns. In particular, the atom's serial number is always in columns 7 through 11, and the three coordinates start in column 31 and end in column 54.
Therefore, your shell script and Python program must extract the serial number and the coordinates from these columns and display them. For the shell script, your job is to decide which filters can do this; finding the ones that work best will take some research. For the Python program, you must figure out which string operations are appropriate for accessing the data and for formatting the output.
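One way to approach the Python half, sketched under the assumption that output fields are separated by single spaces as in the example output above: slice the fixed columns and strip the padding. The function names `format_atom` and `atom_coordinates` are hypothetical, and the second sample line is made up.

```python
def format_atom(line):
    """Build the output line for one ATOM record using fixed PDB columns."""
    serial = line[6:11].strip()  # columns 7-11
    x = line[30:38].strip()      # columns 31-38
    y = line[38:46].strip()      # columns 39-46
    z = line[46:54].strip()      # columns 47-54
    return f"Atom serial number: {serial} X coordinates: {x} Y coordinates: {y} Z coordinates: {z}"


def atom_coordinates(path):
    """Yield one formatted line per ATOM record in the file at `path`."""
    with open(path) as pdb:
        for line in pdb:
            if line.startswith("ATOM"):  # HETATM lines do not match
                yield format_atom(line)


EXAMPLE = "ATOM     18  CB  GLN A   3      83.556  52.126  45.080  1.00 26.06           C"
print(format_atom(EXAMPLE))
# → Atom serial number: 18 X coordinates: 83.556 Y coordinates: 52.126 Z coordinates: 45.080

# Hypothetical record whose x and y fill all 8 columns and touch each other.
print(format_atom("ATOM      7  O   GLY A   9    -100.000-200.500 -30.250"))
# → Atom serial number: 7 X coordinates: -100.000 Y coordinates: -200.500 Z coordinates: -30.250
```

Slicing by column is more robust than `line.split()`: when a coordinate fills all 8 of its columns, as -100.000 does, it runs directly into its neighbor with no separating space, and whitespace splitting would merge the two values.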
Testing
Some PDB files for testing your script and Python program are in the cs132 course directory, /data/biocs/b/student.accounts/cs132/data/pdb_files. Your shell script and Python program should work for any of these files. They should also accept file arguments given as absolute or relative pathnames, including your own test files, not just files located in that directory.
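Python's `open` accepts absolute and relative pathnames alike, so neither form needs special handling. The sketch below shows one way to build a small test file of your own and confirm that both forms give the same result; `atom_serials` and the file contents are hypothetical.

```python
import os
import tempfile


def atom_serials(path):
    """Return the serial numbers (columns 7-11) of all ATOM records in `path`."""
    with open(path) as pdb:
        return [int(line[6:11]) for line in pdb if line.startswith("ATOM")]


# Hypothetical test file: one header, two ATOM records, one HETATM record.
SAMPLE = (
    "HEADER    TEST\n"
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504\n"
    "HETATM  501  O   HOH A 201      10.000  10.000  10.000\n"
    "ATOM      2  CA  MET A   1      11.639   6.071  -5.147\n"
)

with tempfile.TemporaryDirectory() as tmp:
    absolute = os.path.join(tmp, "test.pdb")
    with open(absolute, "w") as f:
        f.write(SAMPLE)
    previous = os.getcwd()
    os.chdir(tmp)  # make "test.pdb" a valid relative pathname
    try:
        from_absolute = atom_serials(absolute)
        from_relative = atom_serials("test.pdb")
    finally:
        os.chdir(previous)  # restore the working directory before cleanup

print(from_absolute, from_relative)  # → [1, 2] [1, 2]
```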