Step 3a
In pDynamo the construction of an atomic model from a PDB file requires a library containing definitions of components, links and variants. These are defined as follows:
- Components
- The PDB keeps a dictionary of all chemical entities that appear as residues in structures in the database. The dictionary is called the PDB Chemical Component Dictionary and may be downloaded from the PDB website (see the Dictionaries and File Formats and PDB Ligand Dictionary submenus of the website). Each entry in the dictionary defines a component and includes lists of the component's atoms and covalent bonds. All PDB components have names consisting of three alphanumeric characters.
- Links
- These are not defined in the PDB standard but are used by pDynamo to modify the composition of two components when there is a covalent bond between them. A common protein link is one that specifies a disulfide bridge between two cysteine residues.
- Variants
- Variants, like links, are pDynamo constructs but they modify the composition of a single component only. Common protein variants include those which change the default protonation state of a residue.
The standard pDynamo distribution comes with a limited set of PDB components, links and variants, which will be insufficient for users who employ PDB files extensively. In such cases it will be necessary to augment the library using functions that are provided in the module PDBComponentScripts.
A reasonable strategy for doing this is as follows:
- Choose a directory in which to create the modified component library and redefine the PBABEL_PDBDATA environment variable accordingly. It is preferable to select a directory that is not in the pDynamo distribution so that reinstallation of the program will not cause the additions to be lost.
- Download the component dictionary, in mmCIF format, from the PDB website. A version of the dictionary is included with the other files from this tutorial but it will not be the most recent one.
- Look at the file that contains the component dictionary to see if the desired components are present and, if so, whether they are in the correct form for the study at hand. In the PKA case, the components ATP, MG and PO3 were missing from the default library, but all three occur in the dictionary. ATP is in a neutral form with its phosphate groups fully protonated, MG is a dication and PO3 is a fully deprotonated trianion.
- Write a program to create and/or update the modified library. One suitable for PKA is:
# . Process the library.
components = ProcessPDBComponentDictionary ( "public-component-erf.cif" )
# . Make the default library.
MakeDefaultPDBComponentLibrary ( components )
# . Add a default variant to ATP.
components["ATP"].variants = [ "FullyDeprotonated" ]
# . Create the ATP variant.
variant = PDBComponentVariant ( component = "ATP", label = "FullyDeprotonated", atomsToDelete = [ "2HOG", "3HOG", "2HOB", "2HOA" ], \
                                formalCharges = { "O2G" : -1, "O3G" : -1, "O2B" : -1, "O2A" : -1 } )
# . Create the THR-PO3 link.
leftvariant  = PDBComponentVariant ( component = "PO3", formalCharges = { "P" : 0 } )
rightvariant = PDBComponentVariant ( component = "THR", atomsToDelete = [ "HG1" ] )
link = PDBComponentLink ( label = "PhosphorylatedThreonine", atomLabel1 = "P", variant1 = leftvariant, \
                          atomLabel2 = "OG1", variant2 = rightvariant, \
                          bondOrder = SingleBond ( ) )
# . Save all the items.
for item in ( components["ATP"], components["MG"], components["PO3"], variant, link ):
    AddItemToPDBComponentLibrary ( item )
- The function ProcessPDBComponentDictionary processes the component dictionary in the file public-component-erf.cif and returns a Python dictionary, components, whose keys are the three-character component labels (equivalent to residue names) and whose values are instances of the class PDBComponent.
- The function MakeDefaultPDBComponentLibrary creates the default version of the library, the one that comes with the pDynamo distribution, in the directory pointed to by the environment variable PBABEL_PDBDATA. This statement can be skipped if a modified library already exists. The function takes an optional boolean argument, fullLibrary, which, if given as True, writes all the components passed to it to the library and not just those from the default distribution.
- Components in the dictionary occur in a form which may not be the one that is to be used most often. Thus, for example, the components for aspartic and glutamic acids have protonated sidechains whereas they will almost always (at least in proteins) be wanted in their deprotonated forms. To avoid having to specify variants explicitly for all such cases in PDB model files, "default" variants for a component can be defined which will be applied automatically whenever the component is required. The default variants can be overridden by appropriate Variant statements in the model file.
The forms of the components MG and PO3 in the PDB dictionary are probably already those that will be needed most often. However, it is rare that a fully protonated ATP will be used so it is more convenient to define a default variant which automatically gives the fully deprotonated form. This is done in the third statement by defining the variants attribute of the PDB component corresponding to ATP.
- The fourth statement creates the fully deprotonated variant for ATP by instantiating the PDBComponentVariant class. The arguments to the constructor define the component to which the variant is to be applied (component), the name of the variant (label), the names of the atoms to delete in the target component (atomsToDelete) and a dictionary permitting the formal charges for atoms that remain to be changed (formalCharges). In this case, the four hydrogens bound to the phosphate oxygens are removed and the oxygens to which they were bound each acquire a negative charge. Of course, when an atom is removed any bonds it may have are also deleted. Although not needed here, there are other arguments to the constructor which allow atoms and bonds to be added, bonds to be deleted and bond orders to be changed.
- The next group of statements creates the missing PhosphorylatedThreonine link. A link is made up of two anonymous variants (i.e. variants without names) so these are created first. Both are simple. The first (left) variant is for PO3 and changes the charge of the phosphorus from minus one to zero (thereby increasing the charge for the component from -3 to -2). The second (right) variant is for THR and removes the hydroxyl hydrogen HG1. The link itself is created by instantiating the class PDBComponentLink. The arguments to the constructor give the name of the link (label), the labels of the atoms between which there is to be a bond (atomLabel1 and atomLabel2), the variants for each component participating in the link (variant1 and variant2) and the type of bond (bondOrder).
- The program terminates by inserting the necessary components and the created variant and link into the library using the function AddItemToPDBComponentLibrary. A library must already exist at the location pointed to by PBABEL_PDBDATA for this function to work.
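The charge bookkeeping behind the ATP variant and the THR-PO3 link can be checked with a small stand-alone sketch. It does not use pDynamo: ToyComponent and apply_variant are hypothetical stand-ins that only mimic the atomsToDelete and formalCharges arguments described above. Only the atoms touched by the variants are listed for ATP and THR (all others are assumed to carry zero formal charge), and the way the remaining -2 of PO3 is spread over its oxygens, as well as the oxygen labels O1 to O3, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyComponent:
    """Hypothetical stand-in for a component: atom label -> formal charge."""
    label: str
    charges: dict

    def total_charge(self):
        return sum(self.charges.values())


def apply_variant(component, atoms_to_delete=(), formal_charges=None):
    """Return a copy with some atoms removed and some formal charges reset."""
    charges = {atom: q for atom, q in component.charges.items()
               if atom not in atoms_to_delete}
    for atom, q in (formal_charges or {}).items():
        if atom not in charges:
            raise KeyError(f"{atom} is not present in {component.label}")
        charges[atom] = q
    return ToyComponent(component.label, charges)


# Neutral ATP: only the atoms affected by the variant are listed (assumption:
# every other atom has zero formal charge).
atp = ToyComponent("ATP", {name: 0 for name in
                           ("O2G", "O3G", "O2B", "O2A",
                            "2HOG", "3HOG", "2HOB", "2HOA")})
atp_deprotonated = apply_variant(
    atp,
    atoms_to_delete=("2HOG", "3HOG", "2HOB", "2HOA"),
    formal_charges={"O2G": -1, "O3G": -1, "O2B": -1, "O2A": -1})

# PO3 trianion: P carries -1 (from the text); the oxygen split is assumed.
po3 = ToyComponent("PO3", {"P": -1, "O1": 0, "O2": -1, "O3": -1})
po3_left = apply_variant(po3, formal_charges={"P": 0})

# THR: the hydroxyl hydrogen HG1 is removed so that OG1 can bond to P.
thr = ToyComponent("THR", {"OG1": 0, "HG1": 0})
thr_right = apply_variant(thr, atoms_to_delete=("HG1",))

print(atp_deprotonated.total_charge())                     # -4
print(po3.total_charge(), po3_left.total_charge())         # -3 -2
print(po3_left.total_charge() + thr_right.total_charge())  # -2
```

The totals agree with the description above: ATP goes from neutral to -4, PO3 goes from -3 to -2, and the phosphorylated threonine formed by the link carries a net charge of -2.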
It is quite common to need a component that is not in the PDB chemical component dictionary, in which case the component must be constructed from scratch. The easiest way to do this is to create a system with the desired composition and then convert it to a component, with the desired name, using the class method FromSystem of the PDBComponent class. Systems with sufficient information are most conveniently generated from MOL files or SMILES strings. Examples are:
# . Construct a component from a MOL file.
system = MOLFile_ToSystem ( "water.mol" )
component = PDBComponent.FromSystem ( system, label = "WAT" )
# . Construct a component from a SMILES.
system = SMILES_ToSystem ( "O" )
component = PDBComponent.FromSystem ( system, label = "WAT" )
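Before building a component from scratch, or when deciding which components need to be added to the library, it can help to check which labels already occur in the component dictionary. The following is a minimal sketch that relies only on the fact that each entry in the mmCIF dictionary starts with a data_ block header; the function names component_ids and missing_components are illustrative and are not part of pDynamo.

```python
def component_ids(lines):
    """Collect component labels from the 'data_XXX' block headers of an mmCIF file."""
    ids = set()
    for line in lines:
        if line.startswith("data_"):
            ids.add(line[len("data_"):].strip().upper())
    return ids


def missing_components(wanted, lines):
    """Return the wanted labels that have no block in the dictionary, sorted."""
    found = component_ids(lines)
    return sorted({label.upper() for label in wanted} - found)


# A tiny in-memory excerpt stands in for the real dictionary file here.
sample = """data_ATP
_chem_comp.id ATP
#
data_MG
_chem_comp.id MG
#
""".splitlines()

print(missing_components(["ATP", "MG", "PO3"], sample))  # ['PO3']
```

With the real dictionary the same functions can be given an open file handle instead of the sample list, for example `with open ( "public-component-erf.cif" ) as handle: print ( missing_components ( [ "ATP", "MG", "PO3" ], handle ) )`. Finding a label only shows that an entry exists; its protonation state and formal charges still have to be inspected by hand.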
As a final point, it is advisable to gather together, or at least conserve, all scripts that modify the component library. This means that the "local" version of the library can be regenerated in case of problems or when employing future versions of pDynamo that may not be compatible in some way with earlier ones.
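One way to make regeneration routine is to keep all library-modifying scripts in a single directory and run them in a fixed order from a small driver. The sketch below is only a suggestion: the function regenerate_library and the directory names in the usage comment are hypothetical, and it assumes that the scripts read PBABEL_PDBDATA from the environment when they run, which means pDynamo should not already have been imported in the same Python process.

```python
import os
import runpy
from pathlib import Path


def regenerate_library(library_dir, scripts_dir):
    """Point PBABEL_PDBDATA at library_dir and run each *.py in scripts_dir in name order."""
    library_dir = Path(library_dir)
    library_dir.mkdir(parents=True, exist_ok=True)
    # The scripts are assumed to pick up the library location from the environment.
    os.environ["PBABEL_PDBDATA"] = str(library_dir)
    executed = []
    for script in sorted(Path(scripts_dir).glob("*.py")):
        # Run each script as if it had been started from the command line.
        runpy.run_path(str(script), run_name="__main__")
        executed.append(script.name)
    return executed


# Example usage (hypothetical paths):
# regenerate_library("/home/user/pdbdata", "/home/user/pdbdata_scripts")
```

Numbering the scripts (01_make_default.py, 02_add_atp.py, and so on) fixes the order, so that the script which creates the default library always runs before those that add items to it.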