Step 1

In pDynamo, PDB files can be read and converted directly to systems using the function PDBFile_ToSystem. This is illustrated by Examples 2 and 19 of the pDynamo distribution. Unfortunately, the function fails for many PDB files, including those containing experimental structures, or it generates systems of incorrect composition. When this happens, the solution is to convert the information in the PDB file into a PDB model, which can then be modified as appropriate.

PDB files can be read and converted to PDB models using the function PDBFile_ToPDBModel, and the model can be written out to a PDB model file with the function PDBModel_ToModelFile. Suitable commands to do this for the file 1CDK.pdb are:

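# . Import the functions (assumption: pDynamo 1.x, where they are exported by pBabel).
from pBabel import PDBFile_ToPDBModel, PDBModel_ToModelFile

# . Read the PDB file and create a PDB model.
model = PDBFile_ToPDBModel ( "1CDK.pdb" )
model.Summary ( )

# . Write out the PDB model.
PDBModel_ToModelFile ( "1CDK.model", model )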

The file 1CDK.pdb is read successfully, although there are some warnings which, in this case, can be ignored. The model file that is produced employs YAML, a data serialization format designed to have a much simpler and less verbose syntax than comparable formats such as XML. Descriptions of the format are readily available online. The file, with some parts omitted for brevity, is:

- Components: [LYS.8, GLY.9, SER.10, GLU.11, GLN.12, GLU.13, SER.14, VAL.15, LYS.16,
    GLU.17, PHE.18, LEU.19, ALA.20, LYS.21, ALA.22, LYS.23, GLU.24, ASP.25, PHE.26,
    LEU.27, LYS.28, LYS.29, TRP.30, GLU.31, ASN.32, PRO.33, ALA.34, GLN.35, ASN.36,
    THR.37, ALA.38, HIS.39, LEU.40, ASP.41, GLN.42, PHE.43, GLU.44, ARG.45, ILE.46,
    LYS.47, THR.48, LEU.49, GLY.50, THR.51, GLY.52, SER.53, PHE.54, GLY.55, ARG.56,
    VAL.57, MET.58, LEU.59, VAL.60, LYS.61, HIS.62, LYS.63, GLU.64, THR.65, GLY.66,
    ASN.67, HIS.68, PHE.69, ALA.70, MET.71, LYS.72, ILE.73, LEU.74, ASP.75, LYS.76,
    GLN.77, LYS.78, VAL.79, VAL.80, LYS.81, LEU.82, LYS.83, GLN.84, ILE.85, GLU.86,
    HIS.87, THR.88, LEU.89, ASN.90, GLU.91, LYS.92, ARG.93, ILE.94, LEU.95, GLN.96,
    ALA.97, VAL.98, ASN.99, PHE.100, PRO.101, PHE.102, LEU.103, VAL.104, LYS.105,
    LEU.106, GLU.107, TYR.108, SER.109, PHE.110, LYS.111, ASP.112, ASN.113, SER.114,
    ASN.115, LEU.116, TYR.117, MET.118, VAL.119, MET.120, GLU.121, TYR.122, VAL.123,
    PRO.124, GLY.125, GLY.126, GLU.127, MET.128, PHE.129, SER.130, HIS.131, LEU.132,
    ARG.133, ARG.134, ILE.135, GLY.136, ARG.137, PHE.138, SER.139, GLU.140, PRO.141,
    HIS.142, ALA.143, ARG.144, PHE.145, TYR.146, ALA.147, ALA.148, GLN.149, ILE.150,
    VAL.151, LEU.152, THR.153, PHE.154, GLU.155, TYR.156, LEU.157, HIS.158, SER.159,
    LEU.160, ASP.161, LEU.162, ILE.163, TYR.164, ARG.165, ASP.166, LEU.167, LYS.168,
    PRO.169, GLU.170, ASN.171, LEU.172, LEU.173, ILE.174, ASP.175, GLN.176, GLN.177,
    GLY.178, TYR.179, ILE.180, GLN.181, VAL.182, THR.183, ASP.184, PHE.185, GLY.186,
    PHE.187, ALA.188, LYS.189, ARG.190, VAL.191, LYS.192, GLY.193, ARG.194, THR.195,
    TRP.196, TPO.197, LEU.198, CYS.199, GLY.200, THR.201, PRO.202, GLU.203, TYR.204,
    LEU.205, ALA.206, PRO.207, GLU.208, ILE.209, ILE.210, LEU.211, SER.212, LYS.213,
    GLY.214, TYR.215, ASN.216, LYS.217, ALA.218, VAL.219, ASP.220, TRP.221, TRP.222,
    ALA.223, LEU.224, GLY.225, VAL.226, LEU.227, ILE.228, TYR.229, GLU.230, MET.231,
    ALA.232, ALA.233, GLY.234, TYR.235, PRO.236, PRO.237, PHE.238, PHE.239, ALA.240,
    ASP.241, GLN.242, PRO.243, ILE.244, GLN.245, ILE.246, TYR.247, GLU.248, LYS.249,
    ILE.250, VAL.251, SER.252, GLY.253, LYS.254, VAL.255, ARG.256, PHE.257, PRO.258,
    SER.259, HIS.260, PHE.261, SER.262, SER.263, ASP.264, LEU.265, LYS.266, ASP.267,
    LEU.268, LEU.269, ARG.270, ASN.271, LEU.272, LEU.273, GLN.274, VAL.275, ASP.276,
    LEU.277, THR.278, LYS.279, ARG.280, PHE.281, GLY.282, ASN.283, LEU.284, LYS.285,
    ASP.286, GLY.287, VAL.288, ASN.289, ASP.290, ILE.291, LYS.292, ASN.293, HIS.294,
    LYS.295, TRP.296, PHE.297, ALA.298, THR.299, THR.300, ASP.301, TRP.302, ILE.303,
    ALA.304, ILE.305, TYR.306, GLN.307, ARG.308, LYS.309, VAL.310, GLU.311, ALA.312,
    PRO.313, PHE.314, ILE.315, PRO.316, LYS.317, PHE.318, LYS.319, GLY.320, PRO.321,
    GLY.322, ASP.323, THR.324, SER.325, ASN.326, PHE.327, ASP.328, ASP.329, TYR.330,
    GLU.331, GLU.332, GLU.333, GLU.334, ILE.335, ARG.336, VAL.337, SER.338, ILE.339,
    ASN.340, GLU.341, LYS.342, CYS.343, GLY.344, LYS.345, GLU.346, PHE.347, SER.348,
    GLU.349, PHE.350, MN.401, MN.402, ANP.400, MYR.403, HOH.404, HOH.405, HOH.406,
    HOH.407, HOH.408, HOH.409, HOH.410, HOH.411, HOH.412, HOH.413, HOH.414, HOH.415,
    HOH.416, HOH.417, HOH.418, HOH.419, HOH.420, HOH.421, HOH.422, HOH.423, HOH.424,
    HOH.425, HOH.426, HOH.427, HOH.428, HOH.429, HOH.430, HOH.431, HOH.432, HOH.433,
    HOH.434, HOH.435, HOH.436, HOH.437, HOH.438, HOH.439, HOH.440, HOH.441, HOH.442,
    HOH.443, HOH.444, HOH.445, HOH.446, HOH.447, HOH.448, HOH.449, HOH.450, HOH.451,
    HOH.452, HOH.453, HOH.454, HOH.455, HOH.456, HOH.457, HOH.458, HOH.459, HOH.460,
    HOH.461, HOH.462, HOH.463, HOH.464, HOH.465, HOH.466, HOH.467, HOH.468, HOH.469,
    HOH.470, HOH.471, HOH.472, HOH.473, HOH.474, HOH.475, HOH.476, HOH.477, HOH.478,
    HOH.479, HOH.480, HOH.481, HOH.482, HOH.483, HOH.484, HOH.485, HOH.486, HOH.487,
    HOH.488, HOH.489, HOH.490, HOH.491, HOH.492, HOH.493, HOH.494, HOH.495, HOH.496,
    HOH.497, HOH.498, HOH.499, HOH.500, HOH.501, HOH.502, HOH.503, HOH.504, HOH.505,
    HOH.506, HOH.507, HOH.508, HOH.509, HOH.510, HOH.511, HOH.512, HOH.513, HOH.514,
    HOH.515, HOH.516, HOH.517, HOH.518, HOH.519, HOH.520, HOH.521, HOH.522, HOH.523,
    HOH.524, HOH.525, HOH.526, HOH.527, HOH.528, HOH.529, HOH.530, HOH.531, HOH.532]
  Label: A
  - {Label: GenericSingle, Left Component: ASN.171, Right Component: MN.401}
  - {Label: GenericSingle, Left Component: ASP.184, Right Component: MN.402}
  - {Label: GenericSingle, Left Component: MN.401, Right Component: ANP.400}
  - {Label: GenericSingle, Left Component: MN.401, Right Component: HOH.451}
  - {Label: GenericSingle, Left Component: MN.402, Right Component: HOH.514}
  - {Label: GenericSingle, Left Component: MN.402, Right Component: HOH.436}
  - {Label: GenericSingle, Left Component: MN.402, Right Component: ANP.400}
  - {Label: GenericSingle, Left Component: TRP.196, Right Component: TPO.197}
  - {Label: GenericSingle, Left Component: TPO.197, Right Component: LEU.198}
  - {Label: GenericSingle, Left Component: TPO.197, Right Component: TPO.197}
  - {Label: GenericSingle, Left Component: ANP.400, Right Component: ANP.400}
  - {Label: GenericSingle, Left Component: LEU.198, Right Component: TPO.197}
  - {Label: GenericSingle, Left Component: MYR.403, Right Component: MYR.403}
  - {Label: GenericSingle, Left Component: ANP.400, Right Component: MN.402}
  - {Label: GenericSingle, Left Component: HOH.451, Right Component: MN.401}
  - {Label: GenericSingle, Left Component: TPO.197, Right Component: TRP.196}
  - {Label: GenericSingle, Left Component: MN.401, Right Component: ASN.171}
  - {Label: GenericSingle, Left Component: HOH.436, Right Component: MN.402}
  - {Label: GenericSingle, Left Component: MN.402, Right Component: ASP.184}
  - {Label: GenericSingle, Left Component: ANP.400, Right Component: MN.401}
  - {Label: GenericSingle, Left Component: HOH.514, Right Component: MN.402}
# ... something similar for the B chain ...
- Components: [THR.1, THR.2, TYR.3, ALA.4, ASP.5, PHE.6, ILE.7, ALA.8, SER.9, GLY.10,
    ARG.11, THR.12, GLY.13, ARG.14, ARG.15, ASN.16, ALA.17, ILE.18, HIS.19, ASP.20,
    HOH.49, HOH.54, HOH.63, HOH.70, HOH.73, HOH.74, HOH.75, HOH.76, HOH.77, HOH.78,
    HOH.79, HOH.88, HOH.109, HOH.110, HOH.111, HOH.113, HOH.129, HOH.146, HOH.147,
  Label: I
# ... something similar for the J chain ...
Label: Camp-Dependent Protein Kinase Catalytic Subunit(E.C. (Protein Kinase
  A) Complexed With ProteinKinase Inhibitor Peptide Fragment 5-24 (Pki(5-24)Isoelectric
  Variant Ca) And Mn2+ Adenylyl Imidodiphosphate(Mnamp-Pnp) At Ph 5.6 And 7C And 4C
Linear Polymers:
- {Left Terminal Component: 'A:LYS.8', Right Terminal Component: 'A:PHE.350'}
- {Left Terminal Component: 'I:THR.1', Right Terminal Component: 'I:ASP.20'}
# ... similar entries for the B and J chains ...

Before describing the contents of the file, it is important to understand how pDynamo structures a PDB model. It adopts a three-level hierarchy, similar to that employed in mmCIF files, and divides a system into entities, components and atoms. In a PDB file, entities are equivalent, approximately, to chains and components to residues. Each item in the hierarchy has a unique path that is constructed from the item's label and the labels of its parent containers within the hierarchy. Labels are strings that may be composed of a number of fields. By default, fields within a label are separated by periods ".", and labels within a path by colons ":". Examples of labels from the 1CDK structure are: OG (atom), LYS.8 (component) and A (entity); and of paths: A:LYS.8:N (atom), I:ALA.17 (component) and I (entity).
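
The label and path conventions described above can be sketched in a few lines of plain Python. This is an illustration only and does not use pDynamo itself: the helper names split_path and level_of are hypothetical, and the separators are simply the defaults stated in the text.

```python
# Illustrative sketch only; these helpers are hypothetical, not pDynamo functions.
FIELD_SEPARATOR = "."  # default separator between fields within a label
LABEL_SEPARATOR = ":"  # default separator between labels within a path
LEVELS = ("entity", "component", "atom")  # the three-level hierarchy


def split_path(path):
    """Return the labels of a path, each split into its fields."""
    return [label.split(FIELD_SEPARATOR) for label in path.split(LABEL_SEPARATOR)]


def level_of(path):
    """Name the hierarchy level that a full path refers to."""
    return LEVELS[len(path.split(LABEL_SEPARATOR)) - 1]


# The example paths quoted in the text.
for path in ("A:LYS.8:N", "I:ALA.17", "I"):
    print(path, level_of(path), split_path(path))
```

The level is inferred purely from the number of labels in the path, which works only for full paths that start at the entity.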

In the 1CDK model file, there are two essentially identical sets of entities: A/I and B/J. Entities A and B are the catalytic subunits of PKA, whereas entities I and J are the peptide inhibitors. The entries for each entity in the file declare the entity's label and a list of the components that the entity contains. In addition, entity A contains a list of links, which are non-standard bonds that have been specified in the CONECT and SSBOND records of the PDB file. Each link has a label and the labels of the two components, left and right, between which it occurs. Links from CONECT records are given the generic name GenericSingle, whereas those from SSBOND records are labeled DisulfideBridge.
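
In the listing above, most links of entity A appear twice, once in each direction, and a few join a component to itself. One way to get an overview is to treat each link as an unordered pair, as in the sketch below. It is plain Python, not pDynamo code, the function name unique_links is hypothetical, and it assumes that the direction of a link carries no meaning for this purpose.

```python
# Illustrative sketch only; unique_links is a hypothetical helper, not a pDynamo function.
# (left component, right component) pairs copied from the links of entity A.
LINKS_A = [
    ("ASN.171", "MN.401"), ("ASP.184", "MN.402"), ("MN.401", "ANP.400"),
    ("MN.401", "HOH.451"), ("MN.402", "HOH.514"), ("MN.402", "HOH.436"),
    ("MN.402", "ANP.400"), ("TRP.196", "TPO.197"), ("TPO.197", "LEU.198"),
    ("TPO.197", "TPO.197"), ("ANP.400", "ANP.400"), ("LEU.198", "TPO.197"),
    ("MYR.403", "MYR.403"), ("ANP.400", "MN.402"), ("HOH.451", "MN.401"),
    ("TPO.197", "TRP.196"), ("MN.401", "ASN.171"), ("HOH.436", "MN.402"),
    ("MN.402", "ASP.184"), ("ANP.400", "MN.401"), ("HOH.514", "MN.402"),
]


def unique_links(pairs):
    """Drop reversed duplicates and separate links within a single component."""
    seen = set()
    between, within = [], []
    for left, right in pairs:
        key = frozenset((left, right))  # assumption: direction is ignored
        if key in seen:
            continue
        seen.add(key)
        (within if left == right else between).append((left, right))
    return between, within


between, within = unique_links(LINKS_A)
print(len(LINKS_A), "entries,", len(between), "between components,", len(within), "within one")
```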

In addition to the entity entries, the model file contains the label for the model, derived from the title in the PDB file, and a list of the linear polymers that correspond to the A, B, I and J chains that the PDB file declares. Each linear polymer declares the first (left) and last (right) component in its chain. Note that there is not necessarily any relation between entities and linear polymers, as entities can be composed of arbitrary mixtures of polymer and non-polymer components. In this case, all the entities have a single polymer, composed of standard and non-standard (TPO.197) amino acids, and non-polymer components, including 5'-adenylyl-imido-triphosphate (ANP.400), manganese ions (MN.401 and MN.402), myristic acid (MYR.403) and waters (HOH.*).
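
The relation between an entity's component list and its linear polymer can be illustrated with entity I. The sketch below is plain Python rather than pDynamo code, and the function name split_entity is hypothetical. It assumes that components are listed in chain order, so that everything from the left to the right terminal belongs to the polymer. Only the first three waters of entity I are included.

```python
# Illustrative sketch only; split_entity is a hypothetical helper, not a pDynamo function.
# Components of entity I as listed in the model file (waters truncated to three).
COMPONENTS_I = [
    "THR.1", "THR.2", "TYR.3", "ALA.4", "ASP.5", "PHE.6", "ILE.7", "ALA.8",
    "SER.9", "GLY.10", "ARG.11", "THR.12", "GLY.13", "ARG.14", "ARG.15",
    "ASN.16", "ALA.17", "ILE.18", "HIS.19", "ASP.20",
    "HOH.49", "HOH.54", "HOH.63",
]


def split_entity(components, left_terminal, right_terminal):
    """Split components into the polymer (terminals inclusive) and the rest.

    Assumption: the components are stored in chain order.
    """
    # Terminals are given as paths such as 'I:THR.1'; keep the component label.
    start = components.index(left_terminal.split(":")[-1])
    stop = components.index(right_terminal.split(":")[-1])
    polymer = components[start:stop + 1]
    others = components[:start] + components[stop + 1:]
    return polymer, others


polymer, others = split_entity(COMPONENTS_I, "I:THR.1", "I:ASP.20")
print(len(polymer), "polymer components;", "non-polymer:", others)
```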

To finish this section, it is worth remarking that the file 1CDK.pdb contains only a single PDB model. Some PDB files, though, contain multiple models, most notably those containing structures that have been resolved by NMR spectroscopy. In such cases, the desired model can be selected from the file by passing the modelNumber keyword argument to the PDBFile_ToPDBModel function. By default, the first model in the file is returned.
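
Before deciding whether modelNumber is needed, it can help to check how many models a file declares. The sketch below does this in plain Python by counting MODEL records, which the PDB format uses to open each model in a multi-model file. The function name count_models is hypothetical, and the two line lists are abridged, made-up fragments rather than real entries.

```python
# Illustrative sketch only; count_models is a hypothetical helper, not a pDynamo function.
def count_models(lines):
    """Count the models declared by MODEL records in an iterable of PDB lines.

    A file without MODEL records holds a single model, so at least 1 is returned.
    """
    declared = sum(1 for line in lines if line[:6].strip() == "MODEL")
    return max(declared, 1)


# Abridged, made-up fragments standing in for the contents of real files.
single_model = ["HEADER", "ATOM      1  N   LYS A   8", "END"]
two_models = [
    "MODEL        1", "ATOM      1  N   MET A   1", "ENDMDL",
    "MODEL        2", "ATOM      1  N   MET A   1", "ENDMDL",
    "END",
]

print(count_models(single_model), count_models(two_models))
```

For a file on disk, the same function can be applied to an open file object, for example count_models ( open ( "1CDK.pdb" ) ).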