Introduction

pdbeccdutils is an open-source Python package for processing and analyzing small molecules in the PDB. Small-molecule data in the PDB is available as the Chemical Component Dictionary (CCD) or the Biologically Interesting Molecule Reference Dictionary (BIRD) in PDBx/mmCIF format. pdbeccdutils provides streamlined access to all metadata of small molecules in the PDB and offers a set of convenient methods, built on RDKit, to compute various properties of small molecules: 2D depictions, 3D conformers, physicochemical properties, matches to common fragments and scaffolds, and mappings to small-molecule databases using UniChem. pdbeccdutils also provides methods for identifying all the covalently attached chemical components in a macromolecular structure and for calculating similarity among small molecules using the PARITY method.

Note: pdbeccdutils is under active development. New functionality is added regularly, and existing functionality is revised and updated. When properly installed, all the code should have documentation. All public methods use static type annotations (introduced in Python 3.5), and all interfaces should be well documented.

Installation

pdbeccdutils can currently be installed from PyPI using the following command:

pip install pdbeccdutils

Alternatively, you can install it directly from the GitHub repository using the following command:

pip install git+https://github.com/PDBeurope/ccdutils.git@master#egg=pdbeccdutils

If you want to contribute to the project, please fork the repository first and then open a pull request.

Getting started

The core structural representation of small molecules in the pdbeccdutils package is the Component object, a wrapper around the default rdkit.Chem.rdchem.Mol object (exposed as the mol property) that provides most of the functionality and access to its properties. Ideal, Model and Computed conformers are stored in the mol attribute, and Depiction conformers are stored in the mol2D attribute of the Component. The pdbeccdutils.core.models.ConformerType enumeration allows accessing all of them.

Below you can find a few typical use cases.

Reading CCD mmCIF files

CCD structures can be read using the ccd_reader module in the pdbeccdutils.core package. By default, molecules are sanitized using an augmented RDKit sanitization procedure. This can be turned off by passing the optional parameter sanitize=False to the reader function.

from pdbeccdutils.core import ccd_reader

ccd_reader_result = ccd_reader.read_pdb_cif_file('HEM.cif')
ccd_reader_result

CCDReaderResult contains the parsed component together with lists of any warnings and errors encountered during structure parsing. There is also a convenience method for reading multiple chemical components, provided they are listed in separate data blocks of a single mmCIF file.
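
As a rough illustration of how the warnings and errors described above might be handled, the sketch below formats them for logging and flags the components that failed to parse in a multi-component result. The helper names report_parsing_issues, components_with_errors and load_unsanitized are hypothetical and not part of pdbeccdutils. The code assumes only that a reader result exposes warnings and errors lists, and that a multi-component read yields a mapping from component ID to reader result.

```python
from typing import Any, List, Mapping


def report_parsing_issues(result: Any) -> List[str]:
    """Turn the warnings and errors of one reader result into printable lines.

    Assumption: `result` has `warnings` and `errors` attributes holding
    lists of strings, as described for CCDReaderResult.
    """
    lines = [f"warning: {message}" for message in result.warnings]
    lines += [f"error: {message}" for message in result.errors]
    return lines


def components_with_errors(results: Mapping[str, Any]) -> List[str]:
    """Return the sorted IDs of components whose result reports any error.

    Assumption: `results` maps a component ID to its reader result.
    """
    return sorted(cid for cid, result in results.items() if result.errors)


def load_unsanitized(path: str) -> Any:
    """Read a single CCD file with sanitization switched off."""
    # Imported lazily so the helpers above can be used and tested
    # without pdbeccdutils being installed.
    from pdbeccdutils.core import ccd_reader

    return ccd_reader.read_pdb_cif_file(path, sanitize=False)
```

A typical call would be report_parsing_issues(load_unsanitized('HEM.cif')). For a multi-component file, the mapping passed to components_with_errors is assumed to come from the convenience reader mentioned above (ccd_reader.read_pdb_components_file in the versions we are aware of; check this against your installed version).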

Reading PRD mmCIF files

PRD structures can be read using the prd_reader module in the pdbeccdutils.core package.

from pdbeccdutils.core import prd_reader

prd_reader_result = prd_reader.read_pdb_cif_file('PRDCC_000204.cif')
prd_reader_result

Component

Component is a wrapper around the rdkit.Chem.rdchem.Mol object that provides streamlined access to all metadata from CCD/PRD files.

component = ccd_reader_result.component
component
component.inchikey
component.formula

Infer Covalently Linked Components (CLC) from PDB model files

CLCs are large, complex multi-component ligands that are typically represented as separate components, each described by its own CCD entry, as part of the PDB deposition and annotation process. To provide a precise and chemically complete representation of these multi-component ligands, we created CLCs that encompass the entire set of individual components. This ensures researchers can analyse and interpret the interactions of these biologically relevant ligands more accurately. pdbeccdutils provides the clc_reader module to infer all covalently linked components in a single PDB model file.

from pdbeccdutils.core import clc_reader

clcs = clc_reader.read_pdb_cif_file('/path/to/xxxx_updated.cif', sanitize=True)
clc_components = [clc.component for clc in clcs]
rdkit_mols = [k.mol for k in clc_components]

The clc_reader.read_pdb_cif_file function returns a list of CLCReaderResult instances, each representing a single Covalently Linked Component (CLC). The Component object of each CLCReaderResult can then be used to access the properties of that CLC.
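
To make the last point concrete, here is a minimal sketch that condenses a list of CLC reader results into one small record per CLC. The function name summarize_clcs is hypothetical and not part of pdbeccdutils. It assumes that each result has a component attribute, that the component exposes inchikey (as shown for Component above) and that its mol attribute is an RDKit molecule, so that GetNumAtoms() is available.

```python
from typing import Any, Dict, Iterable, List


def summarize_clcs(results: Iterable[Any]) -> List[Dict[str, Any]]:
    """Build one summary record per CLC reader result.

    Assumptions: every result has a `component`; the component has an
    `inchikey` property and a `mol` attribute that is an RDKit Mol.
    """
    summary = []
    for index, result in enumerate(results):
        component = result.component
        summary.append(
            {
                "index": index,  # position in the list returned by the reader
                "inchikey": component.inchikey,
                "n_atoms": component.mol.GetNumAtoms(),  # RDKit atom count
            }
        )
    return summary
```

With the clcs list from the snippet above, summarize_clcs(clcs) would give a quick overview of which CLCs were inferred from the model file and how large each one is.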

Reading CLC mmCIF files

CLC structures can be read using the clc_reader module in the pdbeccdutils.core package.

from pdbeccdutils.core import clc_reader

clc_reader_result = clc_reader.read_clc_cif_file('CLC_00004.cif')
clc_reader_result

Writing CCD/PRD/CLC files

CCD/PRD/CLC molecules represented as Component objects in pdbeccdutils can be exported to different file formats, such as mmCIF, SDF, PDB, CML and XML.

from pdbeccdutils.core import ccd_writer, prd_writer, clc_writer
from pdbeccdutils.core.models import ConformerType

ccd_component = ccd_reader_result.component
prd_component = prd_reader_result.component
clc_component = clc_reader_result.component

# write idealized coordinates in the SDF format.
ccd_writer.write_molecule('HEM.sdf', ccd_component)
prd_writer.write_molecule('PRDCC_000204.sdf', prd_component)
clc_writer.write_molecule('CLC_00004.sdf', clc_component)

# write model coordinates in the PDB format without hydrogens
ccd_writer.write_molecule('HEM.pdb', ccd_component, remove_hs=True, conf_type=ConformerType.Model)
prd_writer.write_molecule('PRDCC_000204.pdb', prd_component, remove_hs=True, conf_type=ConformerType.Model)
clc_writer.write_molecule('CLC_00004.pdb', clc_component, remove_hs=True, conf_type=ConformerType.Model)

# write model coordinates in the mmCIF format with hydrogens
ccd_writer.write_molecule('HEM.cif', ccd_component, conf_type=ConformerType.Model)
prd_writer.write_molecule('PRDCC_000204.cif', prd_component, conf_type=ConformerType.Model)
clc_writer.write_molecule('CLC_00004.cif', clc_component, conf_type=ConformerType.Model)
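
The export calls above can be wrapped in a small helper when the same component needs to be written in several formats. The sketch below is only an illustration: export_paths and export_component are hypothetical names, and the code assumes, as the calls above suggest, that write_molecule picks the output format from the file extension.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def export_paths(out_dir: str, stem: str, extensions: Iterable[str]) -> Dict[str, Path]:
    """Map each extension (e.g. 'sdf') to an output path such as out_dir/HEM.sdf."""
    return {ext: Path(out_dir) / f"{stem}.{ext}" for ext in extensions}


def export_component(component: Any, out_dir: str, stem: str) -> Dict[str, Path]:
    """Write one CCD component as SDF, PDB and mmCIF files.

    Assumption: ccd_writer.write_molecule selects the format from the
    file extension, as in the examples above.
    """
    # Imported lazily so export_paths stays usable without pdbeccdutils.
    from pdbeccdutils.core import ccd_writer

    paths = export_paths(out_dir, stem, ("sdf", "pdb", "cif"))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for path in paths.values():
        # The path is passed as a plain string, matching the calls above.
        ccd_writer.write_molecule(str(path), component)
    return paths
```

For example, export_component(ccd_component, 'out', 'HEM') would write the HEM component to the out directory in all three formats with the writer's default settings. For PRD or CLC components, prd_writer or clc_writer would be swapped in accordingly.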