core

pdbeccdutils.core.ccd_reader

A set of methods for reading in data and creating an internal representation of molecules. Basic usage can be as simple as this:

from pdbeccdutils.core import ccd_reader

ccdutils_component = ccd_reader.read_pdb_cif_file('/path/to/cif/ATP.cif').component
rdkit_mol = ccdutils_component.mol

class pdbeccdutils.core.ccd_reader.CCDReaderResult(warnings: List[str], errors: List[str], component: Component, sanitized: bool)

NamedTuple for the result of reading an individual PDB chemical component definition (CCD).

component

Internal representation of the CCD that was read in.

Type:

Component

errors

A list of any errors found while reading the CCD. If no errors are found, the list will be empty.

Type:

list[str]

warnings

A list of any warnings found while reading the CCD. If no warnings are found, the list will be empty.

Type:

list[str]

sanitized

Whether or not the molecule was sanitized

Type:

bool

component: Component

Alias for field number 2

errors: List[str]

Alias for field number 1

sanitized: bool

Alias for field number 3

warnings: List[str]

Alias for field number 0
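The “Alias for field number N” entries indicate that CCDReaderResult is a NamedTuple, so its fields can be read by name or by position. A minimal sketch with a hypothetical stand-in class (ReaderResultSketch, not part of pdbeccdutils) that copies the documented field order; component is typed as object so the snippet runs without pdbeccdutils or RDKit installed:

```python
from typing import List, NamedTuple


class ReaderResultSketch(NamedTuple):
    """Hypothetical stand-in with the documented CCDReaderResult field order."""

    warnings: List[str]  # field number 0
    errors: List[str]  # field number 1
    component: object  # field number 2 (a Component in pdbeccdutils)
    sanitized: bool  # field number 3


result = ReaderResultSketch(warnings=[], errors=["example error"], component=None, sanitized=False)

# Named and positional access return the same object.
assert result.errors is result[1]

# Unpacking follows the same field order.
warnings, errors, component, sanitized = result
if errors:
    print(f"{len(errors)} error(s) while reading; sanitized={sanitized}")
```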

pdbeccdutils.core.ccd_reader.read_pdb_cif_file(path_to_cif: str, sanitize: bool = True) → CCDReaderResult

Read in single wwPDB CCD CIF component and create its internal representation.

Parameters:
  • path_to_cif (str) – Path to the cif file

  • sanitize (bool) – Whether or not the molecule should be sanitized. Defaults to True.

Raises:

ValueError – if file does not exist

Returns:

Results of the parsing together with the internal representation of the component.

Return type:

CCDReaderResult

pdbeccdutils.core.ccd_reader.read_pdb_components_file(path_to_cif: str, sanitize: bool = True, include: list[str] = []) → Dict[str, CCDReaderResult]

Process multiple compounds stored in the wwPDB CCD components.cif file.

Parameters:
  • path_to_cif (str) – Path to the components.cif file with multiple ligands in it.

  • sanitize (bool) – Whether or not the components should be sanitized. Defaults to True.

  • include (list[str]) – List of CCDs to be parsed. By default it is empty and all the CCDs are parsed. If a list of CCDs is provided, only those will be parsed.

Raises:

ValueError – if the file does not exist.

Returns:

Internal representation of all the components in the components.cif file.

Return type:

dict[str, CCDReaderResult]
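The include parameter acts as an allow-list: an empty list parses every CCD in the file, while a non-empty one restricts parsing to the listed ids. A minimal sketch of that selection logic, assuming the file has already been split into per-CCD blocks keyed by id; the select_blocks helper and the placeholder block contents are hypothetical:

```python
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def select_blocks(blocks: Dict[str, T], include: Sequence[str] = ()) -> Dict[str, T]:
    """Keep every block when include is empty; otherwise keep only the listed CCD ids."""
    if not include:
        return dict(blocks)
    wanted = set(include)
    return {ccd_id: block for ccd_id, block in blocks.items() if ccd_id in wanted}


# Placeholder strings stand in for parsed CIF blocks.
blocks = {"ATP": "block-ATP", "HEM": "block-HEM", "NAG": "block-NAG"}
print(sorted(select_blocks(blocks)))
print(sorted(select_blocks(blocks, ["HEM"])))
```

The sketch uses an immutable () default rather than a literal [] so that no list object is shared between calls.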

pdbeccdutils.core.component

class pdbeccdutils.core.component.Component(mol: Mol, ccd_cif_block: Block, properties: CCDProperties | None = None, descriptors: List[Descriptor] | None = None)

Wrapper for the rdkit.Chem.Mol object enabling some of its functionality and handling possible erroneous situations.

Returns:

instance object

Return type:

Component

property atoms_ids: Tuple[Any, ...]

Supplies the atom_ids obtained from _chem_comp_atom.atom_id, see:

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Categories/chem_comp_atom.html

The order will reflect the order in the input PDB-CCD.

The atom_id is also known as ‘atom_name’; standard amino acids have main chain atom names ‘N’, ‘CA’, ‘C’, ‘O’.

Returns:

atom_ids for the PDB-CCD

Return type:

tuple[str]

compute_2d(manager: DepictionManager, remove_hs: bool = True) → DepictionResult

Compute a 2D depiction of the component using a DepictionManager instance.

Parameters:
  • manager (DepictionManager) – Instance of the ligand depiction class.

  • remove_hs (bool, optional) – Defaults to True. Remove hydrogens prior to depiction.

Returns:

Object with the details about depiction process.

Return type:

DepictionResult

compute_3d(version='v3') → bool

Generate 3D coordinates using the ETKDG method. The version of ETKDG can be specified.

Parameters:

version (str, optional) – Version of ETKDG to be used. Defaults to “v3”.

Returns:

Result of the structure generation process.

Return type:

bool

property descriptors: List[Descriptor]

Supply the _pdbx_chem_comp_descriptor category for the PDB-CCD. Obtained from PDB-CCD’s _pdbx_chem_comp_descriptor:

http://mmcif.rcsb.org/dictionaries/mmcif_pdbx.dic/Items/_pdbx_chem_comp_descriptor.program_version.html

Returns:

List of descriptors for a given entry.

Return type:

list[Descriptor]

export_2d_annotation(file_name: str, wedge_bonds: bool = True) → None

Generates a 2D depiction in JSON format, with annotation of bonds and atoms, to be redrawn in the interactions component.

Parameters:

file_name (str) – Path to the file

export_2d_svg(file_name: str, width: int = 500, names: bool = False, wedge_bonds: bool = True, atom_highlight: Dict[Any, Tuple] | None = None, bond_highlight: Dict[Tuple, Tuple] | None = None)

Save a 2D depiction of the component as an SVG file. If the depiction cannot be drawn, an image with the component id is generated instead.

Parameters:
  • file_name (str) – path to store 2d depiction

  • width (int, optional) – Defaults to 500. Width of a frame in pixels.

  • names (bool, optional) – Defaults to False. Whether or not to include atom names in depiction. If atom name is not set, element symbol is used instead.

  • wedge_bonds (bool, optional) – Defaults to True. Whether or not the molecule should be depicted with bond wedging.

  • atom_highlight (dict of tuple of float, optional) – Defaults to None. Atom names or indices to be highlighted, along with colors in RGB, e.g. {‘CA’: (0.5, 0.5, 0.5)} or {0: (0.5, 0.5, 0.5)}

  • bond_highlight (dict of tuple of float, optional) – Defaults to None. Bonds to be highlighted, along with colors in RGB, e.g. {(‘CA’, ‘CB’): (0.5, 0.5, 0.5)} or {(0, 1): (0.5, 0.5, 0.5)}

Raises:

CCDUtilsError – If bond or atom does not exist.

property external_mappings

List external mappings provided by UniChem. If fetch_external_mappings() has not been called beforehand, only the agreed mapping is retrieved.

Returns:

UniChem mappings

Return type:

list[tuple[str]]

fetch_external_mappings(all_mappings=False)

Retrieve external mappings through UniChem based on the InChIKey.

Parameters:

all_mappings (bool, optional) – Whether to retrieve all UniChem mappings. Defaults to False.

Returns:

Resource id pairings established by UniChem.

Return type:

dict[str, str]

property formula: str

Supply the chemical formula for the PDB-CCD, for example ‘C2 H6 O’. Obtained from PDB-CCD’s _chem_comp.formula:

http://mmcif.wwpdb.org/dictionaries/mmcif_std.dic/Items/_chem_comp.formula.html

If not defined then the empty string ‘’ will be returned.

Returns:

the _chem_comp.formula or ‘’.

Return type:

str

property fragments: List[SubstructureMapping]

Lists matched fragments and atom names.

Returns:

Substructure mapping for all discovered fragments.

Return type:

list[SubstructureMapping]

get_conformer(c_type) → Conformer

Retrieve the RDKit conformer object of the requested type.

Parameters:

c_type (ConformerType) – Conformer type to be retrieved.

Raises:

ValueError – If conformer does not exist

Returns:

RDKit conformer object

Return type:

rdkit.Chem.rdchem.Conformer

get_scaffolds(scaffolding_method=ScaffoldingMethod.MurckoScaffold)

Compute scaffolds for the component.

Parameters:

scaffolding_method (ScaffoldingMethod, optional) – Defaults to MurckoScaffold. Scaffolding method to use

Returns:

Scaffolds found in the component.

Return type:

list[rdkit.Chem.rdchem.Mol]

has_degenerated_conformer(c_type: ConformerType) → bool

Determine whether the given conformer has missing coordinates or is missing completely from the rdkit.Mol object. This can be used to determine whether or not the coordinates should be regenerated.

Parameters:

c_type (ConformerType) – Type of conformer to be inspected.

Returns:

True if more than 1 atom has coordinates [0, 0, 0] or the conformer is not present.

Return type:

bool
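The documented rule for has_degenerated_conformer can be sketched over plain coordinate tuples. This is an illustration, not the library’s implementation: the is_degenerated helper is hypothetical, it takes a list of (x, y, z) tuples (or None for a missing conformer) instead of an RDKit Conformer, and it compares against the origin exactly, whereas the real code may apply a tolerance.

```python
from typing import Optional, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def is_degenerated(coords: Optional[Sequence[Coordinate]]) -> bool:
    """Hypothetical check: missing conformer, or more than one atom placed at [0, 0, 0]."""
    if coords is None:  # the conformer is not present at all
        return True
    at_origin = sum(1 for xyz in coords if tuple(xyz) == (0.0, 0.0, 0.0))
    return at_origin > 1


print(is_degenerated(None))
print(is_degenerated([(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]))  # a single atom at the origin is allowed
print(is_degenerated([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]))
```

Allowing one atom at the origin matches the “more than 1 atom” wording: a real structure can legitimately have a single atom there.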

property id: str

Supply the unique identifier for the PDB-CCD, for example ‘ATP’. Obtained from CCD’s _chem_comp.id:

http://mmcif.wwpdb.org/dictionaries/mmcif_std.dic/Items/_chem_comp.id.html

If not defined then the empty string ‘’ will be returned.

Returns:

the _chem_comp.id or ‘’.

Return type:

str

property inchi: str

Supply the InChI for the PDB-CCD. Obtained from PDB-CCD’s _pdbx_chem_comp_descriptor table line with _pdbx_chem_comp_descriptor.type=InChI, see:

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_pdbx_chem_comp_descriptor.type.html

If not defined then the empty string ‘’ will be returned.

Returns:

the InChI or ‘’.

Return type:

str

property inchi_from_rdkit: str

Provides the InChI worked out by RDKit.

Returns:

the InChI or the empty string ‘’ if there was an error computing it.

Return type:

str

property inchikey: str

Supply the InChIKey for the PDB-CCD. Obtained from PDB-CCD’s _pdbx_chem_comp_descriptor table line with _pdbx_chem_comp_descriptor.type=InChIKey, see:

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_pdbx_chem_comp_descriptor.type.html

If not defined then the empty string ‘’ will be returned.

Returns:

the InChIKey or ‘’.

Return type:

str

property inchikey_from_rdkit: str

Provides the InChIKey worked out by RDKit.

Returns:

the InChIKey or the empty string ‘’ if there was an error computing it.

Return type:

str

inchikey_from_rdkit_matches_ccd(connectivity_only: bool = False) → bool

Checks whether the InChIKey from the CCD matches the one computed by RDKit.

Parameters:

connectivity_only (bool) – Restrict the comparison to the first 14 characters, which encode the connectivity information. Defaults to False.

Returns:

True if the InChIKeys match.

Return type:

bool
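An InChIKey starts with a 14-character block hashed from the main (connectivity) layer, so comparing only that prefix ignores differences such as stereochemistry. A hedged sketch of the comparison on plain strings; the inchikeys_match helper is hypothetical, and treating an empty key as a mismatch is an assumption based on the properties above returning ‘’ when no key is available:

```python
def inchikeys_match(ccd_key: str, rdkit_key: str, connectivity_only: bool = False) -> bool:
    """Compare two InChIKeys, optionally only on the 14-character connectivity block."""
    if not ccd_key or not rdkit_key:  # assumption: a missing key never counts as a match
        return False
    if connectivity_only:
        return ccd_key[:14] == rdkit_key[:14]
    return ccd_key == rdkit_key


l_alanine = "QNAYBMKLOCPYGJ-REOHCLBHSA-N"  # L-alanine, stereo-specific key
alanine = "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"  # alanine without a stereo layer

print(inchikeys_match(l_alanine, alanine))
print(inchikeys_match(l_alanine, alanine, connectivity_only=True))
```

With connectivity_only=True the two alanine keys match because they differ only after the first hyphen.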

Identify fragments from the fragment library in this component

Parameters:

fragment_library (FragmentLibrary) – Fragment library.

Returns:

Matches found in this run

Return type:

list[SubstructureMapping]

locate_fragment(mol: Mol) → List[List[Atom]]

Identify substructure match in the component.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – Fragment to be matched with structure

Returns:

List of fragments identified in the component as a list of atoms.

Return type:

list[list[rdkit.Chem.rdchem.Atom]]

property modified_date: date

Supply the pdbx_modified_date for the PDB-CCD. Obtained from PDB-CCD’s _chem_comp.pdbx_modified_date:

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_chem_comp.pdbx_modified_date.html

Returns:

Date of the entry’s last modification.

Return type:

datetime.date

property mol_no_h: Mol

RDKit mol object without hydrogens

Returns:

RDKit mol object with stripped Hs.

Return type:

rdkit.Chem.rdchem.Mol

property name: str

Supply the ‘full name’ of the PDB-CCD, for example ‘ETHANOL’. Obtained from PDB-CCD’s _chem_comp.name:

http://mmcif.wwpdb.org/dictionaries/mmcif_std.dic/Items/_chem_comp.name.html

If not defined then the empty string ‘’ will be returned.

Returns:

the _chem_comp.name or ‘’.

Return type:

str

property number_atoms: int

Supplies the number of atoms in the _chem_comp_atom table

Returns:

the number of atoms in the PDB-CCD

Return type:

int

property pdbx_release_status: ReleaseStatus

Supply the pdbx_release_status for the PDB-CCD. Obtained from PDB-CCD’s _chem_comp.pdbx_release_status:

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_chem_comp.pdbx_release_status.html

Returns:

enum of the release status (this includes NOT_SET if no value is defined).

Return type:

pdbeccdutils.core.enums.ReleaseStatus

property physchem_properties

RDKit calculated properties related to the CCD compound

Returns:

A dictionary of RDKit-calculated properties.

Return type:

dict[str, float]

property released: bool

Tests whether pdbx_release_status is REL.

Returns:

True if PDB-CCD has been released.

Return type:

bool

property scaffolds: List[SubstructureMapping]

Lists matched scaffolds and atom names

Returns:

List of substructure mappings.

Return type:

list[SubstructureMapping]

pdbeccdutils.core.ccd_writer

Structure writing module. Presently the following formats are supported:

SDF, CIF, PDB, JSON, XYZ, XML, CML.

raises CCDUtilsError:

If the requested format is not supported or an unrecoverable error occurs.

pdbeccdutils.core.ccd_writer.to_cml_str(component: Component, remove_hs=True, conf_type=ConformerType.Ideal)

Converts structure to the EBI representation of the molecule in CML format: http://cml.sourceforge.net/schema/cmlCore.xsd

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

String representation of the component in CML format.

Return type:

str

pdbeccdutils.core.ccd_writer.to_json_dict(component: Component, remove_hs=True, conf_type=ConformerType.Ideal)

Returns component information as a dictionary suitable for JSON serialization.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Raises:
  • AttributeError – If all conformers are requested. This feature is not supported, nor is it planned.

Returns:

dictionary representation of the component

Return type:

dict of str

pdbeccdutils.core.ccd_writer.to_json_str(component: Component, remove_hs=True, conf_type=ConformerType.Ideal)

Converts structure into JSON representation. https://www.json.org/

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

json representation of the component as a string.

Return type:

str

pdbeccdutils.core.ccd_writer.to_pdb_ccd_cif_file(path, component: Component, remove_hs=True)

Converts structure to the PDB CIF format. Both model and ideal coordinates are stored. If ideal coordinates are missing, RDKit attempts to generate 3D coordinates for the conformer.

Parameters:
  • path (str) – Path to save cif file.

  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

pdbeccdutils.core.ccd_writer.to_pdb_str(component: Component, remove_hs: bool = True, alt_names: bool = False, conf_type: ConformerType = ConformerType.Ideal)

Converts structure to the PDB format.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • alt_names (bool, optional) – Defaults to False. Whether or not alternate atom names should be exported.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

String representation of the component in the PDB format.

Return type:

str

pdbeccdutils.core.ccd_writer.to_sdf_str(component: Component, remove_hs: bool = True, conf_type: ConformerType = ConformerType.Ideal)

Converts structure to the SDF format.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Raises:

CCDUtilsError – In case the structure could not be exported.

Returns:

String representation of the component in the SDF format

Return type:

str

pdbeccdutils.core.ccd_writer.to_xml_str(component: Component, remove_hs=True, conf_type=ConformerType.Ideal)

Converts structure to the XML format. Presently just molecule metadata are serialized without any coordinates, which is in accordance with the content of the PDBeChem area.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

String representation of the component in XML format.

Return type:

str

pdbeccdutils.core.ccd_writer.to_xml_xml(component, remove_hs=True, conf_type=ConformerType.Ideal)

Converts structure to the XML format and returns its XML repr.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

XML object

Return type:

xml.etree.ElementTree.Element

pdbeccdutils.core.ccd_writer.to_xyz_str(component, remove_hs=True, conf_type=ConformerType.Ideal)

Converts structure to the XYZ format. Does not yet support ConformerType.AllConformers.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

String representation of the component in the XYZ format

Return type:

str
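The XYZ layout is a simple text format: an atom count, a comment line, then one element symbol and its Cartesian coordinates per line. A minimal sketch of that layout using plain tuples; atoms_to_xyz is hypothetical, the water coordinates are illustrative, and the exact column widths and comment line produced by to_xyz_str may differ:

```python
from typing import Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)


def atoms_to_xyz(atoms: Sequence[Atom], comment: str = "") -> str:
    """Write atoms in XYZ layout: count line, comment line, one atom per line."""
    lines = [str(len(atoms)), comment]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:10.4f} {y:10.4f} {z:10.4f}")
    return "\n".join(lines) + "\n"


# Illustrative water geometry in Angstroms.
water = [
    ("O", 0.0, 0.0, 0.1173),
    ("H", 0.0, 0.7572, -0.4692),
    ("H", 0.0, -0.7572, -0.4692),
]
print(atoms_to_xyz(water, comment="HOH illustrative geometry"))
```

Readers of XYZ files split on whitespace, so the fixed widths are only for legibility.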

pdbeccdutils.core.ccd_writer.write_molecule(path, component: Component, remove_hs: bool = True, alt_names: bool = False, conf_type: ConformerType = ConformerType.Ideal)

Export molecule in a specified format. Presently supported formats are: PDB CCD CIF (.cif); Mol file (.sdf); Chemical Markup language (.cml); PDB file (.pdb); XYZ file (.xyz); XML (.xml). ConformerType.AllConformers is presently supported only for PDB.

Parameters:
  • path (str|Path) – Path to the file. Suffix determines format to be used.

  • component (Component) – Component to be exported

  • remove_hs (bool, optional) – Defaults to True. Whether or not hydrogens should be removed.

  • alt_names (bool, optional) – Defaults to False. Whether or not alternate names should be exported.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal. Conformer type to be exported.

Raises:

CCDUtilsError – For unsupported format
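Because write_molecule picks the output format from the path suffix, the dispatch can be pictured as a lookup table built from the formats listed above. A sketch under stated assumptions: resolve_format and UnsupportedFormatError are hypothetical (the latter stands in for CCDUtilsError), and matching suffixes case-insensitively is a choice of this sketch, not documented behaviour.

```python
from pathlib import Path
from typing import Union

# Suffix-to-format table taken from the list of supported formats above.
SUPPORTED_FORMATS = {
    ".cif": "PDB CCD CIF",
    ".sdf": "Mol file",
    ".cml": "Chemical Markup Language",
    ".pdb": "PDB file",
    ".xyz": "XYZ file",
    ".xml": "XML",
}


class UnsupportedFormatError(ValueError):
    """Hypothetical stand-in for CCDUtilsError in this sketch."""


def resolve_format(path: Union[str, Path]) -> str:
    """Map a file path to a format name by its suffix (case-insensitive here)."""
    suffix = Path(path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise UnsupportedFormatError(f"Unsupported format: {suffix or '<none>'}") from None


print(resolve_format("ATP.sdf"))
```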

pdbeccdutils.core.prd_reader

pdbeccdutils.core.prd_reader.read_pdb_cif_file(path_to_cif: str, sanitize: bool = True) → CCDReaderResult

Read in single wwPDB CCD CIF component and create its internal representation.

Parameters:
  • path_to_cif (str) – Path to the cif file

  • sanitize (bool) – Whether or not the molecule should be sanitized. Defaults to True.

Raises:

ValueError – if file does not exist

Returns:

Results of the parsing together with the internal representation of the component.

Return type:

CCDReaderResult

pdbeccdutils.core.prd_reader.read_pdb_components_file(path_to_cif: str, sanitize: bool = True) → Dict[str, CCDReaderResult]

Process multiple compounds stored in the prdcc-all.cif file.

Parameters:
  • path_to_cif (str) – Path to the prdcc-all.cif file with multiple ligands in it.

  • sanitize (bool) – Whether or not the components should be sanitized. Defaults to True.

Raises:

ValueError – if the file does not exist.

Returns:

Internal representation of all the components in the prdcc-all.cif file.

Return type:

dict[str, CCDReaderResult]

pdbeccdutils.core.prd_writer

Structure writing module. Presently the following formats are supported:

SDF, CIF, PDB, JSON, XYZ, XML, CML.

raises CCDUtilsError:

If the requested format is not supported or an unrecoverable error occurs.

pdbeccdutils.core.prd_writer.to_pdb_ccd_cif_file(path, component: Component, remove_hs=True)

Converts structure to the PDB CIF format. Both model and ideal coordinates are stored. If ideal coordinates are missing, RDKit attempts to generate 3D coordinates for the conformer.

Parameters:
  • path (str) – Path to save cif file.

  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

pdbeccdutils.core.prd_writer.to_pdb_str(component: Component, remove_hs: bool = True, alt_names: bool = False, conf_type: ConformerType = ConformerType.Ideal)

Converts structure to the PDB format.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • alt_names (bool, optional) – Defaults to False. Whether or not alternate atom names should be exported.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal.

Returns:

String representation of the component in the PDB format.

Return type:

str

pdbeccdutils.core.prd_writer.write_molecule(path, component: Component, remove_hs: bool = True, alt_names: bool = False, conf_type: ConformerType = ConformerType.Ideal)

Export molecule in a specified format. Presently supported formats are: PDB CCD CIF (.cif); Mol file (.sdf); Chemical Markup language (.cml); PDB file (.pdb); XYZ file (.xyz); XML (.xml). ConformerType.AllConformers is presently supported only for PDB.

Parameters:
  • path (str|Path) – Path to the file. Suffix determines format to be used.

  • component (Component) – Component to be exported

  • remove_hs (bool, optional) – Defaults to True. Whether or not hydrogens should be removed.

  • alt_names (bool, optional) – Defaults to False. Whether or not alternate names should be exported.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Ideal. Conformer type to be exported.

Raises:

CCDUtilsError – For unsupported format

pdbeccdutils.core.clc_reader

A set of methods for identifying bound-molecules (covalently bonded CCDs) from mmCIF files of proteins and creating Component representations of the molecules.

class pdbeccdutils.core.clc_reader.CLCReaderResult(warnings, errors, component, sanitized, bound_molecule)
bound_molecule

Alias for field number 4

component

Alias for field number 2

errors

Alias for field number 1

sanitized

Alias for field number 3

warnings

Alias for field number 0

pdbeccdutils.core.clc_reader.get_chem_comp_bonds(cif_block: Block, residue: str)

Returns the _chem_comp_bond entries associated with a residue.

Parameters:
  • cif_block – gemmi.cif.Block object of protein mmCIF file

  • residue – CCD ID

pdbeccdutils.core.clc_reader.infer_multiple_chem_comp(path_to_cif, bm, bm_id, sanitize=True)
Parameters:
  • path_to_cif – Path to input structure

  • bm – bound-molecules identified from input structure

  • bm_id – ID of bound-molecule

  • sanitize – True if the bound-molecule needs to be sanitized

Returns:

NamedTuple containing the Component representation of the bound-molecule

Return type:

CLCReaderResult

pdbeccdutils.core.clc_reader.read_clc_cif_file(path_to_cif: str, sanitize: bool = True) → CCDReaderResult

Read in single CLC CIF component and create its internal representation.

Parameters:
  • path_to_cif (str) – Path to the cif file

  • sanitize (bool) – Whether or not the molecule should be sanitized. Defaults to True.

Raises:

ValueError – if file does not exist

Returns:

Results of the parsing together with the internal representation of the component.

Return type:

CCDReaderResult

pdbeccdutils.core.clc_reader.read_clc_components_file(path_to_cif: str, sanitize: bool = True) → dict[str, CCDReaderResult]

Process multiple compounds stored in the clc-all.cif file.

Parameters:
  • path_to_cif (str) – Path to the clc-all.cif file with multiple ligands in it.

  • sanitize (bool) – Whether or not the components should be sanitized. Defaults to True.

Raises:

ValueError – if the file does not exist.

Returns:

Internal representation of all the components in the clc-all.cif file.

Return type:

dict[str, CCDReaderResult]

pdbeccdutils.core.clc_reader.read_pdb_cif_file(path_to_cif: str, to_discard: set[str] = {'HOH', 'UNX'}, sanitize: bool = True, assembly: bool = False) → list[CLCReaderResult]

Read in single wwPDB Model CIF and create internal representation of its bound-molecules with multiple components.

Parameters:
  • path_to_cif (str) – Path to the cif file

  • sanitize (bool) – Whether or not the molecules should be sanitized. Defaults to True.

Raises:

ValueError – if file does not exist

Returns:

A list of CLCReaderResult representations of each bound-molecule.

pdbeccdutils.core.clc_writer

pdbeccdutils.core.clc_writer.to_cml_str(component: Component, remove_hs=True, conf_type=ConformerType.Model)

Converts structure to the EBI representation of the molecule in CML format: http://cml.sourceforge.net/schema/cmlCore.xsd

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Model.

Returns:

String representation of the component in CML format.

Return type:

str

pdbeccdutils.core.clc_writer.to_pdb_clc_cif_file(path, component: Component, remove_hs=True)

Converts structure to the PDB mmCIF format.

Parameters:
  • path (str) – Path to save cif file.

  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

pdbeccdutils.core.clc_writer.to_pdb_str(component: Component, remove_hs: bool = True, conf_type: ConformerType = ConformerType.Model)

Converts structure to the PDB format.

Parameters:
  • component (Component) – Component to be exported.

  • remove_hs (bool, optional) – Defaults to True.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Model.

Returns:

String representation of the component in the PDB format.

Return type:

str

pdbeccdutils.core.clc_writer.to_xml_xml(component)

Converts structure to the XML format and returns its XML repr.

Parameters:

component (Component) – Component to be exported.

Returns:

XML object

Return type:

xml.etree.ElementTree.Element

pdbeccdutils.core.clc_writer.write_molecule(path, component: Component, remove_hs: bool = True, conf_type: ConformerType = ConformerType.Model)

Export molecule in a specified format. Presently supported formats are: PDB mmCIF (.cif); Mol file (.sdf); Chemical Markup language (.cml); PDB file (.pdb); XYZ file (.xyz); XML (.xml). ConformerType.AllConformers is presently supported only for PDB.

Parameters:
  • path (str|Path) – Path to the file. Suffix determines format to be used.

  • component (Component) – Component to be exported

  • remove_hs (bool, optional) – Defaults to True. Whether or not hydrogens should be removed.

  • conf_type (ConformerType, optional) – Defaults to ConformerType.Model. Conformer type to be exported.

Raises:

CCDUtilsError – For unsupported format

pdbeccdutils.core.boundmolecule

A set of methods for identifying bound-molecules (covalently bonded CCDs) from mmCIF files of proteins and creating MultiDiGraph representations of the molecules.

pdbeccdutils.core.boundmolecule.find_pntr_entry(struct_conn: dict[str, list[str]], residue_pool: list[Residue], partner: int, i: int)

Helper method to find ligand residue in parsed ligands and check its connections.

Parameters:
  • struct_conn (dict[str, list[str]]) – struct_conn table.

  • residue_pool (List of Residue) – List of all the parsed residues.

  • partner (int) – Identification of the partner (should be 1 or 2)

  • i (int) – Index in the struct_conn table

pdbeccdutils.core.boundmolecule.infer_bound_molecules(structure, to_discard, assembly=False)

Identify bound molecules in the input protein structure.

Parameters:
  • structure (str) – Path to the structure.

  • to_discard (list of str) – List of residue names to be discarded

pdbeccdutils.core.boundmolecule.parse_bound_molecules(path: str, to_discard: list[str], assembly=False) → MultiDiGraph

Parse information about HETATMs from _pdbx_nonpoly_scheme and the connectivity among them from _struct_conn.

Parameters:
  • path (str) – Path to the mmCIF structure

  • to_discard (list of str) – List of residue names to be discarded.

Returns:

All the bound molecules in a given entry.

Return type:

MultiDiGraph
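Conceptually, a bound molecule is a set of residues joined by covalent links from _struct_conn, i.e. a connected component of the residue graph. The library returns a MultiDiGraph; the sketch below instead uses plain dictionaries and a breadth-first search to show only the grouping idea. The group_bound_molecules helper and the chain/name/number residue labels are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def group_bound_molecules(
    residues: Iterable[str], links: Iterable[Tuple[str, str]]
) -> List[Set[str]]:
    """Group residues into connected components; each component is one bound molecule."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)  # links are treated as undirected for grouping

    seen: Set[str] = set()
    molecules: List[Set[str]] = []
    for start in residues:
        if start in seen:
            continue
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first walk over covalently linked residues
            current = queue.popleft()
            component.add(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        molecules.append(component)
    return molecules


residues = ["A/NAG/401", "A/NAG/402", "A/MAN/403", "B/HEM/501"]
links = [("A/NAG/401", "A/NAG/402"), ("A/NAG/402", "A/MAN/403")]
for number, members in enumerate(group_bound_molecules(residues, links), start=1):
    print(f"bm{number}: {sorted(members)}")
```

The sketch assumes links only connect residues that are themselves candidates; a link to a residue outside the list would pull that residue into the group.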

pdbeccdutils.core.boundmolecule.parse_ligands_from_branch_scheme(branch_scheme: dict[str, list[str]], to_discard: list[str], g: MultiDiGraph, assembly=False)

Parse ligands from _pdbx_branch_scheme category of mmCIF file

Parameters:
  • branch_scheme – Dictionary of _pdbx_branch_scheme category

  • to_discard – List of residue names to be not considered as bound-molecule

  • g – A Graph object with nodes as Residues and their connectivity as edges

Returns:

A MultiDiGraph object with nodes as Residues and their connectivity as edges.

pdbeccdutils.core.boundmolecule.parse_ligands_from_nonpoly_scheme(nonpoly_scheme, to_discard, assembly=False)

Parse ligands from the mmcif file.

Parameters:
  • nonpoly_scheme (dict of str: list of str) – mmCIF _pdbx_nonpoly_scheme category.

  • to_discard (list of str) – List of residue names to be discarded.

Returns:

Ligands and their connectivity in a PDB entry

Return type:

MultiDiGraph

pdbeccdutils.core.depictions

Module to aid generation of 2D depictions and evaluation of their quality

class pdbeccdutils.core.depictions.DepictionManager(pubchem_templates_path: str = '', general_templates_path: str = '/home/runner/work/ccdutils/ccdutils/pdbeccdutils/data/general_templates')

Toolkit for depicting a ligand’s structure using RDKit. One can supply either templates or 2D depictions from PubChem. PubChem templates can be downloaded using the PubChemDownloader class.

depict_molecule(het_id: str, mol: Mol) → DepictionResult

Given an input molecule, tries to generate its 2D depiction.

Presently 3 methods are used:

  • PubChem template – find a 2D depiction in the PubChem database.

  • User-provided templates – try to use general templates.

  • From 3D conformer – apply the default RDKit functionality.

Parameters:
Returns:

Summary of the ligand depiction process.

Return type:

DepictionResult

class pdbeccdutils.core.depictions.DepictionValidator(mol)

Toolkit for estimation of depiction quality

count_bond_collisions()

Counts number of collisions among all bonds. Can be used for estimations of how ‘wrong’ the depiction is.

Returns:

number of bond collisions per molecule

Return type:

int

count_suboptimal_atom_positions(lower_bound, upper_bound)

Detects whether the structure has pairs of atoms at a distance in the range <lower_bound, upper_bound>, meaning that the depiction could be improved.

Parameters:
  • lower_bound (float) – lower bound

  • upper_bound (float) – upper bound

Returns:

number of atoms with crowded neighbourhood

Return type:

float
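One way to read count_suboptimal_atom_positions is to flag every atom that has a partner at a distance inside [lower_bound, upper_bound]. A brute-force sketch over 2D coordinates under that assumption; crowded_atoms is hypothetical, the actual DepictionValidator may count differently (for example per pair rather than per atom), and math.dist requires Python 3.8 or later.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]


def crowded_atoms(points: Sequence[Point], lower_bound: float, upper_bound: float) -> Set[int]:
    """Indices of atoms with at least one partner at a distance in [lower_bound, upper_bound]."""
    crowded: Set[int] = set()
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if lower_bound <= math.dist(p, q) <= upper_bound:
            crowded.update((i, j))
    return crowded


layout = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
print(len(crowded_atoms(layout, 0.3, 0.6)))
```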

depiction_score()

Calculate the quality of the ligand depiction as a penalty score: the higher the score, the worse the depiction. Ideally, it should be 0.

Returns:

Penalty score.

Return type:

float

has_bond_crossing()

Tells whether the structure contains bond collisions.

Returns:

Indication about bond collisions

Return type:

bool
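Bond crossings in a 2D layout can be detected with the standard orientation (cross-product) test for segment intersection. A sketch of that technique, assuming bonds are given as pairs of atom indices into a list of 2D points; count_bond_crossings is a hypothetical name, and this is not necessarily how DepictionValidator implements has_bond_crossing or count_bond_collisions. Bonds that share an atom are skipped because they meet at that atom by construction.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]
Bond = Tuple[int, int]  # pair of atom indices


def _orientation(a: Point, b: Point, c: Point) -> float:
    """Signed cross product of (b - a) and (c - a); the sign tells which side c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if the segments properly intersect (collinear or touching cases are ignored)."""
    d1 = _orientation(q1, q2, p1)
    d2 = _orientation(q1, q2, p2)
    d3 = _orientation(p1, p2, q1)
    d4 = _orientation(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_bond_crossings(points: Sequence[Point], bonds: Sequence[Bond]) -> int:
    count = 0
    for (a, b), (c, d) in combinations(bonds, 2):
        if {a, b} & {c, d}:  # bonds sharing an atom meet there; not a crossing
            continue
        if _segments_cross(points[a], points[b], points[c], points[d]):
            count += 1
    return count


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(count_bond_crossings(square, [(0, 2), (1, 3)]))  # the two diagonals cross once
print(count_bond_crossings(square, [(0, 1), (2, 3)]))  # opposite edges do not cross
```

In this sketch, a has_bond_crossing-style check is simply count_bond_crossings(points, bonds) > 0.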

has_degenerated_atom_positions(threshold)

Detects whether the structure has a pair of atoms closer to each other than threshold. This can detect structures which may need a template because RDKit cannot handle them correctly.

Parameters:

threshold (float) – Bottom line to use for spatial search.

Returns:

True if such a pair of atoms is found.

Return type:

(bool)
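A threshold check like has_degenerated_atom_positions can avoid comparing every pair by hashing atoms into a uniform grid whose cell size equals the threshold: any pair closer than the threshold must fall in the same or an adjacent cell. A sketch of that spatial-hashing approach; has_close_pair is hypothetical, and whether pdbeccdutils uses a grid, a k-d tree, or a pairwise scan is not stated here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def has_close_pair(points: Sequence[Point], threshold: float) -> bool:
    """True if any two atoms are closer than threshold, using a grid of cell size threshold."""
    if threshold <= 0:
        return False
    grid: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        cell = (math.floor(x / threshold), math.floor(y / threshold))
        # A partner closer than threshold must lie in this cell or one of its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cell[0] + dx, cell[1] + dy), ()):
                    if math.dist((x, y), other) < threshold:
                        return True
        grid[cell].append((x, y))
    return False


print(has_close_pair([(0.0, 0.0), (0.05, 0.0), (2.0, 0.0)], 0.1))
print(has_close_pair([(0.0, 0.0), (1.5, 0.0)], 0.1))
```

Each atom is only compared with atoms already placed in nearby cells, so the work grows roughly linearly with the number of atoms for well-spread layouts.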

pdbeccdutils.core.fragment_library

class pdbeccdutils.core.fragment_library.FragmentLibrary(path: str = '/home/runner/work/ccdutils/ccdutils/pdbeccdutils/data/fragment_library.tsv', header: bool = True, delimiter: str = '\t', quotechar: str = '"')

Implementation of fragment library.

generate_conformers()

Generate 3D coordinates for the fragment library.

to_image(path, source='')

Export image with all fragments.

Parameters:
  • path (str) – Destination of the image

  • source (str) – Source whose fragments are going to be drawn.
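The FragmentLibrary constructor takes header, delimiter and quotechar arguments, which correspond to the options of Python’s csv module. A minimal reading sketch; the three-column layout (name, SMILES, source) and the FragmentRow and read_fragment_rows names are assumptions for illustration, and the real loader builds FragmentEntry objects holding RDKit molecules rather than SMILES strings.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class FragmentRow(NamedTuple):
    """Hypothetical row layout: name, SMILES string, source."""

    name: str
    smiles: str
    source: str


def read_fragment_rows(
    handle: TextIO, header: bool = True, delimiter: str = "\t", quotechar: str = '"'
) -> List[FragmentRow]:
    """Parse delimited fragment rows; defaults mirror the FragmentLibrary signature."""
    reader = csv.reader(handle, delimiter=delimiter, quotechar=quotechar)
    if header:
        next(reader, None)  # skip the header line
    # Blank lines yield empty rows and are skipped; columns past the third are ignored.
    return [FragmentRow(*row[:3]) for row in reader if row]


sample = 'name\tsmiles\tsource\nbenzene\tc1ccccc1\t"example source"\n'
rows = read_fragment_rows(io.StringIO(sample))
print(rows[0].name, rows[0].source)
```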

pdbeccdutils.core.models

Module housing some of the dataclasses used throughout the pdbeccdutils application.

class pdbeccdutils.core.models.AssemblyResidue(name: str, chain: str, res_id: str, ins_code: str, ent_id: str, orig_chain: str, operator: str)
to_dict() → dict[str, str]

Returns a dictionary representation of a given Residue.

Returns:

Dictionary representation with the mmCIF keys.

class pdbeccdutils.core.models.CCDProperties(id: str, name: str, formula: str, modified_date: date, pdbx_release_status: ReleaseStatus, weight: float)

Properties of the component coming from _chem_comp namespace.

Parameters:
  • id (str) – _chem_comp.id

  • name (str) – _chem_comp.name

  • formula (str) – _chem_comp.formula

  • modified_date (date) – _chem_comp.pdbx_modified_date

  • pdbx_release_status (ReleaseStatus) – _chem_comp.pdbx_release_status

  • weight (float) – _chem_comp.formula_weight

class pdbeccdutils.core.models.ConformerType(value)

Conformer type of the Component object.

Ideal
Model
Depiction

2D conformation

Computed
AllConformers
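ConformerType appears to be an Enum (its signature takes a single value), so a member can be looked up by name, e.g. when the conformer is chosen from a command-line option. A sketch with a hypothetical mirror class that reuses the member names listed above; the values produced by auto() are placeholders, not the library’s actual values, and parse_conformer_type is likewise hypothetical.

```python
from enum import Enum, auto


class ConformerTypeSketch(Enum):
    """Hypothetical mirror of ConformerType; auto() values are placeholders."""

    Ideal = auto()
    Model = auto()
    Depiction = auto()  # 2D conformation
    Computed = auto()
    AllConformers = auto()


def parse_conformer_type(name: str) -> ConformerTypeSketch:
    """Return the member called name, or raise ValueError listing the valid names."""
    try:
        return ConformerTypeSketch[name]
    except KeyError:
        valid = ", ".join(member.name for member in ConformerTypeSketch)
        raise ValueError(f"Unknown conformer type {name!r}; expected one of: {valid}") from None


print(parse_conformer_type("Model"))
```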
class pdbeccdutils.core.models.DepictionResult(source: DepictionSource, template_name: str, mol: Mol, score: float)

Depictions result details.

Parameters:
mol: Mol

Alias for field number 2

score: float

Alias for field number 3

source: DepictionSource

Alias for field number 0

template_name: str

Alias for field number 1

class pdbeccdutils.core.models.DepictionSource(value)

Origin of the 2D depiction.

Pubchem - PubChem layout used.
Template - General substructure template used.
RDKit - RDKit functionality using Coordgen.
Failed - No depiction method succeeded.

class pdbeccdutils.core.models.Descriptor(type: str, program: str, program_version: str, value: str)

Descriptor obtained from the CIF file; it corresponds to one row of the _pdbx_chem_comp_descriptor category.

Parameters:
  • type (str) – _pdbx_chem_comp_descriptor.type

  • program (str) – _pdbx_chem_comp_descriptor.program

  • program_version (str) – _pdbx_chem_comp_descriptor.program_version

  • value (str) – _pdbx_chem_comp_descriptor.descriptor

program: str

Alias for field number 1

program_version: str

Alias for field number 2

type: str

Alias for field number 0

value: str

Alias for field number 3
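Since each Descriptor is one row of _pdbx_chem_comp_descriptor, a component usually carries several of them (for example a SMILES string and an InChIKey). Selecting one by type can be sketched with a stand-in NamedTuple that has the same fields; first_of_type is a hypothetical helper and the program names are placeholders.

```python
from typing import List, NamedTuple, Optional


class DescriptorSketch(NamedTuple):
    """Stand-in with the same fields as pdbeccdutils.core.models.Descriptor."""
    type: str
    program: str
    program_version: str
    value: str


def first_of_type(descriptors: List[DescriptorSketch], wanted: str) -> Optional[str]:
    """Return the value of the first descriptor whose type matches (case-insensitive)."""
    for d in descriptors:
        if d.type.upper() == wanted.upper():
            return d.value
    return None


# Ethanol descriptors; program name and version are placeholders.
rows = [
    DescriptorSketch("SMILES", "example-program", "1.0", "CCO"),
    DescriptorSketch("InChIKey", "example-program", "1.0", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]
print(first_of_type(rows, "smiles"))  # CCO
```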

class pdbeccdutils.core.models.FragmentEntry(name: str, source: str, mol: Mol)

Fragment entry in the fragment library.

Parameters:
  • name (str) – Name or ID of the fragment.

  • source (str) – Origin of the fragment.

  • mol (rdkit.Chem.rdchem.Mol) – RDKit Mol object with the fragment.

class pdbeccdutils.core.models.InChIFromRDKit(inchi: str, warnings: str, errors: str)

InChI calculated by RDKit from an rdkit.Chem.rdchem.Mol object.

Parameters:
  • inchi – InChI calculated by RDKit

  • warnings – WARNINGS generated by rdkit.Chem.inchi.MolToInchi API

  • errors – ERRORS generated by rdkit.Chem.inchi.MolToInchi API

errors: str

Alias for field number 2

inchi: str

Alias for field number 0

warnings: str

Alias for field number 1

class pdbeccdutils.core.models.LogType(value)

Type of logged output

ERROR

RDKit generated error

WARNING

RDKit generated warning

DEPICTION_FAILED

If 2D depiction failed

DEPICTION_SCORE

2D depiction score from DepictionValidator

class pdbeccdutils.core.models.MolFromRDKit(mol: str, warnings: str, errors: str)

rdkit.Chem.rdchem.Mol object generated from RDKit

Parameters:
  • mol – Mol object generated by RDKit

  • warnings – WARNINGS generated by RDKit’s API

  • errors – ERRORS generated by RDKit’s API

errors: str

Alias for field number 2

mol: str

Alias for field number 0

warnings: str

Alias for field number 1

class pdbeccdutils.core.models.ParityResult(mapping: Dict[str, str], similarity_score: float)

NamedTuple for the result of the parity method, holding the details needed to calculate the similarity score.

mapping (dict[str, str]) – Atom-level mapping template -> query.

similarity_score

Similarity score.

Type:

float
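Because mapping goes from template atoms to query atoms, finding the template atom that matches a given query atom means inverting it. A small sketch with a stand-in NamedTuple; the atom names and the score are hypothetical, and the inversion assumes the mapping is one-to-one.

```python
from typing import Dict, NamedTuple


class ParityResultSketch(NamedTuple):
    """Stand-in with the same fields as ParityResult."""
    mapping: Dict[str, str]  # template atom -> query atom
    similarity_score: float


def query_to_template(mapping: Dict[str, str]) -> Dict[str, str]:
    """Invert an atom-level template->query mapping (assumes it is one-to-one)."""
    return {query: template for template, query in mapping.items()}


# Hypothetical atom names and score, for illustration only.
result = ParityResultSketch({"C1": "C1A", "O1": "O1A"}, 0.5)
print(query_to_template(result.mapping))  # {'C1A': 'C1', 'O1A': 'O1'}
print(len(result.mapping), "atoms matched")  # 2 atoms matched
```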

class pdbeccdutils.core.models.ReleaseStatus(value)

An enumeration of _chem_comp.pdbx_release_status values; allowed values include REL and HOLD, see: http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_chem_comp.pdbx_release_status.html

Notes

An additional value ‘NOT_SET’ has been added for cases where pdbx_release_status has not been set.

static from_str(s)

Convert the wwPDB CIF CCD representation to the enum.

Parameters:

s (str) – str representation of the release status

Returns:

Component release status

Return type:

ReleaseStatus
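A reduced sketch of what from_str does, assuming it looks the member up by name and falls back to NOT_SET for placeholders such as "?". The fallback rule is an assumption, and the real enum may define more members than the REL, HOLD and NOT_SET used here.

```python
from enum import Enum


class ReleaseStatusSketch(Enum):
    """Reduced stand-in for ReleaseStatus (the real enum may have more members)."""
    NOT_SET = 0
    REL = 1
    HOLD = 2

    @staticmethod
    def from_str(s: str) -> "ReleaseStatusSketch":
        # ASSUMPTION: unknown or missing values ("?", "", unrecognised) map to NOT_SET
        try:
            return ReleaseStatusSketch[s.strip().upper()]
        except KeyError:
            return ReleaseStatusSketch.NOT_SET


print(ReleaseStatusSketch.from_str("REL"))  # ReleaseStatusSketch.REL
print(ReleaseStatusSketch.from_str("?"))  # ReleaseStatusSketch.NOT_SET
```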

class pdbeccdutils.core.models.Residue(name: str, chain: str, res_id: str, ins_code: str, ent_id: str)

Represents a single residue.

Attributes:
  • name – Corresponds to _atom_site.label_comp_id

  • chain – Corresponds to _atom_site.auth_asym_id

  • res_id – Corresponds to _atom_site.auth_seq_id

  • ins_code – Corresponds to _atom_site.pdbx_PDB_ins_code

  • ent_id – Entity ID

  • id – ID of the residue

to_arpeggio()

Gets Arpeggio style representation of a residue e.g. /A/129/ or /A/129A/ in case there is an insertion code.

Returns:

Residue description in Arpeggio style.

Return type:

str
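Based on the two examples above, the Arpeggio string is the chain and residue number, with any insertion code appended, wrapped in slashes. A stand-in sketch (not the library's code); treating "?", "." or a blank as "no insertion code" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ResidueSketch:
    """Stand-in with the same constructor fields as Residue."""
    name: str
    chain: str
    res_id: str
    ins_code: str
    ent_id: str

    def to_arpeggio(self) -> str:
        # ASSUMPTION: mmCIF placeholders "?" / "." (or blanks) mean "no insertion code"
        code = self.ins_code.strip()
        ins = "" if code in ("", "?", ".") else code
        return f"/{self.chain}/{self.res_id}{ins}/"


print(ResidueSketch("ATP", "A", "129", "?", "2").to_arpeggio())  # /A/129/
print(ResidueSketch("ATP", "A", "129", "A", "2").to_arpeggio())  # /A/129A/
```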

to_dict() dict[str, str]

Returns a dictionary representation of a given residue.

Returns:

Dictionary representation along with the mmCIF keys.

Return type:

dict[str, str]

class pdbeccdutils.core.models.SanitisationResult(mol: Mol, status: str)

Sanitisation result details.

Parameters:
  • mol – rdkit.Chem.rdchem.RWMol

  • status – Status of sanitisation process.

mol: Mol

Alias for field number 0

status: str

Alias for field number 1

class pdbeccdutils.core.models.ScaffoldingMethod(value)

RDKit scaffold methods.

class pdbeccdutils.core.models.Subcomponent(name, id)

Represents a subcomponent in a component.

Parameters:
  • name – Name of the subcomponent

  • id – Id of the subcomponent

class pdbeccdutils.core.models.SubstructureMapping(name: str, mol: Mol, source: str, mappings: List[List[Any]])

Represents a fragment hit in the component.

Parameters:
  • name (str) – Name of the substructure.

  • mol (Chem.rdchem.Mol) – RDKit Mol object

  • source (str) – Origin of the fragment.

  • mappings (List[List[Any]]) – Mappings with atom names or indices.

pdbeccdutils.core.exceptions

exception pdbeccdutils.core.exceptions.CCDUtilsError

Internal error of the pdbeccdutils package.

exception pdbeccdutils.core.exceptions.EntryFailedException