{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.7.8 64-bit ('rdkit-env': conda)", "display_name": "Python 3.7.8 64-bit ('rdkit-env': conda)", "metadata": { "interpreter": { "hash": "ff61d13abe230febdcf9f05a768048a47be2ac8377dcb96e6daf2ab6fcfbf665" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# pdbeccdutils\n", "\n", "This exercise demonstrates the use of [pdbeccdutils](https://github.com/PDBeurope/ccdutils), a set of Python tools for working with small molecule components in the Protein Data Bank archive.\n", "\n", "You can use two main resources as input data for the package. [wwPDB CCD](http://www.wwpdb.org/data/ccd) contains models of all the ligands (e.g. drug molecules, cofactors) commonly found in the PDB archive, whereas [wwPDB BIRD](http://www.wwpdb.org/data/bird) is composed of biologically interesting polymeric molecules such as peptides or oligosaccharides.\n", "\n", "CCD components can be downloaded as a bundle either from the [wwPDB FTP area](ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz), or from the [PDBe FTP area](http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem_v2/components.cif). The components from the PDBe area are enriched with information about common fragments found in the molecules, their molecular scaffolds, and links to popular small molecule chemistry databases. 
You can also use PDBeChem to download individual components from the following link using the appropriate CCD ID substitution: .\n", "\n", "'BIRD' molecules can be downloaded in the *.tar.gz format from the following location: \n" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "pdbeccdutils relies on [RDKit](https://www.rdkit.org/) for most of its functionality, so it is best used within a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).\n", "\n", "Create and activate a conda environment with RDKit using the following commands:\n", "\n", "```bash\n", "conda create -c conda-forge -n rdkit-env rdkit python=3.7\n", "conda activate rdkit-env\n", "```" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "You can then install pdbeccdutils either from [PyPI](https://pypi.org/project/pdbeccdutils/) or directly from the [repository](https://github.com/PDBeurope/ccdutils) using one of the following commands:\n", "\n", "```bash\n", "pip install pdbeccdutils\n", "pip install git+https://github.com/PDBeurope/ccdutils.git@master#egg=pdbeccdutils\n", "```" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "The pdbeccdutils API contains a number of modules organized into namespaces according to their functionality. All of the modules, along with useful tips, are described in the [documentation](https://pdbeurope.github.io/ccdutils/). In this exercise we will go through some of the use cases." 
], "cell_type": "markdown", "metadata": {} }, { "source": [ "The structure of the package looks roughly like this:\n", "\n", "```text\n", "pdbeccdutils\n", "    computations\n", "        parity_method.py\n", "    core\n", "        ccd_reader.py\n", "        ccd_writer.py\n", "        component.py\n", "        depictions.py\n", "        fragment_library.py\n", "    scripts\n", "        process_components_cif_cli.py\n", "        setup_pubchem_library.py\n", "    utils\n", "        pubchem_downloader.py\n", "        web_services.py\n", "```" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Before we start, we need to download a CCD component for demonstration purposes. We are going to use heme (CCD ID: [HEM](https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/HEM))." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "def download_component(ccd_id):\n", "    # download a single component from the PDBe FTP area and save it locally\n", "    response = requests.get(f'http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem_v2/{ccd_id[0]}/{ccd_id}/{ccd_id}.cif')\n", "    response.raise_for_status()  # fail early if the download did not succeed\n", "    cif_path = f'{ccd_id}.cif'\n", "\n", "    with open(cif_path, 'wb') as fp:\n", "        fp.write(response.content)\n", "\n", "    return cif_path\n", "\n", "hem_path = download_component('HEM')" ] }, { "source": [ "## Structure reading\n", "\n", "Structure reading is done with the `ccd_reader` module located in the `pdbeccdutils.core` package. By default, molecules come [sanitized](https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization) using the RDKit sanitization procedure and our internal process. 
However, this option can be turned off by passing the optional parameter `sanitize=False` to the function.\n" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "CCDReaderResult(warnings=[], errors=[], component=)" ] }, "metadata": {}, "execution_count": 2 } ], "source": [ "from pdbeccdutils.core import ccd_reader\n", "\n", "ccd_reader_result = ccd_reader.read_pdb_cif_file(hem_path)\n", "ccd_reader_result" ] }, { "source": [ "`CCDReaderResult` contains lists of the warnings and errors that were encountered during structure parsing. There is also a convenience function that reads multiple chemical components at once, provided they are stored as separate data blocks in a single mmCIF file." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'HEM': CCDReaderResult(warnings=[], errors=[], component=)}" ] }, "metadata": {}, "execution_count": 3 } ], "source": [ "components_result = ccd_reader.read_pdb_components_file('HEM.cif', sanitize=False)\n", "components_result" ] }, { "source": [ "## Component object\n", "\n", "The component object contains a number of useful properties retrieved directly from the CCD file, as well as shorthand functions for RDKit functionality. In the following exercise we will go through some of them." 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 4 } ], "source": [ "component = ccd_reader_result.component\n", "component" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'C34 H32 Fe N4 O4'" ] }, "metadata": {}, "execution_count": 5 } ], "source": [ "component.formula\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'KABFMIBPWCXCRK-RGGAHWMASA-L'" ] }, "metadata": {}, "execution_count": 6 } ], "source": [ "component.inchikey" ] }, { "source": [ "### Scaffolds\n", "\n", "One of the shorthand functions allows you to calculate molecular scaffolds using the [get_scaffolds()](https://pdbeurope.github.io/ccdutils/pdbeccdutils.core.html#pdbeccdutils.core.component.Component.get_scaffolds) method. 
You can choose from a number of different scaffolds to compute; if you don't provide any parameter, the [Murcko scaffold](https://www.rdkit.org/docs/source/rdkit.Chem.Scaffolds.MurckoScaffold.html) is computed by default.\n" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[]" ] }, "metadata": {}, "execution_count": 7 } ], "source": [ "# at first the scaffolds property is empty\n", "component.scaffolds\n", "[]\n", "\n", "# returns a list of RDKit Mol objects with the scaffolds found\n", "component.get_scaffolds()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "SubstructureMapping(name='MurckoScaffold', smiles='C1=CC2=[N+]3C1=Cc1ccc4n1[Fe-2]31n3c(ccc3=CC3=[N+]1C(=C4)C=C3)=C2', source='RDKit scaffolds', mappings=[['CHA', 'CHB', 'CHC', 'CHD', 'C1A', 'C2A', 'C3A', 'C4A', 'C1B', 'C2B', 'C3B', 'C4B', 'C1C', 'C2C', 'C3C', 'C4C', 'C1D', 'C2D', 'C3D', 'C4D', 'NA', 'NB', 'NC', 'ND', 'FE']])" ] }, "metadata": {}, "execution_count": 8 } ], "source": [ "scaffold_details = component.scaffolds[0]\n", "scaffold_details" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[['CHA',\n", " 'CHB',\n", " 'CHC',\n", " 'CHD',\n", " 'C1A',\n", " 'C2A',\n", " 'C3A',\n", " 'C4A',\n", " 'C1B',\n", " 'C2B',\n", " 'C3B',\n", " 'C4B',\n", " 'C1C',\n", " 'C2C',\n", " 'C3C',\n", " 'C4C',\n", " 'C1D',\n", " 'C2D',\n", " 'C3D',\n", " 'C4D',\n", " 'NA',\n", " 'NB',\n", " 'NC',\n", " 'ND',\n", " 'FE']]" ] }, "metadata": {}, "execution_count": 9 } ], "source": [ "# list of lists of atom names with atoms that are part of the scaffold\n", "scaffold_details.mappings" ] }, { "source": [ "### Fragments\n", "\n", "The pdbeccdutils package also contains shorthand functions for searching through fragment 
libraries. By default, the code is supplied with a hand-curated library that is used internally by PDBe as well as collaborating resources ENAMINE and DSI. The [documentation](https://pdbeurope.github.io/ccdutils/guide/fragments.html) describes the molecular fragments contained in the built-in library, as well as the input file format to use in case you want to supply your own fragment library." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "from pdbeccdutils.core.fragment_library import FragmentLibrary\n", "\n", "# if you don't provide any parameter to the FragmentLibrary constructor,\n", "# the supplied fragment library is used\n", "library = FragmentLibrary()\n", "fragments = component.library_search(library)\n", "len(fragments)\n" ], "cell_type": "code", "metadata": {}, "execution_count": 10, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "2" ] }, "metadata": {}, "execution_count": 10 } ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "SubstructureMapping(name='pyrrole', smiles='c1cc[nH]c1', source='PDBe', mappings=((4, 5, 6, 7, 38), (21, 22, 23, 24, 40)))" ] }, "metadata": {}, "execution_count": 11 } ], "source": [ "fragments[1]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "1 instance(s) of the fragment porphin-like found. SMILES:C1~C~C2~C~C3~C~C~C(~C~C4~C~C~C(~C~C5~C~C~C(~C~C~1~N~2)~N~5)~N~4)~N~3\n2 instance(s) of the fragment pyrrole found. SMILES:c1cc[nH]c1\n" ] } ], "source": [ "for f in component.fragments:\n", "    print(f'{len(f.mappings)} instance(s) of the fragment {f.name} found. 
SMILES:{f.smiles}')" ] }, { "source": [ "### Molecular depictions\n", "\n", "The component object exposes the [compute_2d()](https://pdbeurope.github.io/ccdutils/pdbeccdutils.core.html#pdbeccdutils.core.component.Component.compute_2d) method, which takes a [DepictionManager](https://pdbeurope.github.io/ccdutils/pdbeccdutils.core.html#pdbeccdutils.core.depictions.DepictionManager) object as an argument. This enables you to take advantage of the depiction templates supplied with the package. Alternatively, you can provide paths to your own depiction templates and templates from PubChem. You can build a library of PubChem templates using the following [script](https://pdbeurope.github.io/ccdutils/pdbeccdutils.scripts.html#module-pdbeccdutils.scripts.setup_pubchem_library_cli).\n" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "from rdkit.Chem.Draw import IPythonConsole  # this import allows you to display images directly in the Jupyter notebook\n", "from IPython.display import SVG  # allows displaying SVG images" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "DepictionResult(source=, template_name='hem', mol=, score=0.0)" ] }, "metadata": {}, "execution_count": 14 } ], "source": [ "from pdbeccdutils.core.depictions import DepictionManager\n", "\n", "# create an instance of the depiction manager\n", "d = DepictionManager()\n", "result_2d = component.compute_2d(d)\n", "result_2d" ] }, { "source": [ "The score that is part of the result indicates the quality of the depiction. Generally, the lower the better. Higher values indicate bond crossings and crowded atoms." 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.0" ] }, "metadata": {}, "execution_count": 15 } ], "source": [ "# depiction score\n", "result_2d.score" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "image/png": "\n" }, "metadata": {}, "execution_count": 16 } ], "source": [ "# and the depiction itself\n", "result_2d.mol" ] }, { "source": [ "Let's have a look at an example where the molecular depiction cannot be drawn collision-free. We are going to use [adamantanone](https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/ADO) as an example." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Depiction score is: 1.0\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ], "image/png": "\n" }, "metadata": {}, "execution_count": 17 } ], "source": [ "ado_path = download_component('ADO')\n", "ado = ccd_reader.read_pdb_cif_file(ado_path).component\n", "ado_depiction_result = ado.compute_2d(d)\n", "\n", "print(f'Depiction score is: {ado_depiction_result.score}')\n", "ado_depiction_result.mol\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "image/svg+xml": "\nCHA\nCHB\nCHC\nCHD\nC1A\nC2A\nC3A\nC4A\nCMA\nCAA\nCBA\nCGA\nO1A\nO2A\nC1B\nC2B\nC3B\nC4B\nCMB\nCAB\nCBB\nC1C\nC2C\nC3C\nC4C\nCMC\nCAC\nCBC\nC1D\nC2D\nC3D\nC4D\nCMD\nCAD\nCBD\nCGD\nO1D\nO2D\nNA\nNB\nNC\nND\nFE\n" }, "metadata": {}, "execution_count": 18 } ], "source": 
[ "# we can also use convenience functions to store the depiction as an SVG image:\n", "component.export_2d_svg('HEM.svg')\n", "\n", "# including atom names\n", "component.export_2d_svg('HEM_with_names.svg', names=True)\n", "SVG('HEM_with_names.svg')" ] }, { "source": [ "We can also export the SVG image with highlighted atoms and bonds. Let's highlight the molecular scaffold we calculated earlier." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "image/svg+xml": "\nO\nHO\nO\nHO\nN\nN+\nN\nN+\nFe2-\n" }, "metadata": {}, "execution_count": 19 } ], "source": [ "# retrieve the list of atom names that are part of the scaffold\n", "# and color them in green. RGB: (0.2, 0.5, 0.2). 
\n", "# Scale 0-1 is used instead of the common 0-255\n", "scaffold = component.scaffolds[0]\n", "atom_names = scaffold.mappings[0]\n", "atom_color_mapping = {x: (0.2, 0.5, 0.2) for x in atom_names}\n", "\n", "# find all the bonds that are formed among the scaffold atoms and color them in green as well\n", "bond_color_highlight = {}\n", "bonds = component.mol_no_h.GetBonds()\n", "for bond in bonds:\n", "    begin = bond.GetBeginAtom().GetProp('name')\n", "    end = bond.GetEndAtom().GetProp('name')\n", "\n", "    if begin in atom_names and end in atom_names:\n", "        bond_color_highlight[(begin, end)] = (0.2, 0.5, 0.2)\n", "\n", "# draw the final image\n", "component.export_2d_svg('HEM_with_scaffold.svg',\n", "                        atom_highlight=atom_color_mapping,\n", "                        bond_highlight=bond_color_highlight)\n", "SVG('HEM_with_scaffold.svg')" ] }, { "source": [ "Exactly the same operations can be performed using input data from the wwPDB BIRD dictionary. For the following example we are going to use [Vancomycin](https://en.wikipedia.org/wiki/Vancomycin), an antibiotic used to treat bacterial infections. Download the `PRDCC_000204` wwPDB BIRD entry from the [FTP directory](ftp://ftp.wwpdb.org/pub/pdb/data/bird/prd/)." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ], "image/png": "\n" }, "metadata": {}, "execution_count": 20 } ], "source": [ "c = ccd_reader.read_pdb_cif_file('PRDCC_000204.cif').component\n", "c.compute_2d(d).mol" ] }, { "source": [ "## Structure writing\n", "\n", "Structure writing is done with the `ccd_writer` module located in the `pdbeccdutils.core` package. This module exposes the [write_molecule()](https://pdbeurope.github.io/ccdutils/pdbeccdutils.core.html#pdbeccdutils.core.ccd_writer.write_molecule) function, which writes out conformers of chemical components in a number of different formats. 
See the example below:" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "from pdbeccdutils.core import ccd_writer\n", "from pdbeccdutils.core.models import ConformerType\n", "\n", "# write out ideal conformer in the PDB format\n", "ccd_writer.write_molecule('HEM_ideal.pdb', component, conf_type=ConformerType.Ideal)\n", "\n", "# write out model conformer in the SDF format without hydrogens\n", "ccd_writer.write_molecule('HEM_model_no_h.sdf',\n", "                          component, remove_hs=True, conf_type=ConformerType.Model)\n", "\n", "# write out metadata information in CML format\n", "ccd_writer.write_molecule('HEM_in_cif.cml', component)" ], "cell_type": "code", "metadata": {}, "execution_count": 21, "outputs": [] }, { "source": [ "## Parity method\n", "\n", "There are a number of chemical similarity methods that can estimate the degree of similarity between two molecules. pdbeccdutils implements one such method, which takes advantage of maximum common substructures. This method is called PARITY and was developed by Jon Tyzack while working at EBI. 
You can read the description of the method in the [published paper](https://doi.org/10.1016/j.str.2018.02.009).\n", "\n", "In the following example we are going to examine the molecular similarity of two heme variants, [HEME A](http://pdbe.org/chem/HEA) and [HEME D](http://pdbe.org/chem/DHE)." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# download the structures first\n", "hem_a_path = download_component('HEA')\n", "hem_d_path = download_component('DHE')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "ParityResult(mapping={0: 0, 5: 5, 6: 6, 1: 1, 35: 42, 31: 38, 32: 39, 4: 4, 27: 37, 23: 27, 24: 28, 3: 3, 21: 26, 17: 16, 18: 17, 2: 2, 9: 9, 8: 8, 7: 7, 12: 11, 13: 12, 14: 13, 15: 14, 16: 15, 10: 10, 19: 18, 20: 20, 42: 22, 44: 23, 45: 25, 22: 19, 25: 29, 26: 31, 29: 33, 30: 34, 28: 30, 33: 40, 34: 41, 37: 44, 38: 45, 39: 46, 40: 47, 41: 48, 36: 43}, similarity_score=0.6029411764705882)" ] }, "metadata": {}, "execution_count": 23 } ], "source": [ "from pdbeccdutils.computations.parity_method import compare_molecules\n", "\n", "hem_a = ccd_reader.read_pdb_cif_file(hem_a_path).component\n", "hem_d = ccd_reader.read_pdb_cif_file(hem_d_path).component\n", "\n", "result = compare_molecules(hem_a.mol_no_h, hem_d.mol_no_h)\n", "result\n" ] }, { "source": [ "`ParityResult.mapping` contains the left-to-right atom equivalence, i.e. it maps atom indices of the first molecule to the equivalent atom indices of the second molecule." 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Similarity between heme A and heme D is: 0.603.\n" ] } ], "source": [ "print(f'Similarity between heme A and heme D is: {result.similarity_score:.3f}.')" ] }, { "source": [ "## Scripts\n", "\n", "The pdbeccdutils package also includes [scripts](https://pdbeurope.github.io/ccdutils/guide/pipelines.html) that we use internally for chemistry-related processes at PDBe. Directly from the package you can try out PDBeChem, which generates the content of the [FTP area](http://ftp.ebi.ac.uk/pub/databases/msd/pdbechem_v2) and provides all the data consumed by the [service](http://pdbe.org/chem)." ], "cell_type": "markdown", "metadata": {} } ] }