{ "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8-final" }, "orig_nbformat": 2, "kernelspec": { "name": "Python 3.7.8 64-bit ('rdkit-env': conda)", "display_name": "Python 3.7.8 64-bit ('rdkit-env': conda)", "metadata": { "interpreter": { "hash": "ff61d13abe230febdcf9f05a768048a47be2ac8377dcb96e6daf2ab6fcfbf665" } } } }, "nbformat": 4, "nbformat_minor": 2, "cells": [ { "source": [ "# arpeggio\n", "\n", "arpeggio calculates interatomic contacts based on the rules defined in [CREDO](http://marid.bioc.cam.ac.uk/credo).\n", "\n", "Create and activate a [conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) with the openbabel (<3.0) dependency, for example:\n", "\n", "```bash\n", "conda create -n arpeggio-env python=3.7\n", "conda activate arpeggio-env\n", "conda install -c openbabel openbabel\n", "```\n", "\n", "Then install the arpeggio package as follows:\n", "\n", "```bash\n", "pip install git+https://github.com/PDBeurope/arpeggio.git@master#egg=arpeggio\n", "```" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Arpeggio output is very rich in terms of observed contacts. The static nature of protein structure models makes it very challenging to correctly estimate whether, for example, a hydrogen bond is formed exactly between atom A and atom B and not between atom A and an atom C that is 0.01 Å farther away. Hence arpeggio tries, in accordance with the CREDO nomenclature, to identify all plausible contacts. The reasoning behind the identification of arpeggio contacts, and their possible mutual (non)exclusivity, is described in detail in the [SI](https://www.sciencedirect.com/science/article/pii/S0022283616305332?via%3Dihub#ec0005) of the [arpeggio paper](https://doi.org/10.1016/j.jmb.2016.12.004)."
], "cell_type": "markdown", "metadata": {} }, { "source": [ "Arpeggio relies on PDB structures with added hydrogens and works best with mmCIF files for inferring molecular interactions.\n", "\n", "## Precomputed data\n", "\n", "In the PDBe release process we use possible quaternary structures generated by the [ModelServer](https://www.ebi.ac.uk/pdbe/model-server/) and protonated using the [ChimeraX](https://www.rbvi.ucsf.edu/chimerax/) software. You can obtain these precomputed structures from a link such as https://www.ebi.ac.uk/pdbe/model-server/v1/1cbs/full?encoding=cif&data_source=pdb-h by plain PDB id substitution (PDB id 1cbs in this case).\n", "\n", "You can also access precomputed interactions for the vast majority of PDB entries through our [aggregated API](http://pdbe.org/aggregated-api) using one of the following API calls: [GetBoundMolecules](https://www.ebi.ac.uk/pdbe/graph-api/pdbe_doc/#api-PDB-GetBoundMolecules); [GetBoundMoleculeInteractions](https://www.ebi.ac.uk/pdbe/graph-api/pdbe_doc/#api-PDB-GetBoundMoleculeInteractions); [GetBoundLigandInteractions](https://www.ebi.ac.uk/pdbe/graph-api/pdbe_doc/#api-PDB-GetBoundLigandInteractions).\n", "\n", "Please note that the chain id information (field: `_atom_site.auth_asym_id`) is different for cryo-EM and X-ray structures in the files provided by the ModelServer. The reason is that these are quaternary structures that are indicated to play a [possible biological role](https://proteopedia.org/wiki/index.php/Biological_Unit). The chain id is in the form X_Y (e.g.
A_1), where X stands for the `_atom_site.auth_asym_id` of the PDB entry [asymmetric unit](https://proteopedia.org/wiki/index.php/Asymmetric_unit) and Y is a [symmetry operator id](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_struct_assembly_gen.oper_expression.html) that defines the collection of crystal symmetry operations applied to the original chain to generate the position of the chain.\n", "\n", "## Example\n", "\n", "The potassium channel found in the PDBe entry [1k4c](http://pdbe.org/1k4c) contains just a single chain in the asymmetric unit; however, the pore is formed by a [tetrameric quaternary structure](https://www.ebi.ac.uk/pdbe/model-server/v1/1k4c/full?encoding=cif&data_source=pdb-h)." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## arpeggio script\n", "\n", "arpeggio exposes a single script that you can use for inferring interatomic contacts. The only required parameters are a protonated PDB entry and a molecular selection.\n", "\n", "You can get an idea of how to use the script by running:\n", "```bash\n", "arpeggio -h\n", "```\n", "\n", "The basic usage is as follows:\n", "\n", "```bash\n", "arpeggio -s /A/200/ -o arpeggio_result 1cbs.cif\n", "\n", "INFO//14:59:04.545//Program begin.\n", "INFO//14:59:04.545//Selection perceived: ['/A/200/']\n", "DEBUG//14:59:04.605//Loaded PDB structure (BioPython)\n", "DEBUG//14:59:04.667//Loaded MMCIF structure (OpenBabel)\n", "DEBUG//14:59:04.674//Mapped OB to BioPython atoms and vice-versa.\n", "DEBUG//14:59:04.674//Detected that the input structure contains hydrogens.
Hydrogen addition will be skipped.\n", "DEBUG//14:59:04.787//Determined atom explicit and implicit valences, bond orders, atomic numbers, formal charge and number of bound hydrogens.\n", "DEBUG//14:59:04.810//Initialised SIFts.\n", "DEBUG//14:59:04.812//Determined polypeptide residues, chain breaks, termini\n", "DEBUG//14:59:04.858//Percieved and stored rings.\n", "DEBUG//14:59:04.869//Perceived and stored amide groups.\n", "DEBUG//14:59:04.882//Added hydrogens to BioPython atoms.\n", "DEBUG//14:59:04.887//Added VdW radii.\n", "DEBUG//14:59:04.892//Added covalent radii.\n", "DEBUG//14:59:04.910//Completed NeighborSearch.\n", "DEBUG//14:59:04.912//Assigned rings to residues.\n", "DEBUG//14:59:04.918//Made selection.\n", "DEBUG//14:59:05.112//Expanded to binding site.\n", "DEBUG//14:59:05.113//Flagged selection rings.\n", "DEBUG//14:59:05.114//Completed new NeighbourSearch.\n", "INFO//14:59:05.438//Program End. Maximum memory usage was 77.32 MB.\n", "```\n", "\n", "This calculates all the interatomic contacts of retinoic acid (REA 200 A) in the PDB structure of 1cbs." ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## arpeggio API\n", "\n", "You can achieve the same behaviour by using the arpeggio API."
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'1cbs.cif'" ] }, "metadata": {}, "execution_count": 1 } ], "source": [ "import requests\n", "import pandas as pd  # for example purposes only\n", "\n", "def download_protonated_structure(pdb_id):\n", "    response = requests.get(f'https://www.ebi.ac.uk/pdbe/model-server/v1/{pdb_id}/full?encoding=cif&data_source=pdb-h')\n", "    response.raise_for_status()  # fail early instead of saving an error page\n", "    cif_path = f'{pdb_id}.cif'\n", "\n", "    with open(cif_path, 'wb') as fp:\n", "        fp.write(response.content)\n", "\n", "    return cif_path\n", "\n", "# let's download a protonated quaternary structure first\n", "download_protonated_structure('1cbs')\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from arpeggio.core import InteractionComplex\n", "\n", "selection = ['/A_1/200/']\n", "\n", "# run structure checks and create internal representation of the molecule\n", "complex = InteractionComplex('1cbs.cif')\n", "complex.structure_checks()\n", "complex.address_ambiguities()\n", "complex.initialize()\n", "\n", "# calculate interactions to our selection\n", "complex.run_arpeggio(selection, interacting_cutoff=5,  # cutoff for 'proximal' interactions\n", "                     vdw_comp=0.1,  # 'compensation' factor to address structural inconsistencies\n", "                     include_sequence_adjacent=False)\n", "contacts = complex.get_contacts()\n" ] }, { "source": [ "len(contacts)" ], "cell_type": "code", "metadata": {}, "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "709" ] }, "metadata": {}, "execution_count": 3 } ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'bgn': {'label_comp_id': 'LEU',\n", " 'auth_seq_id': 19,\n", " 'auth_asym_id': 'A_1',\n", " 'auth_atom_id': 'CA',\n", " 'pdbx_PDB_ins_code': ' '},\n", " 'end':
{'label_comp_id': 'VAL',\n", " 'auth_seq_id': 24,\n", " 'auth_asym_id': 'A_1',\n", " 'auth_atom_id': 'CB',\n", " 'pdbx_PDB_ins_code': ' '},\n", " 'type': 'atom-atom',\n", " 'distance': 4.69,\n", " 'contact': ['proximal'],\n", " 'interacting_entities': 'INTRA_NON_SELECTION'}" ] }, "metadata": {}, "execution_count": 4 } ], "source": [ "contacts[0]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Hbond distances\n", "count 38.000000\n", "mean 3.041579\n", "std 0.238911\n", "min 2.570000\n", "25% 2.872500\n", "50% 3.015000\n", "75% 3.167500\n", "max 3.620000" ], "text/html": "
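<div>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Hbond distances</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>count</th>\n      <td>38.000000</td>\n    </tr>\n    <tr>\n      <th>mean</th>\n      <td>3.041579</td>\n    </tr>\n    <tr>\n      <th>std</th>\n      <td>0.238911</td>\n    </tr>\n    <tr>\n      <th>min</th>\n      <td>2.570000</td>\n    </tr>\n    <tr>\n      <th>25%</th>\n      <td>2.872500</td>\n    </tr>\n    <tr>\n      <th>50%</th>\n      <td>3.015000</td>\n    </tr>\n    <tr>\n      <th>75%</th>\n      <td>3.167500</td>\n    </tr>\n    <tr>\n      <th>max</th>\n      <td>3.620000</td>\n    </tr>\n  </tbody>\n</table>\n</div>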
\n
" }, "metadata": {}, "execution_count": 5 } ], "source": [ "\n", "# we can filter out all the contacts that are 'proximal' only\n", "non_proximal = [x for x in contacts if x['contact'] != ['proximal']]\n", "\n", "# we can list all the possible hydrogen bonds that can be found in the binding site\n", "hbonds = [x for x in contacts if 'hbond' in x['contact']]\n", "\n", "# extract the distances\n", "hbond_distances = [x['distance'] for x in hbonds]\n", "\n", "df = pd.DataFrame(hbond_distances, columns=['Hbond distances'])\n", "df.describe()\n" ] }, { "source": [ "There is a lot one can do in terms of statistics for molecular interactions, both ligand-wise and PDB-wise. I encourage you to try answering the following questions:" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "## Exercise\n", "\n", "* How many residues form this binding site?\n", "* What are all the interaction types that can be found in this active site?\n", "* Are there any atomic clashes?" ], "cell_type": "markdown", "metadata": {} } ] }