pdbecif

This mmcif module contains all the classes necessary to read and write either a data or a dictionary mmCIF file.

Reading files can be achieved using either CifFileReader or MMCIF2Dict.

pdbecif.mmcif

The module contains all the objects necessary to represent either a data CIF file or a dictionary CIF file.

mmCIF data files

DATA mmCIF files can be represented in one of three interchangeable ways:

  1. As a series of objects that encapsulate each major component of mmCIF: CifFile -> DataBlock -> [ SaveFrame -> ] Category -> Item

  2. As a python wrapper to a dictionary. Categories and items are accessed through the familiar python dot (.) notation.

  3. As a dictionary of the form

    {
        DATABLOCK_ID: { CATEGORY: { ITEM:  VALUE } }
    }
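
As a rough illustration of form (3), the sketch below builds such a dictionary by hand. The entry ID and values are hypothetical. The leading underscore on category names follows the "_atom_site" example used elsewhere in this document, and storing a looped category as one list per item is an assumption based on the note that Items holding lists represent looped categories.

```python
# Hypothetical data used only for illustration; not taken from a real file.
mmcif_like = {
    "1ABC": {                        # DATABLOCK_ID
        "_entry": {"id": "1ABC"},    # non-looped category: ITEM -> single value
        "_atom_site": {              # looped category (assumed): ITEM -> list of values
            "id": ["1", "2", "3"],
            "type_symbol": ["N", "C", "O"],
        },
    }
}


def row(category, index):
    """Rebuild one row of a looped category from its per-item lists."""
    return {item: values[index] for item, values in category.items()}


second_atom = row(mmcif_like["1ABC"]["_atom_site"], 1)
```

Here `row` is a helper invented for this example; it only shows how per-item lists line up into rows.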
    

mmCIF dictionaries

DICTIONARY mmCIF files can ONLY be represented as (1) above, i.e.:

  1. As a series of objects that encapsulate each major component of mmCIF: CifFile -> DataBlock -> [ SaveFrame -> ] Category -> Item

Because dictionary files contain SaveFrame objects, this representation is not interchangeable with the others: conversion to dictionary-type objects has not yet been implemented.

class pdbecif.mmcif.CIFWrapper(d, data_id=None, preserve_token_order=False)

CIFWrapper is a wrapper object for the output of the MMCIF2Dict object i.e., an mmCIF-like python dictionary object. This implies that mmCIF-like dictionaries written outside this package may be used to initialize the CIFWrapper class as well. The CIFWrapper object emulates python objects by providing access to mmCIF categories and items using the familiar python ‘dot’ notation.

unwrap()

Extract encapsulated data to return an mmCIF-like python dictionary
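
To make the idea of dot-notation access over a dictionary concrete, here is a minimal stand-in class. It is not the CIFWrapper implementation, and the name DotView and the sample data are hypothetical. It only sketches how attribute lookups can be forwarded to dictionary keys and how an unwrap() method can hand back the underlying dictionary.

```python
class DotView:
    """Hypothetical stand-in, NOT pdbecif's CIFWrapper: exposes dict keys as attributes."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so _data itself is unaffected.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Nested dictionaries are wrapped again so that chained dots keep working.
        return DotView(value) if isinstance(value, dict) else value

    def unwrap(self):
        """Return the underlying plain dictionary."""
        return self._data


block = DotView({"_entry": {"id": "1ABC"}})
entry_id = block._entry.id
```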

class pdbecif.mmcif.CIFWrapperTable(d, preserve_token_order=False)

CIFWrapperTable represents (and wraps up) mmCIF category-like dictionaries. Categories that are stored as dictionary-like objects are represented as tables, and their items and data are accessed using familiar python ‘dot’ notation.

search(item, value)

Search for values of items in tables.

Parameters
  • item (str) – Name of the data item to be looked up.

  • value (str) – Search key; can also be a compiled regular expression, e.g. re.compile(r'[A-Z][a-z]')

Returns

Effectively a dictionary with a row-like structure: {row_id: {"category_name": "value"}}.

Return type

dict
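
The following sketch shows one way a search like this could work on a table stored as one list per item. It is a hypothetical helper, not the library's code. Two things are assumptions: that a plain string is compared for equality while a compiled pattern is applied with its search() method, and that the inner dictionary of each hit is keyed by item name.

```python
import re


def search_table(table, item, value):
    """Hypothetical helper: return {row_id: row} for rows whose `item` matches `value`."""
    is_regex = hasattr(value, "search")  # compiled patterns have .search, str does not
    hits = {}
    for row_id, cell in enumerate(table[item]):
        matched = value.search(cell) is not None if is_regex else cell == value
        if matched:
            # Collect the same row position from every item list.
            hits[row_id] = {name: column[row_id] for name, column in table.items()}
    return hits


# Hypothetical table contents.
atoms = {"id": ["1", "2", "3"], "label_atom_id": ["N", "CA", "C"]}
exact = search_table(atoms, "label_atom_id", "CA")
by_regex = search_table(atoms, "label_atom_id", re.compile(r"^C"))
```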

searchiter(item, value)

Highly optimised search for values of items in tables.

Parameters
  • item (str) – Name of the data item to be looked up.

  • value (str) – Search key; can also be a compiled regular expression, e.g. re.compile(r'[A-Z][a-z]')

Returns

Effectively a dictionary with a row-like structure: {row_id: {"category_name": "value"}}.

Return type

dict

class pdbecif.mmcif.Category(category_id, parent)

Category objects store and manage Item objects. Categories that contain Items holding lists of values represent looped categories. Category objects are stored and managed by either DataBlock or SaveFrame objects.

getItemNames()

List the Items (by name) stored by Category

getItems()

Retrieve all Item objects

remove()

Remove Category from SaveFrame or DataBlock and add Category to SaveFrame or DataBlock recycle bin

removeChild(child)

Remove an Item from the Category, given either the Item object or the item name (string)

class pdbecif.mmcif.CifFile(file_path=None, mmcif_data_map=None, preserve_token_order=False)

CifFile represents all the objects contained/part of an mmCIF file or dictionary. It stores and manages DataBlock objects.

getDataBlockIds()

List the DataBlocks (by ID) stored by CifFile

getDataBlocks()

Retrieve all DataBlock objects stored by CifFile

import_mmcif_data_map(mmcif_data_map)

Populates all objects necessary to represent mmCIF data files. mmcif_data_map is an mmCIF-like dictionary of the form:

    {
        DATABLOCK_ID: { CATEGORY: { ITEM: VALUE } }
    }

removeChild(child)

Remove a DataBlock from the CifFile, given either the DataBlock object or its ID (string). Returns True if the child was removed, otherwise False.

class pdbecif.mmcif.DataBlock(block_id, parent)

DataBlock stores and manages SaveFrame and Category objects in CIF files.

getCategories()

Retrieve all Category objects

getCategoryIds()

List the Categories (by ID) stored by DataBlock

getSaveFrameIds()

List the SaveFrames (by ID) stored by DataBlock

getSaveFrames()

Retrieve all SaveFrame objects stored by DataBlock

remove()

Remove DataBlock from CifFile and add DataBlock to CifFile recycle bin

removeChild(child)

Remove a Category or SaveFrame from the DataBlock, given either the object itself or its ID (string)

updateId(block_id)

Change the DataBlock ID

class pdbecif.mmcif.Item(item_name, parent)

Item objects store and manage values in CIF files and are themselves stored and managed by Category objects. Items whose values are lists represent looped categories.

getFormattedValue()

Return the value as it should appear (formatted) in the CIF file

getRawValue()

Raw value is the unformatted value stored by the item

remove()

Remove Item from Category and add Item to the Category recycle bin

reset()

Reset one or all values of the Item to ‘.’

class pdbecif.mmcif.SaveFrame(saveFrame_id, parent)

SaveFrame objects store and manage Category objects (Dictionary CIF only). SaveFrame objects are stored and managed by DataBlock objects.

getCategories()

Retrieve all Category objects

getCategoryIds()

List the Categories (by ID) stored by SaveFrame

remove()

Remove SaveFrame from DataBlock and add SaveFrame to DataBlock recycle bin

removeChild(child)

Remove a Category from the SaveFrame, given either the Category object or its ID (string)

updateId(saveFrame_id)

Change the SaveFrame definition ID

pdbecif.mmcif_io

class pdbecif.mmcif_io.CifFileReader(input='data', verbose=False, preserve_order=False)

CifFileReader takes a path to an mmCIF file (data or dictionary CIF) and, once the file has been read, returns a representation of the mmCIF file.

read(file_path, output='cif_dictionary', ignore=[], preserve_order=False, only=None)

Read in mmCIF file

Parameters
  • file_path (str) – Path to the mmCIF file

  • output (str, optional) – Type of object the CIF file should be read into. Should be one of: cif_dictionary (plain python dictionary), cif_wrapper (CIFWrapper), or cif_file (CifFile). Defaults to “cif_dictionary”.

  • ignore (list, optional) – List of category names to be ignored. Defaults to [].

  • preserve_order (bool, optional) – Whether the order of categories should be kept. Defaults to False.

  • only (list, optional) – List of category names to be retrieved. Others are discarded. Defaults to None.

Returns

In-memory representation of the mmCIF file, based on the passed parameters.

Return type

object
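
A short usage sketch based only on the signatures listed above. The function name read_entry is hypothetical. It is also an assumption that the ignore list takes underscore-prefixed category names, as in the ignoreCategories example given for MMCIF2Dict. The import is placed inside the function so the sketch can be loaded even where pdbecif is not installed.

```python
def read_entry(path, skip_coordinates=True):
    """Read an mmCIF data file and return it in the 'cif_wrapper' output form."""
    # Lazy import: pdbecif is only required when the function is actually called.
    from pdbecif.mmcif_io import CifFileReader

    reader = CifFileReader(input="data")
    # Assumed naming convention for categories: leading underscore.
    ignore = ["_atom_site", "_atom_site_anisotrop"] if skip_coordinates else []
    return reader.read(path, output="cif_wrapper", ignore=ignore)
```

Calling `read_entry("some_entry.cif")` with a path of your own would return whatever the reader produces for the cif_wrapper output; swapping the output argument for "cif_dictionary" or "cif_file" selects the other two representations.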

class pdbecif.mmcif_io.CifFileWriter(file_path=None, compress=False, mode='wt', preserve_order=False)

CifFileWriter writes mmCIF-formatted files and accepts mmCIF-like dictionaries, CIFWrapper objects, and CifFile objects.

write(cifObjIn, compress=False, mode='wt', preserve_order=False)

Write the object out to an mmCIF file.

Parameters
  • cifObjIn (object) – Can be one of CifFile, CIFWrapper or dict

  • compress (bool, optional) – Whether the resulting file should be gzipped. Defaults to False.

  • mode (str, optional) – Mode used for file opening. Defaults to “wt”.

  • preserve_order (bool, optional) – Preserve order of category names in the input object. Defaults to False.
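
A minimal writing sketch that relies only on the constructor and write() signatures shown above. The function name write_entry is hypothetical, and all optional parameters are left at their defaults. As before, the import is deferred so the snippet loads without pdbecif present.

```python
def write_entry(cif_obj, out_path):
    """Write a CifFile, CIFWrapper or mmCIF-like dict to `out_path`."""
    # Lazy import: pdbecif is only required when the function is actually called.
    from pdbecif.mmcif_io import CifFileWriter

    writer = CifFileWriter(out_path)
    writer.write(cif_obj)
```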

pdbecif.mmcif_tools

Very low-level access to mmCIF data files. MMCIF2Dict has one method, parse(), that returns (datablock_id, mmCIF_data) tuples of type (str, dict).

MMCIF2Dict is very fast at reading mmCIF data.

class pdbecif.mmcif_tools.MMCIF2Dict

MMCIF2Dict is a purely algorithmic parser that takes as input public mmCIF files and creates a python dictionary from them.

Because this parser is highly optimised for the public mmCIF format, it is highly unlikely to work on mmCIF files formatted in any other way.

MMCIF2Dict will not work on mmCIF dictionaries!

Users can speed up parsing of public mmCIF data files substantially by supplying a list of categories that the parser should ignore if encountered.

For example:

parser.parse(path, ignoreCategories=["_atom_site", "_atom_site_anisotrop"])

will ignore all coordinate lines in the file.

parse(file_path, ignoreCategories=[], preserve_token_order=False, onlyCategories=[])

Public method whose only job is to check that the mmCIF file exists before it is read by the private parseFile method.
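
Putting the pieces above together, a call to the low-level parser might look like the sketch below. The wrapper function name is hypothetical, and the shape of the returned value is deliberately not inspected here. The import is deferred so the snippet loads without pdbecif present.

```python
def parse_without_coordinates(path):
    """Parse a public mmCIF data file, skipping the coordinate categories."""
    # Lazy import: pdbecif is only required when the function is actually called.
    from pdbecif.mmcif_tools import MMCIF2Dict

    parser = MMCIF2Dict()
    return parser.parse(
        path, ignoreCategories=["_atom_site", "_atom_site_anisotrop"]
    )
```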

pdbecif.ordereddict

pdbecif.utils