The IUPAC International Chemical Identifier

 

The IUPAC International Chemical Identifier (InChI) provides unique labels for well-defined chemical substances. These labels are generated by converting an input chemical structure, in the form of a 'connection table', to a unique and predictable series of ASCII characters. The protocol offers a means of representing chemical compounds in a manner that does not depend on how they were drawn. Note that Identifiers are re-expressions of chemical structures; they are not registry or registration numbers and do not require access to a database. The facility was developed primarily as a means of 'naming' a compound in digital media, although the Identifier is expressed as simple text that may be interpreted manually.

 

Derivation of the InChI from an input chemical structure proceeds through three steps: 1) normalization - all input information not needed for structure identification is discarded and structure information is divided into 'layers'; 2) canonicalization - each atom is given a label that depends only on its position in the structure; 3) serialization - a string of characters, the Identifier, is generated from the canonical labels. All 'chemical' rules are applied in the first step.
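The 'layers' mentioned above are visible in the serialized Identifier, where they are separated by '/' characters. The following is a minimal Python sketch, not part of the InChI software, that splits an Identifier into its version, formula and remaining layers. The function name split_inchi_layers is hypothetical, and the sketch assumes each layer after the formula begins with a single lowercase prefix letter (for example 'c' for connections, 'h' for hydrogens).

```python
def split_inchi_layers(inchi):
    """Split an InChI string into (version, formula, layers).

    Illustrative sketch only: it inspects the text of the Identifier and
    does not validate the chemistry it encodes.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)

    parts = inchi[len(prefix):].split("/")
    version = parts[0]          # e.g. "1S" for a Standard InChI, version 1
    rest = parts[1:]

    # Assumption: a formula layer starts with an uppercase letter or a digit;
    # anything starting with a lowercase letter is treated as a prefixed layer.
    formula = ""
    if rest and rest[0] and not rest[0][0].islower():
        formula = rest[0]
        rest = rest[1:]

    # Keep layers as an ordered list of (prefix letter, body) pairs, because
    # a prefix letter may occur more than once in a non-standard InChI.
    layers = [(p[0], p[1:]) for p in rest if p]
    return version, formula, layers


if __name__ == "__main__":
    # Ethanol
    print(split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

For ethanol this yields the version "1S", the formula "C2H6O", and the connection and hydrogen layers as separate entries.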

 

Version 1 of the InChI protocol and software was released on April 14th 2005. Subsequently various updates were released. Version 1.02, released in January 2009, introduced the concepts of InChIKey and Standard InChI.

 

InChIKey is a fixed-length (27-character) condensed digital representation of the Identifier. This

 

Standard InChI is an InChI character string produced with a fixed set of options for properties such as tautomerism and stereoconfiguration.

Its use enables interoperability and compatibility between large databases, and supports web searching and information exchange.
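As a rough illustration of the two concepts above, the Python sketch below checks whether a string has the overall shape of an InChIKey and whether an Identifier carries the Standard InChI prefix. The function names are hypothetical. The sketch assumes the commonly seen InChIKey layout of three blocks of uppercase letters (14, 10 and 1 characters) joined by hyphens, giving 27 characters in total, and the "InChI=1S/" prefix for version 1 Standard InChI; it checks the shape only and cannot confirm that a key was actually derived from a given structure.

```python
import re

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter = 27 chars.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(key):
    """Return True if key has the fixed-length, 27-character InChIKey shape."""
    return len(key) == 27 and INCHIKEY_PATTERN.fullmatch(key) is not None


def is_standard_inchi(inchi):
    """Return True if the Identifier starts with the Standard InChI prefix."""
    return inchi.startswith("InChI=1S/")


if __name__ == "__main__":
    # Ethanol: InChIKey and Standard InChI
    print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
    print(is_standard_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```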

 

The most recent update is version 1.04 (October 2011). All documentation and software are available from the InChI Trust website at www.inchi-trust.org and the IUPAC InChI website at www.iupac.org/inchi. The software runs under 32-bit Microsoft Windows operating systems. The main program, wInChI-1.exe, is a conventional Windows application; a command-line version (cInChI-1.exe) and a version recompiled without changes under i386 Linux are also available. The program takes an input structure and generates both graphical and text output in a form designed to allow critical examination of the InChI. The Identifier and associated text output may be parsed and annotated in either a simple plain text or XML (eXtensible Markup Language) format.

 

As structure input, the program currently accepts standard SDfiles, Molfiles, and its own output produced when the "Full Auxiliary Information" option is selected. Input may originate from individual disk files or from the Windows clipboard. InChI may also be generated directly through an application programming interface (API). It is critical for the applicability of InChI that the code remains stable and that divergent variants ('mutants') of the algorithm do not arise.

 

An InChI FAQ is available at the InChI Trust. Articles related to InChI are listed here and here.

 

The present project is intended to provide an Open Source focus for the development of InChI facilities and applications, under the GNU Lesser General Public License (LGPL) or the more permissive IUPAC-InChI Trust License, for example:

 

- porting to other platforms (e.g. Java, Mac OS X)

- InChI wrappers (e.g. GUIs, Web Services)

- InChI data model(s)

- InChI syntax specifications

- InChI parsers

- InChI validators

- InChI preprocessors: much current data is too fuzzy to create completely definitive InChIs (missing stereo descriptors, hydrogens, etc.); it will be valuable for InChI creators to be able to create the "best estimate" of their structures before submitting it to InChIfication.

- InChI processors: it is not formally allowable to edit an InChI as this destroys the normative nature; however it may be of value to carry out certain operations, such as splitting a multi-moiety InChI into components.

- InChI analysers: there are degrees of similarity between InChIs that can usefully be managed without substructure searching; these include analysis of tautomers, different hydrogen decoration, and different certainty in the specification of stereochemistry.
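As an illustration of the kind of analysis described in the last item, the Python sketch below compares two Identifiers layer by layer and reports which layers differ, without any substructure searching. It is a simplified, hypothetical helper and not an existing InChI tool: it assumes Standard InChI input with a formula layer present and each later layer introduced by a unique single-letter prefix, and it compares text only.

```python
def layer_map(inchi):
    """Map layer names to their text for a Standard InChI string.

    Assumption: the second '/'-separated field is the formula, and each
    later field starts with a unique single-letter layer prefix.
    """
    parts = inchi.split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            layers[part[0]] = part[1:]
    return layers


def differing_layers(inchi_a, inchi_b):
    """List the layer names whose content differs or that only one side has."""
    a, b = layer_map(inchi_a), layer_map(inchi_b)
    names = list(a) + [name for name in b if name not in a]
    return [name for name in names if a.get(name) != b.get(name)]


if __name__ == "__main__":
    # L-alanine with stereochemistry, and alanine with no stereo specified
    l_alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
    alanine = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
    print(differing_layers(l_alanine, alanine))
```

In this example the formula, connection and hydrogen layers agree and only the stereo-related layers ('t', 'm', 's') are reported, which indicates the same compound described with different certainty in its stereochemistry.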

 

A discussion list is available (InChI-discuss); comments, questions and offers of help are welcomed:

 

List-Id:    InChI-discuss@lists.sourceforge.net

List-Subscribe:    http://lists.sourceforge.net/lists/listinfo/inchi-discuss;

InChI-discuss-request@lists.sourceforge.net?subject=subscribe

List-Unsubscribe:    http://lists.sourceforge.net/lists/listinfo/inchi-discuss;

InChI-discuss-request@lists.sourceforge.net?subject=unsubscribe

List-Post:    InChI-discuss@lists.sourceforge.net

List-Archive:    http://sourceforge.net/mailarchive/forum.php?forum=inchi-discuss

List-Help:    InChI-discuss-request@lists.sourceforge.net?subject=help

 
