The IUPAC International Chemical Identifier
The IUPAC International Chemical Identifier (InChI)
provides unique labels for well-defined chemical substances. These labels are
generated by converting an input chemical structure, in the form of a
'connection table', to a unique and predictable series of ASCII characters. The
protocol offers a means of representing chemical compounds in a manner that
does not depend on how they were drawn. Note that Identifiers are
re-expressions of chemical structures; they are not registry or registration
numbers and do not require access to a database. The facility was developed
primarily as a means of 'naming' a compound in digital media, although the
Identifier is expressed as simple text that may be interpreted manually.
Derivation of the InChI from an input chemical
structure proceeds through three steps: 1) normalization - all input
information not needed for structure identification is discarded and structure
information is divided into 'layers'; 2) canonicalization - each atom is given
a label that depends only on its position in the structure; 3) serialization -
a string of characters, the Identifier, is generated from the canonical labels.
All 'chemical' rules are applied in the first step.
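As a rough illustration of the three steps, the following Python sketch runs a toy pipeline on a hand-made connection table. It is not the InChI algorithm: the brute-force canonicalization, the function names and the output format are all assumptions made for illustration, and the resulting string is not an InChI. It only shows that the output need not depend on the order in which the atoms were drawn.

```python
from itertools import permutations


def normalize(atoms, bonds):
    """Step 1 (toy): keep element symbols and connectivity, drop everything else."""
    elements = [atom["element"] for atom in atoms]   # drawing coordinates are discarded
    edges = {frozenset(bond) for bond in bonds}      # a bond has no direction
    return elements, edges


def canonicalize(elements, edges):
    """Step 2 (toy): try every atom ordering and keep the smallest description.

    Brute force is only feasible for a handful of atoms; it stands in for
    the real canonical-labelling procedure, which is not reproduced here.
    """
    best = None
    for order in permutations(range(len(elements))):
        new_label = {old: new for new, old in enumerate(order)}
        relabelled_elements = tuple(elements[old] for old in order)
        relabelled_edges = tuple(sorted(
            tuple(sorted(new_label[i] for i in edge)) for edge in edges
        ))
        candidate = (relabelled_elements, relabelled_edges)
        if best is None or candidate < best:
            best = candidate
    return best


def serialize(canonical):
    """Step 3 (toy): write the canonical labels out as a plain ASCII string."""
    elements, edges = canonical
    bonds = ";".join(f"{i + 1}-{j + 1}" for i, j in edges)
    return "".join(elements) + "|" + bonds


def toy_identifier(atoms, bonds):
    return serialize(canonicalize(*normalize(atoms, bonds)))


# The heavy atoms of ethanol, drawn twice with different atom order and coordinates.
drawing_a = (
    [{"element": "C", "xy": (0, 0)}, {"element": "C", "xy": (1, 0)}, {"element": "O", "xy": (2, 0)}],
    [(0, 1), (1, 2)],
)
drawing_b = (
    [{"element": "O", "xy": (5, 5)}, {"element": "C", "xy": (4, 5)}, {"element": "C", "xy": (3, 5)}],
    [(0, 1), (1, 2)],
)

print(toy_identifier(*drawing_a) == toy_identifier(*drawing_b))
```

Both drawings yield the same string because the coordinates are dropped in the first step and the atom order is replaced by canonical labels in the second.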
Version 1 of the InChI protocol and software was
released on April 14th 2005. All documentation and software are available from
the IUPAC InChI web site at www.iupac.org/inchi.
The software runs under 32-bit Microsoft Windows Operating Systems. The main
program, wInChI-1.exe, is a conventional Windows application, although a
'command line' version (cInChI-1.exe) and a version recompiled under i386 Linux
without any changes are also available. The program takes an input structure
and generates both graphical and text output in a form designed to allow
critical examination of the InChI. The Identifier and associated text output
may be parsed and annotated in either a simple plain text or XML (eXtensible
Markup Language) format.
As structure input, the program currently accepts
standard SDfiles, Molfiles, Chemical Markup Language (CML) files (http://www.xml-cml.org) and its own output
produced when the "Full Auxiliary Information" option is selected. Input
may originate from individual disk files or through the Windows clipboard.
An InChI may also be generated directly through an application programming interface
(API). It is critical for the applicability of InChI that the code is stable,
and that mutants (divergent versions of the code that generate different
Identifiers for the same structure) do not arise.
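To make the notion of a 'connection table' concrete, here is a minimal Python sketch that reads the atom and bond blocks of a V2000 Molfile into a list of elements and a list of bonds. The function name and the hand-written ethanol example are assumptions for illustration; the sketch ignores charges, isotopes, stereo flags and the properties block, and it is not part of the InChI software.

```python
def read_molfile_v2000(text):
    """Return (elements, bonds); bonds are (atom_i, atom_j, order) with 1-based atoms."""
    lines = text.splitlines()
    counts = lines[3]                       # three header lines precede the counts line
    n_atoms = int(counts[0:3])              # fixed-width fields, three characters each
    n_bonds = int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # x, y and z occupy the first 30 columns; the element symbol follows a space.
    elements = [line[31:34].strip() for line in atom_lines]
    bonds = [(int(line[0:3]), int(line[3:6]), int(line[6:9])) for line in bond_lines]
    return elements, bonds


# Hand-written example: the heavy atoms of ethanol (hydrogens are implicit).
ETHANOL_MOLFILE = "\n".join([
    "ethanol",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

elements, bonds = read_molfile_v2000(ETHANOL_MOLFILE)
print(elements)
print(bonds)
```

The coordinates are read past but never used, which mirrors the point that the Identifier should not depend on how the structure was drawn.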
An InChI FAQ is available at http://wwmm.ch.cam.ac.uk/inchifaq/.
Articles related to InChI are listed here.
The present project is intended to provide an Open
Source focus for development of InChI facilities and applications, under the
Artistic Licence, for example:
- porting to other platforms (e.g. Java, Mac OS X)
- InChI wrappers (e.g. GUIs, Web Services)
- InChI data model(s)
- InChI syntax specifications
- InChI parsers
- InChI validators
- InChI preprocessors: much current data is too fuzzy
to create completely definitive InChIs (missing stereo descriptors, hydrogens,
etc.); it will be valuable for InChI creators to be able to create a
"best estimate" of their structures before submitting them to
InChIfication.
- InChI processors: it is not formally permissible to
edit an InChI, as this destroys its normative nature; however, it may be of value
to carry out certain operations, such as splitting a multi-moiety InChI into
components.
- InChI analysers: there are degrees of similarity
between InChIs that can usefully be managed without substructure searching;
these include analysis of tautomers, different hydrogen decoration, and
different certainty in the specification of stereochemistry.
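As a sketch of what a simple parser, processor and analyser from the list above might look like, the Python below splits an Identifier into its '/'-separated layers, lists the formula components of a multi-moiety Identifier, and reports which layers differ between two Identifiers. The function names are hypothetical and the treatment is deliberately naive: it assumes each layer prefix occurs only once, which does not hold for Identifiers with fixed-hydrogen (/f) or reconnected (/r) layers, and it does not interpret component multipliers. The example strings are given for illustration and are believed to be the Identifiers of alanine and sodium chloride.

```python
def split_layers(inchi):
    """Split an InChI string into (version, [(prefix, body), ...])."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *rest = inchi[len("InChI="):].split("/")
    layers = []
    for position, layer in enumerate(rest):
        if position == 0 and layer and not layer[0].islower():
            layers.append(("formula", layer))      # the formula layer has no prefix letter
        else:
            layers.append((layer[:1], layer[1:]))  # e.g. ("c", "1-2-3")
    return version, layers


def formula_components(inchi):
    """Formula of each moiety; components are separated by '.' in the formula layer."""
    _, layers = split_layers(inchi)
    formula = dict(layers).get("formula", "")
    return formula.split(".") if formula else []


def differing_layers(inchi_a, inchi_b):
    """Prefixes of the layers that are absent from one string or differ between the two."""
    layers_a = dict(split_layers(inchi_a)[1])      # naive: later duplicates overwrite earlier ones
    layers_b = dict(split_layers(inchi_b)[1])
    keys = layers_a.keys() | layers_b.keys()
    return sorted(k for k in keys if layers_a.get(k) != layers_b.get(k))


L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
ALANINE_NO_STEREO = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
SODIUM_CHLORIDE = "InChI=1S/ClH.Na/h1H;/q;+1/p-1"

print(differing_layers(L_ALANINE, D_ALANINE))
print(differing_layers(L_ALANINE, ALANINE_NO_STEREO))
print(formula_components(SODIUM_CHLORIDE))
```

Comparing layers in this way finds Identifiers that share a formula and connectivity but differ in stereochemical detail, without any substructure search. Splitting the remaining layers of a multi-moiety Identifier into per-component pieces needs more care than the formula split shown here.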
A discussion list is
available (InChI-discuss); comments, questions and offers of help are welcomed:
List-Id: InChI-discuss@lists.sourceforge.net
List-Subscribe: http://lists.sourceforge.net/lists/listinfo/inchi-discuss;
InChI-discuss-request@lists.sourceforge.net?subject=subscribe
List-Unsubscribe: http://lists.sourceforge.net/lists/listinfo/inchi-discuss;
InChI-discuss-request@lists.sourceforge.net?subject=unsubscribe
List-Post: InChI-discuss@lists.sourceforge.net
List-Archive: http://sourceforge.net/mailarchive/forum.php?forum=inchi-discuss
List-Help: InChI-discuss-request@lists.sourceforge.net?subject=help