Gas chromatography (GC) coupled to mass spectrometry (MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel biomarkers in metabolomics.
However, the majority of mass spectral tags (MSTs) currently measured in plant metabolomic profiling experiments remain unidentified, owing to the lack of authenticated pure reference substances and the expensive, time-consuming effort needed to maintain the mass spectral and retention index (RI) libraries required for compound identification by GC-MS. As the exchange of analytical results and other approach-related details, such as mass spectral and RI reference information, becomes increasingly common within the scientific community, open access platforms for information exchange, such as the Golm Metabolome Database (GMD), are essential.
These compounds serve as the training data set for decision trees (DT), which are applied as a supervised machine learning approach.
DT-based predictions of the most frequent substructures classify low-resolution GC-MS mass spectra of the linked (potentially unknown) metabolite with respect to the presence or absence of these chemical moieties.
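The idea of classifying a spectrum by the presence or absence of a substructure can be illustrated with a minimal sketch. It assumes that each spectrum is reduced to relative intensities at a few fixed m/z channels and that each training compound carries a boolean label for one substructure. The channel list, the intensity values and the labels below are invented for illustration, and the small Gini-based tree is a generic stand-in, not the DT implementation used for the GMD.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best split, or None."""
    best, n = None, len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree as nested tuples (feature, threshold, left, right); leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    split = best_split(X, y)
    if split is None:  # no feature separates the remaining spectra
        return majority
    j, t, _ = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    """Walk the tree until a leaf (a boolean label) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

# Hypothetical training set: relative intensities at four illustrative m/z channels.
MZ_CHANNELS = (73, 103, 147, 174)  # assumption, chosen only for the example
X_train = [
    [0.9, 0.1, 0.0, 0.8],
    [0.8, 0.0, 0.1, 0.7],
    [1.0, 0.2, 0.0, 0.9],
    [0.9, 0.7, 0.1, 0.0],
    [0.7, 0.8, 0.0, 0.1],
    [1.0, 0.6, 0.2, 0.0],
]
y_train = [True, True, True, False, False, False]  # invented "substructure present" labels

tree = build_tree(X_train, y_train)
unknown_spectrum = [0.9, 0.05, 0.0, 0.85]  # invented query spectrum
has_substructure = predict(tree, unknown_spectrum)
```

In practice one such classifier would be trained per substructure, so that an unidentified spectrum receives a vector of presence or absence calls.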
Batch processing is enabled via Simple Object Access Protocol (SOAP)-based web services, while web-based data access services expose particular database entities, adopting Representational State Transfer (ReST) principles and mass spectral standards such as NIST-MSP and JCAMP-DX.
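To indicate how a spectrum exchanged in an MSP-style text format might be consumed on the client side, the following sketch parses a simplified reading of that layout: `Key: value` header lines, a `Num Peaks` line, then whitespace-separated m/z and intensity pairs, with records separated by blank lines. The sample records are invented, the function name `parse_msp` is hypothetical, and real exports may contain fields or conventions that this sketch does not handle.

```python
import re

def parse_msp(text):
    """Parse MSP-style text into a list of dicts (simplified, see assumptions above)."""
    records = []
    # Records are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, peaks, in_peaks = {}, [], False
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if not in_peaks and ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
                # Everything after the "Num Peaks" line is treated as peak data.
                if key.strip().lower() == "num peaks":
                    in_peaks = True
            else:
                # Peak lines hold "m/z intensity" pairs, optionally ';'-separated.
                for pair in line.split(";"):
                    parts = pair.split()
                    if len(parts) >= 2:
                        peaks.append((float(parts[0]), float(parts[1])))
        fields["peaks"] = peaks
        records.append(fields)
    return records

# Invented example input; names and peak values are placeholders.
sample = """Name: hypothetical analyte A
Num Peaks: 3
73 999; 147 250;
217 120

Name: hypothetical analyte B
Num Peaks: 2
73 999
103 400
"""

records = parse_msp(sample)
```

Each parsed record keeps its header fields as strings and its peaks as `(m/z, intensity)` tuples, which is enough to feed a binning step before classification.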