Title: | Automated Spectral Deconvolution, Alignment, and Metabolite Identification in GC/MS-Based Untargeted Metabolomics |
---|---|
Description: | Automated compound deconvolution, alignment across samples, and identification of metabolites by spectral library matching in Gas Chromatography - Mass spectrometry (GC-MS) untargeted metabolomics. Outputs a table with compound names, matching scores and the integrated area of the compound for each sample. Package implementation is described in Domingo-Almenara et al. (2016) <doi:10.1021/acs.analchem.6b02927>. |
Authors: | Xavier Domingo-Almenara [aut, cre, cph], Jasen P. Finch [ctb], Adria Olomi [ctb], Sara Samino [aut], Maria Vinaixa [aut], Alexandre Perera [aut, ths], Jesus Brezmes [aut, ths], Oscar Yanes [aut, ths] |
Maintainer: | Xavier Domingo-Almenara <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0.0 |
Built: | 2025-02-03 05:12:26 UTC |
Source: | https://github.com/xdomingoal/erah-devel |
Alignment of GC-MS deconvolved compounds
alignComp(Experiment, alParameters, blocks.size = NULL)

## S4 method for signature 'MetaboSet'
alignComp(Experiment, alParameters, blocks.size = NULL)
Experiment |
A 'MetaboSet' S4 object containing the experiment data previously created by newExp and deconvolved by deconvolveComp. |
alParameters |
The software alignment parameters object previously created by setAlPar |
blocks.size |
For experiments with more than 1000 samples, and depending on the computer, alignment can be conducted by block segmentation. See details. |
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
For experiments containing more than 100 (Windows) or 1000 (Mac or Linux) samples (the exact numbers depend on the computer resources and sample type), alignment can be conducted by block segmentation. For an experiment of, e.g., 1000 samples, blocks.size can be set to 100, so that the alignment is performed as ten 100-sample experiments, which are later aligned into a single experiment.
This parameter is designed to solve the typical problem that appears when aligning under the Windows operating system: "Error: cannot allocate vector of size XX Gb". Such a problem will not appear with Mac or Linux, but several hours of computation are expected when aligning a large number of samples. Block segmentation greatly improves run-time performance.
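The partitioning described above can be pictured with a short base R sketch. It only shows how 1000 samples fall into ten blocks of 100; the helper split_into_blocks is hypothetical, is not part of eRah, and makes no claim about how alignComp segments the data internally.

```r
# Hypothetical helper (not part of eRah): split sample indices into
# consecutive blocks of at most `blocks.size` samples.
split_into_blocks <- function(n.samples, blocks.size) {
  idx <- seq_len(n.samples)
  unname(split(idx, ceiling(idx / blocks.size)))
}

blocks <- split_into_blocks(n.samples = 1000, blocks.size = 100)
length(blocks)          # number of blocks
range(lengths(blocks))  # smallest and largest block size
```

When the number of samples is not a multiple of the block size, the last block in this sketch is simply smaller than the others.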
The function returns an updated S4 'MetaboSet' class, where the GC-MS samples have now been aligned.
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
newExp
setDecPar
deconvolveComp
The list of aligned metabolites and their relative quantification for each sample in a given experiment
alignList(object, by.area = TRUE)

## S4 method for signature 'MetaboSet'
alignList(object, by.area = TRUE)
object |
A 'MetaboSet' S4 object containing the experiment data. The experiment has to be previously deconvolved, aligned and (optionally) identified. |
by.area |
if TRUE (default), eRah outputs quantification by the area of the deconvolved chromatographic peak of each compound. If FALSE, eRah outputs the intensity of the deconvolved chromatographic peak. |
Returns an alignment table containing the list of aligned metabolites and their relative quantification for each sample in a given experiment.
alignList
returns a data frame object:
AlignID |
The unique tag of each metabolite found by eRah. Each metabolite found by eRah for a given experiment has a unique AlignID tag number. |
Factor |
The Factor tag name. Each metabolite has a unique 'Factor' name to enhance visual interpretation. |
tmean |
The mean compound retention time. |
FoundIn |
The number of samples in which the compound has been detected (the number of samples where the compound area is non-zero). |
Quantification |
As many columns as samples and as many rows as metabolites, where each column name has the name of each sample. |
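As a rough illustration of working with a table of this shape, the sketch below builds a mock data frame with the documented columns and keeps the compounds detected in at least half of the samples. The values, the sample names S1 to S4, and the one-half threshold are all invented for the example; a real table would come from alignList.

```r
# Mock table mimicking the documented columns; the values are invented.
al <- data.frame(
  AlignID = c(1, 2, 3),
  Factor  = c("Factor #1", "Factor #2", "Factor #3"),
  tmean   = c(5.12, 6.40, 7.85),
  FoundIn = c(4, 1, 3),
  S1 = c(1200, 0, 300),
  S2 = c(1100, 0, 280),
  S3 = c(1300, 950, 0),
  S4 = c(1250, 0, 310)
)

sample.cols <- c("S1", "S2", "S3", "S4")

# FoundIn is documented as the number of samples with non-zero area,
# so it can be recomputed from the quantification columns as a check.
found.in <- rowSums(al[, sample.cols] > 0)

# Keep compounds detected in at least half of the samples.
kept <- al[found.in >= length(sample.cols) / 2, ]
kept$AlignID
```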
Displays basic information of a compound in the MS library.
compInfo(comp.id, id.database = mslib)
comp.id |
The DB.Id number of the compound. |
id.database |
The mass-spectra library to be compared with the empirical spectra. By default, the MassBank - Mass Bank of North America (MoNa) database is employed (mslib object). |
Returns details on a given compound such as the synonyms, CAS, KEGG, retention index, among others.
# finding proline
findComp("proline")
# we see that proline 2TMS has the DB.Id number 42, then:
compInfo(42)
This function uses the RI of the mslib database and the RT of the identified compounds to discriminate proper compound identifications.
computeRIerror( Experiment, id.database = mslib, reference.list, ri.error.type = c("relative", "absolute"), plot.results = TRUE )
Experiment |
S4 object with experiment Data, Metadata and Results. Results of experiment are used to extract RT and Compound DB Id. |
id.database |
Name of the preloaded database, in this case the regular db used by erah mslib |
reference.list |
List with the compounds and their attributes (AlignId...) |
ri.error.type |
Specify whether the absolute or relative RI error is to be computed. |
plot.results |
Shows the RI/RT plot (TRUE by default). |
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
## Not run:
ex <- computeRIerror(
  ex,
  mslib,
  reference.list = list(AlignID = c(45,67,92,120)),
  ri.error.type = "relative"
)
## End(Not run)
eRah requires an instrumental and (optionally) a phenotype .csv file for creating a new eRah project/experiment. This function automatically creates the Phenotype and Instrumental data .csv files.
createdt(path)
path |
the path where the experiment-folder is (where the experiment samples are stored). |
The experiment has to be organized as follows: all the samples related to each class have to be stored in the same folder (one folder = one class), and all the class-folders in one folder, which is the experiment folder.
Two things have to be considered at this step: .csv files are different when created by American and European computers, so errors may arise due to that fact. Also, the folder containing the samples must contain only folders. If the folder contains files (for example, already created .csv files), eRah will raise an error.
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
## Not run:
# Store all the raw data files in one different folder per class,
# and all the class-folders in one folder, which is the experiment
# folder. Then execute
createdt(path)
# where path is the experiment folder path.
# The experiment can now be started by:
ex <- newExp(instrumental="path/DEMO_inst.csv", phenotype="path/DEMO_pheno.csv", info="DEMO Experiment")
## End(Not run)
Create table containing instrumental information such as sample IDs and file names.
createInstrumentalTable(files)
files |
File paths to experiment samples. |
Creates instrumental information table based on experiment sample file paths. Columns containing further information can also be added to this.
## Not run:
library(gcspikelite)
files <- list.files(system.file('data',package = 'gcspikelite'),full.names = TRUE)
files <- files[sapply(files,grepl,pattern = 'CDF')]
instrumental <- createInstrumentalTable(files)
## End(Not run)
Create a table containing sample meta information such as sample ID and class.
createPhenoTable(files, cls)
files |
File paths to experiment samples. |
cls |
Character vector containing sample classes. |
Creates phenotype information table based on experiment sample file paths and sample classes. Columns containing further information can also be added to this.
newExp
createInstrumentalTable
## Not run:
library(gcspikelite)
data(targets)
files <- list.files(system.file('data',package = 'gcspikelite'),full.names = TRUE)
files <- files[sapply(files,grepl,pattern = 'CDF')]
phenotype <- createPhenoTable(files,as.character(targets$Group[order(targets$FileName)]))
## End(Not run)
The final eRah list of aligned and identified metabolites and their relative quantification for each sample in a given experiment
dataList(Experiment, id.database = mslib, by.area = TRUE)

## S4 method for signature 'MetaboSet'
dataList(Experiment, id.database = mslib, by.area = TRUE)
Experiment |
A 'MetaboSet' S4 object containing the experiment data. The experiment has to be previously deconvolved, aligned and identified. |
id.database |
The mass-spectra library to be compared with the empirical spectra. By default, the MassBank - Mass Bank of North America (MoNa) database is employed (mslib object). |
by.area |
if TRUE (default), eRah outputs quantification by the area of the deconvolved chromatographic peak of each compound. If FALSE, eRah outputs the intensity of the deconvolved chromatographic peak. |
Returns an identification and alignment table containing the list of aligned and identified metabolites (names) and their relative quantification for each sample in a given experiment.
alignList
returns an S3 object:
AlignID |
The unique tag of each metabolite found by eRah. Each metabolite found by eRah for a given experiment has a unique AlignID tag number. |
tmean |
The mean compound retention time. |
FoundIn |
The number of samples in which the compound has been detected (the number of samples where the compound area is non-zero). |
Name.X |
the name of the Xst/nd/rd... hit. idList returns as many X (hits) as n.putative selected with
MatchFactor.X |
The match factor/score of spectral similarity (spectral correlation). |
DB.Id.X |
The identification number of the library. Each metabolite in the reference library has a different DB.Id number. |
CAS.X |
the CAS number of each identified metabolite. |
Quantification |
As many columns as samples and as many rows as metabolites, where each column name has the name of each sample. |
Deconvolution of GC-MS data
deconvolveComp(Experiment, decParameters, samples.to.process = NULL, down.sample = FALSE, virtualScansPerSecond = NULL)

## S4 method for signature 'MetaboSet'
deconvolveComp(Experiment, decParameters, samples.to.process = NULL, down.sample = FALSE, virtualScansPerSecond = NULL)
Experiment |
A 'MetaboSet' S4 object containing the experiment data previously created by newExp. |
decParameters |
The software deconvolution parameters object previously created by setDecPar |
samples.to.process |
Vector indicating which samples are to be processed. |
down.sample |
If TRUE, chromatograms are down sampled to define one peak with 10 scan points (according to the minimum peak width). This is to process longer chromatograms with wider peak widths (more than 20 seconds peak width and small scans per second values). See details. |
virtualScansPerSecond |
A virtual scans-per-second value. If chromatograms are downsampled (for example, a sampling frequency of 1 scan per second was used for a mean peak width of 1), eRah may not perform as expected. In these cases, the best solution is to re-acquire the samples. However, by selecting a different (virtual) scans-per-second frequency, eRah can upsample the data and process it more effectively. |
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
eRah uses multivariate methods whose run-time performance depends on the amount of data to be analyzed. When peaks are wide and the scans per second is also a small value, the number of points (scans) that define a peak might be too many, leading to poor run-time performance. To solve that, use down.sample=TRUE to allow eRah to define a peak with 10 scan points and analyze the data more efficiently.
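The number of scan points that describe a peak is the peak width multiplied by the acquisition rate, which is why wide peaks quickly accumulate many points. The arithmetic below is only an illustration with assumed values (a 20-second peak at 5 scans per second); it is not eRah's internal down-sampling code.

```r
# Illustrative arithmetic only; this is not eRah's internal code.
peak.width.s     <- 20   # assumed peak width in seconds
scans.per.second <- 5    # assumed acquisition rate

points.per.peak <- peak.width.s * scans.per.second

# Factor by which the number of points would have to be reduced so that
# a peak is described by about 10 points (the target stated for
# down.sample).
target.points <- 10
reduction.factor <- points.per.peak / target.points

points.per.peak
reduction.factor
```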
The function returns an updated S4 'MetaboSet' class, where the GC-MS samples have now been deconvolved.
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
## Not run:
# Deconvolve data from an experiment created by newExp.
# ex <- newExp(instrumental="path")

# The following will set eRah for analyzing the chromatograms
# from minutes 5 to 15, without taking into account the masses
# 35:69, 73:75 and 147:149, with a minimum peak width of 0.7 seconds.
ex.dec.par <- setDecPar(min.peak.width=0.7, min.peak.height=5000,
    noise.threshold=500, avoid.processing.mz=c(35:69,73:75,147:149),
    analysis.time=c(5,15))

# And now deconvolve the compounds in the samples:
# ex <- deconvolveComp(ex, decParameters=ex.dec.par)
## End(Not run)
"eRah_DB"
The eRah_DB class contains the slots for storing and accessing a MS library.
name
The name of the stored library
version
The version of the stored library. It also serves as the database identifier, so it should be unique; it is used to check whether the same database was used in other experiments.
info
Character vector containing complementary information about the library.
database
A list of S3 objects, in which each object contains the information on a different compound.
Xavier Domingo-Almenara.
The classes of a given experiment.
expClasses(object)

## S4 method for signature 'MetaboSet'
expClasses(object)
object |
A 'MetaboSet' S4 object containing the experiment. |
Returns the class details of the experiment.
metaData phenoData
Export spectra to CEF format for comparison with the NIST library through MassHunter interface.
export2CEF(Experiment, export.id = NULL, id.database = mslib, store.path = getwd())
Experiment |
A 'MetaboSet' S4 object containing the experiment. |
export.id |
If NULL, all the spectra in the experiment will be exported. Otherwise, only the AlignID in export.id will be exported |
id.database |
The mass-spectra library used in the experiment. |
store.path |
The path where the converted files are to be exported. |
Export spectra to MSP format for comparison with the NIST library.
export2MSP( Experiment, export.id = NULL, id.database = mslib, store.path = getwd(), alg.version = 1 )
Experiment |
A 'MetaboSet' S4 object containing the experiment. |
export.id |
If NULL, all the spectra in the experiment will be exported. Otherwise, only the AlignID in export.id will be exported |
id.database |
The mass-spectra library used in the experiment. |
store.path |
The path where the converted files are to be exported. |
alg.version |
Different algorithm implementations. Users have to choose the version that works with their NIST MSearch or other software version. By default, alg.version is set to 1. If that does not work, try setting alg.version to 2. |
Finds compounds in the MS library by Name, CAS or chemical formula.
findComp(name = NULL, id.database = mslib, CAS = NULL, chem.form = NULL)
name |
The name of the compound to be found. |
id.database |
The mass-spectra library to be compared with the empirical spectra. By default, the MassBank - Mass Bank of North America (MoNa) database is employed (mslib object). |
CAS |
The CAS number of the compound to be found. |
chem.form |
The chemical formula of the compound to be found. |
findComp
returns an S3 object:
DB.Id |
The identification number of the library. Each metabolite in the reference library has a different DB.Id number. |
Compound Name |
Compound Name. |
CAS |
CAS number |
Formula |
Chemical Formula. |
# finding proline
findComp("proline")
# be careful, exact matches are not supported,
# as well as different names like these cases:
findComp("L-proline (2TMS)")
findComp("proline 2")
Identification of compounds. Each empirical spectrum is compared against an MS library.
identifyComp(Experiment, id.database = mslib, mz.range = NULL, n.putative = 3)

## S4 method for signature 'MetaboSet'
identifyComp(Experiment, id.database = mslib, mz.range = NULL, n.putative = 3)
Experiment |
A 'MetaboSet' S4 object containing the experiment data previously created by newExp, deconvolved by deconvolveComp and optionally aligned by alignComp. |
id.database |
The mass-spectra library to be compared with the empirical spectra. By default, the MassBank [2] - Mass Bank of North America (MoNa) database is employed. |
mz.range |
The same as in alignComp. If already specified in alignComp, there is no need to specify it again. If not, it has to be specified. |
n.putative |
The number of hits (compound candidate names) to be returned for each spectrum found. |
The function returns an updated S4 'MetaboSet' class, where the GC-MS samples have now been identified.
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
[2] MassBank: A public repository for sharing mass spectral data for life sciences, H. Horai, M. Arita, S. Kanaya, Y. Nihei, T. Ikeda, K. Suwa. Y. Ojima, K. Tanaka, S. Tanaka, K. Aoshima, Y. Oda, Y. Kakazu, M. Kusano, T. Tohge, F. Matsuda, Y. Sawada, M. Yokota Hirai, H. Nakanishi, K. Ikeda, N. Akimoto, T. Maoka, H. Takahashi, T. Ara, N. Sakurai, H. Suzuki, D. Shibata, S. Neumann, T. Iida, K. Tanaka, K. Funatsu, F. Matsuura, T. Soga, R. Taguchi, K. Saito and T. Nishioka, J. Mass Spectrom., 45 (2010) 703-714.
newExp
alignComp
setAlPar
setDecPar
The list of identified metabolites in a given experiment
idList(object, id.database = mslib)

## S4 method for signature 'MetaboSet'
idList(object, id.database = mslib)
object |
A 'MetaboSet' S4 object containing the experiment data. The experiment has to be previously deconvolved, aligned and identified. |
id.database |
The mass-spectra library to be compared with the empirical spectra. By default, the MassBank - Mass Bank of North America (MoNa) database is employed (mslib object). |
Returns an identification table containing the names, match scores, and other variables for a given experiment.
idList
returns an S3 object:
AlignID |
The unique tag of each metabolite found by eRah. Each metabolite found by eRah for a given experiment has a unique AlignID tag number. |
tmean |
The mean compound retention time. |
Name.X |
the name of the Xst/nd/rd... hit. idList returns as many X (hits) as n.putative selected with
FoundIn |
The number of samples in which the compound has been detected (the number of samples where the compound area is non-zero). |
MatchFactor.X |
The match factor/score of spectral similarity (spectral correlation). |
DB.Id.X |
The identification number of the library. Each metabolite in the reference library has a different DB.Id number. |
CAS.X |
the CAS number of each identified metabolite. |
Import the Golm Metabolome Database.
importGMD(filename, DB.name, DB.version, DB.info, type = c("VAR5.ALK","VAR5.FAME","MDN35.ALK", "MDN35.FAME"))
filename |
The filepath containing the GMD database file. |
DB.name |
The name of the database (each user may choose their own name) |
DB.version |
The version of the database (each user may choose their own version) |
DB.info |
Some info about the database for further reference |
type |
The type of RI to be imported from the database |
For more details, please see the eRah manual
Import MS libraries in MSP format to eRah DB format.
importMSP(filename, DB.name, DB.version, DB.info)
filename |
The filepath containing the MSP library file. |
DB.name |
The name of the database (each user may choose their own name) |
DB.version |
The version of the database (each user may choose their own version) |
DB.info |
Some info about the database for further reference |
The MSP input file should look like:
—–
Name: Metabolite_name
Formula: H2O
MW: 666
ExactMass: 666.266106
CAS#: 11-22-3
DB#: 1
Comments: Metabolite_name reference standard
Num Peaks: XX
53 1; 54 2; 55 5; 56 2; 57 2;
58 14; 59 18; 60 1000; 61 2; 67 1;
Name: Metabolite_name_2
Formula: H2O2
MW: 999
ExactMass: 999.266106
CAS#: 22-33-4
DB#: 2
Comments: Metabolite_name_"" reference standard
Num Peaks: XX
66 10; 67 1000; 155 560; 156 800; 157 2;
158 14; 159 1; 160 100; 161 2; 167 1;
——-
OR
—–
Name: Metabolite_name
Formula: H2O
MW: 666
ExactMass: 666.266106
CASNO: 11-22-3
DB#: 1
Comment: Metabolite_name reference standard
Num peaks: XX
53 1
54 2
55 5
Name: Metabolite_name_2
Formula: H2O2
MW: 999
ExactMass: 999.266106
CASNO: 22-33-4
DB#: 2
Comment: Metabolite_name_"" reference standard
Num Peaks: XX
66 10
67 1000
155 560
——-
Or combinations of both.
For more details, please see the eRah manual.
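Both layouts shown above carry the same information in their peak lines: pairs of m/z and intensity, either several per line separated by semicolons or one pair per line. The base R sketch below reads peak lines of either style into a data frame. The helper parse_msp_peaks is hypothetical and is not how importMSP is implemented; it only illustrates the two layouts.

```r
# Hypothetical helper (not part of eRah): parse MSP peak lines into a
# two-column data frame of m/z and intensity.
parse_msp_peaks <- function(lines) {
  # Split on semicolons so both layouts reduce to "mz intensity" tokens.
  tokens <- trimws(unlist(strsplit(lines, ";", fixed = TRUE)))
  tokens <- tokens[nzchar(tokens)]
  parts <- strsplit(tokens, "[[:space:]]+")
  data.frame(
    mz        = as.numeric(vapply(parts, `[`, character(1), 1)),
    intensity = as.numeric(vapply(parts, `[`, character(1), 2))
  )
}

semicolon.style <- c("53 1; 54 2; 55 5; 56 2; 57 2;",
                     "58 14; 59 18; 60 1000; 61 2; 67 1;")
line.style <- c("53 1", "54 2", "55 5")

peaks.a <- parse_msp_peaks(semicolon.style)
peaks.b <- parse_msp_peaks(line.style)

# m/z of the most intense peak in the first example
peaks.a$mz[which.max(peaks.a$intensity)]
```

Because every line is split on semicolons first, a file that mixes the two styles is handled the same way in this sketch.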
"MetaboSet"
The MetaboSet class is a single generic class valid for all sorts of metabolomic studies regardless of the experimental platform, the statistical processing and the annotation stage. It is the core operation class of eRah.
MetaboSet
Info
Slot Info stores the general information of the experiment and the experimental platform used in the analysis of the biological samples.
Data
Slot Data contains either the raw data or the path of the files. It also contains the list of the selected features (deconvolved compounds). The Parameters subslot stores the information regarding the feature selector algorithm (type, parameters, version...) and the experimental platform used.
MetaData
Slot MetaData has two slots. The Instrumental slot stores a data frame with some mandatory fields (filename, date, time, sampleID) and optional fields related to the experimental platform (Column ID, Column Type, Ioniser,...). The Phenotypic slot contains a data frame with the sample and experimental information (phenotypes, longitudinal data,...).
Results
The Results slot stores the information related to the statistical and identification results. The Parameters slot contains all the values of the parameters used in the identification and statistical functions. The Identification slot holds the results of the identification and/or annotation steps. The results of the statistical functions are saved in the Statistics slot.
Xavier Domingo-Almenara, Arnald Alonso and Francesc Fernandez-Albert.
Displays the Experiment metadata
metaData(object)

## S4 method for signature 'MetaboSet'
metaData(object)
object |
A 'MetaboSet' S4 object containing the experiment. |
The default mass spectral library of eRah, which is the MassBank repository.
data(mslib)
An object of class eRah_DB of length 1.
This is the default eRah MS library, which is automatically loaded with the eRah package. It contains almost 500 MS spectra. For details, see the reference below.
The TOF-MS spectra were contributed by Kazusa DNA Research Institute, the Engineering Department of Osaka University and Plant Science Center of RIKEN.
MassBank (http://www.massbank.jp/)
[1] MassBank: A public repository for sharing mass spectral data for life sciences, H. Horai, M. Arita, S. Kanaya, Y. Nihei, T. Ikeda, K. Suwa. Y. Ojima, K. Tanaka, S. Tanaka, K. Aoshima, Y. Oda, Y. Kakazu, M. Kusano, T. Tohge, F. Matsuda, Y. Sawada, M. Yokota Hirai, H. Nakanishi, K. Ikeda, N. Akimoto, T. Maoka, H. Takahashi, T. Ara, N. Sakurai, H. Suzuki, D. Shibata, S. Neumann, T. Iida, K. Tanaka, K. Funatsu, F. Matsuura, T. Soga, R. Taguchi, K. Saito and T. Nishioka, J. Mass Spectrom., 45, 703-714 (2010)
Sets a new experiment for eRah
newExp(instrumental, phenotype = NULL, info = character())
instrumental |
A data.frame containing the sample instrumental information. |
phenotype |
(optional) A data.frame containing sample phenotype information. |
info |
Experiment description |
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
newExp
returns an S4 object of the class 'MetaboSet'.
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
createInstrumentalTable
createPhenoTable
setDecPar
setAlPar
## Not run:
library(gcspikelite)
data(targets)
files <- list.files(system.file('data',package = 'gcspikelite'),full.names = TRUE)
files <- files[sapply(files,grepl,pattern = 'CDF')]
instrumental <- createInstrumentalTable(files)
phenotype <- createPhenoTable(files,as.character(targets$Group[order(targets$FileName)]))
ex <- newExp(instrumental = instrumental, phenotype = phenotype, info = "DEMO Experiment")
## End(Not run)
Displays the Experiment phenotypic data (if included).
phenoData(object)

## S4 method for signature 'MetaboSet'
phenoData(object)
object |
A 'MetaboSet' S4 object containing the experiment. |
Plots the chromatographic profiles of the compounds found by eRah. Similar to plotProfile, but with two sub-windows showing the chromatographic profiles before and after alignment.
plotAlign(Experiment, AlignId, per.class = T, xlim = NULL)

## S4 method for signature 'MetaboSet'
plotAlign(Experiment, AlignId, per.class = T, xlim = NULL)
Experiment |
A 'MetaboSet' S4 object containing the experiment after being deconvolved, aligned and (optionally) identified. |
AlignId |
The Id identifier of the compound to be shown. |
per.class |
logical. If TRUE, the profiles are shown with one color per class; if FALSE, one color per sample. |
xlim |
x axis (retention time) limits (see
Xavier Domingo-Almenara. [email protected]
Plot the sample chromatogram
plotChr(Experiment, N.sample = 1, type = c("BIC", "TIC", "EIC"), xlim = NULL, mz = NULL)

## S4 method for signature 'MetaboSet'
plotChr(Experiment, N.sample = 1, type = c("BIC", "TIC", "EIC"), xlim = NULL, mz = NULL)
Experiment |
A 'MetaboSet' S4 object containing the experiment. |
N.sample |
Integer. The number of the sample to query. |
type |
The type of plotting, Base Ion Chromatogram (BIC), Total Ion Chromatogram (TIC), or Extracted Ion Chromatogram (EIC). |
xlim |
The range in minutes, separated by commas: c(rt.min, rt.max), of the limits of plotting. By default, the whole chromatogram is plotted. |
mz |
Only used when EIC is selected. The range separated by commas: c(mz.min, mz.max), or a vector of numbers: c(50,67,80), of the masses to be plotted. |
## Not run:
plotChr(Experiment, 1, "BIC")

# Plots from minute 5 to 7.
plotChr(Experiment, 1, "TIC", xlim=c(5,7))

# Plots from minute 5 to 7, and only the masses from 50 to 70.
plotChr(Experiment, 1, "EIC", mz=50:70, xlim=c(5,7))

# Plots the EIC from minute 7 to 7.5, and only the masses 50, 54 and 70.
plotChr(Experiment, 1, "EIC", xlim=c(7,7.5), mz=c(50,54,70))
## End(Not run)
Plots the chromatographic profiles of the compounds found by eRah.
plotProfile(Experiment,AlignId, per.class = T, xlim = NULL, cols=NULL) ## S4 method for signature 'MetaboSet' plotProfile(Experiment, AlignId, per.class = T, xlim = NULL, cols = NULL)
Experiment |
A 'MetaboSet' S4 object containing the experiment after being deconvolved, aligned and (optionally) identified. |
AlignId |
The identifier (Id) of the compound to be shown. |
per.class |
logical. If TRUE (the default), the profiles are shown with one color per class; if FALSE, with one color per sample. |
xlim |
x axis (retention time) limits (see
cols |
vector of colors. Colors are used cyclically. |
Xavier Domingo-Almenara. [email protected]
Plots the empirical spectra found by eRah, and allows comparing them with the reference spectra.
plotSpectra(Experiment, AlignId, n.putative = 1, compare = T, id.database = mslib, comp.db = NULL, return.spectra = F, draw.color = "purple", xlim = NULL) ## S4 method for signature 'MetaboSet' plotSpectra( Experiment, AlignId, n.putative = 1, compare = T, id.database = mslib, comp.db = NULL, return.spectra = F, draw.color = "purple", xlim = NULL )
Experiment |
A 'MetaboSet' S4 object containing the experiment after being deconvolved, aligned and (optionally) identified. |
AlignId |
The identifier (Id) of the compound to be shown. |
n.putative |
The hit number (position) to be returned when comparing the empirical spectrum with the reference. See details. |
compare |
logical. If TRUE, then the reference spectrum from the library is shown for comparison. |
id.database |
The mass-spectra library to be compared with the empirical spectra. By default, the MassBank [2] - MassBank of North America (MoNA) database is used. |
comp.db |
If you want to compare the empirical spectrum with another spectrum from the database, select the comp.db number from the database. |
return.spectra |
logical. If TRUE, the function returns the empirical spectrum for the selected compound. |
draw.color |
Selects the color for the reference spectrum (see
xlim |
x axis (mass - m/z) limits (see
When identification is applied (see identifyComp), the number of hits to be returned (n.putative) has to be selected. Therefore, here you can compare the empirical spectrum (found by eRah) with each of the n.putative hits (1, 2, ...) returned by identifyComp.
plotSpectra returns a vector when return.spectra=TRUE.
x |
vector. Contains the empirical spectrum. |
Xavier Domingo-Almenara. [email protected]
[1] eRah: an R package for spectral deconvolution, alignment, and metabolite identification in GC/MS-based untargeted metabolomics. Xavier Domingo-Almenara, Alexandre Perera, Maria Vinaixa, Sara Samino, Xavier Correig, Jesus Brezmes, Oscar Yanes. (2016) Article in Press.
[2] MassBank: A public repository for sharing mass spectral data for life sciences, H. Horai, M. Arita, S. Kanaya, Y. Nihei, T. Ikeda, K. Suwa, Y. Ojima, K. Tanaka, S. Tanaka, K. Aoshima, Y. Oda, Y. Kakazu, M. Kusano, T. Tohge, F. Matsuda, Y. Sawada, M. Yokota Hirai, H. Nakanishi, K. Ikeda, N. Akimoto, T. Maoka, H. Takahashi, T. Ara, N. Sakurai, H. Suzuki, D. Shibata, S. Neumann, T. Iida, K. Tanaka, K. Funatsu, F. Matsuura, T. Soga, R. Taguchi, K. Saito and T. Nishioka, J. Mass Spectrom., 45, 703-714 (2010)
"RawDataParameters"
The RawDataParameters class contains the slots for storing and accessing an MS sample, and the essential parameters for its processing (deconvolution).
data
The data matrix of the sample to be processed
min.mz
The minimum acquired m/z value
max.mz
The maximum acquired m/z value
start.time
Starting time of acquisition
mz.resolution
m/z resolution
scans.per.second
Scans per second
avoid.processing.mz
The m/z values that are not to be processed
min.peak.width
Minimum peak width (stored in scans)
min.peak.height
Minimum peak height
noise.threshold
The noise threshold
compression.coef
Compression coefficient (parameter for Orthogonal Signal Deconvolution)
Xavier Domingo-Almenara.
Missing compounds recovery: fits a general model (all the compounds above a certain minimum number of samples) to all the samples.
recMissComp(Experiment, min.samples, free.model = F) ## S4 method for signature 'MetaboSet' recMissComp(Experiment, min.samples, free.model = F)
Experiment |
A 'MetaboSet' S4 object containing the experiment data previously created by newExp, deconvolved by deconvolveComp and aligned by alignComp. |
min.samples |
The minimum number of samples in which a compound has to appear for it to be searched for in the remaining samples, where it is missing. |
free.model |
If TRUE, the spectra found in the samples where the compound is missing are used to compute the final average spectrum. (See details) |
WARNING: If compounds were previously identified, they have to be identified again after applying the "recMissComp" function. This means that the "identifyComp" function always has to be executed after "recMissComp" for identification of compounds, even if "identifyComp" has already been applied.
It is recommended to always leave the free.model parameter set to FALSE (except for carbon tracking applications). This is because the spectra from the samples where the compound is missing are usually affected by noise, which could decrease the matching score for a given compound.
The function returns an updated S4 'MetaboSet' object, in which the GC-MS samples have been aligned.
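The ordering described in the warning above can be sketched as a small wrapper. This is only an illustration: the wrapper name recover.and.identify is hypothetical, identifyComp is called with its default arguments only, and the code assumes that library(erah) is attached and that the experiment has already been deconvolved and aligned.

```r
# Hypothetical helper (not part of eRah): recover missing compounds, then identify.
# Assumes library(erah) is attached and `Experiment` is a deconvolved, aligned MetaboSet.
recover.and.identify <- function(Experiment, min.samples) {
  # free.model = FALSE follows the recommendation given in the details.
  Experiment <- recMissComp(Experiment, min.samples = min.samples, free.model = FALSE)
  # Identification is run after recovery, even if it was run before.
  identifyComp(Experiment)
}
```

With such a helper, a call like recover.and.identify(ex, min.samples = 6) (the object name and the value are placeholders) keeps the two steps in the required order.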
Xavier Domingo-Almenara. [email protected]
[1] Domingo-Almenara X, et al. Compound deconvolution in GC-MS-based metabolomics by blind source separation. Journal of Chromatography A (2015). Vol. 1409: 226-233. DOI: 10.1016/j.chroma.2015.07.044
newExp
alignComp
setAlPar
setDecPar
Returns basic information on the samples.
sampleInfo(Experiment, N.sample = 1) ## S4 method for signature 'MetaboSet' sampleInfo(Experiment, N.sample = 1)
Experiment |
A 'MetaboSet' S4 object containing the experiment. |
N.sample |
Integer. The number of the sample to query. |
Returns details on a given sample of the experiment, such as name, start time, end time, minimum and maximum acquired m/z and scans per second.
Setting alignment parameters for eRah.
setAlPar(min.spectra.cor, max.time.dist,mz.range = c(70:600))
min.spectra.cor |
Minimum spectral correlation value, from 0 (not similar) to 1 (very similar). This value sets how similar two or more compounds have to be to be considered for alignment with each other. |
max.time.dist |
Maximum retention time distance. This value (in seconds) sets how far apart two or more compounds can be and still be considered for alignment with each other. |
mz.range |
The range of masses that is considered when comparing spectra. |
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
## Not run:
# The following sets eRah to align compounds that are
# at least 90 (per cent) similar, and whose peaks are at a
# maximum distance of 2 seconds. All the masses are considered when
# computing the spectral similarity.
ex.al.par <- setAlPar(min.spectra.cor=0.90, max.time.dist=2, mz.range=1:600)
## End(Not run)
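The effect of the three alignment parameters can be illustrated with a base-R sketch. It is not eRah's alignment algorithm: the helper can.align, the toy spectra, and the use of Pearson correlation as the similarity measure are all assumptions made for this example.

```r
# Two hypothetical compound spectra, stored as intensities indexed by m/z 1..600.
spectrum.a <- numeric(600)
spectrum.a[c(73, 100, 147, 204)] <- c(900, 300, 500, 1000)
spectrum.b <- numeric(600)
spectrum.b[c(73, 100, 147, 204)] <- c(100, 310, 450, 980)
rt.a <- 420.0  # retention times in seconds
rt.b <- 421.2

# Hypothetical helper: TRUE when both criteria are met.
can.align <- function(s1, s2, rt1, rt2, min.spectra.cor, max.time.dist, mz.range) {
  spectral.cor <- cor(s1[mz.range], s2[mz.range])  # Pearson correlation (assumed measure)
  time.dist <- abs(rt1 - rt2)
  spectral.cor >= min.spectra.cor && time.dist <= max.time.dist
}

# The spectra differ mainly at m/z 73, so the chosen mz.range changes the outcome.
print(can.align(spectrum.a, spectrum.b, rt.a, rt.b, 0.90, 2, mz.range = 70:600))
print(can.align(spectrum.a, spectrum.b, rt.a, rt.b, 0.90, 2, mz.range = 76:600))
```

In this toy case the pair fails the similarity criterion when m/z 73 is included and passes once it is excluded, which shows why mz.range matters when comparing spectra.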
Sets Software Parameters for eRah.
setDecPar( min.peak.width, min.peak.height = 2500, noise.threshold = 500, avoid.processing.mz = c(73:75, 147:149), compression.coef = 2, analysis.time = 0 )
min.peak.width |
Minimum compound peak width (in seconds). This is a critical parameter that conditions the efficiency of eRah. Typically, this should be half of the mean compound width. |
min.peak.height |
Minimum compound peak height |
noise.threshold |
Data below this threshold will be considered as noise |
avoid.processing.mz |
The masses to be excluded from processing. Typically, in GC-MS those masses are 73, 74, 75, 147, 148 and 149, since they are ubiquitous mass fragments typically generated from compounds carrying a trimethylsilyl moiety. |
compression.coef |
Data is compressed when using the orthogonal signal deconvolution (OSD) algorithm according to this value. A compression level of 2 is recommended. |
analysis.time |
The chromatographic retention time window to process, in minutes (e.g. c(5, 15)). If 0, the whole chromatogram is processed. |
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
newExp
deconvolveComp
alignComp
setAlPar
## Not run:
# The following sets eRah to analyze the chromatograms
# from minutes 5 to 15, without taking into account the masses
# 35:69, 73:75, 147:149, with a minimum peak width of 0.7 seconds.
ex.dec.par <- setDecPar(min.peak.width = 0.7, min.peak.height = 5000, noise.threshold = 500, avoid.processing.mz = c(35:69,73:75,147:149), analysis.time = c(5,15))
## End(Not run)
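As a rough guide to what these values mean for a given acquisition, the following base-R sketch converts the minimum peak width from seconds to scans and lists the masses left after exclusion. It is a sketch under stated assumptions, not eRah's internal code: the acquisition rate of 5 scans per second, the acquired range 35:600, and all variable names are hypothetical.

```r
# Values taken from the example above.
min.peak.width.sec <- 0.7
avoid.processing.mz <- c(35:69, 73:75, 147:149)

# Hypothetical acquisition settings (assumptions for this sketch).
scans.per.second <- 5
mz.acquired <- 35:600

# Peak width expressed in scans: seconds multiplied by the scan rate.
min.peak.width.scans <- min.peak.width.sec * scans.per.second

# Masses that remain after removing the excluded ones.
mz.processed <- setdiff(mz.acquired, avoid.processing.mz)

print(min.peak.width.scans)
print(length(mz.processed))
print(range(mz.processed))
```

A width of only a few scans means the peak is sampled sparsely, so min.peak.width is worth checking against the instrument's scan rate.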
Show MetaboSet object
## S4 method for signature 'MetaboSet' show(object)
object |
S4 object of class MetaboSet |
show-MetaboSet
This function uses the retention indices (RI) of the mslib database and the retention times (RT) of the identified compounds to discriminate proper compound identifications.
showRTRICurve( Experiment, reference.list, nAnchors = 4, ri.thrs = "1R", id.database = mslib )
Experiment |
S4 object with experiment Data, Metadata and Results. Results of experiment are used to extract RT and Compound DB Id. |
reference.list |
List with the compounds and their attributes (AlignId...) |
nAnchors |
The desired equivalent number of degrees of freedom for the smooth.spline function |
ri.thrs |
Retention Index threshold given by the user to discriminate between identification results |
id.database |
Name of the preloaded database (mslib by default, the regular db used by erah) |
See eRah vignette for more details. To open the vignette, execute the following code in R: vignette("eRahManual", package="erah")
Xavier Domingo-Almenara. [email protected]
[1] Xavier Domingo-Almenara, et al., eRah: A Computational Tool Integrating Spectral Deconvolution and Alignment with Quantification and Identification of Metabolites in GC-MS-Based Metabolomics. Analytical Chemistry (2016). DOI: 10.1021/acs.analchem.6b02927
## Not run:
# The following sets eRah to determine which identified compounds are within the RI threshold
RTRICurve <- showRTRICurve(ex, list, nAnchors=4, ri.thrs='1R')
## End(Not run)
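The idea of an RT-RI curve can be sketched with base R alone. This is an illustration rather than the function's implementation: the anchor and candidate values are made up, and the 5 per cent relative threshold is a hypothetical stand-in, since the meaning of values such as "1R" for ri.thrs is not described here. Only the use of smooth.spline with nAnchors as its equivalent degrees of freedom follows the argument description.

```r
# Hypothetical anchor compounds: retention time (min) and library retention index.
rt <- c(5.2, 6.8, 8.1, 9.9, 11.4, 13.0, 14.7, 16.3)
ri <- c(1050, 1180, 1290, 1440, 1570, 1700, 1840, 1975)

# Smooth RT -> RI curve; df plays the role described for nAnchors.
nAnchors <- 4
fit <- smooth.spline(rt, ri, df = nAnchors)

# Hypothetical candidate identifications to check against the curve.
candidates <- data.frame(rt = c(7.5, 12.0), ri.library = c(1240, 1900))
ri.predicted <- predict(fit, candidates$rt)$y

# Relative RI error in per cent, compared with an assumed 5 per cent threshold.
ri.error <- 100 * abs(ri.predicted - candidates$ri.library) / ri.predicted
candidates$within.threshold <- ri.error <= 5
print(candidates)
```

Here the first candidate lies close to the fitted curve and is kept, while the second has a library RI far from what its retention time predicts and is flagged.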