Abstract
This chapter is a short introduction to the data analysis pipeline typically used to analyze Raman spectra. We emphasize that this pipeline must be tailored to the specific application of interest. Nevertheless, a tailored pipeline always consists of the same general procedures, applied sequentially. These procedures correct for artefacts, standardize the measured spectral data, and translate the spectroscopic signals into higher-level information. They can be arranged into three groups: data pre-treatment, pre-processing, and modeling. Pre-treatment aims to correct for non-sample-dependent artefacts, such as cosmic spikes and contributions of the measurement device. The next block of procedures, called pre-processing, consists of smoothing, baseline correction, normalization, and dimension reduction. Thereafter, the analysis model is constructed and its performance is evaluated. Every data analysis pipeline should be composed of procedures from these three groups, and we describe each group in this chapter. After describing data pre-treatment, pre-processing, and modeling, we summarize trends in the analysis of Raman spectra, namely model transfer approaches and data fusion. At the end of the chapter, we condense the material into guidelines for the analysis of Raman spectra.
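As a rough illustration of the three groups named in the abstract, the sketch below chains pre-treatment, pre-processing, and modeling on toy data. It is not the chapter's implementation, and every function name, parameter value, and the synthetic data generator are assumptions made for this example. Only NumPy is used, so simple stand-ins replace the methods usually applied in practice: a running-median test for spike removal, a moving average instead of Savitzky-Golay smoothing, an iterative polynomial fit for the baseline, vector normalization, PCA for dimension reduction, and a nearest-centroid classifier as the model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_synthetic_spectra(n_per_class=20, n_points=200, seed=0):
    """Hypothetical toy data: two classes with one Gaussian band each."""
    rng = np.random.default_rng(seed)
    wavenumbers = np.linspace(400.0, 1800.0, n_points)
    labels = np.repeat([0, 1], n_per_class)
    centers = np.where(labels == 0, 1000.0, 1400.0)[:, None]
    bands = np.exp(-0.5 * ((wavenumbers - centers) / 60.0) ** 2)
    background = 0.5 + 2e-4 * (wavenumbers - 400.0)  # slowly varying baseline
    spectra = bands + background + rng.normal(0.0, 0.01, bands.shape)
    spectra[0, 50] += 10.0  # one artificial cosmic spike
    return wavenumbers, spectra, labels


def remove_spikes(spectra, window=7, threshold=6.0):
    """Pre-treatment: replace spike-like outliers by a running median."""
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    running_median = np.median(sliding_window_view(padded, window, axis=1), axis=-1)
    residual = spectra - running_median
    # Robust per-spectrum noise estimate from the median absolute deviation.
    sigma = 1.4826 * np.median(np.abs(residual), axis=1, keepdims=True) + 1e-12
    return np.where(np.abs(residual) > threshold * sigma, running_median, spectra)


def smooth(spectra, window=5):
    """Pre-processing: moving average (a stand-in for Savitzky-Golay)."""
    kernel = np.ones(window) / window
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def subtract_polynomial_baseline(wavenumbers, spectra, degree=3, iterations=20):
    """Pre-processing: iterative polynomial fit, clipping the data to the fit."""
    span = wavenumbers.max() - wavenumbers.min()
    x = 2.0 * (wavenumbers - wavenumbers.min()) / span - 1.0  # rescale to [-1, 1]
    design = np.vander(x, degree + 1)
    work = spectra.copy()
    for _ in range(iterations):
        coeffs = np.linalg.lstsq(design, work.T, rcond=None)[0]
        baseline = (design @ coeffs).T
        work = np.minimum(work, baseline)  # suppress bands before the next fit
    return spectra - baseline


def vector_normalize(spectra):
    """Pre-processing: scale every spectrum to unit Euclidean length."""
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    return spectra / np.where(norms == 0, 1.0, norms)


def preprocess(wavenumbers, spectra):
    """Apply pre-treatment and pre-processing in the order given in the abstract."""
    spectra = remove_spikes(spectra)
    spectra = smooth(spectra)
    spectra = subtract_polynomial_baseline(wavenumbers, spectra)
    return vector_normalize(spectra)


def fit_pca(train, n_components=2):
    """Dimension reduction: PCA loadings estimated from training spectra only."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def nearest_centroid_predict(train_scores, train_labels, test_scores):
    """Modeling: assign each test spectrum to the closest class mean."""
    classes = np.unique(train_labels)
    centroids = np.array(
        [train_scores[train_labels == c].mean(axis=0) for c in classes]
    )
    distances = np.linalg.norm(test_scores[:, None, :] - centroids[None], axis=2)
    return classes[distances.argmin(axis=1)]
```

A hold-out evaluation could then split the pre-processed spectra into training and test sets, call `fit_pca` on the training part only, project both parts with `(spectra - mean) @ components.T`, and compare the output of `nearest_centroid_predict` with the test labels. Estimating the PCA on the training data alone keeps information from the test spectra out of the model.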
© 2018 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- The environmental fate of synthetic organic chemicals
- Forensics: evidence examination via Raman spectroscopy
- Optical spectroscopy as a tool for battery research
- Selenium and Tellurium Electrophiles in Organic Synthesis
- Introduction to cheminformatics for green chemistry education
- Analyzing Raman spectroscopic data
- Green chemistry in secondary school
- Recent advances in the self-assembly of polynuclear metal–selenium and –tellurium compounds from 14–16 reagents
- Physicochemical approaches to gold and silver work, an overview: Searching for technologies, tracing routes, attempting to preserve