Abstract
This chapter is a short introduction to the data analysis pipeline that is typically used to analyze Raman spectra. We emphasize that this pipeline must be tailored to the specific application of interest. Nevertheless, a tailored pipeline always consists of the same general procedures applied sequentially. These procedures correct for artefacts, standardize the measured spectral data and translate the spectroscopic signals into higher-level information. They can be arranged into three groups: data pre-treatment, pre-processing and modeling. Pre-treatment corrects for non-sample-dependent artefacts, such as cosmic spikes and contributions of the measurement device. The block of procedures applied next is called pre-processing and consists of smoothing, baseline correction, normalization and dimension reduction. Thereafter, the analysis model is constructed and its performance is evaluated. Every data analysis pipeline should be composed of procedures from these three groups, and we describe each group in this chapter. After describing data pre-treatment, pre-processing and modeling, we summarize trends in the analysis of Raman spectra, namely model transfer approaches and data fusion. At the end of the chapter, we condense the content into guidelines for the analysis of Raman spectra.
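The sequential structure described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, window sizes, thresholds and synthetic spectrum are assumptions chosen for illustration. The despiking step is a crude median-based filter, the smoothing is a plain moving average, and the baseline estimate is a simplified SNIP-style clipping. Dimension reduction and modeling are omitted.

```python
import math
import random
from statistics import median


def remove_spikes(y, window=5, threshold=5.0):
    """Pre-treatment: replace points far above the local median (crude despiking)."""
    half = window // 2
    local = [median(y[max(0, i - half): i + half + 1]) for i in range(len(y))]
    resid = [a - b for a, b in zip(y, local)]
    # Robust noise scale from the median absolute deviation of the residuals.
    mad = median(abs(r) for r in resid) or 1e-12
    return [m if r > threshold * 1.4826 * mad else a
            for a, m, r in zip(y, local, resid)]


def smooth(y, window=5):
    """Pre-processing: moving-average smoothing (a simple stand-in)."""
    half = window // 2
    out = []
    for i in range(len(y)):
        seg = y[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def snip_baseline(y, iterations=20):
    """Pre-processing: simplified SNIP-style baseline via iterative clipping."""
    base = list(y)
    n = len(base)
    for p in range(1, iterations + 1):
        prev = base[:]
        for i in range(p, n - p):
            # Clip each point to the mean of its neighbours at distance p.
            base[i] = min(prev[i], 0.5 * (prev[i - p] + prev[i + p]))
    return base


def vector_normalize(y):
    """Pre-processing: scale the spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in y)) or 1.0
    return [v / norm for v in y]


def preprocess(y):
    """Apply the steps in the order named in the text."""
    y = remove_spikes(y)                      # pre-treatment
    y = smooth(y)                             # smoothing
    base = snip_baseline(y)
    y = [a - b for a, b in zip(y, base)]      # baseline correction
    return vector_normalize(y)                # normalization


# Synthetic spectrum (illustrative only): one band on a sloping background,
# weak noise, and a single-channel spike at index 50.
rng = random.Random(0)
raw = [100.0 * math.exp(-((i - 120) / 6.0) ** 2) + 0.3 * i + rng.gauss(0, 1.0)
       for i in range(200)]
raw[50] += 1000.0
processed = preprocess(raw)
```

The order of the calls in `preprocess` mirrors the order given in the text. In a real application each step, and its parameters, would be chosen and validated for the specific measurement, and a dimension reduction and model would follow.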
© 2018 Walter de Gruyter GmbH, Berlin/Boston