Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models
Introduction
Current developments in instrumentation make it possible to obtain complex information that aims to describe adequately the multivariate reality of the problem under investigation. Such information should be analyzed according to the nature of both the data and the problem. In recent years, multi-way analysis has become increasingly important because it has proved to be a valuable tool for interpreting such complex data. Among the multi-way models, the most widely used in chemometrics are parallel factor analysis (PARAFAC) [1], [2], [3], Tucker3 [4], [5], [6] and multilinear partial least squares (n-PLS) [7]. PARAFAC, which is a generalization of principal component analysis (PCA) [8] to higher order arrays, has some very attractive features. One of them is that there is no rotational indeterminacy in PARAFAC; that is, the PARAFAC solution is unique. For an application of PARAFAC to fluorescence emission/excitation data, this means that, under mild assumptions, pure spectral profiles can be obtained provided that the multi-way spectral data follow the same structure as the PARAFAC model [9], [10]. Despite its usefulness in some cases [11], [12], no tool is yet available in the literature for estimating the standard errors associated with the parameter estimates. In this paper, we apply the so-called jack-knife technique to PARAFAC in order to find the standard errors of the parameter estimates from the PARAFAC model. The jack-knife technique is also shown to be useful for detecting outliers.
An example involving the determination of four analytes from fluorescence data (emission/excitation landscapes) is used to show the usefulness of the method. In this application, the focus is on outlier detection. Samples (and variables) that do not stem from the same overall population as the bulk of the samples must be removed for the jack-knife standard error estimates to make sense. After removal of outliers, as discussed in this paper, the calculation of adequate standard error estimates is straightforward, as will be shown.
Section snippets
PARAFAC model
PARAFAC [1], [2], [3] is a decomposition method that can be considered as one possible generalization of PCA to higher order arrays. For three-way data, PARAFAC decomposes the original data into trilinear components, each component consisting of one score vector and two loading vectors. For decomposition of fluorescence emission/excitation matrices, the scores correspond to the estimated relative concentration values and the loadings to the estimated pure emission and excitation profiles of
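As a sketch of the trilinear structure described in this snippet, the three-way PARAFAC model is commonly written element-wise as shown here. The symbols are notation assumed for illustration and are not taken from the excerpt: $x_{ijk}$ is the intensity of sample $i$ at emission wavelength $j$ and excitation wavelength $k$, $a_{if}$ is the score of sample $i$ on component $f$, $b_{jf}$ and $c_{kf}$ are the emission and excitation loadings, $e_{ijk}$ is the residual and $F$ is the number of components.

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J,\quad k = 1,\dots,K
```

Under this notation, the data set described in the experimental part would correspond to $I = 27$, $J = 233$ and $K = 24$, with $F = 4$ for the four analytes.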
Experimental part
The data set used consists of 27 fluorescence landscapes of 233 emission wavelengths (250–482 nm) and 24 excitation wavelengths (200–315 nm taken at 5 nm intervals) corresponding to 27 synthetic samples containing different concentrations of four analytes: hydroquinone, tryptophan, phenylalanine and dopa. A Perkin-Elmer LS50 B fluorescence spectrometer was used to measure the fluorescence landscapes. An example of a fluorescence landscape is shown in Fig. 1. More results can be found in Ref.
First model
Applying the jack-knife to the initial set of 27 samples yields 27 PARAFAC submodels. Fig. 2 shows the estimated pure emission spectral profiles for the four analytes and Fig. 3 shows the pure excitation spectral profiles. In all the figures for all the models, the first component corresponds to tryptophan, the second component to dopa, the third component to hydroquinone and the fourth component to phenylalanine. There are 27 estimates of each of the pure spectral profiles,
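The leave-one-out idea behind these submodels can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' m-file: the data are small synthetic landscapes (8 samples, 2 components) rather than the 27 measured ones, `parafac_als` is a plain unconstrained alternating least squares fit written for this example, components are matched to the full model by loading similarity, and the standard error uses the common textbook jack-knife formula, which may differ from the variant used in the paper. All function and variable names are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product; row (i * V.shape[0] + j) holds U[i, f] * V[j, f].
    return np.einsum("if,jf->ijf", U, V).reshape(U.shape[0] * V.shape[0], -1)

def parafac_als(X, n_comp, n_iter=200, seed=0):
    # Plain (unconstrained) alternating least squares for a three-way array X (I x J x K).
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.random((J, n_comp))
    C = rng.random((K, n_comp))
    X1 = X.reshape(I, J * K)                      # columns ordered (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # columns ordered (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # columns ordered (i, j)
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

def normalize(A, B, C):
    # Remove the scaling/sign ambiguity: unit-norm loadings with positive column
    # sums; the removed scale is absorbed into the scores A.
    for name in ("B", "C"):
        M = B if name == "B" else C
        scale = np.linalg.norm(M, axis=0) * np.sign(M.sum(axis=0))
        A = A * scale
        if name == "B":
            B = M / scale
        else:
            C = M / scale
    return A, B, C

def jackknife_loadings(X, n_comp):
    # Fit one submodel per left-out sample and collect the mode-2 loadings.
    n = X.shape[0]
    _, B_ref, _ = normalize(*parafac_als(X, n_comp))
    B_sub = []
    for i in range(n):
        X_i = np.delete(X, i, axis=0)             # leave sample i out
        _, B, _ = normalize(*parafac_als(X_i, n_comp))
        # Resolve the permutation ambiguity by matching to the full model.
        order = np.argmax(np.abs(B_ref.T @ B), axis=1)
        B_sub.append(B[:, order])
    B_sub = np.array(B_sub)                       # shape (n, J, n_comp)
    dev = B_sub - B_sub.mean(axis=0)
    se = np.sqrt((n - 1) / n * (dev ** 2).sum(axis=0))
    return B_ref, B_sub, se

# Synthetic trilinear data (an assumption of this sketch, not the paper's data).
rng = np.random.default_rng(1)
j, k = np.arange(15), np.arange(10)
B_true = np.stack([np.exp(-(j - c) ** 2 / 8.0) for c in (4, 10)], axis=1)
C_true = np.stack([np.exp(-(k - c) ** 2 / 4.5) for c in (2, 7)], axis=1)
A_true = rng.random((8, 2)) + 0.1
X = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)
X = X + 0.01 * rng.standard_normal(X.shape)

B_ref, B_sub, se = jackknife_loadings(X, n_comp=2)
print(B_sub.shape, se.shape)
```

Each slice `B_sub[i]` is the set of loading estimates from the submodel with sample `i` removed. A submodel whose profiles deviate strongly from the others points to the left-out sample as a candidate outlier, and `se` gives an element-wise standard error for the loadings once such samples have been dealt with.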
Conclusions
In this paper, we have applied the jack-knife technique in order to find the standard errors associated with the score values in a PARAFAC model. The jack-knife segments involved in finding these standard errors turned out to be normally distributed for most of the samples according to the Kolmogorov–Smirnov test. Only some samples did not have all the jack-knife segments normally distributed (and about half of these samples corresponded to approximately zero score values), but deleting some
Acknowledgements
Jordi Riu would like to thank the ‘Secretaría de Estado de Educación y Universidades’ of the Spanish Ministry of Education, Culture and Sports for providing his postdoctoral fellowship.
R. Bro acknowledges support provided by the Frame program Advanced Quality Monitoring in the Food Production Chain (AQM), as well as the EU project GRD1-1999-10337, NWAYQUAL. The data and m-file for performing jack-knifed PARAFAC are available from http://www.models.kvl.dk. The authors would also like to
References (27)
Chemom. Intell. Lab. Syst. (1997)
et al., Chemom. Intell. Lab. Syst. (1987)
et al., Chemom. Intell. Lab. Syst. (1998)
et al., Food Chem. (2000)
et al., Food Qual. Prefer. (2000)
et al., Chemom. Intell. Lab. Syst. (2000)
UCLA Work. Pap. Phon. (1970)
J. Chemom. (2001)
Psychometrika (1966)
Psychometrika
Psychometrika
J. Chemom.
1 On leave from: Department of Analytical and Organic Chemistry, Institute of Advanced Studies, Universitat Rovira i Virgili, Pl. Imperial Tarraco 1, 43005-Tarragona, Catalonia, Spain.