Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models

https://doi.org/10.1016/S0169-7439(02)00090-4

Abstract

In recent years, multi-way analysis has become increasingly important because it has proved to be a valuable tool, e.g. for interpreting data from instrumental methods that describe the multivariate and complex reality of a given problem. Parallel factor analysis (PARAFAC) is one of the most widely used multi-way models. Despite its usefulness in many applications, to date no tool is available in the literature for estimating the standard errors associated with the parameter estimates. In this study, we apply the so-called jack-knife technique to PARAFAC in order to estimate the standard errors associated with the parameter estimates of the PARAFAC model. The jack-knife technique is also shown to be useful for detecting outliers. An example of fluorescence data (emission/excitation landscapes) is used to show the applicability of the method.

Introduction

Current developments in instrumentation make it possible to obtain complex information that attempts to describe adequately the multivariate reality of the problem under investigation. Such information should be analyzed according to the nature of both the data and the problem. In recent years, multi-way analysis has become increasingly important because it has proved to be a valuable tool for interpreting such complex data. Among the multi-way models, the most widely used in chemometrics are parallel factor analysis (PARAFAC) [1], [2], [3], Tucker3 [4], [5], [6] and multilinear partial least squares (n-PLS) [7]. PARAFAC, which is a generalization of principal component analysis (PCA) [8] to higher order arrays, has some very attractive features. One of them is that there is no rotational indeterminacy in PARAFAC. For an application of PARAFAC to fluorescence emission/excitation data, this means that, under mild assumptions, pure spectral profiles can be obtained provided that the multi-way spectral data follow the same structure as the PARAFAC model [9], [10]. That is, the PARAFAC solution is unique. Despite its usefulness in some cases [11], [12], to date no tool is available in the literature for estimating the standard errors associated with the parameter estimates. In this paper, we apply the so-called jack-knife technique to PARAFAC in order to estimate the standard errors associated with the parameter estimates of the PARAFAC model. The jack-knife technique is also shown to be useful for detecting outliers.

An example involving the determination of four analytes from fluorescence data (emission/excitation landscapes) is used to show the usefulness of the method. In this application, the focus is on outlier detection. Samples (and variables) that do not stem from the same overall population as the bulk of the samples must be removed for the jack-knife standard error estimates to make sense. Once the outliers have been removed, as discussed in this paper, calculating adequate standard error estimates is straightforward, as will be shown.

PARAFAC model

PARAFAC [1], [2], [3] is a decomposition method that can be considered as one possible generalization of PCA to higher order arrays. For three-way data, PARAFAC decomposes the original data into trilinear components, each component consisting of one score vector and two loading vectors. For decomposition of fluorescence emission/excitation matrices, the scores correspond to the estimated relative concentration values and the loadings to the estimated pure emission and excitation profiles of
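As an illustration of the trilinear structure described above, the following minimal Python sketch rebuilds a three-way array from one score matrix and two loading matrices, so that each element is a sum over components of score times emission loading times excitation loading. The function name `parafac_reconstruct`, the nested-list layout and the toy numbers are assumptions made for this example; they are not taken from the authors' m-file.

```python
def parafac_reconstruct(scores, emission, excitation):
    """Rebuild a three-way array from PARAFAC-style factor matrices.

    scores is I x F, emission is J x F, excitation is K x F (nested lists).
    Element x[i][j][k] is the sum over the F components of
    scores[i][f] * emission[j][f] * excitation[k][f].
    """
    n_components = len(scores[0])
    return [
        [
            [
                sum(
                    scores[i][f] * emission[j][f] * excitation[k][f]
                    for f in range(n_components)
                )
                for k in range(len(excitation))
            ]
            for j in range(len(emission))
        ]
        for i in range(len(scores))
    ]


# Toy example (hypothetical numbers): 2 samples, 3 emission points,
# 2 excitation points, 2 components.
scores = [[1.0, 0.0], [0.5, 2.0]]
emission = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
excitation = [[1.0, 2.0], [3.0, 1.0]]

landscapes = parafac_reconstruct(scores, emission, excitation)
```

In this toy case the first sample contains only the first component, so its landscape is simply the outer product of that component's two loading vectors.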

Experimental part

The data set used consists of 27 fluorescence landscapes of 233 emission wavelengths (250–482 nm) and 24 excitation wavelengths (200–315 nm taken at 5 nm intervals) corresponding to 27 synthetic samples containing different concentrations of four analytes: hydroquinone, tryptophan, phenylalanine and dopa. A Perkin-Elmer LS50 B fluorescence spectrometer was used to measure the fluorescence landscapes. An example of a fluorescence landscape is shown in Fig. 1. More results can be found in Ref.
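The dimensions quoted above can be checked with a few lines of Python. This is only a sketch of the data layout: the 1 nm emission step is inferred from 233 points spanning 250–482 nm, and the variable names are hypothetical.

```python
# Emission axis: 250..482 nm; 233 points implies a 1 nm step (inferred).
emission_nm = list(range(250, 483))

# Excitation axis: 200..315 nm at 5 nm intervals, as stated in the text.
excitation_nm = list(range(200, 316, 5))

n_samples = 27

# Shape of the three-way array: samples x emission x excitation.
array_shape = (n_samples, len(emission_nm), len(excitation_nm))
```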

First model

Applying the jack-knife to the initial set of 27 samples yields 27 PARAFAC submodels. Fig. 2 shows the estimated pure emission spectral profiles for the four analytes and Fig. 3 shows the pure excitation spectral profiles. In all the figures for all the models, the first component corresponds to tryptophan, the second component to dopa, the third component to hydroquinone and the fourth component to phenylalanine. There are 27 estimates of each of the pure spectral profiles,
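The leave-one-out resampling behind these submodels can be sketched in Python as follows. This is a generic illustration, not the authors' implementation: `jackknife_submodels` is a hypothetical helper, and the arithmetic mean stands in for a PARAFAC fit so that the example stays self-contained. With a real PARAFAC fit, the components of each submodel would also have to be matched in order and scale before they can be compared, which is not shown here.

```python
from statistics import mean


def jackknife_submodels(samples, fit):
    """Refit the model once per sample, each time leaving that sample out.

    `fit` is any callable that takes a list of samples and returns an
    estimate; n samples therefore give n submodel estimates.
    """
    estimates = []
    for left_out in range(len(samples)):
        subset = samples[:left_out] + samples[left_out + 1:]
        estimates.append(fit(subset))
    return estimates


# Toy data (hypothetical): the last value is far from the others.
readings = [1.0, 2.0, 3.0, 10.0]
estimates = jackknife_submodels(readings, mean)
```

In the toy data, the submodel that leaves out the last reading differs clearly from the other three, which is the kind of behaviour that makes the jack-knife useful for spotting outlying samples.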

Conclusions

In this paper, we have applied the jack-knife technique in order to find the standard errors associated with the score values in a PARAFAC model. The jack-knife segments involved in finding these standard errors turned out to be normally distributed for most of the samples according to the Kolmogorov–Smirnov test. Only some samples did not have all the jack-knife segments normally distributed (and about half of these samples corresponded to approximately zero score values), but deleting some
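For reference, a standard error can be computed from a set of jack-knife estimates as in the minimal Python sketch below. It uses the textbook delete-one jack-knife formula, the square root of (n - 1)/n times the sum of squared deviations from the mean of the estimates; the text above does not state which variant the authors use, so this choice and the function name `jackknife_standard_error` are assumptions.

```python
import math


def jackknife_standard_error(estimates):
    """Textbook delete-one jack-knife standard error of a scalar parameter.

    `estimates` holds one value per jack-knife submodel. Deviations are
    taken from the mean of the submodel estimates (an assumption; some
    variants use the full-model estimate instead).
    """
    n = len(estimates)
    center = sum(estimates) / n
    squared_deviations = sum((e - center) ** 2 for e in estimates)
    return math.sqrt((n - 1) / n * squared_deviations)


# Hypothetical submodel estimates of a single parameter.
se = jackknife_standard_error([1.0, 2.0, 3.0])
```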

Acknowledgements

Jordi Riu would like to thank the ‘Secretaría de Estado de Educación y Universidades’ of the Spanish Ministry of Education, Culture and Sports for providing his postdoctoral fellowship.

R. Bro acknowledges support provided by the Frame program Advanced Quality Monitoring in the Food Production Chain (AQM), as well as the EU project GRD1-1999-10337, NWAYQUAL. The data and m-file for performing jack-knifed PARAFAC are available from http://www.models.kvl.dk. The authors would also like to

References (27)

  • R. Bro, Chemom. Intell. Lab. Syst. (1997)
  • S. Wold et al., Chemom. Intell. Lab. Syst. (1987)
  • L. Munck et al., Chemom. Intell. Lab. Syst. (1998)
  • D. Baunsgaard et al., Food Chem. (2000)
  • H. Martens et al., Food Qual. Prefer. (2000)
  • S.D. Peddada
  • C.A. Andersson et al., Chemom. Intell. Lab. Syst. (2000)
  • R.A. Harshman, UCLA Work. Pap. Phon. (1970)
  • R.A. Harshman, J. Chemom. (2001)
  • L. Tucker, Psychometrika (1966)
  • P.M. Kroonenberg et al., Psychometrika (1980)
  • J.M.F. Ten Berge et al., Psychometrika (1987)
  • R. Bro, J. Chemom. (1996)

1 On leave from: Department of Analytical and Organic Chemistry, Institute of Advanced Studies, Universitat Rovira i Virgili, Pl. Imperial Tarraco 1, 43005-Tarragona, Catalonia, Spain.
