Elsevier

Journal of Process Control

Volume 81, September 2019, Pages 209-220

Gaussian process modelling with Gaussian mixture likelihood

https://doi.org/10.1016/j.jprocont.2019.06.007

Highlights

  • Extends Gaussian process modelling from a single Gaussian noise model to a mixture of Gaussian noises.

  • Develops a robust Gaussian process model to handle outliers in process data analysis.

  • The problem is solved in the maximum likelihood framework via the expectation maximization algorithm.

Abstract

Gaussian Process (GP), as a probabilistic non-linear multi-variable regression model, has been widely used in the nonparametric Bayesian framework for data-based modelling of complex processes. The noise in standard GP regression is assumed to follow a Gaussian distribution. In this setting, the point estimation of the model parameters can be obtained analytically using the maximum likelihood (ML) approach in a straightforward fashion. However, in practical scenarios, processes may be corrupted by outliers and other disturbances, or may have multiple modes of operation, resulting in a non-Gaussian data likelihood. In this work, to model such scenarios, we propose to employ a mixture of two Gaussian distributions as the noise model, capturing both regular and irregular noise and thereby enhancing the robustness of the regression model. Further, we present an Expectation Maximization (EM) algorithm-based approach to obtain the optimal parameter set of the proposed GP regression model. The predictive distribution can then be found from the hyperparameters estimated by the EM algorithm. The efficacy and practicality of the proposed method are illustrated with two sets of synthetic data, a simulated example, as well as an industrial dataset.

Introduction

Modeling of complex processes is essential for optimization, control, and process monitoring. However, developing first-principles models for complex chemical processes is a tedious task. Hence, data-based models have been considered a promising alternative in such scenarios. An extensive range of artificial intelligence and machine learning techniques provides a powerful framework for data-based models. At present, data-based modelling methods, notably principal component analysis (PCA) [57], [53], partial least squares (PLS) regression [24], [31], artificial neural networks (ANNs) [5], [39], [15], fuzzy logic methods [18], support vector regression (SVR) [50], [11], Gaussian process regression (GPR) [43], hybrid methods [25], and so on, have demonstrated a significant ability to model large numbers of highly correlated variables. Recently, significant attention has been drawn to data-driven non-parametric models as well; popular non-parametric regression models include Gaussian Process Regression (GPR) and Support Vector Regression (SVR), among others. Non-parametric models can learn any functional form from the training data without prior knowledge, requiring only input-output data for modeling [45]. For instance, SVR, proposed by [50] as a regression method, constructs a hyperplane that fits the data within a specified margin.

Gaussian Process (GP), a non-parametric modeling paradigm, was initially introduced in the field of geo-statistics under the name "kriging" [22]. Kriging calculates weights based on the inverse distance between the predicted value and the measured inputs, as well as the spatial autocorrelation of the measured inputs. The basic underlying assumption of Gaussian Process Regression (GPR) is that a collection of arbitrary function values can be modeled using a multivariate Gaussian distribution [38]. It was shown by [35] that a Bayesian neural network with infinitely many hidden nodes in one layer is equivalent to a GP; hence, GPs can be viewed as flexible and interpretable alternatives to neural networks. GPs can also be derived from other models, such as Bayesian kernel machines and linear regression with basis functions [55]. Due to the computational difficulty of Bayesian analysis of neural networks [28], [34], GP was used by [56] as a regression model to make predictive Bayesian analysis straightforward. The Bayesian interpretation of GPs was further enriched and extended by [36] and [14]. The ability to model complex datasets makes GPR promising in the area of data-based process modeling; for instance, it has found applications in spectroscopic calibration [9], development of soft sensors [26], state estimation of lithium-ion batteries [17], and model predictive control [33].
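To make the multivariate-Gaussian view of GPs concrete, the following sketch (assuming a squared-exponential covariance, one of many possible kernel choices, with hypothetical length-scale and signal-variance values) draws one random function from a zero-mean GP prior:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between 1-D input vectors."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
K = sq_exp_kernel(x, x)
# Small jitter keeps the covariance numerically positive definite.
f = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)))
```

Any finite set of function values f thus follows a joint Gaussian whose covariance is fixed entirely by the kernel evaluated at the inputs.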

A natural way to model such industrial data is by attributing a Gaussian distribution to the noise. The fully Bayesian framework of GP is computationally tractable for the Gaussian noise model. However, in realistic scenarios, industrial data seldom follows a Gaussian distribution, as it may contain outliers due to sensor malfunctions and process disturbances, or data emanating from multiple operational modes of a process. To deal with such scenarios, a possible workaround is to employ non-Gaussian distributions for modeling the noise dynamics, resulting in a more robust model [7]. Various approaches have been followed by different researchers for accommodating outliers while modeling industrial process data. For instance, [37] discussed distributions with thick tails and termed them "outlier-prone" as they reject outlying observations, and [19] proposed a two-model strategy containing a good and a bad sampling distribution to model regular and outlying observations. Along similar lines, the use of Student's-t distribution, as a heavy-tailed distribution to accommodate outliers, has been described by [54]; a mixture of two Gaussian distributions was introduced by [8]; and the Laplace distribution was also used as a noise distribution in [44]. Among the above, approaches that use mixtures of two Gaussian distributions assume that the regular noise is sampled from a Gaussian distribution with lower variance and outliers are sampled from a Gaussian distribution with higher variance [4], [2], [21].
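The contaminated-noise assumption described above can be illustrated with a small data-generation sketch (the latent function sin(x), the mixing weight, and the component variances are all hypothetical values for illustration):

```python
import numpy as np

def contaminated_data(n=200, pi_out=0.1, sigma_reg=0.1, sigma_out=1.0, seed=1):
    """Sample y = sin(x) + eps, where eps is drawn from the regular
    low-variance Gaussian with probability 1 - pi_out and from the
    high-variance "outlier" Gaussian with probability pi_out."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-3.0, 3.0, n)
    is_out = rng.random(n) < pi_out          # latent component indicator
    eps = np.where(is_out,
                   rng.normal(0.0, sigma_out, n),
                   rng.normal(0.0, sigma_reg, n))
    return x, np.sin(x) + eps, is_out

x, y, is_out = contaminated_data()
```

The latent indicator `is_out` is exactly the kind of hidden variable the EM algorithm later marginalizes over: it is unobserved in practice, and only y is recorded.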

In the context of GPR, [23] investigated the possibility of using Student's-t distribution to describe the noise model, applying variational inference, Expectation Propagation (EP), and Markov chain Monte Carlo (MCMC) methods for inference of the GPR model with a Student's-t likelihood, extending the work of [47]. Moreover, [49] used Laplace's approximation to approximate the log-marginal likelihood of the complete data, while [20] proposed expectation propagation (EP) for approximate inference of the GPR model with a Student's-t likelihood. Recently, [40] proposed an EM algorithm-based approach for robust GPR identification using non-Gaussian noise distributions, namely Student's-t and Laplace distributions. In this work, we develop a GPR model with a mixture of two Gaussian distributions as the data likelihood. As indicated before, the considered model captures scenarios such as data with outliers from a contaminated distribution as well as data obtained from a process operating in multiple modes, which are not uncommon in chemical processes. Further, we propose to use the EM algorithm to learn the hyperparameters of the proposed GPR model. The EM algorithm is a powerful approach for obtaining maximum likelihood estimates (MLE), is useful when the observed data is incomplete or contains hidden (latent) variables [10], [29], and has been widely used in a variety of parameter identification problems [27], [16], [46]. Even though [23] investigated the scenario of a two-Gaussian-mixture noise model in GPR, the entire focus was on inference rather than determining the model's hyperparameters. In this work, we address this lacuna by deriving parameter estimates of the GPR model for a mixture of Gaussian likelihood. This work therefore concerns identifying GPR as the model structure for outlier-contaminated data, while the works [48], [3] used a mixture of Gaussian distributions for the gross error detection problem, which is not related to any GPR modelling problem. To the best of the authors' knowledge, there exists no approach in the literature to estimate the parameters of GPR with a mixture of Gaussian likelihood. Finally, we validate our results with synthetic data, a simulated CSTR example, and an industrial dataset.

The rest of this paper is organized as follows: Section 2 revisits GPR. The problem is described in Section 3. In Section 4, an EM algorithm-based approach is derived to estimate the hyperparameters of GPR. After learning the hyperparameters, a procedure for prediction using test data is discussed in Section 5. Section 6 presents an algorithmic flowchart for the estimation of hyperparameters. In Section 7, three validation studies are presented to verify the efficiency of the proposed GPR model. A summary of our findings and conclusions is provided in Section 8.

Section snippets

Revisit of GPR

The GPR modeling paradigm seeks a distribution over possible non-parametric functions for modeling a given set of input and output data. Suppose we observe some inputs xi and corresponding outputs fi, where fi = f(xi) represents the unknown underlying mapping function.

Let $x_i \in \mathbb{R}^d$ be the set of inputs for the $i$th training sample. Then we define a new variable $X$ for the collection of $n$ training samples, having $d$ dimensions, as follows:
$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1d} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2d} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3d} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nd} \end{bmatrix}$$
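Building this input matrix amounts to row-wise stacking of the training samples; a minimal sketch with hypothetical values for n = 3 samples of d = 4 dimensions:

```python
import numpy as np

# Each list entry is one training sample's d-dimensional input vector.
samples = [np.array([1.0, 2.0, 3.0, 4.0]),
           np.array([5.0, 6.0, 7.0, 8.0]),
           np.array([9.0, 10.0, 11.0, 12.0])]
X = np.vstack(samples)   # shape (n, d): row i holds the inputs of sample i
```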

Problem statement

This section describes the problem statement. Fig. 2 shows the graphical model for GPR with a mixture of two Gaussian noises. The GPR model presented in Eq. (5) assumes the existence of a latent function f(x, θ) mapping the deterministic input x to the noise-free output, f, where θ is the set of underlying hyperparameters. Also, y denotes the observed output, which is disturbed by noise. In this case, the noise term, ϵ, is assumed to be a mixture of two Gaussian

Parameter estimation using the EM algorithm

The EM algorithm consists of the following iterative steps, which are repeated until convergence [10], [29] to obtain approximate ML estimates:

  • E-Step: In this step, the Q-function is derived: the expectation of the log-likelihood of all hidden and observed data, taken with respect to the conditional distribution of the hidden data given the observed data and the current estimate of the hyperparameters:
$$Q(\vartheta;\vartheta^{(t)}) = \mathbb{E}_{C_{\text{mis}}\mid C_{\text{obs}},\,\vartheta^{(t)}}\left[\log P(C_{\text{obs}}, C_{\text{mis}}\mid\vartheta)\right]$$

  • M-Step: In this step, the
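As a toy illustration of these E- and M-steps (not the paper's exact updates, which also involve the GP posterior over f), the following sketch runs EM on residuals r = y − f under a zero-mean two-component Gaussian mixture; the initialization and iteration count are illustrative assumptions:

```python
import numpy as np

def normal_pdf(r, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return np.exp(-0.5 * (r / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def em_two_gaussian_noise(r, n_iter=100):
    """EM for a zero-mean two-component Gaussian mixture fitted to residuals r."""
    pi = 0.5                                        # weight of the regular component
    s1, s2 = 0.5 * np.std(r), 2.0 * np.std(r)       # low- and high-variance init
    for _ in range(n_iter):
        # E-step: responsibility of the low-variance ("regular") component.
        p1 = pi * normal_pdf(r, s1)
        p2 = (1.0 - pi) * normal_pdf(r, s2)
        gamma = p1 / (p1 + p2)
        # M-step: closed-form updates of mixing weight and component variances.
        pi = gamma.mean()
        s1 = np.sqrt(np.sum(gamma * r ** 2) / np.sum(gamma))
        s2 = np.sqrt(np.sum((1.0 - gamma) * r ** 2) / np.sum(1.0 - gamma))
    return pi, s1, s2

# Residuals: 90% regular noise (std 0.1), 10% outlier noise (std 1.0).
rng = np.random.default_rng(2)
outlier = rng.random(500) < 0.1
r = np.where(outlier, rng.normal(0.0, 1.0, 500), rng.normal(0.0, 0.1, 500))
pi_hat, s1_hat, s2_hat = em_two_gaussian_noise(r)
```

The responsibilities `gamma` play the role of the hidden noise-mode identities: EM never observes which component generated each residual, only their posterior probabilities.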

Prediction with proposed GPR model

After convergence of the EM algorithm, the parameter estimates of the proposed GPR model are obtained, which can be further employed for prediction. To make predictions for given test data, we compute the conditional distribution of function values f+ corresponding to test input data X+. To compute the posterior predictive distribution of f+|y, we need to first calculate the joint distribution of f+, y as
$$P(y, f_+ \mid X, X_+, \vartheta) = \mathcal{N}\!\left(\begin{bmatrix} m\mathbf{1} \\ m\mathbf{1}_+ \end{bmatrix},\; \begin{bmatrix} K(X,X) + (\operatorname{diag}(\sigma_I))^2 & K(X,X_+) \\ K(X_+,X) & K(X_+,X_+) \end{bmatrix}\right)$$
where $(\operatorname{diag}(\sigma_I))^{-2}$ is the
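Conditioning this joint Gaussian on y gives the standard GP predictive equations; a minimal sketch, assuming a squared-exponential kernel, a zero mean, and a single homoscedastic noise variance in place of the paper's per-point mixture variances:

```python
import numpy as np

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-4):
    """Mean and covariance of the GP posterior predictive f_+ | y."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha                             # K(X+,X) K^{-1} y
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                             # K(X+,X+) - K(X+,X) K^{-1} K(X,X+)
    return mean, cov

x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
mean, cov = gp_predict(x_train, y_train, x_train)
```

The Cholesky factorization avoids forming the explicit inverse of K, which is both cheaper and numerically better conditioned.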

Algorithm

Fig. 3 presents the flow-chart of the proposed GPR model parameter estimation for prediction. The first step of the algorithm is to set the initial values for both the Gaussian process hyperparameters and the noise likelihood hyperparameters. Further, a standard GP is used to train the model on the training dataset, and the predictive mean thus obtained is set as an initial value for f. In the E-step, the posterior probability of P(f|y, X, I) and P(I|y, X) is inferred. Then, the noise mode identity vector is

Examples

To evaluate the efficacy of the GPR model presented in this paper, we provide simulation examples for various cases, namely, (i) two synthetic datasets with one-dimensional and multidimensional inputs, respectively, (ii) a process simulation example of a continuous stirred tank reactor (CSTR), and (iii) an industrial example. To statistically characterize the performance, we employ three metrics, namely, mean absolute error (MAE), root mean square error (RMSE), and negative log of predictive probability
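These three metrics have straightforward implementations; a sketch, with the negative log predictive density written for a Gaussian predictive distribution with mean mu and variance var at each test point:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

def nlpd(y, mu, var):
    """Negative log predictive density under a Gaussian N(mu, var) prediction."""
    y, mu, var = map(np.asarray, (y, mu, var))
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var)
                         + 0.5 * (y - mu) ** 2 / var))
```

Unlike MAE and RMSE, NLPD also penalizes badly calibrated predictive variances, which is what makes it informative for probabilistic models such as GPR.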

Conclusion

In this paper, a robust GPR model with a mixture of Gaussian likelihood has been proposed to model processes affected by multi-modal noise. Further, we presented an approach based on the EM algorithm to obtain point estimates of the proposed model's parameters. Two numerical examples and a simulated chemical process have been used to demonstrate the advantages of the proposed method. In addition, the method was applied to model industrial data from a SAGD process, which further verified its effectiveness

References (58)

  • I. Tjoa et al., Simultaneous strategies for data reconciliation and gross error detection of nonlinear systems, Comput. Chem. Eng. (1991)
  • S. Wold et al., Principal component analysis, Chemometr. Intell. Lab. Syst. (1987)
  • M. Abramowitz et al., Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, Vol. 55 (1965)
  • D. Agarwal, Detecting anomalies in cross-classified streams: a Bayesian approach, Knowl. Inform. Syst. (2007)
  • H. Alighardashi et al., Expectation maximization approach for simultaneous gross error detection and data reconciliation using Gaussian mixture distribution, Ind. Eng. Chem. Res. (2017)
  • C.M. Bishop, Novelty detection and neural network validation, IEE Proc.-Vision Image Signal Process. (1994)
  • C.M. Bishop, Neural Networks for Pattern Recognition (1995)
  • S. Borman, The Expectation Maximization Algorithm – A Short Tutorial (2004)
  • G.E. Box et al., A further look at robustness via Bayes's theorem, Biometrika (1962)
  • G.E. Box et al., A Bayesian approach to some outlier problems, Biometrika (1968)
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. Series B (Methodological) (1977)
  • M. Ebden, Gaussian processes: A quick introduction, arXiv preprint (2015)
  • J.H. Friedman, Multivariate adaptive regression splines, Ann. Stat. (1991)
  • M.N. Gibbs, Bayesian Gaussian Processes for Regression and Classification, Ph.D. thesis (1998)
  • Y.-J. He et al., State of health estimation of lithium-ion batteries: A multiscale Gaussian process regression modeling approach, AIChE J. (2015)
  • J.-S.R. Jang et al., Neuro-fuzzy and soft computing – a computational approach to learning and machine intelligence [book review], IEEE Trans. Automatic Control (1997)
  • E.T. Jaynes, Probability Theory: The Logic of Science (2003)
  • P. Jylänki et al., Robust Gaussian process regression with a Student-t likelihood, J. Mach. Learn. Res. (2011)
  • S. Khatibisepehr et al., A Bayesian approach to robust process identification with ARX models, AIChE J. (2013)