Gaussian process modelling with Gaussian mixture likelihood
Introduction
Modeling of complex processes is essential for optimization, control, and process monitoring. However, developing first-principles models for complex chemical processes is a tedious task; hence, data-based models have been considered a promising alternative in such scenarios. A wide range of artificial intelligence and machine learning techniques provide a powerful framework for data-based modeling. Methods such as principal component analysis (PCA) [57], [53], partial least squares (PLS) regression [24], [31], artificial neural networks (ANNs) [5], [39], [15], fuzzy logic methods [18], support vector regression (SVR) [50], [11], Gaussian process regression (GPR) [43], and hybrid methods [25] have demonstrated significant success in modeling large numbers of highly correlated variables. Recently, data-driven non-parametric models have also drawn significant attention; popular examples include GPR and SVR. Non-parametric models can learn any functional form from the training data without prior knowledge, requiring only input-output data for modeling [45]. For instance, SVR, proposed by [50] as a regression method, constructs a hyperplane that maximizes the separation between data points.
The Gaussian Process (GP), a non-parametric modeling paradigm, was initially introduced in the field of geostatistics under the name "kriging" [22]. Kriging computes weights based on the inverse distance between the prediction location and the measured inputs, as well as the spatial autocorrelation of the measured inputs. The basic assumption underlying Gaussian process regression (GPR) is that any collection of function values can be modeled by a multivariate Gaussian distribution [38]. It was shown by [35] that a Bayesian neural network with infinitely many hidden nodes in one layer is equivalent to a GP; hence, GPs can be viewed as flexible and interpretable alternatives to neural networks. GPs can also be derived from other models, such as Bayesian kernel machines and linear regression with basis functions [55]. Owing to the computational difficulty of Bayesian analysis of neural networks [28], [34], GP was used by [56] as a regression model that makes predictive Bayesian analysis straightforward. The Bayesian interpretation of GPs was further enriched and extended by [36] and [14]. The ability to model complex datasets makes GPR promising for data-based process modeling; for instance, it has found application in spectroscopic calibration [9], soft-sensor development [26], state estimation of lithium-ion batteries [17], and model predictive control [33].
A natural way to model such industrial data is to attribute a Gaussian distribution to the noise; the fully Bayesian GP framework is computationally tractable for the Gaussian noise model. In realistic scenarios, however, industrial data seldom follow a Gaussian distribution, as they may contain outliers due to sensor malfunctions and process disturbances, or may emanate from multiple operating modes of the process. To deal with such scenarios, a possible workaround is to employ non-Gaussian distributions to model the noise dynamics, resulting in a more robust model [7]. Various approaches have been followed for accommodating outliers when modeling industrial process data. For instance, [37] discussed distributions with thick tails, terming them "outlier-prone" since they reject outlying observations, and [19] proposed a two-model strategy containing a good and a bad sampling distribution to model regular and outlying observations, respectively. Along similar lines, the Student's-t distribution was used as a heavy-tailed distribution to accommodate outliers by [54], a mixture of two Gaussian distributions was introduced by [8], and a Laplace noise distribution was used in [44]. Among these, the approaches using a mixture of two Gaussian distributions assume that regular noise is sampled from a Gaussian distribution with lower variance, while outliers are sampled from a Gaussian distribution with higher variance [4], [2], [21].
In the context of GPR, [23] investigated the Student's-t distribution for describing the noise model, applying variational inference, expectation propagation (EP), and Markov chain Monte Carlo (MCMC) methods for inference of the GPR model with a Student's-t likelihood, extending the work of [47]. Moreover, [49] used Laplace's approximation to approximate the log-marginal likelihood of the complete data, while [20] proposed EP for approximate inference of the GPR model with a Student's-t likelihood. Recently, [40] proposed an EM-algorithm-based approach for robust GPR identification using non-Gaussian noise distributions, namely the Student's-t and Laplace distributions. In this work, we develop a GPR model with a mixture of two Gaussian distributions as the data likelihood. As indicated above, this model captures scenarios such as data with outliers from a contaminated distribution, as well as data obtained from a process operating in multiple modes, both of which are not uncommon in chemical processes. Further, we propose to use the EM algorithm to learn the hyperparameters of the proposed GPR model. The EM algorithm is a powerful approach for obtaining maximum likelihood estimates (MLE), is useful when the observed data are incomplete or contain hidden or latent variables [10], [29], and has been widely used in a variety of parameter identification problems [27], [16], [46]. Although [23] investigated a mixture of two Gaussian noises in GPR, the focus there was entirely on inference rather than on determining the model's hyperparameters. In this work we address this lacuna by deriving parameter estimates of the GPR model for a mixture-of-Gaussians likelihood. This work therefore concerns identifying GPR as the model structure for outlier-contaminated data, whereas [48], [3] used a mixture of Gaussian distributions for the gross error detection problem, which is unrelated to GPR modeling.
To the best of the authors' knowledge, there exists no approach in the literature to estimate the parameters of GPR with a mixture-of-Gaussians likelihood. Finally, we validate our results with synthetic data, a simulated CSTR example, and an industrial dataset.
The rest of this paper is organized as follows: Section 2 provides a revisit of GPR. The problem is described in Section 3. In Section 4, an EM-algorithm-based approach is derived to estimate the hyperparameters of GPR. After learning the hyperparameters, a procedure for prediction using test data is discussed in Section 5. Section 6 presents an algorithmic flowchart for the estimation of the hyperparameters. In Section 7, three validation studies are presented to verify the efficiency of the proposed GPR model. A summary of our findings and conclusions is provided in Section 8.
Revisit of GPR
The GPR modeling paradigm seeks a distribution over a set of possible non-parametric functions that model a given input-output dataset. Suppose we observe some inputs xi and corresponding outputs fi, where fi = f(xi) represents the unknown underlying mapping function.
Let xi ∈ R^d be the input for the ith training sample. Then we define a new variable X ∈ R^(n×d) for the collection of n training samples, each having d dimensions, as X = [x1, x2, …, xn]^T.
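As a concrete illustration of the GP prior over function values (a sketch, not from the paper; the squared-exponential kernel, signal variance sigma_f, and length-scale ell are illustrative assumptions), the covariance matrix K of f = [f(x1), …, f(xn)] can be built from pairwise kernel evaluations:

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared-exponential kernel k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 ell^2))."""
    # Pairwise squared Euclidean distances between rows of X1 (m x d) and X2 (n x d)
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return sigma_f**2 * np.exp(-0.5 * sq / ell**2)

X = np.array([[0.0], [1.0], [2.0]])   # n = 3 training inputs, d = 1
K = se_kernel(X, X)                   # 3 x 3 prior covariance of [f(x1), f(x2), f(x3)]
```

Under the GP assumption, f then follows a multivariate Gaussian N(0, K) (for a zero mean function).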
Problem statement
This section describes the problem statement. Fig. 2 shows the graphical model for GPR with a mixture of two Gaussian noises. The GPR model presented in Eq. (5) assumes the existence of a latent function f(x, θ) mapping the deterministic input x to the noise-free output f, where θ is the set of underlying hyperparameters. Also, y denotes the observed output, which is disturbed by noise. Here, the noise term ϵ is assumed to be a mixture of two Gaussian distributions.
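A minimal sketch of data generated under this noise model (illustrative only; the latent function sin(x), mixing weight alpha, and the two noise standard deviations s1, s2 are assumptions, not the paper's settings) looks as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3, 3, size=n)
f = np.sin(x)                       # latent noise-free function (illustrative choice)

# Two-component Gaussian mixture noise: the "regular" component has
# probability alpha and small variance; the second component (outliers
# or a second operating mode) has probability 1 - alpha and large variance.
alpha, s1, s2 = 0.9, 0.1, 1.0
I = rng.random(n) < alpha           # latent noise-mode identity: True -> regular
eps = np.where(I, rng.normal(0.0, s1, n), rng.normal(0.0, s2, n))
y = f + eps                         # observed output
```

The latent identity I is exactly the kind of hidden variable the EM algorithm marginalizes over in the sections that follow.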
Parameter estimation using the EM algorithm
The EM algorithm consists of the following iterative steps, which are repeated until convergence [10], [29] to obtain approximate ML estimates:
- E-Step: Derive the Q-function, i.e., the expectation of the log-likelihood of all hidden and observed data, taken with respect to the conditional distribution of the hidden data given the observed data and the current estimate of the hyperparameters.
- M-Step: Maximize the Q-function with respect to the hyperparameters to obtain updated estimates.
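For the mixture-noise part of the model, the two steps above can be sketched as follows (a simplified illustration, not the paper's full derivation: it treats the residuals y - f as given and updates only the mixture hyperparameters alpha, s1, s2; the GP hyperparameter updates are omitted):

```python
import numpy as np

def gauss_pdf(r, s):
    """Zero-mean Gaussian density with standard deviation s."""
    return np.exp(-0.5 * (r / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def e_step(residuals, alpha, s1, s2):
    """Responsibility of the regular (low-variance) component for each residual."""
    p1 = alpha * gauss_pdf(residuals, s1)
    p2 = (1.0 - alpha) * gauss_pdf(residuals, s2)
    return p1 / (p1 + p2)

def m_step(residuals, gamma):
    """Closed-form updates of the mixture hyperparameters given responsibilities."""
    alpha = gamma.mean()
    s1 = np.sqrt(np.sum(gamma * residuals**2) / np.sum(gamma))
    s2 = np.sqrt(np.sum((1.0 - gamma) * residuals**2) / np.sum(1.0 - gamma))
    return alpha, s1, s2
```

Small residuals receive responsibilities near 1 (regular component), while large residuals are attributed to the high-variance component, which is what down-weights outliers during training.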
Prediction with proposed GPR model
After convergence of the EM algorithm, the parameter estimates of the proposed GPR model are obtained and can be employed for prediction. To make predictions for given test data, we compute the conditional distribution of the function values f+ corresponding to the test inputs X+. To compute the posterior predictive distribution of f+|y, we first calculate the joint distribution of f+ and y.
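For reference, the standard GP conditioning formulas for a single Gaussian noise term (a sketch of the usual machinery the paper's mixture version builds on, not the proposed predictor itself) can be written as:

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared-exponential kernel (illustrative choice of covariance function)."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return sigma_f**2 * np.exp(-0.5 * sq / ell**2)

def gp_predict(X, y, Xs, kernel, noise_var):
    """Posterior mean and covariance of f* | y under Gaussian observation noise.

    mean = Ks^T (K + noise_var I)^-1 y
    cov  = Kss - Ks^T (K + noise_var I)^-1 Ks
    """
    K = kernel(X, X) + noise_var * np.eye(len(X))
    Ks = kernel(X, Xs)
    Kss = kernel(Xs, Xs)
    L = np.linalg.cholesky(K)                          # numerically stable solve
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))    # (K + noise_var I)^-1 y
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ a
    cov = Kss - v.T @ v
    return mean, cov
```

In the proposed model, the effective noise covariance is no longer a single scalar times the identity but depends on the inferred noise-mode identities.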
Algorithm
Fig. 3 presents the flowchart of the proposed GPR model parameter estimation for prediction. The first step of the algorithm is to set initial values for both the Gaussian process hyperparameters and the noise likelihood hyperparameters. A standard GP is then trained on the training dataset, and the predictive mean thus obtained is set as the initial value for f. In the E-step, the posterior probabilities P(f|y, X, I) and P(I|y, X) are inferred, and the noise-mode identity vector is updated.
Examples
To evaluate the efficacy of the GPR presented in this paper, we provide simulation examples for various cases, namely, (i) two synthetic datasets with one-dimensional and multidimensional inputs, respectively, (ii) a process simulation example of a continuous stirred tank reactor (CSTR), and (iii) an industrial example. To statistically characterize the performance, we employ three metrics, namely, mean absolute error (MAE), root mean square error (RMSE), and the negative log of the predictive probability (NLPP).
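The three metrics can be computed as follows (a straightforward sketch; the Gaussian form assumed in the NLPP helper is an illustrative assumption about the predictive density):

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def nlpp(y, mu, var):
    """Negative log predictive probability, assuming a Gaussian
    predictive density with mean mu and variance var at each test point."""
    return np.mean(0.5 * np.log(2.0 * np.pi * var) + 0.5 * (y - mu) ** 2 / var)
```

Lower values are better for all three; unlike MAE and RMSE, NLPP also penalizes over- or under-confident predictive variances.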
Conclusion
In this paper, a robust GPR with a mixture-of-Gaussians likelihood has been proposed to model processes affected by multi-modal noise. Further, we presented an EM-algorithm-based approach to obtain point estimates of the proposed model's parameters. Two numerical examples and a simulated chemical process have been used to demonstrate the advantages of the proposed method. In addition, the method was applied to model industrial data from a SAGD process, which further verified its effectiveness.
References (58)
- et al., Gaussian process regression for multivariate spectroscopic calibration, Chemometr. Intel. Lab. Syst. (2007)
- et al., Soft-sensor development for fed-batch bioreactors using support vector regression, Biochem. Eng. J. (2006)
- et al., ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process, Comput. Chem. Eng. (2009)
- et al., Robust identification for nonlinear errors-in-variables systems using the EM algorithm, J. Process Control (2017)
- et al., Geometric properties of partial least squares for process monitoring, Automatica (2010)
- On-line soft sensor for polyethylene process with multiple production grades, Control Eng. Practice (2007)
- et al., Robust multiple-model LPV approach to nonlinear process identification using mixture t distributions, J. Process Control (2014)
- et al., Robust Gaussian process modeling using EM algorithm, J. Process Control (2016)
- et al., A review of the expectation maximization algorithm in data-driven process identification, J. Process Control (2019)
- et al., Variational inference for Student-t models: robust Bayesian interpolation and generalised component analysis, Neurocomputing (2005)
- Simultaneous strategies for data reconciliation and gross error detection of nonlinear systems, Comput. Chem. Eng.
- Principal component analysis, Chemometr. Intel. Lab. Syst.
- Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, Vol. 55
- Detecting anomalies in cross-classified streams: a Bayesian approach, Knowl. Inform. Syst.
- Expectation maximization approach for simultaneous gross error detection and data reconciliation using Gaussian mixture distribution, Ind. Eng. Chem. Res.
- Novelty detection and neural network validation, IEE Proc.-Vision Image Signal Process.
- Neural Networks for Pattern Recognition
- The Expectation Maximization Algorithm – A Short Tutorial
- A further look at robustness via Bayes's theorem, Biometrika
- A Bayesian approach to some outlier problems, Biometrika
- Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. Series B (Methodological)
- Multivariate adaptive regression splines, Ann. Stat.
- Bayesian Gaussian Processes for Regression and Classification, Ph.D. thesis
- State of health estimation of lithium-ion batteries: a multiscale Gaussian process regression modeling approach, AIChE J.
- Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence [book review], IEEE Trans. Automatic Control
- Probability Theory: The Logic of Science
- Robust Gaussian process regression with a Student-t likelihood, J. Mach. Learn. Res.
- A Bayesian approach to robust process identification with ARX models, AIChE J.