Elsevier

Journal of Process Control

Volume 81, September 2019, Pages 209-220

Gaussian process modelling with Gaussian mixture likelihood

https://doi.org/10.1016/j.jprocont.2019.06.007

Highlights

  • Extends Gaussian process modelling from a single Gaussian noise model to a mixture of Gaussian noises.

  • Develops a robust Gaussian process model to handle outliers in process data analysis.

  • The problem is solved in the maximum likelihood framework via the expectation maximization algorithm.

Abstract

Gaussian Process (GP), as a probabilistic non-linear multi-variable regression model, has been widely used in the nonparametric Bayesian framework for data-based modelling of complex processes. The noise in standard GP regression is assumed to follow a Gaussian distribution. In this setting, the point estimation of the model parameters can be obtained analytically using the maximum likelihood (ML) approach in a straightforward fashion. However, in practical scenarios, processes may be corrupted by outliers and other disturbances, or may have multiple modes of operation, resulting in a non-Gaussian data likelihood. In this work, to model such scenarios, we propose to employ a mixture of two Gaussian distributions as the noise model, capturing both regular and irregular noise and thereby enhancing the robustness of the regression model. Further, we present an Expectation Maximization (EM) algorithm-based approach to obtain the optimal parameter set of the proposed GP regression model. The predictive distribution can then be found from the hyperparameters estimated by the EM algorithm. The efficacy and practicality of the proposed method are illustrated with two sets of synthetic data, a simulated example, as well as an industrial dataset.

Introduction

Modeling of complex processes is essential for optimization, control, and process monitoring. However, developing first-principles models for complex chemical processes is a tedious task. Hence, data-based models have been considered a promising alternative in such scenarios. An extensive range of artificial intelligence and machine learning techniques provides a powerful framework for data-based models. At present, data-based modelling methods, notably principal component analysis (PCA) [57], [53], partial least squares (PLS) regression [24], [31], artificial neural networks (ANNs) [5], [39], [15], fuzzy logic methods [18], support vector regression (SVR) [50], [11], Gaussian process regression (GPR) [43], hybrid methods [25], and so on, have demonstrated a significant ability to model large numbers of highly correlated variables. Recently, significant attention has been drawn to data-driven non-parametric models as well; popular non-parametric regression models include Gaussian Process Regression (GPR) and Support Vector Regression (SVR), among others. Non-parametric models can learn any functional form from the training data without prior knowledge, requiring only input-output data for modeling [45]. For instance, SVR, proposed by [50] as a regression method, constructs a hyperplane that fits the data within a specified margin.

Gaussian Process (GP), a non-parametric modeling paradigm, was initially introduced in the field of geo-statistics under the name "kriging" [22]. Kriging calculates weights based on the inverse distance between the predicted value and the measured inputs, as well as the spatial autocorrelation of the measured inputs. The basic underlying assumption of Gaussian Process Regression (GPR) is that a collection of arbitrary function values can be modeled using a multivariate Gaussian distribution [38]. It was shown by [35] that a Bayesian neural network with infinitely many hidden nodes in one layer is equivalent to a GP; hence, GPs can be viewed as flexible and interpretable alternatives to neural networks. GPs can also be derived from other models, such as Bayesian kernel machines and linear regression with basis functions [55]. Due to the computational difficulty of Bayesian analysis of neural networks [28], [34], GP was used by [56] as a regression model to make predictive Bayesian analysis straightforward. The Bayesian interpretation of GPs was further enriched and extended by [36] and [14]. The ability to model complex datasets makes GPR promising in the area of data-based process modeling; for instance, it has found applications in spectroscopic calibration [9], development of soft sensors [26], state estimation of lithium-ion batteries [17], and model predictive control [33].
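To make the multivariate-Gaussian view of GPs concrete, the following sketch (assuming a squared-exponential covariance, one of many possible kernel choices, with hypothetical length-scale and signal-variance values) draws one random function from a zero-mean GP prior:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between 1-D input vectors."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
K = sq_exp_kernel(x, x)
# Small jitter keeps the covariance numerically positive definite.
f = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)))
```

Any finite set of function values f thus follows a joint Gaussian whose covariance is fixed entirely by the kernel evaluated at the inputs.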

A natural way to model such industrial data is by attributing a Gaussian distribution to the noise. The fully Bayesian framework of GP is computationally tractable for the Gaussian noise model. However, in realistic scenarios, industrial data seldom follows a Gaussian distribution, as it may contain outliers due to sensor malfunctions and process disturbances, or data emanating from multiple operational modes of a process. To deal with such scenarios, a possible workaround is to employ non-Gaussian distributions for modeling the noise dynamics, resulting in a more robust model [7]. Various approaches have been followed by different researchers for accommodating outliers while modeling industrial process data. For instance, [37] discussed distributions with thick tails and termed them "outlier-prone" as they reject outlying observations, and [19] proposed a two-model strategy containing a good and a bad sampling distribution to model regular and outlying observations. Along similar lines, the use of Student's-t distribution, as a heavy-tailed distribution to accommodate outliers, has been described by [54]; a mixture of two Gaussian distributions was introduced by [8]; and the Laplace distribution was also used as a noise distribution in [44]. Among the above, approaches that use mixtures of two Gaussian distributions assume that the regular noise is sampled from a Gaussian distribution with lower variance and outliers are sampled from a Gaussian distribution with higher variance [4], [2], [21].
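The contaminated-noise assumption described above can be illustrated with a small data-generation sketch (the latent function sin(x), the mixing weight, and the component variances are all hypothetical values for illustration):

```python
import numpy as np

def contaminated_data(n=200, pi_out=0.1, sigma_reg=0.1, sigma_out=1.0, seed=1):
    """Sample y = sin(x) + eps, where eps is drawn from the regular
    low-variance Gaussian with probability 1 - pi_out and from the
    high-variance "outlier" Gaussian with probability pi_out."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-3.0, 3.0, n)
    is_out = rng.random(n) < pi_out          # latent component indicator
    eps = np.where(is_out,
                   rng.normal(0.0, sigma_out, n),
                   rng.normal(0.0, sigma_reg, n))
    return x, np.sin(x) + eps, is_out

x, y, is_out = contaminated_data()
```

The latent indicator `is_out` is exactly the kind of hidden variable the EM algorithm later marginalizes over: it is unobserved in practice, and only y is recorded.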

In the context of GPR, [23] investigated the possibility of using Student's-t distribution to describe the noise model, applying variational inference, Expectation Propagation (EP), and Markov chain Monte Carlo (MCMC) methods for inference of the GPR model with a Student's-t likelihood, extending the work of [47]. Moreover, [49] used Laplace's approximation to approximate the log-marginal likelihood of the complete data, while [20] proposed expectation propagation (EP) for approximate inference of the GPR model with a Student's-t likelihood. Recently, [40] proposed an EM algorithm-based approach for robust GPR identification using non-Gaussian noise distributions, namely Student's-t and Laplace distributions. In this work, we develop a GPR model with a mixture of two Gaussian distributions as the data likelihood. As indicated before, the considered model captures scenarios such as data with outliers from a contaminated distribution as well as data obtained from a process operating in multiple modes, which are not uncommon in chemical processes. Further, we propose to use the EM algorithm to learn the hyperparameters of the proposed GPR model. The EM algorithm is a powerful approach for obtaining maximum likelihood estimates (MLE), is useful when the observed data is incomplete or contains hidden (latent) variables [10], [29], and has been widely used in a variety of parameter identification problems [27], [16], [46]. Even though [23] investigated the scenario of a two-Gaussian-mixture noise model in GPR, the entire focus was on inference rather than determining the model's hyperparameters. In this work, we address this lacuna by deriving parameter estimates of the GPR model for a mixture of Gaussian likelihood. This work therefore concerns identifying GPR as the model structure for outlier-contaminated data, while the works [48], [3] used a mixture of Gaussian distributions for the gross error detection problem, which is not related to any GPR modelling problem. To the best of the authors' knowledge, there exists no approach in the literature to estimate the parameters of GPR with a mixture of Gaussian likelihood. Finally, we validate our results with synthetic data, a simulated CSTR example, and an industrial dataset.

The rest of this paper is organized as follows: Section 2 revisits GPR. The problem is described in Section 3. In Section 4, an EM algorithm-based approach is derived to estimate the hyperparameters of GPR. After learning the hyperparameters, a procedure for prediction using test data is discussed in Section 5. Section 6 presents an algorithmic flowchart for the estimation of hyperparameters. In Section 7, three validation studies are presented to verify the efficiency of the proposed GPR model. A summary of our findings and conclusions is provided in Section 8.

Section snippets

Revisit of GPR

The GPR modeling paradigm seeks a distribution over possible non-parametric functions for modeling a given set of input and output data. Suppose we observe some inputs xi and corresponding outputs fi, where fi = f(xi) represents the unknown underlying mapping function.

Let $x_i \in \mathbb{R}^d$ be the set of inputs for the $i$th training sample. Then we define a new variable $X$ for the collection of $n$ training samples, having $d$ dimensions, as follows:
$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1d} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2d} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3d} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nd} \end{bmatrix}$$
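Building this input matrix amounts to row-wise stacking of the training samples; a minimal sketch with hypothetical values for n = 3 samples of d = 4 dimensions:

```python
import numpy as np

# Each list entry is one training sample's d-dimensional input vector.
samples = [np.array([1.0, 2.0, 3.0, 4.0]),
           np.array([5.0, 6.0, 7.0, 8.0]),
           np.array([9.0, 10.0, 11.0, 12.0])]
X = np.vstack(samples)   # shape (n, d): row i holds the inputs of sample i
```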

Problem statement

This section describes the problem statement. Fig. 2 shows the graphical model for GPR with a mixture of two Gaussian noises. The GPR model presented in Eq. (5) assumes the existence of a latent function f(x, θ) mapping the deterministic input x to the noise-free output, f, where θ is the set of underlying hyperparameters. Also, y denotes the observed output, which is disturbed by noise. In this case, the noise term, ϵ, is assumed to be a mixture of two Gaussian

Parameter estimation using the EM algorithm

The EM algorithm consists of the following iterative steps, which are repeated until convergence [10], [29] to obtain approximate ML estimates:

  • E-Step: In this step, the Q-function is derived: the expectation of the log-likelihood of all hidden and observed data, taken with respect to the conditional distribution of the hidden data given the observed data and the current estimate of the hyperparameters:
$$Q(\vartheta;\vartheta^{(t)}) = \mathbb{E}_{C_{\text{mis}}\mid C_{\text{obs}},\,\vartheta^{(t)}}\left[\log P(C_{\text{obs}}, C_{\text{mis}}\mid\vartheta)\right]$$

  • M-Step: In this step, the
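As a toy illustration of these E- and M-steps (not the paper's exact updates, which also involve the GP posterior over f), the following sketch runs EM on residuals r = y − f under a zero-mean two-component Gaussian mixture; the initialization and iteration count are illustrative assumptions:

```python
import numpy as np

def normal_pdf(r, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return np.exp(-0.5 * (r / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def em_two_gaussian_noise(r, n_iter=100):
    """EM for a zero-mean two-component Gaussian mixture fitted to residuals r."""
    pi = 0.5                                        # weight of the regular component
    s1, s2 = 0.5 * np.std(r), 2.0 * np.std(r)       # low- and high-variance init
    for _ in range(n_iter):
        # E-step: responsibility of the low-variance ("regular") component.
        p1 = pi * normal_pdf(r, s1)
        p2 = (1.0 - pi) * normal_pdf(r, s2)
        gamma = p1 / (p1 + p2)
        # M-step: closed-form updates of mixing weight and component variances.
        pi = gamma.mean()
        s1 = np.sqrt(np.sum(gamma * r ** 2) / np.sum(gamma))
        s2 = np.sqrt(np.sum((1.0 - gamma) * r ** 2) / np.sum(1.0 - gamma))
    return pi, s1, s2

# Residuals: 90% regular noise (std 0.1), 10% outlier noise (std 1.0).
rng = np.random.default_rng(2)
outlier = rng.random(500) < 0.1
r = np.where(outlier, rng.normal(0.0, 1.0, 500), rng.normal(0.0, 0.1, 500))
pi_hat, s1_hat, s2_hat = em_two_gaussian_noise(r)
```

The responsibilities `gamma` play the role of the hidden noise-mode identities: EM never observes which component generated each residual, only their posterior probabilities.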

Prediction with proposed GPR model

After convergence of the EM algorithm, the parameter estimates of the proposed GPR model are obtained, which can be further employed for prediction. To make predictions for given test data, we compute the conditional distribution of function values f+ corresponding to test input data X+. To compute the posterior predictive distribution of f+|y, we need to first calculate the joint distribution of f+, y as
$$P(y, f_+ \mid X, X_+, \vartheta) = \mathcal{N}\!\left(\begin{bmatrix} m\mathbf{1} \\ m\mathbf{1}_+ \end{bmatrix},\; \begin{bmatrix} K(X,X) + (\operatorname{diag}(\sigma_I))^2 & K(X,X_+) \\ K(X_+,X) & K(X_+,X_+) \end{bmatrix}\right)$$
where $(\operatorname{diag}(\sigma_I))^{-2}$ is the
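Conditioning this joint Gaussian on y gives the standard GP predictive equations; a minimal sketch, assuming a squared-exponential kernel, a zero mean, and a single homoscedastic noise variance in place of the paper's per-point mixture variances:

```python
import numpy as np

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-4):
    """Mean and covariance of the GP posterior predictive f_+ | y."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha                             # K(X+,X) K^{-1} y
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                             # K(X+,X+) - K(X+,X) K^{-1} K(X,X+)
    return mean, cov

x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
mean, cov = gp_predict(x_train, y_train, x_train)
```

The Cholesky factorization avoids forming the explicit inverse of K, which is both cheaper and numerically better conditioned.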

Algorithm

Fig. 3 presents the flow-chart of the proposed GPR model parameter estimation for prediction. The first step of the algorithm is to set the initial values for both the Gaussian process hyperparameters and the noise likelihood hyperparameters. Further, a standard GP is used to train the model on the training dataset, and the predictive mean thus obtained is set as an initial value for f. In the E-step, the posterior probability of P(f|y, X, I) and P(I|y, X) is inferred. Then, the noise mode identity vector is

Examples

To evaluate the efficacy of the GPR model presented in this paper, we provide simulation examples for various cases, namely, (i) two synthetic datasets with one-dimensional and multidimensional inputs, respectively, (ii) a process simulation example of a continuous stirred tank reactor (CSTR), and (iii) an industrial example. To statistically characterize the performance, we employ three metrics, namely, mean absolute error (MAE), root mean square error (RMSE), and negative log of predictive probability
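These three metrics have straightforward implementations; a sketch, with the negative log predictive density written for a Gaussian predictive distribution with mean mu and variance var at each test point:

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

def nlpd(y, mu, var):
    """Negative log predictive density under a Gaussian N(mu, var) prediction."""
    y, mu, var = map(np.asarray, (y, mu, var))
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var)
                         + 0.5 * (y - mu) ** 2 / var))
```

Unlike MAE and RMSE, NLPD also penalizes badly calibrated predictive variances, which is what makes it informative for probabilistic models such as GPR.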

Conclusion

In this paper, a robust GPR model with a mixture of Gaussian likelihood has been proposed to model processes affected by multi-modal noise. Further, we presented an approach based on the EM algorithm to obtain point estimates of the proposed model's parameters. Two numerical examples and a simulated chemical process have been used to demonstrate the advantages of the proposed method. In addition, the method was applied to model industrial data from a SAGD process, which further verified its effectiveness

References (58)

  • I. Tjoa et al., Simultaneous strategies for data reconciliation and gross error detection of nonlinear systems, Comput. Chem. Eng. (1991)
  • S. Wold et al., Principal component analysis, Chemometr. Intell. Lab. Syst. (1987)
  • M. Abramowitz et al., Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, Vol. 55 (1965)
  • D. Agarwal, Detecting anomalies in cross-classified streams: a Bayesian approach, Knowl. Inform. Syst. (2007)
  • H. Alighardashi et al., Expectation maximization approach for simultaneous gross error detection and data reconciliation using Gaussian mixture distribution, Ind. Eng. Chem. Res. (2017)
  • C.M. Bishop, Novelty detection and neural network validation, IEE Proc.-Vision Image Signal Process. (1994)
  • C.M. Bishop, Neural Networks for Pattern Recognition (1995)
  • S. Borman, The Expectation Maximization Algorithm – A Short Tutorial (2004)
  • G.E. Box et al., A further look at robustness via Bayes's theorem, Biometrika (1962)
  • G.E. Box et al., A Bayesian approach to some outlier problems, Biometrika (1968)
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. Series B (Methodological) (1977)
  • M. Ebden, Gaussian processes: A quick introduction, arXiv preprint (2015)
  • J.H. Friedman, Multivariate adaptive regression splines, Ann. Stat. (1991)
  • M.N. Gibbs, Bayesian Gaussian Processes for Regression and Classification, Ph.D. thesis (1998)
  • Y.-J. He et al., State of health estimation of lithium-ion batteries: A multiscale Gaussian process regression modeling approach, AIChE J. (2015)
  • J.-S.R. Jang et al., Neuro-fuzzy and soft computing – a computational approach to learning and machine intelligence [book review], IEEE Trans. Automatic Control (1997)
  • E.T. Jaynes, Probability Theory: The Logic of Science (2003)
  • P. Jylänki et al., Robust Gaussian process regression with a Student-t likelihood, J. Mach. Learn. Res. (2011)
  • S. Khatibisepehr et al., A Bayesian approach to robust process identification with ARX models, AIChE J. (2013)