Gaussian processes methods for nonstationary regression

Author:
  1. Muñoz González, Luis
Supervised by:
  1. Miguel Lázaro Gredilla, Supervisor
  2. Aníbal Ramón Figueiras Vidal, Supervisor

University of defense: Universidad Carlos III de Madrid

Date of defense: 23 September 2014

Committee:
  1. David Ríos Insua, Chair
  2. Joaquín Miguez Arenas, Secretary
  3. Pedro Larrañaga Múgica, Member

Type: Thesis

Abstract

Gaussian Processes (GPs) are a powerful nonparametric Bayesian tool for nonlinear regression. As is common in most regression approaches, GPs model observations as the sum of an unknown (latent) function plus Gaussian noise. Unlike other regression methods, GPs proceed in a purely Bayesian fashion, inferring the posterior distribution of the unknown function from the likelihood and a Gaussian prior placed over that function. One of the strengths of GPs is that they produce probabilistic predictions, i.e., average and dispersion values, in a natural way. Moreover, they usually employ a reduced number of hyperparameters, which can be tuned by a simple continuous optimization of the evidence; this makes them resilient to overfitting. Unfortunately, GPs cannot be applied to large-scale data sets due to their O(N^3) time complexity, which limits their scope of application to data sets with a few thousand samples (on present desktop computers), although sparse approximations allow GPs to be used on larger data sets.

Standard GP regression is formulated under stationarity hypotheses: the noise power is assumed constant throughout the input space, and the covariance of the prior distribution is typically modeled as depending only on the difference between input samples. This stationarity assumption can be too restrictive and unrealistic for many real-world applications. Pursuing nonstationarity, in this Thesis we propose a Divisive GP (DGP) model, in which two GPs are combined to achieve amplitude nonstationarity and heteroscedastic regression. The posterior of the DGP model is analytically intractable, so approximate inference techniques or Markov Chain Monte Carlo (MCMC) methods are needed to perform inference on the model. One of the advantages of the DGP model is that its likelihood is log-concave, which leads to a unimodal posterior when combined with a GP prior.
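As a concrete illustration of the O(N^3) bottleneck mentioned above, the following is a minimal sketch of standard (stationary, homoscedastic) GP regression. The kernel choice, data, and hyperparameter values are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance: stationary, i.e., depends only on x - x'."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, Xs, noise_var=0.1):
    """Exact GP posterior mean and variance at test inputs Xs.

    The Cholesky factorization of the N x N covariance matrix costs O(N^3),
    which is the scalability bottleneck discussed in the abstract.
    """
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                                 # O(N^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(X, Xs)
    mean = Ks.T @ alpha                                       # posterior mean
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xs, Xs).diagonal() - np.sum(v**2, 0)     # posterior variance
    return mean, var

# Toy data: noisy sine (illustrative only).
np.random.seed(0)
X = np.linspace(0, 5, 40)[:, None]
y = np.sin(X).ravel() + 0.1 * np.random.randn(40)
mean, var = gp_predict(X, y, X)
```

Note that the predictive variance comes out of the same computation as the mean, which is the "probabilistic predictions in a natural way" property the abstract refers to.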
This favors the convergence of approximate inference algorithms such as Expectation Propagation (EP) or the Laplace method. We first propose EP-DGP, an EP posterior approximation for inference on the DGP model. The experimental results show the high quality of the EP posterior approximation compared to an MCMC implementation using Elliptical Slice Sampling (ESS) on the same model, but at a reduced cost. Experimental results on different (homoscedastic and heteroscedastic) data sets show the improvements of the proposed method over state-of-the-art methods for heteroscedastic GP regression and over the standard GP. However, the computational burden of EP-DGP is high compared to the standard GP or to similar variational approximations for heteroscedastic regression. We also propose using the Laplace approximation for the DGP model. The characteristics of the likelihood make the posterior nearly Gaussian in shape, which allows the Laplace approximation (L-DGP) to provide posterior approximations as accurate as those of EP-DGP, but at a reduced cost.

Finally, we have also applied the Laplace approximation to perform inference on a GP model for volatility forecasting in financial time series, a direct application of heteroscedastic regression methods. The use of the Ornstein-Uhlenbeck covariance function, well suited to modeling the behavior of this kind of time series, allows the Laplace implementation to scale linearly with the number of samples. As with L-DGP, the characteristics of the likelihood make the Laplace approximation an accurate inference procedure at a reduced computational load compared to the MCMC method applied to the same volatility model.
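The mechanics of the Laplace method can be sketched independently of the DGP likelihood, which is not reproduced here. The sketch below uses a Poisson count likelihood as a stand-in log-concave likelihood (an assumption for illustration only): because the likelihood is log-concave, the posterior mode is unique and Newton's method finds it reliably, mirroring the convergence argument made above for the DGP model. The Newton iteration follows the standard numerically stable form:

```python
import numpy as np

def laplace_mode(K, y, max_iter=100, tol=1e-9):
    """Posterior mode of a GP via Newton's method (core of the Laplace approximation).

    Stand-in likelihood: y_i ~ Poisson(exp(f_i)), which is log-concave in f_i.
    Log-concavity plus the Gaussian prior guarantees a unique mode, the
    property the abstract credits for the good behavior of EP and Laplace.
    """
    n = len(y)
    f = np.zeros(n)
    for _ in range(max_iter):
        grad = y - np.exp(f)            # d/df log p(y | f)
        W = np.exp(f)                   # -d^2/df^2 log p(y | f), positive
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f_new = K @ a                   # stable Newton step
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f

# Illustrative kernel and count data (not from the thesis).
t = np.linspace(0.0, 1.0, 10)
K = np.exp(-0.5 * (t[:, None] - t[None, :])**2 / 0.2**2) + 1e-8 * np.eye(10)
y = np.array([0, 1, 2, 1, 3, 2, 4, 3, 2, 1], dtype=float)
f_hat = laplace_mode(K, y)
```

The Laplace approximation then takes the posterior to be Gaussian, centered at `f_hat` with covariance (K^-1 + W)^-1; the abstract's point is that a nearly Gaussian true posterior makes this cheap approximation nearly as accurate as EP.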
The experimental results corroborate the good performance of the Laplace method compared to other similar GP algorithms, reducing the computational burden and showing better prediction capabilities than the commonly used Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) models for volatility forecasting.