# Gaussian process-based model

#### 1. Introduction:

This report presents predictions of hospitalisations and intensive care unit (ICU) admissions in the Basque Country. The problem is posed as time series prediction and solved with Gaussian processes. Our model is Bayesian and uses no disease-specific parameters, such as the probability of infection or the time from infection to the onset of symptoms.

The decision not to use more complex models stems from the existing uncertainty about the values of these parameters. In this sense, the proposed model is largely agnostic and driven by the available data. It is nevertheless possible to introduce prior information into the model, and this is done by assuming an expected value for each of the quantities to be predicted.

The model presented here has several limitations. First, although it could be used for long-term prediction, we believe that predictions beyond four or five days accumulate so much error that they become meaningless. Second, the model cannot detect a change in trend. If such a change occurs, it will initially produce poor estimates and, after two or three new measurements, it will start generating good predictions again.

**Predicting the incidence of hospitalizations:**

**Predicting the incidence of ICU admissions:**

**Predicting the prevalence of hospitalizations:**

**Predicting the prevalence of ICU admissions:**

#### 2. Gaussian processes:

A Gaussian process (GP) is a stochastic process defined by a collection of random variables such that any finite subset of them follows a multivariate Gaussian distribution. It can be interpreted as a distribution over functions, where each sample is a function.

A GP is defined by a mean function $m(\mathbf{x})$ and a covariance function given by a positive semi-definite kernel $k(\mathbf{x}, \mathbf{x}')$. A GP is therefore represented as follows:

$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\right)$$

assuming that $\mathbf{x} \in \mathbb{R}^{d}$.
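To make the "distribution over functions" reading concrete, the following minimal sketch draws a few sample functions from a GP prior on a grid of one-dimensional inputs. The zero mean, the squared exponential kernel with unit hyperparameters, the jitter term and all function names are illustrative assumptions, not the configuration used in this report.

```python
import numpy as np

def sq_exp_kernel(xa, xb, amplitude=1.0, length=1.0):
    """Squared exponential kernel between two 1-D input arrays (assumed form)."""
    diff = xa[:, None] - xb[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length) ** 2)

def sample_gp_prior(x, mean_fn, kernel_fn, n_samples=3, jitter=1e-6, seed=0):
    """Draw n_samples functions from GP(mean_fn, kernel_fn), evaluated at x."""
    rng = np.random.default_rng(seed)
    mean = mean_fn(x)
    # A small jitter on the diagonal keeps the Cholesky factorisation stable.
    cov = kernel_fn(x, x) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((len(x), n_samples))
    # Each column is one sampled function evaluated on the grid x.
    return mean[:, None] + chol @ z

x = np.linspace(0.0, 10.0, 50)
samples = sample_gp_prior(x, mean_fn=np.zeros_like, kernel_fn=sq_exp_kernel)
```

Each column of `samples` is one function drawn from the prior; any finite set of its values is jointly Gaussian by construction.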

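In this case, a Bayesian model is used, and its posterior distribution is obtained from a set of training data. The joint distribution of the training outputs $\mathbf{f} = (f_1, f_2, \ldots, f_n)$ (where $f_i \in \mathbb{R}$, $i \in \{1, \ldots, n\}$ and $n \in \mathbb{N}$) and the test outputs $\mathbf{f}_* = (f_{n+1}, f_{n+2}, \ldots, f_{n+n_*})$ is given by:

$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix},\; \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right)$$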

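where $\mathcal{N}(\mu, \Sigma)$ is a multivariate Gaussian distribution, $m(X)$ is the vector of prior means evaluated at the inputs $X$, $X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ (with $\mathbf{x}_i \in \mathbb{R}^{d}$, $i \in \{1, \ldots, n\}$ and $n \in \mathbb{N}$) corresponds to the training data set and $X_* = (\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+n_*})$ to the test data set. $K(X, X_*)$ is the $n \times n_*$ matrix of covariances evaluated for each pair of points in $(X, X_*)$.

The predictive Gaussian distribution is obtained by conditioning on the training data set and the test inputs:

$$\mathbf{f}_* \mid X, \mathbf{f}, X_* \sim \mathcal{N}\Big( m(X_*) + K(X_*, X)\, K(X, X)^{-1} \big(\mathbf{f} - m(X)\big),\; K(X_*, X_*) - K(X_*, X)\, K(X, X)^{-1} K(X, X_*) \Big)$$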
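The conditioning step can be sketched in a few lines of NumPy. This is a minimal illustration for one-dimensional inputs, assuming a squared exponential kernel, a constant prior mean and synthetic data; the names `gp_predict`, `days` and `counts`, and the noise level, are hypothetical and not taken from this report.

```python
import numpy as np

def sq_exp_kernel(xa, xb, amplitude=1.0, length=1.0):
    """Squared exponential kernel between two 1-D input arrays (assumed form)."""
    diff = xa[:, None] - xb[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length) ** 2)

def gp_predict(x_train, f_train, x_test, mean_fn, kernel_fn, noise=1e-3):
    """Posterior mean and covariance at x_test given (x_train, f_train)."""
    k_tt = kernel_fn(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_ts = kernel_fn(x_train, x_test)   # n x n_* cross-covariance
    k_ss = kernel_fn(x_test, x_test)
    chol = np.linalg.cholesky(k_tt)
    # alpha = K(X, X)^{-1} (f - m(X)), computed through the Cholesky factor
    resid = f_train - mean_fn(x_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    v = np.linalg.solve(chol, k_ts)
    mean = mean_fn(x_test) + k_ts.T @ alpha
    cov = k_ss - v.T @ v
    return mean, cov

# Synthetic daily series (illustrative only), predicted two days ahead.
days = np.arange(10, dtype=float)
counts = 20.0 + 3.0 * np.sin(days / 2.0)
pred_mean, pred_cov = gp_predict(
    days, counts, np.array([10.0, 11.0]),
    mean_fn=lambda x: np.full_like(x, 20.0),
    kernel_fn=sq_exp_kernel, noise=0.5,
)
```

The diagonal of `pred_cov` gives the predictive variance at each test point, which grows as the test inputs move away from the training data.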

To model the data adequately, different mean functions are considered as prior information for the model, as follows:

- Logistic function:

$$m(x) = \frac{L}{1 + e^{-k (x - x_0)}}$$

where $L$ is the maximum value of the curve, $k$ is the growth rate of the curve and $x_0$ is the midpoint of the sigmoid curve.

- Gompertz function:

$$m(x) = a\, e^{-b\, e^{-c x}}$$

where $a$ is the maximum value of the curve, $b$ sets the displacement along the x-axis, and $c$ is the growth rate.
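As a quick illustration, the two prior mean functions can be written as plain NumPy functions. The function names are hypothetical, and the forms are the standard logistic and Gompertz curves matching the parameter descriptions above.

```python
import numpy as np

def logistic_mean(x, L, k, x0):
    """Logistic curve: L is the maximum, k the growth rate, x0 the midpoint."""
    x = np.asarray(x, dtype=float)
    return L / (1.0 + np.exp(-k * (x - x0)))

def gompertz_mean(x, a, b, c):
    """Gompertz curve: a is the maximum, b the x-axis displacement, c the growth rate."""
    x = np.asarray(x, dtype=float)
    return a * np.exp(-b * np.exp(-c * x))

days = np.arange(0, 60)
logistic_prior = logistic_mean(days, L=100.0, k=0.3, x0=15.0)
gompertz_prior = gompertz_mean(days, a=100.0, b=5.0, c=0.15)
```

Both curves saturate at their maximum value, but the Gompertz curve is asymmetric: it approaches its upper asymptote more slowly than it leaves the lower one.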

#### 3. Model selection:

To select the parameters of these prior mean functions $m(\cdot)$, the data set is split in two. Using the first part, the parameter values are learned by sweeping a range of candidate values, and the candidates that minimise the prediction error on the second part (the validation error) are selected. Once these parameters are fixed, a new model is learned on the whole data set. The aim at this stage is to fit the hyperparameters of the kernel $k(\cdot, \cdot)$ by maximising the marginal likelihood. In particular, the squared exponential kernel and the Matérn 5/2 kernel have been used:

- Squared exponential kernel:

where $\theta_0$ is the amplitude parameter, $\theta_1$ is the length parameter and $\theta_n$ is the noise parameter.

- Matérn 5/2 kernel:

where $\theta_0$ is the amplitude parameter, $\theta_1$ is the length parameter and $\theta_n$ is the noise parameter.
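For reference, one common parameterisation of the two kernels that matches the parameter names above is sketched here. The exact form, in particular whether the amplitude and noise parameters enter squared and how the noise term is attached, is an assumption.

$$k_{\mathrm{SE}}(\mathbf{x}, \mathbf{x}') = \theta_0^2 \exp\left( -\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2 \theta_1^2} \right) + \theta_n^2\, \delta_{\mathbf{x}\mathbf{x}'}$$

$$k_{\mathrm{M52}}(\mathbf{x}, \mathbf{x}') = \theta_0^2 \left( 1 + \frac{\sqrt{5}\, r}{\theta_1} + \frac{5 r^2}{3 \theta_1^2} \right) \exp\left( -\frac{\sqrt{5}\, r}{\theta_1} \right) + \theta_n^2\, \delta_{\mathbf{x}\mathbf{x}'}, \qquad r = \lVert \mathbf{x} - \mathbf{x}' \rVert$$

Here $\delta_{\mathbf{x}\mathbf{x}'}$ equals 1 when both arguments refer to the same observation and 0 otherwise, so the noise term only affects the diagonal of the covariance matrix.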

Among all the models learned, the one with the maximum likelihood is selected to make the final prediction.
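The selection procedure described in this section can be sketched as follows. This is a simplified illustration under several assumptions: one-dimensional inputs, a logistic prior mean only, fixed kernel hyperparameters (`theta0`) during the grid search, RMSE as the validation error, log-parameters bounded to [-5, 5], and synthetic data. All function names are hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def logistic_mean(x, L, k, x0):
    return L / (1.0 + np.exp(-k * (x - x0)))

def sq_exp(r, length):
    return np.exp(-0.5 * (r / length) ** 2)

def matern52(r, length):
    s = np.sqrt(5.0) * r / length
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def cov_matrix(xa, xb, base_kernel, theta):
    """Noise-free covariance; theta = (amplitude, length, noise)."""
    r = np.abs(xa[:, None] - xb[None, :])
    return theta[0] ** 2 * base_kernel(r, theta[1])

def neg_log_marginal_likelihood(log_theta, x, resid, base_kernel):
    theta = np.exp(log_theta)  # optimising in log space keeps parameters positive
    cov = cov_matrix(x, x, base_kernel, theta) + theta[2] ** 2 * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    return (0.5 * resid @ alpha + np.log(np.diag(chol)).sum()
            + 0.5 * len(x) * np.log(2.0 * np.pi))

def fit_kernel(x, resid, base_kernel):
    """Maximise the marginal likelihood; return (theta, log marginal likelihood)."""
    res = minimize(neg_log_marginal_likelihood, np.zeros(3),
                   args=(x, resid, base_kernel),
                   method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
    return np.exp(res.x), -res.fun

def validation_rmse(x, f, n_fit, mean_fn, base_kernel, theta):
    """Condition on the first n_fit points, predict the rest, return the RMSE."""
    xa, fa, xb, fb = x[:n_fit], f[:n_fit], x[n_fit:], f[n_fit:]
    k_aa = cov_matrix(xa, xa, base_kernel, theta) + theta[2] ** 2 * np.eye(n_fit)
    k_ba = cov_matrix(xb, xa, base_kernel, theta)
    pred = mean_fn(xb) + k_ba @ np.linalg.solve(k_aa, fa - mean_fn(xa))
    return np.sqrt(np.mean((pred - fb) ** 2))

def select_model(x, f, grid, n_fit, theta0=(1.0, 1.0, 0.1)):
    # Step 1: grid search over prior-mean parameters using the validation error.
    best_params = min(grid, key=lambda p: validation_rmse(
        x, f, n_fit, lambda t: logistic_mean(t, *p), sq_exp, np.array(theta0)))
    resid = f - logistic_mean(x, *best_params)
    # Step 2: refit the kernel hyperparameters on the whole data set.
    fits = {name: fit_kernel(x, resid, kernel) for name, kernel in
            (("squared_exponential", sq_exp), ("matern52", matern52))}
    # Step 3: keep the kernel with the highest maximised marginal likelihood.
    best_kernel = max(fits, key=lambda name: fits[name][1])
    return best_params, best_kernel, fits

# Synthetic series (illustrative only): a logistic trend plus Gaussian noise.
rng = np.random.default_rng(0)
days = np.arange(30, dtype=float)
series = logistic_mean(days, 100.0, 0.3, 15.0) + rng.normal(0.0, 1.0, size=30)
grid = list(itertools.product([80.0, 100.0, 120.0], [0.2, 0.3, 0.4], [12.0, 15.0, 18.0]))
params, kernel_name, fits = select_model(days, series, grid, n_fit=20)
```

Splitting the data chronologically, rather than at random, mirrors how the model is used in practice: it is always asked to predict values that come after the data it was trained on.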