SKELLAM PROCESS MODELING FOR FINANCIAL HIGH-FREQUENCY DATA

High-frequency data are observations collected at a fine time scale. Such data largely comprise prices and transactions, which institutional rules prevent from rising or falling drastically within a short period of time. Price changes are therefore measured in ticks, where a tick is the minimum upward or downward movement in the price of a security. This discreteness means the observations take values in Z. The Skellam distribution has the convenient property of being supported on Z. We are interested in studying the Skellam process in which the time-dependent intensities are driven by Gaussian processes. Such a doubly stochastic Poisson process, also known as a Cox process, is a point process that generalizes the Poisson process. We then investigate whether this Skellam model provides a better fit to high-frequency financial data and how the Gaussian process can capture the market microstructure.

The probability mass function of a Poisson distribution with intensity λ is P(X = k) = λ^k e^{−λ} / k!, k = 0, 1, 2, .... The positive real number λ is equal to the expected value and also to the variance.
Definition 1.2. The Skellam distribution [12] is the discrete probability distribution of the difference of two independent Poisson random variables, with respective intensities λ_1 and λ_2.
The probability mass function of S(λ_1, λ_2) is P(S = k) = e^{−(λ_1 + λ_2)} (λ_1 / λ_2)^{k/2} I_{|k|}(2 √(λ_1 λ_2)), k ∈ Z, where I_ν is the modified Bessel function of the first kind. The expected value is λ_1 − λ_2, while the variance is λ_1 + λ_2. The Skellam distribution is symmetric when λ_1 = λ_2, right-skewed when λ_1 > λ_2, and left-skewed when λ_1 < λ_2.
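As an illustration, the moments and symmetry property above can be checked numerically with `scipy.stats.skellam`; this is a sketch for intuition, not part of the thesis code, and the intensity values are arbitrary:

```python
# Sketch: numerical check of Skellam moments and symmetry via scipy.
import numpy as np
from scipy.stats import skellam

lam1, lam2 = 3.0, 1.5
dist = skellam(lam1, lam2)

print(dist.mean())  # lambda1 - lambda2 = 1.5
print(dist.var())   # lambda1 + lambda2 = 4.5

# Symmetry when the intensities are equal: P(S = k) == P(S = -k)
sym = skellam(2.0, 2.0)
print(np.isclose(sym.pmf(3), sym.pmf(-3)))  # True
```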

The Gaussian Process
A Gaussian process (GP) is a stochastic process that provides a powerful tool for probabilistic inference directly on distributions over functions, and it has gained much attention in recent years. While well known and extensively used in the statistical literature (for instance in spatial and spatio-temporal models), Gaussian processes have also received considerable attention in the machine learning literature in several nonlinear regression and classification problems [11].
A GP offers a flexible non-parametric Bayesian framework for estimating latent functions from data.
A GP is completely specified by its mean function m(x) and covariance function k(x, x'), and we will write the Gaussian process as f(x) ∼ GP(m(x), k(x, x')).
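As an illustrative sketch (not the thesis code), a zero-mean GP prior with a squared exponential covariance can be sampled as follows; the hyperparameter values are arbitrary choices:

```python
# Sketch: draw one sample path from a zero-mean GP prior with
# squared exponential covariance k(x, x') = sigma^2 exp(-(x - x')^2 / (2 l^2)).
import numpy as np

def sq_exp(x1, x2, sigma2=1.0, length=0.5):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-d**2 / (2.0 * length**2))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 200)
K = sq_exp(x, x) + 1e-8 * np.eye(len(x))          # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(len(x)), K)  # one GP sample path
```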

Main goals
Market microstructure is the study of financial markets and how they operate.
It is one of the fastest-growing fields of financial research due to the rapid development of algorithmic and electronic trading. This high-speed evolution of financial markets has produced a tremendous number of trading venues and exchanges. The major thrust of market microstructure research examines the ways in which the working processes of a market affect the determinants of transaction costs, prices, quotes, volume, and trading behavior. We introduce a new modeling extension for the Skellam process, which includes a non-parametric GP specification for the log-intensity functions. We test the new model with simulated data. For illustrative purposes we also show the analysis of some real data, collected at moderate frequency.
Chapter 2 describes the Skellam process model with Gaussian process priors.
We then introduce the Laplace approximation and MCMC methods to approach the conditional posterior with Skellam likelihood.
Next, we use simulated data to investigate the performance of the Skellam model in terms of prediction in Chapter 3. In addition, we compare the prediction between Gaussian likelihood and Skellam likelihood.
Chapter 4 presents the empirical results related to model fit and the forecasting performance on three high frequency datasets from S&P 500 index, NASDAQ, and IBM stock trading data in 1-min frequency.
In the last chapter, we discuss the results and directions of future research.

When the likelihood is also Gaussian, the posterior over the latent function is described by a new GP that can be obtained analytically. In all other, non-Gaussian cases, e.g. the Skellam likelihood, exact inference is intractable and approximate inference methods are needed. In this chapter, we introduce the Laplace approximation and Markov chain Monte Carlo (MCMC) algorithms for inference in GP models.

Without loss of generality, we denote the Skellam process with λ_{1,t} = exp(f_{1,t}) and λ_{2,t} = exp(f_{2,t}). Let f = {f_1, f_2} be the latent variables in the Skellam process and θ_i the hyperparameters of f_i.

The Model
In this Skellam model, we consider intensities λ_1 and λ_2 that are exponentials of Gaussian processes. We summarize the model as follows:

Observation model: y | f ∼ ∏_{t=1}^{n} s(y_t; λ_{1,t}, λ_{2,t}) = ∏_{t=1}^{n} s(y_t; exp(f_{1,t}), exp(f_{2,t}))

GP prior: given inputs X, hyperparameters θ_i, and the covariance function k(x, x' | θ_i), the GP prior over the function f_i(x) is the multivariate Gaussian distribution f_i | X, θ_i ∼ N(0, K_{f,f}).

Assume that we want to predict the values f̃_i at new input locations X̃. The joint prior for the latent variables f_i and f̃_i is
[f_i; f̃_i] ∼ N(0, [K_{f,f}, K_{f,f̃}; K_{f̃,f}, K_{f̃,f̃}]).
Therefore, the conditional distribution of f̃_i given f_i is
f̃_i | f_i ∼ N(K_{f̃,f} K_{f,f}^{-1} f_i, K_{f̃,f̃} − K_{f̃,f} K_{f,f}^{-1} K_{f,f̃}).
The posterior distribution contains all the information about the latent function delivered by the data D = {X, y}:
p(f | D, θ) ∝ p(y | f) p(f | X, θ).
Given the conditional posterior distribution p(f | D, θ), we can evaluate the posterior predictive mean E_{f̃ | f, θ}[f̃(x̃)] = k(x̃, X) K_{f,f}^{-1} f and the posterior predictive covariance between any set of latent variables f̃,
Cov[f̃ | f, θ] = K_{f̃,f̃} − K_{f̃,f} K_{f,f}^{-1} K_{f,f̃}.
The posterior predictive distribution is
p(f̃ | D, θ) = ∫ p(f̃ | f, θ) p(f | D, θ) df.
To approximate the predictive posterior distribution, we introduce the Laplace approximation and MCMC in the next two sections.
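The conditional-prediction formulas can be sketched numerically as follows; this is illustrative code with arbitrary inputs and hyperparameters, not the thesis implementation:

```python
# Sketch: GP conditional prediction  ftilde | f  at new inputs.
import numpy as np

def sq_exp(x1, x2, sigma2=1.0, length=1.0):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-d**2 / (2.0 * length**2))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)      # training inputs X
xs = np.linspace(0.0, 1.0, 50)     # new inputs Xtilde
K = sq_exp(x, x) + 1e-8 * np.eye(len(x))
f = rng.multivariate_normal(np.zeros(len(x)), K)   # stand-in latent values

Ksf = sq_exp(xs, x)                                # K_{ftilde, f}
Kss = sq_exp(xs, xs)                               # K_{ftilde, ftilde}
mean = Ksf @ np.linalg.solve(K, f)                 # E[ftilde | f]
cov = Kss - Ksf @ np.linalg.solve(K, Ksf.T)        # Cov[ftilde | f]
```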

Laplace approximation
The Laplace approximation can compute very accurate approximations to the posterior. It builds a Gaussian approximation from a second-order Taylor expansion of log p(f | D, θ) around its mode f̂:
q(f | D, θ) = N(f | f̂, (K_{f,f}^{-1} + W)^{-1}), where W = −∇∇ log p(y | f) |_{f = f̂}.
Using K_{f,f}^{-1} f̂ = ∇ log p(y | f) |_{f = f̂}, the approximate posterior predictive distribution of f̃ follows from the conditional formulas above. The approximate conditional predictive density of ỹ_i can now be evaluated with quadrature integration over each f̃_{1,i} and f̃_{2,i} separately.

Markov Chain Monte Carlo
In MCMC methods, one constructs a Markov chain whose stationary distribution is the target posterior. We first sample from the conditional posterior distribution p(f | D, θ). Next, we sample the hyperparameters θ of the latent variables (GPs). Sampling from such distributions is carried out using some proposal distribution, for instance a log-uniform one.
After obtaining the posterior sample of latent variables, we can sample from the posterior predictive distribution of any set f̃ by sampling each f̃^{(j)} from p(f̃ | f^{(j)}, θ). We can then obtain a sample of ỹ from p(ỹ | f̃, θ) = S(exp f̃_1, exp f̃_2). This toolbox is designed to allow adding new model blocks to the package, e.g.

likelihood, covariance function, and prior. Therefore, we add the Skellam likelihood to this package and use the Laplace approximation and MCMC.
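The hyperparameter-sampling step described above can be sketched as a random-walk Metropolis update on the log hyperparameters, which corresponds to a log-uniform prior when the prior term is flat. This is illustrative code with arbitrary inputs and step size, not the toolbox implementation:

```python
# Sketch: random-walk Metropolis for GP hyperparameters theta given latent f.
import numpy as np

def sq_exp(x1, x2, sigma2, length):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-d**2 / (2.0 * length**2))

def log_p_f(f, x, log_theta):
    # log N(f | 0, K(theta)); a flat (log-uniform) prior on theta adds a constant
    sigma2, length = np.exp(log_theta)
    K = sq_exp(x, x, sigma2, length) + 1e-6 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (f @ np.linalg.solve(K, f) + logdet + len(f) * np.log(2 * np.pi))

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)
K0 = sq_exp(x, x, 1.0, 0.3) + 1e-6 * np.eye(len(x))
f = rng.multivariate_normal(np.zeros(len(x)), K0)   # stand-in latent sample

log_theta = np.log(np.array([0.5, 0.5]))
lp = log_p_f(f, x, log_theta)
samples = []
for _ in range(500):
    prop = log_theta + 0.2 * rng.standard_normal(2)  # random-walk proposal
    lp_prop = log_p_f(f, x, prop)
    if np.log(rng.uniform()) < lp_prop - lp:         # Metropolis accept/reject
        log_theta, lp = prop, lp_prop
    samples.append(np.exp(log_theta))
samples = np.array(samples)
```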

Dataset I
Consider the most widely used covariance function, the squared exponential
k(x, x' | θ) = σ² exp(−(x − x')² / (2 l²)).
The generating process is described below: we select different values for the length scale l and the magnitude σ² to generate the GPs.
f_1 and f_2 are Gaussian processes with covariance functions k_1 and k_2, respectively.
We now let the Skellam model be y_t ∼ S(λ_{1,t}, λ_{2,t}), where λ_{i,t} = exp(f_{i,t} + ε_{i,t}) and ε_{i,t} ∼ N(0, σ²) is noise with σ = 0.1. Figure 1 shows the generated dataset.
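The generating process can be sketched as follows; the kernel hyperparameter values below are hypothetical stand-ins, since the exact values used for Dataset I are not shown here, while the σ = 0.1 noise level follows the text:

```python
# Sketch: simulate Skellam observations driven by two GP log-intensities.
import numpy as np

def sq_exp(x1, x2, sigma2, length):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-d**2 / (2.0 * length**2))

rng = np.random.default_rng(4)
n = 400
x = np.linspace(0.0, 10.0, n)
K1 = sq_exp(x, x, 1.0, 1.0) + 1e-8 * np.eye(n)  # covariance k1 (hypothetical values)
K2 = sq_exp(x, x, 0.5, 2.0) + 1e-8 * np.eye(n)  # covariance k2 (hypothetical values)
f1 = rng.multivariate_normal(np.zeros(n), K1)
f2 = rng.multivariate_normal(np.zeros(n), K2)

sigma = 0.1  # noise level sigma = 0.1, as in the text
lam1 = np.exp(f1 + sigma * rng.standard_normal(n))
lam2 = np.exp(f2 + sigma * rng.standard_normal(n))
# A Skellam draw is the difference of two independent Poisson draws
y = rng.poisson(lam1) - rng.poisson(lam2)
```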

Laplace approximation
We now fit this dataset to the Skellam model via the Laplace approximation.
We simply assign a log-uniform distribution to all hyperparameters of the Gaussian priors. The Laplace approximation to the posterior distribution then yields estimates of the latent values and predictions of y. We discuss the comparisons between the actual values of the latent variables and y below.
Recall that the latent values are generated via Gaussian processes. The intensities of the Poisson distributions are created by adding noise to the latent values. Figure 2 compares the true and predicted latent values. In Figure 3, we can see that the prediction of the Skellam model is well centered on the noisy true values.
Comparisons of the difference of the Poisson distributions with the difference of the predicted intensities, Eλ_1 − Eλ_2, are shown in Figure 4. In this figure, we also find the comparison between y and the difference of the Poisson processes.

MCMC approach
We now use MCMC to approximate the model. Although MCMC is a powerful algorithm in Bayesian methods, sampling the posterior with MCMC is still time-consuming. Therefore, we use fewer observations, 600, to run the MCMC. Unlike in the Laplace approximation, assigning a log-uniform prior to the logarithm of the hyperparameters of the covariance function leads to inefficient sampling. Therefore, we used a N(0, 2) prior instead.
Sampling multiple hyperparameters can be a complex process; it is sometimes difficult to achieve convergence and good mixing for all of them. Figure 5 shows the trace plots of the hyperparameters of the Gaussian priors, and Figure 6 shows the posterior distribution of each hyperparameter. Recall that, in Figure 5, σ²_i and l_i correspond to the hyperparameters in the covariance functions k_i. Table 1 presents the mean and 95% CI of the sampled hyperparameters.

Dataset II
We now consider one of the covariance functions to be the periodic covariance
k(x, x' | θ) = σ² exp(−2 sin²(π γ (x − x')) / l²),
where the parameter γ controls the inverse length of the periodicity and l the smoothness of the process. We simulate the data with one squared exponential and one periodic covariance function. Again, the Skellam model is y_t ∼ S(λ_{1,t}, λ_{2,t}), where λ_{i,t} = exp(f_{i,t} + ε_{i,t}), ε_{i,t} ∼ N(0, σ²), and f_i is a GP with covariance function k_i.
The generated data is shown in Figure 9.
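A minimal sketch of the periodic covariance, under the reading that γ is the inverse period (so the kernel repeats with period 1/γ); the hyperparameter values are arbitrary:

```python
# Sketch: periodic covariance k(x, x') = sigma^2 exp(-2 sin^2(pi*gamma*(x - x')) / l^2).
import numpy as np

def periodic(x1, x2, sigma2=1.0, length=1.0, gamma=1.0):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-2.0 * np.sin(np.pi * gamma * d) ** 2 / length**2)

x = np.array([0.0])
# Points exactly one period (1/gamma) apart are perfectly correlated
k_same = periodic(x, x)[0, 0]
k_period = periodic(x, np.array([1.0]))[0, 0]  # distance = 1/gamma = 1
print(np.isclose(k_same, k_period))  # True
```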

Laplace approximation
The comparison of true and predicted latent values is shown in Figure 10, and the prediction of the Skellam model, ŷ, can be seen in Figure 11. Comparisons of the difference of the Poisson distributions with the difference of the predicted intensities, Eλ_1 − Eλ_2, are shown in Figure 12. In this figure, we again find the comparison between y and the difference of the Poisson processes.
Table 2 presents the mean and 95% CI of the sampled hyperparameters. Figure 15 compares the true and predicted latent values. Figure 16 gives the predictions for the second dataset.

Skellam likelihood and Gaussian likelihood
Depending on the problem of interest, we fit the data to the model best suited to our purpose. In this section, we fit the datasets to a single-latent-variable model with a Gaussian likelihood. The Gaussian likelihood may be computationally cheaper, but it obviously cannot model the intensities of the Poisson processes. Therefore, we use the Gaussian likelihood only to compare the predictive distributions; see Figure 17. We discuss the model fitting for each dataset below. We use 5000 observations to fit the model via the Laplace approximation and 1000 observations via the MCMC approach. In this chapter, we assign Gaussian priors with squared exponential covariance functions, that is, k(x, x' | θ_i) = σ²_i exp(−(x − x')² / (2 l²_i)).
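As an illustrative sketch of how the two likelihoods differ on integer data, one can compare the log-likelihood of the true Skellam model with that of a moment-matched Gaussian on simulated tick changes; the intensities below are arbitrary, and this is not the thesis experiment:

```python
# Sketch: Skellam vs. moment-matched Gaussian log-likelihood on integer data.
import numpy as np
from scipy.stats import norm, skellam

rng = np.random.default_rng(5)
lam1, lam2 = 2.0, 1.5
y = rng.poisson(lam1, size=1000) - rng.poisson(lam2, size=1000)

ll_skellam = skellam.logpmf(y, lam1, lam2).sum()
# Gaussian with the same mean and variance, evaluated as a density at the integers
ll_gauss = norm.logpdf(y, loc=lam1 - lam2, scale=np.sqrt(lam1 + lam2)).sum()
```

The Gaussian density spreads probability over non-integer values, so on genuinely discrete data the two criteria are not directly comparable, which is why the text restricts the Gaussian likelihood to comparing predictive distributions.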

Discussion
These three datasets represent a relatively quiet market. Figures 23, 27, and 32 show that the predictions of the latent values almost overlap one another.
While fitting the three datasets to the Skellam model via the Laplace approximation, we faced computational stability challenges with the NASDAQ and IBM datasets. These might be caused by the larger range of price changes in NASDAQ and IBM; further study is needed to understand the issue. In our simulation study, the two algorithms perform very well in estimating the latent trajectories of the intensity functions. While fitting the Skellam model to the S&P 500, NASDAQ, and IBM datasets via the Laplace approximation, the time scale was sensitive, and we had to adjust the scale of the time variable. However, we have not yet found a general rule for scaling the time variable.
Future research directions: 1. Improve the computational efficiency of the Skellam process, given the high volume of data.
2. Explore the subtlety of the time variable across the index and stock datasets.