Bubble Modeling by Mixed Causal-Noncausal Autoregressive Processes

Some financial time series exhibit short periods of explosive local trends followed by an abrupt decline. Such trends can be the result of speculative bubble phenomena. A bubble forms when investors' expectations of future profits influence the present market value of securities. Mixed causal-noncausal autoregressive (MAR) processes capture such behavior better than standard causal ARIMA models. In the first part of this work we propose an alternative distribution, the Voigt, to model the disturbances in MAR processes. The Voigt, a convolution of the Gaussian and Cauchy distributions, is used in atomic and molecular spectroscopy and is more flexible than other heavy-tailed distributions. The second part of this work extends the MAR models to Markov switching mixed causal-noncausal autoregressive (MSMAR) processes with Cauchy distributed errors, to account for changes in regime at different times. Parameter estimation for both models, the MAR with Voigt errors and the MSMAR, is performed in a Bayesian framework via MCMC algorithms. The models are tested for performance in a simulation study and then applied to Bitcoin/USD exchange rate data.

Introduction
Bubble effects are known primarily as a financial phenomenon. In The New Palgrave: A Dictionary of Economics a bubble is defined as "a sharp rise in the price of an asset or a range of assets in a continuous process, with the initial rise generating expectations of further rises and attracting new buyers - generally speculators interested in profits from trading in the asset rather than its use or earnings capacity" [1]. Peter M. Garber provided an overview of several historical bubbles in his paper "Famous First Bubbles" [2]; one of them, and arguably the most famous, is the Dutch "tulipmania" of the 1630s.
It is important to study bubbles because, as the "tulipmania" example shows, they can have a significant effect on human life and national economies.
According to [3], a bubble phenomenon has three parts: an object that lies at the center of the phenomenon, and two of the object's attributes, called the external and driving values. In the tulipmania example, the object is the tulip. The external value of the tulip is its market value, the price paid for the bulb. The tulip's driving value is people's expectation of the profit that can be made by reselling it in the near future. Neither the external value nor the driving value is related to the "true" value of a tulip, which is called the "fundamental value".
A bubble appears when a positive feedback loop is formed between the object's driving value and its external value [3]. In other words, people's expectations of future profits influence the present market value of an object and vice-versa. A feedback loop is a fundamental notion in system theory that emerges in fields such as organizational theory, electrical engineering, epidemiology, etc. [3].
Mixed causal-noncausal autoregressive processes capture this positive feedback by allowing present values to depend explicitly on future as well as past values. In contrast, purely causal autoregressive processes force the variable to depend only on past values. Due to this explicit dependence on the future, noncausal autoregressive processes often provide a better fit than standard purely causal ARMA models to economic time series in which future expectations play a central role, such as bubble phenomena [4]. Several authors have attempted to model bubble effects with mixed and noncausal autoregressive processes.
A mixed causal-noncausal autoregressive process MAR(s, r) is defined by
$$\Psi(L^{-1})\,\Phi(L)\,y_t = \varepsilon_t, \qquad (1)$$
where $L$ and $L^{-1}$ are the "lag-backward" ($L y_t = y_{t-1}$) and "lag-forward" ($L^{-1} y_t = y_{t+1}$) operators, respectively. The disturbances $\varepsilon_t$ are assumed to be white noise with some distribution $f(\cdot \mid \eta)$ parameterized by $\eta$. For the process (1) to be stationary, the roots of the noncausal $\Psi(L^{-1}) = 1 - \sum_{j=1}^{s} \psi_j L^{-j}$ and causal $\Phi(L) = 1 - \sum_{j=1}^{r} \phi_j L^{j}$ polynomials must lie outside the unit circle [1]. The noncausal and causal polynomials can then be inverted, so the MAR process (1) has an infinite two-sided moving average representation in terms of the disturbances,
$$y_t = \sum_{j=-\infty}^{\infty} c_j\, \varepsilon_{t+j},$$
that is, a weighted average at time $t$ over current, past and future disturbances. The errors $\varepsilon_t$ may have a distribution with infinite variance and expectation [2]. The noncausal and causal parameters of the MAR process (1) are not identifiable when the disturbances $\{\varepsilon_t\}$ are Gaussian white noise [1]; see [3] for discussion.
Purely causal (backward looking) autoregressive processes, as considered in the Box-Jenkins methodology [4], are widely used in time series analysis and are special cases of the mixed AR processes. This is easily seen by setting the noncausal coefficients $\psi$ to 0 in equation (1), which reduces it to a purely causal autoregressive process, $\Phi(L) y_t = \varepsilon_t$. Conversely, when all of the $\phi$ coefficients equal 0, the process (1) reduces to a purely noncausal (forward looking) autoregressive process, $\Psi(L^{-1}) y_t = \varepsilon_t$.

Voigt Distribution
As was mentioned previously, the causal and noncausal coefficients of the MAR processes are not identifiable under the Gaussian assumption on the errors.
Because of the explosive behavior of bubbles and the identifiability problem, typical choices for the distribution of the errors found in the literature are the Cauchy and Student's t distributions; see for example [2], [1], [5]. In this work we compare the performance of the MAR with Voigt errors to the MAR with Cauchy errors.
The Voigt distribution is named after the German physicist Woldemar Voigt. It is used in physics for atomic and molecular spectroscopy [6]. In particular, the Voigt line shape is used to model the distribution of photoelectron energies in X-ray photoelectron spectroscopy (XPS), where the data generating process is assumed to be a convolution of an instrumental function and the intrinsic line broadening mechanism.
Specifically, let $X_1 \sim G(\mu, \sigma)$ and $X_2 \sim C(0, \gamma)$ be independent Gaussian and Cauchy distributed random variables; then the sum $X = X_1 + X_2$ is Voigt distributed. Its density is the convolution
$$f_V(x; l, \sigma, \gamma) = \int_{-\infty}^{\infty} f_C(x'; l, \gamma)\, f_G(x - x'; 0, \sigma)\, dx',$$
where $f_C(x; l, \gamma)$ and $f_G(x; 0, \sigma)$ are the Cauchy and Gaussian densities, respectively.
Since the Cauchy and Gaussian densities are normalized, the corresponding convolution (Voigt) is also normalized.
Similarly to the Cauchy, the Voigt distribution does not have finite moments.
The parameter l can be interpreted as the location of the maximum of the density.
In the physics literature, the spread of the Voigt distribution is usually measured by its half-width at half-maximum (HWHM).
The Voigt density arising as the convolution of C(0, 1) and G(0, 1) is shown in figure 1a: it is lower than the Gaussian and the Cauchy near the center of the distribution and larger than both in its tails, while far in the tails the difference between the Cauchy and the Voigt is negligible. The reason for this behavior is that the density of the Gaussian distribution decreases exponentially away from the center, so the contribution of the Gaussian density becomes negligible in comparison to the tails of the Cauchy. When the Gaussian width dominates ($\sigma \gg \gamma$) the Voigt distribution approaches the Gaussian, and when the Cauchy scale dominates ($\sigma \ll \gamma$) it approaches the Cauchy. This property makes the Voigt density more flexible than either the Cauchy or the Gaussian in modeling the distribution of disturbances.
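As an illustration of this definition, Voigt variates can be simulated directly as a Gaussian-plus-Cauchy sum (a minimal sketch in Python; the function name `rvoigt` is ours and is not part of the estimation code used in this work):

```python
import numpy as np

rng = np.random.default_rng(0)

def rvoigt(n, sigma=1.0, gamma=1.0, loc=0.0, rng=rng):
    """Draw n Voigt(loc, sigma, gamma) variates as the sum of independent
    Gaussian(loc, sigma) and Cauchy(0, gamma) random variables."""
    return rng.normal(loc, sigma, size=n) + gamma * rng.standard_cauchy(n)

# V(0, 1, 1): the convolution of the standard Gaussian and standard Cauchy
samples = rvoigt(100_000)
```

Since the Voigt has no finite moments, summaries such as the sample median (which estimates the location l) are more informative than the sample mean.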

Estimation of the Parameters of the MAR Processes
In this work, parameter estimation was performed within the Bayesian framework. An alternative method for parameter estimation of the MAR processes, based on maximum likelihood, can be found in [13]. The Bayesian formalism requires the specification of an observational model, called the "likelihood" when viewed as a function of the parameters, together with prior distributions that express our uncertainty about the parameters of interest before the data are observed.
Combining the distribution of the errors with the definition of the MAR process (1) gives the joint likelihood of the mixed causal-noncausal process, where $\psi = (\psi_1, \psi_2, \ldots, \psi_s)^T$ and $\phi = (\phi_1, \phi_2, \ldots, \phi_r)^T$ are the vectors of noncausal and causal autoregressive coefficients, respectively, and the distribution of the disturbances is parameterized by $\eta$. Due to the dependence of the present observations on the past and future observations in the MAR(s, r), the error terms $\varepsilon_t$ cannot be computed for the first $r$ and last $s$ observations, unless the marginal distribution is available at these time points. For a relatively long time series this problem can be solved by approximating the likelihood (4) by conditioning on the initial $r$ and last $s$ observations, leading to the so-called conditional likelihood
$$p(y \mid \tilde y, \psi, \phi, \eta) = \prod_{t=r+1}^{T-s} f\big(\Psi(L^{-1})\,\Phi(L)\, y_t \mid \eta\big), \qquad (5)$$
where $\tilde y$ is the vector of the first $r$ and last $s$ observations. In the case of Voigt distributed errors $\varepsilon_t$, the density $f$ in the conditional likelihood (5) is the Voigt density $f_V$, giving (6). The unknown parameters of the MAR process are the causal $\phi$ and noncausal $\psi$ coefficients and a vector of nuisance parameters $\eta = (\sigma, \gamma)$ that parameterize the Voigt distribution. A prior specification $p(\psi, \phi, \eta)$ completes the Bayesian model and leads to the posterior distribution
$$p(\psi, \phi, \eta \mid y) \propto p(y \mid \tilde y, \psi, \phi, \eta)\, p(\psi, \phi, \eta). \qquad (7)$$
The prior in (7) is assumed to have an independent structure, so it factorizes as $p(\psi, \phi, \eta) = p(\psi)\,p(\phi)\,p(\eta)$. As mentioned previously, the Voigt distribution does not have an analytic expression, so conjugate priors are not available. The particular choice of priors is based on the range the parameters can take as well as on computational convenience.
In this work the prior uncertainty in the causal and noncausal parameters is expressed by multivariate normal distributions on $\psi$ and $\phi$. For the MAR process to be stationary, the roots of the causal and noncausal polynomials must lie outside the unit circle, so the priors must place high probability mass on the range of causal and noncausal parameters that lead to stationary MAR processes. In the case of the MAR(1, 1), a reasonable prior for both the causal and the noncausal coefficient is normal with mean $\mu = 0$ and variance $\sigma^2 = 0.25$, which puts 95% of the probability mass on values between -0.980 and 0.980.
Since σ and γ must be strictly positive, i.e. $\eta = (\sigma, \gamma)^T \in \mathbb{R}^2_{>0}$, reasonable priors are Gamma distributions with shapes $\nu_1, \alpha_1$ and rates $\nu_2, \alpha_2$: $\sigma \sim \mathrm{Gamma}(\nu_1, \nu_2)$ and $\gamma \sim \mathrm{Gamma}(\alpha_1, \alpha_2)$. Particular values of the hyperparameters $\nu_1, \alpha_1$ and $\nu_2, \alpha_2$ are chosen so that the prior puts high probability mass on a reasonable range of values.
For example, if we believe that a reasonable value for the Gaussian standard deviation is 1, with an uncertainty of about 0.25, then choosing $\nu_1 = 16$ and $\nu_2 = 16$ concentrates 95% of the probability mass on values between 0.5716 and 1.5463.
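This prior interval is easy to verify numerically; the sketch below checks it by Monte Carlo (the same quantiles could equally be read off a gamma quantile table; note that a rate-16 gamma corresponds to scale 1/16):

```python
import numpy as np

rng = np.random.default_rng(1)

# Gamma prior with shape nu1 = 16 and rate nu2 = 16 (scale = 1/16):
# mean 1, standard deviation 0.25.
draws = rng.gamma(shape=16, scale=1 / 16, size=1_000_000)
lo, hi = np.quantile(draws, [0.025, 0.975])
```

With a million draws, `lo` and `hi` reproduce the claimed interval [0.5716, 1.5463] to about two decimal places.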
Combining the priors (8)-(11) with the likelihood (6), we are interested in estimating the posterior distribution (12). Since the Voigt density does not have a closed form, an analytic expression for the posterior (12) is not available, but it can be approximated by Monte Carlo methods. In particular, a hybrid of Metropolis, Metropolis-Hastings and blocked Gibbs algorithms can be used to sample from the posterior distribution (12). The dimensionality of the joint posterior (12) can be reduced by considering the full conditional distributions
$$p(\psi \mid \phi, \sigma, \gamma, y), \quad p(\phi \mid \psi, \sigma, \gamma, y), \quad p(\sigma \mid \psi, \phi, \gamma, y), \quad p(\gamma \mid \psi, \phi, \sigma, y). \qquad (13)\text{-}(16)$$
Sampling from the full conditionals (13)-(16) follows the structure of a blocked Gibbs algorithm. Unfortunately, none of the full conditional distributions (13)-(16) has a closed form, so a Metropolis algorithm with normal proposals can be used to sample from (13) and (14), and a Metropolis-Hastings algorithm to sample from (15) and (16). Reasonable proposal distributions for $\sigma$ and $\gamma$ are Gamma.
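A minimal sketch of such a sampler, assuming for concreteness a MAR(1, 1) with Cauchy(0, γ) errors (whose density, unlike the Voigt, has a closed form) and independent normal priors on the autoregressive coefficients; all function names and tuning values here are ours, chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def mar11_eps(y, phi, psi):
    """Conditional errors of a MAR(1,1): eps_t = (1 - psi L^{-1})(1 - phi L) y_t,
    available only for the interior time points (first and last are lost)."""
    u = y[1:] - phi * y[:-1]       # u_t = y_t - phi * y_{t-1}
    return u[:-1] - psi * u[1:]    # eps_t = u_t - psi * u_{t+1}

def log_post(y, phi, psi, gamma=1.0, prior_sd=0.5):
    """Log conditional posterior: Cauchy(0, gamma) likelihood of the errors
    plus independent N(0, prior_sd^2) priors on phi and psi."""
    eps = mar11_eps(y, phi, psi)
    loglik = np.sum(np.log(gamma / np.pi) - np.log(gamma**2 + eps**2))
    return loglik - (phi**2 + psi**2) / (2 * prior_sd**2)

def rw_metropolis(y, n_iter=3000, step=0.05, rng=rng):
    """Random-walk Metropolis over (phi, psi) with normal proposals."""
    theta = np.zeros(2)
    lp = log_post(y, *theta)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_post(y, *prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws
```

On data with positive serial dependence, the chain moves away from the origin and concentrates where the conditional Cauchy likelihood is highest; in the full algorithm of this work, analogous Metropolis-Hastings steps with gamma proposals update σ and γ.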
The MAR processes have two free parameters, the noncausal and causal orders s and r, that require selection. In this work, the selection of s and r is based on minimizing the Deviance Information Criterion (DIC). The DIC estimates the out-of-sample predictive performance of a model; it can only provide insight about a model's performance relative to another model. There are several definitions of the DIC; the one used in this work can be found in [14] and is defined as
$$\mathrm{DIC} = D(\bar\theta) + 2 p_d, \qquad p_d = \overline{D(\theta)} - D(\bar\theta), \qquad (17)$$
where $D(\theta) = -2 \log p(y \mid \theta)$ is the deviance and $p_d$ can be interpreted as the effective number of parameters, or a measure of complexity. The advantage of defining the measure of complexity $p_d$ as in (17) is discussed in [15].
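Given MCMC output, the criterion is straightforward to compute. The sketch below assumes the Spiegelhalter-style $p_d$ (posterior mean deviance minus deviance at the posterior mean), one common version of (17):

```python
import numpy as np

def dic(loglik_draws, loglik_at_mean):
    """DIC from MCMC output.

    loglik_draws   : log p(y | theta_i) evaluated at each posterior draw
    loglik_at_mean : log p(y | theta_bar) at the posterior mean theta_bar
    Returns (DIC, p_d)."""
    d_bar = -2.0 * np.mean(loglik_draws)   # posterior mean deviance
    d_hat = -2.0 * loglik_at_mean          # deviance at the posterior mean
    p_d = d_bar - d_hat                    # effective number of parameters
    return d_hat + 2.0 * p_d, p_d          # DIC = D(theta_bar) + 2 p_d
```

As in the model comparisons of this work, only differences in DIC between models are meaningful, not its absolute level.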

Simulation of the MAR Processes
The simulation of the mixed autoregressive processes is based on a partial fraction decomposition [2]: the MAR can be decomposed into two unobserved processes. Let $u_t = \Phi(L) y_t$ and $v_t = \Psi(L^{-1}) y_t$; equation (1) can then be rewritten as
$$\Psi(L^{-1})\, u_t = \varepsilon_t, \qquad \Phi(L)\, v_t = \varepsilon_t.$$
The process $u_t$ is $y_t$-causal and $\varepsilon_t$-noncausal, while the process $v_t$ is $y_t$-noncausal and $\varepsilon_t$-causal. The mixed AR process $y_t$ can be simulated by recovering it from the unobserved components $u_t$ and $v_t$. Using the partial fraction decomposition and the unobserved processes $\{u_t\}$ and $\{v_t\}$, the mixed AR process has two equivalent representations which are useful for simulation:

1. Causal representation, $\Phi(L) y_t = u_t$, in which $y_t$ is recovered forward in time from the purely noncausal component $u_t$.
2. Noncausal representation, $\Psi(L^{-1}) y_t = v_t$, in which $y_t$ is recovered backward in time from the purely causal component $v_t$.
For example, using the causal representation of the mixed AR processes, the simulation of the MARV(1, 1) can be performed in the following steps [2]: draw the disturbances $\varepsilon_t$ from the Voigt distribution; generate the noncausal component backward in time, $u_t = \psi u_{t+1} + \varepsilon_t$; then, for $t = t_0 + 1, \ldots, T - 1$, obtain $y_t$ by computing $y_t = \phi y_{t-1} + u_t$.
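The backward-forward recursion described above can be sketched as follows (for simplicity the sketch draws Cauchy rather than Voigt disturbances; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_mar11(T, phi, psi, eps=None, rng=rng):
    """Simulate a MAR(1,1) process (1 - psi L^{-1})(1 - phi L) y_t = eps_t
    via its causal representation: the purely noncausal component
    u_t = psi * u_{t+1} + eps_t is generated backward in time (with
    terminal value u_{T-1} = 0), then y_t = phi * y_{t-1} + u_t is
    recovered forward in time."""
    if eps is None:
        eps = rng.standard_cauchy(T)   # heavy-tailed disturbances
    u = np.zeros(T)
    for t in range(T - 2, -1, -1):     # backward recursion
        u[t] = psi * u[t + 1] + eps[t]
    y = np.empty(T)
    y[0] = u[0]
    for t in range(1, T):              # forward recursion
        y[t] = phi * y[t - 1] + u[t]
    return y

y = simulate_mar11(500, phi=0.3, psi=0.8)
```

Away from the two boundaries, applying $(1 - \psi L^{-1})(1 - \phi L)$ to the simulated path recovers the disturbances exactly, which is a convenient check of the recursion.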

Simulation Study
The simulation study is first performed on data simulated from the MAR(1, 1) process with V(0, 1, 1) errors, to assess the behavior of the MARV models under model misspecification and to check whether the DIC is able to select the true model. The shape of the V(0, 1, 1) density, the convolution of the standard Gaussian and standard Cauchy densities, can be seen in figure 1a.
More specifically, for this simulation study the true data generating process is a MARV(1, 1), given in equation (22). Assuming that the true parameters of the process (22) are unknown, the models that we chose to fit are: MARV(0, 1), which is a regular causal AR with Voigt distributed errors; the mixed causal-noncausal MARV(1, 1); and a higher (second) order mixed causal-noncausal process, MARV(1, 2). After running an MCMC algorithm, the parameters of the models (23)-(26) were estimated. From the estimates of the DIC criterion, the MARV fits the data better when the simulated process has a Gaussian standard deviation much greater than the Cauchy scale. Figure 3 shows the simulation study of the MARV processes with extreme values of the ratio r of the Gaussian to the Cauchy scale: panels (c) and (d) show histograms of the true disturbances with the estimated densities of the MARV and MARC models for the two extreme cases of r.

Analysis of the Bitcoin/USD Exchange Rate Data
In this section, the MARV and MARC models are applied to the Bitcoin/USD exchange rate, in order to compare their performance on real data. Bitcoin (BTC) is an electronic currency created by Satoshi Nakamoto and one of the so-called "cryptocurrencies". It can be seen from figure 4 that the exchange rate series has a global trend with locally explosive behavior and a bubble that burst at the beginning of April 2013. Since the MAR is a stationary process, the global trend in the data needs to be subtracted before the MAR model can be used to fit the data. The global trend is modeled with a polynomial function of time, as in [1].
The residuals $y_t$ of the model (27) are assumed to follow a MAR process. The MARV(2, 2) model was able to remove the autocorrelation in the data, as can be seen by comparing the ACF of the detrended series (figure 5d) with the ACF of the residuals from the best model fit (figure 7c). For the longer span of the exchange rate series, the data must again be detrended in order to use the MAR models. Since the dynamics are quite complicated, it is hard to assume any functional dependence on time as was done previously. Rather than assuming a particular form of the underlying deterministic trend, we used a nonparametric local constant least squares (LCLS) estimator, also known as the Nadaraya-Watson smoother. The only parameter that controls the smoothness of the LCLS estimator is the bandwidth.
For a detailed overview of the properties of the LCLS estimator, refer to [16], [17]. Incorporating the nonparametric trend $m(t)$ with an MAR stochastic component $y_t$, the full specification of the model for the exchange rate $u_t$ becomes
$$u_t = \hat m(t) + y_t, \qquad \hat m(t) = \frac{\sum_{j} K_h(t - t_j)\, u_{t_j}}{\sum_{j} K_h(t - t_j)},$$
where $h$ is the bandwidth and $K_h(\cdot)$ is a kernel (Gaussian in our case). The nonparametric trend is not part of the MAR model, so the bandwidth and the coefficients of the MAR were not estimated simultaneously; in this setting the LCLS smoother acts as a data preprocessing tool. The bandwidth $h$ was set to 40.5, to smooth the data just enough to account for the global trend without smoothing out the bubble dynamics. The residuals after subtracting the trend are assumed to follow an MAR process.
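A minimal sketch of the LCLS (Nadaraya-Watson) smoother with a Gaussian kernel, applied here to a hypothetical series rather than the bitcoin data (the toy trend and the function name are ours):

```python
import numpy as np

def lcls(t, u, h):
    """Local constant least squares (Nadaraya-Watson) trend estimate:
    m(t_i) = sum_j K_h(t_i - t_j) u_j / sum_j K_h(t_i - t_j),
    with a Gaussian kernel of bandwidth h."""
    d = (t[:, None] - t[None, :]) / h
    w = np.exp(-0.5 * d**2)           # kernel normalizing constants cancel
    return w @ u / w.sum(axis=1)

t = np.arange(200.0)
u = 0.01 * t + np.sin(t / 10.0)       # hypothetical trend plus oscillation
m_hat = lcls(t, u, h=40.5)
resid = u - m_hat                     # detrended series passed to the MAR model
```

A larger bandwidth yields a smoother trend estimate; as in the analysis above, h should be large enough to capture the global trend but small enough not to absorb the bubble dynamics.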
Several models were considered for fitting the exchange rate series. Table 5 contains posterior means and 95% quantile based coverage intervals for the parameters of the models under consideration. Comparing the DIC of the estimated models, the MARC models overall slightly outperformed the MARV models. Based on the DIC, the best performing model is the purely noncausal MARC(1, 0). The best MARC(1, 0) fit and the residuals are shown in figure 9. Although visually the fit looks good, the model was not able to completely remove the autocorrelation structure in the series, as can be seen from the plot of the autocorrelation function of the residuals (figure 9e). Based on the pattern that the autocorrelation function exhibits, there may be some seasonality in the data which is not captured by the MARC(1, 0) model.

The MSMAR processes
Introduced in this chapter, the MSMAR process is a generalization of the MAR. The mixed causal-noncausal processes allow for an explicit dependence of the present values on the past as well as the future; a MAR of order (s, r) is defined as
$$\Psi(L^{-1})\,\Phi(L)\,y_t = \varepsilon_t. \qquad (28)$$
To account for different regimes in the correlation structure at different times, the MAR process can be extended to the MSMAR process by introducing a hidden regime (state) indicator. More specifically, let $S_t \in \{1, 2, \ldots, K\}$ be a latent state indicator with transition matrix $\xi$; then the MSMAR of order $(s, r, K)$, with regime-specific order vectors $s = (s_1, \ldots, s_K)$ and $r = (r_1, \ldots, r_K)$, is a stochastic process $\{y_t : t \in \mathbb{Z}\}$ defined as
$$\Psi_{S_t}(L^{-1})\,\Phi_{S_t}(L)\,y_t = \varepsilon_t. \qquad (29)$$
The transition matrix of the latent state variable $S_t$ has entries
$$P(S_t = l \mid S_{t-1} = m) = \xi_{ml}, \qquad l, m = 1, \ldots, K. \qquad (30)$$
Every row of the transition matrix must satisfy $\sum_{l=1}^{K} \xi_{jl} = 1$ and $\xi_{jl} \geq 0$ for all $j, l = 1, 2, \ldots, K$, so each row is a probability mass function. It is also assumed that the rows are independent of one another.
As in the case of the MAR processes, the polynomials $\Psi_{S_t}(L^{-1}) = 1 - \sum_{i=1}^{s_{S_t}} \psi_{i,S_t} L^{-i}$ and $\Phi_{S_t}(L) = 1 - \sum_{i=1}^{r_{S_t}} \phi_{i,S_t} L^{i}$ are the noncausal and causal polynomials in the lag-forward and lag-backward operators, respectively. The flexibility of the MSMAR processes is due to the fact that the model can incorporate regime specific orders of causal and noncausal dependence, as well as regime specific parameters of the distribution of the errors. The MSMAR process can include a mixture of purely causal, purely noncausal and mixed AR processes in different regimes. In this work the switching is modeled in the correlation structure rather than in the level, although the level can also be incorporated in the model.
In order for the MSMAR process to be regime stationary, the roots of the causal and noncausal polynomials must lie outside of the unit circle.

Parameter Estimation
Parameter estimation is carried out within the Bayesian framework. The joint sampling distribution of the observations and the latent state variables, viewed as a function of the parameters, is called the "complete data likelihood" [1] and in general has the form
$$p(y, S \mid \psi, \phi, \gamma, \xi) = p(y \mid S, \psi, \phi, \gamma)\, p(S \mid \xi). \qquad (31)$$
In the case of the MSMAR with Cauchy distributed errors $\{\varepsilon_t\}$, the joint density of the time series given a vector of states becomes
$$p(y \mid S, \psi, \phi, \gamma) = \prod_{k=1}^{K} \prod_{t:\, S_t = k} f_C\!\big(\Psi_k(L^{-1})\,\Phi_k(L)\, y_t;\, 0, \gamma_k\big). \qquad (32)$$
Due to the recursive structure of the process, the errors $\{\varepsilon_t\}$ cannot be computed for the entire time series. Unless the marginal distribution of the observations is known at each time point, the complete data likelihood is not available for the entire vector of observations. A straightforward way to bypass this difficulty is to approximate the likelihood (32) by conditioning on a vector of the first $r_k$ and last $s_k$ observations, i.e. $\tilde y_k = \{y_1, y_2, \ldots, y_{r_k}, y_{T-s_k}, y_{T-s_k+1}, \ldots, y_T\}$; notice that the vector $\tilde y_k$ is regime specific. The approximation of the likelihood takes the same product form as (32), restricted to the interior time points, giving (33). The second factor in eq. (31) is the sampling distribution of the hidden state vector $S$. Given the transition matrix $\xi$, $p(S \mid \xi)$ can be written explicitly as
$$p(S \mid \xi) = \prod_{j=1}^{K} \prod_{m=1}^{K} \xi_{jm}^{N_{jm}(S)}, \qquad (34)$$
where $N_{jm}$ is an operator that counts the number of transitions from state $j$ to state $m$.
The approximate likelihood (33) and the distribution of the hidden state variables (34) give the following complete-data conditional likelihood:
$$p(y, S \mid \tilde y, \psi, \phi, \gamma, \xi) = \Big[\prod_{k=1}^{K} \prod_{t:\, S_t = k} f_C\!\big(\Psi_k(L^{-1})\,\Phi_k(L)\, y_t;\, 0, \gamma_k\big)\Big] \prod_{j=1}^{K} \prod_{m=1}^{K} \xi_{jm}^{N_{jm}(S)}. \qquad (35)$$
The joint posterior distribution of all the parameters and hidden state indicators given the data is quite complicated to sample from directly. However, the problem can be simplified by first sampling the state-specific parameters given the vector of state indicators, and then sampling the state indicators conditionally on the state-specific parameters.
Essentially, every row of the transition matrix is a probability mass function. Since the rows are mutually independent, a prior over these probability mass functions that, in conjunction with the complete data likelihood, yields a closed form conditional posterior is the Dirichlet: $\xi_{k\cdot} \sim D(a_{k1}, a_{k2}, \ldots, a_{kK}) \equiv D(a_{k\cdot})$ for $k = 1, 2, \ldots, K$.
The complete data likelihood has a convenient structure: combining it with all the prior specifications mentioned above leads to full conditional distributions for the regime-specific parameters and, for the rows of the transition matrix, $\xi_{k\cdot} \mid S \sim D(a_{k1} + N_{k1}(S), a_{k2} + N_{k2}(S), \ldots, a_{kK} + N_{kK}(S))$ for $k = 1, 2, \ldots, K$.
A Metropolis algorithm with normal proposals can be used to sample from the full conditionals (37) -(38). A Metropolis-Hastings algorithm with gamma proposals can be used to sample from (39). Notice that with the complete data likelihood and the Dirichlet prior, the posterior for the rows of the transition matrix is also Dirichlet, so the rows of the matrix can be sampled directly from the Dirichlet distribution (40).
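The direct Dirichlet draw for the transition matrix rows can be sketched from the transition counts (the function names here are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def transition_counts(S, K):
    """N[j, m] = number of transitions j -> m observed in the state path S."""
    N = np.zeros((K, K), dtype=int)
    for a, b in zip(S[:-1], S[1:]):
        N[a, b] += 1
    return N

def sample_transition_matrix(S, K, a0=1.0, rng=rng):
    """Draw each row of xi from its Dirichlet full conditional,
    xi_k ~ Dirichlet(a0 + N_k1, ..., a0 + N_kK)."""
    N = transition_counts(S, K)
    return np.vstack([rng.dirichlet(a0 + N[k]) for k in range(K)])

# A short state path with states coded 0, ..., K-1
S = np.array([0, 0, 1, 1, 1, 0, 0, 1])
xi = sample_transition_matrix(S, K=2)
```

Each sampled row is automatically a valid probability mass function, which is why the Dirichlet prior is the natural conjugate choice here.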

(b) Sample the state indicators:
$$S_t \sim \mathrm{Multinom}(1, \theta), \qquad \theta = \big(P(S_t = 1 \mid y, \eta), \ldots, P(S_t = K \mid y, \eta)\big)^T.$$
Several important points about the FFBS algorithm for state estimation need to be discussed; in particular, the filtering recursion starts at $t = 1 + r_{\max}$, so the states at the initial time points are not reached by the recursion and require separate treatment. The MSMAR process has regime specific causal $r_k$ and noncausal $s_k$ orders that can be regarded as free parameters. In this work the model selection is performed via minimization of the Deviance Information Criterion (DIC); refer to section 2.2 for the description.

Simulation of the MSMAR processes
The method of simulation of Markov switching MAR processes is similar to that for the MAR processes described in section 2.3, with the additional sampling of a sequence of state indicators $S$. The method is again based on the partial fraction representation of the mixed AR process [2]. Consider the reparameterization of the MAR process $u_t = \Phi(L) y_t$ and $v_t = \Psi(L^{-1}) y_t$; equation (28) can then be rewritten as $\Psi(L^{-1}) u_t = \varepsilon_t$ and $\Phi(L) v_t = \varepsilon_t$. The mixed AR process $\{y_t\}$ can be simulated by recovering it from the unobserved components $\{u_t\}$ and $\{v_t\}$ using two representations [2]:

1. Causal representation, in which $y_t$ is recovered forward in time from the purely noncausal component $u_t$.
2. Noncausal representation, in which $y_t$ is recovered backward in time from the purely causal component $v_t$.
In the case of the MSMAR, the process $y_t$ is recovered conditionally on a simulated sequence of state indicators $S$, which is generated from the transition matrix (30) via the recurrence $P(S_t = l \mid S_{t-1} = m) = \xi_{ml}$. For an MSMAR((1, 1), (1, 1)) process, the simulation can be performed in the following steps: 1. Generate a vector of state indicators $S_t$ and disturbances $\varepsilon_t$. 2. Generate the sequences $u_t$ and $v_t$. 3. For $t = 1, 2, \ldots, T$, recover $y_t$.
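The first step, generating the state path from the transition matrix, can be sketched as follows (conditionally on this path, the backward-forward recursion of section 2.3 is then applied with regime-specific coefficients; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_states(T, xi, rng=rng):
    """Simulate a hidden state path S_1, ..., S_T from the transition
    matrix xi (states coded 0, ..., K-1), starting in state 0."""
    K = xi.shape[0]
    S = np.empty(T, dtype=int)
    S[0] = 0
    for t in range(1, T):
        # row S[t-1] of xi is the conditional pmf of the next state
        S[t] = rng.choice(K, p=xi[S[t - 1]])
    return S

# Persistent two-regime chain, as in the simulation study below
xi = np.array([[0.995, 0.005],
               [0.005, 0.995]])
S = simulate_states(1000, xi)
```

With such a persistent transition matrix, the simulated path consists of long blocks spent in each regime, mimicking the slowly switching dynamics of the bubble data.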

Simulation Study
This simulation study assesses the performance of the MSMAR under model misspecification. The study was performed on data simulated from an MSMAR((1, 1), (1, 1)) process (47) with transition matrix
$$\xi = \begin{pmatrix} 0.995 & 0.005 \\ 0.005 & 0.995 \end{pmatrix}.$$
Assuming that the true parameters of the process (47) are unknown, several candidate models were considered for estimating the parameters. The values of the estimated parameters and the corresponding 95% quantile based coverage intervals are reported in table 6. It is interesting to note that all estimated causal and noncausal parameters are very similar in value to each other, even under model misspecification. The DIC attains its minimum at the "true" model, MSMAR((1, 1), (1, 1)). The simulated data and the corresponding fit are shown in figure 10.
(Figure 10b: distribution of the residuals of the best MSMAR fit.) Table 6: Posterior means and 95% quantile based coverage intervals for the parameters of various MSMAR models applied to simulated data.

Application to Bitcoin/USD Exchange Rate
In this section the MSMAR model is applied to the Bitcoin/USD exchange rate time series and its performance is compared to that of the MAR models on the basis of the DIC criterion. The Bitcoin/USD exchange rate data is discussed in section 2.5.
Recall that the series has a global nonlinear trend, clearly visible in figure 8a. As in the case of the MAR models, the trend was modeled by the LCLS estimator, with the bandwidth $h$ again set to 40.5 to smooth the data just enough to account for the global trend without smoothing out the bubble dynamics. The residuals after subtracting the trend are assumed to follow an MSMAR process, so the full model specification is $u_t = m(t) + y_t$, with $y_t$ an MSMAR process. The number of parameters in the MSMAR models grows quickly with increasing causal-noncausal orders and the number of regimes. Let $p_k = s_k + r_k$ be the total number of regime-specific causal and noncausal parameters and $m_k = |\eta_k|$ the cardinality of the set of parameters of the distribution of the errors; the total number of parameters of the MSMAR process is $\sum_{k=1}^{K} (p_k + m_k)$. The MSMAR models are usually overparameterized, so the analysis was performed with two-regime (K = 2) models, with the most complex model being the MSMAR((1, 1), (1, 1)).
One feature of the MSMAR processes that reduces the space of possible models, for a fixed number of regimes K, is that they are invariant under permutation of the regime assignments: for example, the MSMAR((0, 1), (1, 1)) is equivalent to the MSMAR((1, 1), (0, 1)). This greatly reduces the number of distinct models to consider. On the other hand, it introduces a label-switching phenomenon in the MCMC simulations that makes estimation of the marginal posteriors of the parameters harder, although it does not affect the overall fit.
Among the models fitted to the data was the MSMAR((0, 1), (1, 0)); other, second order, models were also considered, such as the MSMAR((1, 2), (1, 2)), MSMAR((2, 1), (1, 2)), MSMAR((2, 2), (1, 1)), MSMAR((2, 2), (1, 2)) and MSMAR((2, 0), (0, 2)). The transition matrix of all the models under consideration is of the two-regime form (30). In all models the errors are assumed to follow the Cauchy distribution with a regime specific scale parameter, $\varepsilon_t \mid S_t \sim C(0, \gamma_{S_t})$. The summaries of the draws of the parameters from the posterior distributions, and the corresponding DIC values, are provided in table 7. Since the DIC values of most second order models were larger than those of the first order models, we decided not to include the summaries of these models in the table. The DIC criterion favors the MSMAR((2, 1), (2, 1)), which is therefore the final chosen model for the Bitcoin/USD exchange rate analysis; its estimated parameters and transition matrix are summarized in table 7. Figure 11 shows the data with the regime assignments. It is clear from the plot that one regime corresponds to relatively linear, constant trends, while the other corresponds to the explosive behavior. The MSMAR((2, 1), (2, 1)) thus suggests two regimes, each with second order noncausal and first order causal dependence. Like the MAR model, the MSMAR model was still not able to completely remove the autocorrelation in the series; as discussed in section 2.5, the series may include seasonal components which are not modeled by the MSMAR in this work. (Figure 11: the MSMAR((2, 1), (2, 1)) fit added to the nonparametric estimate, $\hat u_t = \hat m(t) + \hat y_t$.)
(Figure 11e: a histogram of the residuals of the MSMAR((2, 1), (2, 1)) fit.) Table 7: Posterior means and 95% quantile based coverage intervals for the parameters of various MSMAR models in application to the Bitcoin/USD exchange rate series.

Conclusion
In modeling bubble phenomena, the Voigt distribution seems promising as an alternative to the Cauchy distribution. The DIC criterion suggests that the MARV processes model the bubble phenomena slightly better than the MARC in the bitcoin data over the period 02/20/2013 - 07/20/2013. On the other hand, the MARC process slightly outperformed the MARV in application to a longer series of the bitcoin data. The next step in developing the MARV processes would be to compare the performance of the MARV to the MARC on a different dataset, for example the U.S. inflation series analyzed in [1]. It would also be interesting to compare the performance of the MARV with mixed causal-noncausal processes with t-distributed errors.
This work also proposed the Markov-switching extension of the mixed causal-noncausal processes and showed that, based on the DIC criterion, it is able to model the bitcoin data much better than the MAR processes. One direction in exploring the Markov switching causal-noncausal processes would be to consider a mixture of t-distributions. A possible advantage of using a mixture of t-distributions is that, by allowing the degrees of freedom to depend on the regime, it may be more flexible in modeling complex variation in the data.
The MSMAR model considered in this work also has several limitations.
The simulation study revealed that the transition matrix is underestimated in settings with frequently changing regimes. A possible explanation for this behavior is that the model does not include a state specific intercept, so under some parameter values the regimes are not well separated and the model cannot identify them correctly. In other words, if the true data generating process has regimes that change very often, the MSMAR may not be able to correctly identify the state indicators. Another limitation is that the MSMAR model does not completely remove the autocorrelation in the bitcoin data; this may be due to seasonality in the series, so an additional AR(5) component may be needed. There is also a limitation in the current implementation of the MCMC sampling algorithm: it does not take the label switching phenomenon into account. One possible solution to this problem is to use the random permutation algorithm described in [2].
Yet another direction in exploring the mixed causal-noncausal processes is to incorporate the trend into the model itself, rather than handling it as a data preprocessing step.
One of the most important uses of time series analysis is forecasting; a natural next step in extending the MSMAR processes is therefore to develop a forecasting method for them.