The Effect of Average Twitter Happiness on Dow Jones Industrial Average Return Volatility

The stock market is well known for its volatility, and many models have been proposed to capture it. Volatility is inherently unobservable, so the absolute values of the returns serve as the realized-volatility proxy. The study object is the Dow Jones Industrial Average, and the models used are generalized autoregressive conditional heteroskedasticity (GARCH) models with different extensions. The unique extension in this study is to add happiness data to the model and check whether it helps to better capture the volatility and improve forecasting accuracy. The happiness data is extracted from Twitter; it is an index of people's happiness level based on their online expressions. The one-day-lagged happiness data is also used as an extension to the models. The leverage effect and the heavy-tails problem are also addressed: EGARCH and GJR-GARCH models with other error distributions, such as the Student's t distribution, are used to deal with these specific problems. The forecasting performance of these models is checked, and we find that the happiness data does help to better capture the volatility. However, the forecasting accuracy of the models with happiness data is not statistically different from that of the models without it, which illustrates that the happiness data does not help to improve forecasting performance.

The intrinsic value of a stock is the present value of its future dividends. A stock market is where the price of a stock forms, since it aggregates the buyers and sellers. It helps companies raise money, and the smooth functioning of this activity contributes to economic growth.
The stock market index is created to describe the stock market; it measures the value of a portion of the market and is computed from selected stock values. Which stocks are selected depends on the goal of the index. One example is the Dow Jones Industrial Average (DJIA), the most quoted stock market index in the world (Shoven and Sialm, 2000). The simple return is r_t = (P_t − P_{t−1})/P_{t−1}, where P_t is the closing price at time t and P_{t−1} is the closing price at time t−1. The logarithm of the price ratio, r_t = ln(P_t/P_{t−1}), is the discrete return with continuous compounding (Fama et al., 1969). In this work, the logarithmic return is preferred. There are both theoretical and empirical reasons for preferring the logarithmic return (Strong, 1992). Theoretically, the logarithmic return is analytically more tractable when returns are calculated over longer intervals (the subperiod returns simply add up). Empirically, the logarithmic return is more likely to be normally distributed.
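The two return definitions above can be sketched numerically; the prices below are hypothetical values used only for illustration.

```python
import numpy as np

# Hypothetical closing prices for five consecutive trading days
prices = np.array([100.0, 101.5, 99.8, 100.2, 102.0])

# Simple (discrete) returns: r_t = (P_t - P_{t-1}) / P_{t-1}
simple_returns = prices[1:] / prices[:-1] - 1

# Logarithmic returns: r_t = ln(P_t / P_{t-1})
log_returns = np.diff(np.log(prices))

# Log returns add up over subperiods: their sum is the full-period log return
full_period = np.log(prices[-1] / prices[0])
```

This illustrates the tractability argument: summing the daily log returns recovers the log return over the whole interval, which does not hold for simple returns.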
Volatility of the stock market return is often perceived as a measure of risk. It is a statistical measure of the variation of the return over time. In finance, volatility is also a core parameter in many models such as the Capital Asset Pricing Model (Sharpe, 1964).
Volatility is inherently unobservable, and what we know about volatility has been learned either by fitting parametric econometric models or by studying indicators of volatility such as absolute returns (Andersen et al., 2001). It is often calculated as the standard deviation of the return (Poon and Granger, 2003), denoted by σ.
Many researchers have studied the movement of stock market volatility and raised the question of why the volatility changes so much over time. Officer (1973) relates the changes to macroeconomic variables. There are also attempts to connect volatility to changes in expected stock returns, including Merton (1980) and French et al. (1987). Also, a number of studies have used measures of the variance or "volatility" of speculative asset prices to provide evidence against simple models of market efficiency (Shiller, 1981).

Stochastic Process
A stochastic process is a sequence of random variables {X_t, t = 1, 2, …} defined at fixed sampling intervals, representing the evolution of random values over time.
The index t represents time, and a stochastic process is also known as a random process.

Time Series
A time series is a sequence of observations on a particular variable, and it can be interpreted as a realization of a stochastic process. Examples of time series are inflation rates, unemployment rates, and market shares. The main features of time series include trends, seasonal variation, and the fact that observations close in time tend to be correlated, so time series models are needed to explain this correlation.

Autocorrelation
A correlation of a variable with itself at different times is known as autocorrelation. The number of time steps between the variables is known as the lag.
The autocorrelation function, or ACF, expresses the autocorrelation as a function of the lag k for k = 1, 2, …. Let {x_t, t ∈ T} be a time series with sample mean x̄. The autocorrelation can be estimated by the sample autocorrelation function (the empirical ACF, or correlogram), given by

r_k = Σ_{t=1}^{n−k} (x_t − x̄)(x_{t+k} − x̄) / Σ_{t=1}^{n} (x_t − x̄)^2.
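The sample ACF above can be computed directly; the sketch below evaluates it for simulated Gaussian white noise, where all sample autocorrelations beyond lag zero should be close to zero.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    denom = np.sum((x - xbar) ** 2)
    return np.array([
        np.sum((x[:n - k] - xbar) * (x[k:] - xbar)) / denom
        for k in range(max_lag + 1)
    ])

rng = np.random.default_rng(0)
white = rng.standard_normal(500)
acf = sample_acf(white, 10)
# acf[0] is always 1; for white noise the remaining lags stay near zero
```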

White Noise
White noise is a stochastic process used for many time series models. A time series {wt} is white noise if w1, w2, w3,…, wn are independent and identically distributed random variables with mean of zero. This means that the variables have the same variance and the covariance between them is zero.
The stock market returns are expected to be white noise under the efficient market hypothesis. In an efficient market, asset prices adjust instantaneously to reflect new information, which eliminates the possibility of predicting future prices using only past prices (Logue and Sweeney, 1977). This implies that the current price of a security "fully reflects" available information (Fama, 1970). Thus, the successive price changes (or returns) are independent and identically distributed, which makes them white noise. However, there are many reasons which may cause violations of the efficient market hypothesis. Arbitrage risk, for example, is one of them. Under the efficient market hypothesis, any arbitrage opportunity resulting from mispricing will be removed by rational traders' transactions. In the real world, however, arbitrageurs are subject to many constraints, such as transaction fees and holding costs (Pontiff, 2006). Therefore, the price may not fully reflect available information, which violates the efficient market hypothesis.

Autoregressive (AR) Models
An AR model is a linear combination of the p most recent past values of a random variable and the current white noise term.
The series {x_t} is an autoregressive process of order p if

x_t = α_1 x_{t−1} + α_2 x_{t−2} + ⋯ + α_p x_{t−p} + w_t,

where {w_t} is white noise and the α_i are the model parameters with α_p ≠ 0.

Moving Average (MA) Models
A moving average (MA) model is another foundation of more complex models. A moving average process of order q is a linear combination of the current white noise term and the q most recent past white noise terms.
The series {x_t} is a moving average (MA) process of order q if

x_t = w_t + β_1 w_{t−1} + ⋯ + β_q w_{t−q},

where {w_t} is white noise.

Autoregressive Moving Average (ARMA) Models
In time series analysis, the Box-Jenkins method (Box and Jenkins, 1970) applies autoregressive moving average (ARMA) models to find an appropriate fit for a time series. The ARMA model combines the AR and MA models.
Dependence is very common in time series data, and ARMA models could be used to capture this dependence.
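The dependence an ARMA model captures can be seen in a short simulation. The sketch below generates an ARMA(1,1) series with illustrative parameters (not estimates from this thesis); the positive lag-one autocorrelation of the result is exactly the kind of dependence the model encodes.

```python
import numpy as np

def simulate_arma11(phi, theta, n, seed=0):
    """Simulate x_t = phi*x_{t-1} + w_t + theta*w_{t-1} with Gaussian white noise."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + w[t] + theta * w[t - 1]
    return x

x = simulate_arma11(phi=0.6, theta=0.3, n=1000)
lag1_corr = np.corrcoef(x[:-1], x[1:])[0, 1]   # clearly positive for these parameters
```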

Time Series Models for Financial Data
In finance, the random walk is often used to model the price of a financial asset; under this model, the normal distribution can be used to simulate the evolution of the stock price. This simple model is convenient for predicting the stock price, but its shortcoming is also quite obvious. For example, in Figure 1 the return changes over time, but volatility clustering occurs: the volatility is higher during some periods (like 2009) than during others. According to Poon and Granger (2003), there are two ways to forecast volatility: using time series data or using option prices. In this study we use time series data, and other models are needed to predict the volatility. Many volatility forecasting models have been investigated in previous studies, but no consensus has been reached on which model is best (Poon and Granger, 2003). Therefore, many researchers add external variables to the model (like an investor sentiment index) to better fit the volatility.
According to Lee et al. (2002), shifts in sentiment are negatively correlated with market volatility. In this research, the sentiment data comes from Twitter instead of a market proxy like the turnover ratio (Baker and Wurgler, 2006). Engle (1982) introduced the autoregressive conditional heteroskedasticity (ARCH) model to capture time-varying volatility.
The autoregressive conditional heteroskedasticity (ARCH) model of order p is

ε_t = h_t z_t, where h_t^2 = ω + α_1 ε_{t−1}^2 + ⋯ + α_p ε_{t−p}^2.

The generalized ARCH (GARCH) model extends the ARCH model: it has a longer memory and a more flexible lag structure, obtained by adding lagged conditional variances to the model.

Basic GARCH(1,1) model:

ε_t = h_t z_t, where h_t^2 = ω + α ε_{t−1}^2 + β h_{t−1}^2.

In the classic GARCH model the error is normally distributed: z_t is standard normal, z_t ~ iid N(0,1). The density function of the normal distribution is

f(x) = (1/(σ√(2π))) exp(−(x − μ)^2/(2σ^2)),

where μ is the mean of the distribution and σ is the standard deviation.
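The GARCH(1,1) recursion can be sketched by simulation; the parameter values below are illustrative only, chosen so that α + β < 1 and the process has a finite unconditional variance ω/(1 − α − β).

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=1):
    """Simulate eps_t = h_t*z_t with h_t^2 = omega + alpha*eps_{t-1}^2 + beta*h_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    h2 = np.empty(n)
    eps = np.empty(n)
    h2[0] = omega / (1 - alpha - beta)   # start at the unconditional variance
    eps[0] = np.sqrt(h2[0]) * z[0]
    for t in range(1, n):
        h2[t] = omega + alpha * eps[t - 1] ** 2 + beta * h2[t - 1]
        eps[t] = np.sqrt(h2[t]) * z[t]
    return eps, h2

eps, h2 = simulate_garch11(omega=0.02, alpha=0.12, beta=0.85, n=2000)
# Plotting eps reproduces the volatility clustering seen in real return series
```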

Research Goal and Thesis Outline
Bollen et al. showed that mood data extracted from Twitter can help predict stock market movements. This motivates the questions of this thesis: does the happiness data help to fit the return data, or does it have a lagged impact on the return volatility? In terms of forecasting, the happiness data may help to forecast the return volatility.
That is to check whether there is any improvement of the prediction accuracy when adding the happiness data into the model.
The thesis is organized as follows: Chapter 2 fits the data with basic GARCH models and presents the results used to decide which model fits best; some basic features of the dataset are also discussed. In Chapter 3, more advanced models are used to deal with asymmetry problems and heavy tails. Chapter 4 presents forecasting based on the selected advanced models and compares the predictive power of the different models. Plot A in Figure 2 shows that the return is a stationary series in mean, averaging around zero. However, the volatility is clustered, especially near the end of 2009, when the financial crisis still had its impact all over the world. This is the reason non-linear models (like GARCH) are needed to fit the data. The correlogram plot implies that autocorrelation exists in this series. Autocorrelation means the correlation of a variable with itself at different times; it is typically modeled with an autoregressive moving average (ARMA) model. In this study, an ARMA component is added to the GARCH model, and I will check whether it is significant as an extension to the GARCH model. The density and the QQ plot indicate this series has heavy tails and potential asymmetry problems. Especially in the QQ plot, the two tails deviate from the red line, which represents the normal distribution. Therefore, Chapter 3 introduces more advanced models that deal with these two problems.

Benchmark Model
After checking the significance of the parameters, the preferred model among the GARCH(p,q) specifications for p from 1 to 5 and q from 1 to 2 was GARCH(1,1). So GARCH(1,1) is used as the benchmark model.
Basic GARCH(1,1) model: the maximum likelihood method is used to estimate the parameters of the GARCH models. In the basic GARCH model, the return is conditionally normally distributed with mean μ and standard deviation h_t. So the likelihood function is

L(θ) = ∏_{t=1}^{n} (1/(h_t√(2π))) exp(−(r_t − μ)^2/(2h_t^2)),

where θ = (μ, ω, α, β) are the parameters. In practice, it is more convenient to use the logarithm of the likelihood function:

ln L(θ) = −(n/2) ln(2π) − Σ_{t=1}^{n} ln h_t − (1/2) Σ_{t=1}^{n} (r_t − μ)^2/h_t^2.

The mean is modelled for the GARCH(1,1) model, so μ is the estimated mean, ω is the variance intercept, α is the ARCH parameter, and β is the GARCH parameter. They are all significant.
So, the estimated benchmark model is

h_t^2 = 0.0224 + 0.1256 ε_{t−1}^2 + 0.8564 h_{t−1}^2.

GARCH(1,1) is frequently used as the benchmark model because it is a relatively simple model with good performance in fitting financial time series data. The plots in Figure 3 display the performance of the GARCH(1,1) model. The autocorrelation of the standardized residuals is not all zero; hence, a GARCH-ARMA model is fitted next.
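The log-likelihood maximized in the estimation can be sketched as a function to minimize (negated). This is a minimal sketch, assuming the common choice of initializing the variance recursion at the sample variance; it could be passed to a numerical optimizer such as scipy.optimize.minimize to obtain estimates like those above.

```python
import numpy as np

def garch11_neg_loglik(params, r):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model for returns r.

    params = (mu, omega, alpha, beta); h2 follows the GARCH(1,1) recursion.
    """
    mu, omega, alpha, beta = params
    eps = r - mu
    n = len(r)
    h2 = np.empty(n)
    h2[0] = eps.var()                    # common initialization choice (an assumption)
    for t in range(1, n):
        h2[t] = omega + alpha * eps[t - 1] ** 2 + beta * h2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(h2) + eps ** 2 / h2)

rng = np.random.default_rng(2)
r = rng.standard_normal(500) * 0.1       # synthetic returns, for illustration only
nll = garch11_neg_loglik((0.0, 0.01, 0.1, 0.8), r)
```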

GARCH(1,1)-ARMA(p,q) Models
As indicated in the correlogram of the standardized residuals, one potential extension is to add ARMA component into the GARCH model. That is to include an ARMA model for the conditional mean of the process. I will specify the mean equation with a low order of ARMA process to capture the autocorrelation of the return.
GARCH(1,1)-ARMA(p,q) model: the conditional mean follows an ARMA(p,q) process,

r_t = μ + Σ_{i=1}^{p} φ_i r_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t,

with ε_t = h_t z_t as before. After checking the AIC and the significance of the parameters, the model chosen is GARCH(1,1)-ARMA(1,1). The difference between the fits of the two models (GARCH and GARCH-ARMA) is quite small based on the plot. The autocorrelation at lag 1 for the GARCH-ARMA model is zero, indicating that it is reasonable to add an ARMA component to the mean function of the GARCH model.

GARCHX-ARMAX Framework
The main goal of this study is to check the impact of happiness data on stock market return volatility. The most direct application is to add the happiness data as an external regressor in the GARCH-ARMA model.
The happiness data comes from Hedonometer.org and is based on people's online expressions on Twitter. To quantify happiness, it merged the 5,000 most frequent words from each of four sources: Google Books, New York Times articles, music lyrics, and Twitter messages, resulting in a composite set of roughly 10,000 unique words.
These words were scored on a nine-point happiness scale from (1) sad to (9) happy. In the GARCHX-ARMAX model, an external regressor can be added to the conditional mean equation, the conditional variance equation, or both. So I will check these different combinations of the models and choose some of them for the more advanced models. The happiness data is denoted H_t.
Basic GARCHX(1,1)-ARMAX(1,1) model (with happiness in mean and variance):

r_t = μ + φ r_{t−1} + θ ε_{t−1} + δ H_t + ε_t, with h_t^2 = ω + α ε_{t−1}^2 + β h_{t−1}^2 + γ H_t,

where H_t is the happiness data. The GARCH-ARMA model with happiness in only the mean or only the variance equation is treated as the same scenario as the basic GARCHX-ARMAX model.
Also, if the happiness data has a lagged influence on the volatility, we can add its lagged value H_{t−1} to the GARCH-ARMA model, for example

h_t^2 = ω + α ε_{t−1}^2 + β h_{t−1}^2 + γ H_{t−1}.

Again, for the models with lagged happiness data, there are two other scenarios: the GARCH-ARMA model with lagged happiness in the mean function only, or in the variance function only.
Based on the significance of the parameters and the AIC of the different models, the GARCH-ARMA model with happiness data in the mean equation and the model with lagged happiness in the mean equation are the preferable models. Table 4 reports the estimates of GARCH(1,1)-ARMA(1,1) with happiness data in the mean equation and GARCH(1,1)-ARMA(1,1) with lagged happiness data in the mean function. (Note: in all tables, "lag" is short for lagged happiness data, "sim" for simultaneous happiness data, and "hap" for happiness data.) For the GARCH(1,1)-ARMA(1,1) with happiness data in the mean function, the variance equation is

h_t^2 = 0.0223 + 0.1261 ε_{t−1}^2 + 0.8561 h_{t−1}^2,

and similarly for the model with lagged happiness data in the mean function. The estimated GARCH parameter β is close to one and the ARCH parameter α is close to zero; their sum is very close to one, indicating that the conditional variance is covariance stationary. Simultaneous and lagged happiness data can also be combined: lagged happiness in the mean equation with simultaneous happiness in the variance equation, both in the mean equation, or both in the variance equation. Based on the significance of the parameters and the AIC of the different models, the models chosen are the GARCH(1,1)-ARMA(1,1) with simultaneous happiness data in the mean equation and lagged happiness data in the variance equation, with

h_t^2 = 0.1249 ε_{t−1}^2 + 0.8569 h_{t−1}^2 + 0.0037 H_{t−1},

and the GARCH(1,1)-ARMA(1,1) with lagged and simultaneous happiness data both in the mean equation.

Basic Conclusions
The leverage effect means that a negative shock causes a larger increase in volatility than a positive shock of the same size. That is to say, an unexpected drop in price (bad news) increases volatility more than an unexpected increase in price (good news).
Diagnostic tests introduced by Engle and Ng (1993) include the sign bias test, the negative sign bias test, and the positive sign bias test. These tests are used to check whether there is a leverage effect in the DJIA returns.
The diagnostic procedure tests the significance of φ_1 in the regression

z_t^2 = φ_0 + φ_1 S_{t−1}^− + e_t,

where z_t is the standardized residual and S_{t−1}^− is a dummy variable equal to one when ε_{t−1} is negative and zero otherwise. The EGARCH model is used to deal with the leverage effect. In the GARCH model, we assume that good and bad news have the same effect on volatility; in the real world, however, volatility usually increases more after bad news than after good news. The exponential GARCH (EGARCH(1,1)) model is

ln h_t^2 = ω + α z_{t−1} + γ(|z_{t−1}| − E|z_{t−1}|) + β ln h_{t−1}^2.

When z_{t−1} is negative, the total effect of the shock is (γ − α)|z_{t−1}|, while for a positive shock it is (γ + α)|z_{t−1}|. Therefore, the leverage effect implies α < 0, so that negative shocks raise volatility more than positive ones.

Comparison Between EGARCH and GJR-GARCH Models
The ability of the EGARCH models to deal with the leverage effect is clearly stronger than that of the GJR-GARCH models.

EGARCH and GJR-GARCH Models with ARMA Component
The ARMA component is added to the EGARCH and GJR-GARCH models in order to remove the potential autocorrelation in the residuals. Figure 6 shows the correlogram of the residuals of the EGARCH model with simultaneous happiness in the mean equation and lagged happiness in the variance equation. The significant value at lag one is the reason to add an ARMA component to the EGARCH and GJR-GARCH models. The estimates of the parameters for the EGARCH-ARMA models are reported in the corresponding tables.

Comparison between EGARCH and GARCH Models
From the last section, the EGARCH models were selected for the asymmetry problems after testing potential ARMA extensions to the EGARCH and GJR-GARCH models. In this section, the comparison between EGARCH and GARCH models is addressed to show the benefit of using the EGARCH models. The parameter α is less than zero, which means the leverage effect is present.
The news impact curve is used to check the effect of news on conditional heteroskedasticity. The news impact curve is the functional relationship between the conditional variance at time t and the shock at time t−1, holding all information up to time t−2 constant. The difference between the two plots is quite clear. The curve in the first plot is symmetric, meaning the shock has the same impact on the conditional variance regardless of whether it is positive or negative. The second curve is asymmetric: the EGARCH model allows good news and bad news to have different impacts on volatility. The leverage effect implies that bad news tends to increase volatility more than good news does, which is why the curve has a steeper slope on the left.
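The two curves can be sketched numerically. This is an illustration with made-up parameter values (not the thesis estimates), using a common EGARCH form with sign parameter a < 0 for the leverage effect and magnitude parameter g.

```python
import numpy as np

# News impact curves: conditional variance at t as a function of the
# standardized shock z at t-1, holding earlier information fixed at the
# unconditional level.
omega, alpha, beta = 0.02, 0.12, 0.85
h2_bar = omega / (1 - alpha - beta)      # unconditional variance of the GARCH(1,1)
z = np.linspace(-3, 3, 121)
eps = z * np.sqrt(h2_bar)                # shock in return units

# GARCH(1,1) curve: depends only on the squared shock, hence symmetric
nic_garch = omega + alpha * eps ** 2 + beta * h2_bar

# EGARCH-style curve; intercept chosen so the curve passes near h2_bar at z = 0
a, g = -0.10, 0.15
nic_egarch = h2_bar ** beta * np.exp(
    (1 - beta) * np.log(h2_bar)
    + a * z + g * (np.abs(z) - np.sqrt(2 / np.pi))
)
# nic_garch is symmetric in z; nic_egarch is higher for negative shocks
```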

Heavy Tails
One of the features of the financial series is the observed excess of kurtosis in the error distribution which also means heavy tails exist in the distribution. The classic GARCH assumes the error is normally distributed, but in reality, this is often not the case. The extensions of models to other distributions with heavier tails are needed. The QQ plot from basic models also shows that heavy tails problem exists.
Distributions other than the normal are needed in the GARCH model to deal with heavy tails. Possible choices are the Student's t, the generalized error, and the generalized hyperbolic distributions.
In addition, excess skewness is another issue with financial series. Distributions with both heavy tails and skewness include the skewed Student's t distribution and the generalized hyperbolic distribution.

Student's T Distribution
The density function of the Student's t distribution is

f(x) = Γ((ν+1)/2) / (Γ(ν/2) √(νπ) σ) · (1 + ((x − μ)/σ)^2/ν)^{−(ν+1)/2},

where μ, σ, and ν are the location, scale, and shape parameters, and Γ is the gamma function, defined as Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt. The estimated GARCH coefficients α and β are significant at the 1% level, and their sum being less than one implies that the GARCH model is stationary. The estimated degrees of freedom of the conditional t-distribution is 6.53, which means that the return is conditionally non-normally distributed. According to Connolly (1989), the estimated degrees of freedom may indicate the source of the excess kurtosis in the return: if it is less than 10, both non-normality and conditional heteroskedasticity explain the excess kurtosis, whereas if it is bigger than 30, conditional heteroskedasticity is the only source of heavy tails in the return.
Therefore, both non-normality and conditional heteroskedasticity explain the excess kurtosis. The QQ plot helps to check the power of the Student's t distribution. Comparing this plot to the one in the data description, the heavy-tails problem is no longer an issue. The density of the standardized residuals shows that the Student's t distribution captures the shape of the residuals more accurately than the normal distribution.

Generalized Error Distribution
The generalized error distribution (GED) is defined with a shape parameter ν > 0. If x is GED distributed, its density is

f(x) = ν exp(−(1/2)|x/λ|^ν) / (λ 2^{1+1/ν} Γ(1/ν)), with λ = (2^{−2/ν} Γ(1/ν)/Γ(3/ν))^{1/2}.

If ν is between 0 and 2, the distribution has fatter tails than the normal distribution; if ν equals 2, it is the normal distribution.
In our case, the estimated ν is 1.33, which is less than 2. This means the distribution has fatter tails than the normal distribution. The performance of the GED is similar to that of the Student's t distribution. For the return data, the GED seems preferable based on AIC, but the difference between them is small.
These plots provide the same information as the plots for the Student's t distribution; the remaining issue is the excess skewness.

Generalized Hyperbolic Distribution
The generalized hyperbolic distribution (GHD) can be parameterized with parameters (λ, α, β, δ, μ) (Prause, 1999). In this parameterization, K_j is the modified Bessel function of the third kind of order j (Abramowitz and Stegun, 1972), α determines the shape, β determines the skewness, μ is a location parameter, δ serves for scaling, and λ influences the kurtosis and identifies the subclass of the generalized hyperbolic distribution. The first important subclass arises when λ = 1, in which case the GHD becomes the hyperbolic distribution.
The second subclass is λ = −0.5, which gives the normal inverse Gaussian (NIG) distribution. The model selected is the EGARCH-GHD with simultaneous happiness in the mean function and lagged happiness in the variance function. Although the last model has the lowest AIC, its parameter λ is not significant. The estimated parameters are presented in Table 19. The skewness parameter is −0.2 < 0, which implies the distribution is skewed to the left; the skewness is also visible in the density plot. The GHD deals with the skewness as well as the heavy tails. As displayed in the QQ plot, the GHD performs almost perfectly in dealing with the heavy-tail problem.

Skewed Student's T Distribution
Skewed student's T distribution can be defined in many ways. In this study, skewed student's T distribution will be defined as a limiting case of the Generalized Hyperbolic distribution.
Letting λ = −ν/2 and α → |β| in the generalized hyperbolic distribution yields the generalized hyperbolic skewed Student's t distribution. This distribution was popularized by Aas and Haff (2006) because of its uniqueness in having one tail with polynomial and one with exponential behavior. The skewness and kurtosis do not exist when ν ≤ 6 and ν ≤ 8, respectively. The estimated skewness parameter is slightly bigger than 0, meaning the distribution is positively skewed. The shape parameter is 7.88, which is almost 8, so the existence of kurtosis is uncertain. Figure 11. Standard normal QQ plot and density of the standardized residuals from the EGARCH-SSTD model.
These two plots in Figure 11 provide very similar results to those of the generalized hyperbolic distribution. Compared with the results from the GED, the generalized hyperbolic skewed Student distribution is not as good. The left tail is still a little heavy in the QQ plot, and the fit of the residuals is much better than with the normal distribution but not as good as with the generalized hyperbolic distribution.
To sum up, the generalized hyperbolic distribution performs best in dealing with the heavy-tail problem and excess skewness. The simultaneous happiness data works in the mean equation and the lagged happiness data works in the variance function.
FORECASTING THE RETURN VOLATILITY

Unconditional Forecasting
Unconditional forecasting is also known as recursive forecasting: a series of data is used to predict n steps ahead. The mean squared forecasting error (MSE) and the mean absolute forecasting error (MAE) are calculated to measure the forecasting error. Let |r_T| be the absolute value of the return at time T (the volatility proxy) and ĥ_T the estimated conditional volatility. The mean squared and mean absolute forecasting errors are used to test the accuracy of the forecasts.

The mean squared forecasting error is MSE = (1/n) Σ_{i=1}^{n} (ĥ_{T+i} − |r_{T+i}|)^2, and the mean absolute forecasting error is MAE = (1/n) Σ_{i=1}^{n} |ĥ_{T+i} − |r_{T+i}||.
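These two error measures can be sketched as a small helper; the forecast and return values below are hypothetical numbers used only to show the calculation.

```python
import numpy as np

def forecast_errors(h_hat, abs_r):
    """MSE and MAE of volatility forecasts against absolute returns as proxy."""
    e = np.asarray(h_hat) - np.asarray(abs_r)
    return np.mean(e ** 2), np.mean(np.abs(e))

h_hat = np.array([0.9, 1.1, 1.0, 1.2])   # hypothetical volatility forecasts
abs_r = np.array([1.0, 1.0, 0.8, 1.5])   # hypothetical realized absolute returns
mse, mae = forecast_errors(h_hat, abs_r)
# errors are [-0.1, 0.1, 0.2, -0.3], so mse = 0.0375 and mae = 0.175
```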

Rolling Forecasting
Another way to forecast is the rolling forecasting method, in which the estimation window shifts forward each time a new value is collected. The number of rolling steps is set to 100 in this research, and each step produces a one-step-ahead forecast. The length of the dataset is 1528, so the first 1428 observations are used as training data and the last 100 as testing data. For the rolling forecasts, MSE and MAE again give a cumulative measurement of the error over the forecasting range.
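The rolling scheme can be sketched generically. This is an illustration only: the `fit_and_forecast` argument is a stand-in for re-fitting a GARCH-type model on each window, and the toy version below simply forecasts tomorrow's volatility with the training-window standard deviation.

```python
import numpy as np

def rolling_one_step(series, n_test, fit_and_forecast):
    """Produce n_test one-step-ahead forecasts, refitting on an expanding window.

    fit_and_forecast(train) must return a single 1-step-ahead volatility forecast.
    """
    n = len(series)
    forecasts = [fit_and_forecast(series[:i]) for i in range(n - n_test, n)]
    return np.array(forecasts)

rng = np.random.default_rng(4)
r = rng.standard_normal(1528)            # synthetic returns matching the sample size
f = rolling_one_step(r, n_test=100, fit_and_forecast=lambda train: train.std())
```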
The forecast error of model i is defined as e_{i,t} = ĥ_{i,t} − |r_t| (i = 1, 2). The loss function is the square or the absolute value of the error: g(e) = e^2 or g(e) = |e|. The Diebold-Mariano (DM) test is based on the loss differential d_t = g(e_{1,t}) − g(e_{2,t}), and we say that the two forecasts are equally good if the differential has zero expected value. So the null hypothesis is H_0: E(d_t) = 0 versus the alternative hypothesis H_1: E(d_t) ≠ 0. The DM test uses the spectral density of the loss differential at frequency 0, f_d(0) = (1/2π) Σ_{k=−∞}^{∞} γ_d(k), where γ_d(k) is the autocovariance of the loss differential at lag k.
The DM test statistic is DM = d̄ / √(2π f̂_d(0)/n), where d̄ is the sample mean of the loss differential and f̂_d(0) is a consistent estimate of f_d(0). Under the null hypothesis, the DM statistic is N(0,1) distributed. So if the DM statistic falls outside (−z_{α/2}, z_{α/2}), we reject the null hypothesis; that is to say, the two models differ in prediction accuracy.

Unconditional Forecasting Results
The sophisticated models were tested for forecasting power, and the results indicate that they do not actually perform better than the GARCH model. This raises the question of whether the happiness data helps the forecasting of the basic GARCH models. The DM test results indicate that adding the happiness data to the GARCH models does not improve forecasting performance.

Rolling Forecasting Results
In the unconditional forecasting, only 10 points are estimated; in the rolling forecasting, 100 points are estimated out of sample. Table 26 contains the results from the rolling forecasting. Figure 13. Rolling forecasting comparison between GARCH and GARCH-GH models.
Figure 13 indicates that the EGARCH(1,1)-GH with lagged and simultaneous happiness in the mean equation performs better, which contradicts the MSE and MAE results. Again, the DM test is used to find out whether these two models differ in estimation accuracy. When distributions such as the Student's t are used in the GARCH model, the errors drop significantly. This is easy to understand, as these distributions deal with the skewness and heavy-tail problems. However, when happiness is added to these models, although the forecasts are more accurate than those of the GARCH(1,1) model, the MAE and MSE are bigger than for the GARCH models with these distributions only.
Therefore, there is still no evidence to indicate the happiness data helps to improve the forecasting accuracy.