Bayesian Model Averaging of Space-Time CAR Models with Application to U.S. House Price Forecasting

The housing market is a significant contributor to U.S. GDP. Forecasting the house price growth rate helps regulate risks associated with the housing sector and, in turn, helps stabilize the economy. However, due to the volatility of the housing market, forecasting the house price growth rate has been a difficult task. In this thesis, we build a conditional autoregressive model combined with Bayesian model averaging (BMA-CAR) based on quarterly observations from 1976 to 1994 and test its forecasting capability over 1995 to 2012. We extend the results of Bork [International Journal of Forecasting, 31, 1 (2015)] to include the effects of spatial autocorrelation, while not allowing the model and coefficients to shift over time. Our model has a hierarchical structure in which BMA averages out the effects of the predictors, while the CAR component accounts for the remaining spatial structure in the data.

Understanding housing market dynamics is crucial, as the housing sector constitutes a significant share of GDP and is the largest component of household wealth in the U.S. [1]. The Bureau of Labor Statistics estimated in 2010 that roughly 24 percent of the total consumption of American homeowners goes toward housing [2]. Starting in the late 1990s, the U.S. housing market was in a boom period until 2007, when the sub-prime crisis occurred, followed immediately by a downturn. Modeling and predicting U.S. house prices has received attention from governments, real estate developers, and investors. However, it has been a challenging task due to the strong vulnerability of the housing sector to structural changes, macroeconomic policies, regime switching, and market imperfections [1].
For the past four decades, economists have adopted time series approaches to study the relationship between U.S. house prices and socioeconomic variables. Early research largely focused on exploring explanatory variables in time series regressions ([3], [4], [5]) and/or modifying the error terms ([6], [7], [8]). More recently, research has focused on dynamic models that allow for time-varying coefficients; these models can substantially improve prediction by accounting for the subsample parameter instability seen in real estate data. The most common methods include, but are not limited to, regime switching models ([9], [10]) and AR, VAR, and GARCH models coupled with Kalman filter techniques ([11], [1], [12]).
As noted by [13], contiguous states may influence each other's housing prices. [2] grouped real house prices into eight Bureau of Economic Analysis (BEA) regions and found that the within-region correlation is larger than the between-region correlation, with only one exception. Other studies, such as [14], demonstrated the necessity of spatial models by showing that ordinary least squares alone leaves dependent residuals. Several studies accounting for spatial autocorrelation are summarized in [2]. One major branch of methods is the spatially adapted version of VARs (SpVAR) [15], which belongs to the group of Spatial-Temporal Autoregressive Moving Average (STARMA) methods. Seemingly Unrelated Regression (SUR) and error panel data models have also been widely used, though, like STARMA models, they favor a relatively small number of regions. Other authors have applied the common correlated effects estimator (CCE) [16], [13] and Spatial Dynamic Structural Equation models (SD-SEM) [2] to model common shocks to housing prices.
In a recent study [12], the authors implemented Dynamic Model Selection (DMS) and Dynamic Model Averaging (DMA) and demonstrated the importance of allowing for both time-varying parameters and model changes.
One limitation, however, is that the natural spatial structure in the data is not considered in their model; in fact, each state was analyzed separately in a univariate fashion. In this study, we propose Conditional Autoregressive (CAR) methods combined with Bayesian Model Averaging (BMA) as an alternative forecasting model for the data from [12]. We developed the model in a hierarchical structure composed of a fixed effect and a random effect: the fixed effect models the influence of the predictors, and the random effect captures the spatial-temporal variation remaining in the data after removing the fixed effect. We hope to achieve better forecasts by taking spatial autocorrelation into account.

Areal Data and Spatial CAR
Unlike point processes, where points on a continuous surface are all neighbors of one another, areal data have well-defined boundaries, and the observed data are frequently aggregations within those boundaries, or the areal units themselves constitute the units of observation [17]. CAR models are widely used to describe the spatial variation of areal data and have been applied extensively in diverse areas such as demography, economics, epidemiology, and geography [18].
Modeling spatial autocorrelation for areal data requires creating a neighbor structure and a corresponding weight matrix. There are multiple ways to define "neighbor", such as contiguity-based, graph-based, and k-nearest-neighbor, after which the neighbor object is converted into a spatial weight matrix to quantify spatial dependence. The spatial weights matrix $W$ has elements $\omega_{ks}$ representing the weight of the spatial link between spatial units $S_k$ and $S_s$.
When little is known about the spatial process, a common approach is the binary representation, with one for neighbors and zero otherwise [17].
The CAR model is a method for smoothing areal data that was originally proposed by [19]. In a CAR model, the spatial component of a given areal unit is conditionally dependent on a weighted average of the spatial components of all other units, with the weights taken from the previously created weight matrix. If we use $y_k$ to denote the observed value of interest, the full conditional distribution for $y_k$ takes the form

$$y_k \mid y_{-k} \sim \mathrm{N}\!\left(\frac{\sum_s \omega_{ks}\, y_s}{\omega_{k+}},\; \frac{\tau^2}{\omega_{k+}}\right), \qquad (1)$$

where $\omega_{k+} = \sum_s \omega_{ks}$ is the total number of neighbors of areal unit $k$ (under binary weights), and $\tau^2/\omega_{k+}$ can be viewed as the local variance for areal unit $k$. The CAR model is often used as a prior for the random effect in the context of a hierarchical model, where we write $\phi_k$ for the spatial random effect left unexplained by the fixed effects. Replacing $y_k$ in Equation (1) by $\phi_k$ gives

$$\phi_k \mid \phi_{-k} \sim \mathrm{N}\!\left(\frac{\sum_s \omega_{ks}\, \phi_s}{\omega_{k+}},\; \frac{\tau^2}{\omega_{k+}}\right). \qquad (2)$$

This model can be denoted CAR$(W, \tau^{-2})$ [20], with the joint distribution obtained by applying Brook's Lemma [21]:

$$\phi \sim \mathrm{N}\!\left(0,\; \tau^2 (D - W)^{-1}\right), \qquad (3)$$

where $D$ is the diagonal matrix with elements $\omega_{k+}$. The precision matrix $D - W$ is singular, so the joint distribution above is improper [20]. After some modification, a proper CAR model has the covariance [22]

$$\Sigma_\phi = \tau^2 \left[\rho_s (D - W) + (1 - \rho_s) I\right]^{-1}, \qquad (4)$$

where $0 \le \rho_s \le 1$. If $\rho_s = 1$, the CAR model reduces to the improper case of Equation (3); the precision matrix is non-singular whenever $\rho_s < 1$.
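To make the construction concrete, the proper precision in Equation (4) can be assembled directly from a binary neighbor matrix. The following is a minimal R sketch; the matrix W and the value of rho_s are illustrative placeholders:

## Sketch of the proper (Leroux-type) CAR precision in Equation (4)
build_car_precision <- function(W, rho_s) {
  D <- diag(rowSums(W))                          # diagonal of omega_{k+}
  rho_s * (D - W) + (1 - rho_s) * diag(nrow(W))
}

W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)  # a 3-unit chain
det(build_car_precision(W, 0.9))  # > 0: the prior is proper
det(build_car_precision(W, 1.0))  # 0: the intrinsic CAR is improper

The determinant check makes explicit why $\rho_s < 1$ is needed for a proper prior.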

Bayesian Model Averaging
As for the fixed effects, a set of predictors is needed to capture covariate effects. However, it remains a question which variables to include in the model; including all covariates adds computational burden and invites overfitting. This variable selection problem has a natural Bayesian solution [23].
Advantages of BMA include, but are not limited to, automatic adjustment for multiple comparisons, efficient exploration of the model space, and lower forecasting error compared to using a single model [24].
Since the number of variables is relatively small in our case, we can exhaust all possible models and compute their posterior probabilities over the model space M. Suppose a set of $K$ models $\mathcal{M} = \{M_1, \ldots, M_K\}$ is under consideration for data $Y$, and $\theta_k$ is a vector of unknown parameters that indexes member $M_k$. The probability that $M_k$ in fact generated the data, conditional on having observed $Y$, is the posterior model probability [25]

$$p(M_k \mid Y) = \frac{p(Y \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(Y \mid M_l)\, p(M_l)},$$

where $p(M_k)$ is the prior for model $M_k$ and $p(Y \mid M_k)$ is the marginal distribution of the data under this model, obtained by integrating out $\theta_k$:

$$p(Y \mid M_k) = \int p(Y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k.$$

List of References

[3] D. DiPasquale and W. C. Wheaton, "Housing market dynamics and the future of housing prices," Journal of Urban Economics, vol. 35, no. 1, pp. 1-27, 1994.

[4] N. Pain and P. Westaway, "Modelling structural change in the UK housing market: a comparison of alternative house price models," Economic Modelling, vol. 14, no. 4, pp. 587-610, 1997.
[22] B. G. Leroux, X. Lei, and N. Breslow, "Estimation of disease rates in small areas: a new mixed model for spatial dependence," in Statistical Models in Epidemiology, the Environment, and Clinical Trials. Springer, 2000, pp. 179-191.
[24] A. Rodriguez and G. Puggioni, "Mixed frequency models: Bayesian approaches to estimation and prediction," International Journal of Forecasting, vol. 26, no. 2, pp. 293-311, 2010.

CHAPTER 2

Hierarchical CAR models are a convenient candidate for modeling this type of data. The hierarchical structure is as follows [1]:

$$Y_{kt} \mid \mu_{kt} \sim \mathrm{N}(\mu_{kt}, \sigma^2),$$
$$g(\mu_{kt}) = X_{kt}^{T}\beta + \phi_{kt}.$$

We define $Y_{kt}$ as the observed house price growth rate at time $t$ and location $k$, giving $K \times T$ rows of observations in total, and $X_{kt}$ as a vector of $p+1$ covariates (including the intercept). $\beta$ is a vector of length $p+1$ that is invariant to changes in space and time, and the collection of random effects $\phi_{kt}$ has length $K \times T$. Expanding the second line of the two equations above shows the dimension of each variable:

$$\underbrace{g(\mu_{kt})}_{1 \times 1} = \underbrace{X_{kt}^{T}}_{1 \times (p+1)}\, \underbrace{\beta}_{(p+1) \times 1} + \underbrace{\phi_{kt}}_{1 \times 1}.$$

We provide here an explanation of the two-stage structure in the above hierarchical model:

(i) $Y_{kt} \mid \mu_{kt} \sim \mathrm{N}(\mu_{kt}, \sigma^2)$. At the first stage, the observed value of the house price growth rate $Y_{kt}$ at a specific time $t$ and location $k$ comes from a normal distribution with true mean $\mu_{kt}$ and observational error $\sigma^2$.
(ii) $g(\mu_{kt}) = X_{kt}^{T}\beta + \phi_{kt}$. At the second stage, the likelihood is chosen to be Gaussian, since the house price growth rate is a continuous response.
Therefore, the $g$ function is simply the identity link. With the identity link, $g(\mu_{kt})$ becomes $\mu_{kt}$, and it is approximated by the fixed effects $X_{kt}^{T}\beta$, which capture the influence of the predictors, plus the spatial-temporal random effects $\phi_{kt}$, which model the patterns remaining in the data after fitting the fixed effects. We assume that the second stage of the model is a simple linear mixed effects model: both components are additive and enter the model linearly, without higher-order or interaction terms.

CAR for Spatial-Temporal Random Effects $\phi_{kt}$
In the context of a hierarchical model, we use the CAR model as a prior for the random effect $\phi_{kt}$, which can be modeled in several ways.
(i) Linear CAR (CARlinear). In this model, a linear trend in time is assumed. Each areal unit $k$ has its own deviation in intercept $\phi_k$ and slope $\delta_k$ from the mean linear trend (intercept $\phi_1$ and slope $\alpha$) [2]. Both $\phi_k$ and $\delta_k$ are modeled with the CAR prior of [3].

(ii) Time Autoregressive CAR (CARar)
CARar is a spatially autocorrelated autoregressive time series model first introduced by [4]. The single set of random effects $\phi = (\phi_1, \phi_2, \ldots, \phi_T)$ is decomposed as

$$f(\phi_1, \ldots, \phi_T) = f(\phi_1) \prod_{t=2}^{T} f(\phi_t \mid \phi_{t-1}).$$

The decomposition above induces temporal autocorrelation by allowing $\phi_t$ to depend on $\phi_{t-1}$. Spatial autocorrelation is then introduced at $\phi_1$ by a Gaussian Markov Random Field (GMRF) prior that is constant over time. The joint prior distribution for $\phi_1$ is

$$\phi_1 \sim \mathrm{N}\!\left(0,\; \tau^2 Q(W, \rho_S)^{-1}\right),$$

where the spatial component is introduced by the variance $\tau^2 Q(W, \rho_S)^{-1}$, which corresponds to the proper CAR prior. The precision $Q$ is defined in the previous chapter, in the same way as in Equation (4) but with a slightly different representation [3]:

$$Q(W, \rho_S) = \rho_S \left(\mathrm{diag}(W\mathbf{1}) - W\right) + (1 - \rho_S) I,$$

where $I$ is the $K \times K$ identity matrix and $\mathbf{1}$ is a vector of ones, so that the diagonal matrix $D$ with elements $\omega_{k+}$ is written as $\mathrm{diag}(W\mathbf{1})$. The parameter $\rho_S$ controls the spatial autocorrelation structure and plays the role of $\rho_s$ in Equation (4): $\rho_S = 1$ gives the intrinsic CAR prior [5], in which the conditional expectation is the average of the random effects of neighboring units, while $\rho_S = 0$ corresponds to independent random effects. Temporal autocorrelation is then induced by

$$\phi_t \mid \phi_{t-1} \sim \mathrm{N}\!\left(\rho_T \phi_{t-1},\; \tau^2 Q(W, \rho_S)^{-1}\right), \quad t = 2, \ldots, T,$$

where $\rho_T$ is the temporal autoregressive coefficient: $\rho_T = 1$ indicates strong temporal autocorrelation, in which case the temporal process becomes a random walk, while $\rho_T = 0$ gives temporal independence. In this model, the multivariate temporal autoregressive component is introduced via the mean $\rho_T \phi_{t-1}$ [4].
(iii) Adaptive CAR (CARadaptive). This model assumes the same autoregressive structure as CARar. However, it allows for localized spatial structure by treating the non-zero elements of the neighbor matrix $W$ as unknown parameters [6]. The adjacency elements $w^{+} = \{w_{ks} \mid k \sim s\}$ are estimated, with $w_{ks} \in (0, 1)$.
We implemented empirical studies on our dataset using CARlinear, CARar, and CARadaptive without model averaging. CARar and CARadaptive gave comparable results, and both were superior to CARlinear. Because CARar is the simpler method and gave slightly better forecasting results than CARadaptive, we used only CARar for the final model.
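For reference, these three structures correspond to the functions ST.CARlinear, ST.CARar, and ST.CARadaptive in the R package CARBayesST used later. A minimal fitting sketch follows; the data frame dat (K x T stacked rows ordered as the package expects), the covariate names, and the MCMC settings are placeholders, and argument availability (e.g., AR) may differ across package versions:

## Sketch: fitting the three candidate spatio-temporal CAR structures
library(CARBayesST)
f <- growth ~ PIR + UNE + LFG + MOR + HOS
fit_linear   <- ST.CARlinear(formula = f, family = "gaussian", data = dat,
                             W = W, burnin = 20000, n.sample = 120000)
fit_ar       <- ST.CARar(formula = f, family = "gaussian", data = dat,
                         W = W, burnin = 20000, n.sample = 120000, AR = 1)
fit_adaptive <- ST.CARadaptive(formula = f, family = "gaussian", data = dat,
                               W = W, burnin = 20000, n.sample = 120000)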

Prior Distributions

Model Space Priors
Imagine we have $M$ models with model index $m \in \{1, 2, \ldots, M\}$ and in total $p$ covariates. It is convenient to index each of the $2^p$ models by the vector $\gamma = (\gamma_1, \ldots, \gamma_p)^{T}$, where $\gamma_j$ is an indicator for inclusion of variable $X_j$ under model $M_m$. Many Bayesian variable selection implementations have used independent priors of the Bernoulli form [7]:

$$p(\gamma \mid \eta) = \prod_{j=1}^{p} \eta^{\gamma_j} (1 - \eta)^{1 - \gamma_j} = \eta^{q_\gamma} (1 - \eta)^{p - q_\gamma},$$

in which $q_\gamma = \sum_{j=1}^{p} \gamma_j$ is the number of non-zero parameters under model $M_m$, and the hyperparameter $\eta$ is the expected probability that each variable is included. A Beta(1, 1) prior is assumed for $\eta$.
A refinement of the Bernoulli priors is known in the literature as dilution priors. In order to downweight the probability of $M_m$ in the presence of collinearity in $X_m$, we imposed the dilution prior

$$p(M_m) \propto |R_m|\, \eta^{q_\gamma} (1 - \eta)^{p - q_\gamma}, \qquad (9)$$

where $R_m$ is the correlation matrix of the covariates included in $M_m$; its determinant $|R_m|$ tends to zero as the covariates become collinear, diluting the prior mass assigned to redundant models.
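As an illustration, the dilution prior of Equation (9) attaches the determinant factor $|R_m|$ to the Bernoulli form. A small R sketch, in which X (the covariate matrix) and gamma are placeholders:

## Sketch: Bernoulli model prior with an optional dilution factor |R_m|
model_prior <- function(gamma, X, eta = 0.5, dilute = TRUE) {
  q     <- sum(gamma)
  prior <- eta^q * (1 - eta)^(length(gamma) - q)
  if (dilute && q > 1)
    prior <- prior * det(cor(X[, gamma == 1, drop = FALSE]))
  prior
}
## Collinear covariates drive |R_m|, and hence the prior, toward zero
model_prior(c(1, 1, 0, 0, 0), X)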

Parameter Space Priors
We used the g-prior to simplify the posteriors in the model averaging process. We introduce the g-prior starting from the ordinary linear regression case.
Assume the goal is to obtain the posterior distribution of $\beta$ and $\sigma^2$. Without the g-prior, we would need to sample from the full conditionals using a Markov chain Monte Carlo procedure. If we give $\beta$ and $1/\sigma^2$ the priors multivariate normal$(\beta_0, \Sigma_0)$ and gamma$(\nu_0/2, \nu_0\sigma_0^2/2)$ respectively, we obtain the full conditionals

$$\beta \mid \sigma^2, Y \sim \mathrm{N}(m, V), \quad V = \left(\Sigma_0^{-1} + X^T X / \sigma^2\right)^{-1}, \quad m = V\left(\Sigma_0^{-1}\beta_0 + X^T Y / \sigma^2\right),$$

$$1/\sigma^2 \mid \beta, Y \sim \mathrm{Gamma}\!\left(\frac{\nu_0 + n}{2},\; \frac{\nu_0\sigma_0^2 + \mathrm{SSR}(\beta)}{2}\right), \quad \mathrm{SSR}(\beta) = (Y - X\beta)^T (Y - X\beta).$$

Under Zellner's g-prior [9], $\beta_0 = 0$ and $\Sigma_0 = g\sigma^2 (X^T X)^{-1}$, so that the prior carries the amount of information in $n/g$ observations, and the Gibbs sampler simplifies to direct Monte Carlo:

$$\sigma^2 \mid Y \sim \mathrm{Inverse\ Gamma}\!\left(\frac{\nu_0 + n}{2},\; \frac{\nu_0\sigma_0^2 + \mathrm{SSR}_g}{2}\right), \quad \mathrm{SSR}_g = Y^T\!\left(I - \frac{g}{g+1} X (X^T X)^{-1} X^T\right)\! Y,$$

$$\beta \mid \sigma^2, Y \sim \mathrm{N}\!\left(\frac{g}{g+1}\,\hat{\beta},\; \frac{g}{g+1}\,\sigma^2 (X^T X)^{-1}\right),$$

where $\hat{\beta} = (X^T X)^{-1} X^T Y$ is the ordinary least squares estimate. Now we apply the g-prior to our model. The value of $g$ is commonly set as $g = \max(n, p^2)$, where $n$ is the sample size and $p$ is the number of covariates including the intercept. The other parameters are assigned priors as above; in particular, the prior for $\sigma^2$ has the underlying form $\sigma^2 \sim \mathrm{Inverse\ Gamma}(\nu_0/2, \nu_0\sigma_0^2/2)$.
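The simplification above replaces the Gibbs sampler with direct Monte Carlo: draw $\sigma^2$ from its marginal, then $\beta$ given $\sigma^2$. A sketch in R, under the assumed hyperparameter choices nu0 = 1 and s20 = var(y):

## Sketch of the direct Monte Carlo sampler under Zellner's g-prior;
## X (with intercept column) and y are placeholders.
gprior_mc <- function(y, X, S = 1000, nu0 = 1, s20 = var(y)) {
  n <- nrow(X); p <- ncol(X)
  g      <- max(n, p^2)
  shrink <- g / (g + 1)
  XtX    <- crossprod(X)
  bhat   <- solve(XtX, crossprod(X, y))                  # OLS estimate
  SSRg   <- sum(y^2) - shrink * drop(t(bhat) %*% XtX %*% bhat)
  sigma2 <- 1 / rgamma(S, (nu0 + n) / 2, (nu0 * s20 + SSRg) / 2)
  V      <- shrink * solve(XtX)
  beta   <- t(sapply(sigma2, function(s2)
    MASS::mvrnorm(1, shrink * bhat, s2 * V)))            # beta | sigma^2, y
  list(beta = beta, sigma2 = sigma2)
}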

Posterior Computation
CARBayesST is an extension of the R package CARBayes, the latter of which models the spatial autocorrelation that remains in the data after covariate effects are accounted for [10]. CARBayesST is the first dedicated software for spatial-temporal areal unit modeling with conditional autoregressive priors and is capable of capturing temporally changing spatial dynamics [1]. We adopted part of the code from the package and added Bayesian model averaging for our model. We used Gibbs sampling to draw from the full conditional distributions, since a closed form for the joint posterior distribution is not available.
A description of the algorithm is as follows. At each iteration $s$:

(i) Sample a model $\gamma$ from the model space and, under model $M_m$, sample $\beta$ from

$$\beta \mid \sigma^2, Y, \gamma \sim \mathrm{N}\!\left(\frac{g}{g+1}\,\hat{\beta}_m,\; \frac{g}{g+1}\,\sigma^2 (X_m^T X_m)^{-1}\right),$$

where $\hat{\beta}_m$ is the ordinary least squares estimate under model $M_m$.

(ii) Sample $\sigma^2$ from the marginal

$$\sigma^2 \mid Y, \gamma \sim \mathrm{Inverse\ Gamma}\!\left(\frac{\nu_0 + n}{2},\; \frac{\nu_0\sigma_0^2 + \mathrm{SSR}_g}{2}\right).$$

(iii) Update the random effects $\phi$ and the CAR parameters $\rho_S$, $\rho_T$, $\tau^2$ from their full conditionals under the CARar prior.

Under iteration $s$ and model $\gamma$, we set the estimate of $\phi_{kt}$ at the last time point of the training set as the initial value of $\phi_t$ for prediction. We then produce the one-step-ahead prediction from time point $t$ as

$$\phi_{t+1} \sim \mathrm{N}\!\left(\rho_T \phi_t,\; \tau^2 Q(W, \rho_S)^{-1}\right), \qquad \hat{Y}_{k,t+1} = X_{k,t+1}^T \beta + \phi_{k,t+1}.$$

Multiple-step-ahead predictions are implemented as an extension of the one-step-ahead prediction.
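A one-step-ahead prediction under the CARar prior can be sketched as follows; all inputs stand for the draws available at iteration s and are placeholders:

## Sketch: one-step-ahead prediction from posterior draws
predict_one_step <- function(X_new, beta_s, phi_last, rho_T, tau2, Q) {
  ## phi_{t+1} | phi_t ~ N(rho_T * phi_t, tau2 * Q^{-1})
  phi_new <- MASS::mvrnorm(1, rho_T * phi_last, tau2 * solve(Q))
  drop(X_new %*% beta_s) + phi_new
}

Multi-step forecasts simply iterate this function, feeding each draw of phi_new back in as phi_last.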

MSFE and MSFE ratio
Model validation is based on the mean squared forecasting error (MSFE) on the out-of-sample set 1995:1-2012:4, in order to be consistent with [11]. MSFE is derived from the squared forecast error (SFE). The SFE for model $j$ is computed as

$$\mathrm{SFE}^{(j)}_{k,t} = \left(\hat{Y}^{(j)}_{k,t} - Y_{k,t}\right)^2,$$

where $\hat{Y}^{(j)}_{k,t}$ is the prediction of the house price growth rate $Y$ at location $k$ and time $t$, averaged over the iterations $s$ of model $j$, and $Y_{k,t}$ is the observed $Y$ at the same time and location.
The benchmark model was chosen as an expanding-window OLS fit with only an intercept,

$$\hat{Y}^{\mathrm{MEAN}}_{k,t} = \frac{1}{t-1}\sum_{\tau=1}^{t-1} Y_{k,\tau},$$

with the SFE for the benchmark MEAN model computed as

$$\mathrm{SFE}^{\mathrm{MEAN}}_{k,t} = \left(\hat{Y}^{\mathrm{MEAN}}_{k,t} - Y_{k,t}\right)^2.$$

When checking prediction performance by state, we introduce the MSFE ratio $r$, the ratio of the MSFE from model $j$ to the MSFE from the benchmark:

$$r_j = \frac{\mathrm{MSFE}_j}{\mathrm{MSFE}_{\mathrm{MEAN}}}.$$
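These computations amount to a few lines of R; pred, pred_mean, and obs below are hypothetical K x T matrices of model-j forecasts, benchmark forecasts, and observations over the hold-out:

## Sketch: MSFE and MSFE ratio over the 1995:1-2012:4 hold-out
msfe       <- mean((pred - obs)^2)
msfe_bench <- mean((pred_mean - obs)^2)
r          <- msfe / msfe_bench   # r < 1: model j beats the benchmark
## state-wise ratios, as mapped in Figure 12
r_state <- rowMeans((pred - obs)^2) / rowMeans((pred_mean - obs)^2)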

CHAPTER 3
Application

Data Description
Data were retrieved from the appendix provided with [1]. The same data were used in order to make the modeling and forecasting comparable.
House price growth rates were computed as log differences,

$$Y_{k,t} = \ln P_{k,t} - \ln P_{k,t-1},$$

where $P_{k,t}$ denotes the level of real house prices in state $k$ at time $t$. The testing period has 3456 rows, covering 72 quarters of data (1995:1-2012:4) across 48 states. We split the data this way to stay consistent with the methodology of [1], so that our forecasting results are directly comparable.
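A sketch of this conversion and split in R, where P is a hypothetical K x T matrix of real house price levels (the log-difference form of the conversion is our assumption):

## Sketch: growth-rate conversion and train/test split
growth <- t(apply(P, 1, function(p) diff(log(p))))  # Y_{k,t} = log P_{k,t} - log P_{k,t-1}
train  <- growth[, 1:76]    # 1976:1-1994:4, 76 quarters (indices illustrative)
test   <- growth[, 77:148]  # 1995:1-2012:4, 72 quarters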
To account for strong regional differences, both state-level predictors (price-income ratio, unemployment rate, real per capita income growth, and labor force growth) and national-level ones (30-year mortgage rate, the spread between 10-year and 3-month Treasury rates, industrial production growth, real consumption growth, and housing starts) were used. Table 1 summarizes the covariates and the abbreviations used later in this chapter. Figure 1 shows time series plots of both the dependent variable (house price growth rate) and the independent variables (all 9 predictors) over time. Alaska and Hawaii have no contiguous neighbors and would generate rows of zeros in the spatial neighbor matrix, which can lead to computational errors. Therefore, these two states were excluded, and the modeling and forecasting in this paper are based on the remaining 48 states.

Preliminary Analysis of the Data

Neighbor Structure
To qualitatively check the spatial associations between the states, and to prepare for CAR modeling, we create a neighbor structure among the 48 states. Among the several ways of defining neighbors, such as contiguity-based, graph-based, and k-nearest-neighbor, we chose the contiguity-based method to create our neighbor matrix. We explored several other neighbor structures, but they did not exhibit significant improvement in our final model, so we skip the discussion of neighbor structure selection. We settled on a Queen-style contiguity neighbor structure, in which any two states sharing at least one point on their boundary are neighbors [2]. The neighbor structure used in our model is shown in Figure 2. The neighbor object described above is then transformed into a binary weight matrix $W$, where $\omega_{ij} = 1$ if area $j$ is a neighbor of area $i$ and $\omega_{ij} = 0$ otherwise. This is a conservative approach to creating a spatial weight matrix, appropriate when little is known about the underlying spatial process.

Figure 2: Neighbor structure used in CAR models.
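This construction is conveniently carried out with the spdep package; a sketch, assuming a hypothetical sf object states_sf holding the 48 state polygons:

## Sketch: Queen-style contiguity neighbours and a binary weight matrix
library(spdep)
nb <- poly2nb(states_sf, queen = TRUE)  # neighbours share >= 1 boundary point
W  <- nb2mat(nb, style = "B")           # binary weights: 1 = neighbour, 0 otherwise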

Exploratory Data Analysis
We first tabulated the correlations among the BEA regions and plotted the time series for each region to gain initial insight into the spatial and temporal structure of the data. Previous studies have explored the correlations within and between BEA regions to check for spatial interactions. The composition of each BEA region is shown in Figure 3. Studying the correlation matrices between and within the BEA regions for house prices over 1984-2011 [3] and 1975-2003 [4], it was found that the within-region correlations are larger than the between-region correlations, with few exceptions. A similar feature appears in our study using data from 1975-2012 (see Table 2): the diagonal correlation coefficients are generally larger than the off-diagonal ones, with the exceptions of Great Lakes-Far West and Southwest-Far West. The regional time series also show that the boom-and-bust cycle occurred in the later part of the sample. Therefore, forecasting the last half of the time series from the first half is not an easy task, since the test set contains features that are clearly distinguishable from the training set.
Next, we quantify the spatial and temporal autocorrelation in the data using Moran's test and the time series autocorrelation function (ACF).
Moran's I is one of the most commonly used tests for spatial autocorrelation:

$$I = \frac{n}{\sum_i \sum_j \omega_{ij}} \cdot \frac{\sum_i \sum_j \omega_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2},$$

where the observations at areal units are denoted $y_i$, $y_j$ and the spatial structure is given by $\omega_{ij}$ [2]. Moran's I was computed and plotted for each time point (Figure 5). More than 70% of the time there was significant spatial autocorrelation present. This finding further supports the use of a CAR model to account for the spatial structure.
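A sketch of this quarter-by-quarter testing with spdep, reusing the growth matrix and neighbour object sketched earlier:

## Sketch: Moran's I test at every quarter
lw    <- nb2listw(nb, style = "B")
pvals <- apply(growth, 2, function(y) moran.test(y, lw)$p.value)
mean(pvals < 0.05)  # share of quarters with significant spatial autocorrelation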
The autocorrelation function $\rho_{t,s}$ gives the correlation between time points $t$ and $s$:

$$\rho_{t,s} = \frac{\mathrm{Cov}(Y_t, Y_s)}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(Y_s)}}.$$

The ACF plot of the correlation coefficient against lag was used to check the temporal autocorrelation embedded in the data (Figure 6). The first five lags showed large variation in the temporal autocorrelation coefficient, with a significant proportion lying above the significance range bounded by ±0.16. The mean of the coefficients across states also exceeded the range for the first five lags, with the first and fifth lags right at the boundary. The ACF plot of the original data thus exhibits temporal autocorrelation, suggesting the need for time series models.
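The state-wise ACFs can be computed and overlaid in a few lines; the ±0.16 band quoted above corresponds to roughly $2/\sqrt{T}$ with $T \approx 150$ quarters:

## Sketch: state-wise ACFs with the +/- 2/sqrt(T) significance band
acfs <- apply(growth, 1, function(y) acf(y, lag.max = 20, plot = FALSE)$acf)
matplot(0:20, acfs, type = "l", lty = 1, col = "grey",
        xlab = "Lag", ylab = "ACF")
abline(h = c(-2, 2) / sqrt(ncol(growth)), lty = 2)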

Model Selection Results
We first examined the model selection results by checking the distribution of selected model sizes. Figures 7 and 8 show the distribution of selected model sizes for BMA-CAR without and with the dilution prior, respectively. The two dilution-prior models we tested gave rather similar results, so only one of them is shown. Both bar charts are centered around model sizes of four and five, with frequency falling off toward the two ends. BMA-CAR without the dilution prior sampled models of size five slightly more often than BMA-CAR with the dilution prior.
We also investigated the variables selected by the most frequently visited models. Tables 3 and 4 summarize the top five selected models without and with the dilution prior $h(\gamma) = \gamma$. Results from the two tables are almost identical, with the most visited models having four to five predictors revolving around the price-income ratio, unemployment rate, labor force growth, mortgage rate, and housing starts. These five models account for more than 73% of all model visits; in fact, the top two models, PIR + UNE + LFG + MOR + HOS and UNE + LFG + MOR + HOS, alone account for more than 60%. The combination UNE + LFG + MOR + HOS thus carries significant weight in our model selection process.

Estimation Results
Moran's I tests were also conducted on the residuals after fitting the BMA-CAR models, to see whether, and to what extent, spatial autocorrelation was accounted for. BMA-CAR with the dilution prior gave residuals comparable to BMA-CAR, so its results are not included here. As shown in Figure 9, most of the spatial autocorrelation was removed: only 11% of the tests remained significant, a drastic decline from 70% before model fitting. The ACF plot of the residuals is displayed in Figure 10. Except for one line (California), which showed significant coefficients for the first five lags, all other states exhibited oscillations centered around zero after the first lag.
Moreover, most of the coefficients were reduced in amplitude, and a much higher proportion lay within the bounds compared to Figure 6. Hence, the temporal autocorrelation present in the data was largely removed after fitting the BMA-CAR model.

Forecasting Results
We first check the overall forecasting performance over time using the squared forecasting error difference between the MEAN benchmark and our model. We adopted the method proposed by [5], with a small modification: we accumulate the cumulative difference in squared forecast errors (CDSFE) up to each time point, rather than over the whole period:

$$\mathrm{CDSFE}^{(j)}_{k,t} = \sum_{\tau=1}^{t} \left(\mathrm{SFE}^{\mathrm{MEAN}}_{k,\tau} - \mathrm{SFE}^{(j)}_{k,\tau}\right).$$

In Figure 11, we plot the sum of CDSFE over all 48 states. The drops in performance coincide with the phases of the Home Affordable Refinance Program (HARP), which successively set the loan-to-value (LTV) threshold at 105%, 125%, and no restriction [6]. Given the close accordance between the HARP timeline and the performance drops, we believe the deterioration in forecasting performance can be ascribed to policy change. Note that the results from [1] reported in Table 5 were based on 50 states rather than 48. BMS-CAR is a model similar to BMA-CAR, except that only the model with the highest posterior probability (PIR + UNE + LFG + MOR + HOS) was used for modeling.
For the BMA-CAR model, the average MSFE ratio of 0.818 shown in Table 5 is smaller than the ratios reported for BMA (0.858) and BMS (0.903), indicating that incorporating spatial dynamics did improve forecast accuracy relative to purely static Bayesian model selection and Bayesian model averaging [1]. Adding the dilution prior did not improve the forecasts by a significant amount, and BMS-CAR performed poorly compared to the others. Since BMA-CAR gave the best forecasting performance and the dilution prior made no significant difference, the rest of this chapter discusses only BMA-CAR. To take a closer look at which states generated better and worse forecasts, Figure 12 illustrates the state-wise distribution of the MSFE ratio between our model and the MEAN benchmark.
Most of the worst-predicted states were centered in the middle of the country, while the coastal states tended to be predicted relatively well. Ranking the states by MSFE ratio, the top four and bottom four states were (Idaho, Georgia, Washington, Delaware) and (Indiana, Nebraska, Iowa, Kentucky), respectively.
To discuss the reason behind this forecasting pattern, we first introduce the concept of volatility from [7]. Volatility was defined as an indicator of the magnitude of the boom-bust cycle; larger volatility implies a larger cycle. As [7] found, housing market growth volatility and forecasting accuracy have an almost one-to-one relationship, and coastal states tend to have higher volatility than interior states. The relationship between volatility and the MSFE level implies better forecasts in the interior states than in the coastal states.
However, when it comes to the MSFE ratio, coastal states tend to have smaller ratios, representing a larger performance gain over the MEAN benchmark. The MSFE ratio is a relative number and is not linked directly to forecast performance. For the interior (stable) states, the MEAN benchmark is not a bad choice, since the historical average may not differ much from the current point; as a result, even a very complicated model (such as BMA-CAR) will not yield great performance gains. In fact, two of the four least volatile states (Iowa and Kentucky; see Figure 15) are among the worst MSFE-ratio states from our BMA-CAR model (Figure 13).
We plotted Figures 15 and 16 to explore the difference in forecasting performance between the most and least volatile states. We also plotted the best and worst performing states from our BMA-CAR model (Figures 13 and 14) to discuss how model forecast performance varies over the boom-and-bust cycle.
In Figure 13, the forecast values are plotted against the observed values.
In the bottom four panels, which show the worst-predicted states, BMA-CAR tends to overestimate the growth rate before the recession and underestimate it after the recession.

List of References

[7] D. E. Rapach and J. K. Strauss, "Differences in housing price forecastability across US states," International Journal of Forecasting, vol. 25, no. 2, pp. 351-372, 2009.

CHAPTER 4
Discussion and Future Work

Performance Gain by Spatial Components and Limitations
Here we discuss why our model predicted better in the coastal states and less well in the interior states, comparing our forecasting results with those of [1]. We frame the discussion in terms of the MSFE ratio, which represents the forecast gain over the MEAN benchmark. Even though the forecasts from DMA [1] do not come from the true model (there is no true model anyway), DMA is a more flexible model that allows for both state-wise and time-wise model variation. Moreover, DMA gave smaller forecasting errors than BMA-CAR (Table 5), making it a reasonable benchmark for comparison.

Model Dimension and Variable Inclusion
We start the discussion from the perspective of model dimension. Our posterior distribution over the model space favored models of four and five variables. However, the findings of [1] suggest that many of the low-volatility states preferred parsimonious models, typically needing only two to four variables, and that larger model sizes occurred more often near the end of the forecasting period, around the financial crisis. Consequently, our model potentially selected too many variables for these low-volatility states, leading to overfitting and poor forecasting performance.
We further studied the influence of variable choice on forecasting performance. By modeling the states jointly, we were able to "borrow strength" across the states to obtain improved estimates of house prices in each [2]. As to the degree of forecast improvement, the lower and upper bounds are BMA/BMS and DMA/DMS, respectively: despite significant improvement over the BMA/BMS methods due to the additional spatial information, the BMA-CAR model was not able to beat the models with time-varying features. One of the challenges in our study comes from the great difference in the house price growth rate data between the training period and the testing period.
House price growth rates underwent significant fluctuations in the 1970s (see Figure 1), followed by a decline in the volatility and persistence of inflation since the early 1980s [3], [4], [5]. This structural change in the economy may have implications for the housing market.
In addition, assuming a constant observational error $\sigma^2$ and evolution error $\tau^2$ may not be appropriate in our case. The states with high and low volatility showed distinguishable patterns of MSFE ratios, so allowing the volatility to vary in our model could help reduce this difference in MSFE ratios and improve the overall forecasts.

An Empirical Model
Although BMA-CAR did not generate better forecasting results than DMA, it is worth noting that we obtained results comparable to DMA by running a space-time CAR model using only unemployment, mortgage rate, and consumption as predictors. As shown in Figure 17, all states except Nebraska had an MSFE ratio smaller than one, so the MEAN benchmark was beaten in 47 of 48 states. Contrary to the forecasting results from DMA or BMA-CAR, the results from this CAR model show generally better predictions near the coast than inland, with the exceptions of South Dakota and Nebraska.

Future Work

Data Acquisition
Our model could be improved at the data acquisition step, in that data aggregated at the state level may not reflect the housing market at smaller scales. By aggregating housing data to the state level, we lose information about variation at smaller scales; for instance, it is not reasonable to assume a similar housing market structure in New York City and the rest of New York State.
We could instead use data collected at the county level, which is closer to the scale at which housing prices are studied in practice. Data collected at the metropolitan statistical area (MSA) level are also available and are a good candidate if the primary interest is the housing market in densely populated areas. An MSA is a geographical region with a relatively high population density at its core and close economic ties throughout the area.
Currently, the United States Office of Management and Budget (OMB) has defined 382 Metropolitan Statistical Areas (MSAs) for the United States [6].
One of the most populous MSAs is New York-Newark-Jersey City, which spans three states (New York, New Jersey, and Pennsylvania) [7]. Although this MSA covers three states, people tend to live and commute to work within the one area, so it makes sense to study it as a whole. Real-life applications include real estate investors using MSA data to study housing trends and population movement [7].

Improvement of BMA-CAR Model
We could also improve the forecasts by altering the neighbor structure. Our model currently uses a geometric definition of neighbors, in which states sharing a common border are called "neighbors". This may not work well for areas such as New England, where the states tend to be very small but highly similar. One alternative is k-nearest neighbors, in which each state has the k nearest states as neighbors, based on the distance between centroids. In this way, we are not limited to states that are physically adjacent.
Future research includes creating state-wise variables for each predictor and using a Metropolis-Hastings algorithm to select a model. Creating a variable for each state allows variables and coefficients to shift between states, adding flexibility to BMA-CAR, which assumes global fixed effects. And due