Discrete Classification Problems in Agricultural and Behavioral Economics

The focus of this dissertation is on discrete classification problems in agricultural and behavioral economics. In my first two manuscripts, I take up the issue of producer misperceptions of yield risk relative to objective risk, a well-established phenomenon in which farmers tend to be overly optimistic in their perceptions of yield risk, forecasting yields with higher mean and lower variance than historical outcomes would suggest. Manuscript 1 focuses on estimating both how such misperceptions are distributed across individual forecasts and how such misperceptions might arise. Manuscript 2 goes on to look at how these misperceptions of yield risk affect farm-level crop insurance coverage level choices, simulating cross-coverage crop insurance demand across a broad set of scenarios. In my third chapter, I present a hierarchical Bayesian methodology for disaggregating, or downscaling, aggregated count data using an outside statistical sample. As an application, this chapter demonstrates how stakeholders can use readily available yet incomplete land use (e.g. agricultural) count data in combination with censored/aggregated census data provided at the county level to recover/estimate land use count data at the municipality (town, city, or any other sub-county region) level.
In Manuscript 1, we estimate the distribution of miscalibrated perceptions of yield risk, using the expectation maximization algorithm to perform a latent class analysis to uncover potential heterogeneity (clustering) in the parameters of our yield miscalibration model. Using self-reported yield forecasts and yield history from rural Chinese farmers, we estimate miscalibration parameters for each of our 879 forecast/history observations, using expectation maximization to fit these parameters to a Gaussian mixture model. We find that forecasts can best be described as coming from three distinct distributions, or clusters, which we label ‘optimistic’, ‘unbiased’, and ‘pessimistic’. Roughly 67% of forecasts can be defined as optimistic, with producers perceiving that, on average, yields face only about half (55%) of their true risk. 12% of forecasts can be defined as pessimistic, with producers perceiving that, on average, yields face 50% more risk than their objective risk. The remaining 21% of forecasts can be classified as unbiased, with perceived yield risk being largely in line with objective yield risk. In addition, we find that our optimistic group separates cleanly into two distinct clusters of roughly equal size: one comprised of ‘mild’ optimists, and another comprised of ‘extreme’ optimists.
We go on to examine the possible causes of these misperceptions of risk, finding that such misperceptions are not inherent to the producer, but rather result from crop-specific yield experience. Using regression methods, we find statistically significant evidence that recent historic losses increase producers’ level of perceived risk, while increases in the length of time since experiencing a historic loss decrease the level of perceived risk.
These results have important implications for crop insurance demand modeling. Namely, not only is it important to incorporate miscalibrated perceptions of risk in crop insurance demand models, it is also important to include heterogeneity with regard to those misperceptions. These findings also suggest that a targeted subsidy approach based on outcome history may be more cost-effective at inducing insurance participation than subsidies that are fixed across locations.
Manuscript 2 takes up the question of how misperceptions of yield risk affect producers’ decisions regarding which crop insurance coverage level to participate in. We simulate cross-coverage level crop insurance demand for both yield and revenue insurance across four potential models of risk misperception and three potential models of decision-making (one based on expected utility and two based on cumulative prospect theory, differentiated by whether decisions are framed within the broader context of farm risk management or whether decisions/outcomes are more narrowly framed), for a total of twelve choice models. Optimal coverage level choices are simulated for both corn (based on data from York County, NE and considered to be ‘low risk’) and wheat (based on data from Sumner County, KS and considered ‘high risk’). We find that increases in optimism bias drive down the optimal choice of coverage level, eventually inducing producers not to participate in crop insurance at all. Conversely, pessimism causes producers to increase their coverage level to the point of maximum coverage. We also find that this effect is strongest in the case of yield insurance, although the effect is still significant for revenue insurance. Further sensitivity analyses suggest that these results are not highly sensitive to correlations between prices and yields.
The aim of Manuscript 3 is to help stakeholders obtain policy-critical micro-level statistical data in cases where such data may only exist at a higher level of aggregation than is desired (e.g. aggregated census data). In this manuscript, published in the December 2017 edition of Agricultural and Resource Economics Review, we develop a hierarchical Bayesian methodology for downscaling regional count data to the sub-region level through the incorporation of an outside statistical sample in the form of sub-regional lower bounds (e.g. sub-region i has at least x_i farms, sub-region j has at least x_j farms, etc.).
Our methodology combines numerical simulations with exact calculations of combinatorial probabilities in order to determine which values of sub-regional counts are most likely to have resulted in the available statistical sample given the information contained in our two data sources. Although our method is designed to provide municipality count data based on county level data, as a proof of concept, we demonstrate our approach by estimating Rhode Island county level farm counts (which are known, but are not used in the estimation procedure) using state level farm count data provided by the Ag Census, along with a sample of Rhode Island farm locations collected by the University of Rhode Island. By estimating values that are known, we are able to measure the accuracy of our estimates. We are able to show that not only do our estimates outperform those obtained via maximum likelihood, but that they are robust to sampling variability across heterogeneous population sizes. We go on to expand our model to incorporate spatial considerations and demonstrate how the use of an informative prior based on relevant sub-region characteristics (land area, in our application) can further improve the estimates.

Abstract
Understanding how farmers perceive risk is crucial in designing effective risk management tools and policies. Federal crop insurance subsidies have grown dramatically over time to address low program participation, and it has been suggested that the widespread behavioral phenomenon of optimism bias may play a role. Nonetheless, no model currently exists to map optimistically biased forecasts into crop insurance demand. We develop a new behavioral model of perceived yield risk as shifting and scaling the objective distribution of yield risk. We fit the model to yield forecasts and yield histories of wheat and corn farmers in China, and find that forecasts are anchored to historical positive experience and are optimistically biased, on average. To evaluate multi-modal heterogeneity in the forecasts, we estimate a Gaussian mixture model using expectation-maximization and find evidence for three basic forecast types: optimistic (roughly 67%), realistic or neutral (about 20%), and pessimistic (about 12%), with a small, extremely pessimistic outlier group. We find further clustering within the optimistic group, with about half highly optimistic and about half mildly so. The group-wise means and mixture weights are robust to inclusion of additional elicited data, and also to inclusion of a shape parameter in the forecast model. Moreover, we find no evidence that forecast classifications map to a classification of the individual farmer making them. Instead, recent severe loss experience in the crop of interest appears to be the strongest single predictor of pessimism, though we also find statistically significant gender differences.

Introduction
Since passage of the Federal Crop Insurance Act of 1980, policy makers and agricultural economists have struggled to understand the factors driving farmer adoption of crop insurance in the United States. Despite the obvious benefits of risk management, participation in crop insurance programs has historically been puzzlingly low unless heavily subsidized by the federal government (Coble and Barnett, 2013). Congress passed reform bills in 1994 and 2000 that dramatically increased the size and scope of the Federal Crop Insurance Program because of failures to attract adequate participation at sufficiently high coverage levels (Glauber and Collins, 2002), and program costs have now grown to almost 30 times 1980s levels: the federal government is projected to spend over 6 billion dollars on crop insurance subsidy programs in 2017 (USDA ERS).
Common hypotheses for the presence of "demand frictions" limiting crop insurance participation have included adverse selection, expectations of ad hoc disaster relief, and availability of alternative risk management tools (see, for example, Coble and Barnett, 2013; Coble et al., 1999; Just, Calvin and Quiggin, 1999; Smith and Goodwin, 1996; Skees and Reed, 1986), though these factors are observed in other insurance markets without the same severe effects. Some recent studies, however, posit that lower than expected demand for crop insurance results from a systematic miscalibration of subjective yield risks by farmers (Egelkraut et al., 2006; Turvey et al., 2013). There is empirical evidence that, on average, producers are optimistically biased with respect to yields and yield variability, expecting better than average yields and below average yield risk (Umarov and Sherrick, 2005; Egelkraut et al., 2006; Turvey et al., 2013), and a number of studies have shown that perceived yield risk affects crop insurance demand around the world, including among farmers in the United States (Horowitz and Lichtenberg, 1993; Sherrick et al., 2004; Egelkraut et al., 2006; Shaik et al., 2008), in France (Enjolras and Sentis, 2011) and in China (Wang, Ye and Shi, 2016). While extensive work has been done attempting to model the distribution of crop yields, especially for crop insurance rating purposes (see Woodard and Verteramo-Chiu, 2017; Ker et al., 2015; and Woodard and Sherrick, 2011, for recent examples), modeling the transformation of perceived yields due to optimism bias remains an open question. Addressing this gap in the literature is a critical step in the development of behavioral models of crop insurance demand going forward.
In this article, we formulate and test a parsimonious and distribution-agnostic model in which subjective yield risks (herein, "forecasts") are derived as shift-and-scale transformations of the historical distribution of outcomes. The model can be estimated by regression on a per-farmer and per-crop basis using simple elicitation data, and it allows for inclusion of a reference point anchoring the forecast to a summary statistic of the historical distribution. Using data from corn and wheat farmers in Shanxi Province, China, we find statistical evidence that farmers anchor to past positive yield experience and that they are optimistically biased on average.
Critically, the distribution of model parameters appears to be multi-modal, meaning that there are heterogeneous clusters among forecasts which are not well-described by the population average (cf. Bruhin, Fehr-Duda and Epper, 2010; Sproul and Michaud, 2017).
Using a finite Gaussian mixture model, we find evidence for three basic forecast types: optimistic (about two thirds), realistic or neutral (about 20%), and pessimistic (most of the remainder), plus a small outlier group of extreme pessimists.
On average, optimistic forecasts had half the risk (a scale parameter, proportional to standard deviation) of their objective historical yield distribution, while pessimistic forecasts had approximately twice the risk of their objective distribution. Expanding our analysis to consider more classes, we find that the optimistic group cleanly separates into mildly and strongly optimistic groups of roughly equal size. Our results are robust to changing the number of data points per elicitation, and to adding a shape parameter to the model.
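To make the clustering step concrete, the following is a minimal sketch of expectation-maximization for a finite Gaussian mixture. It is an illustration under stated assumptions rather than the estimation code used in this article: it treats only a single (scale) parameter, it runs on synthetic draws whose cluster centers and sizes are hypothetical, and the function name `em_gmm_1d` and its initialization rule are our own choices for this example.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    x = np.asarray(x, dtype=float)
    # Assumed initialization: means evenly spaced over the data range,
    # common variance set from the range (not taken from the article).
    means = np.linspace(x.min(), x.max(), k)
    variances = np.full(k, ((x.max() - x.min()) / (2 * k)) ** 2)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        dens = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        dens /= np.sqrt(2 * np.pi * variances)
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means and variances
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
    order = np.argsort(means)
    return weights[order], means[order], variances[order]

# Synthetic scale parameters (hypothetical): three clusters of forecasts
rng = np.random.default_rng(0)
scales = np.concatenate([
    rng.normal(0.5, 0.07, 600),   # "optimistic": perceived risk below objective
    rng.normal(1.0, 0.07, 250),   # "neutral": perceived risk near objective
    rng.normal(1.6, 0.07, 150),   # "pessimistic": perceived risk above objective
])
weights, means, variances = em_gmm_1d(scales, k=3)
print(np.round(weights, 2), np.round(means, 2))
```

The recovered mixture weights play the role of the class shares reported above, and the component means play the role of the group-wise average scale parameters. In practice, the number of components would be chosen by comparing fits across several values of k.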
Since our data set contains many farmers who make more than one forecast, we are able to test whether forecast classifications are also classifications of the farmers themselves. We do not find statistical evidence in support of this hypothesis.
To gain further insight, we conduct a regression analysis to identify factors influencing the degree of optimism bias in the parameters, and find that the recency of historic losses is the most statistically and economically significant. Farmers' forecasts exhibit a 1% decrease in scale (risk) for each additional year since their historic minimum yield. We find no similar effect for historic gains, and also find no evidence of a cross-crop effect within farmers, i.e. wheat forecasts do not seem affected by the same farmer experiencing corn losses, and vice versa. We also find statistically significant differences in optimism bias by gender: men forecast about 85% of their historical yield risk, while women forecast only 74% on average, after controlling for other factors. Finally, we find that optimism bias appears to be persistent over time, as it does not vary with years of farming experience.
The remainder of this article is organized as follows: the next section provides a background of the literature pertaining to optimistically biased forecasting by farmers, and of the relevant behavioral economics literature more generally. The following two sections introduce our model and our data set. We then estimate the basic model, address the presence of reference points in forecasts, and estimate a mixture model to evaluate heterogeneity. Finally, we examine predictors of optimism bias, including demographic variables and yield history. The last section discusses implications for policy design and future research, and concludes.

Background
In the words of De Bondt and Thaler (1995, p. 389), "perhaps the most robust finding in the psychology of judgment is that people are overconfident." There is a vast body of widely replicated experimental studies showing that individuals are systematically overly optimistic in the face of risk: by and large, individuals believe that they are more likely than average to experience positive future events, while being less likely to experience negative events (cf. Slovic, 2016; Sandroni and Squintani, 2007). The finding is consistent across domains as diverse as post-college job prospects (Weinstein, 1980; Hoch, 1985), construction costs (Statman and Tyebjee, 1985), health risks (Kreuter and Strecher, 1995; Robb et al., 2004), entrepreneurship (Cooper et al., 1988; Camerer and Lovallo, 1999; Landier and Thesmar, 2009), and the now-canonical popular culture example, in which over 80% of drivers think they have above average ability (Svenson, 1981; Groeger and Grande, 1996). A unifying feature of overconfident beliefs is their dependence on the perception of control or ability to exert skill over the outcome (Sandroni and Squintani, 2004). Though the terms 'optimism bias' and 'overconfidence' are often used interchangeably, there is a noteworthy distinction that is particularly relevant to this analysis. Overconfidence generally refers to the belief that one's expected outcome will be more favorable than the expected outcome of one's peers. Optimism bias, on the other hand, is a more general term that refers to the belief that one's future outcomes will be more favorable than past outcomes, possibly including those experienced by others.
Unfortunately this distinction is not always carefully applied in the literature, so to avoid constant context-switching and qualification of terms, we will refer to these phenomena herein under the umbrella term 'optimism bias'.
The above examples are not simply cases of wishful thinking or otherwise erroneous reporting on surveys, nor are they confined to the behavioral laboratory: acting on misperceptions about risk has been observed to affect health outcomes, legal settlements, mergers and acquisitions (Malmendier and Tate, 2006), and insurance purchasing decisions. In a theory paper, Spinnewijn (2013) presents a model in which individuals are overly optimistic with regard to probabilities of discrete health outcomes as a function of effort, demonstrating how such a belief decreases an individual's willingness to pay for health insurance. One feature of behavioral phenomena is that some can be "unlearned" through market or professional experience, like the endowment effect (List, 2003), while others persist, like myopic loss aversion (Haigh and List, 2005). It appears that optimism bias falls in the latter category, and persists in the face of experience (Dalziel and Job, 1997), consistent with our results.
Despite the widespread evidence, optimistically biased beliefs do not apply to everyone in a given domain. Nearly every article on optimism bias has a subtext in which a surprisingly large percentage of subjects are overly optimistic and a surprisingly small (but non-zero) percentage are overly pessimistic. For example, Svenson (1981) finds that 82% of students placed themselves among the top 30% of drivers, and Wenglert and Rosén (2000) find that 72% of their subjects were classified as being overly optimistic about their personal future. Cooper et al. (1988) find that 68% of entrepreneurs believe their startup is more likely to succeed than comparable enterprises and, furthermore, that only 5% believe that their odds of success are worse than their competitors' (similar findings are reported by Landier and Thesmar, 2009). In the investment domain, a survey by Benartzi (2001) found that only 16.4% of respondents believed that their company's stock was riskier than the stock market as a whole.
Despite the breadth of this literature, there are two key areas where it lacks depth that we hope to address here. The same challenges exist in the relevant agricultural economics literature, discussed below. First, most approaches rely on estimating optimism bias on average, either by regression or comparison of distribution moments, but none of them (to our knowledge) explicitly estimate how past experience is transformed into an optimistically biased forecast. Second, there is a consistent finding of widespread optimism but always at least some pessimism, and little work has been done to estimate the incidence of these beliefs more broadly. A key contribution in this article is the classification of forecasts into optimistic, neutral and pessimistic types.
Within agricultural economics, the study by Pease is the first we are aware of to statistically compare farmers' subjective and historical crop yield probability distributions, evaluating historical yields and subjective expectations for the 1987 crop year for 98 Western Kentucky grain farmers. Pease finds that for many individuals, there exist large differences in the moments of the subjective and historical distributions. He also finds that corn forecasts were mildly pessimistic, on average, while soybean forecasts were optimistic. These differences were driven primarily by regional differences, with severe drought conditions affecting yields in both 1980 and 1983 for areas predominantly planted in corn, while drought conditions affected yields only in 1983 for areas planted predominantly in soybeans. The result that pessimistic forecasting bias is influenced by recent negative yield outcomes is consistent with our results as discussed above.
In a conference paper, Umarov and Sherrick (2005) test the "better than average" effect in a survey of 870 corn and soybean farmers in Illinois, Iowa and Indiana. They find strong evidence of optimism bias in terms of above average yields and below average stability (lack of variance) relative to their primary county, consistent across both crops. For each crop, approximately 60% of farmers reported mean yields different from their county average, and of these, 90% reported above average yields. Only 2% of large farms in their sample reported below average mean yields. With respect to variance, about 45% of farmers reported yield stability different than county average, and of these, about 75% reported above average yield stability. Their results are telling in much of the literature we review here: optimism bias is a widespread phenomenon, but nearly every paper observing optimism bias also observes some pessimistic members of the population (and possibly some neutral ones as well).
In another conference paper, Egelkraut et al. (2006) replicate the results in Umarov and Sherrick on outside data while adding a probability elicitation task.
Farmers were asked to compare their own yields and yield risks to other farmers in their county. Only 12% reported below-average yields, while 42% and 46% reported average or above-average yields, respectively. On the risk side, 20% reported above-average yield risk, while 38% and 42% reported average and below-average yield risk, respectively. In a first approach to (implicitly) model farmers' optimism bias in yields, the authors elicit a forecast distribution as a Weibull and compare it to a fitted Weibull distribution at the county level, using a Q-Q (quantile-quantile) plot.
Unfortunately, their data set lacks the farm-level yield histories necessary for the type of evaluation done by Pease, or herein. Turvey et al. (2013), the source of our data set, elicit yield forecasts and yield histories for 570 corn farmers in China using the Beta-PERT expert elicitation method. Similar methods have been used elsewhere in agricultural economics to fit Beta distributions to yield forecasts (e.g., Shaik et al., 2008). Turvey et al. indicate two aims of their study: first, to propose a solution for estimating the historical yield distribution for insurance pricing in the absence of annual data, and second, to evaluate the degree to which that insurance would be demanded by farmers according to their subjective forecasts. The authors find strong evidence of optimism bias, with 82% of households expecting average yields to exceed their historical experience, and 72% forecasting lower risk. These results are significantly correlated with farmers' self-reported "interest" in crop insurance, suggesting that optimism bias may be a key explanation for the need to subsidize crop insurance premiums.
In addition to their yields, there is consistent evidence that farmers are optimistically biased with respect to the prices they will receive. Eales et al. (1990) find that Illinois farmers and grain merchandisers had accurate forecasts of futures prices but were optimistically biased in consistently underestimating the futures price volatility, while Kenyon (2001) finds producer price expectations to be skewed toward higher prices and consistently underestimating the risk of large price changes during the season. More recently, in a conference paper, Riley and Anderson (2009) find the forecasts of Mississippi corn, cotton and soybean farmers to overestimate price and underestimate volatility relative to commodity futures and options markets.
As mentioned above, the perception of skill or control over outcomes is a critical feature of optimism bias. While it is not reasonable to expect that farmers think they influence commodity prices, there may be a framing effect in which farmers recognize that they influence their income through skill (via yields), and this may carry over to optimism bias being exhibited in price forecasts when elicited separately. In a case without this framing issue, Sherrick (2002) finds farmers to be pessimistic on average with respect to weather forecasts.
To summarize, there is extensive evidence of optimism bias in the population at large and in farmers, as well. The agricultural economics literature is by-and-large consistent with the mainstream behavioral economics literature, finding optimism bias to be widespread and persistent in the face of experience. There is not, to our knowledge, a mainstream model of optimism bias in terms of the way in which historical experience is explicitly transformed into an optimistically biased forecast.
In the paper of Spinnewijn (2013) discussed above, there is a model of overconfidence about effort affecting binary outcomes of success or failure, but this model does not generalize to continuous outcomes or non-binary distributions. There is some evidence in the above literature that optimism about average outcomes (e.g., mean yields) may materially differ from pessimism about risk (e.g., higher yield variance), but no model exists to bring these findings together under a cohesive structure. In what follows, we aim to fill these gaps in the literature and provide a structural model for future researchers to build upon.

A Location-Scale Model of Forecasts from Historical Data
We model the forecast distribution as an affine, "shift and scale" transformation of the historical distribution of the form:

$$Y = \alpha + \beta X \qquad (1.1)$$

In the above equation, $X$ and $Y$ are random variables representing the historical distribution and forecast distribution, respectively, $\alpha$ is a shift parameter, and $\beta$ is a scale parameter. We select this specification because it is a priori reasonable and consistent with basic statistical intuition. The model's simplicity does come with the downside that higher moments of the distribution are not considered (e.g., a shape parameter), though we will show in what follows that allowing a skewness adjustment from historical to forecast does not meaningfully change our results.
The benefits of this simple specification appear to dramatically outweigh its downside: i) the model requires only the assumption that the historical and forecast distributions come from the same family, ii) it does not impose any directional bias on the nature of forecasts, and iii) it includes as a special case a normalization step, in which the historical X might be adjusted to have zero mean and unit variance before shifting and scaling. Past research has tended to find that forecasts deviate systematically from average, or typical, experience, and that they usually do so in an optimistic fashion for the majority of the population. Lack of directional bias is important in a model precisely because what is optimistic may change sign depending on the application: an optimistic earning forecast might include an upward shift, whereas an optimistic forecast of commute time might include a downward shift.
The generality of our model with respect to normalizing X is demonstrated as follows. First, assume that X has known mean and standard deviation, μ and σ, so the normalized forecast would be constructed with parameters a and b according to:

$$Y = a + b\left(\frac{X - \mu}{\sigma}\right) = \left(a - \frac{b\mu}{\sigma}\right) + \frac{b}{\sigma}X$$

This is a special case of our model obtained by setting α equal to the parenthetical term, a − bμ/σ, and by setting β = b/σ. In fact, our model substantially generalizes this approach by the simple fact that μ and σ need not be known in order to make progress.
Rather, estimating the model's parameters requires only a minimum of two pairs of matching points from the support of each distribution. The pairs can be specific points in the support of the distribution, as well as the mean (even if it is not in the support, as for a categorical random variable), or they might be elicited according to specific percentiles (min, max, median, IQR, or endpoints of a confidence interval). The two modeling parameters, α and β, can be solved exactly from two such pairs of data points, or they can be estimated by regression in cases where more data are available and the system of equations is treated as overidentified with attendant errors. The specification in Equation 1.1 lends itself naturally to a regression approach.
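The two estimation routes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code used for the results: the function names and the example yields (in Jin/Mu) are hypothetical, and the regression is ordinary least squares on matched (historical, forecast) points.

```python
def solve_exact(pair_low, pair_high):
    """Solve Y = alpha + beta * X exactly from two (historical, forecast) pairs."""
    (x1, y1), (x2, y2) = pair_low, pair_high
    if x1 == x2:
        raise ValueError("the two historical points must differ")
    beta = (y2 - y1) / (x2 - x1)   # scale parameter
    alpha = y1 - beta * x1         # shift parameter
    return alpha, beta


def fit_ols(pairs):
    """Least-squares fit of Y = alpha + beta * X when more than two pairs exist."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("historical points must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical farmer: historical (min, max) = (600, 1000),
# forecast (min, max) = (800, 1000).
alpha_exact, beta_exact = solve_exact((600, 800), (1000, 1000))

# Adding a hypothetical central pair (historical 850, forecast 925)
# makes the system overidentified, so it is fit by regression instead.
alpha_ols, beta_ols = fit_ols([(600, 800), (850, 925), (1000, 1000)])

print(alpha_exact, beta_exact)
print(alpha_ols, beta_ols)
```

In this made-up case the central pair lies on the line through the endpoints, so both routes return the same parameters. With real survey responses the regression residuals would generally be non-zero.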
The basic model can be extended to include a reference point. In this setting, the shift parameter is applied to a reference point, r, which might be a point in the support of X or a summary statistic derived from the historical distribution. This transformation takes the form:

$$Y = \alpha r + \beta X$$

where X is not adjusted for r by the same reasoning as in the normalization discussion above. In practice, r might be the mean or median, or some other salient point that serves to anchor the forecast. In our application, we apply goodness-of-fit testing to determine that including a reference point outperforms a raw shift parameter, and to select a model in which forecasts are anchored to the reported historical maximum value.
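As a worked illustration of the reference-point form, the sketch below solves for α and β from the matched minimum and maximum, with the reference point set to the historical maximum. The function name and the yields are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_reference_point(hist_min, hist_max, fc_min, fc_max, reference):
    """Solve Y = alpha * r + beta * X from the (min, min) and (max, max) pairs."""
    if hist_max == hist_min:
        raise ValueError("degenerate historical distribution (min equals max)")
    if reference == 0:
        raise ValueError("reference point must be non-zero")
    # Differencing the two equations removes alpha * r and identifies beta.
    beta = (fc_max - fc_min) / (hist_max - hist_min)
    # Substituting beta back into the minimum equation identifies alpha.
    alpha = (fc_min - beta * hist_min) / reference
    return alpha, beta


# Hypothetical farmer (Jin/Mu): history spans 600 to 1000, forecast spans 800 to 1000.
hist_min, hist_max = 600.0, 1000.0
alpha, beta = fit_reference_point(hist_min, hist_max, 800.0, 1000.0, reference=hist_max)
print(alpha, beta)
```

Under these made-up numbers the forecast range is half the historical range (β below one), and the shift term αr moves the compressed distribution up toward the historical maximum.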
It is important to point out that our model is not designed to be an alternative to the probability weights used in cumulative prospect theory (Tversky and Kahneman, 1992), despite the familiar use of the reference point terminology.
Prospect theory probability weights, often referred to as decision weights, are simply another measure of risk preferences, and are not designed to model an individual's subjective beliefs about risk. In contrast, our forecasting model is designed to model beliefs about future risk as a function of a historic outcome distribution, in which the support of the distribution is allowed to vary from history to forecast. Probability weights transform the relative probabilities associated with each outcome, but do not contemplate the potential for changing support of the distribution. That being said, our forecasting model could easily be incorporated into a cumulative prospect theory framework by layering probability weights on top of elicited, subjective probabilities.

Data
The observations for corn farmers in our data set were originally published in Turvey To avoid inducing the farmers to anchor their forecasts to historic experience, the questions about historic experience were administered only after the forward-looking forecast information was collected. In this manner, the forecasts are intended to capture farmers' true risk assessments, which still would be based on historical experience, but would not be explicitly anchored to their reported historical experience in the same survey. The historical yield information collected from each farmer (for each crop, if applicable) included the lowest ever and highest ever yields in his/her memory, the years when those yields occurred, and the average yield in their experience. We eliminate incomplete questionnaires, farmers who do not grow corn or wheat, and also farmers with a degenerate distribution (minimum equal to maximum) reported for either their historical experience or their future forecast (these were the nonsense forecasts mentioned above).
[ Figure 1.1 ] The nature of the forecast and historical information elicited was determined by the authors' choice of the well-known Beta-PERT expert elicitation procedure to estimate the forecast and historical distributions (Malcolm et al., 1959;Bewley et al., 2010 There are also three potential downsides of the data collection method that we try to rule out empirically: i) the elicitation of future mode versus historical mean, ii) the potential for technological change affecting yields over time, and iii) the potential for inconsistent mapping between historical and forecasted minimum/maximum as representing comparable confidence intervals (discussed in further detail in section 1.5). First, the elicitation of the future distribution includes the mode but not the mean, whereas the elicitation of the historical distribution includes only the mean. We are confident this was not due to translation error since the survey was prepared in English, translated into Chinese, and then back-translated into English by two independent bilinguals. All students were trained prior to the survey, and were debriefed twice daily by attending faculty while in the field. Rather, the modal value was chosen for the forecast because it is specified in the Beta-PERT methodology, and it is likely the mean was chosen for the historical data because of relevance. With yields averaging 800-1,000 Jin/Mu for many farmers, the historical values might all have been unique (farmers were asked to round values to the nearest 10 Jin/Mu), rendering a request for the historical mode ambiguous. In a later section of this article, we test inclusion of the mode/mean pair as a data point for regression fitting of our basic two parameter model, and find that it does not materially change the results. We also test a 3-parameter model that includes an adjustment for skewness net of the shift and scale parameters. 
Depending on the type of forecast, we find that subjects report, on average, a forecast mode between -1% and 3% higher than their historic mean, after accounting for shift and scale adjustments. However, we find no meaningful change in goodness of fit or in the number of mixture components selected.
When modeling crop yields in the United States, it is common knowledge that technological improvements have played a role in dramatically increasing yield per acre over time. In the case of corn, national average yields per acre have increased by more than 7x in the last 100 years, in a manner that distinctly resembles a linear trend reported means. Model 1 shows a baseline specification and model 2 allows for distinct yield trends by crop. On this basis, we conclude that no yield trend adjustment is necessary for our data set. This finding is further confirmed in our later results showing the estimated optimism bias does not vary systematically with years of farming experience. Furthermore, we perform additional regression analysis and find no evidence that differences in the reported ranges (minimum yield -maximum yield) are significantly affected by years farming. This implies that the historical ranges reported are independent of the number of yields experienced, a finding which is somewhat at odds with the typical assumption of yields as following a Markov process (more sample draws should give a wider range, on average). We therefore recommend that future surveys also ask respondents to report yields in the previous growing season in order to test for such a possibility, and therefore reduce the need for such assumptions.
[ Table 1.2 ] goodness of fit criteria unambiguously selected model 2, indicating that the best fit is achieved using the reported historical maximum as a reference point. This is, in and of itself, a finding of optimism in the sense that forecasts are apparently anchored to the upside of historical experience. The finding is also intuitively appealing in the sense that anchoring to a reference point is more meaningful: it would be difficult to interpret an arbitrary shift parameter without historical context.

Model Selection
[ Table 1.3 ] As discussed above, there were three data points elicited from each farmer for each of their historical and forecast distributions. They were asked for the historical minimum, maximum and mean yields, and they were asked for minimum, maximum and modal yields in the upcoming year. The results in table 1.3 use only the minimum and maximum, reported for both future and historic, since they are clearly representative of the same points in the distribution. In fact, even if farmers were only reporting endpoints of a confidence interval (e.g., the 10th and 90th percentiles, as in Shaik et al., 2008), the data are still valid for our approach. However, this conclusion relies on the assumption that the minimum and maximum values being reported represent the same confidence intervals for both forecasted and historical outcomes.
In practice, this may not be the case since farmers are asked to recall historical worst and best yields as specific events in specific years, whereas their forecast max and min values may not represent true extremes of the distribution. The same validity does not apply when comparing the historical mean against the forecast mode: we have no reasonable assurance that one will map to the other according to our basic shift and scale model. Nonetheless, we find some evidence that they are not far off, and that the subjects may not be differentiating between the two. To show this, and as a robustness check, we re-estimated the results in

Identifying Heterogeneity with a Gaussian Mixture Model
An examination of the empirical densities over the population of forecasts for estimates of α and β reveals substantial heterogeneity. Of particular interest, the empirical marginal densities appear to be multi-peaked and a joint plot indicates that the peaks may coincide. This is shown in figure 1.3, which is a bivariate hex-plot of the joint density, with marginal kernel densities along each edge. Accompanying summary statistics for forecast parameters by crop and by gender are presented in table 1.5. Together, these features give reason for concern that heterogeneity in the form of clustering may be present, which could be better represented by a mixture distribution over types or "classes" of forecasts. If so, simply reporting the population-mean values (implicitly as the center point of a single-peaked distribution) may result in estimates that do not represent any one peak in the distribution (e.g., as noted in Sproul and Michaud, 2017). To conduct an explicit test of whether the observed phenomenon is statistically meaningful, we compare goodness-of-fit statistics from a set of (finite) Gaussian Mixture Models (GMMs) fitted with different numbers of components.
[ Table 1.5 ] Each GMM is fit using the well-known expectation-maximization (EM) algorithm of Dempster, Laird and Rubin (1977), an iterative procedure composed of two steps. In the expectation step ("E-step"), the likelihood function is used to calculate so-called 'membership probabilities,' $\tau_{ic}$, denoting the probability that each individual i belongs to each type (or 'class') c. The average of these membership probabilities becomes the updated mixture probability for each class, $\pi_c$. In the maximization step ("M-step"), the updated mixture probabilities are held fixed while the log likelihood is maximized by varying the parameters for each class, collectively referred to as the vector, $\theta_c$. After the M-step, the algorithm repeats until suitable convergence is achieved.
Formally, let t denote an iteration of the algorithm. In the E-step, the updated membership probabilities for step t + 1, for each individual, i, and each class, c, are given according to:

$$\tau_{ic}^{(t+1)} = \frac{\pi_c^{(t)} f\left(x_i; \theta_c^{(t)}\right)}{\sum_{k=1}^{C} \pi_k^{(t)} f\left(x_i; \theta_k^{(t)}\right)} \qquad (1.4)$$

Here, $x_i$ are the data for individual i, $\theta_c^{(t)}$ are the most recent maximum likelihood estimates for the parameters describing type c (from the previous step), and f denotes the likelihood of the data given the estimated parameters. The updated mixture probabilities for each type (at the population level) are then simply the averages of the membership probabilities:

$$\pi_c^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} \tau_{ic}^{(t+1)} \qquad (1.5)$$
The M-step then maximizes the log-likelihood function, holding the mixture probabilities constant. The updated estimate, $\theta^{(t+1)}$, solves

$$\theta^{(t+1)} = \arg\max_{\theta} \sum_{i=1}^{N} \sum_{c=1}^{C} \tau_{ic}^{(t+1)} \ln f\left(x_i; \theta_c\right) \qquad (1.6)$$

Recall that the vector $\theta_c$ represents the parameters collectively describing the multivariate normal for each type, c. Since our forecasting model is a 2-parameter model, each vector $\theta_c$ contains two mean parameters, as well as two additional parameters to populate the covariance matrix, which we assume to be diagonal following Bruhin, Fehr-Duda and Epper (2010).
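The E-step and M-step described above can be sketched in a few lines. This is an illustrative implementation for a bivariate mixture with diagonal covariance, not the code used to produce the reported estimates: the function names, starting values, and simulated (shift, scale) pairs are all hypothetical, and the small variance floor is an added safeguard that the text does not mention.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def density(point, means, variances):
    """Bivariate normal density with diagonal covariance (product of marginals)."""
    return (normal_pdf(point[0], means[0], variances[0])
            * normal_pdf(point[1], means[1], variances[1]))


def em_fit(data, means, variances, weights, max_iter=500, tol=1e-9):
    """Fit a diagonal-covariance Gaussian mixture by EM from given starting values."""
    means = [list(m) for m in means]          # copy so caller's lists are untouched
    variances = [list(v) for v in variances]
    weights = list(weights)
    n, n_comp = len(data), len(weights)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: membership probabilities tau[i][c] under current parameters.
        tau, log_lik = [], 0.0
        for x in data:
            joint = [weights[c] * density(x, means[c], variances[c])
                     for c in range(n_comp)]
            total = sum(joint)
            log_lik += math.log(total)
            tau.append([j / total for j in joint])
        # Updated mixture probabilities: averages of the membership probabilities.
        weights = [sum(row[c] for row in tau) / n for c in range(n_comp)]
        # M-step: weighted means and variances maximize the expected log-likelihood.
        for c in range(n_comp):
            mass = sum(row[c] for row in tau)
            for d in range(2):
                means[c][d] = sum(row[c] * x[d] for row, x in zip(tau, data)) / mass
                spread = sum(row[c] * (x[d] - means[c][d]) ** 2
                             for row, x in zip(tau, data)) / mass
                variances[c][d] = max(spread, 1e-8)  # floor guards against collapse
        if abs(log_lik - prev_ll) < tol:
            break
        prev_ll = log_lik
    # tau and log_lik refer to the parameters in force before the final M-step.
    return weights, means, variances, tau, log_lik


# Simulated (shift, scale) pairs from two hypothetical forecast types.
rng = random.Random(0)
data = ([(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)) for _ in range(300)]
        + [(rng.gauss(0.0, 0.05), rng.gauss(1.0, 0.05)) for _ in range(100)])
weights, means, variances, tau, log_lik = em_fit(
    data,
    means=[[0.4, 0.6], [0.1, 0.9]],
    variances=[[0.05, 0.05], [0.05, 0.05]],
    weights=[0.5, 0.5],
)
print([round(w, 3) for w in weights])
```

Because the two simulated groups are well separated, the membership probabilities end up close to 0 or 1 and the fitted mixture probabilities land near the simulated shares of three quarters and one quarter.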
A known shortcoming of applying expectation-maximization to fit a mixture model is that the researcher must specify ex ante the number of mixture components (types). The algorithm does have some capacity to "zero out" redundancies by giving near-zero weights (the mixture probabilities, $\pi_c$) to extraneous classes as it endogenously determines the classification of data points. In practice, however, specification of too many mixture components can lead to over-fitting, exhibited by ambiguous membership probabilities (e.g., close to 0.5).
Information criteria such as the corrected Akaike information criterion (AICc), BIC, or comparable cross-validation approaches, are often used for model selection but may be inadequate in mixture model applications due to insufficient penalization of extra parameters (Celeux and Soromenho, 1996; Biernacki et al., 2000). The latter authors introduce the integrated completed likelihood (ICL) criterion, a modification of BIC with an additional penalty for entropy with the goal of achieving better out-of-sample classification. More entropy means more ambiguous assignments to the various classes, corresponding to $\tau_{ic}$ values that are far from both 0 and 1, an indication that the fitted mixture model is not effectively classifying the data into distinct types. In effect, ICL adjusts BIC for the ambiguity of classifications, and this ambiguity is conveniently measured by the posterior membership probabilities estimated in the course of the EM procedure. As an additional check on the robustness of classification, Sproul and Michaud (2017) also report percentages of individuals with at least one posterior membership probability greater than 0.90, 0.95 and 0.99, respectively.
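To make the entropy penalty concrete, the sketch below computes BIC, ICL, and the share of confidently classified observations from a matrix of membership probabilities. It assumes one common "smaller is better" form, ICL = BIC + 2 × entropy with BIC = −2 ln L + k ln N; the exact form used for the reported tables may differ. The function names and toy inputs are hypothetical.

```python
import math


def classification_entropy(tau):
    """Entropy of the membership probabilities; zero when every row is 0/1."""
    return -sum(t * math.log(t) for row in tau for t in row if t > 0.0)


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion in 'smaller is better' form."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def icl(log_lik, n_params, tau):
    """BIC plus twice the classification entropy (assumed form, see lead-in)."""
    return bic(log_lik, n_params, len(tau)) + 2.0 * classification_entropy(tau)


def share_confident(tau, threshold):
    """Share of observations whose largest membership probability exceeds threshold."""
    return sum(1 for row in tau if max(row) > threshold) / len(tau)


# Two toy classifications with the same likelihood and parameter count:
# one sharp, one maximally ambiguous.
tau_sharp = [[0.99, 0.01], [0.02, 0.98]]
tau_fuzzy = [[0.5, 0.5], [0.5, 0.5]]
log_lik, n_params = -10.0, 9   # 9 = 2 * (2 means + 2 variances) + 1 free weight

icl_sharp = icl(log_lik, n_params, tau_sharp)
icl_fuzzy = icl(log_lik, n_params, tau_fuzzy)
print(icl_sharp < icl_fuzzy)
print(share_confident(tau_sharp, 0.95), share_confident(tau_fuzzy, 0.95))
```

With identical likelihoods, BIC cannot distinguish the two fits, whereas the entropy term makes ICL prefer the sharper classification.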
Our estimation results for GMMs with up to 6 mixture components are presented below in table 1.6. We present results separately for corn forecasts (N=442), for wheat forecasts (N=437) and pooled (N=879), with boldface indicating model selection for each criterion. A number of features of the model selection process are apparent, including i) the selection by ICL of more parsimonious models with fewer components, ii) the ability of the EM algorithm to "zero out" extraneous outlier groups, iii) the "preference" of AICc and BIC for more components due to increases in the likelihood function, and iv) the consistent "preference" of ICL for models with high rates of unambiguous classifications, as evidenced by the percentage of individuals classified with $\tau_{ic}$ close to 1. Across data sets, each criterion selects C > 1 components, providing empirical justification of our mixture model approach to examine heterogeneity.
[ Table 1.6 ] For each data set, ICL is minimized with a 4-component mixture, composed of three main groups/classes and one small outlier group. The three larger classes are roughly the same size in each of the corn, wheat, and pooled data sets, and as will be seen below, they correspond to roughly identical model parameters as well. These classes correspond to optimistic forecasts (67%), neutral forecasts (20%) and pessimistic forecasts (11%), for which the interpretation of parameters into labels will be discussed momentarily. The final outlier group (2%) includes extremely pessimistic forecasts that dramatically differed from their peers. Because of the consistent small size of the outlier group, we will discuss these results as We observe this feature here. When increasing from a single component to a 2-component mixture, the pessimistic group is split off from the rest in each data set.
For the corn forecasts, increasing from 2 to 3 components gives us our three main In each case described here, the parameter estimates for α and β support our characterizations of the groups in a manner indicative of consistent subdivisions when the number of components is increased. Of particular interest are the mixture models with four and six components respectively, which are best characterized as a 3-component mixture with outliers (due to the 4th outlier group), and a 4-component mixture with outliers (due to the outlier group being subdivided into two smaller outlier groups). We now explore the parameters of these two models in detail.
Contour plots visualizing our mixture results are shown in figure 1.4.
estimates of (0.58, 0.40) indicating expected yields even higher and even less risky than those indicated by the mean estimates of the consolidated optimistic group from the 3-component model. The mildly optimistic group, on the other hand, exhibits parameter estimates of (0.31, 0.70) indicating a more neutral or realistic outlook but still clearly optimistic (30% reduction in risk over neutral). The mildly optimistic group also has consistent parameters across the data sets, and comprises 31% of forecasts overall.

Factors Influencing Forecasts
Having identified distinct types of forecasts, it is worthwhile to examine how these differing biases (or lack thereof) might arise. There are two key questions. First, are forecast types actually revealing types of the people making them? That is, must an optimistic forecast necessarily come from an optimistic person? Second, to the extent that forecast types might differ within individuals, what are the key sources of variation?
One reason for choosing our data set was the presence of multiple forecasts for many subjects interviewed, which gives the opportunity to test whether forecast classifications are generally consistent within individuals. A review of our evidence suggests they are not, in two ways. First, if we take the view that the posterior membership probabilities, $\tau_{ic}$, arising from the EM procedure represent a posteriori classification probabilities, then the forecast types cannot represent types of subjects.
Simply put, only 65% of subjects growing both crops gave the same type of forecast for both crops. If the posterior probabilities indicate the probability that an individual bears a particular classification, then this outcome is statistically impossible given the that passes since a historic loss, the average farmer reduces the relative scale (β) of their future yield risk by 1% (in the optimistic direction). Effects for the shift parameter (α) are in the opposite direction nominally, but also optimistic.
While these numbers may appear small, the effect can be dramatic. As shown in table 1.1, the average time since a historic loss is 8.65 years with a standard deviation of 8.49 years, and with 3 and 11 years, respectively as the bounds of the interquartile range (IQR). All else equal, the relative scale (compared to their own historic distribution) for the future yield forecast of a farmer at the top of the IQR for historic losses will be eight percentage points below that of a farmer who suffered a historic loss 3 years ago (eight percentage points is equivalent to a 13% deviation of β below the mean).
In contrast, future forecasts were not significantly affected by recent events in the gains domain (i.e., recent maximum historic yields). This suggests a finding similar to the intuition from prospect theory, namely that losses are more salient than gains.
Explicitly testing this hypothesis, however, is beyond the scope of both our data and this article.
As table 1.11 shows, we tested a number of history-related features of the data set, including cross-crop experience, but did not find statistically significant prediction from any of them, apart from recent severe losses in the crop being forecast. We also tested crop-level fixed effects and found no significant variation, implying that recent loss history is likely the dominant crop-specific factor in determining differential forecasts in our data set. At least in the case of cross-crop experiences, it is possible that effects are not identifiable due to correlation induced by weather. Naturally, strong positive correlation due to weather would result in collinearity in our regression tests; this is shown in figure 1.4, which plots the years since a historic loss in corn versus in wheat for those farmers growing both.

Gender Differences in Forecasts
In addition to crop-specific effects arising from loss history, we also observe subject-specific effects in the form of statistically significant differences in parameter estimates by gender. Our regression results show that on average, women appear to be even more optimistic with regard to yield risk than their male counterparts. While male forecasters expect future yields to have 85% the scale of their historic yield distribution (i.e., β = 0.85), female forecasters expect future yields to come from a distribution with roughly only 74% the scale of their historic distribution (i.e., β = 0.74). This finding is consistent across both crops, despite the presence of within-subject heterogeneity in beliefs between crops.
Our results regarding gender differences in forecasts tend to stand somewhat in contrast with the literature on risk perceptions. Namely, the consensus is that on average, risk tends to be judged as lower by men than by women (see, for example, Gutteling and Wiegman, 1993; Stern et al., 1993; Flynn et al., 1994; Slovic et al., 1997; Finucane et al., 2000), and by white men in particular (Flynn et al., 1994).
Despite race and gender being found to be strong predictive factors for risk perceptions, Finucane et al. (2000) found considerable variation across both males and females of ethnic minorities in their replication of Flynn et al., concluding that risk is a social construct depending largely on the individual, even if its perception may be influenced by cultural and biological factors.
From the prospect theory literature, there has also been evidence of gender differences in risk preferences (as opposed to risk perceptions) and in probability weights. For example, Fehr-Duda, de Gennaro and Schubert (2006)  To summarize this discussion, our results indicate that women tend to forecast less risk than men, all else equal, but the literature on risk perception suggests that they generally perceive risks to be more severe. At the same time, research on prospect theory suggests that women are more averse to risk than men, but that they are insensitive to marginal changes in risk for probabilities that are not close to certain. Clearly, further research is needed to determine whether these associations are unified by some underlying mechanism or model.

Persistence of Optimism Bias
As mentioned in the background section, existing evidence from Dalziel and Job (1997) suggests that optimism bias persists in the face of professional experience: they showed that even professional drivers tend to underestimate their risk of an accident. Our data support this finding, in the sense that we find no statistical evidence of changing optimism bias as a function of years of farming experience. In fact, across all models tested in table 1.11, our estimate of the marginal effect of experience is best described as being precisely estimated and near-zero in terms of economic significance while also not being statistically different from zero.

Extending the Model to Include a Shape Parameter
While we have shown our basic shift-and-scale model to be fairly robust, we have not thus far tackled the question of whether the model might include a meaningful shape parameter to account for changes in higher moments. In particular, it might be the case that forecasts are not only classified according to optimistic, realistic or pessimistic shifting and scaling, but also that these classifications vary meaningfully with respect to shape changes in the transformation from the historical distribution to the forecast. To address this concern, we extend the model from a 2-parameter shift-and-scale form to a 3-parameter shift-scale-and-shape form. Specifically, we include a third parameter, γ, mapping the historic mean into the future mode, net of changes already induced by the shift and scale parameters, α and β. In terms of our regression specification, the shape parameter is interacted with a dummy, $1_{mid}$, equal to one for rows in which central points of the historic and forecast distribution are matched: It is worth repeating that any results obtained with this model on our current data are exploratory in nature: our data set is based on an elicitation method that does not match mean-to-mean or mode-to-mode for the historical and forecast distributions, but only allows (potentially) matching historic mean to forecast mode.
That said, our earlier results are remarkably robust to this extension of our model.

Manuscript 2 The Effect of Optimism Bias on Cross-Coverage Level Crop Insurance Demand
This manuscript has not yet been submitted for publication. It is single authored by Clayton Michaud.

Abstract
This manuscript examines the effect of producer misperceptions of yield risk on cross-coverage level demand for crop insurance. Using real-world data, we simulate both yield and revenue insurance coverage decisions for two representative farms ("low risk" and "high risk") across four potential models of risk miscalibration and three potential models of decision-making (generalized utility models). We find that the effect of optimism bias is remarkably consistent across our 48 scenarios. Given current crop insurance subsidy levels, we find the counterintuitive result that as optimism bias regarding future yields increases, producers choose to insure their crops at decreasing levels of coverage, eventually opting not to purchase insurance at all. Additionally, we find that the magnitude of this effect is larger in the case of yield insurance than revenue insurance. We perform a further sensitivity analysis on the effects of price-yield correlation but find that our results are not highly sensitive to this parameter.
Furthermore, we demonstrate that these results are driven primarily by current subsidy levels, namely the fact that lower coverage levels are awarded higher proportionate subsidies. We go on to discuss how these results can be used to develop more cost-effective crop insurance subsidy policies, and discuss avenues of future research.

Introduction
Optimism bias is a ubiquitous and well-known phenomenon wherein individuals assume they are more likely to succeed and less likely to fail than their peers (Slovic, 1987; DeBondt and Thaler, 1995). There is a large body of evidence that such optimism bias leads to biased decision-making and has effects over a wide swath of domains ranging from health outcomes, legal settlements, mergers and acquisitions Tate, 2006, 2008), and insurance purchasing decisions (Spinnewijn, 2013, 2015, 2017). (Horowitz and Lichtenberg, 1993; Sherrick et al., 2004; Egelkraut et al., 2006; Shaik et al., 2008; Enjolras and Sentis, 2011; Wang, Ye and Shi, 2016).
While optimism bias has been argued as contributing to lower than expected willingness-to-pay for crop insurance, another important question regarding farmers' insurance decisions remains unanswered, the question of why farmers choose the coverage levels that they do. Expected utility theory predicts that producers will purchase crop insurance at a coverage level at least as high as the level that maximizes per-acre subsidies. In reality, however, there is no statistical evidence to support this We find that the effects of optimism bias on cross-coverage level demand for crop insurance are consistent across all 12 models of decision-making under biased perceptions of yield risk, and are as follows.
(1) As optimism bias increases (decreases), farmers prefer lower (higher) levels of crop insurance coverage than they would otherwise prefer, with extreme optimism bias inducing farmers to buy no insurance at all.
(2) Optimism bias affects cross-coverage demand for revenue and yield insurance similarly, however the effect is more pronounced for yield insurance.
(3) The effect of optimism bias on cross-coverage demand for revenue insurance is relatively stable across changes in the correlation between prices and yields. We go on to discuss why our main result is both surprising and significant to policy design, pointing out that such a result is primarily driven by decreasing proportionate subsidies as coverage levels increase. Further simulations go on to demonstrate that under equal proportionate subsidies across coverage levels, expected utility maximizers will always choose the maximum level of coverage until optimism bias eventually induces them not to purchase insurance at all.

Modeling Optimism Bias
In our model of decision-making under optimism bias we assume that farmers make their decision of which coverage level of insurance to purchase based not on $F(y, P)$, the true distribution of yields and prices, but rather based on $F(\hat{y}, P)$, which is their forecasted or perceived distribution of yields and prices, which need not be equal to the true distribution. We model forecasted or perceived yield risk as being miscalibrated from the objective yield risk such that their perceived yield distribution is a function of the true distribution of yields and one or more miscalibration parameters, such that $\hat{y}_i(y_i, \varphi)$. We consider four potential models of optimism (pessimism) bias. Following standard yield modeling procedures we assume that $y_i$ and $\hat{y}_i$ each follow a beta distribution, $B_{a,b,S_1,S_2}(\cdot)$ and $B_{\hat{a},\hat{b},\hat{S}_1,\hat{S}_2}(\cdot)$, where a and b are the lower and upper bounds of the distribution, respectively, while $S_1$ and $S_2$ are shape parameters. Given these four parameters, the first three moments can be defined as follows: Additionally, we assume the lower bound of the objective yield distribution, a, to be zero. This is done both because this is the case for both of the objective crop yield distributions we will be simulating outcomes from, as well as because for no crop is it ever truly the case that $\Pr(y_i = 0) = 0$. Prices are assumed to be distributed according to a log-normal distribution which is accurately perceived by the farmer, and may or may not be correlated with yields.
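The moments of a bounded beta distribution can be checked numerically. The sketch below uses the standard closed-form mean, variance, and skewness of a beta distribution rescaled to the interval [a, b] with shape parameters S1 and S2; these are textbook expressions and may be presented differently from the manuscript's own equations. The function name and the example parameter values are hypothetical.

```python
import math


def beta4_moments(a, b, s1, s2):
    """Mean, variance, and skewness of a beta distribution on [a, b]."""
    total = s1 + s2
    mean = a + (b - a) * s1 / total
    variance = (b - a) ** 2 * s1 * s2 / (total ** 2 * (total + 1.0))
    # Skewness is unaffected by the bounds; it depends only on the shapes.
    skewness = (2.0 * (s2 - s1) * math.sqrt(total + 1.0)
                / ((total + 2.0) * math.sqrt(s1 * s2)))
    return mean, variance, skewness


# Hypothetical yield distribution: bounds 0 and 1200, shapes (4, 2).
mean, variance, skewness = beta4_moments(0.0, 1200.0, 4.0, 2.0)
print(mean, variance, skewness)

# Symmetric shapes give zero skewness and a mean at the midpoint.
sym_mean, sym_variance, sym_skewness = beta4_moments(0.0, 1000.0, 2.0, 2.0)
print(sym_mean, sym_variance, sym_skewness)
```

With S1 greater than S2 the distribution is negatively skewed, so probability mass sits closer to the upper bound than to the lower bound.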

Optimism Bias Model 1: Underestimating Downside Yield Risk
Under this model, farmers increase the mean and decrease the variance of their perceived yield distribution relative to their objective yield distribution by increasing the lower-bound, while holding the upper-bound fixed such that where $\beta_1$ is our miscalibration parameter and where optimism bias is defined as $\beta_1 < 1$. This model of optimism bias can be described as the belief that 'the bad things that happen to others can't happen to me'.
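The direction of the effects described for this model can be illustrated numerically. The sketch below is built on an assumption rather than on the manuscript's own equation: it supposes that the perceived range is the objective range scaled by β1 with the upper bound held fixed, so that the perceived lower bound is b − β1(b − a), and that the shape parameters are unchanged. The function names and parameter values are hypothetical.

```python
def beta_mean_var(lower, upper, s1, s2):
    """Mean and variance of a beta distribution on [lower, upper]."""
    total = s1 + s2
    mean = lower + (upper - lower) * s1 / total
    var = (upper - lower) ** 2 * s1 * s2 / (total ** 2 * (total + 1.0))
    return mean, var


def perceived_lower_bound(lower, upper, beta1):
    """ASSUMED specification: perceived range = beta1 * objective range."""
    return upper - beta1 * (upper - lower)


# Hypothetical objective yield distribution and an optimistic beta1 < 1.
a, b, s1, s2 = 0.0, 1200.0, 4.0, 2.0
beta1 = 0.55

a_hat = perceived_lower_bound(a, b, beta1)
obj_mean, obj_var = beta_mean_var(a, b, s1, s2)
per_mean, per_var = beta_mean_var(a_hat, b, s1, s2)

print(a_hat)
print(obj_mean, per_mean)
print(obj_var, per_var)
```

Under this assumed form, raising the lower bound raises the perceived mean and shrinks the perceived variance by the factor β1 squared, matching the verbal description of the model.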

Optimism Bias Model 2: Over(under)estimating Upside Yield Risk
Under this model of optimism (pessimism) bias, farmers increase (decrease) the mean and decrease (increase) the variance of their perceived yield distribution relative to the objective yield distribution by increasing (decreasing) the upper bound while holding the lower bound fixed. In this case, optimism bias is defined as $\beta_2 > 1$, while underconfidence is defined as $\beta_2 < 1$. This model of optimism (pessimism) bias can be described as the belief that 'just because good (bad) things do not happen to other people does not mean they will not happen to me'.

Optimism Bias Model 3: Underestimating Yield Variability
In this model of optimism bias, farmers decrease the variance of their perceived yield distribution relative to their objective yield distribution by increasing the lower bound and decreasing the upper bound while holding the mean fixed. This model of optimism bias is akin to the belief that 'really good and really bad things might happen to others, but neither will happen to me'.

Optimism Bias Model 4: Optimism Regarding the Shape of the Yield Distribution
In our fourth model, farmers decrease (increase) the skewness of their perceived yield distribution relative to the objective yield distribution in a way that increases (decreases) the mean and scales the variance. Under this specification, $Z$ is our miscalibration parameter. Here, optimism bias is defined as $Z > 1$, while underconfidence is defined as $Z < 1$. This specification is independent of our assumption that $a = 0$. Unlike with our $\beta$s, the effect of $Z$ on the new variance can go in either direction depending on the skewness of the original objective yield distribution. More specifically, adding positive (negative) skewness to an already positively (negatively) skewed distribution will decrease the variance, while adding positive (negative) skewness to a negatively (positively) skewed distribution will increase the variance. This model of optimism bias is analogous to the belief that 'although the best and worst things that could happen are the same for me as for everyone else, I am more likely than others to experience good things'.
For a graphical example of how each of the four models modifies the objective yield distribution, see figure 2.1.

Modeling Farmer Crop Insurance Coverage Level Decision-Making
We consider three potential models of producer decision-making regarding which crop insurance coverage level (if any) to purchase: one based on the standard expected utility model, and two based on cumulative prospect theory, in which the decision is framed either broadly or more narrowly.

The Objective Function
Given the choice between various insurance coverage levels, $c$, the farm's objective function is written in terms of $d_c$, which denotes the decision to purchase insurance at coverage level $c$, with $C$ representing the highest coverage level available. Given that $d_c \in \mathbb{Z}_+$, the constraint ensures that a producer must pick one and only one coverage level. Decision weights are defined such that $\pi_n^+ = w^+(p_n)$, and $\lambda$ is the coefficient of loss aversion, representing the degree to which losses are felt more strongly than corresponding gains. One of the major strengths of the cumulative prospect theory model is that when $\lambda, \gamma, \delta = 1$ and $x_i \in \mathbb{R}_+$, the model reduces to the traditional expected utility model, where farmers are risk averse for $r > 0$, risk neutral for $r = 0$, and risk loving for $r < 0$. This allows us to use the same model even in cases where farmers are simply expected utility maximizers, rather than prospect theory value function maximizers.

The Expected Utility Model
In our first model of decision-making, $\lambda, \gamma, \delta = 1$ and $x_i \in \mathbb{R}_+$, so that the cumulative prospect theory model becomes equivalent to the expected utility model.

The Cumulative Prospect Theory Model
A key feature of prospect theory is the idea of a reference point from which outcomes are evaluated as gains or losses. Another important and related question is how outcomes are framed. Is the outcome of an insurance contract broadly framed such that it includes the entirety of the farm's financial risk, or is the outcome framed more narrowly such that only the outcome explicitly derived from the insurance is considered? In order to gain a better understanding of how these framing decisions might change the effect of optimism bias on coverage choices, we consider two reference-dependent outcomes, one in which outcomes are framed narrowly and one in which outcomes are framed more broadly.

Broadly Framed Outcomes
When outcomes are framed more broadly, $x_i$ is the sum of two terms: the forecasted difference between the $i$th realization of revenue and expected revenue, and the difference between the forecasted $i$th indemnity realization and the subsidized premium paid in order to receive that indemnity.

Narrowly Framed Outcomes
When outcomes are framed more narrowly, with $R = \rho(c)(1 - s_c)$ denoting the subsidized premium, $x_i$ is simply the difference between the forecasted $i$th indemnity realization and the subsidized premium paid in order to receive that indemnity. Under this model, farmers essentially treat insurance as a one-off lottery. Because the narrow-frame farmer does not fully integrate the entirety of the farm's financial risk, he or she becomes more likely to realize an outcome that is perceived as a loss, causing the decision to be more affected by loss aversion than it otherwise would be. As demonstrated by Babcock (2015) in a non-optimism bias setting, this model of narrow framing outperforms the other two models when it comes to explaining the seemingly 'anomalous' coverage level choices observed in the real world.
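The two framings can be contrasted with a minimal sketch. It assumes simulated revenue and indemnity draws are already available as lists, and it uses the simple average of the revenue draws as expected revenue; the function names and the numbers in the usage lines are hypothetical and are not taken from the study.

```python
from statistics import mean

def subsidized_premium(rho_c, s_c):
    """R = rho(c) * (1 - s_c): the premium the farmer pays after the subsidy."""
    return rho_c * (1.0 - s_c)

def narrow_outcomes(indemnities, rho_c, s_c):
    """Narrow frame: each outcome is the indemnity minus the subsidized premium."""
    premium = subsidized_premium(rho_c, s_c)
    return [indemnity - premium for indemnity in indemnities]

def broad_outcomes(revenues, indemnities, rho_c, s_c):
    """Broad frame: revenue deviation from its mean plus the net insurance payoff."""
    premium = subsidized_premium(rho_c, s_c)
    expected_revenue = mean(revenues)  # assumption: sample mean of the draws
    return [
        (revenue - expected_revenue) + (indemnity - premium)
        for revenue, indemnity in zip(revenues, indemnities)
    ]

# Hypothetical per-acre draws, for illustration only.
revenues = [300.0, 500.0, 700.0]
indemnities = [120.0, 0.0, 0.0]
narrow = narrow_outcomes(indemnities, rho_c=40.0, s_c=0.5)
broad = broad_outcomes(revenues, indemnities, rho_c=40.0, s_c=0.5)
```

In this toy case the narrow frame records a loss in every state where no indemnity is paid, while the broad frame nets the premium against revenue variation.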

Simulating Optimal Cover Level Choices
Optimal coverage level choices under our various decision-making models cannot be solved analytically, at least not under any realistic distribution of yields and prices. 4 Premium rates are determined based on historical data and then loaded by a factor of 13.6% in order to account for potential future losses that are not reflected in the historical data. 5 Insurance companies are reimbursed by the federal government for the administrative costs of providing fair-priced crop insurance. Price-yield and yield distribution plots for our simulated data are shown in figure 2.4. In order to reduce any potential error caused by sampling variability, both guarantees and premiums were re-calculated based on our simulated data.
For all three of our decision-making models, $r$, the coefficient of risk aversion, is set equal to 0.12. In our expected utility model $\lambda, \gamma, \delta = 1$, while in both of our cumulative prospect theory models $\lambda = 2.25$, $\gamma = 0.61$, and $\delta = 0.69$, which are the values estimated by Tversky and Kahneman (1992) based on their experimental data 7.
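As an illustration of how these parameter values enter, the sketch below computes cumulative decision weights on equiprobable simulated outcomes. It assumes the probability weighting function of Tversky and Kahneman (1992), with curvature $\gamma$ for gains and $\delta$ for losses; the function names are hypothetical, outcomes equal to zero are treated as gains, and the study's own implementation may differ.

```python
def weight(p, curvature):
    """Tversky-Kahneman (1992) probability weighting function."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** curvature
    return num / (num + (1.0 - p) ** curvature) ** (1.0 / curvature)

def decision_weights(outcomes, gamma=0.61, delta=0.69):
    """Cumulative decision weights for equally likely outcomes.

    Returns the outcomes sorted from worst to best, together with the
    decision weight attached to each one.
    """
    xs = sorted(outcomes)
    n = len(xs)
    weights = []
    for i, x in enumerate(xs):
        if x >= 0:
            # Gains are ranked from the best outcome downward.
            at_least = (n - i) / n        # Pr(outcome at least this good)
            better = (n - i - 1) / n      # Pr(outcome strictly better)
            weights.append(weight(at_least, gamma) - weight(better, gamma))
        else:
            # Losses are ranked from the worst outcome upward.
            at_most = (i + 1) / n         # Pr(outcome at least this bad)
            worse = i / n                 # Pr(outcome strictly worse)
            weights.append(weight(at_most, delta) - weight(worse, delta))
    return xs, weights
```

When every outcome is a gain the weights sum to one; with mixed outcomes the gain weights sum to $w^+$ of the total gain probability and the loss weights to $w^-$ of the total loss probability, so they need not sum to one.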

Calculating Decision Weights on Simulated Outcomes
The

Results
Our main results are presented in tables 2.2A through 2.2D. Although we calculate significantly different optimal coverage levels across our twelve models, two crops, and two insurance types, the overall effects of optimism bias on coverage level choices are largely consistent across all twelve specifications for both crops and both insurance types. Tables 2.3A through 2.3D present the results of our additional sensitivity analysis with regard to correlation between prices and yields.

The Effect of Optimism (Pessimism) Bias on Cross-Coverage Level Demand
Our first result is that as optimism bias increases, the optimal coverage level selected decreases. This pattern holds true for all 24 of our yield insurance simulations, and 22 of the 24 revenue simulations. In the case of optimism bias model 3 under the broadly framed prospect theory model, the optimal coverage choice stays unchanged across changes in $\beta_3$ for both crops. Furthermore, as optimism bias continues to increase, producers move towards preferring no insurance.
We find that the converse effect also holds true, where increasing pessimism bias causes producers to prefer higher levels of coverage than they would in the unbiased scenario. This holds true for all cases where the unbiased coverage choice was not already at .85, with all choices eventually approaching .85 coverage.

Further Sensitivity Analyses

The Effect of Optimism Bias Across Insurance Type
While the overall effect of optimism bias on coverage level choice is consistent across both revenue and yield insurance, we find that in general the magnitude of the effect of optimism bias on cross-coverage level demand is larger for yield insurance than for revenue insurance. This is the case for 22 of our 24 comparisons, with 2 comparisons (wheat, expected utility, optimism bias models 1 and 2) having identical coverage choices across both yield and revenue insurance. This effect is perhaps not surprising given that our optimism bias model assumes producers are accurate in their perceptions of price risk.

The Effect of Optimism Bias Across Optimism Bias Models
When comparing the effects of optimism bias on coverage level choice across our four optimism bias models we find varying results. Model 1 produces the most dramatic effects, followed by model 3. It is probably not surprising that underestimating downside risk has a more dramatic effect than simply overestimating upside risk, given that it is downside risk that insurance is protecting against.

The Effect of Optimism Bias Across Decision-Making Models
Across the parameters of our four optimism bias models it is always that case does not always hold as optimism bias increases. While it is not surprising that coverage choices under the narrow CPT frame are more affected than those under the broad frame, given that the narrow frame is more affected by loss aversion, it is surprising that coverage choices under expected utility are more affected than those under the broad CPT frame, given that the broad CPT frame is affected by loss aversion whereas the expected utility choices are not.

The Effect of Optimism Bias Across Crops
Differences in the effect of optimism bias between our two crops vary across our four models. Under model 1, the effect is stronger for wheat (our 'high risk', positively skewed crop) than it is for corn (our 'low risk', negatively skewed crop).
However, under model 2 we find just the opposite, with corn being more affected by optimism bias than wheat. Under model 3, we obtain conflicting results. For revenue insurance, the effect of optimism bias is more dramatic for wheat, whereas for yield insurance, the effect is more dramatic for corn. This suggests that the effect of each model depends on the skewness of the objective distribution and may be an area for future research.

The Effect of Optimism Bias Across Price-Yield Correlations
Our sensitivity analyses with regard to the correlation between prices and yields are displayed in tables 2.3A through 2.3D.
We find that results do not change dramatically when prices and yields are negatively correlated compared to when no correlation exists. Although some coverage level choices changed depending on whether the price-yield correlation was set to 0 or -0.3, changes were minimal overall and did not consistently move in one direction or the other.

The Role of Unequal Proportionate Subsidy Levels
It is important to point out why the finding that optimism bias causes farmers to prefer lower levels of crop insurance coverage is both surprising and significant. This result does not simply stand on its own. In fact, it can be shown that when subsidy levels are equivalent across coverage levels this result no longer holds, and that instead an expected utility maximizing farmer will purchase either maximum coverage or no coverage at all. This can easily be shown mathematically for the case of a risk-neutral expected utility maximizer. Such a farmer will choose the coverage level that . This can be mathematically rearranged to state that such a farmer will maximize It is indeed the case that for all four of our models of optimism bias,
Re-simulating coverage choices under equal proportionate subsidies (s c = .5 for all c), we find that under this subsidy schedule, our mildly risk-averse EU farmer (r = .12) will always prefer c = .85 until eventually preferring c = 0. We find the same result for our mildly risk-averse, broad-framing CPT farmers. For our mildly risk-averse, narrow-framing CPT farmers, we find that there are still instances where the farmer prefers an interior solution; however, the number of interior solutions chosen is greatly reduced. These results are presented in table 2.4. For brevity, we only show results under optimism model 1; however, they are consistent across all four models of risk misperception.
[ Table 2.4 ]
It is thus evident that the mechanism driving these results is the fact that proportionate subsidies increase as coverage levels decrease. Thus, the impact of optimism bias on coverage level choices could be greatly reduced by a more uniform schedule of subsidies across coverage levels.

Conclusion
In this paper we find that although expected utility suggests that producers should purchase crop insurance at a coverage level at least as high as the level that maximizes per-acre subsidies (c = .85 for corn and c = .80 for wheat), this is no longer the case when producers have inaccurate perceptions regarding their yield risk. We find that optimistically biased misperceptions of yield risk reduce the optimal coverage level for both revenue and yield insurance, and that this effect is strongest for yield insurance. We further find that the mechanism driving this result is the higher proportionate subsidies provided for lower levels of coverage. Under equivalent proportionate subsidies, the result no longer holds for our expected utility farmers or broad-framing cumulative prospect theory farmers, and it is greatly reduced for our narrow-framing cumulative prospect theory farmers.
These findings suggest that if a goal of policy-makers is to induce farmers to purchase lower-deductible (higher coverage) policies through subsidies in order to reduce the risk of having to provide ad hoc disaster relief (Babcock, 2015), then incorporating optimism bias may allow for the design of more cost-effective subsidy schedules. Given that the effects we found were more pronounced for yield insurance than for revenue insurance, they also suggest that where yield misperceptions are heterogeneous across farmers (e.g. some are optimistic, while others are pessimistic, as was evidenced in Manuscript 1), equivalently subsidized revenue insurance likely does a better job than yield insurance of minimizing deadweight loss by reducing the variance of producer surplus across farmers. Rigorously testing these hypotheses, as well as examining the interaction between optimism bias and yield skewness, provides fruitful ground for future research.

Note: On the left are histograms of the yield outcomes based on the parameters of the yield distribution for corn (above) and wheat (below). On the right are scatter plots of price and yield outcomes based on the joint price-yield distribution for both corn (above) and wheat (below). In the scatter plots, yields are presented on the x-axis, while prices are presented on the y-axis.

Introduction
Local economic planning often relies on micro-level data that are not always available at the desired level of disaggregation. For example, economic and employment data provided by the federal government for key industry sectors are often reported at the county level, and obtaining city or ZIP-code level data may require time-consuming special requests or considerable expense, or the data may simply be unavailable. In this article, we address the need for micro-level count data by developing a Bayesian methodology to 'downscale' aggregated count data to lower levels of aggregation using the information contained in an outside statistical sample.
Suppose a researcher knows the true size of a population (e.g., farmers, voters, customers) and would like to classify members of that population into distinct subgroups (e.g., by farm type, county/region, political party, or demographic attributes) using independent data sampled from the full population. In this setting, we demonstrate a method for estimating the population proportion in each sub-group, in a manner that provides more stable and robust estimates than maximum likelihood estimation (MLE) in the face of sampling variability. The method consists primarily of using simulated random sampling combined with exact calculation of combinatorial probabilities to estimate the posterior distribution over combinations of counts. We leverage two key restrictions: i) the sub-group counts must add up to the population total, and ii) the sub-group counts cannot be smaller than their observed counts in the outside sample, nor larger than the population minus the sum of observed samples in the other sub-groups. This explicit handling of sampling variability, especially in small to medium-sized samples, results in smaller normalized errors and, consequently, more reliably accurate estimates.
We are, of course, not the first to address the demand for more disaggregated data from aggregated sources. Gocht and Roder (2011), for example, employ a Bayesian procedure to downscale county-level German Agricultural Census estimates of land devoted to agricultural use. Their method incorporates land use data from GIS to facilitate micro-level environmental impact studies that would otherwise be hindered by data protection rules (i.e., censoring). Other relevant studies include Chakir (2009); Dendoncker, Bogaert, and Rounsevell (2006); Gärtner, Keller, and Schulin (2013); Howitt and Reynaud (2003); Purcell and Kish (1980); and Polasek, Llano, and Sellner (2010). These papers share a common thread of attempting to estimate land-use patterns using a variety and/or combination of methods including regression, multinomial logit, maximum entropy, cross-entropy, and various iterative fitting procedures. However, while these procedures perform well in their intended domain, they are ill-suited to solving the downscaling problem for count data.
Intuitively, multinomial logit might be mapped to a count model in which sampling probabilities are estimated, but many observations and covariates are required. The methods we introduce here are designed to overcome this problem when the outside sample contains only limited categorical information.
Another popular application of downscaling involves disaggregation of global climate data (typically reported at grid levels of 100-200 km) to a level of resolution more useful for decision-makers and impact assessors. Such procedures are outlined, for example, in Coelho et al. (2006); Hashmi, Shamseldin, and Melville (2009); Fasbender and Ouarda (2010); Murphy (1999); and von Storch, Zorita, and Cubasch (1993). The goals of such estimation procedures, however, are to disaggregate weather/climate data not only spatially, but temporally as well, in order to model various potential weather outcomes for use in forecasting. The procedures outlined by these studies are both unnecessarily complex given our particular problem of interest, and potentially ill-suited to the count data problem due to highly detailed data requirements in the outside sample.
In an attempt to balance precision with tractability, we develop a method that is adaptable to the data and computational resources of the applied researcher.
Namely, we show that reasonable performance can be obtained using a uniform prior distribution over combinations of counts, but we also demonstrate a method for researchers to incorporate "informative" prior information generated by a simple linear regression or one of the more spatially-explicit and computationally demanding methods described above. In our simulation analysis, we demonstrate a means for testing the best performance among MLE, the uniform prior, or a more informative spatial prior, over a range of population counts and sample sizes. As might be expected, a more informative prior performs best for the smallest sample sizes and smallest population counts. However, we find that the uniform prior performs best over an unexpectedly wide range of sample size and population count combinations.
To provide context, we introduce and apply our methods in the setting of estimating spatially disaggregated farm counts by sub-region from regional data, using a sample of Rhode Island farms combined with aggregated data from the 2012 United States Department of Agriculture (USDA) Census of Agriculture (herein, "Ag Census"). We explore both county-to-city downscaling and state-to-county downscaling, and show how spatial patterns at higher levels of aggregation might be used to construct an informative prior. We take special advantage of state-to-county downscaling as an example where the true underlying distribution is known and can be used to validate our methods. We also use published estimates of uncertainty in the Ag Census total counts to demonstrate the robustness of our methods to uncertainty in the top-level population count.
Despite the focus of much of the literature, and our own application, on spatial downscaling problems, it is important to note that there is nothing inherently spatial about the mathematics involved. Our method is equally well adapted to arbitrary classification problems in which it is desired to estimate the size of population subgroups according to a number of discrete categories. These applications might include political polling, estimation of workforce participation rates, demographic breakdowns by gender, age, race or educational attainment, or market segmentation analysis, among many others. At the same time, though our method does not require spatial information per se, it is flexible enough to incorporate arbitrarily complex spatial information as an input to the estimation procedure, by way of the informative prior.
The remainder of this article is organized as follows. The next section outlines and derives our estimation methodology, and the following section discusses selection of a prior. The fourth and fifth sections outline our sample data and methods, and the sixth section covers the results. The seventh section discusses applications of our findings and areas for future research, and the final section concludes.

Bayesian Downscaling of Aggregated Count Data
Consider a source of aggregated count data for which estimated count data are required at the sub-population (e.g. sub-region, demographic classifications, etc.) level, for each sub-population, $s = 1, \ldots, S$. Let $N$ denote counts at the aggregate level, i.e. population size. We denote the counts to be estimated at the sub-population level as $N_s$, such that the sum of sub-population counts is equal to the total population count, $\sum_s N_s = N$. We supplement this population-level data with an outside, independently sampled data set with sub-population counts, $n_s$, where $\sum_s n_s = n < N$.
That is, the outside sample of sub-region data is a subset of the population to be estimated. The immediate impact of the outside data set is to constrain the range of eligible values, which we will denote as $N_s'$, within each combination. Namely,

$$n_s \le N_s' \le N - \sum_{t \ne s} n_t. \qquad (3.1)$$

Thus, we consider the set of all valid combinations of integer-valued counts that satisfy Equation 3.1 and sum to the total count. In our Bristol County example, each count must be at least as large as the corresponding sample count, with the total count equal to 42. Thus, (2, 21, 19) is not a valid combination because there are not enough farms in the first town, and (25, 12, 7) is not valid because there are too many total farms (44), but both (6, 34, 2) and (15, 15, 12) are valid combinations.
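The validity conditions can be checked mechanically. The sketch below assumes hypothetical sample counts of (5, 4, 2) for the three towns; the text gives only their total, 11, so these values are illustrative and were chosen merely to agree with the four example combinations. The count of valid combinations uses the standard stars-and-bars formula.

```python
from math import comb

def is_valid(combination, sample_counts, total):
    """A combination is valid if it sums to the population total and no
    sub-population count falls below its observed sample count."""
    return (
        len(combination) == len(sample_counts)
        and sum(combination) == total
        and all(N_s >= n_s for N_s, n_s in zip(combination, sample_counts))
    )

def count_valid(sample_counts, total):
    """Number of valid combinations: ways to split the N - n unobserved
    units among S sub-populations (stars and bars)."""
    S = len(sample_counts)
    free = total - sum(sample_counts)
    return comb(free + S - 1, S - 1)

sample = (5, 4, 2)   # hypothetical sample counts summing to n = 11
total = 42           # population total for the county

checks = {c: is_valid(c, sample, total)
          for c in [(2, 21, 19), (25, 12, 7), (6, 34, 2), (15, 15, 12)]}
n_combinations = count_valid(sample, total)
```

The upper bound in the constraint never needs a separate check: once every count meets its lower bound and the counts sum to the total, no single count can exceed the total minus the other sub-populations' sample counts.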
We have now developed sufficient notation to outline our estimation procedure. First, recall Bayes' Rule:

$$\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}. \qquad (3.3)$$

For our purposes, $\Pr(A \mid B)$ in Equation 3.3 represents the probability of a specific combination of sub-population counts given the data, or $\Pr(C_i \mid \mathbf{n})$, and the other terms translate similarly. That is:

$$\Pr(C_i \mid \mathbf{n}) = \frac{\Pr(\mathbf{n} \mid C_i)\,\Pr(C_i)}{\sum_j \Pr(\mathbf{n} \mid C_j)\,\Pr(C_j)} \propto \Pr(\mathbf{n} \mid C_i), \qquad (3.4)$$

where (i) $\mathbf{n} = (n_1, \ldots, n_S)$ denotes the vector of sub-population counts in the outside data, (ii) the equality follows from the constraint in Equation 1.1 since only combinations that sum to $N$ are considered, and (iii) the final proportionality comparison relies on the assumption that the unconditional probability of a combination is uniform across combinations, representing the prior in our Bayesian approach. This is the simplest case of a uniform prior over combinations, which we will later generalize. Equation 3.4 therefore tells us that the posterior probability of a given combination is proportional to the probability of our outside data sample given that combination. The analysis is further simplified because the conditional probability of our data given a combination, $\Pr(\mathbf{n} \mid C_i)$, has a closed form according to the formula for sampling without replacement. Namely,

$$\Pr(\mathbf{n} \mid C_i) = \frac{\prod_{s=1}^{S} \binom{N_s'}{n_s}}{\binom{N}{n}}. \qquad (3.5)$$

Given Equation 3.5, it is theoretically possible to iterate over all eligible combinations of counts at the sub-population level and exactly calculate the posterior distribution over those counts given the outside sample data in $\mathbf{n}$. Unfortunately, the number of combinations given in Equation 3.2 grows astronomically large rather quickly in real-world applications. Table 3.1 provides examples for our application.
[ Table 3.1 ]
Because it is not computationally feasible using contemporary hardware to calculate Eq. 3.5 for each possible combination, we propose a (pseudo) random sampling procedure in which valid combinations are sampled uniformly from the set of valid combinations.
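For a problem as small as a three-town county, the exact posterior described above can be computed by brute force. The sketch below assumes the likelihood is the multivariate hypergeometric probability of sampling without replacement and the prior is uniform over valid combinations; the function names are illustrative and this is not the author's code.

```python
from itertools import product
from math import comb

def likelihood(combination, sample_counts):
    """Pr(n | C): multivariate hypergeometric probability of drawing the
    observed sample, without replacement, from the given combination."""
    N = sum(combination)
    n = sum(sample_counts)
    numerator = 1
    for N_s, n_s in zip(combination, sample_counts):
        numerator *= comb(N_s, n_s)
    return numerator / comb(N, n)

def valid_combinations(sample_counts, total):
    """Yield every combination that meets the sample counts and sums to total."""
    S = len(sample_counts)
    free = total - sum(sample_counts)
    for extras in product(range(free + 1), repeat=S - 1):
        last = free - sum(extras)
        if last >= 0:
            yield tuple(n_s + e for n_s, e in zip(sample_counts, extras + (last,)))

def posterior_mean(sample_counts, total):
    """Posterior mean count per sub-population under a uniform prior."""
    combos = list(valid_combinations(sample_counts, total))
    weights = [likelihood(c, sample_counts) for c in combos]
    norm = sum(weights)
    S = len(sample_counts)
    return [sum(w * c[s] for w, c in zip(weights, combos)) / norm for s in range(S)]
```

Calling `posterior_mean((5, 4, 2), 42)` with hypothetical sample counts enumerates only a few hundred combinations; the same loop becomes infeasible as the number of sub-populations and the population total grow, which motivates sampling combinations instead.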
These samples are generated by recognizing that each sub-population's count falls in a range containing $N - n + 1$ consecutive integers, whose lower bound is found in our sample for that sub-population. Revisiting our Bristol County example from earlier, wherein $N - n + 1 = 42 - 11 + 1 = 32$, it is only possible for sub-regional values to fall in the set $\{n_s (+ 0), n_s + 1, \ldots, n_s + 31\}$. Since each sub-region must have the same size range, the problem reduces to picking uniform integers in this range. If we offset the uniform integers by their minimum values, then all the random choices must add up to the same total (also $N - n + 1 = 32$) to be a valid combination as described above.
This is a well-known problem whose solution is to randomly choose switching points without replacement from the set of integers $\{1, 2, \ldots, N - n\}$.
The sampled combination is then derived by differencing the switching points after setting $s_1 = 0$ and fixing the final switching point at the upper end of the range. In order to handle the minimum switching interval being size 1, the resulting differences are added to the sample value, minus 1, and the sampled switching points are taken from $N - n + S - 1$ candidate values. Downscaling Bristol County into three sub-regions gives $N - n + S - 1 = 42 - 11 + 3 - 1 = 33$.
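One way to read the switching-point procedure is as the classic stars-and-bars sampler, sketched below under that assumption. The function name and the `rng` argument are illustrative; the sample counts in any call would come from the outside sample.

```python
import random

def sample_combination(sample_counts, total, rng=random):
    """Draw one valid combination uniformly at random.

    S - 1 distinct switching points are drawn from {1, ..., N - n + S - 1};
    each gap between consecutive points, minus 1, is the number of extra
    units assigned to one sub-population on top of its sample count.
    """
    S = len(sample_counts)
    free = total - sum(sample_counts)            # N - n units left to allocate
    points = sorted(rng.sample(range(1, free + S), S - 1))
    bounds = [0] + points + [free + S]           # fixed end points of the range
    extras = [bounds[j + 1] - bounds[j] - 1 for j in range(S)]
    return tuple(n_s + e for n_s, e in zip(sample_counts, extras))
```

Because every set of switching points maps to exactly one combination and every set is equally likely, each valid combination is drawn with equal probability. Weighting each draw by the probability of the observed sample given that combination then yields a Monte Carlo approximation to the posterior.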

Choosing a Prior
Researchers have two broad choices for estimating the Bayesian prior used in our estimation method: a uniform prior, or a more informative one. While in its most generalized form, our method has no requirement that sub-populations have additional characteristics from which to estimate a prior, researchers may be able to elicit a more informative prior based on additional characteristics of the sampling units. For example, in the case of classifying farms into sub-regions, this additional data may include population, land area, demographic data, etc. at the sub-region level.
We now outline a rigorous procedure for eliciting an informative prior. In cases where such additional characteristic information is unavailable for whatever reason, researchers have little choice but to assume a uniform prior across the subpopulations.
In many cases, the assumption that counted units have an equal probability of occurrence across sub-populations is potentially unrealistic, particularly in our example application of estimating farm counts. If the data available to the researchers consist of aggregate count data for multiple populations, as well as additional covariates at the sub-regional level (which can thus be summed to the regional level, e.g. population, land area, demographics, spatial information, etc.), one can test and identify potential informative priors by regressing the population-level count data on these covariates (summed to the population level). By identifying the covariates that are most predictive of (correlated with) counts at the population level, one can use these relationships to estimate sub-regional farm counts. In this fashion, a more informative prior is elicited than the simple uniform prior. See figure 3.1 for an illustration of how the data analysis is structured.
[ Figure 3.1 ]
In order to compare the accuracy of estimates resulting from a strong informative prior versus the uniform prior, we used the above procedure to elicit and compare various informative priors using supplemental sub-regional data. In what follows, we will explicitly compare predictions of the uniform prior against those of this simple informative prior, and compare both against maximum likelihood.
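A one-covariate version of this elicitation might look like the sketch below. It rests on several assumptions that the text does not confirm: a regression through the origin of regional counts on land area, prior shares obtained by normalizing the predicted sub-regional counts, and a multinomial form for the prior over combinations. All names and any data passed in are hypothetical.

```python
from math import lgamma, log

def fit_through_origin(x, y):
    """Least-squares slope for y = k * x (one parameter, no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def prior_shares(sub_areas, slope):
    """Predicted sub-regional counts, normalized to prior proportions."""
    predicted = [slope * area for area in sub_areas]
    total = sum(predicted)
    return [p / total for p in predicted]

def log_prior(combination, shares):
    """Multinomial log-probability of a combination given prior shares
    (an assumed functional form for the informative prior)."""
    N = sum(combination)
    value = lgamma(N + 1)
    for N_s, share in zip(combination, shares):
        value += N_s * log(share) - lgamma(N_s + 1)
    return value
```

With a single slope and no intercept, the slope cancels in the normalization, so the shares reduce to land-area shares. A richer regression with several covariates or an intercept would not simplify in this way. The log prior would be added to the log likelihood of each combination before normalizing the posterior.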
Although geographic downscaling is traditionally a spatial problem, the general form of our method ignores issues of spatial dependency in favor of a more parsimonious method that requires much less data (and less technical expertise in the area of spatial modeling). However, the generalized method can be easily expanded and the use of an informative prior in our procedure makes the incorporation of features such as spatial dependence relatively straightforward. While we opt for a simple, one-parameter, area-based prior as an example here, myriad potential models for eliciting an informative prior exist, including those discussed in the introduction.
Researchers with the prior belief that the posterior distribution follows a spatial dependency structure (such as spatial lag or autocorrelation) can easily incorporate such beliefs into this methodology by eliciting their priors using a spatial model such as geographically weighted regression (GWR), among many choices.

Sample Data
Rhode Island has a total of 39 municipalities grouped into five counties: Bristol, Kent, Newport, Providence, and Washington. Of the supplemental sub-regional data collected, only land area was used in our final estimation procedure (indirectly, to elicit the informative prior). This information is presented in table 3.2.
[ Table 3.2 ]
It is worth pointing out that since our posterior estimates depend on the probability of obtaining $\mathbf{n}$ from a random $n$-sized sample of $C_i$, the accuracy of our results depends upon the assumption that $\mathbf{n}$ was obtained via independent (random) sampling across $s$. In the case where sample data are collected via voluntary sampling, researchers must be confident in their belief that response rates are not affected by $s$.
Similar restrictions apply in cases where sample data are collected through convenience sampling. For example, in the case where $s$ represents a spatial classification, selection bias is likely to occur when samples are collected in person at non-random or limited locations. In cases where observations are businesses, such as our farm application, unbiased sample data may be collected by searching business databases, such as those provided by the secretary of state. As is the case with most estimation procedures, any selection bias resulting from characteristics of $s$ that cannot be controlled for will naturally induce estimation bias.

Methods
Clearly the unknowns in our data set are the city-level counts. We focus instead on the county totals, as if unknown, so that we can compare the results of our procedure against the true underlying distribution. By aggregating our regional counts to the state level and aggregating our sample data to the county level, we can compare the accuracy of our estimates using i) a uniform prior, ii) a simple spatial prior, and iii) MLE. To obtain estimates for the Bayesian methods, we use the sampling procedure described in Section 3.2 to calculate estimates from 100,000 sample combinations.
We report as point estimates the means of the posterior count distributions. For each comparison, we conduct two simulations, one with observations bootstrapped from our observed Rhode Island sample, and another randomly sampled from a multinomial distribution taken only from the population parameters. The results are nearly identical across the paired simulations, indicating that the sample data we collected do not contain extreme deviations from the projected sampling distribution.

Results
The mean NRMSE and standard error for each method are presented in table 3.3, as a function of varying sample size. For a population of 1,250, the Bayesian methods consistently outperform MLE for all sample sizes up to half of the population. Among the Bayesian methods, the simple, area-based, informative prior was best for small samples (due to greater sampling variability), but the uniform prior was best for samples comprising at least 5% of the population. These results suggest a somewhat counterintuitive finding: namely, that even in cases where detailed spatial information is available, many applied problems will get more accurate results using the uniform prior, even when it might not seem realistic to the application at hand. The reasoning is that even relatively small samples will quickly become more representative of the underlying population than a good informative prior, but not so representative as to obviate the need for a Bayesian approach over MLE.
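As a minimal sketch of how an accuracy comparison of this kind could be computed, the code below evaluates the NRMSE of a set of estimated counts against known group totals. Two assumptions are made for illustration: the RMSE is normalized by the mean of the true counts (the normalization is not restated in this section), and the MLE is taken to be the proportional allocation N * n_i / n, which is the multinomial maximum likelihood estimate. The sample counts are hypothetical.

```python
from math import sqrt

def mle_counts(total, sample_counts):
    """Proportional allocation of the known total across groups
    (multinomial MLE; assumed to match the MLE benchmark)."""
    n = sum(sample_counts)
    return [total * c / n for c in sample_counts]

def nrmse(estimates, truth):
    """Root mean squared error divided by the mean true count
    (normalization choice is an assumption)."""
    mse = sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)
    return sqrt(mse) / (sum(truth) / len(truth))

# Group totals reported in table 3.2.
true_counts = [42, 126, 214, 425, 436]
# Hypothetical sample of 62 farms (about 5% of the population).
hypothetical_sample = [1, 7, 10, 22, 22]

estimates = mle_counts(sum(true_counts), hypothetical_sample)
score = nrmse(estimates, true_counts)
```

Repeating this calculation over many simulated samples, and for each estimator, would yield a mean NRMSE and standard error of the kind reported in table 3.3.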
[ respectively. Clearly, to evaluate every possible pairwise test in Table 3.3 involves many hypotheses, so p-values would need to be adjusted using either a Bonferroni correction or stepdown methods to control the family-wise error rate (e.g., as in Romano and Wolf, 2005). Explicit testing of multiple hypotheses in this fashion is beyond the scope of this paper.
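For readers who do wish to adjust p-values, the following is a minimal sketch of the Bonferroni correction and of the Holm procedure, which is one simple stepdown method. Note that the Holm procedure is not the resampling-based stepdown method of Romano and Wolf (2005); it is shown here only because it can be stated in a few lines. The p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def holm(pvalues):
    """Holm stepdown adjustment: the k-th smallest p-value is multiplied
    by (m - k + 1), and adjusted values are forced to be non-decreasing."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical unadjusted p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
bonf = bonferroni(raw)
step = holm(raw)
```

Both adjustments control the family-wise error rate; the Holm adjusted values are never larger than the Bonferroni ones, so the stepdown approach is weakly more powerful.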
[ Table 3.4 ] Table 3.4 is structured similarly, but shows the effect of varying population size with the sample size held fixed at n/N = 20% of the population. The table also shows the Bayesian methods consistently outperforming MLE, but reveals subtly different patterns in the performance of the informative prior against the uniform prior.
With the sample size held at a fixed percentage, the informative prior outperforms for populations smaller than 500, while the uniform outperforms for larger populations.
For populations of exactly 500, the performance of the two priors is not statistically different at conventional levels.

Discussion
The above results are primarily focused on evaluating the performance of our Bayesian methods for a case where the underlying distribution is known. However, our method is only designed to be useful in cases where this information is unavailable. Furthermore, applications of our procedure to new problems will likely involve variation in population size, sample size(s) and availability of an informative prior, distinct from the permutations described here. In this section, we consider the possibility that future researchers have access to data at higher levels of aggregation, similar to how we have both state-level and county-level farm counts for Rhode Island from the Ag Census, and county-level land area data from the US Census.
If it can be assumed that spatial (or other group-wise dependence) patterns are likely to hold at higher levels of aggregation, then an informative prior can be calibrated from that data and applied in the downscaling problem. In our application, that would mean calibrating the land-area prior from county-level data and then applying it to the city-level downscaling problem. Depending on the application, however, this assumption may not be palatable. Spatial econometric models can be conceptualized as having a direct effect from the covariates and an indirect effect from the spatial dependence structure. If this indirect effect is relatively smaller at higher levels of aggregation, then calibrating the prior at higher levels will cause it to appear more informative than it actually will be in the downscaled analysis.
Identifying when this problem materially affects the analysis is an area for future research. That said, there is no reason why spatial dependence observed in an econometric model would predict non-random sampling, so whenever the spatial prior is suspect, researchers can always default to the uniform prior for reasonable performance.
It is important to note that in an attempt to make our estimation procedure generalizable to non-spatial applications, as well as to reduce its dependence on additional outside data sources, we have intentionally ignored the explicit incorporation of spatial autocorrelation. Although some degree of autocorrelation is likely captured implicitly via our sample, estimates might be further improved through incorporating such autocorrelation into our conditional probability estimates when possible. That said, it is beyond the scope of this chapter to quantify these tradeoffs, or to demonstrate the information loss associated with discarding spatial elements in the analysis.
Beyond spatial dependence defined econometrically, there is also the possibility that the outside data sample is non-random, in the sense that spatial factors influence response rates. At higher levels of aggregation this can be tested simply by evaluating the degree to which the sample contains outliers relative to a typical sample from a multinomial distribution. A further verification step is possible using the simulation methods outlined above at higher levels of aggregation. Namely, the bootstrapped analysis can be replicated with counts drawn directly from a multinomial distribution instead of from the sample data. Below, we give an example of simulation results obtained in this fashion in Tables 3.5 and 3.6, which replicate our Tables 3.3 and 3.4 but do not use our sample data. For our specific application, it can be observed that the results are nearly identical, which is the desired outcome and indicates that systematic sampling bias is unlikely to be a problem in our application.
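One way to carry out the outlier check described above is a Monte Carlo comparison of the observed sample against simulated multinomial draws. The sketch below is an illustration under stated assumptions, not the procedure used to produce Tables 3.5 and 3.6: the Pearson chi-square statistic is our choice of discrepancy measure, the function names are hypothetical, and the observed sample counts are invented for the example. The population proportions are taken from the group totals in table 3.2.

```python
import random

def pearson_stat(counts, probs):
    """Pearson chi-square discrepancy between counts and expectations."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def multinomial_draw(n, probs, rng):
    """Draw n observations from a multinomial with cell probabilities probs."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return counts

def monte_carlo_pvalue(observed, probs, reps=2000, seed=0):
    """Share of simulated samples at least as extreme as the observed one."""
    rng = random.Random(seed)
    n = sum(observed)
    observed_stat = pearson_stat(observed, probs)
    exceed = sum(
        pearson_stat(multinomial_draw(n, probs, rng), probs) >= observed_stat
        for _ in range(reps)
    )
    return (exceed + 1) / (reps + 1)

group_totals = [42, 126, 214, 425, 436]          # from table 3.2
probs = [g / sum(group_totals) for g in group_totals]
hypothetical_sample = [3, 10, 17, 34, 36]        # illustrative only
p_value = monte_carlo_pvalue(hypothetical_sample, probs)
```

A small p-value would indicate that the sample is atypical of random multinomial sampling, which would be a warning sign that response rates depend on the grouping variable.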
[ Table 3.5 ] [ Table 3.6 ] If we consider the city-level downscaling problem in our application, the above procedures indicate that every county in Rhode Island should be estimated using the informative, area-based prior. Clearly, we do not have the underlying, true distribution of city-level farm counts for verification, so we include this observation only for completeness.
Two issues not previously addressed are i) the effects of uncertainty in the top-level population estimates, and ii) the scenarios in which MLE does outperform the Bayesian estimators, according to conventional wisdom based on asymptotic results. While our procedure is designed to mitigate potential estimation error resulting from the increased sampling variability inherent in relatively small samples, it does not account for potential error in the aggregate count data. In our application, for example, the Ag Census farm counts for Rhode Island are reported as 1,243 total farms with a standard error of 236 (USDA 2014). The analysis thus far suggests that incorporating an error term on the total count may have non-linear effects because of simultaneous changes both in the population size, N, and in the sample proportion, n/N.
To address this concern, we repeated the simulation analysis using the uniform prior, with each replication using a different total farm count drawn from a normal distribution with mean and standard deviation according to the reported Ag Census mean and standard error. The mean estimated farm counts arising from this procedure were within 1% of the values estimated with N = 1,243. This suggests that errors in top-level counts are less of a concern, as long as i) it is recognized that the division of the population into groups will necessarily result in estimates that are proportional to the total used, and ii) the estimation error in the total count is not so large as to make the collected sample size unlikely or impossible.
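The logic of this robustness check can be sketched as follows. For brevity, the sketch replaces the Bayesian posterior with a fixed vector of estimated group shares, so it only illustrates how uncertainty in the total propagates proportionally to the group counts; it does not reproduce our estimator. The shares and the sample size are hypothetical, while the mean and standard error of the total are the Ag Census values cited above. Discarding draws smaller than the collected sample is an assumption made to respect condition ii).

```python
import random

def mean_counts_under_total_uncertainty(shares, mean_total, se_total,
                                        sample_size, reps=10000, seed=0):
    """Average group counts when the total is drawn from a normal
    distribution; draws below the sample size are discarded (assumption)."""
    rng = random.Random(seed)
    sums = [0.0] * len(shares)
    kept = 0
    while kept < reps:
        total = rng.gauss(mean_total, se_total)
        if total < sample_size:
            continue  # a population smaller than the sample is impossible
        kept += 1
        for i, share in enumerate(shares):
            sums[i] += share * total
    return [s / reps for s in sums]

# Hypothetical estimated shares for five groups (illustrative only).
shares = [0.03, 0.10, 0.17, 0.34, 0.36]
averaged = mean_counts_under_total_uncertainty(
    shares, mean_total=1243, se_total=236, sample_size=250
)
baseline = [share * 1243 for share in shares]
```

Because the group counts scale with the total, averaging over draws of the total leaves the mean estimates close to those obtained with the point estimate, provided that very few draws fall below the sample size.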
Additionally, it is important to give proper context to our finding that these Bayesian methods outperform MLE. Clearly, this is a finite sample result since, asymptotically, Bayesian updating with a uniform prior converges to MLE, whereas in small samples MLE is equivalent to Bayesian updating with zero sample weight on the prior. Also, it may not be immediately obvious, but our application data set includes considerable variation in the group sizes to be estimated: 42, 126, 214, 425, and 436 (from table 3.2). Having extremes in the group sizes, especially on the small end, leads to inherently noisier sampling of the smaller groups. This problem can be conceptualized as arising from the probability that a given sample will be representative of the population conditional on population size and sample size.
To show how variation across group sizes affects the performance of MLE relative to the Bayesian methods described here, we ran some preliminary simulations. The simulated group sizes were all drawn IID from a normal distribution with sigma given by a fraction of the mean value, and samples were then drawn from the resulting multinomial distribution. Our sample data had a standard deviation of 71% of the mean count, and MLE did not outperform the Bayesian methods for any population of N = 5,000 or below. Reducing the standard deviation to 50% of the mean count, we found that MLE was statistically significantly best (lowest NRMSE) for populations above 2,000 (as might appear in table 3.6). These preliminary results suggest that the efficacy of MLE relative to the Bayesian methods is not only a function of population size and sample size, but also of the degree of heterogeneity in the sub-population counts to be estimated. Naturally, however, the true distribution of sub-population counts is inherently unknown a priori. We leave exact quantification of these tradeoffs as an area for future research.
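The heterogeneity measure used above can be checked directly from the group totals in table 3.2, and the first step of the simulation design can be sketched in a few lines. The sample standard deviation (with n - 1 in the denominator) reproduces the 71% figure. How non-positive draws are handled is not stated in the text, so redrawing them is an assumption here, as are the function name and the rounding to whole counts.

```python
import random
from statistics import mean, stdev

# Group totals from table 3.2.
group_totals = [42, 126, 214, 425, 436]

# Coefficient of variation using the sample standard deviation.
cv = stdev(group_totals) / mean(group_totals)

def draw_group_sizes(k, mean_size, cv, rng):
    """Draw k group sizes IID from a normal distribution with
    sigma = cv * mean_size, redrawing non-positive values (assumption)."""
    sizes = []
    while len(sizes) < k:
        size = round(rng.gauss(mean_size, cv * mean_size))
        if size > 0:
            sizes.append(size)
    return sizes

rng = random.Random(0)
# Simulated populations at the two heterogeneity levels discussed above.
sizes_high = draw_group_sizes(5, mean(group_totals), 0.71, rng)
sizes_low = draw_group_sizes(5, mean(group_totals), 0.50, rng)
```

Samples would then be drawn from the multinomial distribution implied by each set of simulated group sizes, and each estimator scored against the simulated truth.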
Lastly, one might wonder if it would be possible to apply these methods to the estimation procedure performed in Manuscript 1. There are two reasons why such an application would be inappropriate. Firstly, the classification estimates performed in Manuscript 1 are not explicit or 'known' classifications, but rather they are probabilistic classifications with respect to a distribution, where the probability of belonging to each class (or distribution) is calculated for each observation. Secondly, we do not possess the required independent sample. Instead, we are trying to estimate the distribution of classes across a sample of farmers rather than the entire population, making claims about which population this distribution belongs to.
However, if one were trying to estimate the proportion of optimistically biased, pessimistically biased, and unbiased individuals across a population of known size using sample data, our distributional estimates from Manuscript 1 would likely serve as a good informative prior.

Conclusion
Micro-level statistical data are often unavailable at the desired level of disaggregation, despite their critical importance for applied policy research. Herein, we present a Bayesian methodology for 'downscaling' aggregated count data to the micro-level, using an outside statistical sample. Our procedure combines numerical simulation with exact calculation of combinatorial probabilities. We motivate our approach with an application estimating the number of farms in a region, using count totals at higher levels of aggregation, and data sourced from the 2012 USDA Ag Census. In a simulation analysis over varying population sizes, we demonstrate both robustness to sampling variability and outperformance relative to maximum likelihood. Our results show that Bayesian methods have better finite sample performance than MLE in many cases relevant to applied research, especially for relatively small populations (N < 5,000).
We develop a number of methods for applied researchers to calibrate informative prior probabilities, and to estimate whether the combination of sample size and population size in their application will perform best with their more informative prior, or a simple uniform prior. In many cases, the uniform prior performs reasonably well and can be used as a default in cases where a more informative prior is unavailable, or cannot be reasonably calibrated due to spatial considerations. We also show how the process of calibrating the priors can be simulated to verify that they are not being affected adversely by outside sample data that contains too many outliers. Our methods appear to be robust both to sampling variability in the outside data sample, and also to uncertainty in the top-level population counts. An area for future research is determining the effects of heterogeneity in sub-population sizes on the relative performance of maximum likelihood estimates in smaller populations.