MARKET MECHANISMS FOR VALUING PUBLIC GOODS

Conventionally, inefficiencies in supplying socially optimal levels of natural amenities have been addressed by government intervention via taxes and subsidies. However, these measures may not result in the socially optimal level of provision because they are often influenced by collective action and may inadequately account for local tastes and preferences. This dissertation research addresses the ways in which private markets instead can be used to solve the dilemma of under-provision and over-exploitation of natural resources. The first manuscript examines a prominent ecolabeling program that provides consumers with information about responsible sourcing of seafood and investigates whether or not there is a demonstrated price premium in the retail market for ecolabeled seafood. The study specifically focuses on the commonly voiced doubts regarding whether consumers are willing to pay a premium for the Marine Stewardship Council’s (MSC) label. The analysis utilizes scanner data on frozen pollock products from supermarkets in the metropolitan area of London, UK across a sixty-five week panel. I use a hedonic model to control for brand, package size, product type and form. I find evidence that consumers within the sampled area are paying a 14% premium for MSC-certified pollock products. The second and third manuscripts address the possibility of creating a direct market for natural amenities in which consumers can contract with suppliers or custodians of the resource. However, complications arise when eliciting preferences for natural amenities because they are often public goods and thus are non-rival and non-excludable (e.g., scenic views, clean air and drinking water). Willingness to pay for public goods is difficult to measure because individuals have incentives to hide their true values for the good offered for exchange.
This research deals with the two most prominent sources of this bias: incentives to free-ride on others’ contributions and the tendency of respondents to overstate their values when no monetary consequence exists. The second manuscript explores how well laboratory-tested public goods elicitation mechanisms that mitigate free-riding perform in the field. I employ a mixed logit model in willingness-to-pay space to estimate individual-specific willingness to pay for protecting grassland nesting bird habitat on farmland and compare some of the most promising elicitation mechanisms in their ability to produce results that yield the highest valuations. I also estimate individual-specific measures of the scale of the error variance and test which elicitation mechanisms are the least noisy. I find evidence that providing a familiar reference mechanism induces behavior more in line with laboratory results, but that otherwise, individuals tend to ignore the elicitation mechanisms in the field. The third manuscript addresses a continuing debate [1, 2] in the literature regarding whether hypothetical choice experiment surveys can accurately reflect revealed preferences in a market for public goods. This issue is particularly important in studies involving valuation of public goods and common-pool resources because non-rival non-exclusive goods imply a disconnect between what is paid for and what can be consumed. I utilize a latent class model of attribute non-attendance to identify individuals who are more or less likely to accurately state their willingness-to-pay for habitat preservation for grassland nesting birds on farmland. I find that hypothetical bias differs based on the strategy employed in the stated preference experiment and that individuals who are estimated to fully attend to all attributes are more likely to generate reliable preference estimates. Overall, this research provides guidance on market approaches for enhancing environmental public goods provision.


Introduction
Societies rely on private markets to allocate scarce resources from least-cost providers to those who most value goods and services. Because well-defined property rights are required to ensure a healthy, efficient market, achieving the socially optimal level of output of public goods and common-pool resources (CPR) through private market exchange alone is impossible. For instance, several ecosystem services provided by nature, such as wildlife habitat, open-space and groundwater aquifer recharge certainly have value to local communities and society in general. Unfortunately, land owners must choose between land uses that yield private market value, such as high yield farming practices or industrial construction, and those that do not have direct or immediate market benefits.
The typical remedy for under-provision of public goods and over-exploitation of common-pool resources involves government intervention, often in the form of levied taxes, gear restrictions, cap-and-trade programs, or other regulations. Most common government actions are aimed at producers and entail supply-side policy instruments.
However, these measures may not result in the socially optimal level of provision because they are often influenced by collective action by special interest groups and may inadequately account for local tastes and preferences. The overall goal of this dissertation is to evaluate ways in which market incentives can be provided to producers of market goods in order to incorporate practices that are consistent with improving ecosystem health and mitigating the externalities involved in manufacturing private goods. Specifically, I examine the demand-side instruments that can drive these incentives. Ultimately, this research attempts to shed light on the promising opportunities and significant challenges that exist for market-based approaches for bolstering public goods provision and environmental sustainability.
To achieve this goal, I explore the effectiveness of two strategies to provide private incentives for producing public and CPR goods. The first manuscript focuses on eco-labeling as a tool to empower consumers to vote for environmentally friendly production practices. The second and third manuscripts provide an analysis of a novel experiment in which a local market-like process for wildlife-habitat protection was designed and implemented.
Ecolabels, such as organic labeling, energy-efficiency labels, the Forest Stewardship Council's (FSC) ecolabel on wood products and the Marine Stewardship Council's (MSC) label on seafood are gaining in popularity as consumers become increasingly aware of the impacts of their purchases on the natural environment around them [3].
These labeling programs allow consumers to differentiate between items not solely on their physical characteristics but also on unobservable attributes of the production process. However, the effectiveness of a label depends on consumer awareness of the label and trust in its claims. In addition, there must be some demonstrable demand for the environmental attributes touted by the label. Many have voiced skepticism regarding whether ecolabels are an effective tool for encouraging responsible resource management. With regard to seafood products, several studies have presented evidence that consumers are hypothetically willing to pay a premium for sustainably harvested seafood [4,5,3,6,7]. However, little is known about whether these stated intentions translate into actual purchases and to what extent ecolabels on seafood yield quantifiable market benefits. Demonstrating that a price premium is being realized for labelled products achieves two goals. First, it is a step toward quelling doubts about the effectiveness of ecolabels on seafood and, second, it provides some information to fisheries considering certification as to the potential benefits of pursuing it.
In this first manuscript, sales of frozen pollock products from supermarket scanner data in the London, UK area are analyzed for evidence of a price premium for the MSC's eco-label on sustainably harvested pollock. The choice of frozen pollock products is based on the fact that the Alaskan pollock fishery was certified against the Marine Stewardship Council's requirements fairly early in the history of the labeling program. It is thus more likely that consumers are familiar with the label. I use a hedonic model to control for other physical attributes of the products, such as brand, process form, and size. The results indicate that, on average, the MSC label generates a 14% price premium in the market for frozen pollock products in the sampled area. This research provides strong evidence that market benefits are being realized for sustainably harvested seafood.
Although eco-labeling programs serve as a vehicle through which consumers can express their preferences for the treatment of the environment, a direct market approach places the environmental resource at the center of the economic exchange.
Constructing a market approach for the resource itself opens a path from consumer demand to producers' supply without supply chain or other considerations. This advantage, however, comes with its own set of costs and complications. Beginning in 2006, researchers at the University of Rhode Island orchestrated a novel field experiment designed to elicit direct values for the private provision of an agricultural ecosystem service: that of grassland nesting habitat protection for the Bobolink bird species. The experiment involved the residents and farmers of Jamestown, Rhode Island, and tested several elicitation mechanisms that were shown to mitigate free-riding in the lab. Jamestown is a small community of 4500 residents living in approximately 2800 households. The residents have a history of support for community-wide conservation projects, particularly for farmland and open space. Free-riding, a result of treating a public, non-excludable good as a marketable good, has been found to yield inefficient outcomes in public goods experiments. It is well known and documented, particularly in philanthropic endeavors, that an individual has some incentive to free-ride on the contributions of others if she believes that provision is not contingent on contribution of her full value [8,9,10].
In the second manuscript, I investigate the efficacy of different elicitation mechanisms to mitigate free-riding in the field. There has been a significant effort within the experimental economics scholarship in constructing mechanisms that will create incentives for players to reveal their true values in public goods experiments. Some of the more effective mechanisms involve some combination of provision points, moneyback guarantees, and proportional rebate of excess funds. The pivotal mechanism particularly, which collects an individual's bid only if her contribution is pivotal to the provision of the good, has been shown to be, under certain circumstances, incentive compatible in the lab. However, few of these mechanisms have been tested in field experiments [8,10]. Some authors have suggested that mechanisms that prove effective in the lab may not perform as expected in the field because they may be too complex or unfamiliar [11,12]. Field experiment participants do not have the option to ask questions if an instruction is not clear and the researcher has little control over how the information is processed.
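As a rough illustration of how a provision point with a money-back guarantee and a proportional rebate of excess funds can be settled, consider the minimal sketch below. The function name, the bids, and the threshold are hypothetical and are not taken from the Jamestown experiment; the rebate rule shown (scaling every bid by the same factor so that collections equal the provision point) is one common formulation and is assumed here.

```python
def settle_provision_point(bids, provision_point):
    """Return the amount each contributor finally pays.

    bids: non-negative pledged amounts (hypothetical input).
    provision_point: funding threshold needed to provide the good.
    """
    total = sum(bids)
    if total == 0 or total < provision_point:
        # Money-back guarantee: the good is not provided, all pledges are refunded.
        return [0.0 for _ in bids]
    # Proportional rebate: excess funds are returned in proportion to each bid,
    # so the collected payments sum exactly to the provision point.
    share = provision_point / total
    return [bid * share for bid in bids]


# Pledges exceed the threshold, so each bidder pays a scaled-down amount.
payments = settle_provision_point([30.0, 50.0, 40.0], provision_point=100.0)

# Pledges fall short, so nobody pays.
refunded = settle_provision_point([10.0, 20.0], provision_point=100.0)
```

The pivotal mechanism mentioned above follows a different payment rule and is not represented in this sketch.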
To investigate these dynamics, the second manuscript utilizes a stated preference choice experiment survey that was designed to elicit values for a bundle of environmental amenities associated with preserving grassland nesting habitat on Jamestown's farms. Choice experiment data have been used to measure preferences for environmental amenities such as forest landscape features [13], improvements in river ecology and aesthetics [14], recreational demand for different aspects of rock climbs in Scotland [15], nature conservation programs in Finland [16], passive use values for Caribou preservation [17], aspects of environmental projects managed by the World Wildlife Foundation [18], and features of competing policy measures relating to soil erosion in Southern Spain [19], to name a few. Based on random utility theory, the choice experiment design lists several options as packages of features or attributes and asks participants to choose the most desirable option based on the listed attributes. The Jamestown choice experiment elicited preferences for restoring fallow farmland to active cultivation, scenic views, and an expert-led bird walk in addition to protected habitat for nesting birds. The choice experiment format is widely accepted as the highest standard survey technique for generating reliable valuation in hypothetical settings. Prior research in valuation of public goods employed simple voluntary contribution mechanisms. The Jamestown experiment administered several different public goods elicitation mechanisms to participants in the field, including the pivotal mechanism based on the Clarke tax, the uniform price auction, and the proportional rebate mechanism.
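The random utility framing of a choice experiment can be made concrete with a small sketch. It uses a plain conditional logit, which is a simplification and not the mixed logit in willingness-to-pay space estimated in the manuscript. The attribute names and coefficient values are hypothetical and chosen only for illustration.

```python
import math


def choice_probabilities(utilities):
    """Conditional logit: P_j = exp(V_j) / sum_k exp(V_k)."""
    peak = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def utility(attributes, coefficients):
    """Linear-in-attributes systematic utility V = sum_k beta_k * x_k."""
    return sum(coefficients[name] * level for name, level in attributes.items())


# Hypothetical coefficients, not estimates from the study.
beta = {"habitat_acres": 0.04, "bird_walk": 0.30, "cost": -0.02}

# One choice set: two conservation packages and an opt-out alternative.
options = [
    {"habitat_acres": 10, "bird_walk": 0, "cost": 20},
    {"habitat_acres": 20, "bird_walk": 1, "cost": 50},
    {"habitat_acres": 0, "bird_walk": 0, "cost": 0},
]
probs = choice_probabilities([utility(o, beta) for o in options])

# Marginal willingness to pay for an attribute: -beta_attribute / beta_cost.
wtp_per_acre = -beta["habitat_acres"] / beta["cost"]
```

In a model specified in willingness-to-pay space, the ratio in the last line is estimated directly as a parameter instead of being computed after estimation.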
I focus on differences in choice consistency and willingness to pay estimates from the various elicitation mechanisms. Choice consistency acts as a proxy for choice task complexity and cognitive burden. Several researchers have examined heteroskedasticity in error variance in relation to choice experiment design dimensions such as the number of choices in the choice set or the number of attributes per alternative as well as learning and fatigue effects [20,21,22,23,24]. Along these lines, I examine choice consistency by elicitation mechanism in order to address the contention that demand revealing mechanisms do not perform as expected in the field because they are too obscure or complex. I augment the analysis by comparing willingness to pay across mechanism treatments. This permits an assessment of the ability of the elicitation mechanisms to mitigate free-riding behavior. The results suggest that demand-revealing public goods mechanisms are capable of performing in the field in a manner similar to results obtained via laboratory public goods experiments. However, there is some evidence that a familiar base mechanism must be applied to draw respondents' attention to the new mechanism. This research provides important insights into the design of efficient market mechanisms for generating funds for public goods projects.
Yet, there is a second drawback to the market approach: that of private value construction for a good that has previously been publicly consumed. Individuals conventionally contribute funds to environmental causes via charities and other philanthropic efforts. Often, they are asked for their contribution under the presumption that what is contributed goes toward a general goal: providing public radio, saving rainforests, etc. They need only be concerned that the charity, being a reputable, legitimate institution and having expertise with the problem at hand, will put the donation to its best use.
In the choice experiment, the participant is given precise information about the environmental practice, amenity or policy she is expected to value. Several natural (dis)amenities are listed in combination with varying levels, and the relative marginal values (costs) are inferred from the individual's choices. However, there is no frame of reference within this context. Asking an individual what she might pay to ensure that some number of wildlife mortalities will be spared, without reference to the scope that the effort might have on the health of the wildlife stock in the future may prove problematic. One might be inclined to simplify the choice to make it more like a charitable donation setting, either by focusing on the cost of the bundle of attributes or by choosing the attribute of the bundle that most matters to her.
As choice modeling research has shifted in recent history to focus on the sources of heterogeneity in responses of individuals to choice experiment scenarios, one line of study recognizes that individual behavior may be better described by alternative representations of utility functions that accommodate attribute processing strategies.
The most commonly investigated variation on the additive linear utility function is attribute non-attendance. Cameron and DeShazo [25] have pointed out that individuals make trade-offs between costs of further information processing and expected marginal benefits of attributes in choice experiments. This leads respondents to attend to some attributes and overlook others. Hensher and Rose [26], Hess and Hensher [27], Hensher et al. [28] and other research have highlighted the importance of accommodating information processing strategies for behavioral outcomes of choice experiments.
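Attribute non-attendance is usually represented by constraining the utility coefficients of ignored attributes to zero for the respondents, or latent classes, that ignore them. The sketch below shows only that idea. The attribute names, coefficient values, and the two attendance patterns are hypothetical, and no class membership probabilities are estimated.

```python
def utility_with_nonattendance(levels, coefficients, attended):
    """Systematic utility in which ignored attributes contribute nothing."""
    return sum(
        coefficients[name] * level
        for name, level in levels.items()
        if name in attended
    )


# Hypothetical coefficients and one alternative from a choice set.
beta = {"habitat_acres": 0.04, "bird_walk": 0.30, "cost": -0.02}
alternative = {"habitat_acres": 20, "bird_walk": 1, "cost": 50}

# A respondent who attends to every attribute.
full_attendance = utility_with_nonattendance(
    alternative, beta, attended={"habitat_acres", "bird_walk", "cost"}
)

# A respondent who simplifies the task by looking only at cost.
cost_only = utility_with_nonattendance(alternative, beta, attended={"cost"})
```

A latent class model of non-attendance assigns each respondent a probability of following each such pattern, so the two utilities above would enter the likelihood weighted by those class probabilities.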
The third manuscript addresses the possibility that attribute processing rules can be [...] The results suggest that accommodating various information processing strategies allows the researcher to predict behavior in settings that have monetary consequences with greater accuracy. The main finding from this research is that individuals who expend the cognitive effort to fully attend to all attributes in the hypothetical realm also exhibit the highest valuations in revealed preference scenarios.
In addition, I find that yea-saying is a significant challenge in stated preference data and hypothetical bias for this strategy of response was quite high.
The methods and results outlined in this research have broad implications for designing markets for ecosystem services and validating valuation outcomes from stated and revealed preference studies. While more work is left to be done, this research makes a significant step toward understanding how market mechanisms can be used to enhance provision of public goods from the environment.

The Elusive Price Premium for Ecolabeled Products: Evidence from Seafood in the UK Market

Introduction
Governance of common pool resources, such as fisheries and publicly-owned forests, often fails to correct for over-exploitation of the resources [29]. In some cases, poor management may evolve from a close relationship between the managers and the industry being managed. As a result, decisions regarding what is best for the resource are replaced by decisions regarding what is best for those utilizing the resource. To allow consumers a voice, certification programs for sustainably-managed resources and ecolabeled products derived from those resources have been introduced.
The goal of ecolabeling programs is to create market-based incentives for better management of the environment. Ecolabels provide otherwise unobservable information to the consumer about the environmental attributes of goods, allowing consumers to differentiate products carrying the labels from those that do not [30]. If consumers value the environmental attributes of the products conveyed by the ecolabel, they will shift their demand toward the ecolabeled products and away from the non-labeled products. This in turn may create a price premium for ecolabeled products over non-labeled products, thereby creating a market incentive for producers to supply those environmental attributes [31,32].
Ecolabeling has become an increasingly important tool in the promotion of sustainable forestry and seafood products around the world [33,34,35]. In relation to seafood, the approach has attracted significant attention in markets since the first capture fishery was certified as sustainable against the Marine Stewardship Council's (MSC) standards in 2000. Those who sell products from fisheries which are MSC certified may purchase licenses for the right to place the MSC ecolabel on affiliated products, signaling to consumers that the product was produced from a sustainable fishery. Although there are now competing labels, the MSC is the leading label in terms of the number of fisheries certified, volume of edible seafood certified, and logo presence in the global marketplace [36].
However, this literature is based upon contingent valuation survey data and shows only that consumers state a preference for ecolabeled products under certain conditions.
Determining the existence of actual price premiums in the market for ecolabeled products is important to address the expressions of skepticism by policy makers and others regarding the effectiveness of ecolabeling as a tool to create more effective management [42,43]. Such skepticism exists due to the lack of rigorous evidence that consumer preferences have transformed into actual price premiums for the certified fisheries.
The purpose of this paper is to investigate whether or not there is a demonstrated price premium in the retail market for ecolabeled seafood. A demonstrated price premium paid at the retail level may, depending upon the price transmission mechanism, indicate compensation at the fish production level. Because the focus of the analysis is upon the marginal value of the ecolabel, in other words the price premium, we follow Rosen [44] and use a hedonic price model. Our empirical analysis is applied to scanner data on frozen processed pollock products in the London metropolitan area.
Scanner data are increasingly being used in price analysis, including hedonic analysis of price premiums for such labels as fair trade coffee [45] and organic agricultural products [46,47,48].
The paper proceeds as follows. The next section presents a brief background on the MSC program and seafood ecolabeling, as well as a discussion of the rationale behind the research focus on a retail market in the UK and MSC-labeled pollock.
This is followed by a description of the data used, and a discussion of the model specification and estimation procedure used to measure the price premium. Results and implications from the model are discussed next, followed by the concluding remarks.

Background
The MSC's fishery certification program and seafood ecolabel recognize and reward sustainable fishing. The earliest fisheries certified were Alaskan salmon, Western Australian rock lobster and Thames River herring in 2000. In 2005, when the Alaska pollock (Theragra chalcogramma) fisheries in Alaska's Gulf of Alaska and Bering Sea were certified, the number of certified fisheries was only thirteen. In contrast, the number of certified fisheries as of January 2011 was 102. The number of fisheries in assessment for certification as of January 2011 was 132. As the number of fisheries in the program has grown, so has the market for MSC-labeled products. There are more than 5,000 MSC-labeled products on sale globally in over 66 countries with a retail value of over US $2 billion annually [49].
Alaska pollock represents one of the largest fisheries in the world, and, in spite of the growth of the MSC program, the largest proportion of volume among the certified fisheries within the MSC program. This fishery has an average annual historic catch of approximately one million metric tons during the past 30 years, with catch levels set by the U.S. federal fisheries management [50]. The primary markets for Alaska pollock are North America, Europe, and Japan.
Whether the consumer is paying a price premium for ecolabeled products is one indicator of the market effectiveness of ecolabeling schemes. [31] provide a theoretical framework showing that price premiums play a critical role in providing market-based incentives to the fishing sector for improving or maintaining sustainable fishing practices. Wessells, Johnston and Donath [4], Johnston et al. [5], Jaffry et al. [3], Johnston and Roheim [6] and Brécard et al. [7] have shown empirically that some consumers prefer ecolabeled seafood products over non-labeled, and a statistically significant proportion indicate a hypothetical willingness to pay a positive premium, while Ozanne and Vlosky [38,39], Teisl et al. [33] and Aguilar and Vlosky [41] show the same for ecolabeled wood products. However, these studies have in common that they use hypothetical data. The studies do not provide an estimated premium, and actual price premiums paid in the market, if any, are yet to be determined.
In spite of the results of these studies, Gulbrandsen [51] argues that most markets for ecolabeled forestry and fisheries products have been created as a result of pressure by environmental groups on consumer-facing corporations, rather than resulting from consumer demand. O'Brien and Teisl [52] go so far as to say that ecolabels are ineffective due to a lack of marketing, leading to a lack of consumer awareness of ecolabels in forest certification. Roheim [35] concurs, positing that the market for ecolabeled seafood is driven less by consumer demand than by corporate decisions to source certified sustainable seafood for a variety of reasons, including risk reduction.
On the other hand, according to Sedjo and Swallow [32], even though consumers may express interest in purchasing an ecolabeled product, this does not suffice as evidence that a price premium will manifest in the market. These academic discussions provide an impetus for research to document whether or not actual price premiums are being paid.
However, the most basic reason to determine the existence, if any, of price premiums in the market is to assist in evaluation of effectiveness of the ecolabeling as a market-based incentive. Producers of certified products and those contemplating assessment for certification are increasingly demanding proof of market benefits to justify the costs of the assessment process and of practicing sustainable fishing. For instance, according to Washington [43] and Roheim [53], the costs of obtaining MSC certification can range from $10,000 for small and simple fisheries to $500,000 for large and complex fisheries like the U.S. pollock fishery. Maintaining certification creates additional costs. However, costs of certification are only a fraction of the costs of transitioning to a sustainable fishery from a fishery which previously did not meet the conditions for sustainability. A sustainable fishery requires investment in appropriate fisheries management, practices, and capital. None are costless. Before entering the certification assessment process, fisheries must perceive that market benefits will be enough to offset these costs. Market benefits may not simply be a price premium as such, but may also include improved market access to premium markets, expanded market share in existing markets, and greater ability to favorably position oneself in the market with competitors [35].
Downstream in the supply chain, those who must have chain of custody certification also seek quantitative proof of the presence or absence of market benefits [35]. These firms have invested significantly in the program by sourcing certified product and paying license fees to place the logo on the products. A positive return is expected on this investment.

[...] [55,56,57]. Theil [57], however, points out that this is a problem in linear models only if stores have heterogeneous responses to marketing actions. We have no information on the ways in which individual supermarkets respond to the marketing actions of manufacturers of frozen processed pollock products. However, it is not unreasonable in this case to assume homogeneous response to such actions across supermarkets. In addition, Cotterill [55] argues that even with aggregation, the market data contained in the IRI Infoscan database allow for a rich set of possible empirical insights. For example, several researchers have used Infoscan data to investigate the effect of other types of labels. Lusk [58] evaluates demand for cage-free, organic and conventional eggs, and Cotterill, Putsis and Dhar [59] and Cotterill and Putsis [60] analyze the competition between national and private labels of breakfast cereals and carbonated beverages.
The Infoscan database provides volume and dollar sales by SKU for over 400 frozen processed seafood products aggregated across supermarkets in the London metropolitan area on a weekly basis for 65 weeks, from February 24, 2007 to May 17, 2008. Unit prices for each product are averages derived from total sales and volumes. The focus is on processed food products since these are the only products for which SKUs are consistent across all supermarkets. SKUs on fresh products sold are specific to stores or supermarket chains, therefore fresh fish could not be included in the analysis.
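The derivation of unit prices from aggregated sales can be sketched as follows: for each product and week, the average unit price is total sales value divided by units sold. The record layout and the numbers are hypothetical and are not drawn from the Infoscan data.

```python
def unit_price(total_sales, total_units):
    """Average unit price for one SKU-week: sales value divided by units sold."""
    if total_units <= 0:
        raise ValueError("units sold must be positive")
    return total_sales / total_units


# Hypothetical SKU-week records aggregated across stores.
weekly = [
    {"sku": "A", "week": 1, "sales": 1250.0, "units": 500},
    {"sku": "A", "week": 2, "sales": 1320.0, "units": 550},
]
prices = [unit_price(r["sales"], r["units"]) for r in weekly]
```

Because the price is an average over all stores and transactions in the week, it reflects any promotions or store-level price differences in proportion to the units sold under them.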
For each pollock SKU, information provided includes: brand, species, product type (such as breaded, battered, natural smoked), product form (such as fillets, fish fingers, and kids fish, i.e., products in various fun shapes), and package size. Brands include national labels such as Young's and Birds Eye or private labels. Private labels are labels associated with individual supermarket chains. The Infoscan database does not specifically identify the label, thus these products are simply identified as private label products.
A total of 24 pollock products are included in the analysis. These products are similar in product form (fillets, fish fingers, kids fish). Pollock products which are highly value added, such as ocean pies or in which vegetables are added, are excluded.
The Infoscan database does not contain information on which products carry the MSC label. Working with the logo licensing manager at the MSC, viewing the products on the websites of the producers and supermarket chains, and contacting the producers directly for affirmation, twelve of those products were identified as displaying the MSC ecolabel. Each national brand marketed both ecolabeled products and non-labeled products, indicating that these brands have differentiated individual product lines.
Of the nine Young's products, seven appear with the MSC label. Of the eleven Birds Eye products, five have the MSC label, while none of the four private label products have the MSC label. There were three kids fish products, ten fillet products, and eleven finger products. Package sizes varied from a low of 240 grams to a high of 1080 grams.
In all, a panel dataset of 1,137 observations was included in the analysis, one observation for each week when the twenty-four products were sold. During the sixty-five week time period, none of the products appear in the market for the entire period. Some products are introduced, and other products are withdrawn, from the market during the observation period. Given negligible inflation during the short time period, nominal prices are used.
The appearance of an MSC-labeled product on store shelves, with or without a price premium, does not guarantee that consumers will purchase the product. Pollock historically has been sold as a generic whitefish in the UK market. After certification it has been specifically identified as pollock on product packages. Analysis of the data shows that 3.03 million units of twelve non-MSC labeled products were sold during the sixty-five week period in the London market area, while 3.3 million units of twelve MSC-labeled products were sold during the same period.

Model Specification and Estimation
The hedonic model specifies the price of a product as a function of the attributes that characterize the product. The model can be written in its general form as:

$P_{it} = f(S)$

where $P_{it}$ is the price of good $i$ at time $t$, and $S = (s_1, \ldots, s_n)$ is a vector of attributes that determines the price of the good.
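The estimated specification itself does not appear in this text. A semi-log form consistent with the surrounding description (a log price, which the percent interpretation of the parameters suggests, regressed on dummy variables for the label, brand, process form and product type, plus package size) might be sketched as below. The coefficient symbols, the variable names, and the linear treatment of package size are assumptions made for illustration, not the authors' notation.

```latex
\begin{equation*}
\ln P_{it} = \alpha + \beta\,\mathrm{MSC}_{i}
  + \sum_{j} \gamma_{j}\,\mathrm{Brand}_{ji}
  + \sum_{l} \delta_{l}\,\mathrm{Process}_{li}
  + \sum_{n} \theta_{n}\,\mathrm{Type}_{ni}
  + \lambda\,\mathrm{Size}_{i} + e_{it}
\end{equation*}
```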
where $j$ indexes the brand attributes (Young's, Birds Eye, or private label), $l$ indexes process form (breaded, battered, or natural/smoked), $n$ indexes product type (fish fingers, fillets, or kids fish) and $e_{it}$ is a random error. In this analysis, the attributes are all expressed as dummy variables [...] Carroll, Anderson, and Martinez-Garmendia [62], and Roheim, Gardiner and Asche [65], as well as hedonic analyses of organic produce [46], organic milk [47], organic tomatoes and apples [48], and ecolabeled apparel [66].
By including a constant term, the parameters are interpreted as the percent deviations from a basic product with a given set of attributes. In each dimension one can investigate whether the different attributes have different marginal values by testing whether the associated parameters are zero. Own-label, kids fish, natural smoked, and non-MSC labeled serve as the base attributes for the model. Models which included interactive effects between the MSC label and other attributes were tested; however, none were statistically significant.
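When the dependent variable is the logarithm of price, a dummy coefficient $b$ corresponds to an exact percent difference of $100(e^{b} - 1)$ relative to the base product, which is close to $100b$ only for small $b$. The sketch below shows the conversion in both directions. It assumes a log-price specification, and the text does not state which of the two conventions underlies the reported percentages.

```python
import math


def percent_effect(coefficient):
    """Exact percent price difference implied by a dummy coefficient in a log-price model."""
    return (math.exp(coefficient) - 1.0) * 100.0


def coefficient_for(percent):
    """Inverse mapping: the log-price coefficient that implies a given percent difference."""
    return math.log(1.0 + percent / 100.0)


# Coefficient that would correspond to an exact 14.2% premium.
b = coefficient_for(14.2)

# Gap between the exact effect and the 100 * b approximation for a larger coefficient.
gap = percent_effect(0.40) - 40.0
```

The gap grows with the size of the coefficient, so the distinction matters more for large effects than for small ones.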
Since scanner data contain observations on multiple products of differing average values, the variances of the error terms are likely to differ across products. White's [67] test rejected the hypothesis of homoskedasticity at the 1% significance level.
The model, which was run using Stata, corrected for heteroskedasticity with a heteroskedasticity-consistent covariance matrix estimator [68]. Following Davidson and MacKinnon [69], the HC3 covariance matrix estimator was used. The data were also tested for the presence of multicollinearity, although no significant effect on model results was detected. Breaded pollock products are 11% less expensive than natural smoked pollock, while battered products are 24% less expensive; both estimates are individually statistically significant.
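The HC3 correction mentioned above can be sketched in a few lines. This is a minimal NumPy illustration of the standard HC3 formula, not the Stata routine used in the analysis; the function name and the toy data (an intercept plus one label dummy) are hypothetical.

```python
import numpy as np

def ols_hc3(X, y):
    """OLS coefficients with HC3 heteroskedasticity-consistent standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii = x_i' (X'X)^{-1} x_i for each observation.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC3 inflates each squared residual by 1 / (1 - h_ii)^2.
    weights = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * weights[:, None])
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))

# Hypothetical toy data: log prices for three unlabeled and three labeled items.
X = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]])
y = np.array([1.00, 1.10, 0.95, 1.20, 1.30, 1.15])
beta, se = ols_hc3(X, y)
print(beta, se)
```

The coefficient estimates are identical to ordinary least squares; only the standard errors change, which is why the correction affects significance tests but not the estimated premiums themselves.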

Results
While typically considered value-added products, breading and battering may be adding value to a product which is of lower value from an initial state, perhaps because of lower quality. In other words, if the product were of sufficiently high quality, one might expect that the fish would be marketed as the higher-valued product, natural.
Thus, so-called value-added from breading and battering actually may be a process form that masks some of the quality control issues generated in the supply chain.
Again, the F-test result shown in Table 1.3 indicates that process form, as an attribute sub-group, significantly explains changes in product price, which follows intuitively from the previous discussion.
Fillets are 49% more expensive than kids fish, while fish fingers are 6% less expensive than kids fish; these differences are statistically significant both individually and as an attribute group.
There is a positive and significant relationship between price and portion size.
The focus of this analysis is whether or not there is a price premium for MSC-labeled products. The premium is estimated to be 14.2% on these MSC-labeled frozen processed pollock products relative to non-MSC labeled frozen processed pollock products after fully accounting for the other product attributes such as brand, product form, package size and process form.
While it is useful to put this estimated premium into context, there are limitations in our ability to do so. First, as mentioned previously, there are no existing studies of actual premiums paid for ecolabeled seafood in the UK market or any other market, or for any other seafood products. Second, previous analyses of consumer preferences for ecolabeled seafood have not generally estimated willingness to pay (WTP), but rather evaluated factors that influenced the probability of hypothetical purchase of ecolabeled seafood products, including Wessells, Johnston and Donath [4], Johnston et al. [5], Johnston and Roheim [6], Jaffry et al. [3], Brécard et al. [7], and Salladarré et al. [37]. Only Johnston et al. [5], in an international comparison of roughly 2,000 consumers in both the U.S. and Norway, performed a within-sample prediction, which showed that 80% of U.S. consumers would be willing to pay an average 24% premium for ecolabeled salmon, cod and shrimp, while 54% of Norwegian consumers would be willing to pay the same price premium. These estimates are higher than the actual premium estimated as paid in the UK market, which may be a result of the hypothetical nature of the survey-based study in Johnston et al. [5] as well as differences in geographical markets.
Other interesting comparisons may be to look at alternative forms of product differentiation. Focusing on analyses that use scanner data and hedonic methodology to statistically estimate actual price premiums, we investigate existing literature for organic, fair trade, and branding attributes. Among these, Galarraga and Markandya [45] find an 11% premium in the UK market for fair trade coffee over regular coffee.
Roheim, Gardiner and Asche [65] determine the value of branding, finding that national brands across many seafood commodities in the UK have a 10% premium over private labels. Lin, Smith and Huang [46] and Smith, Huang and Lin [47] show that organic labeling in the U.S. yields price premiums between 15% and 60%, depending upon food product and geographical market within the U.S. This implies that seafood ecolabels may be valued slightly higher than fair trade coffee in the UK. However, the premium is on the lower end with respect to what has been reported for organic products in the U.S. Such a difference in premiums may not be surprising, as fair trade and environmental sustainability may yield only warm glow effects, which may be less welfare improving in terms of consumer utility than the combination of environmental sustainability and perceived health benefits potentially derived from consumption of organic products [71,72].

Conclusions
Success of ecolabeling programs in fisheries depends upon: a) a sufficient number of well-managed fisheries becoming and remaining certified, thus placing a critical mass of certified product into the supply chain; and b) creating the incentives to reform poorly managed fisheries such that they become well-managed fisheries. To create that success, market benefits are necessary for ecolabeling programs to influence production and management practices in any industry. Price premiums are a direct means by which to offset costs incurred from sustainable fishing practices certified under fisheries ecolabeling programs, and are more directly measurable than other market benefits such as improved market access or expanded market share. To date, all evidence of the effect of ecolabels for seafood has been obtained using survey data [4,5,3,6,37]. Anecdotal evidence indicates the shift of European processors such as Unilever, Youngs Bluecrest and Frosta from sourcing Russian pollock toward U.S. pollock due to sustainability certification [54]. However, doubts have frequently been expressed that price premiums actually exist [43,42]. In relation to the MSC, Washington [43] stated that the price premium is a myth and the OECD [42] stated that no evidence exists which documents effectiveness of ecolabeling schemes in creating market incentives for better fishing practices. Data limitations and complexities of the market often make it difficult to quantify market benefits [73,35].
To enhance these ecosystem services, there is a growing interest in market approaches in which those who value the services pay those who can provide them at the least cost [74]. However, constructing markets for public goods is complicated by free-ridership.
Because it is often prohibitively difficult to preclude non-payers from benefiting from the good, buyers do not have the incentive to participate in the market or pay their full values given that they cannot be excluded from consumption.
Historically, field experiments involving public goods have relied upon voluntary contribution mechanisms to procure funds. However, decades of results in experimental economics have shown voluntary donations to be poor at achieving aggregate demand revelation. Given this flaw, researchers have developed alternatives to the voluntary contribution elicitation mechanism that have led to significant improvements in aggregate contribution. For instance, several authors [8,9,10] have shown that establishing a provision point with some form of rebate rule and a money-back guarantee if the provision point cannot be met can significantly reduce free-riding. In fact, the pivotal mechanism, which promises to collect payment only if, given the contributions of others, the participant's offer is required to ensure provision of the good, has the advantage of incentive compatibility in certain circumstances. That is, theoretically, revealing one's full value for the good truthfully is a dominant strategy.
The question arises: how will the mechanisms that help alleviate free-riding transfer from the laboratory to the field? Because they are often complex, unfamiliar in use, and require unanimous action or multiple rounds of interactions, several authors have pointed out that it may be difficult to extend the benefits of the elicitation mechanisms to the field [8,11]. In practice, there are several pertinent differences between laboratory and field environments that would suggest a need for careful consideration of public goods mechanisms in the field. First, participants in laboratory experiments often commit to a set duration of time, during which the researcher can ensure that directions have been read carefully. There are often several practice rounds administered to participants during which questions, if they arise, can be addressed. In many field experiments, lengthy instructions are provided but there is no guarantee that respondents read or comprehend them in their entirety.
We examine this issue by utilizing a novel data set involving one of the first efforts to bring several of the most promising public goods elicitation mechanisms to the field.
In 2006, researchers at the University of Rhode Island, in collaboration with EcoAsset Markets, Inc., implemented a choice experiment describing a hypothetical market for wildlife preservation in Jamestown, Rhode Island. The "good" offered for exchange was a contract between the residents and the farmers within the community that would change hay harvesting practices to protect grassland nesting bird habitat.
Jamestown is a small community located off the coast of Rhode Island on Conanicut Island in Narragansett Bay. There are nine farms on the island, most of which produce grass-fed beef. Previous evidence indicates that the Jamestown community places a high valuation on its farms [75]. In addition, its residents tend to have a keen sense of attachment to their community and a history of supporting conservation of low-impact land use [76,77]. These characteristics made Jamestown an ideal venue in which to test a local market for ecosystem services. The choice experiment was conducted to measure both the demand for the protection of grassland nesting habitat and how this demand varies across several public goods auction mechanisms.
In the absence of direct feedback about how individuals respond to the mechanisms they are administered, we evaluate choice consistency as a measure of the complexity of the choice task. Researchers have used econometric models that explicitly incorporate heteroskedasticity in error variance (scale heterogeneity models) to measure choice consistency in the presence of choice situation complexity, learning and fatigue effects [23,78,22,20,21]. In addition, scale heterogeneity models have been used to compare survey elicitation methods. Open-ended, discrete choice, or payment card formats have been examined as well as differences between contingent valuation and choice experiment surveys [17,79,80]. Rather than vary the survey format or the number of alternatives, choices, or attributes, we focus on the elicitation mechanisms administered and hypothesize that more complex mechanisms lead to greater choice inconsistency. This study will test whether more complex or unfamiliar mechanisms of payment complicate the decision process sufficiently to cancel out the benefits from incentive-compatibility. By comparing the variance in response across individuals, we identify quixotic behaviors and test for linkages within and across mechanism treatments. In combination with individual-specific WTP measures, we are able to draw valuable conclusions about whether the mechanisms are performing as expected. In this manner, this work contributes to the burgeoning literature on market design for ecosystem services. Additionally, we supplement the growing literature regarding the nature of scale heterogeneity [81].
Using the data from the choice experiment, we measure the randomness in subjects' responses by estimating individual-specific scale coefficients from a mixed logit model (MIXL) in willingness-to-pay space using hierarchical Bayes procedures. Demographic determinants of the scale parameter values are explored as well as choice task features and strategic behavior such as yea-saying and lexicographic response.
Examining choice consistency is particularly relevant to the data generated from choice experiments involving public goods elicitation mechanisms because individuals who participate in the choice tasks must undertake two somewhat complicated thought processes which would require significant cognitive effort. First, the choice experiment invites participants to consider the provision of environmental amenities as salable goods with a variety of attributes that would be individually considered for valuation. In addition, respondents would then consider how their response would be processed given the elicitation mechanism administered.
In addition to the main goal of testing how subjects' scale parameter estimates are affected by the complexity of elicitation mechanisms, another novel contribution of this paper is that we specify a WTP-space version of the random-scale multinomial logit model and test it against a full model, in which both scale and attribute parameters are allowed to vary. The scale heterogeneity model derived by Louviere and Eagle [81] assumes that all heterogeneity in preferences can be adequately explained by variations in scale alone. However, Adamowicz et al. [23] point out that accounting for both sources of heterogeneity is particularly important in welfare estimation, new product design, and segmentation in marketing. We add to the discussion by supporting the finding that scale heterogeneity is particularly important when attributes are highly correlated.
The layout of the paper is as follows. The next section describes the Jamestown choice experiment in detail along with the elicitation mechanisms that were administered. The three main hypotheses regarding the elicitation mechanisms are outlined.
The third section describes the WTP space approach and presents the model specifications. Section 2.4 describes the mechanics of the estimation with emphasis on the steps in the MCMC procedure. Section 2.5 presents results of the HB estimation along with the second stage scale regression. Section 2.6 concludes the paper and suggests further research.

The Experiment
In order to facilitate the design of a local market for ecosystem services, it was important to choose an ecosystem service that could be easily quantified, implemented on a sufficiently short time line, and be relatively inexpensive. Agricultural land provides support for many important and valued wood-edge species. It was determined that wildlife habitat preservation might fit the requirements for this type of experiment. Specifically, the black and yellow Bobolink (Dolichonyx oryzivorus) utilizes hay fields in Jamestown, RI as nesting habitat during a five-week interval spanning the months of May and June. Hay harvesting and grazing activities during this period prove devastating to cohort success [82]. As a vulnerable species that would benefit from conservation efforts, the Bobolink cohort could be salvaged by a modest shift in harvesting practices.
Each choice situation described in the experiment contained hypothetical contracts under which the farmers would agree not to mow or graze on the contracted acreage until after the fledgling Bobolinks have had time to mature. Consultations with biologists indicated that delaying harvest until after July 4th would achieve this goal. The contracted acreage could be seen from the road or not, and could be supplemented with an invitation to an expert-led birdwalk and additional fallow acreage to be restored to active farming.
The survey was administered between October and December 2006 by mail to all valid addresses in Jamestown, RI, a total of 2,893 households. After deducting undeliverable addresses, the response rate was 38.2%. There were 791 respondents in the final analysis. There were 10% more female respondents than male. The mean age of respondents was 57 years. Over 73 percent of respondents indicated that they did not have children under the age of 18 in their household. The median level of education of respondents was some college, with a median level of income between $100,000 and $199,000 [77].
The survey comprised five sections. The first section described the ecosystem service in question. The survey described how Bobolinks utilize hayfields in Jamestown and how current harvesting practices impact breeding and rearing. This section also provided information about other important environmental services that hayfields provide. The second section included the choice experiment itself, with six questions regarding two hypothetical farm-wildlife contracts and a no-buy option.
Before being asked to choose between contracts, respondents were given information about how their elicitation mechanism worked. Respondents were given the option to choose one of the two contracts, neither, or both. The sixth question was unique in that it consisted of only one contract and respondents were asked to indicate whether they would purchase the contract or not. The remaining sections solicited participants for their opinions regarding farmland amenities and rural community character and collected demographic information.
The choice experiment included six attributes of the farm-wildlife contracts (Table 2.1). The first attribute was the 'Acres of managed hayfields'. This attribute also incorporated the expected number of bobolink fledglings. In estimation, the fledgling measure is called 'High Bobolink'; it has two levels (high and low) and was mildly correlated with the number of acres of contracted hayfields. The second defining characteristic of a contract was 'acres of restored fields'. Preliminary discussions with the farmers residing in Jamestown suggested that there may be opportunity to restore fallow land to active hay production. The 'View' and 'Tour' attributes were both binary attributes and represented whether the parcel was viewable from the road and whether the contract came with an invitation to an expert-led birdwalk. This last attribute was unique in that it was the only attribute that can be viewed as a purely private good characteristic. The final attribute was the contract's cost: an eight-level attribute ranging from $10 to $105. An example of a choice experiment scenario is included in Appendix A. The scenarios were constructed from a $4^2 \times 2^3 \times 8 \times 2$ orthogonal main effects design and blocked into groups depending on the treatment.

The Elicitation Mechanisms
There were two main groups of surveys administered. The main groups differed in whether a reference mechanism was applied or not. The reference mechanism was a hypothetical referendum for a tax increase. Individuals who were assigned to this group were presented with two choice situations in which the mechanism of revenue collection was an increase in taxes in the amount of the cost of the contract. The intention of providing the reference mechanism was to test whether respondents who were administered a familiar base mechanism would then be more responsive to a less familiar mechanism. We test the significance of providing a familiar reference mechanism on the behavior of the respondent.
Hypothesis 1 Providing a familiar reference mechanism highlights the advantages/disadvantages of the alternative mechanisms and so individuals will be more responsive.
Each of the two main groups was further divided into several subgroups that differed in the elicitation mechanisms that were administered. We tested four elicitation mechanisms. All four included a provision point and money-back guarantee: if not enough revenue was collected to compensate the farmers, all of the money would be returned to the respondents. The value of the provision point was not revealed to the participants. The first elicitation mechanism, the provision point with money back guarantee (PP/MBG), collects all offers unless there are not enough to reach the provision point. This mechanism was only included in the treatment that was administered a reference mechanism. The second mechanism, the provision point with proportional rebate (PR), builds upon the PP/MBG by offering a proportional rebate of excess contributions. While the PP/MBG and PR mechanisms have shown evidence of alleviating some free-riding behavior [9,8,10], they are not in theory incentive-compatible. The pivotal mechanism (PM) was tested as the third mechanism because it has been shown to be incentive-compatible in controlled experiments.
The pivotal mechanism promises to collect from the participant only if her contribution makes the difference between meeting the provision point or not. In this manner, the pivotal mechanism induces the respondent to consider revealing her true value for the good. In the Jamestown choice experiment, PM came with a provision point and money back guarantee as well. However, Milgrom [11] points out that incentive-compatible mechanisms tend to be more complex and may be difficult to implement outside of the lab. Our second hypothesis is that, if individuals are not responding as predicted to the pivotal mechanism, this aberration can be explained by a smaller scale parameter (larger error variance), thereby indicating that the complexity of the choice situation has overtaken the advantages of the mechanism.
Hypothesis 2 Relative to the other public goods elicitation mechanisms tested in this experiment, the performance of the pivotal mechanism outside the lab is hindered by its complexity.
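The collection rules of the first three mechanisms described above can be sketched as simple functions. This is an illustrative reading of the verbal descriptions, not the implementation used in the survey: the function names and offers are hypothetical, the rebate is assumed to be proportional to each offer, the provision point is assumed to be positive, and because the text does not state how much a pivotal participant is charged, the pivotal rule only flags who would be charged.

```python
def pp_mbg(offers, provision_point):
    """Provision point with money-back guarantee: collect every offer
    if the total reaches the provision point, otherwise refund all."""
    if sum(offers) < provision_point:
        return [0.0 for _ in offers]
    return [float(o) for o in offers]

def proportional_rebate(offers, provision_point):
    """As PP/MBG, but any excess over the provision point is rebated
    in proportion to each offer (assumed rebate rule)."""
    total = sum(offers)
    if total < provision_point:
        return [0.0 for _ in offers]
    return [o * provision_point / total for o in offers]

def pivotal_flags(offers, provision_point):
    """True where the good is provided only because of that offer, i.e.
    the other offers alone fall short of the provision point."""
    total = sum(offers)
    return [total >= provision_point and total - o < provision_point
            for o in offers]

offers = [30, 20, 10]  # hypothetical offers
print(pp_mbg(offers, 50))
print(proportional_rebate(offers, 50))
print(pivotal_flags(offers, 50))
```

Under the rebate rule the amount collected equals the provision point exactly, while under the pivotal rule a participant whose offer is not needed is never charged, which is the source of its incentive properties and also of its added complexity.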
The fourth mechanism tested in this experiment was the Uniform Price Auction (UPA). Under UPA, the respondent is informed that, after all bids are collected, a uniform price will be determined and that she will pay the same price as all other bidders, provided her bid is above the determined price. This mechanism has a fairness aspect to it that the others lack. Fairness has been a prime motivator in many cases in the lab [83]. In addition, paying the same price for a good is a common occurrence in markets for private goods and hence the expectation that everyone pays the same price for the good may be appealing.

Hypothesis 3 The Uniform Price Auction is likely to succeed in achieving higher demand revelation and more consistent choices than the PP/MBG and PR mechanisms because the aspect of fairness is appealing to respondents and resembles a private market setting.
In summary, there were 256 pairs of farm-wildlife contracts and 32 single contracts. These were divided among four main groups based on whether the reference mechanism was applied and then further subdivided into groups based on auction mechanism. The mechanism descriptions that were administered in the survey can be found in Appendix A.

The WTP-space approach
where $p_{ic}$ is the price of the contract faced by individual $i$ in choice situation $c$ and $x_{ic}$ is a vector of attributes of the contract including: number of acres protected, number of acres restored to active farmland, whether the plot has a view from the road, and whether the contract provides buyers with an opportunity to attend an expert-led birdwalk (we exclude the number of bobolink territories from the analysis because it was not shown to add significantly to the estimation). In the traditional MNL specification, $\alpha$ and $\beta$ are homogeneous in the population and represent the marginal utility of income and marginal utility of the attributes of the contract, respectively.
$\varepsilon_{ic}$ captures the unobserved factors that influence the utility of person $i$ for alternative $c$. In order to obtain choice probabilities that are in the set [0,1], we assume that $\varepsilon_{ic}$ is Gumbel distributed with $\mathrm{Var}(\varepsilon_{ic}) = \sigma^2(\pi^2/6)$. $\sigma^2$ represents the variance of the unobserved factors that influence utility. The standard deviation, $\sigma$, is termed the scale of utility. This term is not a component of utility itself, but represents the standard deviation of the random portion of utility. If we define the original error term as $\varepsilon^*$, then $\mathrm{Var}(\varepsilon^*/\sigma) = (1/\sigma^2)(\sigma^2)(\pi^2/6) = \pi^2/6$. The choice probabilities then take the standard logit form in the rescaled utilities. $\sigma$ is not separately identifiable from the marginal utility parameters, and so in general it is fixed to 1 and the parameters $\alpha^* = \alpha/\sigma$ and $\beta^* = \beta/\sigma$ are estimated. This reparameterization is of little concern if the aim of the estimation is ultimately to derive measures of marginal rates of substitution such as willingness to pay measures, as these values generally require dividing one parameter (the marginal utility of an attribute) by another (the marginal utility of income or the cost parameter). In these cases, the scale factor cancels out.
However, the implicit scale parameter becomes a problem in two cases. First, scale of utility is a concern when comparing coefficients between groups. In general, larger scale implies smaller coefficients overall, even if the underlying preference parameters are the same across groups. A second issue, and one that is central to this paper, is that once heterogeneity is incorporated into the model, discerning variation in tastes from variation in scale is not possible. That is, once we move from a model that does not model heterogeneity to one that does, assuming that the standard deviation of the error is the same for all respondents is a strong assumption and one that, in most cases, is not supported.
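The confounding of scale and coefficients discussed above can be illustrated numerically. In the sketch below, two groups share the same underlying parameters and differ only in the error scale; the identified coefficients and choice probabilities differ, while the WTP ratio does not. All names and numbers are hypothetical, and price is assumed to enter utility with a negative sign.

```python
import math

def logit_probs(utilities):
    """Multinomial logit probabilities for a list of systematic utilities."""
    m = max(utilities)  # subtract the max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def rescaled(alpha, beta, sigma):
    """Coefficients identified by the logit when sigma is normalized to 1."""
    return alpha / sigma, beta / sigma

# Hypothetical parameters shared by both groups.
alpha, beta = 0.05, 1.0            # marginal utility of income, of one attribute
prices, attrs = [20.0, 30.0], [0.0, 1.0]

for sigma in (1.0, 2.0):           # groups differ only in error scale
    a_star, b_star = rescaled(alpha, beta, sigma)
    v = [-a_star * p + b_star * x for p, x in zip(prices, attrs)]
    print(sigma, a_star, b_star, b_star / a_star, logit_probs(v))
```

The group with the larger error scale shows smaller coefficients and choice probabilities closer to one half, even though the implied willingness to pay is unchanged.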
To address the shortcomings of the preference approach, we follow the approach popularized by Train and Weeks [24] and others [90,91,92,93,94] and estimate the WTP-space specification. Instead of choosing the alternative that maximizes utility, the consumer chooses the alternative that maximizes consumer surplus, which is the difference between her reservation price for that alternative and its price, $R_c - p_c$, where $R_c$ is the respondent's maximum willingness to pay for alternative $c$ and $p_c$ is $c$'s price. In this case, the logit choice probability is defined over this surplus rather than over utility. In this specification, the $\beta$s are interpreted as direct measures of marginal willingness to pay and the scale parameter is explicitly estimated.
Using the WTP-space approach, we examine the sources of heterogeneity in choice consistency. Individuals can exhibit response heterogeneity in choice experiments in several different ways (Table 2.2). First, the alternative-specific constants (ASCs) represent general differences in preferences across alternatives. If these are allowed to vary, then we allow each individual to have different patterns of residual taste heterogeneity across alternatives. Second, as discussed previously, the scale of the error term, or the standard deviation, is a representation of heterogeneity in response across choices for a particular respondent. Allowing the scale term to vary across individuals accommodates heterogeneity in the consistency of choices by respondent.
Finally, the coefficients on the attribute variables can be held fixed or allowed to vary by respondent. This type of heterogeneity is accounted for in the standard random-parameter logit model, and is generally termed taste heterogeneity.
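A minimal sketch of choice probabilities in WTP space, under stated assumptions, may help fix ideas. Here the value of each alternative is a scale parameter times the surplus (reservation price minus price), with the reservation price assumed linear in the attributes; the function name, attribute values, and WTP figures are hypothetical and ASCs are omitted for brevity.

```python
import math

def wtp_space_probs(wtp, attributes, prices, scale):
    """Logit probabilities when value is scale * (reservation price - price).

    wtp        -- marginal WTP per attribute (list of floats)
    attributes -- one attribute list per alternative
    prices     -- price of each alternative
    scale      -- multiplies the surplus; larger means more consistent choices
    """
    surplus = [sum(w * x for w, x in zip(wtp, xs)) - p
               for xs, p in zip(attributes, prices)]
    v = [scale * s for s in surplus]
    m = max(v)  # subtract the max for numerical stability
    expv = [math.exp(u - m) for u in v]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical contract (10 acres, with tour) versus a no-buy option.
wtp = [2.0, 15.0]                   # dollars per acre, dollars for the tour
attributes = [[10, 1], [0, 0]]
prices = [30.0, 0.0]
for scale in (0.1, 1.0):
    print(scale, wtp_space_probs(wtp, attributes, prices, scale))
```

With the same WTP values, a smaller scale pulls the probabilities toward an even split, which is how the model represents noisier, less consistent choices by a respondent.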
Among these sources of heterogeneity, we posit that the scale parameter largely represents the heterogeneity in response in our application. Heterogeneity in scale parameter can come from choice set design, respondent characteristics as well as product complexity [81]. Train and Weeks [24] posit that scale heterogeneity may arise because of purely idiosyncratic behavior on the respondent's part, or through differences in the variance of unobserved factors over choice situations.
Several studies support the view that the scale parameter captures much of the response heterogeneity. Louviere and Eagle [81] posit that much of the heterogeneity encountered in choice models can be accounted for simply by modeling scale heterogeneity. In this sense, the scale parameter varies but attribute coefficients are fixed. Thus, respondents are assumed to have the same preferences, but these preferences are shifted up or down based on the individual-specific scale of the error term. Fiebig et al. [96] analyze the improvements in log-likelihood gained from sequentially adding sources of heterogeneity. They find that the biggest improvement in log-likelihood from incorporating scale heterogeneity can be found in the data sets that involved rather complex goods, such as medical decisions and cell phones, as opposed to more mundane consumer goods such as pizza delivery and vacation destination.
On the other hand, Adamowicz et al. [23] Second, Fiebig et al. [96] find that scale heterogeneity matters most in cases where there are "extreme" respondents who may make decisions that are not consistent with random utility maximization. These individuals exhibit behaviors that include lexicography, protest votes, and yea-saying. By inspecting the data, we were able to find evidence of all three response strategies. The third source of scale heterogeneity involves possible reactions to the mechanism treatments themselves. Systematic differences in scale across treatments indicate differences in the average level of idiosyncratic behavior measured by the model. In this light, differences in scale indicate the extremity of reaction to changing the rules of the game regarding payment.

Model Specification
The analysis is based on four contract characteristics (Acres, Restore, View, and Tour), cost, and two alternative-specific constants (ASCs). The fifth contract attribute, High Bobolink, was mildly correlated with the Acres attribute and was excluded from estimation because it was not found to significantly contribute to the explanatory power of any model specification. The two alternative-specific constants represented the No-Buy option and the Both option.
In an effort to isolate taste and scale heterogeneity, three models were estimated.
The first model assumes that behavior can be sufficiently captured by taste heterogeneity alone. The second tests whether behavior is best described by scale heterogeneity and the third model incorporates both taste and scale heterogeneity.
There are a few considerations with regard to model specification. First, Fiebig and colleagues [96] point out that scaling the alternative-specific constants leads to complications in estimation if there is a significant fraction of the population that always chooses the same alternative. In our application, scaling the ASCs exacerbated autocorrelation in the MCMC chains. In fact, a substantial proportion of the sample (109 of 791 respondents) chose the Both alternative for all questions. Therefore, the ASCs were left un-scaled so as to facilitate model convergence.
The second consideration is generally cited as one reason why the cost parameter in utility-space specifications is modeled as a fixed parameter in the population. It is often the case that a fully random specification is empirically intractable. We found this to be the case for our data as well and, hence, at least one parameter is specified as a fixed parameter in all three estimations.
The final taste heterogeneity model is formulated as follows: where V njt denotes the value function of individual n for alternative j in choice situation t. x njt is the vector of contract attributes and β n are the individual-specific marginal WTP estimates. Upon inspection of the MCMC chains, there was strong evidence that the hyperparameters are highly correlated. Survey responses support this finding: a large majority of respondents exhibited homogeneous preferences with regard to the contract attributes. Based on this information, we aimed to test the hypothesis that differences in behavior for this sample can best be described by scale heterogeneity alone. The value function for this second model is Unfortunately, because the scale parameter is inextricably linked to the ASCs, it was not possible to restrict the ASCs to be fixed in the population while allowing for scale heterogeneity. This is best exemplified by examining a choice of Both contracts in a choice situation. By choosing both contracts, an individual is effectively indicating that she is not willing to make any trade-offs among attributes. Therefore, the influence of the attribute characteristics themselves in the value function decreases. This is achieved by a lower value for the scale parameter.
The third model incorporates both taste and scale heterogeneity. However, because specifying a fully random model was empirically infeasible, at least one parameter had to be fixed in the population. There was not sufficient response heterogeneity to accommodate random parameters for both the View and Tour attributes. Therefore, these were specified as fixed parameters.
$$V_{njt} = \alpha_n \, \mathit{ASCBOTH}_{njt} + \gamma_n \, \mathit{ASCNO}_{njt} + \sigma_n \beta_{n,Acres} \, \mathit{Acres}_{njt} + \sigma_n \beta_{n,Restore} \, \mathit{Restore}_{njt}$$
We utilize the three models in order to determine how much of the heterogeneity in response can be accounted for by scale alone and how much incorporating taste heterogeneity adds to the explanatory power of the model.

The Model and Estimation
The hierarchical Bayes (HB) estimation procedure was chosen for several reasons. First, our research draws heavily on outcomes at the level of the individual. HB incorporates these calculations into the overall estimation quite efficiently. In contrast, classical methods treat individual-specific estimates as an afterthought. Fiebig et al. [96] point out that an advantage of HB is that specify- Therefore, estimating a model that restricts the covariance matrix to be diagonal would be a serious misspecification. Bayesian estimation incorporates full covariance matrices far better than classical estimation [98]. Das et al. [91] find that accounting for correlated coefficients in a model estimated by classical methods slows down calculation considerably. Finally, we recognized the limitation in data per person: the number of attributes under consideration was just short of the number of choice situations faced by the individual. Allenby and Rossi [99] point out that hierarchical Bayes methods are particularly well-suited for data that consist of many individuals with relatively sparse information per unit of analysis, or "short" panels.

The Mechanics of HB Estimation
Bayesian estimation begins with the assumption that we can combine our expectations with our observations to update our beliefs about the world. We represent our expectations independent of our observations by prior distributions on the parameters to be measured. What we observe is represented with a likelihood function.
These two combine to produce the posterior distribution of the parameters, which is a weighted average of the two. Formally,
For the HB estimation of the WTP-space model with fixed and random parameters, we can represent this relationship as:
where b is the vector of means of the population-level parameters, W is the variance-covariance matrix, $\beta_n$ represents the individual-specific estimates of b, Y are the observed choices, $\psi(\beta_n \mid b, W)$ and $k(b, W)$ are the prior densities on the parameters, and $L(y_n \mid \beta_n)$ is the likelihood function for each person. The likelihood function for this model is
where x is a vector of attributes with random coefficients, z is a vector of fixed coefficient attributes, ASC is a vector of alternative-specific constants, c is the cost of the contract, and $\sigma_i = 1/\theta_i$ is the individual-specific estimate of the scale parameter.
There are several considerations with regard to the estimation of HB models. First, the researcher must specify the prior distribution of the parameters. We tested the common distributions, including normal, lognormal, and truncated normal, but found that, excepting the scale parameter, the diffuse normal distribution suited our needs.
Apart from the attitudinal survey distributed with the CE experiment, no prior information was available regarding values for the attributes of the contracts. While responses indicated that all attributes would be positively valued by community members, we did not want to preclude the possibility of protest votes. Therefore, to permit the possibility of positive and negative valuations, the normal distribution was chosen.
It is, however, customary to specify the prior on the scale parameter to be lognormal because, being tied to the standard deviation of the error term, it cannot logically be negative.
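For reference, the posterior relationship described above is commonly written in the following form in the hierarchical Bayes literature. This is a sketch assembled from the symbols already defined in the text (b, W, β n, Y, ψ, k, and L), not a reproduction of the original displayed equation. The symbol K for the joint posterior and the product over respondents n = 1, ..., N are assumptions introduced here for illustration.

```latex
K(b, W, \beta_n \ \forall n \mid Y) \;\propto\; \prod_{n=1}^{N} L(y_n \mid \beta_n)\, \psi(\beta_n \mid b, W)\, k(b, W)
```

Read from right to left, the expression multiplies the prior on the population-level parameters, the population-level density of each individual's coefficients, and each individual's likelihood contribution.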
Balcombe, Chalak, and Fraser [92] have tested both random-walk and importance sampling M-H algorithms with varying degrees of success in obtaining convergence.
Allenby and Rossi [99] point out that the random-walk algorithm works well with short panels, that is, panels with relatively few observations per person. We followed Train and Sonnier [100] and implemented the random-walk algorithm. The second consideration in estimation regards the details of the mechanics of the estimation. The Gibbs sampler converges to draws of the posterior distribution with enough iterations. It is common practice to discard the initial draws as 'burn-in' [101]. Bayesian discrete choice models in preference space [100]. All models are implemented using the R statistical package. The output of the analysis is passed to the BOA package in R for analysis of the MCMC chains.
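The random-walk Metropolis-Hastings step and the burn-in convention described above can be illustrated with a minimal sketch. It is written in Python for compactness and is not the R implementation used in this study. It assumes a single respondent, a single coefficient, a binary logit likelihood, and a diffuse normal prior; the function names, step size, prior standard deviation, and data are all hypothetical.

```python
import math
import random


def log_sigmoid(v):
    """Numerically stable log of the logistic function."""
    if v >= 0:
        return -math.log1p(math.exp(-v))
    return v - math.log1p(math.exp(v))


def log_posterior(beta, choices, x_diff, prior_mean=0.0, prior_sd=10.0):
    """Log of (binary logit likelihood x normal prior) for one respondent.

    choices[t] is 1 if alternative A was chosen in situation t, else 0.
    x_diff[t] is the attribute difference (A minus B) in situation t.
    """
    log_lik = 0.0
    for y, dx in zip(choices, x_diff):
        v = beta * dx
        log_lik += log_sigmoid(v) if y == 1 else log_sigmoid(-v)
    log_prior = -0.5 * ((beta - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def random_walk_mh(choices, x_diff, n_iter=5000, burn_in=1000, step=0.5, seed=1):
    """Random-walk Metropolis-Hastings draws for a single coefficient."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, choices, x_diff)
    draws, accepted = [], 0
    for i in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal, choices, x_diff)
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        if math.log(u) < candidate - current:  # accept with prob min(1, ratio)
            beta, current = proposal, candidate
            accepted += 1
        if i >= burn_in:  # keep only post-burn-in draws
            draws.append(beta)
    return draws, accepted / n_iter


# Hypothetical short panel: five choice situations for one respondent.
draws, acceptance_rate = random_walk_mh([1, 1, 0, 1, 0], [1.0, 2.0, -1.0, 0.5, 1.5])
posterior_mean = sum(draws) / len(draws)
```

In a full hierarchical model, a step like this would be nested inside a Gibbs sampler that also updates the population mean and covariance; the sketch isolates only the individual-level update.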

Results
The contract attributes in this survey may all be considered "goods" and positively related to the health of the Bobolink species and/or consistent with supporting farmland in the community. Survey responses indicated that both of these objectives were important to respondents. We therefore expected that individual-specific parameter estimates as well as the population-level estimates of the farm-wildlife contract attributes would be highly correlated and positive.
Across the three models, all parameter estimates have the expected signs (Table 2.3). The mean willingness-to-pay per acre of protected Bobolink habitat was higher for models in which the parameter was permitted to vary across respondents. This

Root Likelihood
Because In addition, having some history of ordering items through the mail raises the consis- PM. There were approximately twice as many respondents in the treatment that received the reference mechanism (Group 1) as in the treatment that received none ( Within the group that did not receive the referendum questions, the PP/MBG has markedly low scale parameters. However, as mentioned above, this group likely in- The proportional rebate mechanism does not perform significantly differently from the pivotal mechanism. At first glance, it appears that the pivotal mechanism is not in fact being hindered by its complexity (Hypothesis 2) as, in both main groups, its scale parameter is relatively high. In addition, there is some weak evidence that the uniform price auction yields lower scale parameters than the proportional rebate and pivotal mechanisms. At the very least, the results imply that the UPA performs no better than other mechanisms at achieving choice consistency. This provides some basis for rejection of our third hypothesis. That is, there do not seem to be gains in choice consistency from couching the exchange in a manner that is more comparable to a private market transaction. However, as mentioned above, we note that we did not find a significant difference in scale within the two main groups.
In order to evaluate whether the different elicitation mechanisms have an effect on free-riding behavior, we use individual-level willingness-to-pay values (Table 2.6).
The average WTP was higher for the Group 1 respondents. This finding does not necessarily indicate that making the respondent more aware of the mechanism has the intended impact of reducing free-riding (thereby inducing respondents to reveal their true, higher values). It may simply be the case that the referendum questions are pulling up the estimates of WTP.
There was no significant difference in either scale or WTP across elicitation mechanisms within each main group. However, while not significant, we note that, of the Group 2 treatments, the pivotal mechanism had the lowest contract WTP. Thus, there is weak evidence that the PM mechanism is failing to induce respondents to reveal their true valuations when a reference mechanism is not administered. However, by comparing each mechanism individually across Groups 1 and 2, we get a sense of just how much changing the payment rules makes individuals more responsive to the mechanism. For instance, we cannot reject the null hypothesis of equality of means in scale parameter across groups for individuals who were administered the UPA mechanism and the PP/MBG mechanism. For these mechanisms, applying a reference mechanism did not induce a change in behavior, presumably because the "rules of the game" did not change sufficiently to warrant a change in strategy. There was, however, a marked difference between groups for both the PR and PM mechanisms.
The PM, the only incentive-compatible mechanism tested in this experiment, had the highest mean scale parameter when administered without a reference mechanism (0.064). The difference across the two groups for the PM was 0.02, 2.5 times the difference across groups for the PR, the only other statistically significant difference (0.008). This finding suggests that much of the difference in scale between the two groups is driven by the difference in the PM vs. SP/PM. That is, individuals appear to be reacting more to the change in mechanism from referendum to PM than to other mechanisms.
The difference in average WTP across the two groups by mechanism is even more revealing (Table 2.6). The pivotal mechanism is the only mechanism with a significant difference in WTP across the two groups. In addition, the PM produced the lowest values for WTP in the Group 2 treatment but the highest values in the Group 1 treatment. We therefore conclude that the increase in WTP is plausibly due to respondents reacting to the mechanism by more truthfully revealing their values. Thus, providing a reference mechanism may have the effect of inducing the respondent to behave in a manner more aligned with behavior in a laboratory setting.

Summary and Conclusions
Constructing a market for ecosystem services involves designing a market for public goods which suffer from free-riding. Generally, the first step in designing markets for new products involves surveying respondents about their preferences for the attributes of the new product. The Jamestown choice experiment was designed to test different payment elicitation mechanisms and their ability to mitigate free-riding behavior in a local market for wildlife habitat protection.

Note: The five questions from the CE experiment included in this analysis presented two contracts side by side with varying levels of the above attributes. Participants were asked to choose their preferred contract, both, or none. High Bobolink was excluded from the analysis because we were not able to identify its influence independently of the Acres attribute. Source: [76]

Introduction
Research in environmental and natural resource valuation relies heavily on hypothetical survey data to estimate values for public goods [103,104,105,106,107]. The advantage of the stated preference approach is that the researcher can construct hypothetical scenarios and define their attributes and levels [108]. In contrast, revealed preference experiments confine the researcher to the realm of realized behavior.
Often, however, results from hypothetical surveys have not been satisfactorily reflective of observed behavior. Data from stated preference experiments tend to overestimate actual demand, particularly in the case of public goods valuation [109,110,111,112,113]. This phenomenon is generally termed "hypothetical bias" and has been linked to: subject pool variety, differences in information provided across experiments, social norms, and whether willingness-to-pay vs. willingness-to-accept is being measured [114]. However, in light of the fact that market-based instruments for valuing ecosystem services are on the rise [115,116,117,118,119], demand for the kind of information that stated preference methods alone can offer is steadily rising.
Inferring values from stated preference surveys that suffer from hypothetical bias may induce policy-makers to set policy objectives at levels that will result in inefficient outcomes.
Several studies have found significant differences between stated and revealed values for goods and services that are derived from the environment. Aadland and Caplan [120] compared stated and revealed preferences for curbside recycling programs. Brooks and Lusk [121] compared survey responses to scanner data on sales of organic and rBST-free milk. Champ and Bishop [122] utilized certainty scales to identify hypothetical bias in response to questions regarding the voluntary purchase of wind-generated energy for a period of one year. Murphy and colleagues [111] evaluated hypothetical payments and binding offers to contribute voluntarily to the Massachusetts Chapter of The Nature Conservancy using a cheap-talk script to mitigate hypothetical bias.
Concurrently, there is mounting evidence of the importance of accommodating attribute processing heterogeneity such as ignoring or 'non-attendance' to one or more features of the good and lexicographic preferences in stated and revealed preference data analysis [123,124,125,126,127,107]. The recent literature recognizes that individuals responding to surveys often employ simplifying strategies that are not consistent with conventional random utility maximization. I posit that attribute processing rules may explain some of the observed hypothetical bias in data from surveys concerning values for public goods. The objective of this study is to identify differences between stated and revealed payments by drawing upon the latest research on attribute processing rules (APRs) in order to examine the extent to which APRs can be useful for identifying sources of hypothetical bias.
From a disaggregated perspective, I identify response strategies and test their impacts on estimated values for agricultural ecosystem services. Using a latent class model that incorporates APRs, I am able to identify strategies in responses such as 'yea-saying' [114], attribute non-attendance (ANA) [123], and lexicography on particular attributes [127] in order to make inferences about how these dynamics affect measurements at the aggregate level. These issues are particularly relevant to valuation of ecosystem services for several reasons. First, because of the non-excludable nature of public goods and services, as long as enough support is generated, non-payers cannot be precluded from consumption of the good. Thus, in a hypothetical situation, it is advantageous to send a positive signal. In addition, because such goods are not traded in conventional markets, assigning economic values for them can prove cognitively arduous. Thus, employing simplifying heuristics when making choices might be more
Both CE and market experiments were administered to the same sample of individuals in a small community in rural Rhode Island. Thus, differences in subject pool (e.g., university students vs. grocery store shoppers) can be ruled out. In addition, a substantial amount of demographic and attitudinal information was collected regarding the respondents. This information is used to make assessments of feasible processing strategies. In contrast to most studies using stated and revealed preference data, this project conducted a choice experiment before the market experiment.
Finally, care was taken to engineer the market good and the mechanism of exchange to be as consistent with the hypothetical choices as possible. Moreover, nearly identical elicitation mechanisms, including the pivotal mechanism and provision point mechanisms, were administered in order to reduce free-riding. This way, I am able to examine the performance of different elicitation mechanisms to address free-ridership as a separate issue.
The paper proceeds as follows. Section 3.2 describes the ecosystem service under analysis and provides a brief review of the CE and RP experiments and the hypotheses to be tested. Section 3.3 describes the method used to combine the data. Section 3.4 presents results and section 3.5 provides a summary and conclusion.

Constructing a local market for wildlife preservation
The empirical application involved establishing a market for wildlife protection marketable to the members of the rural communities surrounding local farmland.
Every spring, hay farms on Jamestown, Rhode Island serve as nesting grounds for a species of ground nesting birds with a large migratory range and charismatic song called the Bobolink (Dolichonyx oryzivorus). Historically, hay fields in many U.S.
states have been in decline as preference is given to other crops. In addition, crops are now cut 2-3 weeks earlier than they were historically (since the 1940s and 1950s) [128].
This shift in cropping practices has led to serious mortality for Bobolink fledglings.
There were 5 questions comparing two potential contracts and a sixth question with one potential contract. The sixth response was not utilized in this analysis. Each contract was described by a list of attributes (see Table 3.1). There were six attributes described: 1. the acreage under contract to delay harvest (Acres), 2. the number of acres to restore to active farmland (Restore), 3. whether the acreage was found to have a high or low concentration of Bobolink (HighBobolink), 4. whether the contracted acreage is viewable from the road (View), 5. whether or not a birdwalk is offered (Tour), and 6. the cost of implementing the contract. Respondents were presented with two competing contracts displayed side by side. Individuals were then asked whether they would choose contract A, contract B, both, or neither. The option to choose Both contracts is a novel feature of the SP survey that permits identification of yea-sayers in the sample. A full description of the survey design and implementation can be found in Uchida et al. [76].
The SP mailing comprised five sections in total. In addition to the CE task outlined above, there were three additional sections that elicited opinions with respect to values of farmland amenities, rural character preservation, community attachment, and the importance of fairness in payment for services provided by farmland amenities. This last line of questioning was meant to assess the impact of different elicitation mechanisms on the decision to participate in the market. Several public goods payment mechanisms were administered in order to test their effectiveness in the field at mitigating free-riding. A summary of the relevant findings of the attitudinal sections is listed in Table 3 The SP and RP treatments were designed to be as consistent as possible.
However, there were some differences across surveys. A comparison of the attributes across both phases of the project is listed in Table 3 The second issue with this subset of the SP respondents is the actual signal that is being sent by adopting a "yea-saying" strategy. Caudill and colleagues [109] present evidence that yea-sayers come in two varieties: some of the respondents are truly more interested in and willing to pay for ecosystem preservation, but others may not ultimately be willing to pay the stated amount. This second group may simply be sending a signal that farmland amenities are important to them without expending the mental effort to assess whether they would actually be willing to pay the stated amount. Because of the hypothetical nature of the survey, there is no consequence to this type of behavior, which is generally termed "hypothetical bias".
Overall, individuals who returned the CE survey were more likely to return the market experiment solicitation, make an offer, and offer higher bids. The subset of yea-sayers, in fact, has even higher participation and offers (with the exception of the 2008 treatment). However, the vast majority of respondents to the CE experiment stated that they would participate in at least one of the scenarios offered them. Given this fact, the participation rate for these individuals is lower than one would expect.
There were two possible reasons for lower participation rates in the 2008 treatment of the market experiment. First, the recession by then was fully in place and, second, there had been some controversy involving the land trust's efforts to purchase conservation easements on three farms in Jamestown. This may have had the effect of generally reducing confidence in projects to support farmland. A summary of market experiment participation separated by year and by major group (SP respondents, SP yea-sayers, and RP response only) is listed in Table 3.4. Based on this information, I test several hypotheses.
First, I exploit techniques from burgeoning research on attribute non-attendance (ANA) to test outcomes for individuals who had low sensitivity to the contract attributes that did not transfer to the market experiment. Attribute non-attendance involves ignoring one or more attributes when comparing alternatives in a choice scenario. I model attribute non-attendance to the Restore and Tour attributes and hypothesize that these estimates of stated WTP are in a sense more reliable because they assume that the decision process is more aligned with the RP scenario.
Hypothesis 1: Non-Attendance to attributes that were not included in the market experiment yields more consistent estimates of WTP.
Next, I examine the market behavior of yea-sayers in this context. Because of the unique presence of the Both option, yea-saying is easily detectable in the SP application. Based on the discussion above, I examine the behavior of this class of individuals in the market in order to determine the extent of hypothetical bias inherent in yea-saying. I then compare the incidence of hypothetical bias in this group against the rest of the sample.
Hypothesis 2: There is both higher incidence of hypothetical bias and higher revealed payments linked to yea-saying behavior.

The overarching goal of this analysis is to explore whether accounting for attribute processing rules impacts the predictive validity of SP data. While the SP survey required substantially more cognitive effort and costs in terms of investment of time, the RP survey traded these costs for actual monetary commitments. Therefore, while both experiments involved costs, they were of different types. Indeed, Hensher and Greene [126] suggest a link between attribute processing rules and hypothetical bias, implying that failure to accommodate APRs might significantly contribute to what has been termed hypothetical bias in the literature. Thus, I examine a model of response that accounts for preference heterogeneity alone against one that incorporates attribute processing rules. I hypothesize that models that incorporate APRs outperform models based on random utility maximization in terms of both model fit and predictive validity.

Hypothesis 3: Behavioral outputs of SP measures that account for APRs predict payments in experiments involving real payments with greater accuracy than models that account for preference heterogeneity alone.

Methods
The analysis proceeds in four steps. First, two competing models of SP response are estimated: a latent class model (LC) which incorporates taste heterogeneity and a latent class model that also accommodates attribute processing rules (LC-APR).
Then, in order to make inferences about how certain processing strategies manifest in revealed preference experiments, the individual-specific conditional probabilities of class membership from the LC-APR model are utilized in a model of market participation for the 2007 and 2008 market experiments. Conditional on market participation, an offer equation is used to examine differences in offer amounts by respondent type, controlling for contract and demographic covariates. Specific attention is paid to classes that model non-attendance to attributes that were omitted from the RP experiment. I test whether there is evidence that non-attendance to the Restore and Tour attributes leads to more reliable estimates of market outcomes. Finally, predictive validity of the LC and LC-APR models is compared.

The LCL Model
In order to identify strategic behaviors that might violate the assumptions of neoclassical utility maximization and evaluate the predictive performance of a model that incorporates these strategic behaviors, I utilize an approach that is commonly used in the absence of direct survey queries about response rules. The latent class logit (LCL) model with restrictions for APRs is particularly useful for this kind of analysis. The LCL model has been used both to explore patterns of attribute non-attendance and other violations of continuous preference ordering as well as modeling non-parametric preference heterogeneity. There is by now a substantial literature which uses the LCL model to identify attribute processing rules (APR), especially in the absence of self-reported non-attendance [123,124,125,126,127]. Most cite improvements in model fit and more realistic estimates of WTP when attribute processing strategies are incorporated in this manner. The LC model is widely exploited in marketing and transportation studies but has recently been used in cases of public goods valuation.
Several studies cite the importance of accommodating APRs in choice modelling and there is growing evidence that modeling APRs improves model fit and leads to estimates of marginal WTP that are more consistent [107,28,125,124]. If an attribute is ignored, then relative trade-offs that involve that attribute are not meaningful.
That is, no increase/decrease in the ignored attribute compensates for a change in an attended attribute. This is particularly concerning if the attribute being ignored is the Cost attribute as WTP estimates cannot be calculated.
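A short sketch illustrates why non-attendance to Cost is the most troublesome case. It assumes a preference-space specification in which marginal WTP is the negative ratio of an attribute coefficient to the cost coefficient; the function name and the numerical values are hypothetical and are not estimates from this study.

```python
def marginal_wtp(beta_attr, beta_cost):
    """Marginal WTP as the negative ratio of attribute to cost coefficients.

    Returns None when the cost coefficient is restricted to zero
    (cost non-attendance), because the ratio is then undefined.
    """
    if beta_cost == 0:
        return None
    return -beta_attr / beta_cost


# Attended attribute and attended cost: a well-defined trade-off.
wtp_attended = marginal_wtp(0.5, -0.25)

# Ignored attribute (coefficient restricted to zero): WTP is zero.
wtp_ignored_attr = marginal_wtp(0.0, -0.25)

# Ignored cost: no finite WTP can be computed.
wtp_ignored_cost = marginal_wtp(0.5, 0.0)
```

An ignored non-cost attribute simply contributes a zero marginal value, whereas an ignored Cost attribute removes the denominator on which every WTP estimate depends.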
The first-stage model combines Train's [129] Expectation Maximization algorithm for nonparametric estimation of the random parameter latent class logit model with Hess et al.'s [130] expansion of attribute non-attendance for heterogeneous taste variation. I model non-attendance to key CE survey attributes that were omitted in the market experiment: acres of restored farmland and invitation to a bird walk. In addition, I aim to catch and contain the yea-sayers in the sample, whose insensitivity to contract price would otherwise inflate marginal values for all other attributes.
Attribute non-attendance is expected to be a significant problem with this particular type of choice task since respondents are not likely to be familiar with the ecosystem service for offer and thus may make unforeseen assessments of the true meaning of the attributes of the contract, or to decide that a particular attribute is too cryptic to assess a value for.
The Expectation-Maximization algorithm applied to latent class modelling has been utilized by Train [129] as a form of non-parametric estimation of underlying taste heterogeneity, whereby the true underlying distribution is approximated by a discrete distribution whose accuracy rises with the number of parameters. This is an extension of Bhat [131], where increasing the number of classes allows for better approximation of taste heterogeneity. Several authors have noted advantages of the LC model over the popular Mixed Logit Model in capturing taste heterogeneity [132,133,130].
The latent class specification proceeds as follows. Given the standard choice mod- Model fit is generally assessed by minimizing an information criterion such as AIC, BIC, or CAIC [123,107,134]. If the information criteria do not agree on which model is preferred, the researcher must choose based on examination of standard errors and plausibility of parameter signs. Because the EM algorithm does not maximize the likelihood function directly, special attention must be paid to assessing whether a local or a global maximum has been attained. This is achieved by testing several starting points to ensure that a global maximum has been obtained. Each candidate model was estimated from fifty random starting points. When the number of classes was relatively small, variation in BIC was relatively low from one estimation to the next. The variance in BIC rose with the number of classes. From previous analysis, it is quite likely that preferences for the contract attributes are highly correlated. Hess et al. [130] point out that latent class models incorporate this correlation inherently through class membership probabilities.
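The model-selection step described above can be sketched as follows, using the standard definitions of AIC, BIC, and CAIC and a simple rule for keeping the best of several random starts. The function names and the numerical inputs are hypothetical, and whether the sample size should count respondents or individual choices is a modelling decision that this sketch leaves to the user.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Standard information criteria; lower values indicate a preferred model."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    caic = -2.0 * log_lik + n_params * (math.log(n_obs) + 1.0)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}


def best_of_starts(fits):
    """Keep the run with the highest log-likelihood among random starts.

    fits is a list of (log_likelihood, label) pairs, one per starting point.
    """
    return max(fits, key=lambda fit: fit[0])


# Hypothetical log-likelihoods from three random starts of one candidate model.
runs = [(-1012.4, "start 1"), (-1000.0, "start 2"), (-1005.9, "start 3")]
best_ll, best_label = best_of_starts(runs)
criteria = information_criteria(best_ll, n_params=10, n_obs=791)
```

Because CAIC adds one extra unit of penalty per parameter relative to BIC, the two criteria can disagree when candidate models differ mainly in the number of classes.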
The LC-APR model used in this analysis is fundamentally different from conventional LC models in which classes are not representative of specific behaviors [123].
For this reason, Hensher et al. [125] refer to the model as a "probabilistic decision process model" whereby the class membership probabilities represent the probability of a typical respondent exhibiting the behavior modeled in a class. Therefore, the model specification search is undertaken in a different manner. The assumptions about processing strategy are outlined first. That is, the model structure is defined.
Then, the appropriate restrictions are imposed on each class defining the response strategy and then the model is estimated. In this case, the primary response behaviors of interest are non-attendance to the attributes that were left out of the market experiment and yea-saying. to me as part of Jamestown" and, "It is important to me that I can view birds and other wildlife when I walk near farms". Nearly 97% of respondents indicated that they agreed or strongly agreed that open space in agricultural use was an important feature of their community. Undeveloped woodland was also found to be important to most of the respondents. However, the responses were mixed with regard to whether maintaining remaining agricultural landscapes was more or less important than maintaining undeveloped woodland. Participants were also asked whether they would join an expert-led bird walk if invited. There was a mix of responses to this question. A summary of these findings can be found in Table 3.2. Overall, the attitudinal findings imply that non-attendance to the Restore and Tour attributes is a distinct possibility in the data.
The convention with regard to using latent class models with APR restrictions is to first identify candidate APRs so that they can be tested for inclusion. In many cases, the parameters are constrained to be equal across classes and attribute non-attendance is specified by restricting the parameter to equal zero in a particular class.
The rationale for constraining parameters across classes is to focus on attribute non-attendance without concern for preference and scale heterogeneity. I am interested in accommodating taste heterogeneity as well as process heterogeneity. However, doing so complicates the analysis considerably, as the number of combinations of potential behaviors rises exponentially. Thus, to simplify the analysis, I focus on non-attendance to attributes that were included in the SP experiment but left out of the RP treatments.
While doing so facilitates tests of Hypothesis 1 above, it is also somewhat reinforced by the attitudinal findings listed above. That is, there is some qualitative evidence that individuals would not participate in a guided birdwalk if offered, and that some respondents might be ambivalent with regard to restoring fallow land to active cultivation. In addition, I let the taste heterogeneity be guided by evidence from the full-attendance classes. That is, I first tested one class for each type of APR. A layer of taste heterogeneity was added by increasing the number of full attendance classes until the lowest information criteria were obtained. I then added a layer of taste heterogeneity on to the ANA classes by assuming two of each ANA class. An outline of this process can be found in Table 3.5. The Cost ANA and All attribute ANA were confined to one class each for all specifications. The reason for not testing higher dimensions on these APRs was that constructing estimates of WTP is inconvenient for the All ANA class and practically impossible for the Cost ANA class.
For comparison against a model without APRs, I perform the classical LC model specification search by testing up to thirteen unrestricted classes. The unrestricted model is used as a baseline for comparison of performance against a model that incorporates APRs.

Constructing Individual-level WTP estimates
In order to assess the predictive validity of the LC-APR model, it was necessary to construct individual estimates of WTP for the contract that was presented to the individuals for each year of the RP market experiment. Individual-specific conditional probabilities of class membership were used to weight the within-class parameters, which were then applied to the standard calculation of WTP based on Hanemann's formula [135]. The procedure is outlined below.
Given the K x C class parameter estimates, I estimated the conditional probabilities of each individual belonging to each of the classes. I use these conditional probabilities as weights on the class parameters to construct the marginal utilities for each individual [132]. I then combined the estimated parameters with the RP con- I utilize within and across class variance-covariance to construct individual-specific estimates of WTP as follows:
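A minimal sketch of the weighting step, under stated assumptions, is given here for orientation. It assumes that the conditional membership probabilities follow from Bayes' rule applied to prior class shares and class-specific likelihoods of each individual's observed choices, and that marginal WTP is the negative ratio of an attribute parameter to the cost parameter. The function names and numbers are hypothetical, and the variance calculation is not covered.

```python
import numpy as np

def conditional_class_probs(prior, lik):
    """Bayes' rule: P(class c | choices of n) is proportional to prior[n, c] * lik[n, c]."""
    joint = prior * lik
    return joint / joint.sum(axis=1, keepdims=True)

def individual_wtp(post, beta, cost_idx):
    """Weight class parameters by membership probabilities, then form WTP ratios.

    post: (N, C) conditional class membership probabilities
    beta: (K, C) class-specific parameter estimates, including the cost parameter
    """
    beta_n = post @ beta.T                     # (N, K) individual-level marginal utilities
    cost = beta_n[:, cost_idx]
    attrs = np.delete(beta_n, cost_idx, axis=1)
    return -attrs / cost[:, None]              # marginal WTP = -beta_k / beta_cost

# Hypothetical inputs: two individuals, two classes, one attribute plus cost.
prior = np.array([[0.5, 0.5], [0.5, 0.5]])
lik = np.array([[0.2, 0.6], [0.9, 0.1]])
beta = np.array([[1.0, 3.0],     # attribute parameter in each class
                 [-0.5, -1.0]])  # cost parameter in each class
post = conditional_class_probs(prior, lik)
wtp = individual_wtp(post, beta, cost_idx=1)
```

Because the ratio divides by the weighted cost parameter, an individual whose probability mass sits in a class with a cost parameter near zero will produce very large positive or negative values in this sketch.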

Descriptive Statistics and Class Allocation
Upon preliminary examination of the SP experiment data itself, a few clear strategies were apparent. For instance, the yea-sayers (those who answered that they would purchase both contracts for all questions) comprised a substantial share of the overall respondents (13.8%). There was also evidence that many individuals were simply choosing the lowest cost option (10.7%). These qualitative findings were used to examine the performance of the LC-APR model.

The final LC-APR model was arrived at after methodical testing. The best model in the specification search, based first on BIC and then on the highest average maximum conditional class membership probability, had five classes: four restricted classes and one full attendance class (Table 3.6). The final five-class LC-APR model included the following APRs: Restore and Tour ANA, Tour ANA, All ANA, Cost ANA, and one full attendance (unrestricted) class.
The LC-APR model results suggest that full attendance was not a majority strategy (Table 3.7). Individuals are assigned to classes based on highest conditional probability of membership. There were 173 respondents (22%) for whom full attribute attendance was the best fit. All attributes in the full attendance, Restore/Tour attribute non-attendance, and Tour non-attendance classes are significant and have the expected signs. The All attribute non-attendance class and Cost ANA class parameters are insignificant. Individuals who fit these classes with high conditional probability act in a way that makes it difficult for preference trade-offs to be calculated with accuracy.
Individuals who chose both contracts for all choice experiment questions (the yea-sayers) fit the Cost ANA class with extremely high conditional probability (at least 90%). The remaining seven individuals in the Cost ANA class made only one choice that was not the Both alternative. The unfortunate drawback of this class is that constructing willingness to pay estimates involves division by the cost parameter, which is restricted to zero. Therefore, I found that the estimates of WTP for these individuals, using the method described in Equation 3.1, produced wildly high negative or positive values because the cost parameter at the individual level was exceedingly close to zero and equally likely to fall on either side of it. In addition, confidence intervals around these estimates ranged in the thousands.
Following some authors, I included a class that represented full attribute non-attendance (All ANA) to capture idiosyncratic behavior. There were 82 respondents who fit this classification best. A portion of these respondents were found to have extreme reactions to a change from the referendum vote to one of the elicitation mechanisms. Fourteen of the respondents (17%) were "protest votes"; that is, they chose the No Buy option for all choice occasions. The remainder of the All ANA class exhibited seemingly random response behavior. The parameter estimate for the Cost parameter for this class is positive and very close to zero, leading the average WTP to be negative and drastically large.
There were 420 individuals who fit the Restore/Tour ANA and Tour ANA classes best. Recall that the primary goal of modeling a Restore/Tour ANA class is to test whether individuals who were relegated to this class made choices that were more consistent in the market experiment, which did not include these contract features. This class had a high positive value on the View attribute relative to the other classes, perhaps implying lexicographic preferences for this contract characteristic and indicating that a segment of the population is highly concerned with preserving the aesthetics of unharvested farmland. An alternative interpretation is that, if a parcel can be viewed from the road, then ensuring that farmers uphold the contract is possible. Monitoring compliance on a parcel that cannot be seen from a road would be difficult.
In the membership model of the LC-APR, membership to this class was comprised of individuals who had high values of the Equality variable ( For comparison, an unrestricted latent class model was estimated (Table 3.

Market Experiment Outcomes

Participation
The Random Effects Probit model of market participation (Table 3. Of the contract attributes found to influence participation, only the log of the minimum amount of the solicitation was found to be significant. As in Swallow et al. [136], I acknowledge that this implies that at least some of the individuals did open the solicitation and look at it before deciding not to return the mailing. None of the effects-coded variables for mechanism treatment were found to contribute to the decision to participate in the market experiment.
In order to investigate participation given the first stage SP results, I included the  times the amount of hypothetical bias exhibited by the full attendance class which had the next highest proportion. Therefore, I find some support for Hypothesis 2, that yea-sayers reveal a high level of hypothetical bias but also exhibit high valuation otherwise.

Summary and Conclusions
Stated choice experiments are good tools for assessing the nature of potential demand for a new product. The strength of the discrete choice experiment lies in the ability of the researcher to generate data that contains sufficient variation so as to best infer the nature of the trade-offs among attributes that potential market participants will make. However, unlike new product development in the realm of tradable goods, a significant challenge to the success of eliciting preferences for nonmarket goods is inducing respondents to reveal their true values for a nonexclusive public good.
Behavioral economists and choice modelling researchers have found mounting evidence that the neoclassical model of utility maximization does not always succeed in characterizing economic behavior. This manuscript finds that attribute processing strategies have an effect on valuation in CE surveys and that these processing strategies have implications for market participation and contribution in this unique dataset. It has been suggested that accounting for such strategies may help alleviate some of the well-documented differences in measured stated vs. revealed values, especially for public goods.
The study outlined in this paper offered a unique means by which to examine differences between revealed and stated preferences for two reasons. First, the CE survey included an option to purchase both contracts, thereby releasing the respondent from the need to make any trade-offs. Individuals who were inclined to choose this option were found to be of two types: respondents with high values for the contracts and their attributes, and respondents who exhibited hypothetical bias. By comparing the behavior of these respondents in the subsequent market experiment, I was able to clearly observe the level of HB resulting from this strategy of hypothetical response.
My findings indicate that yea-sayers may be expected to exhibit high contribution levels, but overall, were not found to participate to a larger extent than other respondents. This implies that some means of partitioning this group into individuals whose actual value for the good is high and those who are engaging in yea-saying would assist in determining how much weight to assign these responses when making inferences about public goods values.
In addition, the CE survey involved the presentation to respondents of a 'good' that was not previously available to the community. Rather than testing a new dimension of an already existing consumption good (e.g., an added label on an existing carton of milk or a new transportation option), the survey involved assigning value to a new product with possibly unfamiliar characteristics. Therefore, applying simplifying rules to make the choice easier was a distinct possibility. Indeed, I find that a model that incorporates APRs succeeds in mitigating the upward bias of SP responses.
The method employed in this research can be useful for many applications. In gen-
Note: There were five defining attributes of each of the CE contracts. The attributes and their levels are listed above. Source: [76].
The specification search is outlined above. Each column represents the number of classes specified in each model to have the parameter restrictions that imply the non-attendance patterns listed in the first column. Boldfaced values represent the best performing information criterion.
In comparison to other studies that assess preferences for the MSC's eco-label within the United Kingdom, these results are similar in magnitude to those found for salmon products, 13.1% [64] and chilled haddock products, 10% [63]. However, as Johnston et al. [5] have pointed out, there are significant cross-cultural differences in preferences. Therefore, a comparison of our findings with findings from other markets for pollock products would produce a more comprehensive assessment of the market benefits of certification. The closest comparison in the literature is the finding of Uchida et al. [138] that consumers in Japan are willing to pay a 20% premium for certified salmon products if given information about the label. These findings may suggest that more work is needed in bringing issues of sustainability to consumer consciousness, at least for some markets. A comprehensive assessment that includes the main markets for Alaskan pollock products would yield a more accurate estimate of the price premium that the MSC's label on pollock generates.
In the second manuscript, I examined a stated preference survey administered to the inhabitants of Jamestown, Rhode Island, designed to elicit values for the preservation of grassland nesting habitat on Jamestown's farmland. The Jamestown experiment was unique in that several different public goods auction mechanisms were tested in the field. Because the environmental good had not been previously offered to the Jamestown residents, I was interested in ways of capturing uncertainty in response to the survey and how the elicitation mechanisms, which were designed to mitigate the urge to free-ride, might complicate decisions in field studies for public goods. Interestingly, while I did not find evidence that more esoteric elicitation mechanisms added noise to the participants' responses, I found some evidence that adding a familiar base mechanism prompted respondents to behave in a manner that was more consistent with findings in the laboratory with regard to these mechanisms.
The performance of voluntary contribution against provision point mechanisms has been tested in the laboratory and in the field [139,10]. As Rose et al. [10] point out, provision point mechanisms provide product definition, which has been judged by market researchers to assist participants in understanding what they will receive in return for their offers. While all of the mechanisms in the Jamestown choice experiment were administered with both provision points and money-back guarantees, this was, to my knowledge, the first time that the pivotal mechanism, which has been proven theoretically incentive compatible, was tested in the field. I found encouraging evidence that, contrary to many assertions, demand-revealing mechanisms can be used in field settings to raise contributions and reduce free-riding. But there is significant evidence to warn future researchers that individuals will ignore the elicitation mechanisms if they are not emphasized. It is unclear whether the placement of the mechanism in the middle of the choice task or the application of a familiar reference mechanism was what induced respondents to have stronger reactions to the mechanisms. It is clear, however, that emphasis must be placed on the novelty of the mechanism if it is to perform as expected in the lab, something that is akin to the researcher reading instructions aloud in the lab.
Stated preference surveys are often criticized for their tendency to suffer from hypothetical bias. Recent research has voiced concerns about the usefulness of hypothetical choices at all given this well-documented drawback [1]. Carson and Groves' seminal paper [140] addressed hypothetical bias in relation to the perception of "consequentiality" of an individual's response. According to Vossler and colleagues [141], "a survey elicitation is consequential if the respondent cares about the policies contemplated ... and further views her response as potentially influencing agency action".
It follows logically that respondents may evaluate the elicitation mechanisms as influencing the consequentiality of their responses. This seems to be built into the structure of the pivotal mechanism. If individuals view their responses as more or less consequential as a result of the mechanism administered, then a variety of levels of hypothetical bias can be expected in these data. This may have a confounding effect on our results. In fact, Swallow et al. [136] found qualitative evidence that offers were higher under the uniform price cap and proportional rebate mechanisms than under the pivotal mechanism in the revealed preference experiments of the Jamestown project. The failure of the pivotal mechanism to generate higher offers may indicate some distrust in the mechanism in practice. More work is needed to investigate these dynamics.
In the final manuscript, I utilized the findings from the stated preference survey and compared them against the subsequent two-year market experiment that was administered to the same community. Care was taken to match elicitation mechanisms as best as possible in both phases of the project. I hypothesized that respondents would be more inclined to use simplifying rules such as yea-saying and attribute nonattendance when answering hypothetical questions regarding the market. My interest was in the implications of these findings for behavior in the revealed preference market experiment. I was able to identify several processing strategies and, as expected, a higher rate of hypothetical bias in individuals who chose the highest level of provision in all choice occasions in the stated preference survey. I have shown that attribute non-attendance can be used to identify sources of hypothetical bias as a result of survey design and respondent type. I have also found that the methodology can be used to identify estimates that will yield more reliable results. This was the first study that took the results of burgeoning research on attribute non-attendance and tested their implications for realized behavior.
These findings have implications for survey and market design particularly for public goods valuation. Researchers are becoming increasingly concerned with identifying the incentive structure of stated preference methods when formulating expectations pertaining to hypothetical bias [140,2]. In many ways, the work done by Carson and Groves is reshaping our thoughts with regard to the issue. There may be a link between the attention paid to the survey and the degree of consequentiality perceived by the respondent. Cameron and DeShazo [25] and others point out that individuals may attend to some attributes more or less than others as a result of the cognitive effort of the choice task. If a respondent is found to attend to all attributes, then it can be rationally assumed that she cares more about the initiative and thus deems her responses as consequential. Indeed, I found that individuals who were likely to fully attend to all choice experiment attributes had higher revealed offers in the market experiment. However, I found only weak evidence that participants who fully attended to all attributes of the choice experiment exhibited less hypothetical bias in the revealed preference scenario. Instead, accounting for non-attendance to attributes that did not transfer to the revealed preference domain significantly reduced measured hypothetical bias. That is, our results suggest that there may be more gains to be had in terms of consistent estimates of WTP by focusing on respondents who care very little for the attributes that are not transferable to the revealed preference context.
Regardless, the results of this manuscript suggest promising new directions for the use of attribute non-attendance research in survey and new market design. Research on attribute processing strategies is likely to prove invaluable for survey design and analysis.
There were some notable limitations to this research, and I thank several conference attendees and committee members for their input regarding these issues. First, generating aggregate estimates of marginal willingness to pay using this method would require sample selection adjustment at multiple levels, first to correct for self-selection in response to the stated preference survey and then for each year of the market experiments. While we attempted several variations on corrections for sample selection bias, none were found to be satisfactory. Further research is needed to remedy this drawback. In addition, while I identify individuals who employ different processing strategies, some way of accounting for this information at the level of the population would enhance valuation estimates.
Overall, the research presented in this dissertation provides some insights into how effectively markets for private goods are being used to provide environmental goods and services and how new, direct market mechanisms for providing these amenities can be employed in the field. Markets have the potential to provide an important complement to government programs to enhance ecosystem services if efficient mechanisms to reduce free-riding behavior can be developed. Further research is needed to advance our understanding of how public goods elicitation mechanisms can be successfully transferred from the lab to the field and, if not, how adjustments can be made.

APPENDIX A Elicitation Mechanisms Employed to Mitigate Free-Riding
This section provides a verbatim transcription of the public goods elicitation mechanisms administered in both the stated and revealed preference experiments. The other mechanisms were worded as follows:

A.1 The Stated Preference Mechanism Descriptions
The Provision Point Mechanism with Proportional Rebate
We will ask you to tell us whether you would buy a share of a farm wildlife contract for the proposed amount. Each farm wildlife contract is distinguished by the five characteristics described in the previous table.
If you agree to buy a share, you pay only if the total amount committed by all the Jamestown residents is enough to cover the total contract cost.
Our guarantees:
1) If the total amount committed by all the Jamestown residents is not enough to cover the total contract cost, then we would not establish a farm wildlife contract and the farm business would be unable to change their management plans to protect grassland nesting birds like the bobolink. You would pay nothing even if you had offered to buy a share.
2) If the total amount committed by all the Jamestown residents exceeds the total contract cost, then we would give a rebate to those who offered to pay. The rebate would be in proportion to the amount each person committed.
Q&A on how this method works
To illustrate an example, we use unrealistic numbers. Suppose we ask you if you are willing to buy a contract-share by paying at most $5.
Q1. What happens if the amount committed by all the Jamestown residents is more than enough to cover the total cost of a farm wildlife contract?
Answer: Suppose you offered to buy a share for $5 and enough other residents bought shares so that we collected 30% more than needed for the contract. Then we would give you a rebate of $1.50 from your $5 offer, and we would establish a farm wildlife contract.
Q2. What happens if the amount committed by all the residents is not enough to cover the total contract cost?
Answer: We would not enter into a wildlife management contract with the farm. You would pay nothing even if you had offered to buy a contract-share.

The Uniform Price Auction
We will ask you to tell us whether you would buy a share of a farm wildlife contract for the proposed amount. Each farm wildlife contract is distinguished by the five characteristics described in the previous table.
The amount you commit would be a cap (the maximum) on how much you would pay. The amount you actually pay (the "price") would be determined after we know how much everyone else offers. The price would be the lowest dollar amount that we can identify among the residents such that we collect enough funds to pay for the farm wildlife contract if everyone who agreed to pay at least that amount would pay the same price. We would then bill you this amount. However, if not enough residents offer to buy a share, we may not be able to identify such a price. In that case, we would not establish a wildlife management contract with the farm, and the farm business would be unable to change their management plans to protect grassland nesting birds like the bobolink.
Our guarantee: If we cannot identify a "price" as described above, we would not establish a wildlife management contract. In that case, you would pay nothing even if you had committed to buy a share.
Q&A on how this method works
To illustrate an example, we use unrealistic numbers. Suppose we ask you if you are willing to buy a contract-share by paying at most $5.
Q1. What happens if I commit $5 and the "price" (the lowest amount someone is willing to pay that would meet the total contract cost if everyone who buys would pay this amount) is $2?
Answer: You get a rebate of $3 and we would bill you just $2. Everyone who agreed to buy a share for at least $2 would pay $2 and we would establish a farm wildlife contract.
Q2. What happens if I commit $5 but the "price" could not be identified?
Answer: If not enough residents offer to buy a share at a large enough cap, then a "price" may not be identified. In this case, we would not establish a farm wildlife contract. You would pay nothing even if you had offered to buy a contract-share.
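The following sketch is not part of the survey text. It illustrates one reading of the price rule described above, assuming that candidate prices are the distinct caps submitted by residents and that the price is the lowest cap at which the number of residents with caps at or above it, multiplied by that cap, covers the contract cost. The function names and figures are hypothetical.

```python
def uniform_price(caps, cost):
    """Lowest submitted cap p with p * (number of caps >= p) >= cost, else None."""
    for price in sorted(set(caps)):
        payers = sum(1 for c in caps if c >= price)
        if price * payers >= cost:
            return price
    return None  # no price can be identified; the contract is not established

def uniform_payments(caps, cost):
    """Everyone whose cap is at least the price pays the price; others pay nothing."""
    price = uniform_price(caps, cost)
    if price is None:
        return None, [0] * len(caps)
    return price, [price if c >= price else 0 for c in caps]

# Hypothetical caps (in dollars) and contract cost.
price, payments = uniform_payments([5, 5, 3, 2, 2, 1], cost=10)
```

With these hypothetical figures the price is $2, so a resident with a $5 cap is billed $2, in line with Q1 above. The sketch does not address what happens to any surplus when collections exceed the cost.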

The Pivotal Mechanism
We will ask you to tell us whether you would buy a share of a farm wildlife contract for the proposed amount. Each farm wildlife contract is distinguished by the five characteristics described in the previous table.
What you actually pay depends not only on your decision but on the total amount committed by all the other Jamestown residents.
If you offer to pay the proposed amount, and:
• if your payment is required to meet the total contract cost after we have added up all other residents' offers, then we would bill you for the entire payment. We would establish a farm wildlife contract.
• if your payment is not required to meet the total cost, i.e., the total of offers from all other residents is enough to meet the total contract cost, then you pay nothing even if you had committed to buy a share. We would establish a farm wildlife contract.
Therefore, you pay only if your payment makes the difference between meeting the total cost and not meeting the total cost.
Our guarantee: If the total amount committed by all the Jamestown residents is less than the total contract cost, we would not enter into a wildlife management contract and the farm business would be unable to change their management plans to protect grassland nesting birds like the bobolink. You would pay nothing even if you had offered to buy a share.
Q&A on how this method works
To illustrate an example, we use unrealistic numbers. Suppose we ask you if you are willing to buy a contract-share by paying at most $5.
Q1. What happens if the total of offers from all other residents falls short by an amount less than or equal to my payment ($5)?
Answer: You would pay $5. We would enter into a farm wildlife contract.
Q2. What happens if the total of offers from all residents (including me) is not enough to meet the total contract cost?
Answer: We would not enter into a wildlife management contract with the farm. You would pay nothing even if you had offered to buy a contract-share.
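The following sketch is not part of the survey text. It applies the payment rule as worded in the stated preference description above: a resident is billed the full offer only when the offers of all other residents fall short of the contract cost and the resident's own offer closes the gap. The function name and numbers are hypothetical.

```python
def pivotal_payments(offers, cost):
    """Return (contract_established, payments) under the rule as worded above."""
    total = sum(offers)
    if total < cost:
        # Guarantee: no contract, and nobody pays.
        return False, [0.0] * len(offers)
    # A resident is pivotal when everyone else's offers alone do not cover the cost.
    payments = [o if total - o < cost else 0.0 for o in offers]
    return True, payments

# Hypothetical offers (in dollars) against a contract cost of 10.
all_pivotal = pivotal_payments([5, 4, 3], cost=10)
nobody_pivotal = pivotal_payments([5, 5, 5, 5], cost=10)
shortfall = pivotal_payments([2, 3], cost=10)
```

In the second case no single resident is pivotal, so the contract is established while the payments in this sketch sum to zero. The sketch does not model how the contract cost would then be covered.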

A.2 Market Experiment Mechanism Descriptions
The elicitation mechanisms administered in the market experiments are described below.

The Provision Point with Money Back Guarantee and Proportional Rebate
The following text was used in the market experiment to describe the mechanism: If the total of your group's offers is more than enough to cover the costs of the contract, we will pay the costs to implement the contract and refund any extra money offered. All extra funds received will be refunded to everyone in proportion to their share of the total offers we received. Making your highest possible offer increases your group's chance to succeed in implementing this contract. Remember that you will pay no more than the amount you offer, and it is possible that you would pay less.
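The following sketch is not part of the market experiment text. It illustrates the refund rule described above, assuming that the excess over the contract cost is returned to each person in proportion to that person's share of total offers, and that all offers are returned in full when the total falls short. The function name and numbers are hypothetical, and a positive contract cost is assumed.

```python
def proportional_rebate(offers, cost):
    """Return (contract_funded, rebates) for a provision point with proportional rebate."""
    total = sum(offers)
    if total < cost:
        # Money-back guarantee: every offer is returned in full.
        return False, [float(o) for o in offers]
    excess = total - cost
    # Each person receives the excess times their share of total offers.
    return True, [excess * o / total for o in offers]

# Hypothetical offers (in dollars) against a contract cost of 100.
offers = [60, 40, 20]
funded, rebates = proportional_rebate(offers, cost=100)
net_payments = [o - r for o, r in zip(offers, rebates)]
```

In this sketch the net payments sum to the contract cost whenever the contract is funded, and no one pays more than the amount offered.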

The Uniform Price Auction
The following text was used in the market experiment to describe the Uniform Price Auction: If the total of your group's offers is more than enough to cover the costs of the contract, then we will calculate a "group price" so that everyone who pays ends up paying the same price. We will try to find a group price that divides the contract cost evenly across the maximum number of people, while still collecting enough money. If the group price is higher than your offer, you pay nothing and receive a 100% refund. If the group price is lower than your offer, you pay only the group price and we will refund any excess money offered above that price. If too few people offer enough money, so that it is impossible to determine such a group price, the contract will not be implemented and you will pay nothing. Making your highest possible offer increases your group's chance to succeed in implementing this contract. Remember that you will pay no more than the amount you offer, and it is possible that you would pay less.

The Uniform Price Cap
The following text was used in the market experiment to describe the Uniform Price Cap: We are asking for your money now, but we will use it only if necessary. That is: We are asking everyone in your community group to contribute to a dedicated fund to buy a farm wildlife contract for the 2008 Bobolink nesting season. On April 30, if the fund contains sufficient money, we will buy the farm-wildlife contract. We will return any leftover money as follows. We will look for the lowest contribution that we can set as a "contribution cap" and still buy the contract. If your contribution was above this cap, we will return to you the amount you contributed over the cap. If the fund does not contain enough to pay for the contract, then we will return all money collected and the hayfield will not be managed for Bobolinks this year. This approach is designed to bring many people to participate at the same time, which means costs to you and each Jamestown resident in your group will be kept low.

The Pivotal Mechanism
The following text was used in the market experiment to describe the Pivotal Mechanism: If the total of your group's offers is more than enough to cover the costs of the contract, we will implement the contract and determine your payment as follows: If the total of everyone else's offers - not including yours - is higher than the amount needed to implement your group's contract, then we really don't need your money. Because everyone else's offers are enough, we will implement the contract and you will pay nothing. If the total of everyone else's offers is not enough to implement the contract, then your decision could be critical. If your offer raises the total offers high enough so we can implement the contract, then we need your money and we will collect the portion of your offered amount to meet the contract cost. (If the total offers including yours still falls short, then we cannot implement the contract and we will refund your money.) Because you pay only when your decision is critical, it is in your interest to offer the highest amount you feel the farm-wildlife contract is worth to you. If you value the contract more than your offer, and if your decision is critical, a lower offer may prevent us from implementing the contract, when your highest value would have implemented the contract.
$$V_{njt} = \alpha_n \mathrm{ASCBOTH}_{njt} + \gamma_n \mathrm{ASCNO}_{njt} + \sigma_n \beta_{A,n} \mathrm{Acres}_{njt} + \sigma_n \beta_{R,n} \mathrm{Restore}_{njt} + \sigma_n \beta_{V,n} \mathrm{View}_{njt} + \sigma_n \beta_{T,n} \mathrm{Tour}_{njt} - \sigma_n \mathrm{cost}_{njt} + \varepsilon_{njt} \tag{B.1}$$
where the choice probabilities are given by (B.2).
The three models tested are listed below. R indicates that the parameter was estimated as a random parameter ($\beta_{x,n} = \beta_x + \eta_n$), while F indicates that the parameter was modeled as a fixed parameter.
There are several diagnostic tasks that can be employed to test for convergence.
At the most basic level, the acceptance rate of the sampler can be monitored. We monitored and adjusted the acceptance rate to remain between 30-40% as suggested by Gelman et al. [142], who provide the rule of thumb when monitoring acceptance rates. Specifically, when K = 1, the optimal acceptance rate is about 0.