Computerized, Tailored, Theory-Based Interventions for Healthy Behavior Change: A Comprehensive Meta-Analysis

Personal behavior accounts for much of the risk associated with chronic disease, providing a strong incentive to develop interventions that offer effective prevention on a large scale. Computer-tailored interventions have become increasingly common for facilitating behavior change for a number of health concerns associated with chronic disease. Systematic reviews of tailoring have been completed, and a sufficient number of outcomes are now available to support quantitative analysis of overall effect sizes for this type of intervention. The present study employs meta-analytic techniques to assess the mean effect of tailored interventions for four health behaviors: smoking cessation, increasing physical activity, eating a healthy diet, and receiving regular mammography screening. Clinically and statistically significant overall effect sizes were found for each of the four behaviors. Retailored interventions were found to have increased efficacy over tailored interventions based on a single assessment. The addition of counselor calls to the feedback produced greater effects initially, but these were not sustained over time when compared to retailored interventions. A nonsignificant trend was found for effect sizes to decrease over time, with the largest drops occurring after six months postintervention. Mean effects did not differ by recruitment strategy, and differences by theory or study group could not be adequately assessed due to sample size. Gender was the only demographic predictor associated with effect size. This analysis quantifies the effect of tailored interventions, demonstrating the ability to reach large numbers of people with effective techniques that promise to reduce chronic disease burden if implemented consistently.


Information-based Internet sites can also be considered in the category of generic interventions. Pamphlets and websites may communicate a health message but do not match that message to characteristics of prospective consumers. They often attempt to include as much information as possible, aiming to provide something of interest to every reader. This leaves the consumer, however, to wade through the information and provides no guidance on which advice is most personally relevant.

Targeted interventions, such as a mass mailing to a population with diabetes, may increase message specificity but cannot address variations among subgroups.
Differing patterns of needs may exist along various gender, ethnic, and social lines that such general messages cannot address. As Kreuter et al. (2000) point out, traditional public health campaigns largely follow a health publicity model, operating from the belief that knowledge leads to behavior change. Such a perspective led to the use of television to reach large numbers of people with factual health information.
Little evidence exists that even the most sophisticated anti-smoking commercials affect cessation and prevention rates. Without a two-way feedback loop, such costly and visually appealing ads lack personal relevance. For example, an outcome study of four consecutive Dutch mass media campaigns to reduce dietary fat found no effect, possibly because this modality does not allow for assessment and individual feedback, a necessary component for change (Brug, Steenhuis, van Assema, Glanz, & De Vries, 1999). These findings illustrate that complex behaviors require knowledge of cognitive and behavioral patterns on an individual level (Kreuter et al., 2000).
The third category of health communications, tailored interventions, provides for individual assessment and feedback and is thus becoming an increasingly common method of facilitating health behavior change. According to Kreuter et al. (2000), "Tailored health promotion materials are any combination of information and behavior change strategies intended to reach one specific person, based on characteristics that are unique to that person, related to the outcome of interest, and derived from an individual assessment." Given that tailored messages are composed of a combination of individual information, assessment, and feedback, infinite derivations from many theoretical perspectives are possible. Such diversity offers an opportunity for creating unique and effective interventions that is both exciting and challenging for the field. Each component of a tailored intervention, from assessment to feedback, must be planned and carefully considered.

Assessment
The possibility of tailoring exists when two conditions are met: when variation in the audience exists and when complex outcomes are possible. Matching audience variation with personally relevant messages requires assessment. According to Kreuter's definition, tailoring should be "based on characteristics that are unique to that person, related to the outcome of interest, and derived from an individual assessment." Psychology, more so than any other field, has adopted the study of individual characteristics. From the questionable goals of Galton's eugenics, to Binet's attempts at improving children's education, to the present-day study of personality by McCrae and Costa, the study of individual differences has comprised a major theme of research (Klie, 1997). Systematic study of difference has accordingly required the creation of assessment instruments, conducted through either self-report or observation. Despite this proliferation of assessment, psychology historically has focused its lens mainly on the study of personality differences, to the neglect of systematic and theory-based assessment of mental diagnoses and other psychological realities such as health behavior.
As previously stated, psychology has a strong assessment tradition, but one that has rarely been used to inform interventions. Since assessment has been the realm of personality researchers and, to a lesser extent, clinicians, psychology and related fields fall back upon flawed clinical decision-making when choosing treatments. In a 1983 survey of psychologists, Norcross and Prochaska found that research findings exhibited a weak to moderate influence on practice and that outcome research ranked 10th among factors, such as supervisory influence, affecting a psychologist's choice of treatments. Twenty years later, Kopta et al. (1999) argued that psychology has no empirical norms for how, when, and why patients progress. Treatment planning and even manualized treatment do not specify the most important treatment variables or intervene specifically upon them because, surprisingly and regrettably, the field has struggled to define them. For example, the Hawaii Integrated Healthcare project planned manualized therapies and outcome measures, but pre-treatment assessment operated from clinical impression (Laygo et al., 2003). Traditional instruments such as the MMPI may predict that a patient has more anger than the normative population, but a therapist still has no basis for ipsative comparison, nor does the instrument provide any advice on which variables require focus. Goldfried (1980) has called for the delineation of therapeutic change principles with the hope of defining a set of empirically based principles to guide practice. For reasons including sheer difficulty, theoretical differences, and entrenchment in tradition, grounding intervention in assessment has been an elusive target.
Since the psychological tradition provides little guidance, state-of-the-science interventions require creative solutions to position psychological assessment as a foundation of behavioral health intervention. Llewelyn and Kennedy (2003), for example, describe a three-dimensional model of psychological interventions for health behavior: problem, assessment, and intervention. In this model, the ten most common health problems interact with assessment, which in turn interacts with intervention services. This occurs in the context of individual, family, provider, and socio-cultural factors. Such a model appears simple, but breaks new and vital ground in the search for effective practices to reduce disease burden.

Assessment and Health Communication
Each level of tailoring necessitates a differing degree of assessment: (1) Generic messages contain as much information as possible, allowing people to decide what to take from them. Such a modality requires minimal assessment, as little as asking whether someone smokes; (2) Personalized communications simply use a person's name in a generic message, thus requiring little assessment; (3) Targeted generic communications are based on "market segmentation" for a specific population. They continue, however, to assume homogeneity within that population.
Targeted interventions can entail some assessment, such as determining stage of change, from which a person could be sent a generic change manual; (4) Tailored communications are a "combination of strategies and information intended to reach one specific person based on characteristics that are unique to that person, related to the outcome of interest, and derived from an individual assessment" (Kreuter et al., 2000, p. 277). Obviously, the more assessment done, the more individualized the feedback will become. Learning theory has determined that feedback is essential for reinforcing and correcting behavior. Petty and Elster (1981) propose this occurs through "elaboration likelihood," such that people process information more actively if they find it personally relevant. Elaborated messages are thought to lead to more change by eliminating irrelevant information, enabling a person to attend to the most salient points, which may then result in reconsideration of behaviors and, eventually, in change. Since each level of communication requires more assessment, the main question remains how to determine the most salient variables upon which to intervene.

Theory-Based Intervention
Assessment of variables shown to produce change can guide treatment, but what forms can that intervention take? Can valid assessment of individual factors occur, and can treatment be matched to each individual based on its findings? Can the factors that create change in the process of individual therapy be applied on a broader scale? These questions form the central core of applying the best techniques from individual change theory to the public health arena.
If assessment is to guide intervention, variables that effect change must be identified and assessed. Some promising veins of research have been developed to aid intervention planning. Various health behavior change theories such as the Transtheoretical Model (Prochaska & DiClemente, 1982), the Health Belief Model (Rosenstock, 1966), and the Theory of Planned Behavior (Ajzen, 1985) have attempted to determine variables that underlie health-related behaviors. This vein of research assumes that, once discovered, intervening on these variables will lead to behavior change. For example, the Health Belief Model (HBM) asserts that susceptibility to illness, severity of an illness, and the barriers and benefits of a suggested action influence whether a person will carry it out. A meta-analysis of the variables proposed by the Health Belief Model found that the barriers and benefits of a behavior are more predictive than susceptibility (Becker & Rosenstock, 1984). The model, however, has been criticized on two main grounds: that it focuses on rational thoughts to the exclusion of emotional factors and that it assumes people actively process health information (Ogden, 2000). The Theory of Planned Behavior (TPB) added the concept of personal value to the rational conceptualizations of the HBM. This theory proposes that several beliefs influence behavioral intentions, defined as "plans of action in pursuit of behavioral goals" (Ajzen & Madden, 1986). Intentions are formed from a person's attitude toward a behavior, social norms, and perceived control, also known as self-efficacy. The TPB has received criticism for neglecting to propose and research causality among its variables, but has nevertheless been employed successfully to inform interventions (Norman & Conner, 1995).
The Transtheoretical Model (TTM) began with examination of naturalistic change itself in the hope of creating change through specific interventions. Research proposed and eventually supported the concept that people go through five stages of change, from not thinking about change to maintaining change. These changes occur through the action of ten change processes derived from many models of psychology. The model has also incorporated variables common to other theories, such as benefits, barriers, and self-efficacy. The TTM has received criticism regarding whether change occurs in discrete stages and regarding its applicability to a variety of health behaviors (Ogden, 2000). Each theory of behavior change represents a promising intervention strategy, but as Prochaska (1999) argued, an intervention requires inclusion of the strongest predictive variables of not only change, but process and retention as well.

Population-Based Methodology
Theoretically, interventions that can produce change among a large number of people will have broad health impact when measured in terms of cost and general health. Interventions that reduce the relative risk of developing a disease, such as smoking cessation for reducing rates of lung cancer, can certainly help improve health outcomes, but relative risk does not specify how common a risk factor is in the general population.
Population attributable risk measures the proportion of excess disease attributed to a risk exposure, whether unhealthy eating, lack of exercise, or smoking (Rychetnik, Frommer, Hawe, & Shiell, 2002). This statistic thus shows the potential for a prevention program to improve life expectancy, quality of life, cost, and so on, if exposure to the risk factor is reduced or eliminated. Attributable risk refers to the effects of disease on a population, whereas the term impact can be used to refer to the effects of an intervention in reducing disease in a population. Impact becomes a vital consideration when public health is concerned. The idea of impact can be exemplified in the following equation: Impact = Effect Size x Reach (Prochaska & Velicer, 2004; Glasgow et al., 2006). This equation suggests that if an intervention has a large effect size and is extremely effective in helping people exercise, but can only reach two people per year, it will have little impact on overall health. If, on the other hand, an intervention is moderately effective and can reach 10,000 people, it will have great impact on decreasing overall healthcare costs and improving health in a population.
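A hypothetical illustration of the Impact = Effect Size x Reach idea follows; the effect sizes and reach figures are invented for the example and are not drawn from any study.

```python
# Hypothetical illustration of Impact = Effect Size x Reach.
# The numbers are invented for the example, not taken from any study.

def impact(effect_size: float, reach: int) -> float:
    """Crude impact index: effect size scaled by the number of people reached."""
    return effect_size * reach

# A highly effective clinic program that reaches very few people...
clinic = impact(effect_size=0.80, reach=2)
# ...versus a moderately effective tailored mailing that reaches many.
mailing = impact(effect_size=0.20, reach=10_000)

print(clinic)   # 1.6
print(mailing)  # 2000.0 -> far greater population impact
```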
Individuals may change behaviors only slightly, but small changes are magnified when considered socially. For example, it has been estimated that reducing two unhealthy behaviors reduces healthcare cost by $2,000 per year. Cost in itself is not a value in considerations of health, but can be a predictor of increased quality of life for individuals, since decreased cost could indicate less treatment seeking. The U.S. Department of Health recently reported that healthcare costs rose at their fastest rate in 15 years and will consume 20% of all spending by 2015, a large portion of which will be subsidized by the government, that is, the population as a whole (Poisal et al., 2007). Intervening on a problem with low relative risk but high prevalence can have broad population impact and establishes the logical ground for population-based interventions.
According to Peters and Elster (2002), population-based medicine involves defining a population, identifying needs, delivering services, assessing impacts, and providing feedback. Population-based intervention accounts for the fact that, since health problems are situated socially, interventions that concentrate on groups are necessary to impact individuals (Jeffrey, 1989). Population intervention also allows directing limited resources, meeting preventive guidelines, and mitigating economic disparity in healthcare availability. This perspective enhances a biomedical model directed at curing specific diseases. Preventive interventions reduce the need for care, inappropriate demand for care, inappropriate use, and poor delivery of care (Peters & Elster, 2002). Integrating population-based medicine and insights from individual treatment may create unprecedented impact on a society's health.
As an example of an application of population-based medicine, the American Medical Association (AMA) has recently recognized the need to move toward population prevention and risk management. It has produced a document entitled "A Primer on Population-Based Medicine" (Peters & Elster, 2002), in which the authors delineate a possible means for clinicians and medical managers to integrate population assessment and intervention into practice. In their model, preventive service delivery entails five steps to be employed in a clinic or HMO setting. This model serves as a potentially groundbreaking population intervention strategy for integrating prevention into primary care. It follows the recommended, but neglected, practice of using assessment as a tool to give reliable feedback for intervention. Behavior change theory and intervention techniques would fit perfectly into this model. Health promotion means giving people control over their own health. The idea now requires systems to implement it.

Population Methodology in Practice
The public health, insurance, and medical industries have attempted community-based and other interventions to promote behavior change, but with little to no success. These education-based interventions rely upon generic or, at best, targeted methodologies and therefore produce little behavior change, even though most people believe that behaviors often intervened upon, such as smoking, are harmful (Weinstein, 1984).
Public health messages have difficulty affecting an individual's decision-making since they do not convey personal urgency (Jeffrey, 1989). Successful interventions need to apply evidence-based behavior change strategies while providing elaboration and engaging the feedback loop.
The move toward empirically supported treatments in individual therapy and medicine is occurring for public health interventions as well. Evidence-based public health involves development and implementation of effective programs aimed at improving the health of a population at high risk. Heller and Page (2002) advocate reconceptualizing evidence-based medicine as "evidence for population health." Such a conceptualization means that interventions should meet requirements including ease of administration, low respondent load, reliability, and validity (Velicer et al., 2000). The area of tailored communications meets the need for cost-effective, efficacious, and practical population interventions that hold participants' attention through individualization of messages. Many authors recommend tailoring as a possibility for population intervention. Glasgow et al. (2002) believe interventions need to be tailored to personal variables and organizational climate, overall making it more likely that results will replicate and generalize. Tailoring promises to meet the need for population interventions based on reliable evidence and assessment (Velicer, Prochaska, Fava, Laforge, & Rossi, 1999). In addition, Prochaska and colleagues found that counselors did not add to computer-based tailored interventions.

Reach and Implementation
If tailored communications can be made practical, they can reach many people and cost-effectively produce behavior change (Brug et al., 2005). Tailored interventions have shown efficacy for smoking cessation (Prochaska, DiClemente, Velicer, & Rossi, 1993; Strecher, Kreuter, Den Boer, & Kobrin, 1994), reducing fat intake (Brug, van Assema, & de Vries, 1996; Campbell, DeVellis, Strecher, & Ammerman, 1994), increasing physical activity (Bull, Kreuter, & Scharff, 1999; Kreuter & Strecher, 1996; Marcus et al., 1998a), and getting mammograms (Skinner, Strecher, & Hospers, 1994). A variety of approaches, however, fall under the title of tailored interventions. To highlight a few distinctions, tailored interventions differ in terms of the theory upon which they are based, feedback modality, amount of assessment, variables intervened upon, type of delivery channel, and dose of intervention. This project examines these divergent methods of tailoring to provide greater insight into an optimal formula that would increase intervention effectiveness. Helping to determine a combination of variables (e.g., amount of tailoring, format, and theoretical constructs) that produces optimal tailoring will enhance the efficiency and impact of health behavior interventions.

CHAPTER 2: META-ANALYSIS
Given the criticisms of null hypothesis testing and the increasing amount of often-discrepant research in most fields, analysts are increasingly relying on meta-analysis to provide clearer bases for inference. Meta-analysis, a term first coined by Glass (1976), describes a synthesis tool that pools data from many different studies asking similar questions. Not only does it bring the results of different studies together, but the techniques associated with it can be used to advance theory. Meta-analysis allows more precise estimates of treatment effects, helps to explain heterogeneity among studies, aids in resolving conflicting results, and can be used to establish grounds for research-based policies.

Benefits
Meta-analysis allows a precise estimate of treatment effects since it uses a continuous measure of outcome rather than the dichotomous 'significant' or 'nonsignificant' declarations traditionally used to report results. Low-powered studies, far too common in psychology, yield false nonsignificant results at unacceptably high Type II error rates, leading to many laments about lack of progress in the field (Schmidt, 1996). Even Pearson, as far back as 1904, predicted, "Many of the groups are far too small to allow of any definite opinion being formed at all, having regard to the size of the probable error involved" (Pearson, 1904, as cited in Egger, Smith, Schneider, & Minder, 1997). Given this artifact of significance and sample size, differences in statistical power create conflicting results among studies. Studies with high power and with low power may share the same effect size but show different significance-testing results. Thus, narrative reviews that simply count significant results mislead the field about the true results of a body of studies. This can also lead to unnecessary investigations into moderator variables to explain why certain studies were and were not significant (Schmidt, 1996). Meta-analysis can be used to resolve such conflicting results. When comparing many studies, it allows a standard, objective measure of outcome instead of narrative descriptions such as "some evidence." The technique can also control for study-level sampling and measurement error. Under certain conditions, meta-analysis as a technique can increase the statistical power for finding a significant overall effect size by reducing standard error (Cohn & Becker, 2003).
Examination of confidence intervals from a series of studies often reveals whether they estimate the same population parameter (Schmidt, 1996).
Meta-analysis helps to explain the heterogeneity found among studies. The differences among effect sizes of studies may follow some pattern, such as by gender or ethnic group. Using moderators in the analysis enables disentangling of method, substance, and error (Marsh, Johnson, & Carey, 2001). A researcher can use these moderators as a priori hypotheses going into the analysis. Although the approach is not often used, Shadish (1996) argues that meta-analysis permits identification of mediators, such as how peer pressure mediates the effects of treatment on alcohol behavior. The technique can also compare the methodological quality of studies to determine whether certain conditions had effects on the outcomes (Lipsey & Wilson, 2001).

Role in Theory-Testing
Meta-analysis can have positive effects on a field in general. Psychology especially, and other fields as well, are awash with conflicting evidence from noncomparable studies. This is a troubling situation since research forms the basis for policy and action. Practical applications desperately need consensus from the research domain to proceed. Meta-analysis can help bring resolution to uncertainty and suggest policy imperatives. Inherently, the procedure is a statistically and interpersonally less biased review method than the systematic review (Egger et al., 1997), one drawback of the Cochrane review system. Bias is controlled in a manner not feasible in a systematic review if the analysis employs systematic implementation without reference to study title or authorship. Statistically, the procedure controls for artifacts of sample size, design, and error. When a meta-analysis is compiled for a particular topic, more studies can be added as they arise, leading to a cumulative meta-analysis database. This process can identify when an effect first showed up or when something changes an effect. If an analysis compiles results and finds iatrogenic or small effects, it can prevent waste on continued studies. The act of compiling studies also shows gaps and weaknesses in the literature. Meta-analysis permits a solid overview of a research field and can move a field toward the "big picture."

Procedures
The Literature Search
In the preparatory phase, the most important consideration is forming a solid research question. It should be specific enough to find relevant research, but broad enough to be useful in answering the question at hand. An initial delve into the literature can help specify an accurate and realistic question. Studies should be conceptually similar to ensure the validity of conclusions. Once the question is established, the analyst begins locating and retrieving data and papers. This process, too, requires explicit criteria. Since research often begins in electronic databases, keywords should be documented and modified as the researcher progresses and gains familiarity with the literature content. Distinguishing features found in abstracts can also be used.
Searching for a particular demographic (women, minorities) or research design (RCTs, quasi-experiments, etc.) may help limit searches. The analyst also considers the relevance of cultural and linguistic range and the time frame of the studies. Publication type can also influence inclusion criteria, since many meta-analyses only employ peer-reviewed journal articles. The stricter the criteria, the more credible the included studies will be, but this results in smaller Ns, loss of data, limited generalizability, and inflated effect sizes (Egger et al., 1997). Once the analyst specifies criteria to narrow searching, he begins locating studies. Searching usually proceeds first in numerous electronic databases. These are often area-specific, such as Medline for medicine and PsycLIT for psychology. The analyst must use multiple databases since articles are listed in some but not in others. Also, searching should proceed at multiple institutions since libraries purchase different levels of database detail. In addition to primary database searches, reviews, references, journals, conferences, authors, and government agencies can provide references. When retrieving studies, every effort must be made to limit bias.
For example, bias can enter a study if dissertations are left out systematically. Library loans, reference librarians, government agencies, APA, and professional organizations must be utilized to find a representative sample of studies. Letters to prominent authors in a field and to research organizations should be used to locate studies. Such efforts can limit the publication bias toward significant results with larger effects (Begg, 1994; Lipsey & Wilson, 1993; Stern & Simes, 1997). Given these necessities, the analyst needs to schedule sufficient time for data collection.

Identifying Studies
Since the researcher will want to specify initially the broadest criteria possible, the searches at first will identify a large number of studies. Being too specific with an electronic search may erroneously limit the sample of studies identified. Unexpected titles and phrasings may become apparent only when a broad sample of studies is examined by the researcher. This entails a great deal of time and concentrated work, but will decrease bias in retrieval. To aid in this effort, criteria need to be specified to determine which studies to obtain in full-text format. When examining titles and abstracts, the analyst will consider: (1) Is the study relevant to the research question? (2) Does the study include the variables of interest? (3) Does the study employ the selected methods (i.e., RCT, pre/post design, case control, etc.)? (4) Does the study fall within the selected timeframe for the analysis? Studies that meet these criteria should then be downloaded or requested in full-text form for further review.

Data Extraction and Variable Coding
Studies then require coding into a database for analysis. Software options should be considered from the start, since incompatibility may arise. Programs exist specifically for meta-analysis, such as Comprehensive Meta-Analysis and EasyMA, and each has pros and cons in terms of data modeling, data entry, display, and analyses offered.
Separate programs can be used, such as databases for entry and statistics programs for analysis. Commonly used programs, such as Microsoft Access and Excel, can also prove to be flexible tools for entry and analysis, along with Reference Manager for the study database. Data can then be imported into standard analysis packages such as SPSS or SAS, for which meta-analysis macros have been written that produce accurate parameter estimates.
Coding should proceed according to a coding manual of variables that remains open to changes as analysis proceeds. The coding itself should be done by two independent coders who have training in the specific content of the literature, in procedures common in the content area, and in meta-analysis techniques. To control for bias, they should be blinded to the names of authors and journals. Quality should be reviewed periodically and any questions documented. Inter- and intra-coder consistency should be measured. After a time, a subsample can be drawn and recoded, comparing the codings with percent agreement or with inter-rater reliability statistics.
Since reporting of some variables is often poor, coders can give a confidence rating on the most important variables.
The choice of variables to code is an important decision since it determines what analyses can be done later, especially in terms of moderator analyses. Overall study descriptors and effect sizes need to be coded. Descriptors include date, form of publication, authorship, population, methods and procedures, variables specific to a field, and methodological soundness.

Effect Size Entry
Effect sizes (ES) can be determined directly or estimated from available information.
Statistical information required includes: timepoint, sample effect size, subsample effect sizes, means, standard deviations, sample sizes, correlations, and significance levels (Rosenthal, 1995). An effect size quantifies, in either direction, the magnitude of a relationship. As such, it estimates the effect of an independent variable on a dependent variable. Overall, the same statistic must be used across comparisons. If studies cannot use the same statistic, then separate analyses must be done. Also, effect sizes must be independent so as not to affect statistical tests. Three types of effect sizes exist: mean difference, association, and multivariate.
Mean difference effect sizes are reported as Cohen's d, Hedges' g, or Glass's delta.
All are mean differences divided by a standard deviation. Mean differences describe either one-variable or two-variable relationships. One-variable relationships include a mean, median, mode, or proportion. An example would be comparing scores on two measures of the same construct (Lipsey & Wilson, 2001).

Preliminary Analysis Issues
Data preparation procedures proceed in ways similar to traditional analyses, by examining the distributions of the data. The mean effect, range of effect sizes, sample sizes, outliers, and missing data all must be attended to before analysis ensues. Histograms and stem-and-leaf displays are excellent for showing central tendency, variability, and normality and for diagnosing skewness and outliers.
Adjustments to effect sizes often must be made at the level of the study. The researcher must weigh the pros and cons of these procedures because corrections for some biases can increase others. Analyses can be reported with and without adjustments and compared. For example, measurement error correction increases sampling error. Most often the analysis uses correction for attenuation due to unreliability, which occurs when sample effect sizes have a smaller range than the population. Additionally, biases specific to meta-analysis must be dealt with: publication bias, sample size bias, and artifact, or measurement, biases.

Sample Size Bias
One main strength of meta-analysis is the ability to achieve higher power to detect population differences from studies with small sample sizes. Inevitably in a meta-analysis, studies of various sample sizes will be included. Simply taking the mean effect size from these studies does not account for the differing error variances associated with sample size. Larger samples theoretically result in more accurate estimates of the population mean and thus should receive greater weight in the pooling of effect sizes. Before pooling estimates, each effect size is weighted by multiplying the effect by the inverse of its variance, which helps correct for the error variance associated with sample size. Calculating a mean effect size thus always involves weighting individual effect sizes by their precision.
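A minimal sketch of this inverse-variance weighting follows; the study-level effect sizes and variances are invented for illustration, and the calculation corresponds to a fixed-effect pooled estimate.

```python
import math

# Illustrative (invented) study-level effect sizes and their variances.
effect_sizes = [0.30, 0.12, 0.45, 0.20]
variances    = [0.02, 0.05, 0.10, 0.01]

# Weight each effect by the inverse of its variance, so larger,
# more precise studies contribute more to the pooled estimate.
weights = [1.0 / v for v in variances]
pooled  = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)

# Standard error of the pooled (fixed-effect) estimate.
se_pooled = math.sqrt(1.0 / sum(weights))

print(round(pooled, 3), round(se_pooled, 3))
```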

Missing Data
Missing data can bias meta-analyses as in any other study. If data are missing nonrandomly, this usually suggests systematic bias. In the case of meta-analysis, bias enters when studies with small effect sizes are included less often than studies with larger effect sizes. If a study reports results as nonsignificant without providing statistical specifics, the effect can be included as zero. This is a conservative procedure, however, and may nullify the aim of including underpowered studies.

Measurement Bias
Effect sizes are often dependent on outcomes measured by various testing instruments. The greater the unreliability of the measurement instrument, the more the effect size will be underestimated. Mean difference effect sizes are weighted by the inverse of their squared standard error (the SD of the sampling distribution), their "inverse variance weight." Odds ratios are corrected by taking the natural log, and correlations with Fisher's Z transformation. Hunter and Schmidt (1990) recommend correcting effect sizes for attenuation using the reliability of the outcome measure, such as coefficient alpha. This, however, depends on whether the reliability of the instrument is reported, which often is not done in outcome papers, thus requiring the researcher to obtain instrument development studies.
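A small sketch of the attenuation correction described above, in the Hunter-Schmidt spirit: the observed effect is divided by the square root of the measure's reliability. The numbers are invented for illustration.

```python
import math

def correct_for_attenuation(observed_es: float, reliability: float) -> float:
    """Disattenuate an effect size given the outcome measure's reliability
    (e.g., coefficient alpha). Lower reliability implies a larger correction."""
    return observed_es / math.sqrt(reliability)

# Invented example: an observed g of 0.25 on a measure with alpha = .70.
print(round(correct_for_attenuation(0.25, 0.70), 3))  # ~0.299
```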

Publication Bias
Related to the problem of missing data is that of publication bias. The theory of meta-analysis assumes that a representative or even comprehensive sample of studies has been included, showing both significant and nonsignificant results. It has been shown that studies with nonsignificant findings are often not published, whereas a tendency exists to publish results of small-sample studies that show large effects (Lipsey & Wilson, 1993). This bias of publication has been termed the 'file drawer' problem, referring to the fact that the results of many studies remain unknown due to the difficulty of publishing nonsignificant findings.
Various methods have been developed to assess publication bias. The first method is to plot effect sizes by their standard errors, forming what is known as a funnel plot. Studies with smaller error variances will cluster near the top of the plot, and studies with larger variance will fall near the bottom, dispersing to the right and left of the mean. If more studies fall near the bottom and to the right of the mean, one can assume that a bias exists toward publishing these small-N studies with large effects. Funnel plots, however, can be difficult to interpret, and asymmetry found in funnel plots may be due to heterogeneity of the studies rather than publication bias (Egger et al., 1997; Sterne, Gavaghan, & Egger, 2000), or to both heterogeneity and publication bias (Pham et al., 2001). The funnel plot is merely a visual analysis tool, and other approaches employ statistical techniques. Duval and Tweedie (2000) devised a 'trim and fill' technique that imputes the values assumed to be missing from the funnel plot and allows calculation of a mean effect adjusted for publication bias.
Egger's linear regression method quantifies the bias captured by the funnel plot. In the Egger test (Egger et al., 1997), the standardized effect (effect size divided by its standard error) is regressed on precision (the inverse of the standard error). Small studies generally have a precision close to zero, due to their high standard error. In the absence of bias one would expect to see such studies associated with small standardized effects and large studies associated with large standardized effects. This would create a regression line whose intercept approaches the origin. If the intercept deviates from this expectation, publication bias may be the cause. This would occur, for instance, when small studies are disproportionately associated with larger effect sizes.
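The sketch below illustrates the regression just described, using invented effect sizes and standard errors and a plain ordinary-least-squares fit; it is an illustration of the idea, not a substitute for a full implementation of the test's inference.

```python
# Minimal sketch of Egger's regression for funnel-plot asymmetry.
# In the absence of bias the intercept should be close to zero.
effect_sizes    = [0.50, 0.42, 0.15, 0.10, 0.08]   # invented
standard_errors = [0.30, 0.25, 0.10, 0.08, 0.05]   # invented

standardized = [es / se for es, se in zip(effect_sizes, standard_errors)]
precision    = [1.0 / se for se in standard_errors]

# Ordinary least squares of standardized effect on precision.
n      = len(precision)
mean_x = sum(precision) / n
mean_y = sum(standardized) / n
slope  = (sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, standardized))
          / sum((x - mean_x) ** 2 for x in precision))
intercept = mean_y - slope * mean_x

print(round(intercept, 2))  # values far from 0 suggest small-study effects
```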
Another method of assessing publication bias is the "fail-safe N" (Rosenthal, 1979), which estimates the number of nonsignificant studies needed to reduce the overall ES to nonsignificance. This may be too conservative a procedure, however, since missing studies would rarely have an effect size of zero. Orwin's (1983) method employs the same idea but calculates the number of studies with a specific effect size (not necessarily 0, as in Rosenthal's method) needed to reduce the overall effect to whatever value the researcher designates as clinical nonsignificance, and thus will result in lower values than Rosenthal's method.
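A short sketch of Orwin's calculation follows, with invented numbers; the criterion value is whatever the researcher designates as clinically trivial.

```python
def orwin_failsafe_n(k: int, mean_es: float, criterion_es: float,
                     missing_es: float = 0.0) -> float:
    """Orwin's fail-safe N: number of unretrieved studies (averaging
    `missing_es`) needed to pull the pooled effect down to `criterion_es`."""
    return k * (mean_es - criterion_es) / (criterion_es - missing_es)

# Invented example: 20 studies averaging g = 0.30; how many null-effect
# studies would reduce the mean to a 'clinically trivial' g of 0.10?
print(orwin_failsafe_n(k=20, mean_es=0.30, criterion_es=0.10))  # 40.0
```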

Independence
When proceeding with a meta-analysis, the researcher must be certain that effect sizes are independent of each other. Many studies report outcomes using more than one measurement instrument. For example, dietary fat can be measured by self-report, by calculation from dietary recall, by percent of calories from fat, or by the percent of people reporting attainment of the Action or Maintenance stages of the TTM.
Commonality across studies, reliability, and validity must be considered when choosing the one measure, or the measures may be averaged. Additionally, various measures can be compared across studies to determine whether one may under- or over-estimate the effect.

Outliers
One weakness commonly associated with employing a mean as an outcome is that an inordinately large or small effect size can skew the result. Since such outliers can arise from mis-coded data or the occasional odd finding, they should be examined and the coding checked to ensure accuracy of the data. The analyst can keep outliers and move them to the closest cluster (Lipsey & Wilson, 2001) or employ a sample-adjusted meta-analytic deviancy statistic (Huffcutt & Arthur, 1995). Unfortunately, the latter uses rather subjective scree plots, involves numerous computations, and is likely to remove small correlations.

Modeling Variance
Once the data have been cleaned and effect sizes appropriately adjusted, analysis can proceed. As in any other statistical procedure, a mean with a large variance does not provide a precise representation of the population value. In meta-analysis, then, the variance among effect sizes carries prime importance and is examined through homogeneity testing.

Homogeneity of Effect Size
Just as in any other statistical procedure, the method of modeling the variance affects procedures, assumptions, drawbacks, and conclusions. In meta-analysis, the heterogeneity among studies is the variance in question. Before pooling estimates we need to see whether they can reasonably be described as sharing a common effect size. In other words, we perform a 'null hypothesis test' of the assumption that variation among effect sizes is due to sampling error rather than systematic variance. Meta-analysis employs Hedges' Q (a chi-square with df = k − 1) for this test. A significant result suggests heterogeneity and the presence of moderators. The homogeneity statistic is calculated using the equation Q = Σ(wi ESi²) − [(Σwi ESi)² / Σwi], where, for an effect size computed from a t statistic, the inverse variance weight is wi = [2(n1 n2)(n1 + n2 − 2)] / [(n1 + n2)(t² + 2(n1 + n2 − 2))]. Unfortunately this test has low power when assumptions of normality are not met and when variances are not equal. It fails to reject the null even with large differences, yielding false models and false pooling of variance estimates (Harwell, 1997). Harwell (1997) found that it does, however, work well when study sample sizes are proportionally greater than the number of studies included (k).
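The following sketch computes Q from the formula above, given effect sizes and their inverse-variance weights; all numbers are invented for illustration.

```python
# Sketch of the homogeneity test: Q compares the weighted sum of squared
# effects with the square of the weighted sum, divided by the total weight.
effect_sizes = [0.30, 0.12, 0.45, 0.20, 0.05]        # invented
weights      = [50.0, 20.0, 10.0, 100.0, 80.0]        # inverse-variance weights (invented)

sum_w    = sum(weights)
sum_wes  = sum(w * es for w, es in zip(weights, effect_sizes))
sum_wes2 = sum(w * es ** 2 for w, es in zip(weights, effect_sizes))

Q  = sum_wes2 - sum_wes ** 2 / sum_w    # approximately chi-square with k - 1 df
df = len(effect_sizes) - 1

print(round(Q, 2), df)  # compare Q against the chi-square critical value for df
```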
The variance among studies can be modeled in three ways: fixed effects, random effects, or a combination of both, referred to as a mixed model. The results of the Q test have been used to suggest which model to employ, but some researchers disagree with this determination. Rosenthal (1995) suggests that contrasts should be planned and done independent of the heterogeneity test. He states, "A significant χ² for heterogeneity 'morally' obligates one to search for moderators, but a nonsignificant χ² does not preclude the search." Lipsey and Wilson (2001) suggest a significant Q test is enough to determine the model used. Hedges and Vevea (1998) suggest that the choice of model depends primarily upon the nature of the inference desired.
Heterogeneity is not the sole criterion for choosing a model. Fixed and random effects models have different inherent assumptions and techniques that affect the inferences drawn from them.

Fixed Effects Modeling
Fixed effects modeling treats variability between studies as random error resulting from subject-level sampling error. Hedges and Vevea (1998) call the fixed effect model the "conditionally random effects" model because it allows inferences conditional upon only the sample of effect sizes at hand. It assumes the effect sizes are a complete sample and creates a mean effect size without further statistical modeling. The fixed effects model has high Type I error rates (up to .50) because it underestimates variances (Cohn & Becker, 2003; Overton, 1998) and is therefore not conservative.

Random Effects Modeling
The random effects model treats variability between studies as sampling error plus a randomly distributed additional source of variability ("study-level" error). It assumes the effect sizes at hand are randomly drawn from a population of studies and thus estimates a population mean effect size from a sampling distribution. It therefore allows 'unconditional' inferences beyond the observed studies. A difference between fixed and random methods is seen only when studies are very heterogeneous. This procedure may overestimate variances, leading to more conservative estimates with wider confidence intervals than with fixed effects (Overton, 1998).
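One common way to implement the random-effects idea, not named in the text above, is the DerSimonian-Laird estimator: the between-study ("study-level") variance is estimated from Q and added to each study's own variance before re-weighting. The sketch below uses invented numbers and is offered only as an illustration of that approach.

```python
# DerSimonian-Laird sketch: estimate between-study variance tau^2 from Q
# and fold it into the weights (invented data).
effect_sizes = [0.30, 0.12, 0.45, 0.20, 0.05]
variances    = [0.02, 0.05, 0.10, 0.01, 0.0125]

w  = [1.0 / v for v in variances]
m  = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum(w)
Q  = sum(wi * (es - m) ** 2 for wi, es in zip(w, effect_sizes))
df = len(effect_sizes) - 1
C  = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)            # between-study variance estimate

# Random-effects weights add tau^2 to each study's variance,
# widening the confidence interval relative to the fixed-effect model.
w_re = [1.0 / (v + tau2) for v in variances]
pooled_re = sum(wi * es for wi, es in zip(w_re, effect_sizes)) / sum(w_re)
print(round(tau2, 4), round(pooled_re, 3))
```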

Mixed Effect Models
The ANOVA analog groups effect sizes by descriptive variables such as gender into 'between' and 'within' categories and tests homogeneity using chi-square within and between groups. If significant residuals result from these tests, an additional random effect component can be assumed to exist, resulting in a mixed effects model.
The mixed model has a lower Type I error rate than a fixed model, but less power for detecting moderators. A sensitivity analysis can be done to compare the fixed and random models.

Meta-Regression
With continuous variables, a weighted multiple regression can also be done to explain heterogeneity. This procedure has high Type I error rates for detecting moderators when a large amount of heterogeneity is present. When employing regression techniques, correlations should be examined between descriptive variables to assess for collinearity. Macros have been written in Stata, SPSS, and SAS that permit regression with corrections for the standard errors unique to meta-analysis. These programs output an overall fit statistic, QR, for the regression and QE for the residual error, both distributed as chi-square (Lipsey & Wilson, 2001). They also output an overall R² for the model, allowing examination of the variance accounted for and the change in R² when adding additional predictors.
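As a rough illustration of the weighted regression idea, the sketch below uses Python's statsmodels rather than the SPSS/SAS/Stata macros mentioned above; it fits an ordinary weighted least-squares model and does not apply the meta-analytic standard-error corrections those macros provide. The moderator and all numbers are invented.

```python
# Rough sketch of a weighted meta-regression: effect sizes regressed on a
# hypothetical study-level moderator, weighted by inverse variance.
import numpy as np
import statsmodels.api as sm

effect_sizes = np.array([0.30, 0.12, 0.45, 0.20, 0.05])        # invented
weights      = 1.0 / np.array([0.02, 0.05, 0.10, 0.01, 0.0125])
dose         = np.array([1, 1, 3, 2, 1])                        # hypothetical moderator

X = sm.add_constant(dose)
model = sm.WLS(effect_sizes, X, weights=weights).fit()
print(model.params)     # intercept and moderator slope
print(model.rsquared)   # variance in effect sizes accounted for by the moderator
```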

Choice of Model
The choice of model, then, is a statistical and theoretical decision. Hunter and Schmidt (2000), for example, conclude that random effects modeling should usually be used because it allows generalization, while Lipsey and Wilson (2001) note that the model is more difficult to estimate. Meta-analysis is thus not immune to statistical problems.
Just as in other statistical procedures, meta-analysis requires decisions that affect the conclusions drawn from the analyses.
With these considerations in mind, the analyst will still probably proceed as Lipsey and Wilson (2001) suggest, by doing the Q test and, if it is significant, (1) assuming random effects, (2) assuming the excess variance is not random and accepting a fixed-effect model with post hoc tests, or (3) assuming a mixed effects model such that error beyond subject-level error is both systematic and random.
The overall effect size significance test depends on the choice of model. The fixed effect model employs the Stouffer method, in which all Z's are summed and divided by the square root of k, or the lower confidence limit method (Hedges, Cooper, & Bushman, 1992). The latter usually agrees with Stouffer's method but has a higher Type I error rate. Random effects models use a one-sample t-test on the mean effect size but, as discussed, are more conservative than fixed effects procedures, with higher Type II error.
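A minimal sketch of the Stouffer combination follows; the per-study Z scores are invented for illustration.

```python
import math

def stouffer_combined_z(z_scores: list[float]) -> float:
    """Stouffer method: sum the studies' Z scores and divide by sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Invented per-study Z scores.
print(round(stouffer_combined_z([1.2, 0.8, 2.1, 1.5]), 2))  # 2.8
```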

Effect Size Interpretation
In interpreting the meaning of an effect, use of standardized effect sizes facilitates analysis with commonly understood indices. For continuous outcomes using Hedges' g as the effect size measure, g is directly comparable to a Z-score and is interpreted against a normal distribution with a mean of 0 and a standard deviation of 1. An effect of g = .30, for example, indicates that the intervention group is roughly one-third of a standard deviation above the control group and exceeds the scores of about 62% of the control group.
A normal distribution is assumed in this example, however, as are outcomes presented with the same measure. Interpretation of a standardized effect involves calculating the mean and pooled standard deviation of the control groups of the included studies. This provides a baseline from which to compare the effect size of the intervention.
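The percentage-of-control-group interpretation above can be reproduced with the normal cumulative distribution function, as sketched here under the same normality assumption.

```python
from statistics import NormalDist

def percent_of_control_exceeded(g: float) -> float:
    """Under normality, the share of control-group scores that the average
    treated participant exceeds, for a standardized effect g."""
    return NormalDist().cdf(g) * 100

print(round(percent_of_control_exceeded(0.30), 1))  # ~61.8%
```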
Interpretation of dichotomous outcomes using the odds ratio is more common in the literature and therefore more readily understood. The odds ratio measures the relative effect of the treatment group versus the control group. Thus an OR of 1.30 represents 30% greater odds of the outcome in the treatment group than in the control group. Again, knowledge of the control group rate is necessary for translation to the original metric.
These are general guidelines, and all effect sizes should be interpreted in light of the content area. For example, public health interventions account for .05%, 1.0%, and 1.5% of variance for small, medium, and large effect sizes, respectively (Rossi, 2003).
The mean effect size, however, can be misleading without an examination of the amount and sources of variation in the effect sizes contributing to that mean. The analyst must explore moderators and sample size before being confident in the estimate.
Results can then be translated into other metrics, such as the original measurement metric (by determining its mean and standard deviation) or a Binomial Effect Size Display. The BESD expresses an effect size correlation as the difference in 'success' rates between overlapping treatment and control distributions. Another comparison is the criterion contrast, a comparison of the effect size with a known difference of practical significance. For example, the effect size could be compared to a 5% difference in smoking cessation rates, usually considered to be clinically significant.
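A small sketch of the BESD translation in Rosenthal's formulation follows: the effect size correlation r is split evenly around a 50% baseline. The example value of r is invented.

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial Effect Size Display: translate an effect size correlation r
    into hypothetical treatment vs. control 'success' rates (0.50 +/- r/2)."""
    return 0.50 + r / 2, 0.50 - r / 2

treatment_rate, control_rate = besd(0.20)   # invented r
print(treatment_rate, control_rate)          # 0.6 vs. 0.4
```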

Power
One benefit of meta-analysis is the ability to estimate a population parameter from under-powered studies. Power for each study can be calculated to quantify the number of studies that have clinically significant effect sizes relative to the population mean, yet would be considered nonsignificant due to low sample size.
Power for detecting significance of the overall mean effect size has not been considered an important issue in meta-analysis, since the technique is less interested in statistical tests than in obtaining population estimates. Power calculations can be done using Cohen's tables with the obtained effect size as the estimate. Of greater importance for meta-analysis is determining the power of the Q test for heterogeneity, since this can indicate the presence of moderators and guide the choice of statistical model. It has been suggested that when the number of studies is below 10, the Q test has limited power to detect heterogeneity (Lipsey & Wilson, 2001).

Confidence Intervals
Confidence intervals are useful in illustrating the precision of individual effect sizes and of the overall parameter estimate. Displaying intervals for each effect size making up the mean enables quick examination of the point estimate, error variance, and statistical significance of each study. For the parameter estimate, the interval's width relates to the amount of data, the level of confidence chosen, and the model employed. Fixed effects CIs tend to be smaller than those from a random effects model.
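The standard normal-theory interval for a pooled estimate is sketched below with invented values; under a random-effects model the standard error, and therefore the interval, is typically wider.

```python
def pooled_ci(pooled_es: float, se: float, z: float = 1.96):
    """95% confidence interval for a pooled effect size given its standard error."""
    return pooled_es - z * se, pooled_es + z * se

print(pooled_ci(0.25, 0.06))  # invented values -> (0.1324, 0.3676)
```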

Limitations of Meta-Analysis
Despite the many applications and broad conclusions that can be drawn from meta-analysis, the procedure has various drawbacks. While acceptable, meta-analyses are correct regarding the direction of effect about 80% of the time (Naylor, 1997). First, the procedure is relatively new and lacks refined techniques. Techniques basic to statistics, such as ANOVA and multiple regression, cannot be run with common software packages without advanced knowledge. Statistically, meta-analysis places emphasis on the variances from individual studies. Variance challenges the assumption that the studies really do measure the same construct and also affects the Q test (Harwell, 1997). Lipsey and Wilson (2001) assert that analysts need to determine the source of this variance using analyses of methodology. Rosenthal's "coefficient of robustness" (Rosenthal, 1995) can be used to weight means by their variability. Difficulty in using common statistical programs with meta-analysis limits the use of multivariate techniques and possibly more accurate, specific conclusions. In addition, meta-analyses may not have the sample sizes required to perform multivariate analyses.
The process of meta-analysis can also bias results. The old computing adage "garbage in, garbage out" applies to meta-analysis as well. The effect size estimate is only as good as the studies that compose it. If studies use a limited sample, generalizability will be limited. For example, a disproportionate number of psychotherapy studies are done using Cognitive-Behavioral Therapy techniques, biasing the results of meta-analyses toward CBT over other forms of therapy. Poor design techniques and lack of control will result in effect sizes that fail to validly sample reality. Meta-analysts recommend investigating whether results differ according to study methodological quality. Multiple regression models can be used in this determination, in which methodological features predict effect size, with the beta weights indicating the influence of each factor. Confounding of substantive and methodological features also occurs. If a difference appears between two groups that are also measured differently, we cannot determine the source of the discrepancy (Kazdin & Weisz, 1998).
Bias inevitably enters a meta-analysis from publication bias as well. Meta-analysts may not be able to locate a certain kind of study or may fail to search properly.
Even with a good search, the field has documented the publication bias problem: significant studies are more often published than nonsignificant studies. This results in upward bias of the mean effect size. Since a systematic effort will locate published and non-published studies, the analyst can compare effect sizes for published and unpublished studies. The fail-safe N, regression methods, funnel plots, and imputational trim and fill techniques provide multiple methods of estimating and correcting the effect size for publication bias. A different form of publication bias can also enter when various authors use data from the same study, resulting in multiple inclusion of the same effect size (Naylor, 1997).

Meta-analysis in Sum
The research community has created a problem by relying on significance testing without question, resulting in an overemphasis on replication. Too much information exists without the ability to gain knowledge from it. Meta-analysis has been developed to answer this current crisis in research. By compiling many similar studies of important questions, meta-analysis allows treatment effect estimates in terms of both direction and magnitude. Despite its drawbacks, which for the most part can be mitigated, meta-analysis is becoming the procedure of choice for compiling results and for informing policy. As it becomes the accepted standard, it may help focus research on issues necessary to move research forward: power, sample size, effect size, and confidence intervals. It will help to solve past controversies and, as databases grow, suggest moderators that may better inform future interventions. Meta-analysis provides hope for moving past the information age into an age of cumulative, constructive knowledge.

Meta-analysis and Tailored Interventions
Previous reviews of 'first generation' tailoring studies have been conducted (Brug, Campbell, & van Assema, 1999; Skinner et al., 1999; Strecher, 1999). In those reviews, meta-analytic methods could not be used because targeted behaviors, tailoring methods, and populations differed widely among studies (Skinner et al., 1999). Since then the number of tailored interventions has increased dramatically, facilitating the use of meta-analytic methods. The present study will also broaden its scope beyond previous reviews that concentrated only on smoking (Strecher, 1999) or nutrition, to include a full range of tailored interventions. Including studies focusing on smoking, nutrition, physical activity, mammography, sexual behavior, and alcohol use will increase the sample size of the data and permit comparisons on key variables common across studies. Using moderators in the analysis enables disentangling of method, substance, and error (Marsh et al., 2001). Multiple regression models will be used to determine whether methodological features predict effect size, with beta weights indicating the influence of each factor. The technique can also compare the methodological quality of studies to determine whether various conditions affected the outcomes (Lipsey & Wilson, 2001).

Research Hypotheses and Predictions
This research will investigate the following predictions: Within the same behaviors (i.e., smoking cessation, dietary fat reduction), tailored interventions will outperform non-tailored or minimally tailored interventions.
-Effect size estimates will increase with later outcome assessment timepoints (Kristal et al., 2000).
-Proactive recruitment strategies will result in a smaller percentage of participants reaching behavioral criteria but will reach a larger percentage of people than reactive methods.
-Theoretical orientation employed will not influence outcome (Noar & Zimmerman, 2005). Study group/site will not influence effect size estimates.
-Moderators such as ethnic background, stage of change, amount of smoking (light/heavy), decisional balance, etc., will affect treatment outcomes.

Literature Searches
The computerized databases PsycINFO and Medline will be searched for relevant studies during the spring and summer of 2006. Additionally, reference lists from published studies and personal communications with authors will be used to locate studies. The effort to broaden the search beyond published studies helps to limit publication bias for significant results showing large effects. Datasets will include published articles, conference presentations, and papers in progress.

Inclusion Criteria
Databases will be searched starting in 1988 (the year of the first tailored feedback study). Studies must have employed a tailored intervention, have included a comparison group, and must have given participants feedback reports, whether printed or computer-based. An intervention will be considered "tailored" if it provides individual-based feedback on at least one assessed variable. Studies will need to contain information regarding sample size, outcome variables, means and standard deviations for treatment effects, and/or test statistics.

Coding
In the current meta-analysis, each behavior in a multiple behavior study will be looked at separately. Studies will be coded according to the coding scheme as outlined in Appendix A. Given the nature of the project, the author will read and code all studies. To enhance accuracy, coding will be re-examined after a delay of a few months prior to data analysis. Studies will be entered using the Comprehensive Meta-Analysis software package.

Effect Size Calculation
For outcomes measured in continuous format, such as minutes of physical activity per week or servings of fruit per day, Hedges' g will be used to calculate effect size. This method has received the most support for its accuracy in determining effect sizes. Hedges' g is a derivation of the mean difference (d) effect size. Cohen's d is simply the mean difference divided by the pooled standard deviation of the two groups. Cohen's d will not be used in the present study because it does not account for sample size, nor for unequal sample size between groups, causing d to be biased in the direction of the larger standard deviation (and the less reliable effect). The mean difference divides by the simple additive standard deviation of the two groups, whereas g corrects for sample size bias by dividing by a denominator corrected for sample size (n - 1), thereby correcting for underestimation of the population standard deviation. Hedges' g requires means, SDs, and Ns for each group and is defined as the difference between the sample means divided by the pooled sample standard deviation, as shown below in equation 3 (Hedges & Olkin, 1985). In addition, Hedges found that g can be upwardly biased when sample sizes are less than 20 per group; the final equation below provides this correction.
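For reference, the standard definitions from Hedges and Olkin (1985) are reproduced below in LaTeX form; the assignment of equation number 3 to Hedges' g follows the text, while the remaining numbering is assumed:

d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2 + s_2^2)/2}}

g = \frac{\bar{X}_1 - \bar{X}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \quad (3)

g' = g\left(1 - \frac{3}{4N - 9}\right), \qquad N = n_1 + n_2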
This equation also illustrates that with a large N little difference will exist between estimates of g and d.
In the present study many outcomes will be presented in terms of the proportion of the sample attaining various behavioral criteria, such as percent reaching Action or Maintenance stages of change, Action traditionally being defined as engaging in the desired behavior for less than six months, and Maintenance being defined as sustaining the behavior change for more than six months (Velicer et al., 2000).
Standardized mean difference effect size indices do not directly apply in these instances. Effect size for proportional outcomes, therefore, will be calculated using odds ratios. The odds ratio is defined as the odds of reaching the behavioral criterion in the treatment group divided by the odds of reaching it in the comparison group. Unfortunately, effects cannot be compared directly on the odds ratio scale because its format is not standardized. Taking the natural log of the odds ratio yields the log odds, which centers the measure at 0 and places effects of equal magnitude at equal distances above and below the null value. This transformation facilitates standardized comparisons above and below the mean, unlike the raw odds ratio scale where, for example, an OR of 0.5 and an OR of 2 represent effects of the same magnitude in opposite directions yet lie different distances from 1. The resulting log odds can then be converted back to the more easily interpreted odds ratio.
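In symbols, with p_T and p_C the proportions reaching the criterion in the treatment and comparison groups, the standard definitions (original equation numbering assumed) are:

OR = \frac{p_T / (1 - p_T)}{p_C / (1 - p_C)}, \qquad LO = \ln(OR)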
When combining results across studies, outcomes often will be reported in either continuous or dichotomous outcome formats. This presents a difficulty in choosing a combined effect size measure due to lack of equivalency between odds ratio and standardized mean difference . Outcomes can be calculated separately for each index but this results in a decreased number of studies available for comparison.
Transformations of the odds ratio into standardized mean difference effect sizes are available. For analyses in which both formats exist, but many effects are reported in dichotomous format, Lipsey and Wilson (2001) suggest the logit transformation of the odds ratio, enabling reporting of effects as a standardized mean difference. This method will be employed for dietary intake and exercise outcomes when results are reported in both continuous (e.g., number of fruits and vegetables per day) and dichotomous outcome formats (e.g., percent reaching Action or Maintenance stages for fruit and vegetable intake).
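A sketch of this transformation, as given in the logit method of Lipsey and Wilson (2001): the log odds is divided by the standard deviation of the logistic distribution, so that

d_{logit} = \frac{LO}{\pi / \sqrt{3}} = LO \cdot \frac{\sqrt{3}}{\pi} \approx \frac{LO}{1.81}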

Weighting of Studies
The main benefit of meta-analysis is the ability to pool effects from a variety of small-N studies to arrive at an overall estimate of effect size. Since the analysis will likely include studies with relatively small samples and others with large samples, simply taking the arithmetic mean of effects does not account for accuracy in estimating population means. It is assumed that larger samples will result in more accurate estimates of effect size and therefore should receive greater weight in the combination of effects. Prior to estimation of the overall effect, each obtained effect size will be weighted by the inverse variance weight. Effects employing Hedges' g will be multiplied by their weight (ω) prior to combination and weighted according to equations 5, 6, and 7, where ES = the observed (uncorrected) effect size estimate, ES' = the corrected effect size, N = the total sample size, SE = the standard error of the corrected effect size estimate, n1 and n2 = the sample sizes of the two groups, and ω = the inverse variance weight. Odds ratios will first be transformed to log odds and weighted according to equations 8, 9, and 10.
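The standard inverse variance formulas underlying equations 5 through 10, in the notation of Lipsey and Wilson (2001), are reproduced below; the exact numbering in the original manuscript is assumed. For the standardized mean difference and for the log odds (with a, b, c, and d the cell frequencies of the 2 x 2 outcome table):

SE_{ES'} = \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{ES'^2}{2(n_1 + n_2)}}, \qquad \omega = \frac{1}{SE_{ES'}^2}

SE_{LO} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}, \qquad \omega = \frac{1}{SE_{LO}^2}

\bar{ES} = \frac{\sum_i \omega_i ES_i}{\sum_i \omega_i}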

Outliers
As in any statistical analysis, outliers can unduly influence outcomes where the arithmetic mean is used to combine effects. Such outliers can result from miscoding, the presence of moderators, publication bias, or the occasional odd finding. In meta-analysis the presence of outliers differentially affects the estimation of the fixed and random effects models, as outlined below. In the present study, when effects fall outside two or three standard deviations of the overall mean, they will be examined for accuracy of coding. The analysis program for the present study permits analysis of the overall effect with "one study removed." This analysis will be followed when outliers are present to assess their effect on the mean. Studies with small sample size may have little effect on the overall estimate given that they are first weighted. To preserve as much data as possible, studies will be deleted from analysis only if the extreme value indicates that the study does not conceptually fit with the set of comparisons studied.
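As an illustration of the "one study removed" logic (a minimal sketch, not the Comprehensive Meta-Analysis implementation; the effect sizes and weights below are hypothetical):

def weighted_mean(es, w):
    # Fixed-effect weighted mean: sum(w_i * es_i) / sum(w_i)
    return sum(wi * ei for wi, ei in zip(w, es)) / sum(w)

def leave_one_out(es, w):
    # Recompute the pooled mean with each study removed in turn
    results = []
    for i in range(len(es)):
        es_rest = es[:i] + es[i + 1:]
        w_rest = w[:i] + w[i + 1:]
        results.append(weighted_mean(es_rest, w_rest))
    return results

es = [0.15, 0.22, 0.72, 0.18]   # hypothetical Hedges' g values; 0.72 is a possible outlier
w = [40.0, 55.0, 12.0, 35.0]    # hypothetical inverse variance weights
print(round(weighted_mean(es, w), 3))            # overall mean
print([round(x, 3) for x in leave_one_out(es, w)])  # mean with each study removed

Because the extreme value carries a small weight, its removal shifts the pooled mean only modestly, which is the behavior described above.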

Confidence Intervals
The point value of a mean effect size for a group of studies is considered the best estimate of the overall effect, but given the standard error, the actual population effect could fall within a range of values. Calculating a confidence interval around each effect accounts for the standard error of the estimate and permits simple analysis of the range in which the population effect may lie for a certain level of confidence.
Confidence intervals for each study and for the overall mean effect will be calculated according to equations 11, 12, and 13, where SE_ES is the standard error of the effect size mean, ω_i is the inverse variance weight associated with effect size i, with i = 1 to k effect sizes included in the mean, ES is the mean effect size, and z(1-α) is the critical value for the z-distribution (1.96 for α = .05). Therefore, if the 95% confidence interval is chosen, the standard error is multiplied by the corresponding z-value (1.96) and one can be 95% confident that the population value falls in this range. The confidence interval will become larger if the 99% level is chosen and smaller if the 90% level is chosen. Additionally, the interval is affected by the precision of the estimate, estimated by the standard error. Larger studies will offer more precise estimates with tighter intervals. This method also permits analysis of significance such that if the interval includes 0, the null is maintained at the chosen significance level. Confidence intervals are suited to graphical display, which facilitates visual analysis of the range of effects and their precision, and will be reported in table and graphical formats.
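The standard formulas behind equations 11 through 13 (numbering assumed) are:

SE_{\bar{ES}} = \sqrt{\frac{1}{\sum_{i=1}^{k} \omega_i}}, \qquad CI = \bar{ES} \pm z_{(1-\alpha)} \, SE_{\bar{ES}}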

Modeling Variance
As outlined previously, the pattern of variability of effect sizes around the mean is of prime concern in meta-analysis. The variation of effect sizes can be assumed to occur from sampling error among the subset of studies, from systematic variation, or from a combination of sampling and random error. Various methods of modeling can account for each of these instances.
Fixed effects modeling assumes that the only source of variance in the sample of effects arises from the actual variability of the sample of effects around the mean.
Fixed effects assumes that the mean value represents the best value of the population of scores and that variance around this mean arises from subject-level error alone. Given that the Q test has low power to detect differences with fewer than 10 scores, planned moderator analysis will be done in the present study on all means. The following demographic and theoretical moderators will be tested: mean sample age, percent female, percent minority, retention rate, recruitment strategy (proactive v. reactive), intervention strategy (tailored v. retailored), and study group. Dichotomous moderators can be tested using procedures similar to the ANOVA, or regression analysis can be used for analysis of discrete and continuous moderators simultaneously. The ANOVA for meta-analysis partitions variance using the same techniques as any other ANOVA, separating the total variance Q(Total) into Q(Between) and Q(Within), where Q_B is the between-groups variance, ES_j is the weighted mean effect size for each group, ω_j is the sum of the weights within each group, and j is the number of groups.
Q_W is the pooled within-groups variance, where ES_i is the individual effect size, ES_j is the weighted mean effect size for each group, ω_i is the weight for each effect size, i is the number of effect sizes, and j is the number of groups. The between-groups Q is the measure of interest and is tested against the chi-square distribution. A mixed effects analysis can also be employed. This model tests the variance left over after assuming a within-groups random effects model. Since it accounts for more within-groups variance, this model has less statistical power to detect between-groups effects than the fixed effects model. The ANOVA or mixed effects models will be employed when only discrete predictors are of interest. Meta-regression will be employed when both discrete and continuous variables require investigation. Correlations among variables will be examined to indicate possible inclusion in the regression. Common regression procedures do not correctly estimate standard errors and statistical test values for effect sizes and thus require correction.
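For reference, the ANOVA-analog partition described above follows the standard formulas of Lipsey and Wilson (2001); the notation below is assumed:

Q_T = \sum_i \omega_i (ES_i - \bar{ES})^2 = Q_B + Q_W

Q_B = \sum_j \omega_{\bullet j} (\bar{ES}_j - \bar{ES})^2, \qquad \omega_{\bullet j} = \sum_{i \in j} \omega_i

Q_W = \sum_j \sum_{i \in j} \omega_i (ES_i - \bar{ES}_j)^2

Q_B is referred to a chi-square distribution with j - 1 degrees of freedom.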
Lipsey and Wilson have written an SPSS macro, which will be used in the present study, that performs corrected meta-regression (Lipsey & Wilson, 2001).
As in any statistical test, detection of moderators using significance testing becomes difficult with small Ns. With a sample size of 10 studies (as for many comparisons in the present study), power for detecting differences between groups is minimal, ranging from .07 with an ES of d = .20 to .55 with an (unlikely) ES of d = 1.0, assuming alpha = .05. Thus moderators may be present but will not be able to be detected statistically. When moderators are not found in the sample of studies, the random effects variance component will be assumed for theoretical purposes. A random effects model is preferred in this instance because generalization to a larger population of studies is desired, because populations are assumed to have preexisting differences, and because studies are assumed to randomly vary in characteristics such as sampling strategy, recruitment, message content, etc. Accounting for this nonsystematic error theoretically permits generalization of the mean effect to the larger population of similar studies. The random effects model may provide a slightly different estimate of the mean effect since fixed effects modeling weights smaller studies less than random effects modeling does. The weights assigned under random effects are more balanced across small and large studies. That is, a random effects model operates from the assumption that extreme values, whether from large or small studies, come from a population of values and thus gives small and large studies more equal weighting than fixed effects. With a few small but extreme effects, such as in the case of publication bias, the random effects mean may be upwardly biased, but it will have a larger variance given that it includes between-study variance. Random effects modeling adds a random variance component (v_θ) to the fixed effects subject-level error (v_i), where v_θ is the random, or between-studies, component and v_i is the subject-level sampling error. Defining the random component presents a difficulty and can be accomplished using the method of moments or maximum likelihood estimation. Since it is iterative, maximum likelihood can provide slightly more accurate estimates, but its difficulty outweighs the difference, and the method of moments will be employed in the present study, defining v_θ in terms of the homogeneity statistic Q, the number of effect sizes k, and the inverse variance weights ω_i.
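The standard method-of-moments estimator implied by this description (notation assumed, following Lipsey & Wilson, 2001) is:

v_i^{RE} = v_i + v_\theta, \qquad v_\theta = \frac{Q - (k - 1)}{\sum_i \omega_i - \dfrac{\sum_i \omega_i^2}{\sum_i \omega_i}}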

Statistically Dependent Effect Sizes
In the present study many instances exist in which studies include one control group with multiple intervention groups, all measured at more than one timepoint.
Unfortunately, inclusion of more than one comparison or outcome timepoint per subgroup introduces statistical dependence for which usual statistical procedures do not account. Gleser and Olkin (1994) have developed methods of accounting for such covariance among outcomes, thus enabling inclusion of otherwise lost data. Their method, however, requires knowledge of the correlation between outcome measures, which is rarely provided. Therefore, the present study will follow the suggestion of Lipsey and Wilson (2001) and combine outcomes where possible. The mean of timepoints will be used, and outcomes will also be grouped for separate comparison.
Statistical procedures cannot be carried out in this instance, but it will permit examination of overall trends.
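A minimal sketch of the combining approach described above, using hypothetical study names and effect sizes: when a study reports the same comparison at several timepoints, those effects are averaged into a single value per study before pooling, which avoids the dependence problem.

from statistics import mean

study_effects = {
    "Study A": [0.30, 0.25, 0.18],   # 3-, 6-, and 12-month effects (hypothetical)
    "Study B": [0.12],               # single timepoint
    "Study C": [0.40, 0.35],
}

# One effect size per study enters the meta-analysis
combined = {study: mean(values) for study, values in study_effects.items()}
print(combined)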

Choice of Comparison
A true test of tailoring requires comparison of tailored studies not only to a no-intervention control group, but to minimal or usual intervention, whether that be providing targeted pamphlets, informational brochures, or physician advice. In the present study, when studies include comparison of a tailored intervention with both an assessment-only control and a minimal intervention, the minimal intervention comparison will be chosen as the reference group for effect size calculation.
Assessment-only reference groups will be combined with minimal intervention groups when combining studies to increase the overall N. This is theoretically feasible given that assessment itself is well known to introduce intervention effects. Various combinations and definitions of tailoring were discovered during data coding. In the present analysis, studies that provided at least one assessment and feedback will be referred to as "tailored." Studies completing an assessment and feedback at more than one timepoint will be considered "retailored." Studies that assessed participants only once, yet provided feedback on more than one occasion, are termed "multiple tailored" and were grouped for analysis with the tailored studies. When tailored and retailored modalities are considered together, the term "re/tailored" will be employed.

Missing Data Procedures
Missing data has long been neglected in many outcome analyses and can result in biased inferences. Sophisticated iterative techniques exist for analysis of subject data but do not apply well to meta-analysis given the differentiation among studies and small sample sizes. Data in meta-analysis can be missing at the study level (studies unable to be located), at the effect size level (studies that do not report sufficient data for calculation), and at the moderator level (lack of reporting of characteristic variables). Study-level missingness in meta-analysis is dealt with in terms of publication bias analyses. Effect size missingness can be dealt with by exclusion of a study or inclusion of a best predicted value. If a study did not supply enough information for calculating effect size, it was not included in the analysis. If a study indicated that the effect was nonsignificant but did not specify group differences, it was included with the ES entered as 0 along with a dummy variable indicating this. Such a procedure can downwardly bias the result and can even be counterintuitive to the overall rationale for meta-analysis; inclusion of a dummy variable, however, allows the effect of the study on the overall mean to be easily indicated and assessed. Methods of dealing with missing data common to all data analysis include complete case analysis, substitution of the mean, and analysis of available data. In the present analysis of moderator variables, complete case analysis will be preferred, but given that not all studies report data required for the present moderators, available case analysis will be chosen if enough studies remain to enable a comparison. Given the variability among studies, mean substitution will not be employed as it is unlikely to give an accurate estimate of the missing value.
The type of analysis carried out in each particular study also bears relevance to missing data. In cases where studies report results from both intent-to-treat and all-subjects-available conditions, effect sizes from all subjects available will be used for analysis. As Hall et al. (2001) show, intent-to-treat analyses make the erroneous and unnecessary assumption that subjects who drop out should be considered unsuccessful in terms of an intervention, thereby underestimating effect sizes.
Mean effects will be assessed for degree of publication bias using four methods: fail safe N, Egger's regression, funnel plots, and trim and fill. In the Egger test, the standardized effect (effect size divided by standard error) is regressed on precision (the inverse of the standard error). A significant intercept suggests that bias is present in the studies such that treatment effect is related to precision of estimation (study quality).
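In regression form, the Egger test described above can be written as (standard formulation, notation assumed):

\frac{ES_i}{SE_i} = \beta_0 + \beta_1 \frac{1}{SE_i} + e_i

where the intercept β_0 estimates the bias; an intercept significantly different from zero indicates funnel plot asymmetry.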
Trim and fill is a technique developed by Duval and Tweedie (2000). Each study is ranked by the distance of its effect from the mean effect, and R0, the imputed number of studies, is estimated from the total number of studies k and the largest negative rank. They suggest that publication bias exists when R0 > 3. Publication bias is not expected to be of significant concern in the present meta-analysis, as most tailored interventions arise from grant-funded research, carrying with it the obligation to publish results regardless of significance.

General Characteristics of the Studies
Sufficient studies were located for four behaviors: smoking cessation, dietary change, physical activity, and mammography screening (see Table 1). Thus 74 studies intervening on these four behaviors were included for analysis, representing 96,018 participants (see Table 2). Studies were coded according to a coding scheme modified from one previously developed to code theory-based behavior change studies (Hall, 2005). Fifty-three variables were coded, which accounted for over 30,000 unique pieces of data (see Appendix A for the list of variables coded). All but one study had been published in a peer-reviewed journal. Five authors were contacted for additional information regarding outcomes and additional studies. Three responded, resulting in the inclusion of two studies. Another author preferred not to have the data included until publication. Table 3 reports general characteristics of the studies. A total of 21 variables were reported to have been intervened upon, and a total of seven different health behavior change theories were employed. A few studies reportedly drew from more than one theory, resulting in 10 different combinations of theories. Table 4 summarizes the health behavior theories referred to in the studies by the actual variables intervened upon (see also Table 5).

Mammography
For mammography screening behavior, 12 studies were found that employed a print tailoring intervention component. Comparisons that were similar across studies were combined, and four studies were chosen as the fewest to analyze together. With these restrictions, four overall combinations of comparisons were extracted. The most common comparison involved assessment or minimal intervention, such as a general brochure, versus print tailoring. Eleven studies were included in this analysis, with outcome timepoints ranging from two to 24 months post baseline. Most studies reported results as the proportion of participants obtaining mammography using odds ratios, and thus log odds will be employed for effect size analyses.

Assessment or Minimal Intervention versus Print Tailored
Eleven studies were included in the largest comparison group for mammography screening. Included in this combined comparison were studies that compared assessment only or a standardized brochure with theory-based tailored print feedback. Only two studies employed one additional retailored feedback that presented subjects with progress made since their first assessment, and thus they were included with the other nine. Mean effect size for the eleven studies with fixed and random effects was LO = .22 (.04), Z = 5.58, p = .001 and LO = .24 (.07), Z = 5.58, p = .001, respectively. The test for heterogeneity reached significance where Q = 22.71, p = .012, df = 10 (see Table 6 and Table 7). All studies employed proactive recruitment from a non-treatment-seeking population, and thus recruitment strategy did not serve as a moderator.

Assessment or Minimal Intervention versus Tailored Calls
Four studies included a telephone-only condition in which tailored feedback was provided in the context of one or two phone contacts. The mean ES for this comparison was LO = .37 (.19), p = .0001 under both models (see Table 6 and Table 8). Standard error for these studies was small, and thus the Q test for heterogeneity was nonsignificant where Q = .95, p = .814, df = 3. The small number of studies in this comparison prevents the search for moderators. Fail safe N for these comparisons was 5 and 9 with Rosenthal's and Orwin's methods, respectively, yet Egger's regression intercept was nonsignificant (I = .824 (2.12), t = .39, p = .37), suggesting that the effect size is small yet composed of homogeneous studies, indicating some confidence in the ES of this comparison.

Effects over Time

Figure 1 presents the summary of effects across time for the 11 re/tailored studies and the four studies that added counselor calls to the intervention. Data are taken from studies assessing outcomes at more than one timepoint, and thus dependence prohibits statistical comparison. The trend reveals decreases across time for both modalities, with the call condition remaining superior. Examination of confidence intervals suggests no significant differences between the groups, however, and a significant drop in effectiveness after the 6-month assessment.

Diet

Fifteen studies intervened on two or more diet components, such as both fat and fruit and vegetable intake. Dietary outcomes will be analyzed separately.

Fat intake
No differences were found between outcomes measuring fat intake: percent calories from fat, food frequency fat scores, and percent reaching Action or Maintenance stages for fat intake. Percent of calories from fat showed the greatest standard error (.033), followed by percentage Action or Maintenance (.026) and fat scores (.022). Percent calories from fat relies on two calculations based on self-report, total calories and fat intake, creating the possibility for additional error variance. Fat scores will be the preferred outcome variable since they are measured continuously, followed by percent reaching Action or Maintenance stages, followed by percent calories from fat if necessary. Hedges' g will be employed as the effect size measure for these studies since most studies reported data in continuous format. Dichotomous outcomes, when they occurred, were transformed with the logit function.

Fat Intake - Assessment or Minimal Intervention versus Tailored and Retailored
Studies employing either tailored or retailored interventions were included together. This allows the combination of 17 effect sizes. Employing a random effects model, the mean effect size for these studies was g = .20 (.03), z = 8.86, p = .0001 (see Table 9 and Table 10).

Fat Intake - Assessment or Minimal Intervention versus Tailored
Ten studies employing an assessment or minimal intervention compared to tailored feedback were compared first. The overall effect size employing the fixed effects model was g = .15 (.03), p = .0001 (see Table 9 and Table 10).

Fat Intake - Assessment or Minimal Intervention versus Multiple or Retailored
Seven studies compared assessment only or minimal intervention to multiple tailored or iteratively retailored interventions. These studies were examined together for a mean effect size of g = .25 (.03), p = .001. Effect sizes ranged from .18 to .46 with no outliers present. Studies were homogeneous where v = .002, Q(6) = 2.71, p = .84 (see Table 9 and Table 10).

Fat Intake - Assessment or Minimal Intervention versus Tailored + Calls
Four studies compared brief advice or assessment-only conditions to tailored feedback plus one or more brief telephone contacts. Effect sizes ranged from .06 to .50. The mean effect size for these studies was g = .25 (.07), p = .0001 using the random effects model. Tests were found to be heterogeneous where v = .012, Q(4) = 9.87, p = .04 (see Table 9). One study was removed (Jones & Rossi, 2003).

Fat Intake - Effects over Time
Effects increased over time for the tailored and retailored interventions, with an increasing, albeit fairly unreliable, effect even at 13-24 month outcomes. With the addition of counselor calls to the print component, effects were initially higher (g = .28) but decreased across time (g = .18), becoming smaller than for the print tailored conditions.
Even though formal analysis could not be done, a horizontal trendline can be drawn incorporating all confidence intervals , suggesting that no significant differences exist across timepoints , nor between groups. See Figure 3.

Fruit and Vegetable Intake - Assessment or Minimal Intervention versus Tailored and Retailored
Nine studies employed a tailored or retailored intervention to increase fruit and vegetable intake together. For the fixed model the mean effect was g = .18 (.02), p = .0001 and g = .20 (.03), p = .0001 with random effects. Heterogeneity was present where v = .0007, Q(8) = 16.48, p = .04. Comparing the tailored versus retailored methods revealed that the five studies employing a retailored intervention showed a mean effect of g = .22 (.03), p = .0001 and the six studies employing a tailored intervention had a mean effect size of g = .11 (.04), p = .0001, a significant difference, Qb(1) = 4.82, p = .03 (see Table 11 and Table 12). Effects increased up to the 6-month assessment. Only three studies presented outcomes after 12 months, one of which was the group outlier and negative. Figure 5 presents the long-term effects with and without the outlier removed. Confidence intervals suggest no differences across timepoints, possibly due to small sample size, especially at 1-6 months.

Fruit Intake - Assessment or Minimal Intervention versus Tailored and Retailored
Four studies intervened on fruit and vegetable intake but measured each food category separately. One other study intervened on fruit intake only. These five studies were examined as a group. For the fruit outcome, the mean effect size was g = .17 (.05), p = .001. Effect sizes ranged from g = .01 to .32, yet heterogeneity was not present where v = .0001, Q(4) = 3.92, p = .42 (see Table 11 and Table 13). Despite the small sample size, moderator analysis was pursued in this case. A significant trend was found for number of interventions, where two or more interventions significantly increased the effect size (B = .33 (.11), z = 3.04, p = .0002), leaving a nonsignificant residual (Q(2) = 3.47, p = .18). Interventions using more than one contact had the largest effect sizes (.31 and .11). Given no significant difference, fail safe N was 0, yet Egger's regression was not significant (I = -17.7 (11.96), t = 1.48, df = 2, p = .14), but this may be an artifact of the sample size. Examination of the funnel plot suggested no bias for publication of small sample size studies with large effects, and trim and fill imputed no values. See Table 11.

Comparison of Fruit versus Vegetable Intake
For all four studies measuring fruit and vegetable intake separately, a trend for larger effects was found for fruit intake, suggesting fruit intake is easier to increase.
This difference was not significant (Qb(1) = 1.57, p = .21), but given that four of five outcomes are dependent, the standard errors may be erroneously large.

Fiber Intake
Four studies intervened upon increasing amounts of dietary fiber. Effect sizes for these interventions ranged from .18 to .93. Given the small sample size of the study showing an effect of .93, it had little weight, and removing it only reduced the overall effect to .29, so it was retained. The mean effect size was g = .34 (.09), p = .0001. Studies were homogeneous where v = .03, Q(3) = 5.40, p = .15 (see Table 11 and Table 13). Moderators were not examined given the small number of studies. Fail safe N was 16 and 3 with Rosenthal's and Orwin's methods respectively, but Egger's regression (I = 4.55 (.86), t = 5.27, df = 2, p = .02) was significant, suggesting publication bias. Such bias would be expected given the publication of one small-N study with a large effect. Imputation of one study reduced the overall mean effect to g = .29, the same result as if the study had been removed. See Table 11.

Physical Activity

Physical activity outcomes were reported as percent reaching Action or Maintenance stages, percent making any stage progress, percent meeting CDC exercise criteria, percent making any increase in exercise, and amount of activity measured by seven-day physical activity recall. Two studies reported outcomes on three of these measures concurrently. Bock et al. and Pinto et al. (2001; 2002) reported results in terms of percent reaching Action or Maintenance stages, percent meeting CDC exercise criteria, and seven-day activity recall. There were no significant differences among these outcomes, although there was a trend for the seven-day recall to show smaller effects. Thus, when multiple outcomes are reported, CDC criteria will be preferred for the analysis, followed by percent reaching Action or Maintenance, and finally, seven-day activity recall.

Results by Comparison

Physical Activity - Percent Reaching Criteria
Eleven studies measured outcomes in terms of percentage reaching CDC criteria, percent reaching Action/Maintenance stages, or mean physical activity recall.
Six studies employed a retailored intervention, three employed a tailored or retailored intervention plus counselor contact, and two employed a one-time tailored intervention. Due to small sample sizes, all interventions were combined for calculation of an overall effect size. Four studies measured outcomes at more than one timepoint, and the mean of these effects was used as the overall measure for each.
After combining timepoints, effect sizes ranged from g = .06 to .49. One outlier was present where g = .72 (Bock et al., 2001, 6-mo), which appeared due to an intense intervention and was kept, as it entered into calculating the mean for that study with two additional timepoints. Overall, the mean effect size for this group of studies under the fixed effect model was g = .20 (.04), p = .0001 and g = .24 (.05), p = .0001 using the random effects model (see Table 14 and Table 15).
The Q test for homogeneity of variance was not significant where v = .007, Q(10) = 14.32, p = .16. In terms of moderators, five studies employed proactive recruitment where g = .22 (.06) and six reactive where g = .19 (.05), but studies did not differ (Qb = .11, p = .74). Studies were conducted at seven different sites, preventing meaningful comparison. Retention rate and mean age were not related to effect size, but percent female was moderately related using p < .10 as the significance level (B = -.39 (.23), Z = -1.69, p = .09), which reduced the residual to nonsignificance (Q = 11.36, p = .25). Percent minority and recruitment rate could not be assessed due to missing data.

Since studies varied by outcome timepoint and four provided multiple outcomes, outcomes were examined by timepoint grouping. Outcomes were grouped into three categories: 1-3, 4-6, and 7+ months. Five outcomes comprised each category. Due to dependence, statistical significance could not be examined across timepoint categories. This comparison contains a small number of studies, creating large confidence intervals that overlap, suggesting no significant differences over time. Examination of means suggests that the strongest effects were found from 1-3 months (g = .38), followed by 4-6 months (g = .31), then 7+ months (g = .12). See Table 17 and Figure 7. Fail safe N was calculated to be 100 and 12 with Rosenthal's and Orwin's methods respectively. Both Egger's regression (I = 1.91 (.58), t = 3.29, df = 9, p = .005) and trim and fill suggested an effect of publication bias. Trim and fill imputed six studies to the left of the mean, reducing the overall effect to g = .14.

Physical Activity - Percent Making Progress
Nine studies measured outcomes in terms of percent of participants making stage progress or increasing amount of exercise. Stage progress refers to the percent of participants moving at least one stage of change forward on the continuum from Precontemplation to Contemplation to Preparation to Action to Maintenance as specified by the Transtheoretical Model. Despite the dichotomous nature of this outcome, the logit transformation was taken to enable comparison with other physical activity outcomes employing Hedges' g. Four studies measured outcomes on more than one occasion. Overall, however, seven outcomes were available for the 1-3 month outcome, two for the 4-6 month outcome, and three for the 12-month outcome, and thus comparisons by timepoint were not feasible. The mean of multiple timepoints was included in the combined effect size analysis. Effect sizes ranged from .05 to .66. The mean effect under the fixed effects model was g = .14 (.05), p = .001 and g = .18 (.07), p = .01 under the random effects model. Significant heterogeneity was present among studies where v = .02, Q(8) = 15.54, p = .05 (see Table 15 and Table 18). Examination of studies revealed that the studies with the two highest effect sizes both employed an intensive, interactive web-based intervention (Kosma, Cardinal, & McCubbin, 2005; Napolitano et al., 2003). These studies also employed reactive recruitment strategies.

Smoking
Choice of outcome measure is an important and somewhat controversial decision in regard to smoking cessation studies. FDA guidelines recommend 28-day abstinence or longer as the preferred method of assessing efficacy (FDA, 1995). Many studies, however, do not employ this method, using either 24-hour or 7-day abstinence as outcome measures. Studies in the meta-analysis sample include outcomes assessing smoking using complete abstinence at 24-hour, 7-day, 28-day, 10-week, and 6-month increments. If such outcomes could be combined, this would result in a larger sample for analysis. In an analysis of smoking outcome measures, Velicer et al. (2004) found that 24-hour, 7-day point prevalence, and 30-day prolonged abstinence measures showed correlations of at least .98 with each other among a series of three similar studies. The six-month continual abstinence measure showed a correlation of .82 with these measures. They note two main problems with using prolonged abstinence as an outcome measure: (1) it ignores the usual pattern of quitting, in which people relapse multiple times, and (2) it does not account for delayed quitting in the sample. Thus prolonged abstinence rates tend to decrease across time, whereas the other measures show increases. This pattern will be examined in this analysis to determine if it replicates across disparate studies. In the present analysis, 24-hour, 7-day point prevalence, and 28-day prolonged abstinence outcomes will be analyzed as if they are equivalent. Where studies report more than one of these measures for each timepoint, 24-hour point prevalence will be preferred, followed by 7-day point prevalence, and then by 30-day prolonged abstinence. Six-month and 10-week sustained abstinence will be examined separately. Since all outcomes are reported in dichotomous format, log odds will be reported as the effect size measure.

Smoking Outcomes: 6-Month Abstinence
For smoking behavior, 10 studies reported outcomes for prolonged abstinence (10 weeks, 6 months, or Action or Maintenance outcomes). For each study the mean effect for multiple timepoints was used since few differences were found across outcome assessment timepoints (Qb = .69, p = .71). Effect sizes ranged from .12 to .75.

Re/Tailored versus Re/Tailored + Counselor Calls
Six studies included tailored or retailored intervention alone and in combination with counselor calls, allowing for comparison of the additive effect of calls over print tailoring. The mean effect with the fixed and random effects models was LO = .20 (.09) and .19 (.13), respectively. Due to the large standard error, the random effects model is nonsignificant. Heterogeneity was not present where v = .04, Q(5) = 8.22, p = .14 (see Table 21 and Table 23). Compared to retailoring alone, the effect of adding counselor calls resulted in increased effects at short-term follow-up, but smaller effects over time (see Figure 12).

Analysis by Timepoint
Since most studies assessed outcomes at more than one timepoint, studies were examined across timepoints. Timepoints cannot be compared with formal statistical tests due to dependence, in which case the null is rarely rejected, but they can be presented for determination of trends of effects over time. Examination of overlapping confidence intervals allows estimation of significance, however, and suggests that the trend is not significant with the sample size included here. The mean effect at the 1-3 month assessment was LO = .27 (.04), at 4-6 months LO = .35 (.08), at 7-12 months LO = .31 (.05), at 13-23 months LO = (.24), and at 24-month assessment and longer LO = .17 (.06). Figure 11 shows the effect size trend by timepoint with 95% confidence intervals.

Mammography
The overall mean effect size for tailored or retailored mammography interventions suggests about a 24% increase in effectiveness for these interventions compared to minimal contact. These studies showed significant heterogeneity, however, which was explained by positive correlations among effect size, retention rate, and minority status. Studies showing greater retention rates may result in greater success since high retention facilitates more participants receiving the intervention.
Increased success for interventions aimed at minority populations also makes theoretical sense given that the base rates of minorities getting regular mammography are lower than those of non-minorities. The interventions would have more relative success moving a group from 55% mammography status to 70% status than moving a higher-SES group from 65% to 70% getting mammograms. The effect of tailored versus retailored interventions could not be assessed for mammography given that only two studies employed a retailored component. All studies recruited participants proactively, and no differences were found across theoretical orientation or study group. The effects of the interventions declined across time for both the tailored and the tailored plus counselor call conditions, with the largest drop from six to 12 months post-baseline. Follow-up past 12 months was not available for the counselor call condition to compare to the tailored intervention alone, which did maintain effects of about OR = 1.22 in the three studies assessing at periods greater than 13 months. Thus, it appears that interventions aimed at increasing mammography use maintain at least some effect over time.

Adding counselor calls to the tailored interventions appears to result in 2.8 times larger effects for this behavior. This difference could arise from two sources: (1) studies employing calls being of better quality, and/or (2) the effect of retailoring that the calls provided. It does appear that the interventions that employed additional counselor call components were of better quality. When the tailored intervention components of the studies that employed calls were compared alone, the LO = .37 (OR = 1.45), which is greater than the overall mean for all tailored interventions, LO = .24, suggesting that the tailored component of the call studies was more effective on its own. Additionally, since only two studies employed a retailored component, the effect of calls over retailoring could not be assessed. It is possible that counselor calls provided an updated assessment and feedback component that would be independent of intervention modality. It does appear, however, that adding calls to tailored intervention does increase effectiveness for mammography behavior by 1.7 times.
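For reference, the odds ratios quoted here follow directly from exponentiating the log odds, since LO = ln(OR); for example:

OR = e^{LO}, \qquad e^{0.37} \approx 1.45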
The relative risk calculated for mammography screening interventions is 1.12.
Examination of the relative risk facilitates interpretation of the practical effects of the interventions. Relative risk employs the overall sample size per group as the denominator, comparing proportions rather than odds between the treatment and comparison groups. Relative risk can be used to show the amount of change over the entire population if an intervention were provided and thus will usually be lower than the odds ratio.
Subtracting 1 from the relative risk yields the relative change in percentage terms, which would be a 12% increase in this instance. If the intervention were provided, therefore, one would expect a 12% relative change in mammography behavior overall. Since effect size relies on difference scores, it is necessary to base interpretation on the actual base rates of the behaviors in question. The absolute rate of women getting mammography ranges from 63% to 75% depending on SES. These interventions offer the possibility of increasing mammography rates among people not already getting mammography by 7-10% in the population (see Table 24). Given that the relative risk of breast cancer mortality decreases about 23% for women over 50 who get regular mammograms (Agency for Healthcare Research and Quality, 2002), increasing rates by 10% could decrease the 41,000 annual mortality count by about 2.3%, or 943 persons.

Diet

Individual studies have found larger effects for decreasing fat intake for men than for women (Kristal et al., 2000). The CDC estimates that in 2000 men and women both consumed about 11% of calories from fat, with rates comparable between genders for the past 30 years (CDC, 2004). At least one study in the present analysis, however, did find and report higher baseline consumption of fat for men than for women (Armitage et al., 2001), and other studies have identified women as attending to health messages to a greater extent than men (Stevens et al., 2003).
For fruit and vegetable intake, the mean effect was equivalent to that for dietary fat reduction. Retailored interventions showed about twice the effectiveness of tailored interventions. Effects increased from baseline to 6-month outcomes and either decreased or remained consistent at 12 months post-intervention (CDC, 2007).

Physical Activity
The mean effect size for percent reaching criteria was g = .24, a large effect for population-based interventions . This represents a 39% increase in effect over the control group and a 28% increase in exercise if employed on a population .
Recruitment strategy did not influence this outcome, but effect size was found to increase with the inclusion of fewer females in the studies. This suggests that physical activity interventions are more effective for men than women, which may be accounted for by the lower rates of activity among men (21%) than among women (26%) overall. A trend for decreasing effect size over time was found for percent reaching criteria, with a large drop between the six- and twelve-month outcomes. It appears that the effect of the intervention decreases over time, but the drop is nonsignificant and the overall effect remains clinically meaningful where g = .15. A small to medium-size mean effect was found using percent of participants making progress as the outcome. costs annually, or 3.8 billion dollars (CDC, 2002). A 5% decrease would also result in 9,300 fewer deaths from all cancer and 8,900 fewer deaths from cardiovascular disease (CDC, 2002).

Effectiveness of Tailored Interventions
Over all four behaviors, the mean effect size was statistically significant for each comparison. Tailored interventions outperformed assessment-only or minimal interventions, with greater effects for retailored interventions.

Differences between Tailored or Retailored Interventions
The difference between tailored and retailored interventions appears for smoking cessation, dietary fat reduction, and increasing fruit and vegetable intake, such that retailoring almost doubled the effectiveness of tailored interventions. The greater effect could be explained by the increased number of overall contacts that retailoring necessitates, but number of contacts was not a significant predictor of effect size for any behavior. This suggests that periodic reassessment to enable updated feedback provides qualitatively meaningful improvement in interventions over and above increasing the number of contacts. Sufficient data were available for smoking cessation to compare the effects of tailored versus retailored interventions across time. It appears that for shorter term outcomes, retailored interventions gave greater effects, and that these endure more so than for tailored interventions (see Figure 14). Results thus suggest that retailoring, despite the increased participant burden and effort involved, provides meaningful behavior change information that facilitates long-term maintenance. Counselor calls added to quit rates initially, but effects for calls declined more sharply than for tailored interventions alone. Thus it appears that counselor calls provide short-term efficacy, but less intensive print retailoring fulfills the need for outcome maintenance, possibly due to reliance on the counselor rather than self-efficacy.

Effect Size Over Time
It was predicted that effect size would increase over time . In fact , across behaviors in which sufficient data were available , effect sizes decreased over time.
Across behaviors, the sharpest decreases in effect size were seen past 6-month follow-up. This was also accompanied by an increase in the error of measurement, as a surprising number of studies measured outcomes at relatively short outcome timepoints, often one to three months post-baseline. Such methodology limits the ability to detect long-term effects and makes little sense when attempting to measure behavioral change. Indeed, such methodology assumes that the entire sample is ready to commence the behavior, an assumption that ignores stages of readiness. The hypothesis that tailoring methodology would have improved since the first tailoring interventions were developed, resulting in larger effects, was also tested by regressing effect size on publication date for each behavior. None of these regressions revealed a trend, significant or not.

Differences in Recruitment Strategy
It was predicted that proactive recruitment would result in smaller effect sizes, but reach larger numbers of people, resulting in greater impact. Due to small numbers of studies using reactive methodology for mammography and diet behaviors, this prediction could not be addressed. No differences in effect size by recruitment strategy were found for increasing physical activity or for smoking cessation.

Differences of Effect Size Among Behavior Change Theories
As predicted, theory was not a significant moderator in any comparison for any behavior. This could be due to the fact that many interventions either employed the TTM, did not refer to a particular theory, or combined components of various theories.
Since most change theories incorporate similar variables into tailored feedback reports, finding differences either in the number of variables intervened upon as a continuous variable or among change theories in the small sample included here becomes difficult. In a related prediction, the multitude of study groups involved in creating tailored interventions increases the number of categories, in some cases to the number of effects, thereby preventing statistical or even visual inspection of reliable patterns. Very few studies reported stage distributions of their baseline samples, preventing comparison of mean effect between studies intervening on pre-action or comprehensive stage-distribution samples.

Effect of Demographic Moderators
Some differences were found for demographic variables. Percent female was negatively related to effect size for dietary fat reduction, fruit and vegetable intake , and physical activity criteria. This most likely arises due to the fact that women in general engage in more health-conscious behavior than men and therefore have less to learn from the type of interventions provided here, suggesting a ceiling effect.
Retention rate was a significant positive predictor of effect size for mammography and smoking cessation, suggesting that keeping participants involved in the study remains a vital component of intervention. Lack of finding significant moderators does not necessarily mean they are not present. Given small sample sizes, statistical power is low to detect these relationships. The database created for this study enables investigation of statistical power of these predictions in further follow-up studies .

Limitations
The main limitation of this study involves the wide differences among the studies in question. Tailoring is a relatively nascent field open to interpretation and various modalities of intervention. Messages differ in terms of writing style, language, layout, amount of tailoring, behavior intervened upon, and assessment timepoints.
Such disparity may limit the ability to compare studies in some instances.
Meta-analytic methods also carry limitations. Meta-analyses are correct regarding direction of effect about 80% of the time (Naylor, 1997), an acceptable but not perfect statistic. Statistically, meta-analysis places emphasis on variance among study effect sizes. With even a fair number of studies to compute a mean effect, power is limited to detect and predict between-study variability. Multivariate techniques require sample sizes (in this case, numbers of studies) much larger than the number of predictors, a case that rarely exists for meta-analysis, a situation that limits modeling and discovering more specific conclusions. Since effect sizes appear similar across studies, combining effects across behaviors is theoretically justifiable. Such combination would increase the power to detect moderators, which was a significant limitation of the present analysis, preventing additional conclusions from being drawn regarding an optimal tailoring formula. The process of meta-analysis can also bias results. Generalizability may be limited if only a limited sample of studies is found. Confounding of substantive and methodological features also occurs: if a difference appears in two groups that are also measured differently, the source of the discrepancy cannot be determined (Kazdin & Weisz, 1998). In addition, the nature of this study did not permit use of an additional coder to facilitate inter-rater reliability comparisons. This study did not calculate a methodological quality variable, which may be a valuable moderator in future analyses.
Sampling and publication bias inevitably skew the results of a meta-analysis.
Searches are not able to locate all relevant studies, even with concerted effort. Despite the intensive search for studies in the present analysis, a recently published work contains at least five studies not included here. Even with an intense search, the field has well documented the publication bias problem such that significant studies are more often published than non-significant studies. This results in upward bias of the mean effect size. Since this study assessed outcomes largely from well-controlled and funded trials, publication bias may be limited.

Further questions
Differences between modalities of intervention in terms of live counselor, print only, print plus counselor call, interactive terminal or web-based, or email reminder, whether tailored or retailored, could not be determined in this analysis. The majority of studies employed print tailored interventions alone, preventing meaningful comparison among these modalities. There does appear, however, to be added benefit of counselor calls for three of four behaviors, but this benefit is short-lived and surpassed by less expensive print retailored feedback over long-term follow-up.
Dissemination remains a problem for many of these interventions since they require significant infrastructure to design and implement. Even if grant funding pays for initial development, interventions need to be continually administered to continue producing their effects on health behavior . Maintenance is a vital component in assessing the effectiveness of an intervention according to the RE-AIM framework suggested by Glasgow et al. (2002). An intervention with a large effect size will have little impact if it is not put into practice on a consistent basis. Intervening on a few thousand people will not reduce disease burden at a measurable population level.
Traditional public health practice, usually at the state or local level, does not have the funding or organizational capacity in place to implement these programs. This leaves smaller, private organizations with an interest in prevention, such as large employers or health insurance companies, to implement these interventions on a population of their members. The cost/benefit ratio for long-term prevention must be clearly specified for such investments to be made. Unfortunately, prevention often makes little fiscal sense in the current health care setting. Beth Israel Medical Center in Manhattan, for example, developed a highly successful diabetes prevention and management protocol, which was subsequently halted due to lack of income from adverse diabetes sequelae such as amputations (Urbana, 2006).
Additional development of the methodology of tailored interventions would also facilitate their dissemination. These interventions require significant participant burden in terms of assessment to guide tailored feedback. The increasing use of electronic medical record technology enables information to be gathered on health behaviors and risk factors. Theoretical variables such as decisional balance can only be tailored through assessment, but much behavior and risk feedback information can be gathered directly from medical records without assessment burden. Many interventions in the present study employed a combination of targeted and tailored methodology by locating and targeting people most at risk for an outcome through medical record data and presenting them with a tailored intervention.
The survey of tailoring that this study enabled also revealed methodological flaws both in carrying out and in reporting of outcomes. Meta-analysis largely relies on coding from written reports of each study, whether published or not. If meta-analysis is to guide the progression of a field, outcome papers must report detailed (yet succinct) accounts of their interventions. With the increasing use and usefulness of meta-analysis, writers should report statistical results in formats that enable inclusion in a meta-analytic review. This would entail at least reporting means, standard deviations, confidence intervals, and actual p-values for every comparison. Stating that results were "not significant" without reporting statistical data inordinately restricts the meta-analyst.
Greater detail is also needed in specifying how tailoring is accomplished in