Testing Theory-Based Quantitative Predictions with New Behaviors

Theory development is essential for the generation and support of research ideas. Traditional Null Hypothesis Significance Testing (NHST) has been the modus operandi for testing research questions across many branches of science since the early 20th century. The focus of a statistical test under the NHST framework is the rejection or acceptance of a null hypothesis based on a conditional probability of the data given that the null hypothesis is true (i.e., a p-value). This approach provides no direct support for a specific theory, which often takes the form of an alternative hypothesis. Furthermore, rejection of a null hypothesis based on a p-value provides no information on the magnitude of a difference and is affected by sample size, alpha level, and effect size. Such dependency on p-values can lead to misunderstanding and misinterpretation of results and conclusions. Therefore, the limitations of NHST warrant the investigation and development of new, more rigorous approaches to theory testing. A quantitative approach, called “Testing Theory-Based Quantitative Predictions” (TTQP), has been proposed that uses effect size indices and confidence intervals to directly test predictions posited by theory (Velicer et al., 2008). Effect size indices provide information regarding the magnitude and direction of an effect, while confidence intervals provide a means of “testing” specific predictions. This approach is an iterative process, allowing the researcher to tailor the theory as empirical data is collected. The use of the TTQP approach contributes to the movement away from NHST and the reliance on p-values, while promoting a stronger and more informative method. A quantitative orientation represents an essential change in thinking about theory testing by emphasizing the numeric strength of a measure, leading researchers away from a simple binary accept/reject framework.
Predictions are made relative to the specific measure/variable, but the use of effect sizes allows for comparison across studies and across theories. Therefore, the TTQP approach provides more information than traditional NHST. The TTQP approach involves several steps. First, verbal descriptions of the expected values are designated a priori. These predictions are theory-based and guided by previous empirical findings (e.g., “small effect”). Second, verbal predictions are translated into quantitative values based on traditional guidelines or empirical results (e.g., “0.01”). Then, observed effect size estimates with surrounding confidence intervals are generated from sample data. If a confidence interval contains the predicted value, the prediction is confirmed. If the predicted value falls outside of the confidence interval, the prediction is not confirmed and explanations for failed predictions are examined. The current study replicated findings from Velicer et al. (2008) and extended previous research by generating predictions for new health behaviors: diet and sun exposure. Secondary analyses were performed on cross-sectional data from a multiple health risk behavioral intervention. Predictions for each behavior varied slightly depending on the nature of the behavior and represented the major constructs of the Transtheoretical Model: decisional balance, self-efficacy, and processes of behavior change. Effect sizes were expressed as ω², and 99% confidence intervals were generated to provide a stringent test of fit.


Theory development is essential for the generation and support of research ideas. Traditional Null Hypothesis Significance Testing (NHST) has been the modus operandi for testing research questions across many branches of science since the early 1900s (Kline, 2004). Modern-day NHST is the result of a blending of two schools of thought. The first is the Fisherian approach, which simply features a statistical test of the null hypothesis. The second is the Neyman-Pearson approach, which introduces the alternative hypothesis, a fixed alpha level, specification of one- or two-tailed regions, and Type I and II errors (Kline, 2004). The focus of a statistical test under the NHST framework is the rejection or acceptance of a null hypothesis based on a conditional probability of the data given that the null hypothesis is true (i.e., a p-value).
This framework provides no direct support for a specific theory, which often takes the form of an alternative hypothesis.
Consequently, NHST tends to provide a result that is not what researchers are actually interested in: support for their theory or the alternative hypothesis (Kline, 2004; Steiger, 2004). This limited focus on a test of the null hypothesis has been criticized for almost as long as significance testing has existed. Furthermore, rejection of a null hypothesis based on a p-value provides no information on the magnitude of a difference and is greatly affected by the size of the sample, the alpha level, and the size of the effect. Such dependency on the p-value and its arbitrary significance cutoff can lead to misunderstanding and misinterpretation of results and conclusions (e.g., Cumming, 2012). Therefore, the limitations of NHST warrant the investigation and development of new, more rigorous approaches to theory testing.
A quantitative approach, called "Testing Theory-Based Quantitative Predictions" (TTQP), has been proposed that uses effect size indices and confidence intervals to directly test predictions posited by a theory (Velicer et al., 2008). Effect size indices provide the researcher with essential information regarding the magnitude and direction of an effect, while confidence intervals provide a means of "testing" the specific predictions generated by theory. By examining and comparing effect sizes, researchers can gain insight into the importance of individual measures or constructs within a theoretical framework. This approach is an iterative process, allowing the researcher to tailor the theory as empirical data is collected. The TTQP approach has been applied to the theory of smoking behavior, but it needs to be replicated with new data and applied to predictions involving novel behaviors. The present study is a secondary data analysis. It replicates the Velicer et al. (2008) findings and extends the smoking-based predictions to two new behaviors, diet and sun exposure. The use of the TTQP approach contributes to the movement away from NHST and the reliance on p-values, while promoting a stronger and more informative method of theory testing.
As mentioned above, TTQP uses effect size indices and confidence intervals to directly test predictions posited by a theory. Scientists testing a specific theory are often determining the adequacy of the theory given the data. Thus, an "accept-support" hypothesis testing method may be an appropriate and informative approach (Steiger, 2004). The "accept-support" method emphasizes support for a theory and can be considered to focus on model fit rather than on rejection of a null hypothesis (i.e., the "reject-support" method). Cumming (2012) describes TTQP as a model-fitting approach: it allows for a comparison of how well a theoretical model fits different sets of data. Because the comparisons are quantitative, incongruities between the model and the data are easily identified and subsequently examined. Graphical displays of the comparisons further ease interpretation, such as Figure 1 (from Velicer et al., 2008). The graph shows the prediction for each variable in a list against a 99% confidence interval constructed around the point estimate obtained from empirical data. The variables are sorted by ascending effect size so that the variables with the largest effects are easy to identify.
The TTQP approach serves as a guide to generate specific quantitative predictions about the magnitude of the effect size for a certain measure or variable, while confidence intervals surrounding the observed effect size are employed to test the predictions. A quantitative orientation represents an essential change in thinking about theory testing by leading researchers away from a simple binary accept/reject framework. Previously, binary outcomes have been considered sufficient and the magnitude of an effect was largely ignored. This quantitative approach shifts traditional ways of thinking by emphasizing the numeric strength of the measure(s).
Predictions are made relative to the specific measure, but the use of effect sizes allows for comparison across studies and across theories. If a prediction lies outside of the observed confidence interval, it is not confirmed. Therefore, the use of effect sizes with confidence intervals provides more information than traditional significance testing, as the magnitude and direction of an effect inform a researcher of the role specific measures play in the scientific theory.
The iterative nature of this approach allows the researcher to tailor the theory as empirical data is collected. Kline (2004) describes a "paucity" of replication in the behavioral research literature. Applying TTQP can build empirical cumulativeness by testing previous empirical results against new data. In addition, the TTQP approach promotes meta-analytic thinking by emphasizing effect sizes, confidence intervals, and the integration of previous empirical data.
Mathematically, two main components make up the TTQP approach: effect size estimation and confidence interval generation. Effect size predictions are made a priori and involve several steps. First, verbal descriptions of the expected values are designated for each prediction. These predictions, such as "small," "medium," "large," or "no effect," are theory-based and guided by previous empirical findings. Second, verbal predictions are translated into quantitative values based on Cohen's (1988) guidelines. Then, if available, empirical results from a previous study are used to recalibrate the quantitative values to better reflect the empirical findings.
Finally, the effect size estimates and confidence intervals are generated from the new data set. If a confidence interval contains the predicted value, the prediction is confirmed. If the predicted value falls outside of the confidence interval, the prediction is not confirmed and explanations for failed predictions are examined. Visual graphics are used to aid the interpretation of results. Point estimates for multiple measures can be displayed together for easy comparison and examination of missed predictions.
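The decision rule just described can be expressed in a few lines. The sketch below is illustrative only. The names `COHEN_OMEGA_SQ`, `predicted_value`, `is_confirmed`, and `miss_distance` are hypothetical. Verbal labels are mapped to Cohen's (1988) variance-accounted-for benchmarks as summarized in this paper (0, .01, .06, .14), and numeric predictions (for example, recalibrated values) pass through unchanged. The usage lines reuse the two smoking misses reported in the Discussion.

```python
# Hypothetical helpers. The benchmarks follow Cohen's (1988) guidelines as summarized in the text.
COHEN_OMEGA_SQ = {"no effect": 0.0, "small": 0.01, "medium": 0.06, "large": 0.14}


def predicted_value(prediction):
    """Translate a verbal prediction into a number; numeric predictions pass through."""
    if isinstance(prediction, str):
        return COHEN_OMEGA_SQ[prediction.strip().lower()]
    return float(prediction)


def is_confirmed(prediction, ci_lower, ci_upper):
    """A prediction is confirmed when the confidence interval contains the predicted value."""
    value = predicted_value(prediction)
    return ci_lower <= value <= ci_upper


def miss_distance(prediction, ci_lower, ci_upper):
    """Distance from the predicted value to the nearest interval bound (0.0 if contained)."""
    value = predicted_value(prediction)
    if value < ci_lower:
        return ci_lower - value
    if value > ci_upper:
        return value - ci_upper
    return 0.0


# Pros: predicted "no effect", observed 99% CI [.004, .047] -> not confirmed.
print(is_confirmed("no effect", 0.004, 0.047), miss_distance("no effect", 0.004, 0.047))
# Self-Liberation: predicted .19, observed 99% CI [.046, .164] -> not confirmed.
print(is_confirmed(0.19, 0.046, 0.164), round(miss_distance(0.19, 0.046, 0.164), 3))
```

Reporting the distance alongside the yes/no result is one way to separate near misses from large discrepancies when failed predictions are examined. How small a distance counts as "near" remains a judgment call for the researcher.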

Effect Size Estimation
Explicit effect size predictions allow researchers to gain information beyond the binary accept/reject decision procedure by moving the focus of research away from p-values. Traditional NHST does not emphasize the magnitude of a difference, thereby ignoring a crucial piece of information regarding the relationship between variables. Quantitative predictions of effect size estimates guide a theorist to state numerically what their theory predicts. These numeric predictions then allow for quantitative comparisons between groups. The present study used omega-squared (ω²), which is used when comparing differences across more than two groups.
This metric also corrects for the positive bias of explained variance when estimating a population effect by taking random error into account. It is calculated from the source table of a one-way between-groups fixed-effects ANOVA:

$$\omega^2 = \frac{SS_{\text{between}} - (k - 1)\,MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}$$

where SS_between and SS_total are the between-groups and total sums of squares, k is the number of groups, and MS_within is the within-groups mean square. By directly testing a theory with quantitative comparisons, such as ω², researchers can come to more precise conclusions. Cohen's (1988) guidelines for the interpretation of variance-accounted-for effect sizes are as follows: a "small" effect is one percent of the variance accounted for, a "medium" effect is six percent, a "large" effect is fourteen percent or more, and "no effect" is zero percent. Verbal effect size predictions give comparisons greater practical meaning, while assigning numeric values to those predictions makes the procedure quantitative. However, these classifications are very broadly defined and were intended only as a guide to initial estimates. As theorists become familiar with the effects found in a specific area of research or population, they should adjust the classifications of "small," "medium," and "large" to better represent that area. Thus, for one study a medium effect size may be better represented by ω² = 0.08 rather than ω² = 0.06. Because part of this study examined two new behaviors, diet and sun exposure, Cohen's original classifications were used to guide the effect size predictions for those behaviors. As new data is generated for these behaviors, the effect size classifications may be recalibrated to reflect the data (Velicer et al., 2008). Depending on the population and context of the data, classifications may vary across studies.
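The ω² estimate, ω² = (SS_between - (k - 1) MS_within) / (SS_total + MS_within), translates directly into code. This is a minimal sketch with a hypothetical function name and made-up source-table values. For contrast, it also prints the uncorrected eta-squared (SS_between / SS_total). It returns the raw ω² estimate, which becomes negative whenever F < 1; some authors report such values as zero.

```python
def omega_squared(ss_between, ss_total, k, ms_within):
    """Omega-squared from a one-way between-groups fixed-effects ANOVA source table."""
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)


# Made-up source table: k = 3 groups, N = 53, SS_between = 30, SS_within = 100.
ss_between, ss_within, k, n_total = 30.0, 100.0, 3, 53
ss_total = ss_between + ss_within
ms_within = ss_within / (n_total - k)  # 100 / 50 = 2.0

print("eta-squared:  ", round(ss_between / ss_total, 4))
print("omega-squared:", round(omega_squared(ss_between, ss_total, k, ms_within), 4))
```

The gap between the two printed values is the bias correction: ω² subtracts the amount of between-groups variation expected from error alone.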

Confidence Interval Generation
Confidence intervals surrounding point estimates are becoming more widely used as recommendations and formulas for their calculation emerge. The APA Publication Manual recommends the use of confidence intervals whenever possible (APA, 2001). Cumming and Finch (2001) suggest four reasons for using confidence intervals. Estimating confidence intervals acknowledges that point estimates generated from sample data are subject to sampling error. Intervals can be used to specify a range of plausible values at different levels of confidence. A 95% level of confidence generates a narrower interval than a higher level, such as 99%, because it tolerates a larger long-run error rate (5% rather than 1%).
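The trade-off between confidence level and interval width can be seen with a simple normal-theory interval, estimate ± z × SE. The sketch below uses Python's standard `statistics.NormalDist`, and the estimate and standard error are made-up numbers. Intervals around ω² are not symmetric because they rely on the noncentral F distribution, so this only illustrates the width comparison.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level):
    """Two-sided normal-theory interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


estimate, se = 0.50, 0.10  # made-up values for illustration
for level in (0.95, 0.99):
    lo, hi = normal_ci(estimate, se, level)
    print(f"{level:.0%} CI: [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

The 99% interval is about 31% wider than the 95% interval (z of roughly 2.576 versus 1.960). That extra width is the price of the more stringent test.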
Calculating confidence intervals for statistics with normal sampling distributions is fairly straightforward. However, calculating confidence intervals around effect sizes such as ω² relies on a noncentral distribution. In a fixed-effects between-subjects ANOVA, the noncentral F distribution is characterized by three parameters: the numerator and denominator degrees of freedom and the noncentrality parameter, lambda (λ). As λ increases, the distribution becomes more noncentral. In terms of hypothesis testing, the null hypothesis states that the noncentrality parameter equals zero. Thus, if the null hypothesis is false, λ is greater than zero and the distribution of F departs from the central (null) F distribution (Steiger, 2004).
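Because the noncentral F distribution has no closed-form CDF, intervals for λ and ω² must be found numerically. The following is a minimal, standard-library-only sketch of the interval inversion idea. It is not the Kromrey and Bell (2010) SAS macro, and the function names are hypothetical. It makes three assumptions:

- The noncentral F CDF is evaluated as a Poisson-weighted mixture of regularized incomplete beta functions.
- Bisection finds the values of λ at which the observed F sits at the upper and lower percentiles for the chosen level.
- λ converts to ω² through ω² = λ / (λ + N), a conversion commonly used for one-way fixed-effects designs. That conversion is an assumption here and may differ in detail from the macro.

The series truncation and iteration counts are untuned.

```python
import math


def _beta_cf(a, b, x, max_iter=1000, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def ncf_cdf(f, d1, d2, lam):
    """CDF of the noncentral F(d1, d2, lam) at f: Poisson(lam/2) mixture of beta CDFs."""
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    a, b = d1 / 2.0, d2 / 2.0
    if lam <= 0.0:
        return reg_inc_beta(a, b, x)  # central F
    half = lam / 2.0
    spread = 10.0 * math.sqrt(half) + 10.0  # Poisson weights beyond this are negligible
    total = 0.0
    for j in range(max(0, int(half - spread)), int(half + spread) + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * reg_inc_beta(a + j, b, x)
    return total


def lambda_interval(f_obs, d1, d2, level=0.99):
    """Interval inversion: find lambda values placing f_obs at the (1 - tail) and tail percentiles."""
    tail = (1.0 - level) / 2.0

    def solve(target):
        # The CDF at f_obs decreases as lambda grows; bisect for CDF == target.
        if ncf_cdf(f_obs, d1, d2, 0.0) <= target:
            return 0.0
        lo, hi = 0.0, max(1.0, d1 * f_obs)
        while ncf_cdf(f_obs, d1, d2, hi) > target:
            lo, hi = hi, 2.0 * hi
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if ncf_cdf(f_obs, d1, d2, mid) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return solve(1.0 - tail), solve(tail)


def omega_sq_interval(f_obs, k, n_total, level=0.99):
    """Convert the lambda interval to omega-squared, assuming omega^2 = lambda / (lambda + N)."""
    lam_lo, lam_hi = lambda_interval(f_obs, k - 1, n_total - k, level)
    return lam_lo / (lam_lo + n_total), lam_hi / (lam_hi + n_total)


# Made-up example: F = 10.0 from k = 3 stage groups with N = 100 participants.
low, high = omega_sq_interval(10.0, k=3, n_total=100, level=0.99)
print(f"99% CI for omega-squared: [{low:.3f}, {high:.3f}]")
```

When even λ = 0 leaves the observed F below the upper percentile, the lower limit is set to zero. This matches the usual practice of truncating at the null value.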
The rationale for moving away from traditional null hypothesis significance testing has existed for quite some time. However, widespread use of confidence intervals was hindered by the difficulty of computing exact intervals for noncentral distributions until computing capabilities advanced in the 1980s and 1990s (Smithson, 2001; Steiger, 2004). Kromrey and Bell (2010) provide a SAS macro that applies the interval inversion approach to the calculation of confidence intervals around ω² (Steiger, 2004; Steiger & Fouladi, 1997). Some distributions are not symmetric, such as the F distribution when the null hypothesis is false. For these, the shape, dispersion, and location of the distribution all change as λ changes (Kromrey & Bell, 2010; Smithson, 2001). The interval inversion approach therefore finds a confidence interval around λ iteratively. It searches for the values of λ under which the observed F statistic falls at the upper and lower percentiles of the noncentral F distribution (for a 99% interval, the 99.5th and 0.5th percentiles). The lower and upper limits for λ can then be transformed into lower and upper ω² values. For additional information regarding the calculation of confidence intervals around effect sizes, see Fidler and Thompson (2001).

The Transtheoretical Model
The TTQP approach is used in the context of an explicit theory and provides an alternative framework to test predictions regarding the relationship between constructs described within that theory. In this paper, TTQP is used to assess predictions of the relationships between health behavior constructs theorized by the Transtheoretical Model of behavior change (TTM; Prochaska & DiClemente, 1983). Briefly, the TTM is a theory of intentional behavior change and is used to assess an individual's readiness to engage in change. It uses a stage approach and three core constructs: decisional balance, self-efficacy, and processes of change.
Behavior change is regarded as movement through a series of five stages: precontemplation, contemplation, preparation, action, and maintenance. Individuals in different stages of change use the TTM constructs in different ways, allowing constructs to be compared across stages with effect size indices that gauge the direction and magnitude of differences.
The three behaviors examined in this study are smoking, diet, and sun exposure. The core TTM constructs for each behavior (decisional balance, self-efficacy, and processes of change) may be indicated by slightly different variables, depending on the nature of the behavior. For example, self-efficacy is represented by a temptation scale for smoking and diet, but by a confidence scale for sun exposure. Similarly, there are ten core processes of behavior change in the TTM, but a given behavior may have additional processes specific to that behavior. For example, diet has one additional process,

Thirty-six of the 40 predicted effect sizes were confirmed by the data; however, the study used significance tests instead of confidence intervals to support the hypotheses. In these cases, the predicted effect sizes were larger than the observed effect sizes and were not contained in a 99% confidence interval.
A cross-sectional study employed the TTQP approach on a sample of smokers (Velicer et al., 2008). Fifteen effect size predictions were made that compared 15 variables within the three TTM scales (decisional balance, temptations, and processes) across the first three stages of change. Effect sizes were recalibrated using values observed in two previous smoking studies in order to better represent effect sizes in the area of smoking cessation research. Eleven of 15 predictions were confirmed and missed predictions were evaluated. Missed predictions were determined to be a result of four potential issues: sample fluctuation, need for theory revision, theory incorrect, or need for further calibration of effect sizes.
The current study replicated the findings of Velicer et al. (2008) and extended previous research by applying the smoking-based predictions to two new health behaviors. Secondary analyses were performed on cross-sectional data from a multiple health risk behavioral intervention (Prochaska et al., 2005). Study 1 directly replicated the smoking cessation results. Study 2 applied predictions to unhealthy diet, and Study 3 applied predictions to sun exposure. Predictions for each behavior varied slightly depending on the nature of the behavior and represented the major constructs of the TTM: decisional balance, self-efficacy, and processes of change. Effect sizes were expressed as ω², and 99% confidence intervals were generated to provide a stringent test of fit.

Study Design
Secondary analysis was conducted on baseline data from a population-based multiple risk factor behavioral intervention. See Prochaska et al. (2005) and DePue et al. (2008) for an overview of the study. Participants were recruited and assessed for smoking, diet, and sun exposure behaviors. Smoking behavior was assessed as self-reported daily smoking. Diet was assessed in terms of a high-fat diet (greater than 30% of calories from fat) and the total score on the Dietary Behavior Questionnaire (Greene et al., 1994; Greene et al., 1996). Finally, sun exposure was assessed in terms of self-reported exposure (15 or more minutes of exposure per day or inconsistent SPF-15 use) and the total score on the Sun Protection Behavior Scale (Rossi et al., 1995).

Sample
A large health insurance organization provided patient information, and 5,407 primary care patients agreed to take part in the study. Each participant was at risk on at least one of the four behaviors targeted by the intervention: smoking, diet, sun exposure, and regular mammography screening. Participant data from the baseline measurement were examined for three behaviors: smoking, diet, and sun exposure. The total baseline sample was 68.0% female, predominantly white (96.7%), and 1.3% Hispanic, with a mean age of 44.7 (SD = 12.7).

Measures
Demographics. Single items assessed age, sex, race, and ethnicity. Race and ethnicity were represented by two separate questions. One question asked for race (White, Black, Asian/Pacific Islander, American Indian/Alaska Native, Other) and a second question asked for Spanish/Hispanic origin (yes, no).

Stages of change.
Participants were classified into one of five stages of change using an algorithm that assessed their readiness to change. For diet and sun exposure, stage was first assessed from an individual's perceived readiness to change. It was then adjusted on the basis of a series of questions about the individual's habits and behaviors, so that the assigned stage better reflected actual readiness. This adjustment addresses potential discrepancies between self-reported (i.e., perceived) readiness and an individual's actual behavior. The Dietary Behavior Questionnaire (Greene et al., 1996) is used to stage for diet and includes questions regarding an individual's intention to avoid eating high-fat foods, followed by a behavioral assessment of that intention. For example, items include: "Do you sometimes eat fruit and vegetables as snacks" and "Do you eat reduced fat or low-fat cheese". Similarly, the Sun Protection Behavior Scale (Rossi et al., 1995) classifies participants by their intention to consistently protect themselves from sun exposure, followed by a behavioral assessment of that intention. For example, items include: "Do you avoid the sun during the mid-day hours" and "Do you use a sunscreen with SPF of 15 or more on all sun exposed skin areas". For smoking, stage was assessed with a single measure of readiness to quit. Participants were measured at baseline and grouped into one of the first three stages of change, representing the "pre-action" stages: precontemplation (PC), contemplation (C), or preparation (PR). DiClemente et al. (1991) give a detailed discussion of stage of change measures for smoking. For diet, see Greene et al. (1994, 1999); for sun exposure, see Weinstock et al. (2002) and Rossi et al. (1997).

Decisional balance.
Cognitive and motivational aspects of decision-making are measured by the Decisional Balance Inventory (Prochaska et al., 1994;Velicer et al., 1985). These two constructs are distinguished as the "Pros" and "Cons" of engaging in a behavior. For smoking and diet, the pros represent the perceived benefits of engaging in the unhealthy behavior and cons represent the disadvantages. For sun exposure, the reasoning is reversed. The pros represent the benefits of sun protection while the cons represent the difficulties or disadvantages.

Self-Efficacy.
A person's self-efficacy, or belief that they can prevent or cope with the temptation to fall back into unhealthy or high-risk behavior, is assessed with measures of confidence and situational temptations (DiClemente, 1986; Velicer et al., 1990). These measures may be conceptualized and/or measured differently depending on the behavior of interest.
Situational temptations. The temptation to engage in negative health behavior in various situations was measured using the Situational Temptations Inventory (Velicer et al., 1990). This scale consists of three subscales that measure responses using a 5-point Likert scale (1 = not important to 5 = extremely important). For smoking, the subscales measure the Positive/Social, Negative/Affective, and Habit/Addictive aspects of the temptation to engage in smoking behavior. For diet, the subscales include: Positive/Social, Negative/Affective, and Difficult Situations.
Confidence. Confidence was measured using a scale designed to assess a person's confidence to engage in sun protection behavior during difficult situations.
This scale has two subscales, general and sunscreen use, and is represented by a sum score of each subscale, with a higher score representing higher confidence.

Processes of change.
The processes of change represent ten different behavioral and experiential strategies for changing behavior (Prochaska, Velicer, DiClemente & Fava, 1988). Experiential processes include: Consciousness Raising

Statistical Analyses
Predictions were tested by comparing participants classified by stage at the baseline assessment. Stage of change was the independent (grouping) variable. The decisional balance and self-efficacy subscales, as well as the processes of change, were the dependent variables. Stages were compared with one-way between-groups fixed-effects ANOVAs, and each ANOVA source table provided the information necessary for calculating ω².
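The source-table quantities named above can be computed directly from raw scores grouped by stage. This sketch uses a hypothetical `one_way_anova` function and toy scores for the three pre-action stages (PC, C, PR), chosen only so the arithmetic is easy to check by hand. It assumes complete data with at least two scores per stage, and it does not reproduce the software used in the study.

```python
from statistics import fmean


def one_way_anova(groups):
    """Source-table quantities and omega-squared for a one-way fixed-effects ANOVA.

    groups: mapping from stage label to a list of scores.
    """
    samples = list(groups.values())
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = fmean([y for s in samples for y in s])
    means = [fmean(s) for s in samples]
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    ss_total = ss_between + ss_within
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df": (k - 1, n_total - k),
        "F": ms_between / ms_within,
        "omega_sq": omega_sq,
    }


# Toy scores by stage, for illustration only.
scores = {"PC": [1, 2, 3], "C": [4, 5, 6], "PR": [7, 8, 9]}
print(one_way_anova(scores))
```

For these toy scores, SS_between = 54, SS_within = 6, MS_within = 1, and F = 27. Therefore ω² = (54 - 2 × 1) / (60 + 1) ≈ .852, an unrealistically large effect that reflects the contrived data.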
The 99% confidence intervals for ω² were calculated using a SAS macro (Kromrey & Bell, 2010). Predicted values, observed values, and the lower and upper bounds of the observed values were entered into an Excel spreadsheet. The spreadsheet generates a graph for visually comparing each prediction with the observed values.
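Before graphing, the same comparison can be laid out as a table sorted by ascending observed effect size, mirroring the ordering used in the figures. The sketch below is a hypothetical stand-in for the spreadsheet step, not the Excel file used in the study. Its two rows reuse the smoking values for Pros and Self-Liberation reported in the Discussion.

```python
def summarize(results):
    """Sort (variable, predicted, observed, lower, upper) rows by observed effect size
    and flag whether each interval contains its prediction."""
    rows = []
    for name, predicted, observed, lower, upper in sorted(results, key=lambda r: r[2]):
        status = "confirmed" if lower <= predicted <= upper else "missed"
        rows.append((name, predicted, observed, lower, upper, status))
    return rows


smoking = [
    ("Self-Liberation", 0.19, 0.101, 0.046, 0.164),
    ("Pros", 0.00, 0.022, 0.004, 0.047),
]
rows = summarize(smoking)
print(f"{'Variable':<16}{'Pred':>6}{'Obs':>7}  99% CI")
for name, pred, obs, lo, hi, status in rows:
    print(f"{name:<16}{pred:>6.3f}{obs:>7.3f}  [{lo:.3f}, {hi:.3f}]  {status}")
hits = sum(r[-1] == "confirmed" for r in rows)
print(f"{hits} of {len(rows)} predictions confirmed")
```

Keeping the rows sorted by observed effect makes the variables with the largest effects appear last, as in the graphical displays.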

Initial Effect Size Predictions
Effect size predictions for smoking have been examined previously (Velicer et al., 2008). Study 1 used the recalibrated effect sizes suggested in Velicer et al. (2008) as predictions in order to replicate and validate findings for smoking behavior. No previous empirical work existed for diet and sun exposure. Initial predictions for Study 2 and Study 3 therefore followed the original predictions of Velicer et al. (2008), which were guided by TTM theory and put forth before recalibration to the observed smoking data. A few of the process variables are unique to sun exposure and diet and therefore had no initial predictions in the list of previously tested smoking predictions. A panel of TTM, diet, and sun exposure experts met to discuss appropriate predictions for these behaviors, and the resulting predictions were used for subsequent analysis.

Study 1: Smoking
Fifteen effect size predictions for smoking were examined. Predictions represent a direct replication of results from Velicer et al. (2008) and integrate the recalibrated effect sizes suggested in that study. Table 1 displays the quantitative predictions for smoking effect sizes based on previous data.

Study 2: Diet
Sixteen predictions for diet (eating healthier) were examined. Verbal predictions based on TTM theory were translated into quantitative values based on Cohen's (1988) guidelines for interpretation of effect size. Table 2 displays the verbal and quantitative predictions for diet.

Study 3: Sun exposure
Twenty predictions for decreasing sun exposure were examined. They are identical to those predicted for diet, except for the self-efficacy measure. For sun exposure, self-efficacy is represented as General Confidence and Sunscreen Confidence. Table 3 displays the verbal and quantitative predictions for sun exposure effect sizes.

Results
Baseline stage distributions for each behavior are presented in Table 4. Table 5 displays sample sizes, means (represented as T-scores), and standard deviations by stage of change for Study 1.

Study 1: Smoking
Thirteen of 15 predictions were confirmed for smoking behavior.

Study 2: Diet
Six of 16 predictions were confirmed for diet behavior based on the behavioral criteria stage. Table 9 presents a summary of predictions and results for the 16 variables. The misses were Pros, Neg./Aff., CC, DR, ER, HR, IS, RM, SC, and SL.
in increasing order of effect size.

Study 3: Sun Exposure
Five of 19 predictions were confirmed for sun exposure behavior based on the behavioral criteria stage. Figure 4 displays predicted effect sizes and observed confidence intervals for variables in increasing order of effect size.

Discussion
The iterative nature of the TTQP approach necessitates a close investigation of failed predictions for each study. By understanding why a prediction has failed to be confirmed, a researcher can modify and adjust expected values of effect size for future use, thus providing specific empirical values for constructs within a theory. As data accumulates across studies, the effect size values become true quantitative predictions rather than categorical values (i.e. small, medium, large).
Four explanations may account for a failed prediction: sample fluctuation, theory revision, incorrect prediction, or a need for further calibration. First, even with a 99% confidence interval, a small number of misses are expected from chance fluctuations in a sample. These misses tend to be "near misses": the predicted value falls just outside the interval generated from the sample data.
Thus, a future replication may in fact confirm these near misses. Second, theory revision is required when an observation falls in the opposite direction of, or very far away from, the prediction. Such may be the case when a small effect was predicted and a very large effect was obtained. In that case, the theory made an inadequate prediction and needs to be revised to predict a large rather than a small effect for that variable. Third, a prediction may be incorrect when observations are clearly discordant with predictions, such that the theory led to an overwhelmingly incorrect prediction. The theory itself may then need major reconsideration to better reflect empirical observation, as opposed to the slight revision described in the second explanation. Finally, further recalibration may be needed when observations and their confidence intervals do not align with any of the predicted values. This would be shown by variables that demonstrate consistent effects that do not fall within Cohen's guidelines. For example, for a particular behavior, a medium effect size may be better represented by ω² = 0.08 rather than 0.06 if observed values fall more frequently near that estimate. Recalibration is specific to the population and behavior of interest and should always be considered in context. Thus, a large effect for smoking may be represented as ω² = 0.19, while a large effect for diet may be represented as ω² = 0.14.

TTQP
Study 1 represents a successful replication of the smoking predictions from Velicer et al. (2008). Thirteen of 15 predictions were confirmed, and the two misses were near misses, most likely due to sample fluctuation. The current study did not confirm the Pros and Self-Liberation predictions, but each predicted value fell just outside its confidence interval and remains consistent with previous findings. Pros (observed ω² = 0.022 [99% CI: .004, .047]) was predicted to have no effect. The interval's lower limit of .004 is essentially zero, so this miss can be considered a near miss. Self-Liberation (observed ω² = 0.101 [99% CI: .046, .164]) represents a fairly large effect, but it was lower than predicted (ω² = 0.19). Regardless, the upper limit is still quite large in the context of smoking variables. Sample fluctuation may be the reason for this miss, and the interpretation remains conceptually the same.
As data accumulates across replications, it becomes possible to generate true quantitative predictions. This study represents the third replication for smoking behavior variables in the TTM. Table 11 displays a compilation of ω² values obtained across three studies. Data from the current study was compared with data from Velicer et al. (2008) and with one other study presented in Velicer et al. (2008), the Random Digit Dial study (Fava et al., 1995).

Health Care Provider was predicted as ω² = 0.01, and the lower bound of the confidence interval missed it at ω² = 0.018. Self Reevaluation was predicted as ω² = 0.14, and the upper bound of the 99% confidence interval just missed it at ω² = 0.135.
Future replication will reveal whether these misses reflect sample fluctuation.
… .143, .197]) fell more than one effect size class lower than the observed effect. It was predicted to have no effect, but was found to be a large effect and should …
In summary, the successful confirmation of thirteen of fifteen smoking predictions lends substantial support to the cross-sectional TTM predictions in this research area. However, this is the first application of the TTQP approach to TTM measures of sun exposure and diet. Six of 16 and five of 19 predictions were confirmed for diet and sun exposure behaviors, respectively. Many of these predictions failed because they were extended directly from the theory guiding the smoking predictions.
The inadequate fit of these predictions suggests that these measures behave differently for diet and sun exposure than for smoking, and that these behaviors would greatly benefit from a replication that uses the observed effect sizes from Studies 2 and 3 as predictions for a new study.
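As replications accumulate, observed effect sizes from several studies can be combined into a single prediction for the next study. One simple option is a sample-size-weighted mean of the observed ω² values, sketched below in Python. The text does not say how values such as those in Table 11 should be combined, so this weighting scheme is an assumption (inverse-variance weighting would be a common alternative), and the numbers in the usage line are hypothetical placeholders rather than results from any of the studies.

```python
def pooled_omega_sq(estimates):
    """Sample-size-weighted mean of omega-squared across studies.

    estimates: iterable of (omega_sq, n) pairs, one per study.
    """
    estimates = list(estimates)
    total_n = sum(n for _, n in estimates)
    if total_n <= 0:
        raise ValueError("need at least one study with a positive sample size")
    return sum(value * n for value, n in estimates) / total_n


# Hypothetical placeholder values, not the Table 11 results.
prediction = pooled_omega_sq([(0.10, 1000), (0.12, 500), (0.08, 500)])  # about 0.10
```

The pooled value would then be tested like any other TTQP prediction, by checking whether it falls inside the new study's confidence interval.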
Use of the TTQP approach highlights how differently these measures relate to different health behaviors, allowing the theory to be compared across behavioral areas. This information is valuable for future studies that apply the TTM, or competing theories, to target specific behaviors. For example, findings from Studies 1, 2, and 3 reveal that effect sizes for Self-Efficacy were highest for Sun Exposure, suggesting that people may feel more confident in their ability to change their sun behavior than in their ability to change their diet or smoking behaviors.
All of the studies in this paper are cross-sectional and based on baseline measures; thus, results represent only a snapshot of each measure. While baseline information is valuable, the TTQP approach can also be extended longitudinally. Longitudinal predictions involve predicting effect sizes for movement across time (progression, regression, or stability). This approach has been applied to smoking behavior (Velicer et al., in press) but needs to be replicated and applied to new behaviors. In addition, as demonstrated by the great variability in effect sizes across the three studies, predictions are specific to the constructs, measures, sample, and behavior. Theory and context should always guide predictions, and misses should always be examined closely.
The methodology presented thus far represents an alternative to Null Hypothesis Significance Testing and promotes a meta-analytic and quantitative way of thinking. By using effect sizes and confidence intervals to test quantitative predictions, researchers are able to bypass significance tests that rely only on p-values. The TTQP approach shifts away from traditional ways of thinking by emphasizing the numeric strength of a measure. Because predictions and results are expressed in a common effect size metric, researchers can compare findings across studies and across theories.
Advances in statistical software have made these analyses considerably simpler; for example, the SAS macro of Kromrey and Bell (2010) computes the upper and lower confidence bounds of omega-squared (ω²) from noncentral distributions.
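For readers without SAS, the general noncentral-F inversion idea can be sketched in Python with SciPy. This is not a port of the Kromrey and Bell (2010) macro. It assumes a one-way between-subjects ANOVA with total sample size N = df1 + df2 + 1, finds the noncentrality values λ at which the observed F sits at the upper and lower tail probabilities, and converts each bound with ω² = λ/(λ + N). The point estimate uses the usual one-way formula ω² = df1(F - 1)/(df1(F - 1) + N), truncated at zero. Function names and the F value in the usage lines are hypothetical.

```python
from scipy.optimize import brentq
from scipy.stats import f as f_dist
from scipy.stats import ncf


def _f_cdf(x, df1, df2, nc):
    """CDF of the (noncentral) F distribution; nc = 0 gives the central F."""
    if nc == 0:
        return f_dist.cdf(x, df1, df2)
    return ncf.cdf(x, df1, df2, nc)


def _solve_noncentrality(f_obs, df1, df2, target):
    """Find lambda with P(F <= f_obs | lambda) = target, or 0 if none exists.

    The CDF at a fixed f_obs decreases as lambda grows, so the root is
    bracketed by 0 and an upper limit that is doubled until it is past the root.
    """
    if _f_cdf(f_obs, df1, df2, 0.0) <= target:
        return 0.0
    upper = max(10.0, 2.0 * df1 * f_obs)
    while _f_cdf(f_obs, df1, df2, upper) > target:
        upper *= 2.0
    return brentq(lambda nc: _f_cdf(f_obs, df1, df2, nc) - target, 0.0, upper)


def omega_sq_point(f_obs, df1, n_total):
    """One-way ANOVA omega-squared from F, truncated at zero."""
    num = df1 * (f_obs - 1.0)
    return max(0.0, num / (num + n_total))


def omega_sq_ci(f_obs, df1, df2, level=0.99):
    """Confidence interval for omega-squared by inverting the noncentral F."""
    n_total = df1 + df2 + 1  # assumes a one-way between-subjects design
    alpha = 1.0 - level
    lam_low = _solve_noncentrality(f_obs, df1, df2, 1.0 - alpha / 2.0)
    lam_high = _solve_noncentrality(f_obs, df1, df2, alpha / 2.0)
    return lam_low / (lam_low + n_total), lam_high / (lam_high + n_total)


# Hypothetical example: F(4, 995) = 20.0, i.e. five groups and N = 1000.
low, high = omega_sq_ci(20.0, 4, 995)
point = omega_sq_point(20.0, 4, 1000)
```

When the observed F is small, the lower bound is set to zero, which is how intervals whose lower limit is essentially zero arise.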
Furthermore, the TTQP approach recognizes and encourages the need for replication and meta-analysis in behavioral science. Results obtained by the TTQP approach are specific, allowing theorists to produce and test theories that can be quantitatively examined and falsified.
To fulfill the University of Rhode Island Department of Psychology requirement for a multiculturalism and diversity component, this study included an investigation of effect sizes across racial/ethnic subgroups. The TTQP approach applied confidence intervals to examine whether predictions are consistent across race/ethnicity. Johnson et al. (2002) used a quantitative approach to make effect size predictions for TTM variables in a sample of ethnically diverse smokers. The predicted relationships were confirmed in that study and were consistent with previously reported studies of TTM measures, as well as with the smoking predictions applied in Velicer et al. (2008). This suggests that ethnically diverse samples do not differ from the general population on these measures; however, the authors did not explicitly compare different ethnic groups. Other studies have also suggested that the TTM is effective in ethnically diverse adolescent groups (Callaghan et al., 2005), and the processes of change have been found to be invariant across race for exercise behavior (Dishman et al., 2010).

TABLES AND FIGURES
Because the smoking predictions were replicated by Study 1, as discussed above, and are less exploratory in nature than those of Studies 2 and 3, a fourth study compared smoking predictions across race/ethnicity. Study 4 therefore followed the same procedures as Studies 1, 2, and 3, but applied the TTQP approach to participants grouped by race/ethnicity.
Three subgroups (Hispanic, Black, and White) were originally proposed for examination; however, sample sizes for racial/ethnic subgroups were extremely low, with only 1.3% Hispanic and 1% Black participants. To maximize the size of the non-white group, participants who selected any race category other than White were pooled together.
… confirmed due to very wide confidence intervals; however, results are inconclusive.
The low sample size of non-white participants limits the generalizability of findings from Study 4. As displayed in Figure A1, confidence intervals are extremely wide, indicating a wide range of plausible ω² values given the limited sample size.
Point estimates for observed ω² are fairly similar to predictions for the decisional balance and self-efficacy measures but are highly variable for the processes. A future study with a larger sample of non-white participants should be conducted to replicate this study. Many of the observed values for the processes are so far from the predicted values that further investigation is warranted.
Figure A1. Comparison of predicted and observed ω² effect size values, with 99% confidence intervals, for smoking variables among non-white participants.
Figure A2. Comparison of predicted and observed ω² effect size values, with 99% confidence intervals, for smoking variables among female participants.
Figure A3. Comparison of predicted and observed ω² effect size values, with 99% confidence intervals, for smoking variables among male participants.