Psychometric Validation of the Four Factor Situational Temptations for Smoking Inventory in Adult Smokers

The situational temptations for smoking inventory assesses the degree of temptation a person might feel to smoke across a variety of situations found to be important for smoking cessation. The temptations measure with four subscales, Positive/Social (PS), Habit Strength (HS), Negative/Affective (NA), and Weight Concerns (WC), was previously validated among adolescent smokers. The measure that has been validated in adults includes only the PS, HS, and NA subscales, although weight concerns are also salient to adults who smoke and have been negatively associated with smoking cessation. This study examines the psychometric validity of the temptations measure with the addition of the WC subscale, including stability of the measurement model, using a population-based sample of adults who reported being current smokers (N = 2921, age range 18–82 years, 68.6% white, 55.3% female). Participants in the sample had complete data for the measure, and those with extreme response patterns were deleted. Confirmatory factor analyses (CFA) showed that theoretically based four-factor (PS, HS, NA, WC) models fit the measure well (CFI: .967, RMSEA: .052), with moderate to high internal consistency for all subscales (α .55 – .91). Multiple sample CFA established that the factor structure of the temptations measure was invariant across population subgroups defined by gender, age, racial identity, ethnicity, stage of change for smoking cessation, baseline smoking severity, and weight status. Measurement invariance testing using multiple sample analyses of mean and covariance structures showed that the invariance models fit well across stage of change, racial identity, ethnicity, and weight status at the level of strong measurement invariance. 
These results indicate a consistent relationship between the four factors (PS, HS, NA, WC) of the situational temptations for smoking measure, and the twelve items that serve as their measured indicators, confirming the internal validity of the measure in adult smokers. Multivariate analysis of variance revealed a small but significant effect of stage of change on the temptations subscale scores, demonstrating that the temptations measure can differentiate between adult smokers in the early stages of change for cessation.


INTRODUCTION
Smoking is the single most preventable cause of premature death and chronic disease in the United States. It causes heart and pulmonary diseases and multiple types of cancer, and exacerbates other chronic health conditions (USDHHS, 2010). Each year in the United States, smoking accounts for at least 443,000 premature deaths, and approximately $96 billion in direct medical costs and $97 billion in lost productivity (CDC, 2008). Nonetheless, approximately 19% (43.8 million) of all adults in the United States continue to smoke (CDC, 2012). Even though the prevalence of smoking has declined slightly since 2005 (CDC, 2011), the current estimated smoking rate is still much higher than the Healthy People 2020 target of less than 12% (USDHHS). Smoking rates still vary widely across racial or ethnic groups, with the highest prevalence found among American Indians/Alaska Natives, African Americans and non-Hispanic whites (CDC, 2011; Caraballo, Yee, Gfroerer & Mizra, 2008), and most subgroups would be unable to meet the Healthy People target if the current trend continues.
Increasing cessation rates among those who currently smoke and preventing smoking in the population remain important public health goals.
Behavioral interventions for smoking cessation using tailored health communications based on the Transtheoretical Model of behavior change have been developed and implemented, and have demonstrated significant impacts (e.g., Velicer, Prochaska, Fava, Laforge & Rossi, 1999; Prochaska et al., 2004, 2005). The Transtheoretical Model (TTM; Prochaska & Velicer, 1997; Velicer et al., 2000) is an integrative model of intentional behavior change underlying numerous effective interventions. Empirically based tailoring is especially relevant in population-based interventions when not everyone is prepared to change their risk behavior immediately; for example, less than 20% of all smokers in the United States are prepared to quit smoking in the next month (Velicer et al., 1995).
The concept of self-efficacy refers to an individual's perceived ability or confidence to perform a task, which in turn mediates performance on future tasks (Bandura, 1977); it is also one of the core constructs integrated within the TTM framework. Temptation to smoke is conceptualized to be inversely related to confidence/self-efficacy in remaining abstinent from smoking, and reflects how tempted people are to smoke in different situations rather than how confident they are to avoid smoking in those situations (Velicer, DiClemente, Rossi & Prochaska, 1990). The theoretical relationship between self-efficacy/temptations and progress through the stages of change (i.e., readiness to change) has been documented (Velicer et al., 1990; Fava, Velicer & Prochaska, 1995; Velicer, Rossi, Prochaska & DiClemente, 1996), and incorporated into TTM-tailored intervention programs.
Appropriately operationalizing theoretical constructs into psychometrically sound measures is critical for testing and implementing a theoretical model. Several TTM-based smoking cessation measures have been tested in adult smokers and demonstrated good psychometric validity (e.g., O'Connor, Carbonari, & DiClemente, 1996; Ward, Velicer, Rossi, Fava & Prochaska, 2004). The situational temptations for smoking inventory with the original three subscales (Positive/Social, Habit Strength, Negative/Affective) has been used in a number of applications (e.g., Prochaska et al., 2004, 2005); however, no study to date has evaluated the psychometric properties of the version of the temptations measure that includes the additional (fourth) Weight Concerns subscale in an adult population. Smoking-specific weight concerns are salient to both women and men who smoke, and are equally important among adult African American and Caucasian smokers, and weight concerns and body image have been associated with lower rates of smoking cessation and relapse (White, McKee & O'Malley, 2007; Clark et al., 2004; Pomerleau, Zucker, Namenek Brouwer, Pomerleau & Stewart, 2001; Sanchez-Johnson, Carpentier & King, 2011; Klesges & Klesges, 1988; Meyers et al., 1997; USDHHS, 2001). The situational temptations subscale relating to Weight Concerns should therefore be evaluated for inclusion in the temptations measure, so that it can be used in assessment and interventions.
The aim of this study is to assess the internal and external validity and measurement stability of the temptations measure with the addition of the fourth Weight Concerns subscale, including confirming the factorial invariance of the measure. Factorial invariance is central to establishing the internal validity and reliability of a measure, as it indicates whether a set of items measures the same theoretical constructs consistently across population subgroups, allowing legitimate comparisons between groups on the measure of interest (Meredith, 1993; Meredith & Teresi, 2006). [...] Concerns (see Figure 1), to demonstrate that the temptations inventory is reliably measuring four constructs.
a. Items 1 to 3 should have primary, non-zero loadings on the first factor (i.e. Positive/Social; see Figure 1) to demonstrate that these three items are reliably measuring positive or social situations where a person may feel tempted to smoke.
b. Items 4 to 6 should have primary, non-zero loadings on the second factor (i.e. Habit Strength; see Figure 1) to demonstrate that these three items are reliably measuring situations related to smoking habits when a person may feel tempted to smoke.
c. Items 7 to 9 should have primary, non-zero loadings on the third factor (i.e. Negative/Affective; see Figure 1) to demonstrate that these three items are reliably measuring negative/affective situations when a person may feel tempted to smoke.
d. Items 10 to 12 should have primary, non-zero loadings on the fourth factor (i.e. Weight Concerns; see Figure 1) to demonstrate that these three items are reliably measuring situations when a person may feel tempted to smoke due to weight concerns.
Hypothesis 2: The final correlated four factor model for temptations should provide an adequate fit to the data, with CFI > .90 and RMSEA < .08. This would demonstrate that the four-factor temptations measurement model fits well in a large sample of adult smokers.
Hypothesis 3: The final correlated four factor model for temptations should also have the potential for one higher order factor (i.e. Temptations), to demonstrate replication of the hierarchical factor structure found previously in other samples (Velicer et al., 1990; Plummer et al., 2001).
Hypothesis 10: Scores on the temptations inventory should show significant mean differences across the three stages of change at baseline (i.e., Precontemplation, Contemplation, and Preparation) to demonstrate that the temptations measure has "known groups" validity. The effect size for stage of change is expected to be relatively small because the baseline sample is restricted to current smokers (i.e., those in Precontemplation, Contemplation, and Preparation to quit smoking), whereas the Temptations/Self-efficacy construct is theorized to be more important during the later stages within the Transtheoretical model framework.

Participants
This study involved secondary analyses of primary data from a large population-based smoking cessation intervention study. All procedures were approved by the Institutional Review Board at the University of Rhode Island. The study was a randomized-controlled trial with four separate treatment arms in which TTM-tailored CTIs were applied, and one assessment-only comparison arm. Participants were recruited from a population of smokers who had been proactively recruited via a national list-assisted telephone survey. Participants provided informed consent, and were then randomly assigned to one of the four intervention conditions or to the control arm. Randomization was stratified by stage of change for smoking cessation.
Data from all participants were collected at each assessment time point by telephone interview conducted according to an established protocol. Participants in the control group were assessed at baseline, 12, and 24 months. Participants in each of the four intervention groups were assessed and treated at baseline, 6, and 12 months; printed intervention materials were mailed to them immediately upon completion of the telephone surveys. Participants in the intervention groups also completed a final follow-up assessment at 24 months. This psychometric validation study for the situational temptations for smoking inventory was conducted using baseline data combined across all five intervention and control groups. [...] Table 2.

Instruments
This study focused on demographic questionnaires, the situational temptations for smoking inventory, and the Transtheoretical model (TTM) stage of change scale for smoking cessation, from the baseline assessment. Demographic variables were not analyzed directly with respect to smoking behavior or the outcomes of the intervention study, but were assessed and reported as they relate to the internal and external validity of this psychometric assessment study. There was adequate racial-ethnic heterogeneity among participants in this large sample of adult smokers to allow assessment of the stability of the temptations measure across different population subgroups defined by gender, racial identity, ethnicity, age, stage of change for cessation, smoking problem severity, and weight status.
The situational temptations for smoking inventory assesses the degree of temptation a person feels to smoke across different situations. The version of the measure being evaluated consists of four subscales: Positive/Social (PS), Habit Strength (HS), Negative/Affective (NA), and Weight Concerns (WC), with three items for each subscale. For each item, participants are asked to rate how tempted they may be to smoke in the situation described using a 5-point Likert scale (5=Extremely tempted, 4=Very tempted, 3=Moderately tempted, 2=Not very tempted, 1=Not at all tempted), a response format that has been preferred by several researchers (e.g., Redding et al., 2006). Table 1 shows the list of 12 items for the four factor temptations measure. A hierarchical three factor model, without the Weight Concerns subscale, has been demonstrated among adult smokers (Velicer et al., 1990), and extensively used. A hierarchical four factor measurement structure with all four subscales (PS, HS, NA, WC) was previously tested in a large sample of adolescent smokers and ex-smokers in the United States, and subsequently validated in a sample of Bulgarian adolescent smokers (Plummer et al., 2001; Anatchkova, Redding, & Rossi, 2006). In the present study, an alternative measurement model with four correlated subscales (see Figure 1) was assessed in measurement invariance analyses using data from the baseline assessment.
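To make the scoring concrete, the following is a minimal sketch of how unweighted subscale means could be computed for one respondent. The item-to-subscale key follows the hypothesized factor pattern (items 1 to 3 on PS, 4 to 6 on HS, 7 to 9 on NA, 10 to 12 on WC); the function name, the dictionary layout, and the example ratings are illustrative assumptions rather than the study's actual scoring program.

```python
from statistics import mean

# Item-to-subscale key as stated in the hypothesized factor pattern.
SUBSCALES = {
    "PS": (1, 2, 3),     # Positive/Social
    "HS": (4, 5, 6),     # Habit Strength
    "NA": (7, 8, 9),     # Negative/Affective
    "WC": (10, 11, 12),  # Weight Concerns
}


def score_temptations(responses):
    """Return the unweighted mean rating for each subscale.

    `responses` maps item number (1-12) to a rating from 1
    (Not at all tempted) to 5 (Extremely tempted). Complete data are
    required, mirroring the complete-case sample described in the text.
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("expected ratings for items 1 through 12")
    if any(r not in (1, 2, 3, 4, 5) for r in responses.values()):
        raise ValueError("ratings must be whole numbers from 1 to 5")
    return {name: mean(responses[i] for i in items)
            for name, items in SUBSCALES.items()}


# Hypothetical respondent: moderately tempted everywhere except weight items.
example = {i: 3 for i in range(1, 13)}
example.update({10: 1, 11: 2, 12: 1})
scores = score_temptations(example)
```

Because each subscale has the same number of items, the subscale means are on the original 1 to 5 metric and can be compared directly with the scale midpoint of 3.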
Stage of change was measured using an algorithm assessing readiness to quit smoking based on the following criteria: Precontemplation (not intending to quit in the next 6 months), Contemplation (intending to quit in the next 6 months), Preparation (intending to quit in the next 30 days, and has attempted to quit for at least 24 hours one or more times within the past year), Action (quit for less than 6 months), and Maintenance (quit for 6 months or more). The reliability, utility, and predictive validity of this algorithm have been demonstrated (DiClemente et al., 1991; Prochaska & DiClemente, 1983; Velicer et al., 2007). In addition to the discrete stage measure, the average number of cigarettes smoked per day is a quantitative measure of smoking behavior that permits participants to be categorized into groups by baseline severity according to the following criteria: Light smoker (not more than 15 cigarettes per day), Moderate smoker (16 to 29 cigarettes per day), and Heavy smoker (30 or more cigarettes per day). These cutoff points were selected so as to be reasonably consistent with previous studies of light and heavy smokers (Rossi, Prochaska & DiClemente, 1988). [...] categories were therefore collapsed to form a single weight status category (BMI 24.9 or less) that was used in the measurement invariance analyses.
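The staging criteria and severity cutoffs described above can be expressed as a short classification routine. This is a sketch only: the function and parameter names are hypothetical, and two details are assumptions not stated in the text, namely that a smoker who intends to quit within 30 days but reports no 24-hour quit attempt in the past year is placed in Contemplation, and that non-integer daily averages between 15 and 16 are treated as Moderate.

```python
def stage_of_change(is_smoking, months_quit=0.0, intends_6_months=False,
                    intends_30_days=False, quit_attempt_past_year=False):
    """Classify readiness to quit using the criteria listed in the text."""
    if not is_smoking:
        # Former smokers: Action if quit < 6 months, otherwise Maintenance.
        return "Action" if months_quit < 6 else "Maintenance"
    if intends_30_days and quit_attempt_past_year:
        return "Preparation"
    if intends_6_months or intends_30_days:
        # Assumption: 30-day intenders without a recent 24-hour quit
        # attempt fall back to Contemplation.
        return "Contemplation"
    return "Precontemplation"


def smoking_severity(cigarettes_per_day):
    """Group smokers by average daily consumption (cutoffs from the text)."""
    if cigarettes_per_day <= 15:
        return "Light"
    if cigarettes_per_day < 30:
        return "Moderate"
    return "Heavy"
```

Only the three pre-action stages can occur in a baseline sample restricted to current smokers, which is why the known-groups analysis compares Precontemplation, Contemplation, and Preparation.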

Analyses
To assess the psychometric properties and validity of the situational temptations for smoking inventory, this study utilized several psychometric procedures including principal components analysis (PCA), confirmatory factor analysis (CFA), multiple sample nested invariance model comparisons, and multivariate analysis of variance (MANOVA). Some of these psychometric procedures are included within the structural equation modeling (SEM) framework. All SEM procedures in this study were conducted using EQS 6.2 (Bentler, 2007) and the results were replicated using the lavaan software package (Rosseel, 2012) in the R statistical computing environment. Other analytic procedures, such as calculating descriptive statistics, PCA, and MANOVA, were conducted using SPSS 19.
The initial phase for the analyses utilized a "split-half cross validation" approach to validate the factor structure of the measurement model for the temptations inventory. The overall baseline sample was randomly divided into two subsamples to form an exploratory half and a confirmatory half. This procedure was conducted using SPSS 19. Participants' characteristics were compared between the two subsamples, and the summary is presented in Table 3.
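The random split itself was carried out in SPSS; as an illustration of the idea, a seeded shuffle yields two non-overlapping halves. The function name, the seed, and the use of consecutive integers as participant identifiers are assumptions made for this sketch.

```python
import random


def split_half(participant_ids, seed=0):
    """Randomly divide participants into exploratory and confirmatory halves."""
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical identifiers 0..2920 standing in for the 2921 participants.
exploratory, confirmatory = split_half(range(2921))
```

With an odd total sample size the two halves necessarily differ by one participant.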
The goal of these analyses was to validate the temptations instrument with the additional (fourth) Weight Concerns subscale in adult smokers, rather than to improve the scale as in a traditional measure development study. The cross-validation approach was applied only to the PCA and assessment of the measure's internal consistency (i.e., coefficient alpha; Cronbach, 1951); these procedures were therefore conducted in both the exploratory and confirmatory samples to verify replication of the results. Analytic procedures to validate the temptations measure were performed using the full sample, including CFA to assess the measurement model and MANOVA to test discriminant or "known groups" validity based on the TTM.
PCA was conducted to examine the model structure and how the 12 measured items relate to the latent factors in the temptations measure. The Varimax with Kaiser normalization rotation method was used to interpret the factor structure resulting from the PCAs. The factor structure among manifest and latent variables was compared to the model that had been validated in adolescents (Plummer et al., 2001) and also to the original three factor model that was validated and has been extensively used in adult populations (Velicer et al., 1990).
Internal consistency estimates (Cronbach's coefficient alpha) were computed for each of the four subscales in both subsamples.
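For reference, coefficient alpha for a k-item subscale is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula; the function name and the small set of ratings are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha from a list of item columns.

    `item_scores` holds one list of ratings per item, with respondents in
    the same order in every list. Sample variances (n - 1) are used.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    summed_item_variance = sum(variance(col) for col in item_scores)
    return k / (k - 1) * (1 - summed_item_variance / variance(totals))


# Hypothetical ratings from five respondents on a three-item subscale.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
```

With only three items per subscale, alpha is sensitive to any single weak item, which is relevant when interpreting the moderate values reported for some subscales.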
After cross-validation of the factor structure, CFA was used to test the fit of the hypothesized four factor temptations measurement model (Figure 1) using data for the full baseline sample (N=2921). Normal distribution theory maximum likelihood (ML) and robust maximum likelihood (MLM) estimation methods were used. Multiple macro fit indices based on normal ML estimation were evaluated, including model χ² value, comparative fit index (CFI; Bentler, 1990), and the root mean square error of approximation (RMSEA). Manifest indicators that are ordinal variables may pose a challenge to the assumption of multivariate normality underlying normal theory ML estimation. However, for ordinal variables with five or more levels (e.g. assessed on a 5-point Likert scale as in the temptations measure), corrected test statistics computed based on robust standard errors using MLM estimation were found to be reliable for evaluating mean and covariance structures (MACS) based models (Curran, West, & Finch, 1995; Rhemtulla, Brosseau-Liard, & Savalei, 2012). Additional fit indices based on MLM estimation such as the Satorra-Bentler (1988) [...] (Hu & Bentler, 1999). For the RMSEA, values below .06 indicate excellent fit (Kline, 2011). In addition, the individual item factor loadings were examined, with adequate factor loadings expected to be above .40. Five alternative comparison models for temptations, including a single factor model and a hierarchical model that includes one higher order factor in addition to the four first-order factors (PS, HS, NA, WC; Hypothesis 3), were also assessed using CFA in the full sample.
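Both CFI and RMSEA are functions of the model χ², its degrees of freedom, and (for RMSEA) the sample size. The sketch below uses the commonly cited definitions; it is not the EQS or lavaan implementation, and robust versions reported by software may apply additional scaling. The function names are illustrative. The two RMSEA checks use the scaled χ² values reported in the Results for the one-factor model (χ² = 4518.65, df = 54) and the hierarchical model (χ² = 429.00, df = 50) with N = 2921; the null-model values in the CFI example are hypothetical.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root mean square error of approximation, using N - 1 in the denominator."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_null, df_null):
    """Comparative fit index relative to the null (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


rmsea_one_factor = rmsea(4518.65, 54, 2921)
rmsea_hierarchical = rmsea(429.00, 50, 2921)

# Hypothetical null-model chi-square, for illustration only.
cfi_example = cfi(100.0, 50, 1050.0, 66)
```

Rounded to three decimals, the two RMSEA values reproduce the .168 and .051 reported for those models, which suggests the reported statistics follow this form of the formula.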
This study also investigated the stability of the final, best-fitting measurement model for temptations (Figure 1) across population subgroups defined by stage, gender, racial identity, ethnicity, age, baseline smoking severity, and weight status. For the series of measurement invariance analyses, the baseline sample was split into subsamples for testing of the measurement model, for example, into male and female subsamples to test measurement invariance across gender. The four factor temptations model was first assessed for good fit to the data in each subgroup category separately. Next, multiple sample CFA based on the analysis of mean and covariance structures (MACS) was used to evaluate invariance of the final temptations measurement model (Figure 1) across population subgroups simultaneously (e.g., across male and female subsamples). Based on analyses of mean and covariance structures, four levels of measurement invariance were tested using a stepwise procedure, progressing from the least to the most restrictive: (1) Equal form (also referred to as configural invariance) with the same factor pattern but unconstrained factor loadings; (2) Equal factor loadings (or metric invariance) with factor loadings for like items constrained to be equal across groups; (3) Equal indicator intercepts (strong factorial invariance) with both factor loadings and indicator intercepts (item means) constrained to be equal across groups; and (4) Equal indicator error variances (strict measurement invariance) with equal factor loadings, indicator intercepts, and item error variances across subgroups. Model fit was assessed using several fit indices, including model χ² value, CFI, and RMSEA based on both ML and MLM estimation. Measurement invariance was tested by examining the change in fit index values between a less restrictive model and the more constrained model.
The χ²-difference test was included to assess decrement in fit for the nested invariance models, even though χ² statistics are very sensitive to large sample sizes (Kline, 2011), as in this study. Alternative fit indices that are not affected by sample size, such as CFI, McDonald's Non-Centrality Index (NCI; 1989), and gamma-hat (Steiger, 1989), have been suggested for testing of measurement invariance (Cheung & Rensvold, 2002; Meade, Johnson & Braddy, 2008), and these were also assessed. The differences in fit index values between the less restricted and more constrained models were computed and evaluated. Each difference represents the deterioration in the fit of the model to the data as additional across-subgroup equality constraints are imposed; for example, ∆CFI is obtained by subtracting the CFI for the equal factor loading model from the CFI for the equal form model. Cheung and Rensvold (2002) have suggested that ∆CFI greater than .01, ∆NCI greater than .02, and ∆Gamma-hat above .001 indicate that the more constrained model provides a significantly worse fit to the data (i.e., does not support invariance with the additional constraints), and the less restrictive model should be retained. Chen (2007) showed that an alternative cut-off value between .005 and .008 for ∆Gamma-hat was more consistent in terms of sensitivity to invariance with the ∆CFI and ∆NCI guidelines previously suggested by Cheung and Rensvold (2002).
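The decision rule for each step of the invariance sequence can be sketched as follows. The NCI and gamma-hat expressions are the commonly cited forms based on the noncentrality estimate (χ² − df)/(N − 1); whether the software used here applies N or N − 1, or a multiple-group adjustment, is not stated in the text and is an assumption. The function names, dictionary keys, and the fit values for the two nested models are hypothetical.

```python
from math import exp


def mcdonald_nci(chi2, df, n):
    """McDonald's noncentrality index."""
    return exp(-0.5 * (chi2 - df) / (n - 1))


def gamma_hat(chi2, df, n, n_items):
    """Gamma-hat for a model with `n_items` observed variables."""
    return n_items / (n_items + 2.0 * (chi2 - df) / (n - 1))


def invariance_supported(less_restricted, more_constrained,
                         cfi_cut=0.01, nci_cut=0.02, gamma_cut=0.001):
    """Flag, per index, whether the drop in fit stays within the cutoff.

    Each argument is a dict with keys "cfi", "nci", and "gamma_hat".
    Default cutoffs are those attributed to Cheung and Rensvold (2002).
    """
    cutoffs = {"cfi": cfi_cut, "nci": nci_cut, "gamma_hat": gamma_cut}
    return {index: (less_restricted[index] - more_constrained[index]) <= cut
            for index, cut in cutoffs.items()}


# Hypothetical fit values for an equal-form and an equal-loadings model.
equal_form = {"cfi": 0.960, "nci": 0.950, "gamma_hat": 0.9800}
equal_loadings = {"cfi": 0.955, "nci": 0.940, "gamma_hat": 0.9785}

strict_rule = invariance_supported(equal_form, equal_loadings)
relaxed_rule = invariance_supported(equal_form, equal_loadings, gamma_cut=0.005)
```

In this hypothetical comparison the ∆CFI and ∆NCI criteria support invariance while the .001 gamma-hat cutoff does not; relaxing the gamma-hat cutoff to .005, the lower end of the range attributed to Chen (2007), brings the three criteria into agreement.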
Finally, the external or "known groups" validity of the temptations measure was examined in the full sample. Multivariate analysis of variance (MANOVA) was conducted to simultaneously test for differences in the four temptations subscale (i.e., PS, HS, NA, WC) mean scores across the three baseline stages of Precontemplation, Contemplation, and Preparation. Although the 12 items in the temptations measure are ordinal variables with up to five response levels, the mean of three item scores computed for each subscale is a continuous variable that can be used in analytic methods based on the General Linear Model (GLM) such as ANOVA and MANOVA. Means, standard deviations, skewness, and kurtosis values for the four subscale mean scores were examined to assess departure from normality. Four independent ANOVAs were conducted as a follow-up procedure to the MANOVA to examine which of the four subscale scores showed significant mean differences across the three baseline stage of change groups. Follow-up Tukey tests for multiple pairwise comparisons were conducted for each significant ANOVA. Effect sizes were also computed for each model, including a multivariate η² for the MANOVA, and univariate η² for each ANOVA.
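The univariate effect size used in the follow-up ANOVAs, η², is the between-groups sum of squares divided by the total sum of squares. The sketch below computes the F ratio and η² for a one-way design; it does not cover the multivariate test or the Tukey comparisons, and the function name and the three small groups of subscale scores are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way between-groups design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, eta_squared


# Hypothetical subscale means for three respondents in each baseline stage.
precontemplation = [3.0, 4.0, 5.0]
contemplation = [2.0, 3.0, 4.0]
preparation = [1.0, 2.0, 3.0]
f_ratio, eta_squared = one_way_anova(
    [precontemplation, contemplation, preparation])
```

In a sample of nearly three thousand, even a very small η² can accompany a statistically significant F, which is why the effect size is reported alongside each test.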

Comparison of cross-validation samples
The baseline sample of 2921 participants was randomly split into two cross-validation subsamples to form an "exploratory" half (Sample 1) and a "confirmatory" half (Sample 2). A comparison of the demographic and smoking-related characteristics for participants in both samples found no meaningful differences. The summary of the main characteristics for participants in each of the two samples is presented in Table 3. In addition to the principal components analysis procedures, assessment of the temptation measure's internal consistency was performed in each cross-validation sample.

Principal Components Analysis (PCA)
The purpose of these analyses was to examine the model structure and the relationship between the 12 measured items and the underlying constructs (components) in the temptations measure. PCA was conducted on each of the two cross-validation samples separately. The Varimax with Kaiser normalization rotation method was used to interpret the resulting factor structure. PCAs were performed initially without a priori specification of the number of components to be retained. The minimum average partial method (MAP; Velicer, 1976; O'Connor, 2000) was used to determine the number of underlying components to be extracted.

Confirmatory Factor Analysis (CFA)
Confirmatory factor analysis was conducted to assess the fit of the temptations measurement model to the data based on the full sample of 2921 participants. Five alternative models besides the null model were compared: (1) a one factor model, (2) an uncorrelated four-factor model with three theoretically based indicators per factor (Table 1) [...] Table 6. The Likelihood Ratio χ² test is based on the central χ² distribution and tests the null hypothesis that the sample variance-covariance matrix is equal to the predicted variance-covariance matrix produced by the specified model. It has also been shown to be inflated by sample size, so that even negligible discrepancies between the sample and predicted matrices can result in large and significant χ² values with large sample sizes (Brown, 2006; Kline, 2011). Alternative fit indices that are less sensitive to large Ns were examined for all CFA procedures, and generally given more weight than the χ² in assessment of model fit. All model χ² values obtained in this study were statistically significant because of the large sample size, even when alternative fit indices indicated otherwise good model fit. The model fit statistics based on robust ML estimation are reported; the normal theory ML statistics are also presented in Table 6 for reference.
First, a one factor model in which all 12 measured indicators loaded onto a single latent variable was tested. As expected, the model scaled χ² was very large and significant, S-B χ²(54) = 4518.65, p < .001, indicating that the one factor model fit the data poorly. For the one factor model, the robust Comparative Fit Index (*CFI) was only .61, and the robust root mean square error of approximation (*RMSEA) was .168, confirming that the one factor model provided a poor fit to the data.
The next model assessed had four orthogonal factors; the three items associated with each subscale served as measured indicators for each factor (refer to Table 1). The uncorrelated four factor model also did not provide a good fit to the data, S-B χ²(54) = 1994.28, [...] The fifth and final model assessed in the full baseline sample was an alternative hierarchical model that included one higher order "Temptations" factor in addition to the four first-order factors (PS, HS, NA, WC); the higher-order factor was implied by the significant correlations between the four first-order factors in Model 4 (see Fig. 2). The factor structure specified for each first-order factor had the same three indicators loading on them as in Models 2 and 4, and in turn, all four first-order factors served as indicators for a single higher order factor. The hierarchical model χ² was significant, S-B χ²(50) = 429.00, p < .001, although review of other fit indices revealed that the hierarchical model also provided a very good fit to the data, *CFI = .967, *RMSEA = .051. The hierarchical model with standardized parameter estimates for the full baseline sample is shown in Figure 3. In the hierarchical model, the Habit Strength factor was found to be extremely strongly related to the higher order factor for this sample (standardized γ coefficient = 1.0). The hierarchical model was confirmed, but was also not retained for subsequent testing in this study.
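The degrees of freedom reported above (54 for the one-factor and uncorrelated four-factor models, 50 for the hierarchical model) can be recovered by counting free parameters against the 78 unique variances and covariances among 12 items. The parameterization below is an assumption chosen for illustration: factor variances fixed to 1 in the single-level models, and marker-variable loadings with the higher-order factor variance fixed to 1 in the hierarchical model. The count for the correlated four-factor model follows from the same logic and was not checked against a value in the text.

```python
def degrees_of_freedom(n_items, n_free_parameters):
    """Unique variances and covariances minus freely estimated parameters."""
    n_moments = n_items * (n_items + 1) // 2
    return n_moments - n_free_parameters


N_ITEMS = 12
ERROR_VARIANCES = 12

# Assumed identification constraints; see the lead-in paragraph.
free_parameters = {
    # 12 loadings on one factor + 12 error variances
    "one_factor": 12 + ERROR_VARIANCES,
    # 12 loadings, no factor covariances + 12 error variances
    "orthogonal_four_factor": 12 + ERROR_VARIANCES,
    # 12 loadings + 6 factor correlations + 12 error variances
    "correlated_four_factor": 12 + 6 + ERROR_VARIANCES,
    # 8 free first-order loadings + 4 second-order loadings
    # + 4 first-order disturbances + 12 error variances
    "hierarchical": 8 + 4 + 4 + ERROR_VARIANCES,
}

model_df = {name: degrees_of_freedom(N_ITEMS, k)
            for name, k in free_parameters.items()}
```

Under these assumptions the hierarchical model has two more degrees of freedom than the correlated four-factor model, because four second-order loadings replace six factor correlations.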
Internal consistency was assessed for (i) each of the four subscales, and (ii) the complete instrument with 12 items, in both cross-validation samples based on the final temptations measurement model; the computed coefficient alpha values are presented in Table 7. [...] The internal consistency of the full measure was high (α = .80) across the 12 items.

Measurement Invariance (Multiple-sample CFA)
The purpose of these analyses was to examine the invariance (stability) of the final temptations measurement model (Figure 1) over population subgroups defined by gender, racial identity, ethnicity, age, TTM-stage of change for cessation, baseline smoking (problem) severity, and weight status. The baseline sample was split into subgroups for testing of the measurement model. Sample sizes associated with each category for all seven subgroups are presented in Table 8.
As a first step, (single sample) CFA was used to test the fit of the correlated four factor measurement model to the data in each subgroup category separately. For each subsample category assessed, the temptations model demonstrated a very good to excellent fit to the data as shown by CFI > .90 and RMSEA < .08. The overall model fit statistics for each subgroup are presented in Table 9.1 (based on normal theory ML estimation) and [...] surprising that all of the models had statistically significant χ², and most of the Δχ² computed for nested model comparisons were also significant. The model χ² and Δχ² were therefore given much lower weight in assessment of fit compared to other fit indices (e.g. CFI).
Alternative fit indices used to assess model fit and test for invariance were robust versions of the CFI (*CFI) and Gamma-hat (*Gamma-hat), and the uncorrected McDonald's Noncentrality Index (NCI); these are presented in Table 11.1 (normal ML estimation) and [...] 1, 2, 6, and 8). Four separate ANOVAs confirmed that the means were significantly different across age-groups for item 1 "with friends at a party," item 2 "over coffee while talking and relaxing," item 6 "when I realize I haven't smoked for a while," and item 8 "when I am very angry about something or someone." The ANOVA and follow-up Tukey test results for these four items are shown in Table 12. [...] indicator error variances measurement model also provided an excellent fit for weight status, S-B χ²(208) = 673.04, p < .001, *CFI = .957, *RMSEA = .049.

External (Known groups) Validity
The purpose of these analyses was to assess the external (or "known groups") validity of the four factor temptations measure by testing whether the scores on the measure could differentiate between the different stages of change in adult smokers. The internal validity and measurement stability of the four-factor temptations measure were established through CFA and measurement invariance testing. Therefore, it was reasonable to compute composite (unweighted mean) scores for each of the four subscales.

Confirmatory model
Confirmatory factor analyses (CFA) using the full sample (N = 2921) compared five competing measurement models for temptations. The theory-based measurement model with four correlated factors, each with three measured indicators (Figure 1), demonstrated an excellent fit to the data. All factor loadings were adequate to high; the highest loadings (> .80) were for the three weight concerns items and also item 9: "when things are not going my way and I am frustrated." The item with the lowest loading was item 5: "when I feel I need a lift" (λ = .44). Review of the Lagrange Multiplier indices suggested that adding a path between item 5 and the weight concerns factor would improve model fit significantly, and by more than adding another suggested path between item 2 and the habit strength factor. This indicates that item 5 was quite likely to be another complex item. All correlations among the four factors were significant; the highest correlation was between the positive/social and habit strength factors (φ = .74), which may reflect some impact of the complex loading for item 2. The correlations between the weight concerns factor and the other three factors were much lower [...] suggest that several items on the temptations measure could be further improved.
Cronbach's coefficient alphas computed for each three-item subscale showed that the weight concerns subscale had the highest internal consistency. The unweighted mean score in the full sample for weight concerns was well below the theoretical midpoint of the 5-point response scale (i.e., 2.09 compared to 3.00), and the distribution of the scores was positively skewed, suggesting that weight concerns were not strongly endorsed by most of the sample. The negative/affective subscale also showed fairly high internal consistency, but the mean score was much higher than the scale midpoint and the distribution was negatively skewed. Temptations to smoke in response to stress or anxiety can hamper attempts to quit smoking, and smokers who report them could benefit from tailored interventions that address this barrier. Finally, the estimates of coefficient alpha indicated that the internal consistency of the positive/social and habit strength subscales was only moderate, and lower than found in previous samples. Once again, these results probably reveal some effect of the cross-loading for item 2. Internal consistency for the positive/social subscale was re-assessed after excluding item 2; the coefficient alpha of .56 computed for the two remaining items was exactly the same as the previous three-item alpha, indicating that including the poor item did not contribute to subscale performance.
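For readers who wish to reproduce this kind of internal consistency estimate, the sketch below implements the standard formula for coefficient alpha, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). It is a generic illustration, not the software used in this study, and the small data sets are made up solely to exercise the function.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents' item ratings.

    `rows` is a list of equal-length lists, one per respondent,
    holding that respondent's ratings on the k items of a subscale.
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed (total) subscale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: three perfectly consistent items.
alpha_consistent = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])

# Made-up data: two items that agree only partially.
alpha_partial = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
```

Because alpha depends on both the number of items and their intercorrelations, dropping a weakly related item from a three-item subscale can leave alpha unchanged, as was observed for the positive/social subscale.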

Measurement invariance
This study confirmed the invariance of the temptations measurement model with four correlated subscales across multiple population subgroups in a large sample of adult smokers.
The strong factorial invariance model constrained factor loadings and item intercepts in the model to be equal across comparison groups, and provided a very good fit across gender, racial identity, ethnicity, age, stage of change for cessation, smoking problem severity, and BMI status, based on CFI values around .95 and RMSEA values below .08. Results of these analyses indicate a consistent relationship between the four factors (PS, HS, NA, and WC subscales), and the twelve items that serve as measured indicators for the factors.
Although the CFI and RMSEA values for the strong factorial invariance models (i.e., equal factor loadings and item intercepts) indicated good to excellent fit across all subgroups tested, the ∆CFI, ∆Gamma-hat, and ∆NCI values computed to compare nested invariance models were slightly less consistent for comparisons across gender, age, and smoking severity subgroups. For gender, when factor loadings were constrained to be equal, ∆CFI = .019, ∆Gamma-hat = .011, and ∆NCI = .035 were all above the suggested cut-offs of ∆CFI = .010, ∆Gamma-hat = .001, and ∆NCI = .020 (Cheung & Rensvold, 2002), and above even the alternative ∆Gamma-hat range of .005 to .008 proposed by Chen (2007). This suggests some slight differences in the factor loadings between men and women, specifically on item 7: "when I am very anxious and stressed," and item 12: "when I am concerned about managing my weight." However, further examination of the discrepant loadings indicates that, even though the differences in absolute values were statistically significant, the magnitude of the difference represented only a small effect for item 7, Cohen's q = |0.18| (Cohen, 1988). For item 12, the suggested effect appeared larger than the real difference (λ women: .95 vs. λ men: .90, not a meaningful difference) as an artifact of the loadings being at the extreme tail of the distribution. When indicator intercepts were constrained to be equal for comparisons across age subgroups, ∆CFI = .024, ∆Gamma-hat = .045, and ∆NCI = .015, suggesting some lack of invariance. Review of the modification indices revealed that the four items associated with the decrease in model fit were item 1 "with friends at a party," item 2 "over coffee while talking and relaxing," item 6 "when I realize I haven't smoked in a while," and item 8 "when I am very angry about something or someone."
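The remark about item 12 can be made concrete with a short worked example of Cohen's q, defined as the absolute difference between two Fisher z-transformed coefficients, q = |atanh(r1) − atanh(r2)|. The sketch below assumes, as the comparison above does, that standardized loadings can be treated like correlations for this purpose; the function name is hypothetical, and the mid-range pair (.50 vs. .45) is an invented contrast, not a result from this study.

```python
import math


def cohens_q(r1, r2):
    """Cohen's q: absolute difference of Fisher z-transformed coefficients."""
    return abs(math.atanh(r1) - math.atanh(r2))


# Item 12 loadings reported above: women .95 vs. men .90.
q_item12 = cohens_q(0.95, 0.90)

# Invented contrast with the same raw difference (.05) in the mid-range.
q_midrange = cohens_q(0.50, 0.45)
```

With these inputs, q is about 0.36 for the .95 vs. .90 pair but only about 0.06 for the .50 vs. .45 pair. The Fisher transformation stretches values near 1, so a raw difference of .05 between two very high loadings produces a q in Cohen's (1988) medium range even though the loadings are practically equivalent.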
When indicator intercepts were constrained to be equal for comparisons across smoking problem severity subgroups, ∆CFI = .016, ∆Gamma-hat = .028, and ∆NCI = .009, again suggesting possible non-invariance in some indicator intercepts.
Examination of the modification indices showed that the three items with intercepts (means) that were not invariant across smoking severity subgroups were item 2 "over coffee while talking and relaxing," item 4 "when I first get up in the morning," and item 5 "when I feel I need a lift." Items 4 and 5, and possibly item 2 as suggested by the PCA results, are all related to smoking habit strength, so it is not surprising that group means for these specific items differed across light, medium, and heavy smokers. These results may also reflect an interaction between the effects of age and smoking severity. However, the noted decrement in model fit when cross-group equality constraints were imposed does not invalidate the high degree of fit for the strong invariance model, as indicated by macro model fit indices such as the CFI and RMSEA.
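The nested-model comparisons reported above follow a simple decision rule: a change in a fit index that exceeds its cut-off flags possible non-invariance. The sketch below applies the Cheung and Rensvold (2002) cut-offs quoted in the text to the three sets of reported values. The function and dictionary names are hypothetical, and the rule is shown only as an illustration of the logic, not as the procedure implemented in the analysis software.

```python
# Cut-offs quoted in the text (Cheung & Rensvold, 2002).
CUTOFFS = {"CFI": 0.010, "Gamma-hat": 0.001, "NCI": 0.020}


def flag_noninvariance(deltas, cutoffs=CUTOFFS):
    """Return the fit indices whose change exceeds the suggested cut-off."""
    return [name for name, value in deltas.items()
            if abs(value) > cutoffs[name]]


# Changes in fit reported in the text for the three less consistent comparisons.
reported = {
    "gender, equal loadings": {"CFI": 0.019, "Gamma-hat": 0.011, "NCI": 0.035},
    "age, equal intercepts": {"CFI": 0.024, "Gamma-hat": 0.045, "NCI": 0.015},
    "severity, equal intercepts": {"CFI": 0.016, "Gamma-hat": 0.028, "NCI": 0.009},
}

flags = {label: flag_noninvariance(deltas) for label, deltas in reported.items()}
```

Under these cut-offs all three indices are flagged for gender, whereas only ∆CFI and ∆Gamma-hat are flagged for age and smoking severity, which matches the description of these comparisons as "slightly less consistent" rather than clearly non-invariant.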
These results demonstrate that the measurement model of four correlated factors for situational temptations for smoking has a consistent relationship across subgroups, and they provide empirical support for the internal validity of the measure. The four subscales demonstrated invariance in factor loadings and indicator intercepts across the multiple subgroups assessed, and also in indicator error variances for subgroups defined by racial identity, ethnicity, stage of change, and weight status. This allows meaningful comparisons of the measured constructs to be made across different samples in the target population.

External validity
Multivariate analysis of variance showed that temptations varied slightly across the first three stages of change, although the overall η² of .02 would be interpreted as a small multivariate effect size (i.e., < .02; Cohen, 1992). This is consistent with TTM predictions because the Temptations/Self-efficacy construct is theorized to be more important during the later stages of Action and Maintenance. As expected, participants' temptation to smoke in positive/social and habit strength situations was highest in Precontemplation and lower among those in Preparation, replicating previous studies in adults and adolescents (Hoeppner et al., 2012; Redding et al., 2013; Velicer et al., 1990). The largest increase on the negative/affective subscale was observed between smokers in Precontemplation and those in Contemplation, with scores decreasing again for those in the Preparation stage. The η² of .005 indicates a small effect of the three early stages of change on variance in negative/affective scores (Rossi, 2012). Negative affect was also more highly endorsed than the other subscales. Interestingly, weight concerns increased across stage groups, the opposite direction from the other subscales, although this was also a very small effect. Weight concerns were endorsed at much lower levels than the other subscales, indicating that they were not as important across all participants in the sample. However, smokers for whom weight concerns are a barrier to cessation may benefit from individually tailored intervention attention. These results support the use of this measure both for assessing temptations to smoke and for tailored intervention purposes in this sample of adult smokers.

LIMITATIONS
Findings from this study are based on data from a large population-based sample.
However, one major limitation of this study was the restricted range because the sample consisted entirely of current smokers. This low variability in the sample was also indicated by the low values of the determinants for the data matrix. If possible, a sample that includes a mix of both current and former smokers (e.g. from a follow-up assessment) should be selected for future analyses, which would provide greater variance in responses on these measures, and also a wider range in terms of stages of change (i.e. a sample with smokers who have quit smoking would allow assessments to include the Action and Maintenance stages). The reduced variability in a sample that included only smokers in the pre-Action stages (i.e. Precontemplation, Contemplation, and Preparation) may also have reduced the estimates of internal consistency for some subscales, which were lower compared to previously reported estimates.
Another limitation of the current sample relates to the racial and ethnic demographics. A sample that is more diverse in terms of racial identity, with adequate numbers of other racial groups besides white and black, would allow more comprehensive assessment of the measure across more racial groups. The sample sizes used in the analyses were highly unbalanced across racial and ethnic groups, although the invariance models were still indicative of good fit. This sample also had too few participants who were classified as underweight (i.e. BMI below 18.5), so that underweight participants had to be combined with those of normal weight (i.e., BMI 18.5-24.9). This resulted in greater heterogeneity in weight status among participants in that subsample for measurement invariance testing. It also meant that the measure could not be assessed specifically in a sample of underweight adult smokers; it would have been especially interesting to investigate whether the fourth Weight Concerns factor was equally stable in underweight adults who smoke.
Finally, the cross-sectional nature of the data was another limitation of this validation study. This measure would benefit from longitudinal analyses, for example, assessing the predictive validity of the four-factor inventory. In addition, establishing measurement invariance over time would satisfy a fundamental assumption of any analyses designed to investigate temporal change in the construct.
samples (Hoeppner, 2012; Plummer et al., 2001). The fourth factor, weight concerns, had high factor loadings and high internal consistency (coefficient α = .91). The internal consistency for the negative/affective subscale was high (α = .79), although it was lower than expected for both remaining subscales (positive/social α = .56; habit strength α = .55).
In addition, the results of this study provide strong support for the stability of the four-factor measurement model across population subgroups defined by stage of change for cessation, gender, racial identity, ethnicity, age, smoking problem severity, and weight status. These findings confirm that the four factors and the set of 12 items that serve as their measured indicators have a consistent relationship across population subgroups, and they provide empirical support for the internal validity of the measure. The four-factor measurement model demonstrated invariance in factor loadings and indicator intercepts, allowing meaningful group comparisons to be made on these constructs.
Finally, temptations varied slightly across the first three stages of change, consistent with TTM predictions, although only the habit strength and positive/social subscales replicated previous findings in adults and adolescents (Hoeppner et al., 2012; Plummer et al., 2001; Redding et al., 2013; Velicer et al., 1990