VALIDATION OF A SOCIAL SKILLS CONSTRUCT USING MULTITRAIT MULTIMETHOD AND GENERALIZABILITY APPROACHES

Social skills are important components of social-emotional functioning that allow children to be successful in school, both socially and academically. A single, agreed-upon definition of social skills has not been identified in the literature, however, which has led to variations in the measurement and assessment of social skills. Issues of measurement may be linked to the ineffectiveness of school-based social-skills interventions. Commonly used conceptualizations and operationalizations of social skills are presented, as well as a review of issues surrounding social-skills interventions. A multitrait-multimethod approach is presented for establishing a unified set of social skills, and generalizability theory is examined as a psychometrically based approach to developing a measure for observing social skills. The assessed skills included six social skills, drawn from the Social Skills Improvement System – Rating Scales, and two academic skills. Skills were assessed using behavioral observation and rating scales. Convergent and discriminant validity were evaluated for a social skills construct, or a homogenous set of social skills. The reliability of the observational strategy was evaluated to determine the number of occasions and skills needed to obtain adequate reliability. Results indicated that a homogenous set of skills could be identified and that a behavioral observation strategy could be used reliably to assess social skills. Results are discussed in terms of applied use for the measurement strategy in school settings for formative assessment and in terms of directions for future research.

Acknowledgement
I would like to express my thanks and appreciation to those who helped me accomplish this goal and complete my dissertation. First, I would like to thank my major professor, W. Grant Willis, for his guidance and support throughout my graduate career. Second, my appreciation goes out to the students and staff at the participating school for their willingness to assist me in my research. Next, I would like to thank my friends and family for their love and encouragement when I needed it most. Finally, and most significantly, I would like to thank my husband for his endless and tireless support, for never letting me give up or settle, for pushing me forward, and for being a constant source of encouragement.

Statement of the Problem
Social functioning plays an important role in students' abilities to thrive in school (Cappadocia & Weiss, 2011). Students with poor social skills are at risk for internalized and externalized behavioral difficulties as well as poor academic achievement (Cook, Gresham, Kern, Barreras, Thornton, & Crews, 2008). Social-skills interventions occur in schools as a means of improving the social-emotional functioning of students with difficulties. Unfortunately, however, social-skills interventions often are ineffective (Cappadocia & Weiss, 2011; Gresham, Sugai, & Horner, 2001), and this may be related to associated assessment and measurement issues (Matson & Wilkins, 2009).
Proper assessment of skill deficits and skill performance in multiple settings may improve the overall effectiveness of social-skills interventions. The most common social-skills assessment methods are direct behavioral observation and behavioral rating scales (Matson & Wilkins, 2009). There are no current, agreed-upon criteria for defining social skills, however, and, as such, measurement strategies vary and focus on identifying specific behaviors to target for treatment (Matson & Wilkins, 2009); rating scales often vary greatly in the kinds of behaviors and skills they assess. Additionally, measures for monitoring the progress of social-skills interventions, which would improve differentiation of instruction and potentially increase effectiveness, are not currently available, although there are a few in development (Cummings, Kaminski, & Merrell, 2008; Stichter, Herzog, O'Connor, & Schmidt, 2012).
The current research aimed to use a multitrait-multimethod approach to evaluate the validity of a social skills construct. Additionally, generalizability theory was employed to evaluate the psychometric properties of a behavioral measurement tool and its utility for using multiple observers in multiple settings to observe social skills. It was hoped that a reliable set of social skills could be identified and a useful method for progress monitoring of these skills could be established.

Critical Review of Literature
The following critical review focuses on theoretical issues surrounding the conceptualization of social-skills constructs and their measurement, aspects affecting the outcome, or overall effectiveness, of social-skills interventions, and measurement approaches to establishing reliability and validity.

Social Skills Background
Definitions and conceptualizations. Social skills may be conceptualized as a set of competencies that allow an individual to initiate and to maintain social relationships that contribute to peer acceptance and school adjustment (Luiselli, McCarty, Coniglio, Zorilla-Ramirez, & Putnam, 2005). Social skills also have been conceptualized as social cooperation skills that lead to successful school adjustment and positive peer relationships (Cummings et al., 2008). Crowe, Beauchamp, Catroppa, and Anderson (2011) suggest that appropriate social functioning is important across the lifespan and is the basis for forming lasting relationships. Gresham et al. (2001, p. 333) conceptualized social skills as "specific behaviors that an individual uses to perform competently or successfully on particular social tasks." How social skills are defined or conceptualized has an impact on the manner in which they are operationalized, or measured. The operationalization and assessment of social skills, in turn, may impact the skills that are targeted for intervention as well as the intervention approach and its overall effectiveness in improving said skills.
Operationalization and assessment. As mentioned previously, no single, agreed-upon conceptualization or definition of social skills exists. As such, multiple social-skills constructs have been suggested as well as multiple methods of assessing said constructs. For the purposes of measurement, social skills are often broken down into distinct observable behaviors. In other words, social skills may be operationalized as the target behavior(s) to be observed, or measured. Most frequently, these distinct behaviors are measured through contrived social role plays or standardized rating scales (Matson & Wilkins, 2009 were the most commonly used form of assessment. Gresham (2002, as cited in Luiselli et al., 2005) suggested that a problem-solving model should be used to assess social skills, and that problem identification and problem analysis form the first two stages of this model and should incorporate rating scales and direct observations in tandem. In this manner, information may be obtained regarding the context in which social difficulty occurs and to what degree (i.e., compared to same-aged peers). This information then can be used in the second two stages of the problem-solving process, namely plan (or intervention) implementation and treatment evaluation. In this problem-solving model, rating scales are suggested to help identify target behaviors for intervention but may not serve as the best means of evaluating intervention effectiveness, or overall outcomes.

Social-Skills Interventions
Social skills play an important role in child development, in general, and play a pivotal role in a student's ability to thrive in a school environment (Cappadocia & Weiss, 2011). Luiselli et al. (2005) suggested that success in school depends as much on social skills as academic performance, namely because effective social skills help children to form friendships, respond appropriately to classroom expectations, and build positive relationships. Well-developed social skills are linked to positive social and school-related outcomes whereas poorly developed social skills may place students at risk for poor social and school-related outcomes (Cook et al., 2008). Gresham, Cook, Crews, and Kern (2004) have suggested that social skills may serve as academic enablers because they appear to help facilitate academic performance and often are linked with academic achievement.
Students who lack social-emotional skills often receive school-based interventions in order to remediate associated difficulties and to facilitate social-emotional development. Unfortunately, these interventions often are ineffective.
Effectiveness. Within the last three decades, multiple meta-analyses have been conducted assessing the effectiveness of social-skills interventions (see Mabe, 2013, for a more detailed examination). Reviews of the meta-analytic literature (Cook et al., 2008; Gresham et al., 2004; Gresham et al., 2001) revealed two major findings: (a) many studies used outcome measures that were not directly linked to the skills taught during intervention, and (b) the majority of studies relied solely on outcome-based evaluation rather than formative assessment throughout.
Standardized rating scales, discussed previously, are an example of outcome-based measures that are frequently used to assess the effectiveness of social-skills interventions. When considering the first major finding, it is important to note that rating scales are often composed of hundreds of items. It is easy to see how these types of measures can assess a large number of skills and not necessarily focus on the skills targeted within an intervention. Additionally, in consideration of the second major finding, these types of measures provide summative information in a global manner that may not indicate a student's standing on specific skill components.
Rating scales are typically used as general outcome measures (GOMs), which do not provide information about specific skill deficits and simply provide an overall description of skills (Hosp, Hosp, & Howell, 2007). Formative assessment, however, utilizes change-sensitive measures that allow one to observe small changes in performance over brief periods of time (Burns & Coolong-Chaffin, 2006). This method of assessment allows for a more detailed examination of a student's performance on specific skills over short periods of time and may be beneficial for instructional planning. A GOM may be used to determine if a student can perform particular tasks subsequent to an intervention. A formative assessment model differs from a GOM because it is change-sensitive and uses multiple measurements of skills throughout an intervention to detect performance changes, which inform instructional decisions throughout (Hintze, Christ, & Methe, 2006).
Relying on an outcome measure limits the individualization of an intervention that may be necessary to improve its effectiveness.
Overreliance on outcome-based measurement and poor individualization of instruction are major contributing factors to the ineffectiveness of social-skills interventions. Linked to these issues, and also contributing to ineffectiveness, are the issues of: (a) conceptual clarity, (b) instructional format, and (c) generalization of instruction. First, as previously discussed, there are many varied conceptualizations of social skills. Behaviors may be targeted for intervention based on assessments with a specific conceptualization (Matson & Wilkins, 2009) and omit potential skills not included in that particular conceptualization. In other words, skills that may benefit from instructional support may not be included in interventions because they were not initially included as part of the identified social-skills construct. Skills that often are grouped (i.e., taught and measured) together may not be homogenous and may attenuate the reliability of measurement (Mabe, 2013). Second, the instructional format of social-skills interventions often focuses on acquisition deficits (have not learned skill) rather than performance deficits (do not perform a previously learned skill) and does not tailor instruction to account for individual differences (Gresham et al., 2001). Interventions may teach a number of skills to a group of students, all of whom may experience different skill deficits. A failure to differentiate instruction to meet specific needs of students within a group may make the intervention less effective. Finally, interventions often lack generalization instruction, resulting in a failure of instruction to generalize to new situations (Cappadocia & Weiss, 2011; Gresham et al., 2001). Here, students may be instructed and learn to perform skills in one setting (e.g., instructional room outside of the classroom) but not in another (e.g., classroom or playground).
Issues of assessment run through each of the factors discussed above, specifically those concerning the conceptual clarity of social-skills assessment and formative assessment approaches.
Progress monitoring and direct observation. Progress monitoring is a formative assessment approach used in conjunction with a multi-tiered format of intervention in schools called response-to-intervention (RTI). The RTI process is a complex one and depends on valid, easily administered, brief, change-sensitive measures to inform interventionists about student progress on specific skills in order to make decisions regarding their progress (Burns & Coolong-Chaffin, 2006; Hosp et al., 2007). These measures need to have a high level of reliability in order to make accurate instructional decisions from obtained data. Progress monitoring provides the means of evaluating instruction and decision making regarding instructional modifications (Fletcher & Vaughn, 2009; Stecker, Lembke, & Foegen, 2008). In other words, progress-monitoring tools are essential to effective interventions because they provide data for decision making about student needs and differentiation of instruction.
Social skills are often challenging to assess in a manner that is reliable, generalizable, and efficient (Cummings et al., 2008). Recently, some measures have begun to be developed in an effort to produce progress-monitoring tools for social-skills interventions that meet these criteria. Brief rating scales have been developed (Gresham & Elliott, 2008; Gresham et al., 2010) but remain in the style of traditional rating scales and may not be adequate for progress-monitoring purposes. Cummings et al. (2008)  Although behavioral observations may be well suited to progress monitoring, Hintze and Matthews (2004) found that direct behavioral observation often has low reliability even when interrater agreement is high. They suggested that direct observation may not be as reliable a method as it is often believed to be in the field of school psychology. In another study that assessed the reliability of a behavioral observation tool (Mabe, 2013), three occasions of observation were found to be inadequate and a minimum of five occasions would be required for accurate assessment of skills. In the same study, the skills that were assessed did not cluster well together as hypothesized, which attenuated reliability. In order to obtain an adequate level of reliability for making instructional decisions, it appears that one would need to assess a validated homogenous group of skills on a sufficient number of occasions for a representative sampling of skill performance. Four occasions of observation may prove adequate for skill assessment if a higher level of interobserver agreement and skill homogeneity could be obtained.

Measurement Approaches to Establishing Validity
Multitrait-multimethod. A multitrait-multimethod approach is a method of assessing validity for a construct by evaluating convergent and discriminant validity simultaneously. Campbell and Fiske (1959) explained that convergent validity is necessary to validate a trait (e.g., social skills) but that discriminant validity is also required. In other words, one must be able to establish what a trait is, as well as what it is not. To do this, more than one trait and more than one method must be used during the validation process; the traits should be theoretically unrelated in order to ensure that discriminant validity can be obtained. For example, social skills and academic skills could be assessed using rating scales and behavioral observation because the skills and methods are theoretically independent of each other. Although the link between social and academic skills has long been studied and it is widely recognized that one may often have an impact on the other (Welsh, Parke, Widaman, & O'Neil, 2001), they are viewed as separate and distinct constructs; certainly, the assessment of one could not be substituted for the other with any degree of accuracy.
After identifying at least two theoretically independent traits, correlations can then be calculated between each trait as measured by each method in order to evaluate reliability and validity. Here, the index of reliability should exceed convergent validity values, and convergent validity values should exceed discriminant validity values (see Campbell & Fiske, 1959). Values that do not follow this order are an indication of poor validity for the methods or traits. For example, in the current study, it was anticipated that social and academic skills may have some degree of correlation, as one may be predictive of the other in some instances (Welsh et al., 2001). A high level of correlation between these traits, however, would indicate poor validity of trait constructs and would indicate that the two traits were theoretically linked in some way.
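The ordering described above can be checked mechanically once the correlations are in hand. The following Python sketch is illustrative only: the helper names `classify_mtmm` and `follows_expected_order`, the two-trait by two-method layout, and every numeric value are hypothetical and are not results from the current study. It also applies a deliberately strict, simplified reading of the Campbell and Fiske (1959) ordering, in which every reliability must exceed every convergent value and every convergent value must exceed every discriminant value.

```python
from itertools import combinations


def classify_mtmm(corr, reliabilities):
    """Split off-diagonal MTMM correlations into convergent and discriminant sets.

    corr: dict mapping frozenset({unit_a, unit_b}) -> correlation
    reliabilities: dict mapping unit -> reliability, where unit = (trait, method)
    """
    convergent, discriminant = [], []
    for a, b in combinations(sorted(reliabilities), 2):
        r = corr[frozenset((a, b))]
        if a[0] == b[0]:
            # Same trait measured by different methods: convergent validity
            convergent.append(r)
        else:
            # Different traits (same or different method): discriminant validity
            discriminant.append(r)
    return convergent, discriminant


def follows_expected_order(corr, reliabilities):
    """Strict check: reliability > convergent > |discriminant| for all values."""
    convergent, discriminant = classify_mtmm(corr, reliabilities)
    return (min(reliabilities.values()) > max(convergent)
            and min(convergent) > max(abs(r) for r in discriminant))


# Hypothetical values for illustration only (not data from this study).
reliabilities = {
    ("social", "rating"): 0.88,
    ("social", "observation"): 0.80,
    ("academic", "rating"): 0.85,
    ("academic", "observation"): 0.82,
}
corr = {
    frozenset({("social", "rating"), ("social", "observation")}): 0.60,
    frozenset({("academic", "rating"), ("academic", "observation")}): 0.55,
    frozenset({("social", "rating"), ("academic", "rating")}): 0.25,
    frozenset({("social", "observation"), ("academic", "observation")}): 0.20,
    frozenset({("social", "rating"), ("academic", "observation")}): 0.15,
    frozenset({("social", "observation"), ("academic", "rating")}): 0.10,
}

print(follows_expected_order(corr, reliabilities))
```

With these made-up values the ordering holds. Raising any social-academic correlation above the lowest convergent value would make the check fail, which corresponds to the poor-validity case described above.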
Generalizability theory. Generalizability theory (GT) is an extension of classical test theory (CTT) that can be used for assessing the reliability of behavioral measures. CTT is a major perspective in psychometric assessment that is used to evaluate measurement variability. In CTT, variability in test-scores is partitioned into two areas: (a) variance due to true scores, and (b) variance due to error. The major assumption in CTT is that error is randomly distributed and comes from sources unrelated to true differences in the assessed trait. GT extends CTT in a number of ways: (a) recognizing multiple sources of measurement error, (b) estimating each source of measurement error separately, (c) indexing the magnitude of each source of error, (d) distinguishing between relative (i.e., normative or inter-individual) and absolute (i.e., ipsative or within-individual) decisions, and (e) differentiating between generalizability and decision studies (Shavelson, Webb, & Rowley, 1989). Of particular interest for the purposes of this study is the use of GT to account for multiple sources of error, estimate the magnitude of error for each source, and differentiate between generalizability and decision studies. Reliability estimates from GT studies account for expected error as well as additional error sources, which are important for the evaluation of behavioral measures (Hintze & Matthews, 2004). For example, in GT, one can account for error attributed to multiple observers and multiple settings when using behavioral observation.
As noted, GT differentiates between two phases of a study: (a) generalizability studies and (b) decision studies. These two phases work together to estimate an optimal level of reliability for a measure. The generalizability-study phase results in a g-coefficient that provides an estimate of reliability for a measure and can be interpreted similarly to the r-coefficient in CTT; the generalizability study also estimates the magnitude of each source of error. The decision-study phase uses data from the generalizability study to estimate the impact that changes to a measurement strategy can have in order to minimize error (Shavelson et al., 1989). In other words, one can estimate how adjustments to sources of error may influence reliability. For example, one could estimate how many observations need to be conducted of a particular skill in order to obtain a particular level of reliability.
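The contrast between the two frameworks can be written as variance decompositions. The notation below is a common textbook form rather than notation taken from the current study, and the GT line assumes, purely for illustration, a fully crossed design with persons (p), occasions (o), and skills (s) as the sources of variance.

```latex
% Classical test theory: a single, undifferentiated error term
\[
\sigma^2_X = \sigma^2_T + \sigma^2_E,
\qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
\]

% Generalizability theory (crossed p x o x s design): error separated by source
\[
\sigma^2_X = \sigma^2_p + \sigma^2_o + \sigma^2_s
           + \sigma^2_{po} + \sigma^2_{ps} + \sigma^2_{os}
           + \sigma^2_{pos,e}
\]
```

In the second expression, the person variance plays the role of true-score variance, and the remaining components are separately estimable sources of error; the final term confounds the three-way interaction with residual error.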
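A decision study of the kind just described can be sketched in a few lines of Python. The sketch makes several assumptions that should be stated plainly: it uses a simplified, fully crossed persons × occasions × skills design (not the partially nested design of an actual study with observers), it computes the relative generalizability coefficient only, and the variance components and the function names `g_coefficient` and `min_occasions` are hypothetical.

```python
def g_coefficient(var_p, var_po, var_ps, var_pos_e, n_occasions, n_skills):
    """Relative g-coefficient for a crossed persons x occasions x skills design.

    Each error component is divided by the number of conditions sampled
    from the facet(s) it involves, so adding occasions or skills shrinks error.
    """
    relative_error = (var_po / n_occasions
                      + var_ps / n_skills
                      + var_pos_e / (n_occasions * n_skills))
    return var_p / (var_p + relative_error)


def min_occasions(target, n_skills, max_occasions=20, **components):
    """Smallest number of occasions reaching the target coefficient, else None."""
    for n in range(1, max_occasions + 1):
        if g_coefficient(n_occasions=n, n_skills=n_skills, **components) >= target:
            return n
    return None


# Hypothetical variance components for illustration only.
components = {"var_p": 0.40, "var_po": 0.20, "var_ps": 0.10, "var_pos_e": 0.30}

for n in (1, 2, 4, 6):
    g = g_coefficient(n_occasions=n, n_skills=6, **components)
    print(f"{n} occasion(s), 6 skills: g = {g:.3f}")

print(min_occasions(0.83, n_skills=6, **components))
```

With these made-up components, the coefficient rises with each added occasion but with diminishing returns, which is the trade-off a decision study is meant to expose when choosing how many observations to schedule.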

Purpose of the Study
There is no single, agreed-upon definition or conceptualization of social skills in the literature. Social skills are often operationalized based on specific observable behaviors that differ across assessments. A number of assessments exist that assess various conceptualizations of social skills. The current study uses a multitrait-multimethod approach to evaluate the validity of a social-skills construct.
help social-skills instructors gauge student progress on specific skills, differentiate instruction appropriately, and improve the overall effectiveness of social-skills interventions. The current study uses G theory to develop an observational, formative assessment tool for social-skills interventions that could be used for progress monitoring and decision-making purposes. It is hoped that the identification of a unitary set of social skills and the establishment of a method to monitor progress reliably will be able to improve program effectiveness and student outcomes.

Participants and Setting
Participants were 20 middle and high-school students from specialized classrooms in sixth through twelfth grade (ages 12 to 18 yrs., M = 15 yrs., SD = 2.1 yrs., Median = 15 yrs.) attending an alternative public day school run by an educational collaborative in the northeastern part of the United States. Sample size was selected in consideration of the statistical analyses that were conducted, the limited size of the population, and the lengthy process required to obtain consent. All grade levels of the participating school were represented due to the limited student population from which to draw a sample. The participating school comprises two specialized middle-school classrooms and three specialized high-school classrooms run by an educational collaborative. All students meet Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) diagnostic criteria for one or more mental-health conditions. Students attend the collaborative school because they have not been able to be successful in a traditional educational setting and have been referred to the school by their member district (i.e., school district that participates with the collaborative).
Students at the collaborative school often experience severe social-emotional and/or behavioral difficulties that require educational modifications in order for them to be successful. Students may have developmental delays or cognitive impairments as well.
Students are referred to the collaborative school from member districts in the surrounding area; the student population is diverse and consists of students from urban, low socioeconomic status (SES) areas as well as suburban middle-class areas. SES was estimated by participation in the school's lunch program: Students who qualified for free lunch were estimated to come from families of low SES, those who were eligible for a reduced-price lunch were estimated to come from families of middle to low SES, and those who paid the full price for lunch were estimated to come from families of middle to above SES. The majority of the sample (i.e., 75%) qualified for free lunch (low SES), 0% for reduced-price lunch (medium to low SES), and 25% paid the full price for lunch (medium to above SES). The characteristics of the sample are represented in Table 1. Several considerations should be made regarding the contribution of cultural influences shared by the sampled population to the expression of social skills. First, socioeconomic status indirectly may be linked to poor social skills. Children from low SES backgrounds often qualify for free or reduced meals at school in order to reduce academic and behavioral difficulties due to hunger (as mentioned previously, the number of students who qualified for free or reduced lunch was used as an indicator of SES). Jyoti, Frongillo, and Jones (2005) found that food insecurity over time is related to decline in reading and math test performance, increase in weight, and impairment of social skills. Results from that study also indicated that children from low SES backgrounds may experience difficulties over time in social and academic areas if their basic dietary needs are not met.
Additionally, the sample population is clinical in nature and the participants presented with marked social difficulties as a result. It is commonly understood that children and adolescents with Autism Spectrum Disorder frequently exhibit social interaction and communication difficulties. Other disorders found within the sample population, such as Attention-deficit Hyperactivity Disorder, Mood Disorder, Anxiety Disorder, and Oppositional Defiant Disorder, are also associated with social difficulties, although they may originate for different reasons. It is important to note that the sample population is distinctly different from a typical population of students in a regular-education setting.

Design
Two broadly defined traits and three methods were selected for the multitrait-multimethod design. The first trait, social skills, was more narrowly delineated into six social domains (representing the construct of interest for validation). The second trait, academic skills, was more narrowly delineated into two academic domains, that is, reading and math, and was selected as a discriminative construct. Academic skills were selected as the discriminant trait in this study because they are theoretically separate from social skills and can be measured as such. Although studies such as those conducted by Arnold, Kupersmidt, Voegler-Lee, and Marshall (2012) and Demaray and Jenkins (2011) highlight the relationships that exist between social and academic skills, these studies also demonstrate that these two constructs are disparate ones and can be measured separately. Each construct was assessed with three methods: (a) teacher rating scale, (b) student rating scale, and (c) behavioral observation. The dependent variables were the obtained scores for skill performance for each social-skill observation (averaged across 4 occasions), obtained scores for academic and social rating scales (averaged across 2 occasions), and academic skill observation (obtained from performance scores on a standardized test). The multitrait-multimethod matrix is illustrated in Appendix A.
The generalizability study was conceptualized as a three-facet, partially nested design with occasions (4 levels) and skills (6 levels) as crossed facets, and students (20) nested within observers (3 levels). For practical reasons, students were nested within observers because each observer was randomly assigned to particular students; in order for them to be completely crossed (i.e., independent), each observer would have had to observe every student on all skills and occasions, and this was not feasible for the current study. The dependent variable was the observational outcome as a percentage of intervals

Measures
Rating-scale data for social skills and academic skills were obtained using the Social Skills Improvement System (SSIS) Rating Scales (Gresham & Elliott, 2008 academic scores were obtained from these measures. These assessments are also nationally standardized assessments of academic achievement that provide standardized scores for similar reading and writing tasks (Flanagan, Alfonso, & Dixon, 2014, pp. 38-41). Scores from the KTEA-II and WJ-III may be reasonably compared to the WIAT-III reading and writing scores given that each assessment utilizes standard scores of 100 with standard deviations of 15.
Dependent variables. The dependent variables for social skills were teacher and student ratings as reported on the SSIS Rating Scales, teacher and student forms, as well as observational ratings of successful completion of specified skills. Various skills were selected for observation from three subscales on the SSIS (Communication, Cooperation, and Engagement); skills representative of other subscales were not selected due to potential low availability to observe in typical classroom settings. Six skills were identified for observation: (a) conversation, (b) nonverbal communication, (c) classroom participation, (d) follow expectations, (e) group participation, and (f) interaction.
"Conversation" was defined as using appropriate conversational skills such as responding when spoken to, using appropriate tone and volume, and taking turns while speaking.
"Nonverbal communication" was defined as using nonverbal communication appropriately during conversation such as making eye contact, facing appropriately, and maintaining appropriate distance between speakers. "Classroom participation" was defined as being actively or passively involved, as appropriate to situation, in instruction; examples included volunteering to answer questions, taking notes, following along in text, engaging in class discussion, and working on assignments. "Follow expectations" was defined as engaging in appropriate classroom behavior, such as following directions, completing tasks without disrupting others, following classroom rules, and ignoring distractions. "Group participation" was defined as engaging in group interactions; examples included joining activities that have already started, participating in games or group activities, and inviting others to join in activities. Finally, "Interaction" was defined as the quality, or appropriateness, of social engagement such as ease of engagement and positive interactions. Consistent with the instructions for the SSIS, each of these items were rated by both teachers and students on a 4-point Likert scale with the letters "N," "S," "O," and "A" assigned to the values 0, 1, 2, and 3 respectively. A response marked "N" indicated that the student never exhibits the behavior, "S" indicated that they seldom exhibits the behavior, "O" indicated that they often exhibit the behavior, and "A" indicated that they almost always exhibit the behavior.
The rubric for scoring the behavioral observations was as follows: Conversation, Follow Expectations, and Interaction were observed and rated using the slider tool for rubric scoring on Metryx. The slider tool allows the observer to rate a student's performance of a skill on a scale of 0 to 100%, indicating the percentage of skill criteria completed. A rating of 1-20% indicates that a student was not very successful in the completion of the skill, 21-40% indicates they were somewhat successful, 41-60% indicates they succeeded in completing about half of the criteria, 61-80% indicates they were mostly successful, and 81-100% indicates they completed criteria nearly flawlessly.
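The banding of slider ratings can be expressed as a small lookup. The Python sketch below is illustrative and is not part of Metryx; the function name `describe_rating` is hypothetical. The bands as described begin at 1%, so the treatment of a rating of exactly 0% is an assumption made here: it is placed in the lowest band.

```python
# Upper bound of each band (inclusive) and its verbal description.
BANDS = [
    (20, "not very successful"),
    (40, "somewhat successful"),
    (60, "completed about half of the criteria"),
    (80, "mostly successful"),
    (100, "completed criteria nearly flawlessly"),
]


def describe_rating(percent):
    """Return the verbal band for a slider rating between 0 and 100.

    Assumption: a rating of 0 falls in the lowest band, because the
    described bands start at 1% and leave 0% unassigned.
    """
    if not 0 <= percent <= 100:
        raise ValueError("rating must be between 0 and 100")
    for upper_bound, label in BANDS:
        if percent <= upper_bound:
            return label


print(describe_rating(72))
```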
Nonverbal Communication, Classroom Participation, and Group Participation were observed using momentary time sampling for 15-minute observations with 30-second intervals. In this manner, a student was observed at the end of each 30-second interval and rated as engaging in the behavior or not. This type of observation resulted in a percentage from 1 to 100 indicating the percentage of intervals in which the student was successfully engaged in the observed behavior.
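The momentary time sampling score described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the example record are hypothetical and are not part of Metryx or of the study's data.

```python
def momentary_time_sampling_percent(interval_marks):
    """Return the percentage of intervals in which the behavior was observed.

    interval_marks holds one True/False mark per interval, recorded at the
    moment each interval ends.
    """
    if not interval_marks:
        raise ValueError("at least one interval is required")
    engaged = sum(1 for mark in interval_marks if mark)
    return 100.0 * engaged / len(interval_marks)


# A 15-minute observation with 30-second intervals yields 30 marks.
SESSION_SECONDS = 15 * 60
INTERVAL_SECONDS = 30
n_intervals = SESSION_SECONDS // INTERVAL_SECONDS

# Hypothetical record: behavior observed at the end of 24 of the 30 intervals.
marks = [True] * 24 + [False] * (n_intervals - 24)
print(momentary_time_sampling_percent(marks))  # 80.0
```

Because the student is checked only at the end of each interval, the score estimates the proportion of time engaged rather than counting every occurrence of the behavior.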
The dependent measures for academic skills were teacher ratings as reported on the SSIS, student ratings as reported on a similar form, and performance scores from standardized achievement assessments as reported in student records. Teachers and students rated academic performance on a 5-point Likert scale comparing student performance to that of other students. Academic skills were rated as being in the Lowest 10%, Next Lowest 10%, Middle 40%, Next Highest 20%, or Highest 10% with the values 1, 2, 3, 4, and 5, respectively. The student SSIS form does not include an academic skills section; however, a form for students to complete was developed by the researcher (Appendix A), based on the teacher SSIS Academic Competence scale. Reading and math performance scores were obtained from the WIAT-III (n = 11), KTEA-II (n = 5), and WJ-III (n = 4). Students completed tasks that involved reading aloud for a timed period and performing math computations. Reading and math tasks were similar across assessments.

Procedures

Informed Consent/Assent
Informed consent was obtained from parents/caregivers for participation in the study for students under the age of 18, and assent was obtained from all students.
Students who were 18 years of age provided informed assent for participation and a follow-up phone call was provided to the parent/caregiver by the student's school-based clinician to notify them of their child's participation in the study and to clarify any questions the parent/caregiver might have. All parents/caregivers were provided with a phone call from their child's school-based clinician to explain the study and answer questions prior to the consent form being sent home. Students received "credits," or points to be used toward school-based rewards upon return of the parent/caregiver form.
Students received credits regardless of whether or not the parent/caregiver agreed to student participation in the study. Parent/caregiver consent forms (Appendix B) were sent home and returned to school with the student. Consent forms were sent home with 22 students and 86% (19 parents) were signed and returned. Two of the 22 parents/caregivers declined to have their child participate in the study. Student assent forms (Appendix C) were explained by the researcher to each student individually upon obtaining parent/caregiver consent. Students who were 18 years of age (n = 3) completed the assent process as previously outlined. Teacher consent (Appendix D) was obtained at a staff meeting after the research study was discussed and all questions answered. Rating scales were not completed by teachers who did not sign consent.
Parent/caregiver consent and student assent were the primary inclusion criteria.
Students also needed to reside outside of a group-home setting, which simplified the consent process, and to spend the majority of their day in the classroom. Due to the nature of the school population, many students have difficulty staying in class for extended periods of time and often walk the halls or take movement breaks in the gym.
In order to be included in the study, participants needed to be located in the classroom reliably for an extended period of time on most days. All students for whom parent/caregiver consent was obtained assented to participate in the study. Data were not collected from students within each classroom for whom informed consent and assent were not obtained. All students were treated in a manner consistent with the ethical guidelines of the American Psychological Association, the National Association of School Psychologists, and the Institutional Review Board of the University of Rhode Island.

Training Procedures
Two females enrolled in a psychology undergraduate program served as observers, in addition to the researcher, for course credit. Observers were trained in direct observational methods, the use of Metryx, and in how to navigate the classroom settings appropriately (e.g., consideration of student location within the classroom, appropriate classroom demeanor, how to use timing devices properly during observational periods, consideration of skill presentation in a clinical population). Assistants attended four hours of training (divided into two sessions) conducted by the researcher. During the training sessions, assistants discussed operational definitions of the behaviors to be observed and were trained in the observation methods used in the present study (e.g., momentary time sampling and rubric scoring). Additionally, assistants practiced observation skills while observing video recordings of children in classroom settings.
Assistants practiced observations on video recordings until 80% agreement was obtained between the assistants and the researcher.
All assistants were required to provide documentation of education and training in the "Responsible Conduct of Research" and of an official criminal background check prior to conducting observations in the school. All rating scales were hand-scored by the researcher and double-scored on a separate occasion.

Direct Observation
Students were observed for social-skill performance during the naturally occurring school day in the classroom and during transition periods such as breakfast, lunch, and free time. Dates, times, and subject matter being studied during an observation were recorded. The skills "Conversation" and "Nonverbal communication" were observed on the same occasions as one skill pair representing the Communication subscale.
"Classroom participation" and "Follow expectations" were observed on the same occasions as a second skill pair representing the Cooperation subscale. Finally, "Group participation" and "Interaction" were observed on the same occasions as a third skill pair representing the Engagement subscale. Observers were randomly assigned to students.
Each participating student was observed on four occasions on each skill pair; each observation was 15 minutes long, divided into thirty-second intervals. Every participating student was observed for 15 minutes in a classroom setting on 12 separate occasions for a total of 180 minutes (3 hours).
Social-skill observations were conducted using momentary time sampling and rubric scoring. Each skill pair included one skill to be observed using momentary time sampling (Conversation, Class participation, Group participation) and one skill using rubric scoring (Nonverbal communication, Follow expectations, Interaction). The momentary time sampling procedure required observers to observe students across a 15-minute period, whereas the rubric scoring system required observers to evaluate the percentage of skill criteria completed during the 15-minute period. Thus, the researcher paired them together in order to maximize the productivity of time spent in observation. Table 2 illustrates the social-skill observational matrix for the present study.

Rating scales. Academic and social skills were assessed using the SSIS and the companion academic form for students, developed by the researcher, on 2 occasions (2 to 3 weeks apart) in order to obtain test-retest reliability. Participating teachers were asked to complete the SSIS and return it to the researcher. Participating students were asked to complete the SSIS with their school-based clinician or the researcher so that any questions they had while completing the form could be answered. School-based clinicians were asked to return completed student forms to the researcher. The SSIS consists of 83 items to be answered on a Likert-type scale as previously described.
Participants were asked to complete only the social skills and academic sections of the SSIS, which together consist of 53 items. The SSIS forms take approximately 10-15 minutes to complete. Thus, participants spent approximately 20-30 minutes completing rating scales over the two testing periods.

Chapter III: Results
Six sets of analyses were conducted: (a) Multitrait-multimethod analyses, (b) Test-retest reliability analyses, (c) G-study analyses, (d) D-study analyses, (e) Kappa analyses of interobserver agreement, and (f) MANOVA analyses of demographic characteristics.

Multitrait-Multimethod Analyses
Pearson product-moment correlations were calculated to assess convergent and discriminant validity for a social skills construct. Scores were averaged across ratings or observational occasions for comparison. Tables 3 through 8 illustrate the obtained correlations for the multitrait-multimethod matrix. Correlations marked with an asterisk (*) are significant at the .05 alpha level.
[Key to the multitrait-multimethod matrix: lettered sections by method (teacher rating, behavioral observation, student rating) and trait (social, academic), mapped to Tables 3 through 8]
In order to condense information from each table for ease of interpretation, a measure of central tendency was calculated for each lettered section. The median was used for central tendency because the mean can be influenced by skewed distributions, and some level of skewness was anticipated due to the small sample size. In some instances, Fisher's z' (Fisher, 1924) was used to assist in calculating the median. Fisher's z' transforms Pearson's r into an approximately normally distributed statistic on which arithmetic operations can be performed. Following the calculation of the median, the Fisher's z' statistic was transformed back to a Pearson's r for consistency of interpretation.
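The use of Fisher's z' in calculating a median can be illustrated with a short Python sketch. It assumes the transformation matters when a section holds an even number of correlations, so that the two middle values must be averaged; the function name and the example values are hypothetical and do not come from the study's tables.

```python
import math
from statistics import median


def median_correlation(correlations):
    """Median of Pearson r values, computed on the Fisher z' scale.

    Each r must lie strictly between -1 and 1, because atanh is undefined
    at the endpoints.
    """
    z_values = [math.atanh(r) for r in correlations]  # r -> z'
    return math.tanh(median(z_values))                # z' -> r


# Odd count: the median is simply the middle correlation.
odd_case = median_correlation([0.20, 0.50, 0.80])

# Even count: the two middle values are averaged on the z' scale, which
# gives a result above the plain arithmetic midpoint of 0.50.
even_case = median_correlation([0.20, 0.80])

print(round(odd_case, 3), round(even_case, 3))
```

With an odd number of correlations the transformation leaves the median unchanged, since it preserves rank order; it changes the result only when middle values are averaged.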
Social-skills domain. The median correlational values across social-skills domains, as outlined previously, were .681 for teacher ratings, .532 for behavioral observations, and .476 for student ratings. In order to assess the consistency of ratings, Cronbach's α (Cronbach, 1951) was calculated for each set of social-skills reliability correlations. Values greater than .9 are considered excellent, values between .7 and .9 are good, values between .6 and .7 are acceptable, values between .5 and .6 are poor, and values below .5 indicate unacceptable consistency. A value of .878 was obtained for the consistency of teacher ratings, a value of .874 was obtained for behavioral observations, and a value of .853 was obtained for student ratings. All raters/observers displayed a high level of consistency in their ratings.
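For readers unfamiliar with Cronbach's α, the following Python sketch shows the standard raw-score formula together with the interpretive bands listed above. It is an illustration only: the text does not specify exactly how α was computed from the correlation sets, so the score-matrix input, the function names, and the handling of values that fall exactly on a band boundary are assumptions.

```python
from statistics import variance


def cronbach_alpha(score_rows):
    """Cronbach's alpha for a list of rows (one row per person, one column per item)."""
    n_items = len(score_rows[0])
    if n_items < 2 or len(score_rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*score_rows))
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in score_rows])
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


def describe_alpha(alpha):
    """Map alpha onto the bands quoted in the text (boundary handling is assumed)."""
    if alpha > 0.9:
        return "excellent"
    if alpha >= 0.7:
        return "good"
    if alpha >= 0.6:
        return "acceptable"
    if alpha >= 0.5:
        return "poor"
    return "unacceptable"


# Hypothetical ratings: three respondents, three perfectly consistent items.
example_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(round(example_alpha, 3), describe_alpha(0.878))
```

Under these bands, the three social-skills values reported above (.878, .874, and .853) all fall in the "good" range.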
Convergent validity for social skills. Convergent validity values were obtained using different methods but the same trait (i.e., social skills). The median convergent values were .365 for teacher rating and behavioral observation, .330 for behavioral observation and student rating, and .182 for teacher and student rating.
These values provide some evidence of convergent validity, though it is not strong evidence.
Academic-skills domain. The median correlations across academic-skills domains were .90 for teacher ratings, .448 for behavioral observation, and .218 for student ratings. Teachers had, by far, the highest level of reliability for rating academic skills. Cronbach's α was used to calculate consistency for each set of academic skills correlations. A value of .948 was obtained for the consistency of teacher ratings, a value of .619 was obtained for behavioral observations, and a value of .358 was obtained for student ratings. Teachers were very consistent in their ratings, observers were moderately consistent, and students were not very consistent.
Homomethod discriminant validity. Homomethod discriminant values were obtained using the same method but different traits (i.e., social and academic skills).
Median discriminant values were .542 for teacher ratings, -.093 for behavioral observation, and .190 for student ratings. Strong discriminant validity was obtained through behavioral observation and student ratings but not from teacher ratings.
Low correlations are desirable in order to indicate that one trait, or construct, does not overlap with another. Moderate correlations were obtained between teacher ratings of social and academic skills, which is not surprising given that some degree of correlation was expected between the two traits.
Heteromethod discriminant validity. Heteromethod discriminant values were obtained using differing methods and different traits. Median discriminant values were .181 for teacher ratings and behavioral observations, .019 for behavioral observations and student rating, and .190 for teacher and student ratings. Strong evidence for discriminant validity was obtained as the values indicate virtually no relationship between academic and social skill ratings across methods.
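As a rough aid to reading the medians reported in this section, the Python sketch below collects them and checks whether, taken as four groups, they fall in the order that a multitrait-multimethod analysis looks for (reliability, then convergent, then homomethod discriminant, then heteromethod discriminant). Taking a median of the reported medians is an assumption made purely for illustration; it is not one of the analyses conducted in the study, and the variable names are hypothetical.

```python
from statistics import median

# Median correlations as reported in the text, in the order
# teacher / behavioral observation / student (or the method pairs listed).
reported_medians = {
    "reliability_social": [0.681, 0.532, 0.476],
    "convergent": [0.365, 0.330, 0.182],
    "homomethod_discriminant": [0.542, -0.093, 0.190],
    "heteromethod_discriminant": [0.181, 0.019, 0.190],
}

# One summary value per group (an illustrative simplification).
summary = {name: median(values) for name, values in reported_medians.items()}

# True when each group's summary exceeds the summary of the next group.
ordered_values = list(summary.values())
pattern_holds = all(
    higher > lower for higher, lower in zip(ordered_values, ordered_values[1:])
)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("expected ordering holds:", pattern_holds)
```

A summary of this kind hides individual departures from the pattern, such as the teacher-rating discriminant value of .542, which is larger than each of the convergent medians.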

Test-Retest Reliability
In order to establish reliability between the first and second administrations of rating scales, test-retest reliability was calculated using Pearson product-moment correlations of time 1 and time 2 (T1 and T2, respectively) administrations (administered 2-3 weeks apart) of both teacher and student rating scales. Table 10 displays the obtained correlational coefficients for teacher and student rating scale test-retest reliability.
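The test-retest calculation amounts to a Pearson correlation between paired T1 and T2 scores. A minimal Python sketch follows; the function name and the scores are hypothetical and are not taken from Table 10.

```python
import math


def pearson_r(x_scores, y_scores):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x_scores) != len(y_scores) or len(x_scores) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mean_x = sum(x_scores) / len(x_scores)
    mean_y = sum(y_scores) / len(y_scores)
    cross_products = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores)
    )
    sum_sq_x = sum((x - mean_x) ** 2 for x in x_scores)
    sum_sq_y = sum((y - mean_y) ** 2 for y in y_scores)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cross_products / math.sqrt(sum_sq_x * sum_sq_y)


# Hypothetical scale totals for five raters at time 1 and time 2.
t1_scores = [12, 18, 25, 30, 22]
t2_scores = [14, 17, 27, 29, 20]
retest_r = pearson_r(t1_scores, t2_scores)
print(round(retest_r, 3))
```

A value near 1 would indicate that raters ordered students similarly across the two administrations.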

Generalizability and Decision Studies
The VARCOMPS procedure was used to compute the variance components analyzed in the G2.sps SPSS program developed by Mushquash and O'Connor (2006, revised 2012). The Matrix-End Matrix procedure was used to read the variance components according to the specifications of the design, and G-theory results were obtained. Results of this G-study are presented in Table 11, which lists the sources of variation, the variance components, and the proportions of total variance explained by each facet; Figure 1 presents the proportions of variance explained by each of these sources in a circle graph. The overall, relative G-coefficient, which describes the universal reliability of the measure, was .936. This coefficient indicates that approximately 94% of the variance was accounted for with approximately 6% of the variance representing error. The residual term accounted for the greatest portion of variance (i.e., 51%).
Students (i.e., the object of measurement, accounting for nesting within observer) accounted for 23% of the variance. The interactions of Occasion-by-Observer, Occasion-by-Student, Skill-by-Student, and Skill-by-Observer accounted for smaller proportions of 6%, 6%, 7%, and 5%, respectively.
Next, D-studies were conducted in order to estimate how varying the number of levels of each facet might affect reliability. These D-study results are presented in Table 12. Figure 2 graphically illustrates the relative G-coefficients for the level-1 Observer D-study. Closer examination is given to the level-1 Observer D-study results because they most closely resemble scenarios applicable to a practicing school psychologist.
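The logic of a D-study can be sketched in Python for a simplified case. The sketch assumes a fully crossed student-by-skill-by-occasion design with a single observer, which is not the nested design analyzed in this study, and the variance components are hypothetical values chosen only to show the calculation. Under that assumption, the relative error variance is the sum of each student interaction component divided by the number of levels of the facet involved, and the relative G-coefficient is student variance divided by student variance plus relative error variance.

```python
def relative_g_coefficient(var_student, var_student_skill, var_student_occasion,
                           var_residual, n_skills, n_occasions):
    """Relative G-coefficient for a crossed student x skill x occasion design."""
    relative_error = (
        var_student_skill / n_skills
        + var_student_occasion / n_occasions
        + var_residual / (n_skills * n_occasions)
    )
    return var_student / (var_student + relative_error)


# Hypothetical variance components (not the values in Table 11).
components = dict(
    var_student=0.25,
    var_student_skill=0.07,
    var_student_occasion=0.06,
    var_residual=0.50,
)

# Projected reliability for alternative numbers of skills and occasions.
for n_skills in (1, 4, 6):
    for n_occasions in (1, 4, 6):
        g = relative_g_coefficient(
            n_skills=n_skills, n_occasions=n_occasions, **components
        )
        print(f"skills={n_skills} occasions={n_occasions} G={g:.3f}")
```

Because every error term is divided by the number of skills, occasions, or both, projected reliability rises as either facet gains levels, with diminishing returns at higher counts.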

Interobserver Agreement
Interobserver agreement was calculated for observer pairs consisting of the primary researcher and each undergraduate assistant. A secondary observer was randomly assigned to 19 out of 20 student participants. As stated previously, the fourth occasion was used for the interobserver recordings as the secondary observer simultaneously observed each skill with the student's primary observer.
In other words, 19 students were observed on each of the skills by two observers on the fourth occasion of observation. Interobserver agreement could not be calculated for one student, who was chronically absent at the end of the data collection period and for whom a fourth occasion/interobserver observation was not obtained. Interobserver agreement was calculated using the SPSS Crosstabs function, which produces a Kappa statistic for level of agreement. According to Cohen (1960), Kappa values lie between -1.00 and 1.00, with 0 indicating chance agreement, positive values indicating greater than chance agreement, and negative values indicating less than chance agreement. Landis and Koch (1977) categorized Kappa values from 0.41 to 0.60 as moderate and values above .60 as substantial levels of agreement. Table 13 displays the level of agreement for each skill pair. Primary and secondary observers displayed agreement around chance levels.
Results previously described in the G study indicated that rating style of the observers had minimal influence on scores, but that the observer interaction with other facets accounted for approximately 11% of the measurement variance when combined.
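Cohen's Kappa compares the observed proportion of agreement with the proportion expected by chance from each observer's category frequencies. The Python sketch below is a generic illustration of that calculation, not a reproduction of the SPSS Crosstabs output; the function name and the interval records are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers' categorical ratings of the same events."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: product of the two observers' marginal proportions,
    # summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical interval records (1 = behavior observed, 0 = not observed).
observer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
observer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))  # 0.6
```

In this example the observers agree on 8 of 10 intervals, chance agreement is .50, and Kappa is therefore (.80 - .50) / (1 - .50) = .60.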

Descriptive Analyses
A series of five multivariate analysis of variance (MANOVA) tests was conducted in order to assess any score differences in the behavioral observation data based on demographic categories. Dependent variables included scores on each of the six social skills that were observed. Grade included seven groups from grades 6 through 12, Gender included two groups (male and female), Ethnicity included three groups (White, African American, and Hispanic), SES included two groups (low and medium to above, as previously described), and Diagnosis included five groups (Mood Disorder, Autism Spectrum Disorder, Attention Deficit Hyperactivity Disorder, Anxiety Disorder, and Post-Traumatic Stress Disorder). As each student met criteria for more than one diagnostic category, the primary diagnosis was used as the descriptive factor. Table 14 displays the results of MANOVA analyses. There were no significant MANOVAs for the demographic factors, indicating that participants did not differ significantly in their performance of social skills based on the given descriptive factors. It should be noted that groups within each demographic factor were not equally represented, with some groups often much larger than others. For example, the sample was largely male (n = 15). It is possible that differences in skill performance may have been seen if the number of participants had been larger and the distribution across groups within factors more similar.

Chapter IV: Discussion
Students' social functioning is important to both the social and academic success of children and youth (Cappadocia & Weiss, 2011; Cook et al., 2008; Gresham et al., 2004). Indeed, outcomes for students with poor social skills often include behavioral, emotional, and academic difficulties (Cook et al., 2008). Often, the only access that children and adolescents have to mental-health services, including social-skills instruction, is in a school-based setting (Hoagwood & Johnson, 2003). Unfortunately, social-skills interventions in schools are often ineffective (Cappadocia & Weiss, 2011; Gresham et al., 2001), which may be related in some ways to measurement and assessment issues (Matson & Wilkins, 2009). There is no single, agreed-upon definition of social skills, and measurement strategies are numerous and varied (Crowe et al., 2011; Matson & Wilkins, 2009). The present study evaluated a set of skills with the purpose of identifying a homogenous skillset, which could be used for social-skills intervention, and a measurement strategy that could be used for progress monitoring of skills and decision-making regarding social-skills instruction.
This study used a multitrait-multimethod approach to assess a social-skills construct, or identify a homogenous group of skills, and generalizability theory to evaluate a behavioral-observation tool. The multitrait-multimethod approach used academic skills as a discriminant trait for comparison with social skills. Behavioral observation, teacher ratings, and student ratings were used for methods of obtaining participant data on both traits. Convergent and discriminant validity were assessed through correlations among all traits and methods.
G theory was chosen for use in this study because of the benefits it has over the traditional approach of CTT. The present study used G theory to evaluate multiple sources of variance, or facets (e.g., observer, occasion, skill), and obtain an overall value of reliability that accounts for those sources of variance. G theory was also used to predict the reliability of a behavioral observation tool given alternative levels of each facet, different from those used in the original study.

Psychometric Findings
The current study used a multitrait-multimethod approach to assess the validity of a social skills construct, using social skills and academic skills as traits and behavioral observation and teacher and student rating scales as methods.
Additionally, this study used G theory to examine the reliability of an observational tool to observe student performance of social skills. The measurement strategy included student nested within observer as the object of measurement and skill, occasion, and observer as facets. The present study used a nested design, meaning that each facet does not occur at each level with every other facet. Some facets may occur only at some levels and not at others. For example, one might have a study where some skills are observed on particular occasions and others are not. In the current study, students were nested within observers; observers were randomly assigned only to particular students and did not observe every student. A completely crossed design would require each observer to observe every student on all skills for each occasion. A discussion of each study and the attributed follow-up analyses follows.
Pearson product-moment correlations were calculated for each trait-method combination to assess convergent and discriminant validity of the social-skills construct. Results yielded six trait-method combinations, through which values were obtained for the reliability of social and academic skills, convergent validity, and discriminant validity of social skills. Median values were calculated in order to provide a summary value for each trait-method matrix. Cronbach's α was also calculated for social and academic skills reliability matrices to assess the consistency with which skills were rated.
Social-skills domain. The obtained median values were .681 for teacher ratings, .532 for behavioral observations, and .476 for student ratings. These values were somewhat unexpected, as behavioral observation was anticipated to have the lowest reliability. Although median values are moderate, they are lower than expected. Rater inconsistency cannot be used to explain these results, as the consistency of ratings was found to be above .85 for all methods, indicating that raters were consistent across methods. It is likely that reliability for teacher and student ratings would be higher given a larger number of items to assess each skill.
Social skills consisted of 3 to 4 items on each rating scale, which may not be sufficient to obtain a high degree of reliability for each skill. Additionally, the cohesiveness of skills may also have had an impact on reliability. As anticipated, some skills appeared to be more highly correlated than others. Thus, overall, as shown in Table 15, the square root of homotrait-homomethod correlations (i.e., "a"s) for the social skills domains generally exceeded convergent correlations (i.e., "b"s), which, in turn, exceeded discriminant homomethod correlations (i.e., "c"s), which, in turn, exceeded discriminant heteromethod correlations (i.e., "d"s). Of course, this is the pattern that would be expected according to Campbell and Fiske's paradigm to establish construct validity.
An important exception is that the correlation between teacher ratings of social and academic skills was unusually high (i.e., .542 versus -.093 for behavioral observation and .190 for student methods of assessment). It may be that teachers in this specialized school setting link social and academic skills more closely than would be seen in a typical school setting. Conversation and Nonverbal Communication were assessed as separate skills (3-4 items each) rather than as the composite skill of Communication (6-8 items). Reliability estimates are likely to be lower due to the smaller number of items representing each skill.

Generalizability Study
As previously discussed, the G study provides a coefficient that describes the global reliability of a measure, variance component values and percentages of variance accounted for by each facet, and residual error. The D study provides variance component values and associated G-coefficients for alternative measurement strategies given varying levels of each facet. A discussion of the G study and D study results from the current study follows.
Relative G-coefficients were reported as a measure of overall reliability for the measure, as well as for the D study, because they are most applicable for the applied use of the behavioral measure of interest. Generalizability theory can be used for the purposes of making relative decisions and absolute decisions. Both G and D studies provide relative and absolute coefficients applicable for each decision-making purpose, respectively. Relative decisions are those concerning an individual's performance compared to others, whereas absolute decisions are those concerning an individual's performance compared to a specific criterion regardless of others' performance. The behavioral observation tool used in this study could be used to make both types of decisions. For example, relative decisions would be useful in identifying students with similar skill deficits for the purpose of placing them in similar instructional groupings. During a social-skills intervention, absolute decisions would be useful to influence individualized instruction depending on whether or not a student has met criteria for mastering specific goals. The relative G-coefficient is analogous to the reliability coefficient in CTT and is a more accurate indicator of reliability than the absolute Phi-coefficient of dependability (Shavelson & Webb, p. 93). Thus, relative G-coefficients were reported for the purpose of relative interpretations in this study.
The overall, relative G-coefficient (used for decisions based on the relative standing of comparison to others) of the measure was .936, an excellent level of reliability for a behavioral measure. The residual error term accounted for the greatest portion of variance (i.e., 51%), and includes all 3-way interactions between facets that cannot be statistically partialled out. Students nested within observer accounted for the next largest portion (i.e., 23%). The interaction effects accounted for small portions of variance separately, but approximately 24% when combined.
Skill and Observer only accounted for approximately 1% of the variance each, so they are not discussed further. The interaction effect of skill and observer would be of interest in identifying the effect of individual student performance for particular skills. A low percentage of variance accounted for, as is seen here, indicates that the measure may assess effects for the interaction of student and individual skill with low reliability.
Utilizing G theory provided a better assessment of measurement reliability than would be available if using CTT, as the percent of variance accounted for by each facet would have been otherwise attributed to random error. Ideally, the student facet would contribute the largest portion of variance, but here, it accounts for approximately a quarter of the variance. The student facet is of interest because it pinpoints the amount of the effect that can be attributed to a particular student and occasions, as well as 10 skills and occasions yielded reliability coefficients above .90. The reliability increased greatly between 1 and 6 occasions, but appeared to taper off quickly after the combination of 4 skills and 6 occasions. Optimal reliability was found with a combination of 10 skills and 10 occasions (i.e., .928).
However, it may not be worth the cost in time and resources, when an adequate level of reliability could be obtained using fewer occasions and skills.
Interobserver agreement. Cohen's Kappa was used to calculate interobserver agreement. As stated previously, Kappa values range between -1.00 and 1.00 with 0 indicating chance agreement. Positive values indicate greater than chance agreement and negative values indicate less than chance agreement. Kappa values were found to be K = .157 for the first observer pair and K = .032 for the second.
Agreement for both pairs, or among the three observers, was close to chance levels.
These findings may have resulted from a combination of the way that data were represented and the way that Kappa is calculated. Generally speaking, Kappa uses the frequency of agreement between observers in designated categories (e.g., number of yes/no observations per rater). This type of calculation may underrepresent the level of agreement between observers, as the range of scores in the present study is much larger than would be accounted for by a yes/no type of response. For example, every participant was rated from 0-100% on each skill, so they could obtain values anywhere from 0-100. If observer 1 rated student A as completing a skill with 80% accuracy and observer 2 rated the same student as completing the same skill with 85% accuracy, Kappa calculations may not account for this as a "match" between observers even though both values fall within a similar range of scores. Given this consideration, and the G study results that found the observer facet to be of little influence on variance, the Kappa values are viewed as an underrepresentation of interobserver agreement.
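The point about near-matches can be illustrated with a short Python sketch that contrasts exact agreement with agreement within a tolerance on 0-100 scores. The scores, the function names, and the 10-point tolerance are hypothetical choices made for illustration; the study did not analyze its data this way.

```python
def exact_agreement(scores_a, scores_b):
    """Proportion of paired scores that are identical."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def tolerance_agreement(scores_a, scores_b, tolerance):
    """Proportion of paired scores that differ by no more than `tolerance` points."""
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical percentage scores given by two observers to the same five students.
observer_1 = [80, 45, 62, 90, 30]
observer_2 = [85, 50, 60, 88, 35]

print(exact_agreement(observer_1, observer_2))          # 0.0
print(tolerance_agreement(observer_1, observer_2, 10))  # 1.0
```

An agreement index that counts only identical categories treats every one of these pairs as a disagreement, even though no pair differs by more than 5 points.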

Cultural Considerations
Information on multiple demographic factors was collected for each participant, including age, grade, gender, SES, ethnicity, and primary mental-health diagnosis. Individual analyses were conducted for each demographic factor to determine if scores may have varied among factors. No significant effects were found for any demographic factor. However, sample size was small and the size of groups within factors was often unbalanced. Given the large effect sizes found for grade (η² = .467), SES (η² = .404), and diagnosis (η² = .395), it is possible that a larger sample size and a more balanced distribution of participants across within-factor groups could reveal differences. Additionally, it should be noted that the majority of participants exhibited significant social-emotional difficulties as a result of mental-health disabilities. Further examination of potential differences in skill performance based on diagnosis would be beneficial to the field in application to the building of more effective social-skills interventions.

Implications
This study revealed a cohesive set of skills that may be identified as "social skills" and reasonably measured together as a homogenous group, or skillset.
Conditions varied across methods, and some skills were more closely correlated than others, but results largely supported the assessed skills as a cohesive and distinct skillset. As some skills were more closely linked than others (i.e., Classroom Participation and Follow Expectations were more highly correlated than Similar studies could be conducted on other behavioral observation measures in order to assess their adequacy for the same purpose. The behavioral observation tool, and similar tools once validated, could be used in multiple stages of the intervention process. First, it could be used as a screening measure to identify students with similar skill deficits (relative comparison). Second, after students with similar skill deficits have been grouped together for instruction, it could be used as an observational tool to collect progress-monitoring data on student performance of social skills. Third, information collected during progress monitoring could be used to influence the differentiation of instruction for individuals (absolute comparison), such as by gauging the completion of criteria or benchmarks that indicate the mastery of particular skills.
These uses link directly to the implementation of an RTI format for social skills. As discussed previously, the RTI process is frequently used in schools for academic instruction, but is rarely used for social-emotional instruction. This research adds to this area of study by establishing a reliable and feasible measurement strategy to assist in all stages of the RTI process for social-emotional education. Salvia and Ysseldyke (2004) recommended reliability coefficients of .90 or higher for instructional decision-making purposes and coefficients of .70 or higher for screening purposes. The obtained reliability for the measure used in this study was above .90; the measure could, therefore, be used for both screening and instructional decision-making purposes.
As other behavioral measurement tools are developed for a similar purpose (Cummings et al., 2008; Stichter et al., 2012), consideration should be given to the implications of using behavioral observation as a primary data collection strategy. Hintze and Matthews (2004) suggested momentary time sampling to be a more favorable observation strategy than partial or whole-interval methods as it results in smaller estimation errors. The authors also cautioned against the overuse of behavioral observation, as it is less reliable than commonly believed. As shown in the present study, a number of observations may need to be conducted on each skill in order to obtain an adequate level of reliability. School psychologists may find the amount of time required to observe skills reliably to be unmanageable in addition to other job responsibilities. Baer, Harrison, Fradenburg, Petersen, and Milla (2005)

Limitations
First, although the multitrait-multimethod approach used in this study provided useful information for the establishment of a social skills construct, it provided so much information that findings needed to be condensed for ease of interpretation. When information is condensed in such a way, some robustness of the overall picture may be lost.
Second, the present study used G theory to assess the usefulness of a measurement strategy because it is less restrictive than CTT and considers multiple facets of the measurement design. Although multiple facets and the 2-way interactions between them were assessed, there are still variables left unaccounted for.
Variables such as setting, time of day, and activity were not controlled for or evaluated in the present design and may have played some role in the outcome of student performance on specific skills, as suggested by the G-study interaction effects. Currently, the procedure for assessing more than three facets in a generalizability design is unavailable, and holding environmental variables constant, or controlling the facets used in this study another way, may be the only options for assessing their potential impact on skill performance. Holding environmental variables constant may also reduce the amount of variance attributable to random error.
Third, a limitation in the observational method, primarily the interval length used in the observations, should be noted. Because beginner observers were used, a longer interval (i.e., 30 seconds) was selected to reduce the effort needed to track interval length, in hopes of obtaining accurate scores for the appropriate interval. More experienced observers would be able to use 10- or 15-second intervals while keeping track of time and student performance, which can provide a closer approximation to the percentage of time spent engaging in the specified skill. Additionally, 15-minute observations were conducted in order to maximize the number of students who could be observed during the limited time frame for data collection; this may not be an adequate amount of time to obtain a representative sampling of student behavior on some skills.
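The trade-off between interval length and measurement resolution can be made concrete with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each interval is scored as a single yes/no record, which may not match the scoring rules of the tool used in this study. It shows how many intervals fit in a 15-minute observation at each interval length and how the percentage of intervals would be computed.

```python
def intervals_per_session(session_minutes, interval_seconds):
    """Number of whole intervals that fit in one observation session."""
    return (session_minutes * 60) // interval_seconds


def percent_of_intervals(interval_records):
    """Percentage of intervals in which the skill was recorded.

    `interval_records` is a list of True/False values, one per interval
    (an assumed yes/no scoring scheme).
    """
    return 100.0 * sum(interval_records) / len(interval_records)


# A 15-minute observation at the interval lengths discussed in the text.
for seconds in (30, 15, 10):
    count = intervals_per_session(15, seconds)
    # Each interval contributes 100 / count percentage points to the score.
    print(f"{seconds}-second intervals: {count} per session, "
          f"{100 / count:.2f} percentage points each")
```

Under these assumptions, a 30-second interval yields 30 records per 15-minute session, whereas 15- and 10-second intervals yield 60 and 90 records, respectively, so shorter intervals estimate the percentage of time in finer steps.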
Finally, due to the nature of the school setting where the study took place, sample size was limited. Although the number of data points collected for each participant made the analyses of interest for the present study appropriate, skill performance differences based on multiple cultural factors could not adequately be assessed as a result of the small sample size. Additionally, the sample was unbalanced, with within-factor groups often having only a few members. The sample also represented a clinically based, low SES, primarily male student population, which may somewhat limit the generalizability of research findings to typical student populations. It is possible that the participant population is qualitatively distinct from the student populations of other school settings.

Future Directions
The present study provided a number of results on which future research and practice can build. A multitrait-multimethod approach was used to identify a cohesive set of social skills, which could be targeted for social-skills intervention.
Although this approach provided a variety of useful results, it provided more information than could reasonably be interpreted, and the findings were condensed for ease of interpretation. Attempts to identify other cohesive skillsets could utilize alternative approaches, such as structural equation modeling, which may better organize findings and provide a structure to the results to ease interpretation.
By using G theory, a reliable measurement strategy was developed that could be used in schools. However, limitations exist for the analysis of more than three facets in their contribution to measurement variance. Future research could be conducted to account for the impact that environmental factors may have on skill performance. One method may be to conduct a G-study using a single observer and skill (e.g., control for observer and skill), and use environmental factors as facets.
Researchers may also consider examining the potential impact of teacher experience on student behavior. For example, one could examine whether more experienced teachers report fewer behavior problems in their classroom, as they may be more comfortable with students and have more effective behavioral strategies.
It is hoped that school practitioners may utilize the findings of this study to inform their practice by identifying cohesive groups of skills to target for intervention and using behavioral observation measures in a reliable way as part of a three-tiered instructional model. An RTI-based approach to social-skills instruction might include screening students in order to place students with similar skill deficits into groups for instruction, progress monitoring of skills using semiweekly behavioral observations of students in multiple environments, and differentiating instruction based on observed student progress. In other words, future research could develop a method, or "best practice" for implementing a behavioral-observation tool in such a manner. Additionally, future research could be used in an experimental way to determine if the use of behavioral observation measures in a three-tiered format has an impact on intervention effectiveness.
Future research should also investigate the use of behavioral-observation measures, such as the one used in this study, in multiple school settings with differing student populations in order to assess their appropriateness for various populations.

Summary and Conclusions
Social skills play an important role in student success in schools, both socially and academically (Cappadocia & Weiss, 2011; Cook et al., 2008; Gresham et al., 2004). Historically, social-skills interventions have lacked effectiveness in the way of generalization of skills to settings outside of the instructional environment (Gresham, 2010). Ineffectiveness of interventions often may be related to assessment and measurement issues (Matson & Wilkins, 2009). In schools, attention has started to be given to the development of progress-monitoring tools for social-emotional interventions (Cummings et al., 2008; Stichter et al., 2012); these tools have attempted to utilize behavioral observation in a reliable manner.
This study used multitrait-multimethod and generalizability approaches to develop such a measurement strategy.
The present study demonstrated an approach to identifying skillsets for targeted intervention by evaluating the convergent and divergent validity for social and academic skills. A structured approach to identifying cohesive target skills strengthens the underpinnings of social-skills interventions by providing an evidence-based approach rather than relying on a heterogenous grouping that may interfere with measurement reliability.
The current study also demonstrated the usefulness of G theory for developing a multifaceted measurement strategy for behavioral observation. G theory expands the CTT perspective by including multiple facets to account for aspects of variance in addition to random error. In addition, G theory can be used to assess how different levels of each facet might affect the measure's reliability in alternative measurement scenarios. In this study, the skills of interest were Conversation, Nonverbal Communication, Classroom Participation, Follow Directions, Group Participation, and Interaction. The measurement design used students nested within observers as the object of measurement and skill, occasion, and observer as facets.
Results indicated that a cohesive set of skills had been identified, but that the findings may have been stronger if more items had been used on rating scales to assess each skill (providing a more reliable estimate of each skill). G-study results indicated a good level of reliability for the measure, and that approximately half of the error could be attributed to students and the 2-way facet interactions combined.
D-study results indicated that an adequate level of reliability could be obtained using multiple observers and a moderate number of occasions, but that the number of occasions would need to be increased for a single observer to obtain adequate reliability.
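The logic of a D-study can be sketched with a short calculation. The example below is a simplified illustration, not the design used in this study: it assumes a fully crossed, random person × occasion × observer design (rather than students nested within observers, with skill as an additional facet), and every variance component shown is a hypothetical value chosen for demonstration. The generalizability coefficient for relative decisions is the person variance divided by the sum of the person variance and the relative error variance, where each interaction component is divided by the number of conditions sampled.

```python
def g_coefficient(var_p, var_po, var_pr, var_por_e, n_occasions, n_observers):
    """Generalizability coefficient for relative decisions.

    Assumes a fully crossed person x occasion x observer random design.
    var_p      : person (student) variance
    var_po     : person x occasion interaction variance
    var_pr     : person x observer interaction variance
    var_por_e  : three-way interaction confounded with residual error
    """
    relative_error = (
        var_po / n_occasions
        + var_pr / n_observers
        + var_por_e / (n_occasions * n_observers)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components (NOT estimates from this study).
HYPOTHETICAL_COMPONENTS = dict(var_p=0.40, var_po=0.20, var_pr=0.05, var_por_e=0.35)

for n_observers in (1, 2):
    for n_occasions in (1, 3, 5, 10):
        coefficient = g_coefficient(
            n_occasions=n_occasions,
            n_observers=n_observers,
            **HYPOTHETICAL_COMPONENTS,
        )
        print(f"observers={n_observers} occasions={n_occasions:2d} "
              f"coefficient={coefficient:.2f}")
```

With any set of positive interaction components, the coefficient rises as occasions or observers are added, which is consistent with the pattern described above: a single observer requires more occasions to reach the same level of reliability as multiple observers.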
This study established a method for assessing a cohesive skillset for intervention and established a reliable measurement strategy that lends itself to multiple decision-making purposes in the intervention process. It is hoped that social skills instructors could utilize these methods to create better-planned interventions and use progress-monitoring practices. By creating groups with homogenous skillsets and using progress monitoring to inform the decision-making process regarding student social skills performance, both the effectiveness of social-skills interventions and the social-emotional functioning of students may be improved. Future research should seek to assess the reliability of behavioral measures, similar to the one used in this study, with different school settings and students from multiple backgrounds. Future research should also assess the usefulness of behavioral measures to inform the decision-making process and what impact their use may have on the effectiveness of interventions.

PARENTAL PERMISSION FORM FOR RESEARCH
Your child has been invited to take part in a research project described below. My name is Monica Mabe and I am a graduate student at the University of Rhode Island (URI) and will be conducting this research project with Professor W. Grant Willis, a faculty member at URI. I am asking for permission to include your child in this study because he/she is a student in one of the classrooms selected to participate in this study. This research project will begin in November and be completed in the school by February. This study has been approved by the Executive Board of South Coast Educational Collaborative and administrators of the school.

Description of the project:
Until recently, there have not been many tools available for measuring student behaviors and tracking behavioral progress in areas such as social skills. The purpose of this project is to see if one of the recently developed behavioral measures can be used for observing how adolescents engage in various social skills such as working in groups or following classroom expectations. In other words, the goal is to assess a measurement strategy that can accurately measure and monitor the progress of student social skills.
What will be done:
If you allow your child to participate, here is what will happen: A student from the University of Rhode Island (URI) will be assigned to your child's classroom and observe them during their regular scheduled day. Your child will not be asked to leave the classroom or speak to the URI observer alone. The URI student is only interested in observing different social skills used by your child in the classroom and how they happen during a regular day. The URI students will be observing multiple students in the classroom, so your child will not be identified or singled-out as being observed. Your child will also be asked to complete a form that asks him/her to rate his/her performance of various social skills. Students will be able to complete this form with their counselor or the graduate student researcher.

Risks or discomfort:
There are no risks or discomfort involved for your child in this project. It will be explained to them that there will sometimes be a person from URI observing the classroom so that they are comfortable and know who will be visiting their classroom.

Benefits of this study:
Although there may be no direct benefit to your child for participating in this project, the school will benefit greatly from the information that will be collected. The information from this project will help personnel at the school improve their data collection procedures so that they are more accurate and meaningful. Results of this study will be made available for viewing online through the ProQuest library search engine. A hard copy of results can also be viewed at the University of Rhode Island library after the completion of the study.

Confidentiality:
Your child's part in this study is confidential. All information will be stored electronically in the online system connected to the behavioral measure, which requires an account with a secure login and password that is only issued to a few individuals at the school. Only individuals directly involved in the study will have access to the secure information. After all of the information is collected, an identification number will be used in place of student names; all names will be deleted and there will be no way of tracking any collected information back to an individual student.

Decision to quit at any time:
Students will be given the opportunity to decide whether or not to participate in this project. Their decision to participate will not affect your or their relationship with South Coast Educational Collaborative. Your child will have the right to stop participating at any time. You have the right to withdraw your permission for your child to participate at any time.

Rights and Complaints:
If you are unhappy with the way this study is happening in your child's classroom, you may talk about your complaints with Professor W. Grant Willis (401)

Please sign both consent forms, keeping one for yourself
The University of Rhode Island Department of Psychology

STUDENT ASSENT FORM
My name is Monica Mabe. I am a graduate student at the University of Rhode Island (URI). I am inviting you to participate in a research study because I am trying to learn more about social skills and students your age. I will explain about the study, but you can ask questions by contacting me later if you want to know more.

Description of the Project:
Recently, a lot of behavior observational tools have become available for use on phones and tablets. Part of this project is to see if one of these observation tools can be used for observing social skills. Social skills are things that people do when they interact with others, like having conversations, working in groups in class, or inviting someone to join a game. The other part of this project is to see if social skills can be measured equally well with paper forms called rating scales and through observation.

What will be done:
If you agree to be in this study, you will be asked to complete two forms about social skills. The forms take about 10 minutes to complete and ask you to rate how you do different things like getting along with others or asking for help when you need it. You will be able to complete this form with a staff member at school in case you have any questions about the items. Sometimes there will be a person from URI in your classroom. They will be looking to see how you and other students act in normal classroom situations. You will not know if they are there to observe you or if they are there to observe other classmates.

Risks or discomfort:
There are no risks or discomfort involved in this project. There will sometimes be a person from URI observing the classroom. Many students in each class will be participating and none of your classmates will know who else in the classroom has agreed to participate in the project.

Benefits of this study:
Although there may be no direct benefit to you for taking part in this study, we may learn more about measuring social skills and students your age. You will have the opportunity to express your opinion about your social skills and learn more about yourself at the same time. Results of this study will be made available for viewing online through the ProQuest library search engine. A hard copy of results can also be viewed at the University of Rhode Island library after the completion of the study.

Confidentiality:
No one else will know if you were in this study and no one else can find out what answers you gave on the social skills form. All of the information from this project will be stored in a locked office on the URI campus.

Decision to quit or not participate at any time:
I will also ask your parent/guardian to give their permission for you to participate in this project, but even if your parent/guardian says "yes", you can still decide not to do this. If you do decide to participate, you can always drop out of the study at any time. No one will be upset if you don't want to participate or even if you change your mind later and want to stop. If you want to quit the study, just let me know or ask one of your parents/guardians to call me.

Description of the project:
Recently, many observational tools have become available for use on phones or tablets. Last year, a study was conducted with one of these tools at the K-2 grade level for observing social skills. This year, I would like your help to conduct a study at the middle and high school level. One purpose of this project is to see if a behavioral observation tool can be used for observing how students engage in various social skills such as working in groups or following classroom expectations. The other purpose is to evaluate how social skills are measured through observation and through paper forms, called rating scales. This research project will begin in November and be completed in the school by February.

What will be done:
If you agree to participate, here is what will happen: A student from the University of Rhode Island (URI) will be assigned to your classroom and will observe students during their regular scheduled day. You will not be asked to change your classroom routine in any way. You will be asked to provide a daily schedule to the researcher so that the URI observer will know the best time to observe specific skills. You will also be asked to complete two social skills rating scales for participating students; the forms take 5-10 minutes to complete. Students will be asked to complete these forms as well with assistance from me or their clinician.

Risks or discomfort:
There are no risks or discomfort involved for you in this project. URI students will initially need assistance to identify specific students but should be of no further distraction to you or the class afterward.

Benefits of this study:
Although there may be no direct benefit to you for participating in this project, the school will benefit from the information that will be collected. The information from this project will help personnel to improve data collection practices in the future when assessing student behaviors. Results of this study will be made available for viewing online through the ProQuest library search engine. A hard copy of results can also be viewed at the University of Rhode Island library after the completion of the study.

Confidentiality:
Your part in this study is confidential. The study is concerned with the students' skills and the measurement of those skills; the information you provide will not be linked back to you or used in any other capacity. Following data collection, all names will be removed and replaced with identification numbers so that identification cannot be traced back to a specific person.

Decision to quit at any time:
Your decision to participate will not affect your relationship with South Coast Educational Collaborative (SCEC). You also have the right to stop participating at any time.

Rights and Complaints:
If you are unhappy with the way this study is being conducted in your classroom, you may talk about your complaints with Professor W. Grant Willis (401)