Evaluating the Implementation Quality and Utility of Response to Intervention Practices

The current study investigated the extent to which eight elementary schools in one cooperative education district implemented the five essential components of Response to Intervention (RtI), and explored the relationship between fidelity of implementation and student reading outcomes measured by oral reading fluency (ORF). Various RtI models exist because there is no single method to implement RtI appropriately. The majority of available studies examining fidelity of RtI implementation have focused on the individual components of RtI. However, when implemented as intended, RtI is a coherent system of coordinated components. Consequently, it is important to study the implementation of RtI as a whole model, in addition to the individual components, and its relationship with student outcomes. Descriptive statistics were used to analyze the extent of implementation and to characterize differences between the schools, and analysis of variance (ANOVA) was used to examine differences in student outcomes between elementary schools. Finally, the researcher made qualitative inferences to explore the relationship between fidelity of implementation and student reading outcomes measured by ORF. Results from the current study revealed that fidelity of RtI implementation varied between elementary schools despite similar professional development and supports around RtI practices. In addition, results from the current study have preliminary implications supporting the value of the infrastructure and supports component and the fidelity and evaluation component within the RtI model.


mechanisms, and e) fidelity of implementation and evaluation. The interrelatedness of these components forms the larger RtI framework for making valid decisions about student learning (Kovaleski, 2007). In other words, a complete (and likely to be effective) RtI framework can be considered to be in place only when evidence exists to support the implementation of each of the five components.
The current study examined the extent to which RtI as a whole, and each essential component, was implemented in eight elementary schools and, additionally, explored the relationship between the quality of implementation of RtI and school level outcomes.
The current study examined the following primary research questions:
1. To what extent are staff in elementary schools implementing the essential components of RtI with integrity, as measured by the RtI Essential Components Worksheet; and if differences exist between schools, how can those differences be characterized?
2. To what extent does the integrity of overall RtI implementation, and of the implementation of the five essential components, in elementary schools relate to differences in student-level outcomes at each grade across schools, as measured by AIMSweb screening data?

Implementation Science
One of the most critical issues in educational research is a gap between what is known about effective practices (i.e. evidence-based practices) and how the practices are implemented in schools. While there is a growing supply of evidence-based practices in education, there is much less research evidence regarding how to support the successful implementation of these practices in school settings (Durlak & DuPre, 2008). With the increasing demand by federal regulations for the use of evidence-based practices, it is important to understand and reduce the gap between what is known and what is occurring. When a new practice or program is studied in the natural, desired setting, its implementation can be planned and studied using methods from the area of research and scholarship known as 'implementation science'. The resulting information can yield important conclusions regarding the utility of the practice within the context and culture of individual schools and the levels of adaptation that schools may need to maintain the new practice (Schulte, Easton, & Parker, 2009).
Implementation science research has demonstrated that when evidence-based practices are implemented in schools, varying levels of implementation tend to emerge (Odom, 2009). For example, Odom et al. (2010) conducted a national study examining the implementation of a curriculum across multiple sites at seven times during the school year and found significant differences in the quality of implementation between sites. More specifically, dramatically low implementation levels were found in schools with high percentages of minority students. This study demonstrates that school characteristics may influence how new practices are implemented in schools.
According to the American Psychological Association (2005), adaptation is a natural part of implementing evidence-based practices when an individual with expertise in the area manages the modifications. Further, recent meta-analyses indicate that full implementation may not be necessary to obtain positive outcomes, but instead core components may be identified as the pathway to optimal implementation (Durlak & DuPre, 2008). Individuals who implement evidence-based practices should be knowledgeable about how to implement the practice and how to implement it with high quality. Notably, Greenwood (2009) proposes that interventions are the result of the highly skilled behaviors of the individuals who implement them as they follow a set of manualized procedures. Examining the fidelity of program implementation is one method for evaluation in implementation science research.
Treatment Integrity. When implementing a new program it is important to understand both the extent to which it is being implemented as intended, and to understand implementation in a manner that links the program to student outcomes.
That is to say, outcomes cannot confidently be attributed to a new program unless data are collected documenting how the program was implemented (Sanetti & Kratochwill, 2010). The terms treatment integrity, treatment fidelity, and fidelity of implementation are often used interchangeably in the field of education. For the purposes of the current study, the term fidelity of implementation will be used and defined as "the extent to which essential intervention or program components are delivered in a comprehensive and consistent manner by an interventionist trained to deliver the intervention" (Hagermoser Sanetti & Kratochwill, 2009, p. 448). Adherence and quality of implementation are the two dimensions of fidelity of implementation commonly used when evaluating a program, or framework, such as RtI. Adherence refers to identifying whether something occurred or did not occur, and quality refers to the skill with which it was performed (Noell & Gansle, 2006; Schulte, Easton, & Parker, 2009). For example, if all students were assessed using a screening assessment, then on an adherence measure the school would be considered to have implemented screening; on a quality measure, however, the extent to which standardized procedures were followed and the number of times per year students are assessed would determine how well the school is implementing screening. Evaluating the quality of implementation permits a more detailed analysis aimed at understanding relationships between different aspects or levels of implementation and student outcomes, and it broadens the understanding of what elements are necessary to achieve improved student outcomes. Additionally, understanding the context (i.e. child engagement, classroom context, teacher involvement) in which implementation occurs is believed to improve the validity of decisions regarding program effectiveness (Greenwood & Kim, 2012).

Response to Intervention/Instruction
The primary issues noted regarding implementation science and the importance of fidelity of implementation for valid data-based decision making are relevant to the contemporary area of educational practices known as Response-to-Intervention (RtI).
Since IDEIA (2004) language permitted the use of RtI to be considered when making decisions about special education eligibility, RtI has been adopted at an increasing rate in schools across the country. RtI is a comprehensive multi-tiered framework intended to improve outcomes for all students through prevention and early intervention efforts.
RtI is composed of three, or more, tiers of service delivery, which are defined by the intensity, duration, and degree of individualization of service provided (i.e. method of service delivery) (Jimerson, Burns, & VanDerHeyden, 2015). Educators collaborate and engage in data based decision making to appropriately allocate instructional resources and other support resources to ensure individual student needs are met (Burns & Gibbons, 2008). Brown-Chidsey (2007) argues that RtI meets student needs in a more time-efficient manner than prior models and that fewer children will require special education services when RtI is implemented with fidelity. In fact, one study found a reduction from 4.5% to 2.5% in special education placement rates over a 10-year period of RtI implementation (Bollman, Silberglitt, & Gibbons, 2007). For a detailed description of RtI, the reader is referred to Burns and Gibbons (2008).
According to the National Research Center on Learning Disabilities (NRCLD), an RtI model must be formally in place school-wide before it is applied to high-stakes decisions, such as special education eligibility. Further, Shapiro and Clemens (2009) suggested that successfully implementing a complex model, such as RtI, may take between three and five years, indicating that implementation of an RtI service delivery model cannot be completed quickly. Rather, implementation is a thoughtful, long-term process in which schools engage across multiple school years.
RtI Implementation. A majority of the available studies evaluating RtI implementation are qualitative and describe the practices states and/or specific school districts are adopting. In other words, most of the data collected are based on self-report from teachers, state administrators, and other school personnel. In 2007, approximately 48 out of 51 state department directors of special education reported that their state was implementing RtI or was, at least, considering the implementation of RtI (Hoover, Baca, Wexler-Love, & Saenz, 2008; Berkeley, Bender, Peaster, & Saunders, 2009). The two elements of RtI that were found to be implemented most consistently across states were progress monitoring at all tiers and the inclusion of research-based instructional services and other student support practices (Berkeley, Bender, Peaster, & Saunders, 2009). However, variability exists in how states, districts, and schools implement RtI.
Of the states that adopted RtI before 2008, more than half, or 58% to 67%, reported using RtI models that combine standard treatment protocol and problem-solving approaches (Berkeley, Bender, Peaster, & Saunders, 2009; Hoover, Baca, Wexler-Love, & Saenz, 2008). Additionally, the two most commonly emphasized purposes for RtI were instructional decision-making and identifying students eligible for special education services (Hoover, Baca, Wexler-Love, & Saenz, 2008). Finally, although a majority of states indicated the use of progress monitoring as an RtI practice, Bailey (2014) found that some schools reported using measures that were not research-based and only some schools reported using collected data for systematic decision-making. Varying results also have been found regarding the effectiveness of RtI implementation based on whether researchers or school personnel implemented the model.
In a meta-analysis, Burns and Symington (2002) found that problem-solving teams led by university faculty members had a greater influence on student outcomes (i.e. time on task, task completion) and systemic outcomes (i.e. referrals to special education, percentage of referrals that are diagnosed with a disability) than did problem-solving teams led by school personnel; however, both types of teams had a large effect on both student and systemic outcomes. In another, more recent, meta-analysis examining four large-scale implementations of RtI, implementation led by school personnel had a large effect on systemic outcomes and a medium effect on student outcomes, while implementation led by researchers had a large effect on student outcomes and a small to medium effect on systemic outcomes (Burns, Appleton, & Stehouwer, 2005). It is possible that as schools have more opportunities for training and gain more experience implementing RtI, integrity of implementation may improve if the appropriate system-level factors and support mechanisms are available.
Some of the challenges of implementing RtI reported by teachers and school personnel include limited resources, insufficient administrative support, and limited professional development opportunities and time (Bailey, 2014). Further, schools in rural districts reported that data based decision making was the most challenging RtI component to implement (Dexter, Hughes, & Farmer, 2008), and staff training was consistently identified as an area for improvement. Berkeley, Bender, Peaster, and Saunders (2009) reported that 88% of states were conducting professional development on RtI, while other studies have found that these training efforts vary in their focus and are generally inadequate for providing school personnel with the knowledge base necessary to implement RtI in an optimal way (Hoover, Baca, Wexler-Love, & Saenz, 2008; Bailey, 2014; Martinez & Young, 2011). The challenges reported are relevant to system-level supports and RtI as a school-wide initiative that requires a supportive infrastructure.
Contemporary perspectives of RtI implementation emphasize system-level factors that promote or discourage fidelity of implementation (Stoiber, 2014). When students have difficulty learning, it is regarded as a mismatch between resources and student need (Greenwood & Kim, 2012; Burns & Gibbons, 2008). In other words, student success is a result of creating a context that is most conducive to each student's individual learning needs. Within an RtI framework, assessment and decision-making extend beyond students' skills to the environment in which learning is occurring. Assessing the environment in addition to student skills establishes the optimal conditions in which student learning takes place to maximize the effectiveness of instruction. When the appropriate and necessary supports are in place, a more effective RtI implementation is promoted (Noltemeyer, Boone, & Sansosti, 2014). Some system-level factors that have been found to promote implementation include leadership, coordination of assessment and data management, culture and beliefs, resource allocation, and opportunities for professional development (O'Connor & Freeman, 2012; Kratochwill, Volpiansky, Clements, & Ball, 2007).

Fidelity of RtI Implementation:
Given the complexity of the RtI multifaceted framework, fidelity of implementation is an essential component to the implementation process. Data regarding fidelity of implementation make it possible to determine if changes in student performance are the result of the instruction and/or the intervention and further, when a change in instruction or intervention is necessary for student success. Additionally, these data make it possible to attribute student performance to RtI implementation. A majority of state department directors of special education (i.e., 36 out of 50) reported that fidelity of implementation was important to consider when implementing RtI (Berkeley, Bender, Peaster, & Saunders, 2009).
Regardless, data supporting the fidelity of implementation are rarely reported in studies examining RtI implementation (Burns, Appleton, & Stehouwer, 2005; VanDerHeyden, Witt, & Gilbertson, 2007). Fidelity of implementation data provide clear information that could eliminate variables related to team process and poor instruction as reasons for student failure (Fuchs, 2007). Consequently, evaluating and monitoring the RtI process would increase the credibility of the decision making process and increase confidence in decision making (Noell & Gansle, 2006).
... confusion about what specific practices or procedures must be in place to ensure that a comprehensive RtI model is implemented (Swindlehurst, Shepard, Salembier, & Hurley, 2015). Most of the research on RtI has focused on individual components of the model, including tier II interventions (Fuchs, Fuchs, Mathes, & Simmons, 1997), decision rules for progress monitoring (Ardoin, Christ, Morena, Cormier, & Klingbeil, 2013), evidence-based assessments (Deno, 1985; Stecker, Fuchs, & Fuchs, 2005), and the problem solving team process (Newton et al., 2012).

Relationship Between Implementation and Outcomes:
In a review of studies examining RtI implementation in various rural school districts, Dexter, Hughes, and Farmer (2008) found that, in general, RtI models improved student performance on reading measures and math measures. Although gains in reading and math performance were noted, these studies, ranging from 1999 to 2007, did not provide fidelity of implementation data and, as a result, improvements cannot be attributed with confidence to the implemented RtI model. Further, mixed results were found regarding the impact on special education placement rates, ranging from rates staying constant to rates falling by seven percent (Bollman, Silberglitt, & Gibbons, 2007; Dexter, Hughes, & Farmer, 2008; VanDerHeyden, Witt, & Gilbertson, 2007).
VanDerHeyden, Witt, and Gilbertson (2007) examined the effects of one RtI model, STEEP, within one school district on systemic outcomes including student eligibility for special services, the use of STEEP data for decision making, identification rates by ethnicity and sex, the placement costs for the district, and the accuracy of decisions regarding students' response to intervention. Results indicated that using the STEEP model led to more accurate identification of which students should be tested for eligibility purposes, which reduced the number of cases referred for eligibility testing and, therefore, the district assessment costs. Additionally, this study provided initial evidence that RtI models may have the potential to decrease the disproportionality in identification rates by ethnicity and by gender.
In another study, Bailey (2014) ...
Another study (Noltemeyer & Sansosti, 2012) sought to examine the association between school-level RtI implementation (i.e. Ohio's Integrated Systems Model, which incorporates behavior into the typical academic RtI model) and student reading outcomes, measured by DIBELS oral reading fluency measures. An adherence checklist measured the percentage of RtI academic features and behavior features that were in place within the school. Results of the study indicated that academic implementation of RtI significantly predicted student performance on the DIBELS oral reading fluency measure, even after controlling for students' initial scores on the oral reading fluency measure from fall of the prior school year.
Finally, Sharp, Sanders, Noltemeyer, Hoffman, and Boone (2016) conducted a study to examine the relationship of school-level RtI implementation and reading achievement. An empirically validated self-report RtI measurement scale, called the RtI Implementation Scale for Reading (RTIS-R), was used, which was developed based on five essential components of RtI (i.e. assessment, data based decision making, instruction, professional development, and treatment integrity) (Noltemeyer, Boone, & Sansosti, 2014). Results revealed three significant models that predicted student scores on the state achievement test. In the first model, the percentage of economically disadvantaged students explained 27.8% of the variability in scores and in the second model, the number of discipline referrals explained 8.1% of variability in scores. Finally, in the third model, data based decision making explained 7.2% of variability among scores above and beyond the demographic variables. Although the external validity of this study is limited, it is the first to quantitatively examine the relationship between the quality of school-level RtI implementation as a coherent system, and student level reading achievement. In addition, these results suggest that RtI implementation positively contributes to students' reading performance even after controlling for demographic variables.

Purpose of the Study
General knowledge and understanding of RtI educational service delivery methods and models is increasing as a result of a growing research base. However, there are several limitations to the available research on evaluating the fidelity of RtI implementation. First, the number of studies quantitatively examining RtI implementation as a complete, coherent system appears sparse (Glover & DiPerna, 2007). Second, due to the inconsistent definition of RtI, some studies may report examining the large-scale implementation of RtI, but may in fact be examining only a small-scale implementation (i.e. only some essential components are implemented) (Burns & Symington, 2002). If RtI implementation is defined by five essential service delivery components, then comparisons cannot be made between models with all five components implemented and models with fewer than five components implemented.
Finally, the limited studies available have produced mixed results regarding whether RtI implementation predicts student outcomes.
The purpose of the current study is to extend the extant literature by using a rubric, developed by the National Center on Response to Intervention, to quantitatively evaluate RtI implementation and its relationship to student reading outcomes in one large education district with a long-standing history of RtI implementation. Exploring how schools are implementing RtI, including the relationships between implementation and student outcomes, will help researchers and practitioners gain a better understanding of the pathway to optimal RtI implementation, which may vary based on school characteristics.
Specifically, the following research questions were addressed:
... was not working regarding the RtI implementation process. Prior to this study, the district used self-report surveys to assess the implementation of RtI. However, administrators and special education personnel thought the results were inflated, which is a common limitation with self-report measures (Cook & Campbell, 1979). As a result, the executive director for the education district sought to find an objective measure to assess the quality of RtI implementation. In 2014, the education district chose to use measures created by the National Center on Response to Intervention as an objective measure of RtI implementation.
Following the selection of the measures, the executive director spoke to administrators from each member district to discuss the study and requested that a group of key personnel involved in RtI be selected for a group interview. Next, the executive director set up meetings at each school with the selected teams. One two-hour meeting was scheduled at each school during March and April of 2015 to conduct the semi-structured interview. Three individuals were involved with conducting the interviews. These individuals were highly involved with the overall implementation of RtI throughout the education district, as their positions were shared core services that were involved in all member districts. Their positions included executive director for the education district, instructional services coordinator, and director for special education services. As a result, each person was highly knowledgeable about the continued efforts and current practices in place. Further, their experience working within the district varied; one individual had over 10 years of experience working in the district, one individual had a few years of experience in the district, and one individual was new to the district the year of the data collection.
During the interview, one individual facilitated the conversation and the other two individuals took notes. The note-takers also monitored the quality and quantity of responses to ensure enough information was collected to properly rate the responses on the rubric, and these individuals asked clarifying questions, as necessary. For example, clarifying questions included asking for permanent products to review. The executive director facilitated seven interviews, and the instructional services coordinator facilitated one interview.
Once the interviews were finished, the executive director and the instructional services coordinator, both trained in school psychology, separately scored the interviews using the RtI Fidelity of Implementation Rubric. Each item on the semi-structured interview was assigned a rating. Inter-rater reliability was not calculated, and the information to calculate it was unavailable. However, the individuals compared their scoring, and when disagreements were found they discussed their points of view, used their knowledge of the school, and came to a consensus on the best rating. In the end, the raters agreed on a rating for each of the 31 items. This information was entered into an Excel worksheet, school names were de-identified, and averages were calculated for each of the five components of RtI, as well as for the total RtI score, which was the total of the five components. The data file was sent to the researcher of the current study.
Student benchmark data were collected three times during the school year: September (i.e. fall), January (i.e. winter), and April (i.e. spring). Following data collection at each time point, students' scores were entered on the AIMSweb online data management and reporting system. The outcomes services manager exported data from the AIMSweb website to an Excel sheet to provide the researcher with R-CBM words read correct (WRC) scores and rate of improvement (ROI) scores for students in first grade through fifth grade at each of the eight elementary schools. Student data were de-identified using unique identification numbers to maintain student anonymity.
... identified to define each essential component. The worksheet contains five items related to assessments, three items related to data based decision-making, 12 items related to multi-level instruction, nine items related to infrastructure and support mechanisms, and two items related to fidelity and evaluation. All 31 items are used to establish overall RtI implementation. The worksheet was designed for use as a semi-structured interview, and sample questions are provided to guide a discussion between the interviewer/s and the interviewees. In addition to the discussion, the interviewer is encouraged to conduct observations and document reviews to collect sources of evidence and a rationale for rating each item (Elledge, 2010).

Measures
Following the interview, observations, and document reviews, a rating was assigned to each item on the worksheet using the RtI Fidelity of Implementation Rubric. The rubric provides a Likert-type response format for each item, and schools may be assigned a one, two, three, four, or five from the rubric, with a one being the lowest quality and a five being the highest quality of implementation. A narrative rationale is available for a rating of a one, a three, and a five. Although a narrative rationale is not available for a rating of a two and a four, it is clear that a rating of a two falls between a one and a three, and a rating of a four falls between a three and a five. After the ratings were assigned for each individual item, the mean of the items for each essential component was calculated, and that rating was used as the single value that represented each essential component. For the purposes of the current study, a mean rating of four or higher indicates high quality RtI implementation and a mean rating lower than four indicates low quality RtI implementation.
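The scoring rule described above can be illustrated with a minimal sketch in Python. It assumes item ratings are stored per component as lists of integers from one to five; the component keys, the function names, and the example ratings are hypothetical and are not taken from the study data. Only the item counts per component and the cutoff of four come from the text.

```python
from statistics import mean

# Item counts per essential component as listed for the worksheet (31 in total).
# The dictionary keys are hypothetical labels chosen for this sketch.
ITEM_COUNTS = {
    "assessments": 5,
    "data_based_decision_making": 3,
    "multi_level_instruction": 12,
    "infrastructure_and_support": 9,
    "fidelity_and_evaluation": 2,
}

# Cutoff stated in the text: a mean of four or higher is "high quality".
HIGH_QUALITY_CUTOFF = 4


def component_means(ratings):
    """Return the mean 1-5 rubric rating for each essential component."""
    means = {}
    for component, items in ratings.items():
        if len(items) != ITEM_COUNTS[component]:
            raise ValueError(f"{component}: expected {ITEM_COUNTS[component]} items")
        if any(rating not in (1, 2, 3, 4, 5) for rating in items):
            raise ValueError(f"{component}: ratings must be integers from 1 to 5")
        means[component] = mean(items)
    return means


def classify(mean_rating):
    """Label a component mean as high or low quality implementation."""
    return "high" if mean_rating >= HIGH_QUALITY_CUTOFF else "low"


# Hypothetical ratings for one school (illustrative only, not study data).
example_school = {
    "assessments": [4, 5, 4, 4, 5],
    "data_based_decision_making": [3, 4, 4],
    "multi_level_instruction": [4] * 12,
    "infrastructure_and_support": [3, 3, 4, 3, 4, 3, 3, 4, 3],
    "fidelity_and_evaluation": [2, 3],
}

if __name__ == "__main__":
    for component, value in component_means(example_school).items():
        print(f"{component}: {value:.2f} ({classify(value)})")
```

The validation against the item counts is a design choice of the sketch: it catches a component scored with missing or extra items before a mean is reported.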

Assessment.
Educational assessments are measures used to determine what students know and are able to do before, during, and after instruction (Green & Johnson, 2010). Three types of assessments are used within an RtI framework to inform decision making: screening, progress monitoring, and other supporting assessments (i.e. curriculum-based measures, state achievement tests) (NCRTI, 2014). The purpose of assessment is to provide early identification of students who may be struggling to meet grade level expectations by monitoring all students' responses to the general curriculum, as well as to interventions (Burns & Gibbons, 2008).
The assessment component has a total of five items on the measure, and it is broken into two categories: screening and progress monitoring. Screening is described as measures used to identify students at risk of poor learning outcomes or challenging behavior, and three items fall under screening (i.e. screening tools, universal screening, & data points to verify risk). Progress monitoring is described as ongoing and frequent monitoring of progress to quantify rates of improvement and to inform instructional practice, as well as the development of individualized programs. Two items fall under progress monitoring (i.e. progress-monitoring tools & progress monitoring process). As such, schools may earn a maximum of 25 points on this component. A low rating indicates that there is a need for improvement in assessment practices, and a rating of a five indicates that all conditions included in the item are being implemented. For example, on the item 'data points to verify risk', a rating of a one is described as "Screening data are not used or are used alone to verify decisions about whether a student is or is not at risk," and a rating of a five is described as "Screening data are used in concert with at least two other data sources to verify decisions about whether a student is or is not at risk" (NCRTI, 2014).

Data-based decision making (DBDM). DBDM represents the processes used to inform instruction, intensity of intervention (i.e. movement within multilevel tiers), and disability identification (NCRTI, 2014). The DBDM component has a total of three items, which include decision making process, data system, and responsiveness to secondary and intensive levels of intervention. Schools may earn a maximum of 15 points on the DBDM component. A low rating indicates that there is a need for improvement in DBDM practices, and a rating of a five indicates that all conditions of DBDM are being implemented. For example, on the item 'data system,' a rating of one is described as "A data system is in place that meets two or fewer of the following conditions, 1) the system allows users to document and access individual student-level data and instructional decisions; 2) data are entered in a timely manner; 3) data can be represented graphically; and 4) there is a process for setting/evaluating goals," and a rating of five is described as a data system that meets all four conditions (NCRTI, 2014).
... rating of one is described as resources are not allocated to support implementation, a three is described as resources are partially allocated, and a five indicates that resources are adequately allocated for implementation (NCRTI, 2014).

Fidelity and evaluation.
Fidelity and evaluation is described as a system for collecting and analyzing data to measure the adherence to and effectiveness of the RtI model (NCRTI, 2014). Two items are incorporated in this component (i.e. fidelity & evaluation), and schools may earn a maximum of 10 points. On this component, a rating of a one suggests that none of the conditions are met; a rating of three suggests that at least one condition is met; and a rating of five suggests that all conditions are met. For example, on the item 'fidelity,' a rating of a one is described as "Neither of the following conditions is met: 1) procedures are in place to monitor the fidelity of implementation of the core curriculum and secondary and intensive interventions" and a rating of a five is described as "Both of the conditions are met" (NCRTI, 2014).
... Stephanie Jackson). These experts are highly regarded leaders in the field of special education, who are well known for their long-standing involvement in researching RtI.
Additionally, these experts worked closely to ensure the appropriateness, breadth, and

Oral Reading Fluency (ORF). AIMSweb oral reading fluency (R-CBM) measures are standardized, curriculum-based assessments that are individually administered to all students three times a year. R-CBM is a measure of students' ability to read connected text fluently (Good & Jefferson, 1998). A trained examiner administers the assessment by having a student read aloud for one minute from three grade-level passages, and the median number of words read correctly per minute across the three passages is recorded. On this measure, student performance is expressed as words read correct (WRC) in one minute. The same three passages are administered three times during the school year (i.e., fall, winter, spring) to identify students who may not meet grade-level benchmarks, a process known as screening. National norms are available as a criterion for evaluating student success relative to same-grade or same-age students across the nation. However, local norms were developed specific to the state in which the data were collected.
Development of local norms was based on how well R-CBM scores predicted performance on a state achievement test. The education district chose R-CBM as a student reading outcome measure because it is empirically validated as an indicator of a broader set of literacy skills (Good & Jefferson, 1998). Further, the results from the R-CBM are used in the education district to evaluate the effectiveness of core instruction and for data-based decision making purposes, including resource allocation and levels of support for students.
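The scoring rule described above (the median words read correct across three one-minute passages) can be illustrated with a minimal sketch. The function name and the passage scores are hypothetical and are not taken from the AIMSweb materials or the study data.

```python
from statistics import median

def benchmark_wrc(passage_scores):
    """Return the median words read correct (WRC) across three passages.

    Hypothetical helper for illustration only; it assumes one WRC score
    per one-minute passage.
    """
    if len(passage_scores) != 3:
        raise ValueError("expected scores from exactly three passages")
    return median(passage_scores)

# Hypothetical WRC scores for one student on the three grade-level passages.
student_scores = [52, 61, 47]
print(benchmark_wrc(student_scores))  # the middle score is recorded
```

Using the median rather than the mean keeps a single unusually easy or difficult passage from shifting the recorded score.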

Rate of Improvement (ROI). Rate of improvement (ROI) is a measure of how raw scores on the R-CBM increase during a given school year. Screening data may be used to calculate a ROI in terms of the number of words read correctly per minute gained per week. To calculate the ROI, the difference between the first and last scores is divided by the number of weeks between the two data collections (Fuchs, Fuchs, Hamlett, Walz, & Germann, 1993). As a result, using screening data, a ROI may be calculated for fall to winter, for winter to spring, and for fall to spring. Students' ROI informs schools whether students are making adequate progress to meet end-of-year benchmark goals.
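The ROI calculation described above can be sketched as follows. The benchmark scores and the 18-week spacing between screenings are hypothetical values chosen for illustration; the study does not report the exact number of weeks between benchmarks.

```python
def rate_of_improvement(first_score, last_score, weeks_between):
    """Words read correct gained per week between two screenings."""
    if weeks_between <= 0:
        raise ValueError("weeks_between must be positive")
    return (last_score - first_score) / weeks_between

# Hypothetical fall, winter, and spring WRC scores for one student.
fall, winter, spring = 40, 58, 85
weeks_fall_to_winter = 18    # assumed spacing, not from the study
weeks_winter_to_spring = 18  # assumed spacing, not from the study

roi_fall_winter = rate_of_improvement(fall, winter, weeks_fall_to_winter)
roi_winter_spring = rate_of_improvement(winter, spring, weeks_winter_to_spring)
roi_fall_spring = rate_of_improvement(
    fall, spring, weeks_fall_to_winter + weeks_winter_to_spring
)
print(roi_fall_winter, roi_winter_spring, roi_fall_spring)
```

Because only the first and last scores enter the calculation, the fall-to-spring ROI is a weighted average of the two half-year rates rather than a fitted slope.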

Participants
School Characteristics. This study was a preliminary study for a larger project that will examine district-level implementation of RtI in a rural education district and its relationship to student outcomes. The project was conducted within an education district in the Midwest region of the United States during the 2014-2015 school year. For the present purposes, an education district is defined as a specialized type of union school district that affords its member districts certain incentives for joining together. More specifically, incentives may include fiscal equity, shared programs and core services (i.e., an executive director of the education district, a director of special education, an instructional services coordinator, services for students with physical and other health impairments), and administrative effectiveness that would otherwise be unaffordable for the member districts. A governing board, formed by representatives from each member district, makes decisions regarding funding and educational services.
School 1 is a Title I (i.e., receives financial assistance through federal grants available to districts serving low-income students) pre-kindergarten through fifth grade elementary school serving 373 students. In school 1, approximately one third (i.e., 33.5%) of students received free or reduced lunch, 5.9% were nonwhite students, and 14.2% of students received special education services.
School 2 is a kindergarten through second grade elementary school serving 485 students. In school 2, approximately a quarter (i.e., 24%) of the students received free or reduced lunch, 4.9% were nonwhite students, and 8% of students received special education services. Students who attended school 2 moved on to school 3 for grades three through five.
School 3 is a third through fifth grade elementary school serving 546 students. In school 3, a quarter of the students (i.e. 24.7%) received free or reduced lunch, 5.1% were nonwhite students, and 12.6% of students received special education services.
School 4 is a Title I pre-kindergarten through sixth grade elementary school serving 397 students. Over half of the students (i.e., 61%) received free or reduced lunch, 14.9% were nonwhite students, and 13.1% of students received special education services.
School 5 is a Title I pre-kindergarten through sixth grade elementary school serving 480 students. Over half of the students (i.e., 64.6%) received free or reduced lunch, 30% were nonwhite students, and 15.4% of students received special education services. School 5 had the greatest diversity of all the elementary schools in the education district.
School 6 is a kindergarten through sixth grade elementary school serving 1,008 students. In school 6, 34.3% of students received free or reduced lunch, 6.6% were nonwhite students, and 11.5% of students received special education services.
School 7 is a kindergarten through sixth grade elementary school serving 846 students. Approximately half (i.e. 47.9%) of the students received free or reduced lunch, 6.6% were nonwhite students, and 10.4% of students received special education services.
School 8 is a kindergarten through sixth grade elementary school serving 470 students. Approximately one third of the students (i.e., 34%) received free or reduced lunch, 6.6% were nonwhite students, and 11.7% of students received special education services.
Interviewees. A small group of approximately four to eight individuals from each school participated in the semi-structured interview. Faculty and staff who were identified as the key people involved with RtI implementation were invited, directed, or asked to attend by administrators in the district. Nellis (2012) Table 2 for the total number of participating students by school and by grade.
Schools 4, 5, 7, and 8 had student data available at all five grade levels. School 4 had a total of 269 student participants, school 5 had a total of 383 student participants, school 7 had a total of 591 student participants, and school 8 had a total of 319 student participants. School 6 had the most student participants (i.e., 606), and student data were available for grades one, two, and three. School 1 had a total of 241 student participants, and student data were available for grades one, two, three, and four. School 2 had the least student data available (i.e., 241 student participants), as it was the smallest school, serving only grade one and grade two. Finally, school 3 had 351 student participants, and student data were available for grades three and four.

Informed Consent
Informed consent was not required for the current study. AIMSweb screening data were collected as part of a prevention framework in which all students participate, consistent with federal regulations requiring schools to identify all children who may require special education services (IDEIA, 2004). As a result, schools are permitted to administer these measures without collecting informed consent from parents. Further, school-level data and student-level data were de-identified before being sent to the researcher.

Design
The current study employed a quasi-experimental, non-equivalent groups design. This design is commonly used in applied educational research because students cannot be randomly assigned to groups. Rather, intact groups based on student enrollment at a particular school are used. As a result, the groups are not equivalent, or as similar as they would have been if random assignment were used. This is important to note because any prior differences between the groups may affect the study outcomes and conclusions (Cook & Campbell, 1976).

Analyses
RtI integrity of implementation data were available for a total of eight elementary schools. The first part of the current study sought to examine differences in the fidelity of RtI implementation between the eight elementary schools. As a result of the small sample size (N=8) and the limited variability in the integrity of RtI implementation between schools, questions one and two were integrated, and examined using descriptive statistics (i.e. means, standard deviations, bar charts).
The second part of the current study sought to examine the relationship between fidelity of RtI implementation and overall student reading outcomes by school and by grade level. Student outcome data were not available for all grade levels at each school; thus, student outcomes were analyzed by grade level. Further, based on the available school-level data from the extant data set and the small sample size (N=8), it was concluded that prediction methods were not the most appropriate method of analysis. Consequently, the third and fourth research questions were combined and modified to reflect this change. The reader is referred to Appendix A for a further explanation of the edited research questions. Instead of prediction methods, the new, edited research question was analyzed using analysis of variance (ANOVA) to examine differences in student outcomes between schools at each grade level. As such, 10 separate one-way ANOVAs were conducted. The relationship between fidelity of RtI implementation and student reading outcomes was analyzed qualitatively.
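As an illustration of the one-way ANOVA described above, the following minimal sketch computes the F statistic and the eta-squared effect size from raw scores. It is not the software used in the study; the function names and the three small groups of scores are hypothetical. The second function relies on the standard identity for a one-way design, η² = (F × df between) / (F × df between + df within), so an effect size can be recovered from a reported F test.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared

def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a reported one-way F statistic."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)

# Hypothetical scores for three schools (not study data).
schools = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w, eta_sq = one_way_anova(schools)
print(f_stat, df_b, df_w, eta_sq)
print(eta_squared_from_f(f_stat, df_b, df_w))  # same value as eta_sq
```

Applied to the first grade rate of improvement result reported in the next section, F(6, 691) = 6.673, `eta_squared_from_f(6.673, 6, 691)` gives approximately .055, which agrees with the effect size reported there.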

First Research Question
Descriptive statistics. The first research question explored to what extent staff in elementary schools are implementing the essential components of RtI with integrity, as measured by the RtI Essential Components Worksheet, between schools; and if differences exist between schools, how those differences can be characterized. A total of eight teams, one from each elementary school, were interviewed and assigned ratings on the RtI Fidelity of Implementation Rubric. As such, eight mean scores were available for comparison on each of the five RtI components and for the total RtI score. Table 3       received a mean rating of 3.67, schools 5 and 6 received a mean rating equal to 3.5, school 7 received a mean rating equal to 3.42, and finally, school 8 received a mean rating of 3.08. The greatest difference (i.e. 1.5) observed was between school 2 and school 8. Multi-tiered Systems of Support rating higher than a 2.5. Mean ratings on the infrastructure and support component ranged from 3 to 4.29. Only one school received a mean rating greater than or equal to 4 (i.e. school 2, M= 4.29), indicating higher quality of implementation. The remaining schools received mean ratings less than 4, indicating lower quality implementation.
Three schools received ratings between 3.5 and 4 (i.e., schools 1, 3, and 7), one school earned a rating between 3 and 3.5 (i.e., school 8), and the remaining three schools earned ratings between 2.5 and 3 (i.e., schools 4, 5, and 6). The greatest difference (i.e., 1.58) was observed between school 2 and schools 5 and 6. Schools 5 and 6 earned the same mean rating (i.e., 2.71), indicating similarities in their quality of implementation.
Finally, given that only one school fell in the high implementation range, these results indicate that infrastructure and support is an area that generally needs improvement.  Alternatively, differences in RtI implementation are characterized by MTSS, infrastructure and support, fidelity and evaluation, and, consequently, total RtI score.
The greatest difference between the highest and the lowest ratings between schools was observed for infrastructure and support, followed closely by MTSS and fidelity and evaluation. Further, ratings were the least consistent on the fidelity and evaluation and the infrastructure and support components.
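The comparisons above rest on two simple operations: labeling a mean rating as higher or lower quality implementation using the cutoff of 4, and taking the difference between the highest and lowest mean ratings on a component. A minimal sketch follows, using only the infrastructure and support means reported in the text for schools 2, 5, and 6; the function names are hypothetical.

```python
def implementation_level(mean_rating, cutoff=4.0):
    """Label a mean rating using the cutoff of 4 described in the text."""
    return "higher" if mean_rating >= cutoff else "lower"

def rating_range(mean_ratings):
    """Difference between the highest and lowest mean ratings."""
    values = list(mean_ratings.values())
    return round(max(values) - min(values), 2)

# Infrastructure and support means reported in the text for three schools.
infrastructure_means = {"school 2": 4.29, "school 5": 2.71, "school 6": 2.71}

for school, rating in infrastructure_means.items():
    print(school, rating, implementation_level(rating))
print(rating_range(infrastructure_means))  # 1.58
```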

Second Research Question
Descriptive statistics and preliminary analyses. The second research question examined to what extent integrity of RtI implementation relates to differences in student outcomes, at each grade level across schools, as measured by AIMSweb oral reading fluency (ORF) data. Descriptive statistics, including sample size, mean, and standard deviation, are presented below for both words read correct per minute (WRC; see Table 5) and rate of improvement (ROI; see Table 6). Separate analyses were conducted for each grade level because data were not available for all grades at each school. Multiple methods were used to assess the normality of the distribution for both student outcome variables at each grade level. First, skewness and kurtosis were examined according to guidelines presented by Harlow (2014). Next, visual representations, including histograms, normal Q-Q plots, and boxplots, were reviewed to verify normality of the data and to identify outliers. Finally, the Shapiro-Wilk statistic was obtained, and a non-significant result indicated normality of the distribution. Five variables were found to have a non-normal distribution. For these variables, the nonparametric alternative to ANOVA (i.e., the Kruskal-Wallis test) was conducted. For three of the variables (i.e., Grade 1 ROI, Grade 3 ROI, and Grade 5 ROI), results from the nonparametric test were in agreement with the results from the ANOVA, and the researcher chose to report the results from the ANOVA. For two variables (i.e., Grade 1 WRC and Grade 4 ROI), the results from the nonparametric test differed from the ANOVA results, and the researcher chose to report the results from the nonparametric test. Results from the normality testing are discussed below. A detailed comparison of parametric and nonparametric tests is presented in Appendix E. In addition to normality, homogeneity of variances is an assumption of analysis of variance. The Levene test was conducted and is reported below for each ANOVA to assess equality of variances across samples.
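For readers unfamiliar with the nonparametric alternative mentioned above, the following minimal sketch computes the Kruskal-Wallis H statistic (with the usual correction for tied scores) from raw data. It is an illustration under stated assumptions, not the procedure or software used in the study; the function names and scores are hypothetical. H is compared against a chi-square distribution with degrees of freedom equal to the number of groups minus one.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_rank_sums += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h / correction

# Hypothetical scores for three schools (not study data).
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because the statistic depends only on ranks, a few extreme scores have far less influence on it than they have on the ANOVA F statistic, which is why it is a common fallback when outliers violate normality.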
Spring words read correct. The sample consisted of 699 first grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. However, further examination, including the Shapiro-Wilk test and inspection of Q-Q plots, revealed that the data did not meet the assumption of normality due to outliers. As a result, a Kruskal-Wallis test was conducted. Results showed that there was not a statistically significant difference between schools in the number of words students read correctly, χ²(6) = 10.507, p = .105 (see Table 7). Mean rank scores for each school are presented in Table 8.
Rate of improvement. The sample consisted of 698 first grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. Additionally, the test of homogeneity of variance was not significant, Levene's F(6, 691) = 1.934, p = .073, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on rate of improvement from the fall benchmark to the spring benchmark was significant, F(6, 691) = 6.673, p < .001 (see Table 9). Tukey post hoc tests revealed that rate of improvement was statistically significantly lower in school 5 than in schools 1 (p < .001), 4 (p < .001), 7 (p < .001), and 8 (p = .006) (see Table 10). Additionally, Tukey post hoc tests demonstrated that rate of improvement was significantly lower in school 6 than in school 4 (p = .007). No other school differences were statistically significant. The η² = .055 indicated a small to medium effect size; that is, 5.5% of the variation in first grade rate of improvement is attributable to differences between the seven schools.
Spring words read correct. The sample consisted of 590 second grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. Additionally, the test of homogeneity of variance was not significant, Levene's F(5, 584) = .332, p = .894, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on words read correctly at the spring benchmark was significant, F(5, 584) = 2.42, p = .035 (see Table 11). Table 12). The η² = .02 indicated a small effect size; that is, 2% of the variation in second grade words read correct is attributable to differences between the six schools.
Rate of improvement. The sample consisted of 585 second grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. Additionally, the test of homogeneity of variance was not significant, Levene's F(5, 579) = 1.645, p = .146, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on rate of improvement was significant, F(5, 579) = 7.057, p < .001 (see Table 13). .41791), and school 8 (M = 1.3387, SD = .36385). The η² = .06 indicated a small to moderate effect size; that is, 6% of the variation in second grade rate of improvement is attributable to differences between the six schools (see Table 14).
Spring words read correct. The sample consisted of 762 third grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. Additionally, the test of homogeneity of variance was not significant, Levene's F(6, 755) = .910, p = .487, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on words read correctly at the spring benchmark was significant, F(6, 755) = 2.512, p = .021 (see Table 15). The η² = .02 indicated a small effect size; that is, 2% of the variation in third grade words read correct is attributable to differences between the seven schools (see Table 16).
Levene's F(6, 751) = 1.863, p = .085, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on rate of improvement from the fall benchmark to the spring benchmark was significant, F(6, 751) = 5.169, p < .001 (see Table 17). .5695), and for school 8 (M = 1.4599, SD = .70227). The η² = .039 indicated a small effect size; that is, 3.9% of the variation in third grade rate of improvement is attributable to differences between the seven schools (see Table 18).
Spring words read correct. The sample consisted of 533 fourth grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. Additionally, the test of homogeneity of variance was not significant, Levene's F(5, 547) = 1.279, p = .271, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on words read correctly at the spring benchmark was significant, F(5, 547) = 3.445, p = .004 (see Table 19). further, that 3% of the variation in fourth grade words read correct is attributable to differences between the six schools (see Table 20).
Results showed that there was a significant difference in students' ROI from fall to spring, χ²(5) = 20.871, p = 0.01 (see Table 21). Mean rank scores for each school are presented in Table 22.
Spring words read correct. The sample consisted of 294 fifth grade students. The test for normality, examining standardized skewness and kurtosis, indicated the data were statistically normal. Additionally, the test of homogeneity of variance was not significant, Levene's F(3, 290) = .438, p = .726, indicating that this assumption underlying the application of ANOVA was met. An analysis of variance showed that the effect of school on words read correct was significant, F(3, 290) = 5.341, p = .001 (see Table 23). further, that 5% of the variation in fifth grade words read correct is attributable to differences between the four schools (see Table 24).
Table 25). .52163) and for school 8 (M = 1.0357, SD = .38266). The η² = .048 indicated a small effect size; that is, 4.8% of the variation in fifth grade rate of improvement is attributable to differences between the four schools (see Table 26).
Summary. The second research question sought to examine the extent to which the fidelity of RtI implementation relates to differences in students' ability to read accurately and fluently for one minute, measured by words read correct (WRC) and rate of improvement (ROI), between elementary schools, in grades one through five. The research question was analyzed in two steps. First, some general conclusions can be made about the differences in student outcomes between schools. A total of 25 significant school differences in student reading outcomes emerged across the five grade levels. More differences between schools were noted on the rate of improvement (ROI) variable than on the words read correctly (WRC) variable. Specifically, 19 significant differences between schools were found on the ROI variable and six significant differences were found on the WRC variable.
On the WRC variable, school 5 was consistently the lower performing school across grades. Schools 1, 3, 6, 7, and 8 had greater student reading outcomes than school 5 on the WRC variable.
The results were more varied between grades on the ROI variable. In grade one, schools 1, 4, 7, and 8 were higher performing than school 5. In grade two, school 1 was a higher performing school than schools 4, 5, 7, and 8. In grade three, schools 4, 7, and 8 were higher performing than schools 5 and 6. In grade four and grade five, school 5 was consistently the lower performing school, compared to schools 1, 3, 7, and 8. Finally, some school differences were recurrent across grade levels.
Second, qualitative, exploratory inferences were made to investigate the relationship between RtI implementation and differences in student outcomes. Table 27 was created to compare the significant differences in student outcomes related to ORF with schools' fidelity of RtI implementation. The left column of the table lists the significant differences in student reading outcomes by grade level; the first school listed in each pair performed higher on student outcomes. The remaining columns represent the five essential components of RtI and the total RtI score. A checkmark indicates that the first school listed in the pair had a higher fidelity of implementation score than the second school listed in the pair. When reading across each row, the reader can determine on which components the school that performed better on reading outcomes related to ORF also had higher RtI fidelity scores.
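The checkmark logic described above can be expressed as a short sketch: for each pair of schools, a component receives a checkmark when the school with the better reading outcome also has the higher fidelity rating, and the share of checked pairs is then tallied per component. The school labels, ratings, and pairs below are hypothetical placeholders, not values from Table 27.

```python
COMPONENTS = [
    "assessment", "dbdm", "mtss", "infrastructure", "fidelity", "total",
]

def checkmarks(pair, ratings):
    """Mark each component on which the first school outscored the second.

    `pair` is (higher_outcome_school, lower_outcome_school).
    """
    higher, lower = pair
    return {c: ratings[higher][c] > ratings[lower][c] for c in COMPONENTS}

def percent_checked(pairs, ratings, component):
    """Percentage of pairs with a checkmark on the given component."""
    hits = sum(checkmarks(pair, ratings)[component] for pair in pairs)
    return 100 * hits / len(pairs)

# Hypothetical mean ratings for three schools (not study data).
ratings = {
    "A": {"assessment": 4.5, "dbdm": 4.3, "mtss": 3.0,
          "infrastructure": 4.0, "fidelity": 3.5, "total": 3.9},
    "B": {"assessment": 4.5, "dbdm": 4.0, "mtss": 3.2,
          "infrastructure": 3.0, "fidelity": 3.0, "total": 3.5},
    "C": {"assessment": 4.2, "dbdm": 4.3, "mtss": 2.8,
          "infrastructure": 2.9, "fidelity": 3.5, "total": 3.4},
}
# Hypothetical significant differences; the first school performed higher.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]

for component in COMPONENTS:
    print(component, round(percent_checked(pairs, ratings, component), 1))
```

In this sketch, equal ratings do not earn a checkmark, which matches the description of a checkmark as indicating a strictly higher fidelity score.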
The following qualitative inferences were made based on the results listed in Table 27. The three schools with the highest total RtI scores (i.e., schools 1, 2, and 3) did not have significantly lower student reading outcomes than any other school, at any grade level. However, it is notable that the schools with the greatest RtI total scores also had the fewest grade levels available for the analysis. All schools that performed significantly better than another school had at least one component, and/or RtI total, that was rated higher than that of the lower scoring school. Next, an analysis of all 25 significant differences revealed that 92% of the schools scoring higher on student reading outcomes had higher infrastructure and support ratings, 80% had higher RtI total ratings, and 72% had higher fidelity and evaluation ratings. In other words, one possible explanation for the trend observed in the data is a positive association between significantly better student reading outcomes related to oral reading fluency (ORF) and higher ratings on the infrastructure and support component, the fidelity and evaluation component, and the RtI total score. Similarly, 72% of the schools that scored significantly higher on student reading outcomes had three or more components, including RtI total, that were rated higher than those of the lower performing school, with half (i.e., 50%) of that 72% having higher ratings for all five components and for RtI total (i.e., all six aspects of RtI). In addition, higher ratings on MTSS appear to be least related to differences in student outcomes. Finally, it is possible that assessment and data-based decision making (DBDM) also impacted differences in student outcomes between schools, yet the current study was unable to address this because of the similarity of implementation between schools.
It is important to note that these inferences are speculative and exploratory in nature.
Although school 5 and school 8 have similarly low RtI ratings, school 8 had significantly better student outcomes than did school 5 at three grade levels (i.e., grade 1, 3, 5). This finding suggests that other variables may be impacting student outcomes.
It is possible that demographic variables, such as the percentage of students receiving free/reduced lunch, the percentage of nonwhite students, and/or the percentage of students receiving special education services, influenced the difference in student outcomes, as school 5 is the most diverse school on all of these factors. In addition, school 5 and school 4 have the greatest overall diversity, with similar percentages of students receiving free/reduced lunch and the highest percentages of nonwhite students.
Although these two schools have greater percentages of nonwhite students than the other schools, the percentage of nonwhite students in school 4 is approximately half that of school 5. Despite these commonalities, the results from the ANOVAs were quite different for these two schools, as school 5 consistently had lower student outcomes, and school 4 had significantly higher student outcomes than school 5 in grades 1 and 3. Finally, school 4 received higher ratings than school 5 on all five RtI components, and on RtI total.
One possible explanation for the significant differences in student reading outcomes related to ORF between school 4 and school 5 is the greater percentage of nonwhite students in school 5. Another possible explanation for the significant difference in student outcomes between schools may be that implementation of RtI based educational service delivery may have an impact on student outcomes above and beyond demographic characteristics.

Table 27
Comparison Of Significant Post-hoc Tukey Tests and Fidelity of RtI Implementation

Discussion
The primary purpose of the current study was to explore several aspects of the fidelity of implementation of RtI, including (1) the extent to which elementary schools were implementing RtI as measured by the RtI fidelity of implementation worksheet and rubric, (2) how differences in implementation could be characterized, and (3)  Examining the relationship between fidelity of RtI implementation and student reading outcomes related to ORF was exploratory in nature and qualitative inferences were used to speculate about the data. Regarding the relationship between RtI implementation and student outcomes, the data indicated mixed results as follows.
Some of the results demonstrated a trend indicating there was a possible positive association between RtI implementation and student reading performance and growth.
For example, most of the data support a relationship between significantly higher student reading outcomes and higher ratings on the infrastructure and support component, the fidelity and evaluation component, and the total RtI score. In addition, the majority of the data (i.e., 72%) support a relationship between high quality implementation on three or more components and higher student outcomes. Finally, schools that scored in the high implementation range on total RtI had student outcomes that were equal to or better than student outcomes from any other school across grade levels. Despite indications of a positive association between fidelity of RtI implementation and student outcomes, other results failed to support a clear relationship between the two variables. For example, the schools with the highest quality of RtI implementation were not significantly different, statistically, on student reading outcomes as compared with schools with lower quality RtI implementation.
This was surprising, and counter to the researcher's expectations. It is possible that this was related to these schools providing the least amount of data. For example, school 2 only provided data for grade one. Another possible explanation for this finding is that student reading outcomes related to ORF were generally higher at the beginning of the school year for these schools, and ceiling effects of the measure may have limited the rate of improvement measure. Second, the two schools with the lowest RtI implementation ratings overall were found to have significantly different student reading outcomes at multiple grade levels. That is, school 8 performed better than school 5 in grades one, three, and five. In summary, some of the data support a positive relationship between RtI implementation and student reading outcomes, whereas other data suggest that the relationship is inconsistent.
A greater understanding of fidelity of RtI implementation, measured by the quality of RtI implementation, and its relationship to student reading outcomes related to ORF may help schools understand which components of RtI are critical to changes in student outcomes. Additionally, it may guide schools toward the optimal use of time and resources to positively impact student outcomes through RtI. The current study is one of the first studies to examine the fidelity of RtI implementation as a system of coordinated components.

How Are These Results Similar To And Different From Those Of Previous Studies?
The first research question described RtI implementation in eight elementary schools. In the current study, assessment and data-based decision making (DBDM) were found to have the highest quality of implementation, which was consistent with prior research (Sharp et al., 2016). Bailey (2014) also found that assessment was the area of RtI with the highest quality of implementation; however, DBDM was found to be the most challenging for schools and was implemented with low quality. In the current study, infrastructure and support and fidelity and evaluation were the two areas of RtI implementation that received the lowest scores. These results were consistent with Sharp et al. (2016), who found that professional development (i.e., supports) and fidelity of implementation received the lowest scores.
Although there were similarities between the current study and Sharp et al. (2016) regarding the areas of RtI with the highest and lowest quality of implementation, it is notable that the scores from the current study were generally higher (i.e., ranging from 3.37 to 4.86) than those of Sharp et al. (i.e., ranging from 1.95 to 2.97), indicating an overall greater quality of RtI implementation. It is possible that the discrepancy in scores is related to the number of years the schools had been implementing RtI; in Sharp et al. (2016) the mean number of years was 3.97, whereas the schools in the current study had been implementing RtI for approximately 20 years. Similarly, the mean overall RtI implementation score in the current study was 79%, compared to 55% found in a prior study with a similar sample size that used an alternative RtI implementation integrity measure (Noltemeyer & Sansosti, 2012). Next, in the current study the multi-tiered system of support (MTSS) component appeared to be least associated with student reading outcomes related to oral reading fluency, which is inconsistent with prior research. For example, Noltemeyer and Sansosti (2012) found that academic implementation of MTSS was a significant predictor of student reading outcomes as measured by oral reading fluency. Finally, the current study is the first to examine a "total RtI score" that incorporated all components of RtI implementation. Taken together, the variation in RtI implementation scores across studies reinforces the notion that RtI implementation varies across schools, districts, and states.
Prior studies have suggested that a relationship exists between overall fidelity of RtI implementation and student reading outcomes (Noltemeyer & Sansosti, 2012), and between data-based decision making and student reading outcomes (Sharp, Sanders, Noltemeyer, Hoffman, & Boone, 2016). As discussed above, the results from the current study provide preliminary, mixed evidence regarding the relationship between RtI implementation and student reading outcomes. The mixed results may be due to factors other than the fidelity of RtI implementation, such as student demographic characteristics and teachers' experience implementing RtI. Further examination of the results also invites some exploratory speculation about fidelity of RtI implementation.
For example, one issue that emerged from the results is related to diversity, and it is described in the context of the relationship between schools 5 and 8 and the relationship between schools 4 and 5. First, school 5 and school 8 acquired similar ratings on the RtI implementation integrity measure, which were characterized by ratings that fell in the high implementation range for assessment and DBDM, and ratings that fell in the low implementation range for MTSS, infrastructure and support, fidelity and evaluation, and total RtI score. Further, these two schools received the lowest total RtI ratings compared to the other six schools. Considering the commonalities in the RtI ratings, it was expected that these schools would have similar student reading outcomes. However, student reading outcomes in school 8 were significantly higher than student reading outcomes in school 5 in grades one, three, and five. Further investigation into these differences revealed that these two schools varied greatly on demographic characteristics. For example, the lower performing school 5 had 64.6% students receiving free/reduced lunch compared to 34% in school 8. Similarly, school 5 had 30% nonwhite students compared to 6.6% in school 8. Two prior studies have explicitly examined the relationship between RtI implementation and student reading outcomes while controlling for demographic variables and found mixed results. More specifically, one study found that the percentage of economically disadvantaged students and the percentage of non-minority students were not significant predictors of student outcomes; however, two schools were removed from the sample to make the sample more homogeneous (Noltemeyer & Sansosti, 2012). In contrast, another study found that the percentage of economically disadvantaged students was a significant predictor of student outcomes accounting for more variability in scores than all RtI implementation components (Sharp et al., 2016).
Thus, it is important to consider whether the observed differences in reading outcomes between schools 5 and 8 might be attributable to differences in students' SES and diversity.
On a related note, school 4 and school 5 were the most diverse participating schools in the study. For example, school 5 had 64.6% students receiving free/reduced lunch compared to 61% at school 4. In addition, school 5 had 30% nonwhite students compared to 14.9% in school 4. The remaining six schools had between 4.9% and 6.6% nonwhite students, and between 23.9% and 47.9% students receiving free/reduced lunch, which demonstrates the greater diversity in schools 4 and 5 compared to the other schools. School 5 and school 4, however, differ in their fidelity of RtI implementation scores. School 4 earned higher ratings on all RtI components, and total RtI score, than school 5. Both schools fell in the high implementation range for assessment and DBDM, yet school 4 also fell in the high implementation range on the infrastructure and support component. Results from the study revealed that student reading outcomes in school 4 were significantly higher than student reading outcomes in school 5 in grades one and three. Despite the diversity in school 4, student reading outcomes were significantly higher than school 5. The results from the current study are consistent with prior studies suggesting that demographic variables should be accounted for when examining the relationship between RtI implementation and student outcomes.
The second issue that emerged from the results is related to grade level. Some of the differences in student outcomes found between elementary schools were consistent across multiple grade levels, and some were not. School 4 had significantly higher student reading outcomes than school 5 and school 6 in grade one and in grade three. School 1 had significantly higher student reading outcomes than school 5 in grade one, grade two, and grade four. School 8 had significantly higher student reading outcomes than school 5 in grade one, grade three, and grade five. Finally, school 7 had significantly higher student reading outcomes than school 5 in grade one, grade two, grade three, and grade five. The remaining significant differences in student reading outcomes between elementary schools emerged at only one grade level. It is unclear why the results are more robust, or consistent across grade levels, for some differences in student reading outcomes between schools and not others.
These data suggest that further investigation into the relationship between fidelity of RtI implementation and student reading outcomes, by grade level, should be addressed in more detail.
Another issue related to the current study is the use of oral reading fluency (ORF) as the dependent measure. There are limitations associated with using only ORF-based scores to evaluate student reading outcomes in relation to RtI. First, although ORF was developed as a measure of rate and accuracy (i.e., fluency), some researchers have found that teachers perceive it as a measure of accuracy and speed, or students' ability to read quickly in one minute (Deeney & Shim, 2016).
Second, some researchers have argued that oral reading fluency measures such as R-CBM (i.e., the measure used in the current study) do not measure the full construct of oral reading fluency because the measure does not directly address reading comprehension and expression (Hosp & Suchey, 2014). These are all important factors that impact students' ability to learn in school. In future research, other variables should also be considered in order to include more robust reading outcomes that could potentially capture the impact of the challenging and complex process of RtI implementation.
Overall, the results from the current study indicate that the relationship between fidelity of RtI implementation and student reading outcomes is not direct.
Many studies have shown a clear and direct relationship between student reading outcomes and reading instruction (Joseph, 2014). Although many studies have claimed that RtI-based practices improve student reading outcomes, those studies have not provided implementation integrity data, limiting the ability to conclude that improved student outcomes were a result of RtI implementation (Dexter, Hughes, & Farmer, 2008). It is also the case that RtI based educational service delivery is grounded in, ideally, the provision of high quality, differentiated instruction. Thus, it appears that other factors mediate and/or moderate the relationship between fidelity of RtI implementation and student reading outcomes related to oral reading fluency (ORF), and more research is needed to understand the circumstances and nuances that mediate/moderate the relationship. For example, use of other reading outcome variables such as reading comprehension may produce different results. This conclusion is more consistent with the extant literature base, which contends that the multilevel and systematic nature of RtI, with all of its moving parts, makes it challenging to examine as a coherent system (Keller-Margulis, 2012).

Limitations
Although this study yielded useful and important information with respect to the quality of RtI implementation and its relationship to student reading outcomes, there are limitations as well. Numerous limitations resulted from the use of an extant data set. First, a primary limitation was the small sample of schools (n = 8) available for analysis. With such a small sample size, there was limited variation in the quality of RtI implementation and, consequently, meaningful inferential analyses with RtI as the dependent variable could not be conducted. This means that differences in RtI implementation between schools could not be analyzed statistically. In addition, the schools with the highest levels of RtI implementation across the five components had the least amount of student reading outcome data available. As a result, a direct comparison of a high quality implementation group and a low quality implementation group was not achievable. Further, the lack of variation in the quality of RtI implementation impacted the reliability analysis for the RtI measure (see Appendix D).
A second limitation of the extant data set was the multiple (or inconsistent) levels of available information. More specifically, three levels of data were provided: school level RtI implementation information, individual student outcome information, and aggregated student demographic information. With the data only available at these different levels, the analyses that could be conducted were restricted. One limitation to the analysis was the inability to include demographic variables in the analysis because in order to enter the variables as covariates, the demographic information must be available at the same level as the dependent variable (i.e., student level). However, the dependent variables (i.e., ROI and WRC) were individual student level information and the demographic variables were aggregated student level information. Accounting for demographic variables would have improved the internal validity of the study by increasing the confidence in attributing differences in student outcomes to RtI implementation (Cook & Campbell, 1976). Although the student outcome data could have been aggregated (i.e., by calculating the mean) to match the level of the demographic information, the school sample size was too small to yield useable information, as there would have been only eight data points.
Another limitation that resulted from the different levels of information was the inability to conduct a regression analysis. This is because regression analysis requires the level of the data to be consistent across the independent variable/s and the dependent variable/s (Harlow, 2014). Unfortunately, no one measure is available to measure the full construct of reading fluency (Hosp & Suchey, 2014). Consequently, it may be beneficial to replicate the current study with the use of multiple dependent variables that reflect student reading outcomes.
Finally, the extant data set does not include a control for comparison. In the current study, a control could have manifested in two ways. First, schools with limited or no experience implementing RtI may have been used as a control group. Second, student reading outcome data prior to RtI implementation may have been used as a control. Examining prior student outcome data would have provided an estimate of student growth in the same schools before RtI was introduced and allowed for comparison of student growth before and during RtI implementation. As such, including some type of comparison would have increased the overall internal validity of the study because a direct comparison, such as the two mentioned above, would increase confidence that differences in student reading outcomes were a result of the RtI implementation rather than some alternative variable/s (Cook & Campbell, 1979).

Implications for Practice
RtI as a complete and coherent system. RtI is a comprehensive framework for providing multiple levels of service delivery to meet individual student needs.
Assessment, DBDM, MTSS, infrastructure and supports, and fidelity and evaluation have been identified as the five essential components that are critical to defining an RtI model (NCRTI, 2014). Consequently, all five components must be formally in place to attribute changes in student outcomes to implementation of RtI, and to apply RtI to special education eligibility decisions. However, there is no single method to implement RtI appropriately, and various models of RtI exist to fit the structure and culture of school districts and individual schools; some adaptation is permitted and inevitable when implementing programs or interventions in applied settings. The extant literature on RtI has focused on the implementation of the individual components of RtI; however, when implemented as designed, the crux of RtI is the relationship between all components being implemented well. For example, the information collected from assessments is used to inform decisions about student learning (DBDM), and decisions formulated from DBDM lead to movement between multiple tiers of student supports (MTSS). Further, infrastructure and supports either assist and encourage, or create obstacles for, implementing assessment, DBDM, and MTSS.
Arguably, the most important aspect of the RtI framework is fidelity and evaluation (i.e., treatment integrity). Fidelity and evaluation is the underlying piece that unifies the components. As RtI is a continuous cycle of information gathering, decision making, and evaluation, the effectiveness of the framework is reliant on the integrity of each step. Specifically, if the information gathered is not reliable and valid, then decisions about student learning are likely to be inaccurate, and as a result, the tiers of support or movement between tiers may not be appropriate. In other words, if the implementation fidelity of the process breaks down in any one component or aspect within RtI, then the entire system is likely to be flawed (VanDerHeyden et al., 2007). In this regard, a trend emerged in the current study supporting a positive relationship between fidelity and evaluation scores and student reading outcomes. Given the importance of fidelity of implementation to the effectiveness of the overall RtI framework, efforts should be made to support the integrity of implementation. Limited research is available to describe the best methods for supporting the implementation of school-wide programs such as RtI, so it is advantageous to consider how what has been learned from the intervention plan implementation research can be generalized and applied on a broader scale.

What do we know about intervention plan implementation in schools?
Implementing educational programs and interventions in schools is more complex than in a controlled research setting. Previous research has revealed that fidelity of implementation is reduced in applied settings, such as schools, and implementation of interventions is often poor and declines over time, resulting in varied levels of implementation (Noell & Gansle, 2016; Odom et al., 2010). Numerous conceptualizations for the measurement of fidelity of implementation of interventions or programs have been theorized and examined in applied settings (Noell & Gansle, 2016). Despite the rationale supporting more in-depth methods for measuring fidelity of implementation (i.e., quality, program differentiation), one of the more basic methods has received the most support. 'Procedural adherence' is the recommended method for measuring fidelity of implementation of interventions in the school setting because it has been found to be the most practical (i.e., reliable and feasible) and successful (i.e., related to changes in student achievement) method (Sanetti & Kratochwill, 2012). To collect procedural adherence data, an educator uses a predetermined list of discrete, observable steps of the intervention and determines whether each step occurred or did not occur during administration. The percentage of procedural steps completed is then calculated and determines the level of integrity to the intervention process. Mixed results have been found regarding the relationship between procedural adherence and student outcomes (i.e., academic and behavior). For example, it is likely that not all steps of an intervention are equally important, and the importance of the steps may vary over the course of implementation (Schulte, Easton, & Parker, 2009; Noell & Gansle, 2006). In another example, declines in procedural adherence have been found to reduce child compliance for some students while other students demonstrate growth (Leon et al., 2014).
Leon et al. (2014) indicated that variability in response to declines in procedural adherence exists across individuals. It is possible that some students require less procedural adherence than other students to benefit from an intervention; however, the underlying reason why that occurs is unclear. Further, the types of administration errors (i.e., omitting versus incorrectly administering) may impact the effectiveness of the intervention (Leon et al., 2014).
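The procedural adherence calculation described above (the percentage of checklist steps observed as completed) can be sketched as follows. This is a minimal illustration, not the scoring procedure of any cited study; the function name `procedural_adherence` and the five-step checklist are hypothetical.

```python
def procedural_adherence(steps_completed):
    """Return the percentage of intervention steps observed as completed.

    steps_completed: list of booleans, one per checklist step
    (True = step occurred, False = step omitted).
    """
    if not steps_completed:
        raise ValueError("checklist must contain at least one step")
    completed = sum(1 for done in steps_completed if done)
    return 100.0 * completed / len(steps_completed)


# Hypothetical observation: four of five steps occurred.
observed = [True, True, False, True, True]
adherence = procedural_adherence(observed)
print(f"Procedural adherence: {adherence:.1f}%")
```

Because every step counts equally in this percentage, the sketch also makes visible the concern raised above: omitting a critical step and omitting a minor step lower the score by the same amount.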
Given the variation in individual response to the same intervention, it is important for implementation to be monitored and supported to achieve the greatest success from the intervention. Research suggests that direct methods such as observation and permanent product review are more effective than indirect methods such as self-report to monitor intervention implementation (Sanetti & Kratochwill, 2012). Regarding support for implementation, performance feedback is the method with the most support in the literature for improving fidelity of intervention implementation (Noell & Gansle, 2016). Other methods, including increased training and the quality of the relationship between the practitioner and the student/s, have not been consistently supported in the literature (Noell & Gansle, 2016). Performance feedback involves direct measurement of the intervention implementation and a follow-up meeting to provide feedback regarding the procedural adherence to the intervention. In one study, Codding, Feinberg, Dunn, and Pace (2005) found that performance feedback was effective when used on a daily basis, and when the feedback was faded up to two weeks. In addition, to facilitate implementation of interventions in applied settings, resources such as time, space, personnel, and training need to be shifted to meet the needs of the intervention (Maltzman, 2016).
Based on the available information on fidelity of implementation of interventions or programs in schools, it is clear that fidelity of implementation is important, yet achieving high levels of fidelity is a complex undertaking. Research suggests that fidelity of implementation is essentially human behavior and, in schools, the behaviors of educators are maintained by the environment and context within which teaching occurs (Sanetti & Kratochwill, 2012; Maltzman, 2016).

Context of RtI implementation.
As a prevention and early intervention model, the RtI approach is predicated on the belief that student learning difficulties emanate from a mismatch between instructional activities/resources and student needs, and that all students can learn and make progress if provided with the appropriate supports (Burns & Gibbons, 2008). In addition, the implementation of RtI models is an evolving process that encourages flexibility in school systems and infrastructure to effectively meet the needs of students. Further, schools are busy and complex environments, and educators have an overwhelming number of responsibilities to accomplish on a day-to-day basis including the implementation of interventions. The moving parts of RtI implied in the research (i.e. evolving systems and infrastructure, numerous responsibilities) make it more challenging to define, measure, and evaluate implementation.
The current study applied the concept of fidelity of intervention or program implementation at a broader level (i.e., RtI framework) than that of individual interventions, and incorporated the context of RtI implementation in the evaluation.
Results suggested a general trend emerged supporting a positive relationship between integrity of infrastructure and supports and student reading outcomes, which supports the notion that systems level supports are necessary for successful RtI implementation and, more specifically, for the integrity of interventions implemented within all three tiers of RtI. In addition, the current study provides support that the fidelity of RtI implementation is context dependent and the supports that are most effective for one school may not be the most effective supports for another school. This is apparent in the varying levels of implementation within the same collaborative school district, which has the same infrastructure and support services available to the elementary schools. These results raise the question of whether collecting fidelity of implementation data at the broader level is feasible for typical schools and school personnel.
Feasibility and recommendations for collecting RtI fidelity of implementation data. The literature clearly demonstrates that fidelity of implementation is related to the success or failure of interventions or programs in schools (Sanetti & Kratochwill, 2010). Despite this knowledge, many questions remain unanswered regarding the most effective methods for supporting implementation. It is likely that questions remain unanswered because schools are complex behavioral systems, in which the implementation of interventions is one out of a number of competing demands. In contrast to a controlled research setting, educators are limited by the resources, infrastructure, and support systems available to them, which contributes to the necessity of implementation adaptation and flexibility that has been found to be inevitable in schools (APA, 2005). Given the intricacies of fidelity of intervention implementation at the individual and small group level, examining fidelity of implementation at the broader, RtI, level may be intimidating and may appear impractical in both research and practice.
Fidelity of implementation is arguably the most important aspect of the RtI framework because it can be, and should be, applied on a small scale as well as a large scale when RtI is implemented as designed. On a small scale, fidelity of implementation information can be collected for tier I whole class instruction, tier II group interventions, and tier III individualized interventions. Such information may also be useful in the monitoring of the data-based decision making (DBDM) process.
Examining the components of RtI and the total RtI process at a broad level not only allows schools to attribute changes in student outcomes to the implementation of RtI, it also informs schools with respect to how to improve educational level practices (Schulte, Easton, & Parker, 2009).
What teachers need to feel supported in implementing RtI. Based on the results from the current study, it is clear that teachers need support to implement RtI with fidelity. It is important that support mechanisms set a tone of encouragement and assistance, rather than one of evaluation and criticism. Teachers need strong leadership from administration and RtI teams (i.e., leadership teams, problem solving teams), as well as optimism about and acceptability of the RtI process. Implementing RtI fully and well involves a process that occurs over multiple years, and this understanding is crucial in helping school personnel engage in high quality implementation one step at a time. Teachers also need access to appropriate resources to assist with implementation. For example, it would be helpful for teachers to have a list of evidence-based practices for tier II and tier III interventions available for their use.
Teachers should not be expected to develop materials to carry out interventions in their classroom. It would also be helpful to have data specialists and intervention specialists to assist with the implementation of tier II and tier III interventions.
Unfortunately, it is too commonly seen in schools that general education teachers are expected to carry out these small group and individualized interventions while also managing their classrooms and providing appropriate instruction for large numbers of children. Support mechanisms should allow teachers to feel comfortable and confident with their role in the RtI implementation process, and not add to their numerous responsibilities in a classroom. Similarly, professional development opportunities should be led by the teachers and the school personnel who are actively engaged in the RtI implementation process. Professional development should be directly applicable to teachers' practices/roles and provide hands-on opportunities to practice newly learned skills. Finally, making mistakes and learning from mistakes should be encouraged in schools. Too often, observations of teachers are associated with evaluation and accountability, rather than with an opportunity to learn and improve.
Schools would benefit from creating a culture in which teachers feel comfortable trying new practices or techniques and one in which receiving feedback about their practices and implementation is viewed positively. Although these are some examples of how to support teachers, this is not an exhaustive list. It is important for researchers to learn more about the supports teachers are interested in and to examine the most effective ways to implement those supports in schools.

Future Directions
In the current study, RtI implementation in eight elementary schools was described and differences in student reading outcomes related to oral reading fluency (ORF) were discussed within the context of RtI implementation. Mixed results were found, as some of the data supported a trend indicating that greater RtI scores were related to greater student outcomes and some of the data suggested that greater student outcomes were not related to RtI scores. However, the sample size was small and the extant data set did not include the levels of information necessary to run an anticipated prediction model. As a result, the current study was not able to directly examine the relationship between the implementation of RtI components, and total RtI, and student reading outcomes. These relationships could be studied in future research with some adjustments to the data set. First, individual demographic information would allow the researcher to control for differences in student outcomes due to demographics. The current study had access to school-level demographic information; however, valuable information is lost when aggregating information that has substantial research supporting its relationship to student outcomes. Further, having this information available would allow researchers to disaggregate findings for subgroups and determine if the impact of RtI implementation varies by demographic characteristics.
Second, with a larger sample size, or a greater number of schools in the data set, multi-level modeling is another option for analyzing the data with the purpose of prediction. Multi-level modeling would additionally begin to inform whether some components are more important than other components within the RtI framework, or whether the importance of components is related to other factors such as demographic characteristics of the school. Third, to directly assess whether high implementation of RtI is more effective than low implementation of RtI, the data set should include greater variability across schools in total RtI implementation, as well as variability (i.e., numerous high implementation and numerous low implementation) between schools within each essential component. In the current study, all schools fell in the high implementation range on assessment and DBDM, limiting the variability on these components. Examining this topic with two, or three, levels of implementation and being able to directly compare them would be valuable to the literature and to the overall discussion of RtI effectiveness. Studying the varying levels within components as well will provide valuable information regarding whether some components are more important than others and if weighting the components is a better representation of the framework rather than treating all components equally when examining RtI.
Four research questions were presented in the proposed study. However, the data required to answer these questions were not available in the extant data set. As a result, the research questions were integrated and edited to reflect the available data.
The proposed research questions, as well as the edited questions, are listed below.
The first two research questions were integrated and analyzed together. In the proposed study, the second question intended to use an analysis of variance (i.e. ANOVA) to examine differences between schools on the RtI measure. However, the sample size (i.e. number of schools) was too small, and there was not enough variation in the ratings for the analysis to be meaningful (Rutherford, 2000). Instead, the current study qualitatively described the differences between the schools' levels of RtI implementation on the five subscales and the total RtI scale.
The third proposed research question intended to examine if RtI implementation predicted student outcomes, using multiple regression, and the fourth proposed question intended to examine if the predictors differed by grade level. To run a multiple regression analysis, the data for the predictor variables (i.e., RtI scale and subscales) and the dependent variables (i.e., student outcomes) must be at the same level. However, the extant data set provided varied levels of data, including student-level outcomes and aggregated school-level RtI ratings. Further, the researcher could not aggregate student outcomes (i.e., use the mean) to run the analysis because the sample size (i.e., eight schools) was too small. The rule of thumb for running a multiple regression is to have a minimum of 30 data points (Harlow, 2014), and the current study would only have eight data points. Additionally, not all grade levels were available at each school, which required the data to be analyzed by grade level. As a result, the third and fourth research questions were integrated, and modified to examine differences in student outcomes by school, and by grade level.
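The data-level constraint described above can be illustrated with a minimal sketch: collapsing student-level scores to school means leaves one data point per school, which falls well short of the 30-point rule of thumb cited in the text. The student scores, the school identifiers, and the function name `aggregate_by_school` are hypothetical and are not drawn from the study's data set.

```python
from collections import defaultdict
from statistics import mean

MIN_POINTS = 30  # rule of thumb cited in the text (Harlow, 2014)


def aggregate_by_school(records):
    """Collapse (school_id, score) student records into one mean per school."""
    by_school = defaultdict(list)
    for school_id, score in records:
        by_school[school_id].append(score)
    return {school: mean(scores) for school, scores in by_school.items()}


# Hypothetical student-level scores: three students in each of eight schools.
records = [
    (school, 50 + school + offset)
    for school in range(1, 9)
    for offset in (-5, 0, 5)
]

school_means = aggregate_by_school(records)
enough_for_regression = len(school_means) >= MIN_POINTS

print(len(records), "student-level records")
print(len(school_means), "school-level data points after aggregation")
print("Meets rule of thumb:", enough_for_regression)
```

However many students are sampled, the aggregated data set can never contain more rows than there are schools, which is why the school-level predictors could not be entered into a regression in this design.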
The proposed research questions:
1. To what extent are staff in elementary schools implementing the essential components of RtI with integrity (using an integrity rubric identified by the American Institutes for Research)?
2. To what extent are there differences between the integrity of implementation of the essential components of RtI, and overall RtI, as measured by the RtI Essential Components Worksheet between schools; and if differences exist, how can those differences be characterized?
The MTSS (rα= .843) and infrastructure and support (rα= .811) subscales fell in the good reliability range (Nunnally, 1967). Cronbach's alpha was not calculated for the fidelity and evaluation subscale because one of the two items was removed due to zero variance. Finally, Cronbach's alpha was calculated for total RtI, which included all 29 items, and fell in the acceptable reliability range (rα= .776).
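For readers unfamiliar with the statistic, Cronbach's alpha can be sketched with the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The ratings below are hypothetical and the function name `cronbach_alpha` is an illustrative assumption; this is not the software or data used in the study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows (one row of item scores per school)."""
    k = len(ratings[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across schools.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of each school's total score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: four schools rated on a three-item subscale.
ratings = [
    [3, 4, 3],
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}")
```

The sketch also shows why a two-item subscale cannot be scored once one item is dropped: with a single remaining item, k - 1 is zero and the statistic is undefined.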
Given the limitations in the first analysis, the researcher additionally examined the internal consistency of the measure using a sample of 15 schools, which included eight elementary schools, two middle schools, and six high schools, as a comparison with a larger sample and greater variation. The results are presented in the table below.
When all 15 schools were examined together, and there was greater variation in the ratings, the results started to approach more acceptable levels of reliability across all subscales. Although the internal consistency analysis was limited given the small sample size and the subscales with too few items, it is encouraging that in both samples, the total RtI fell in the high reliability range. The internal consistency of the RtI measure should be examined in future research with a larger sample size.
ANOVA) and the nonparametric equivalent test (i.e., Kruskal-Wallis test), for all variables that did not meet the normality assumption. Results from the Kruskal-Wallis test were consistent with the results from the ANOVA on all variables except grade 1 WRC and grade 4 ROI. Given that the skewness and kurtosis were acceptable, only some of the normal Q-Q plots moderately failed due to outliers, and the results for the two analyses were consistent, the researcher chose to report the ANOVA results for all variables except for grade 1 WRC and grade 4 ROI. In summary, grade 1 WRC and grade 4 ROI were analyzed using the Kruskal-Wallis test, and the remaining variables listed below were analyzed using a one-way ANOVA.
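The two test statistics compared above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the scores are hypothetical, the function names `anova_f` and `kruskal_h` are illustrative, the Kruskal-Wallis statistic is computed with average ranks but without a tie correction, and no p-values are produced. In practice a statistics package would be used for both tests.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic using average ranks (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values at sorted positions i..j-1 share the average of ranks i+1..j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)


# Hypothetical scores for three schools at one grade level.
schools = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print("F =", anova_f(schools))
print("H =", kruskal_h(schools))
```

The ANOVA works on the raw scores and is therefore sensitive to outliers, whereas the Kruskal-Wallis statistic depends only on ranks, which is why the two can disagree for variables with outlying values.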