Issues in the Design of Randomized Field Trials to Examine the Impact of Teacher Professional Development Interventions

Andrew Wayne, Kwang Suk Yoon, Stephanie Cronen, and Michael S. Garet
American Institutes for Research, Washington, DC

Pei Zhu
MDRC, New York, NY

Prepared for presentation at the annual meeting of the Association for Public Policy Analysis and Management (APPAM), November 10, 2007

For comment only. Do not quote without the express written permission of the authors. [email protected] [email protected] [email protected] [email protected] [email protected]


INTRODUCTION

One strategy for improving student achievement is to provide teachers with specialized training. In addition to training provided prior to the start of the teaching career (i.e., preservice training), teachers also receive training during their careers. This strategy is called in-service training or professional development (PD). Today, virtually all public school teachers participate in professional development programs each year (Choy, Chen, and Bugarin, 2006, p. 47). Professional development programs are also a focus in NCLB, under which states and districts spent approximately $1.5 billion on teacher professional development in 2004-05 (Birman et al., 2007).

Although NCLB provides resources for teacher professional development, the broader effect of NCLB has been to promote accountability for student achievement. NCLB also encourages school districts to adopt programs and practices that have been proven effective by scientifically based research. There is thus a significant need for rigorous studies of the effect of teacher professional development on student achievement. Sponsors of professional development initiatives, such as the National Science Foundation, are particularly eager to find ways to evaluate the effects that their professional development programs have on student achievement. As we note later, the existing literature demonstrates that carefully constructed professional development, delivered on a small scale, can have an effect on student achievement. However, many question the effectiveness of typical professional development and are skeptical about whether PD as currently practiced can improve achievement in challenging school contexts.

This paper provides a discussion of issues that must be confronted in designing studies of the impact of teacher professional development interventions using randomized controlled trials. Although not the only method for studying impact, randomized controlled trials address the problem of selection bias – the problem that those "selected" to receive an intervention are often different from those not receiving it. Selection bias is particularly problematic for those studying professional development. Many districts have professional development offerings that they expect only a small cadre of ambitious teachers will join. To complicate matters further, professional development is often mandated for all teachers in schools that have been identified under accountability systems. Clearly, those participating in the mandated intervention are likely to be different from those not participating.

Researchers designing new randomized controlled trials focused on teacher professional development interventions face a common set of methodological challenges. This paper will begin with a review of recently completed studies of teacher professional development. The paper will then discuss a series of design issues associated with using randomized controlled trials to study teacher professional development. Because there is already a significant methodological literature on experimental design, the focus of this paper is on the design issues that arise when applying experimental methods to the study of teacher professional development in particular. This paper is of course not meant to be a comprehensive guide to designing rigorous studies. Instead, it focuses on the issues that are especially challenging in designing and implementing randomized controlled trials of professional development interventions.

EXISTING STUDIES OF THE IMPACT OF TEACHER PROFESSIONAL DEVELOPMENT

In her presidential address to the American Educational Research Association, Borko (2004) described the extensive literature on professional development and teacher learning, and she noted the progress that had been made in rigorous studies of the impact of professional development. She stated, "We have evidence that professional development can lead to improvements in instructional practices and student learning." As we discuss in this section, the field has progressed only somewhat since her address.

Kennedy's literature review (1998) was perhaps the first widely circulated review of empirical evidence on the effectiveness of professional development on student achievement. She analyzed a number of studies of math and science PD programs and their effects on student outcomes. In order to identify features of effective professional development programs, she categorized the programs in several ways. She found that the relevance of the content of the PD was particularly important. She classified in-service programs into four groups according to the levels of prescriptiveness and specificity of the content they provide to teachers.1 Based on her analysis of effect sizes, Kennedy (1998) concluded: "Programs whose content focused mainly on teachers' behaviors demonstrated smaller influences on student learning than did programs whose content focused on teachers' knowledge of the subject, on the curriculum, or on how students learn the subject" (p. 18). Kennedy's literature review shed light on the crucial role of content emphasis in high-quality, effective PD. Her seminal work prompted others to test this hypothesis in their subsequent studies (cf. Desimone, Porter, Garet, Yoon, & Birman, 2002; Garet et al., 2001; Yoon, Garet, & Birman, 2007).

Footnote 1: In Kennedy's review, Group 1 professional development programs focused on activities that prescribed a set of teaching behaviors expected to apply generically to all school subjects (e.g., Stevens & Slavin, 1995). Group 2 PD activities also prescribed a set of generic teaching behaviors, but were proffered as applying to one particular school subject (e.g., Good, Grouws, & Ebmeier, 1983). Group 3 PD activities provided general guidance on both curriculum and pedagogy for teaching a particular subject and justified their recommended practices with reference to knowledge about how students learn that subject (e.g., Wood & Sellers, 1996). Lastly, Group 4 PD programs provided knowledge about how students learn particular subject matter but did not provide specific guidance on the practices that should be used to teach that subject (e.g., Carpenter et al., 1989).


Building on the literature reviews by Kennedy and by Clewell, Campbell, & Perlman (2004), Yoon et al. (2007) recently conducted a more systematic and comprehensive review. Yoon et al. examined studies of impacts in three core academic subjects (reading, mathematics, and science). They focused the review on types of studies presumed to provide strong and valid causal inferences: that is, randomized controlled trials (RCTs) and quasi-experimental design (QED) studies. Nine studies emerged as meeting the What Works Clearinghouse evidence standards, from 132 manuscripts identified as relevant.2 All nine studies were focused on elementary school teachers and their students. Five studies were RCTs that met evidence standards without reservations. The remaining four studies met evidence standards with reservations (one RCT with a group equivalence problem and three QEDs).

Pooling across the three content areas included in the review, the average overall effect size was 0.54. Notably, the effect size was reasonably consistent across studies in which the PD took different forms and had different content. However, the small number of qualifying studies limited the variability in the PD that was represented. For example, all PD interventions in the nine studies were provided directly to teachers (rather than using a train-the-trainer approach) and were delivered in the form of workshops or summer institutes by the author(s) or their affiliated researchers. In addition, the studies involved small numbers of teachers, ranging from 5 to 44, often clustered in a few schools. In general, these might be viewed as efficacy trials, testing the impact of PD in small, controlled settings, in contrast to effectiveness trials, which test interventions on a larger scale, in more varied settings.

Given the modest variation in the features of the studies that qualified to be included in their review, Yoon et al. were unable to draw any strong conclusions about the distinguishing features of effective professional development – and especially about the role of content focus. The studies they reviewed did suggest that the duration or "dosage" of PD may be related to impact. The average number of contact hours was about 48 hours across the nine studies, and the three studies that provided the least intensive PD (ranging from 5 to 14 hours over the span of two to three and a half months) produced no statistically significant effect.3

Several larger scale studies of the impact of PD on student achievement are currently underway, and, when their results become available, they should add appreciably to the available research base.

Footnote 2: At the initial stage, over 1,300 manuscripts were screened for their relevance, including topic, study design, sample, and recency. After the prescreening process, only 132 studies were determined to be relevant for a subsequent thorough coding process. In this coding stage, each study under review was given one of three possible ratings in accordance with the WWC's technical guidelines: "Meets Evidence Standards" (e.g., randomized controlled trials that provided the strongest evidence of causal validity), "Meets Evidence Standards with Reservations" (e.g., quasi-experimental studies or randomized controlled trials that had problems with randomization, attrition, teacher-intervention confound, or disruption), and "Does Not Meet Evidence Screens" (e.g., studies that did not provide strong evidence of causal validity). The WWC's technical working papers are found at: http://ies.ed.gov/ncee/wwc/twp.asp.

Footnote 3: The remaining six studies provided more intensive PD, with contact hours ranging from 30 to more than 100. With the exception of one study with 83 contact hours, all of them resulted in significant effects on student achievement.


The ongoing studies include a study of the impact of PD in second grade reading involving about 270 teachers and a study of the impact of PD in seventh grade mathematics involving about 200 teachers, both conducted by the authors of the current paper and their colleagues. Other ongoing studies include a study of the impact of science PD in Los Angeles, conducted by Gamoran, Borman, and their colleagues at the University of Wisconsin, and a study of science PD conducted in Detroit by Fishman and his colleagues at the University of Michigan.

In sum, while there has been some progress since Borko's 2004 address, much remains to be done. Evidence is accumulating that PD can be effective in different content areas, but more work is needed, especially on some thorny design issues. We summarize some of these issues broadly in the next section, and then turn to a more detailed discussion of four issues that we believe are particularly important.

ISSUES IN STUDYING THE IMPACT OF PROFESSIONAL DEVELOPMENT

In this section, as a prelude to a discussion of specific design issues, we lay out an overall model that calls attention to some of the main design decisions that must be made in planning a study of the impact of PD. (See Exhibit 1.)

The selection or creation of the professional development intervention to be tested is of course a key design issue and is the first issue we discuss. The key attributes of the intervention appear in the Intervention box at the far left of the model shown in Exhibit 1. Generally, the PD intervention is a multidimensional package of intertwined elements, including the content focus and structure, as well as the delivery model. The intervention being tested may also include other, non-PD components – for example, a new textbook, curriculum modules, or other materials. Saxe et al. (2001), for example, tested the impact of PD in conjunction with the adoption of a new textbook, comparing three groups: teachers using a traditional text, teachers using a reform text without PD, and teachers adopting a reform text with PD. In their study, PD is viewed as a support for teachers' mastery of new materials: "Although good curriculum materials can provide rich tasks and activities that support students' mathematical investigations, such materials may not be sufficient to enable deep changes in instructional practice… professional development strategies are designed to support teachers' efforts to transform their practices" (p. 56). These additional components appear as part of the Intervention box in Exhibit 1.

A second key design decision involves the context in which the study will be carried out, and this appears in the "Study Context" box at the right of the diagram. Among other things, the setting determines the level and type of PD that occurs "naturally" among teachers in the sample. The setting also determines the curricula that will be in use in the sample classrooms, which is important since PD interventions may interact with curricula. (Note that when curricula are part of the treatment itself, they would be represented in the Intervention box rather than the Study Context box.)

Exhibit 1: A model of design of PD intervention study

[Exhibit 1 is a diagram. Within the Study Context, the Intervention (PD features: focus, structure, other; other features: unit of intervention, non-PD components) acts on the Study Sample, which comprises a treatment group and a control group. At Time 1 and Time 2, the causal chain runs from Teacher Knowledge to Teacher Practice to Student Achievement; teacher retention, student retention, and teacher and student characteristics are also represented.]

The determination of the study sample is the third issue we consider in some detail. Exhibit 1 depicts the Study Sample within the Study Context as two boxes – one for the treatment group, in the foreground, and one for the control group, in the background. Retention of teachers and students in the sample, as shown in Exhibit 1, is a significant consideration.

The selection of measures is the final design issue we consider. The selection of measures of course depends on the assumed causal chain leading from participation in PD to improved student achievement. Exhibit 1 shows one potential causal chain, leading from teacher knowledge through teacher practice to student achievement. Presumably, participating in PD will influence teachers' knowledge and instructional practice, which in turn will improve student achievement.4 We argue that it is critical to measure variables at each stage in the causal chain.

Footnote 4: As Kennedy (1998) argues, PD may operate primarily by improving teacher content knowledge, which will improve teachers' selection and use of practices, but the practices are not a direct focus of the PD. Or, the PD may operate primarily by improving teachers' skills in implementing specific instructional practices.


In addition, the effects of the PD on teacher and student outcomes will unfold over time, and thus a key measurement decision concerns the timing and frequency of measurement. For simplicity, Exhibit 1 shows two time points. It is quite likely that the number of waves of measurement will need to be greater than two.

DESIGN ISSUES

Researchers designing randomized controlled trials focused on teacher professional development interventions face a common set of methodological issues. In this section, we discuss the tradeoffs inherent in each issue. The resolution of these issues depends in part on the particular intervention selected and the resources available, so the discussion is meant to raise points to consider rather than provide definitive answers. We organize the discussion under four broad subheadings.

Design issue #1: What treatment will be studied?

One challenge in studying the impact of PD is that an intervention rests on at least two theories: a theory about the features of PD that will promote change in teacher knowledge and/or teacher practice (a "theory of teacher change"), and a theory about how particular classroom instructional practices will lead to improvements in student achievement (a "theory of instruction"). (For a related argument, see Supovitz, 2001.)

With respect to a theory of the features of PD that promote teacher change, there is a relatively large literature (Garet, Porter, Desimone, Birman, & Yoon, 2001; Guskey, 2003; Hawley & Valli, 1998; Kennedy, 1998; Little, 1993; Loucks-Horsley, Hewson, Love, & Stiles, 1998; National Commission on Teaching and America's Future, 1996; Showers, Joyce, & Bennett, 1987; Wilson & Berne, 1999), but little evidence on the impact of specific features on teacher knowledge, teacher practice, or student achievement. Nevertheless, over the years, a consensus has been built on promising "best practices." For example, it is generally accepted that intensive, sustained, job-embedded professional development focused on the content of the subject that teachers teach is more likely to improve teacher knowledge, classroom instruction, and student achievement. Further, active learning, coherence, and collective participation have also been suggested to be promising "best practices" in professional development (Garet et al., 2001). While these features have only limited evidence to support them, they provide a starting point for the design of the form and delivery of PD, and many recent studies of PD interventions appear to draw on features of this sort.

In contrast to the emerging consensus on the features of PD worth testing, there is much less consensus on theories of instruction. The PD interventions tested in recent studies differ widely in the theory of instruction being tested. Some interventions focus on a specific set of instructional practices. For example, Sloan (1993) randomly assigned teachers to a treatment that sought to elicit teaching behaviors associated with the direct instruction model—specifically, Madeline Hunter's "seven steps of the teaching act."


Participating elementary teachers were expected to use these practices in teaching all subjects. Other PD interventions focus on building a teacher's knowledge of a content area or of student thinking (see, e.g., Carpenter et al., 1989), with the expectation that this increase in knowledge will lead to improvements in the quality of teaching more generally. These PD interventions are less prescriptive with respect to instruction but still embed theories of instructional improvement.

All of this is to say that studies of PD interventions are tests of a package—a package that inevitably draws on both a theory of teacher change and a theory of instruction. The fact that PD packages draw on two theories can make negative results difficult to interpret. If Sloan (1993) had found no effect, we would not know whether the flaw was in the direct instruction model or in the teacher change model, which in this case was a 30-hour treatment that included summer sessions and seven follow-up meetings, spread out over a period of 6 months.5

Researchers interested in "unpacking" the packages of PD or understanding what makes PD effective may want to go beyond studying a single PD intervention. Researchers can instead specify two versions of their PD. For example, the Study of Professional Development in Reading randomly assigns schools to a control group and one of two treatment groups. Both treatments include the same core PD program of summer training and school-year seminars, but the second treatment adds a coaching component, provided by part-time coaches based at each school. The study will show whether or not the coaching component makes a difference. Such "three-arm" studies are a promising way to add to the knowledge base, since one can test the importance of specific PD components. Alternatively, researchers can define programs that share the same underlying theory of instruction but use very different PD delivery mechanisms. The most serious constraint in selecting the treatments for such "three-arm" studies is that the PD received by the two treatment groups needs to differ enough to result in measurable differences in student achievement. (See issue #3, below.)

Apart from specifying the features of the PD intervention, a related issue concerns the available evidence in support of the intervention. If no prior research has been done on the proposed intervention, it seems most useful to engage in a small-scale trial, testing whether the intervention can be implemented as anticipated and can achieve results in controlled settings. If at least some prior evidence of efficacy is available for a particular PD intervention, testing the intervention in different contexts may be especially valuable (e.g., Carpenter et al., 1989; Good, Grouws, & Ebmeier, 1983; Saxe, Gearhart, & Nasir, 2001).

Footnote 5: Sloan (1993) reports statistically significant effects on student achievement outcome measures. In a subsequent analysis of Sloan's findings, Yoon et al. (forthcoming) adjusted for clustering and for tests on multiple outcomes and found that impacts on student achievement were large enough to be substantively important but were not statistically significant.


Such "replication trials" are likely to be especially valuable when previous trials were implemented by a single trainer, with small numbers of volunteer teachers, or in a single district context. Or, if there is reasonable evidence on efficacy, one may move to an effectiveness trial, testing the impact of the intervention across a wider set of contexts. Here, the main design challenges are likely to involve ways of scaling up the PD so that it can be delivered with fidelity in multiple settings with different facilitators. In many of the available rigorous efficacy trials, the PD was delivered by a single individual, often the primary researcher.

Design issue #2: Where will the professional development be studied?

Depending on sample size requirements—which are discussed below under issue #3—one may conduct a study in several schools and several school districts. Selecting the specific schools and districts requires some review of the fit between the professional development and the features of those locations. In this section, we consider some of the issues involved in selecting the context in which the professional development will be studied. Obviously, the locations have to have suitable administrative conditions, where the professional development can be carried out as specified, but there are two other features to which researchers must pay close attention.

First, it is necessary to consider the curricular context and find locations with curricula for which the PD is suited.6 Curricula, like PD programs, embed specific theories of instruction. Ideally, it is sensible to seek locations that use curricula that align with the underlying theory of instruction embedded in the PD. At the very least, it is necessary to choose locations where the curricula would not discourage the practices promoted by the PD.

Second, it is important to examine the PD that already exists in the districts being considered. It is inevitable that during the study, the teachers in the study will receive other PD. As noted earlier, teachers participate in a variety of professional development activities each year, due to mandates, incentives, or personal initiative. Teachers may be part of informal groups at their schools that serve professional development needs. Teachers will presumably continue to participate in all of these PD experiences regardless of their status in the study, except to the extent researchers are able to negotiate special arrangements.

The existence of such PD—what we call ambient PD—concerns researchers because an RFT measures impact as the difference in outcomes between the treatment group and the control group.

Footnote 6: The requirements placed on the context will differ if new materials are to be introduced along with the PD; in that case, one may need to consider the contrast between the curriculum in use and the new materials to be adopted.


If the ambient PD contains elements in common with the study PD, the impact will appear to be smaller. For instance, suppose some researchers want to study a 10-day program of content-focused professional development for geometry teachers. If the district in which the study took place provided all geometry teachers with two days of workshops on geometry, the teachers in the treatment group would probably not learn as much during their 10-day PD program as they would have otherwise. Assuming that it is impossible to dissuade the district from providing its 2-day PD, the only way to avoid such a problem is to conduct the study in districts where the ambient PD has no elements in common with the PD to be studied.

Another potential concern with ambient PD is whether teachers can attend both the ambient PD and the study PD. Ideally, the teachers in the treatment group will receive all of the study PD and will continue to receive the ambient PD, such that the study will show the value added of the PD treatment that is being randomly assigned. But if the treatment is time-intensive or if a district has a lot of ambient PD, scheduling conflicts could occur, or teachers could begin to feel "overloaded" and selectively not attend some PD events. Thus, treatment group teachers might not get some PD that they were supposed to get. The treatment-control difference would be distorted, since the treatment group would be receiving less ambient PD than the control group. Alternatively, treatment group teachers might decide to attend the ambient PD in lieu of attending the study PD.

Thus, it is important to select locations where the chance of interference with the PD researchers wish to study is low; alternatively, it may be wise to construct treatments that can be delivered without interfering significantly with other PD. It is also necessary to find locations where the ambient PD does not share elements in common with the study PD. Regardless, the extent and variability of ambient PD clearly necessitate the measurement of teacher participation in the ambient PD – to verify that treatment and control teachers participated equally and that the measured impact is roughly the impact of the treatment PD. These measures can characterize the service contrast between treatment and control teachers very clearly (e.g., dosage in terms of contact hours or duration).
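For instance, once contact hours in both the study PD and the ambient PD have been logged for every teacher, the service contrast can be summarized directly. The sketch below is illustrative only; the file name and column names (condition, study_hours, ambient_hours) are hypothetical and not drawn from the studies described here.

```python
import pandas as pd

# Hypothetical teacher-level log: one row per teacher, with total contact
# hours recorded separately for the study PD and the ambient (district) PD.
pd_log = pd.read_csv("teacher_pd_hours.csv")  # columns: condition, study_hours, ambient_hours

# Mean hours by condition: ambient hours should be roughly equal across groups,
# and the treatment-control gap in total hours is the service contrast.
summary = pd_log.groupby("condition")[["study_hours", "ambient_hours"]].mean()
summary["total_hours"] = summary["study_hours"] + summary["ambient_hours"]
print(summary)
print("Service contrast (hours):",
      summary.loc["treatment", "total_hours"] - summary.loc["control", "total_hours"])
```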

Design issue #3: What sample will be needed?

Another design issue concerns the determination of the sample that will be needed for the intended study. Randomized controlled trials of professional development interventions differ from many other educational experiments in the sense that the professional development interventions target teachers, while the eventual outcome of interest—student achievement—is measured and collected at the student level. Trials like these pose some new challenges and difficulties for sample design (Donner, 1998). This section discusses some of these challenges that are specific to the evaluation of professional development interventions. In particular, it discusses issues related to the unit of random assignment, the unit of analysis, the precision of the estimators, and teacher/student mobility during the program.


Unit of Random Assignment

As mentioned above, professional development interventions are designed to directly affect the behavior of groups of interrelated people (teachers) and indirectly affect another set of groups of interrelated people (classes of students), rather than individuals. Therefore it is generally not feasible to measure the effectiveness of PD in an experiment that randomly assigns individual students to the program or to a control group. However, one can reap most of the methodological benefits of random assignment by randomizing at the level of teachers or schools.7 Under teacher-level assignment, teachers within each school are randomly assigned to a treatment condition; under school-level random assignment, whole schools (and the relevant teachers within them) are randomly assigned.

The choice of unit of assignment has important implications for the set of teachers and their students for whom impact is examined. If the teacher is selected as the unit, then each teacher included in the study would be assigned to treatment or control at the start of the study, and the teacher (and the teacher's students) would be followed up during each wave of data collection. If, on the other hand, the school is selected as the unit, then each school included in the study would be assigned to treatment or control at the start of the study, and all relevant teachers currently teaching in each school (and their students) would be the target of data collection at each wave.

In the absence of teacher turnover, both the teacher-level and school-level designs focus on teachers identified for inclusion in the study at the time the study begins. But, in the presence of teacher turnover, the two choices for the unit of assignment lead to different teacher samples over time. The teacher-level design involves following teachers as long as they remain in teaching, and collecting data on their knowledge and their students' achievement at the time points specified in the design. Teachers who exit teaching or who change grade levels or subjects must be dropped from the study, because data on classroom instruction and student achievement could not be collected from such teachers, even if the teachers could be located. Thus, in the teacher-level design, if the PD treatment affects teacher turnover rates, mobility could lead to selection bias, and this bias would need to be taken into account in the analysis.

In the school-level design, if a teacher leaves a school in the sample over the course of the study and another teacher enters, then the new replacement teacher would be included in the study sample from that point forward. Thus, in a school-level design, mobility can cause the impact of the PD to be diluted.

Footnote 7: The use of group randomization to study the impacts of social policies has a long history. Harold F. Gosnell (1927) studied ways to increase voter turnout by randomly assigning one part of each of twelve local districts in Chicago as targets for a series of hortatory mailings and the other part as a control group. In recent years, the use of group randomization has spread to many fields (for a review and discussion of the key issues, see Boruch and Foley 2000). Over the past decades, it has been used to evaluate "whole-school" reforms (Cook, Hunt, and Murphy 2000), school-based teacher training programs (Blank et al. 2002), community health promotion campaigns (Murray, Hannan, et al. 1994), community employment initiatives (Bloom and Riccio 2002), and group medical practice interventions (Leviton et al. 1999). The two textbooks on cluster randomization published to date, both of which focus on evaluating health programs, are by Allan Donner and Neil Klar (2000) and David M. Murray (1998).


For example, if a treatment teacher left in the middle of a program year, after the PD treatment was complete, and his/her class was handed over to a new teacher who had not been exposed to the professional development intervention, then the amount of exposure to treatment that the students in this class had would be cut in half. The treatment impact estimated from the achievement data for this class of students would reflect the impact of only half of the intended treatment, and is likely to be different from what the students would have experienced had the treatment teacher stayed for the whole program year. This suggests that in a school-level design, it may be desirable to incorporate components in the intervention to provide support for new teachers who enter treatment schools over the course of the study (i.e., supplemental PD).8

Other issues concerning teacher and school as unit

Apart from the implications for the analysis of teacher mobility, there are other pros and cons associated with choosing the school or the teacher as the unit of assignment. Using the school as the random assignment unit helps to reduce control-group contamination. One of the greatest threats to the methodological integrity of a random-assignment research design is the possibility that some control-group members will be exposed to the program, thus reducing the service contrast – the difference between the group receiving the intervention (or the treatment group) and the "business as usual" group (or the control group). Such contamination of the control group is especially likely in the case of professional development interventions. Since many of these interventions incorporate collaboration among teachers at a given grade level, if some teachers in a school are randomly assigned to an intervention, they are likely to share some of the information provided through the intervention with peers who have been randomly assigned to the control group. This second-hand exposure will attenuate the service contrast and make it difficult to interpret impact estimates. By randomly assigning schools to treatment conditions, one separates the treatment group from the control group spatially, which blocks some potential paths of information flow and reduces control-group contamination.

Using the school as the unit of assignment may also help to deliver the services more effectively by capitalizing on economies of spatial concentration. Spatial concentration of the target-group members may reduce the time and money costs of transportation to and from the program, and may enable staff to operate the program more effectively by exposing them directly to problems. For example, when a professional development intervention involves intensive coaching, it certainly reduces costs if a coach needs to travel to one school to work with four teachers in that school rather than travel to two different schools to work with two teachers in each of them. School-level randomization enables the coach to spend more time in the school and get exposure to the common problems in that school and thus adjust the delivery of services to fit the school's needs more effectively.

Footnote 8: In a school-level design, turnover would not ordinarily lead to selection bias even if there is differential turnover between treatment and control schools, because all teachers teaching in the schools would be included in the analysis, but the estimated impact would be due to a combination of the impact of the treatment on achievement for teachers who stay and the impact on turnover.


Another very different reason for randomly assigning schools is to facilitate political acceptance of randomization by offsetting ethical concerns about "equal treatment." Even though random assignment of teachers treats individual teachers equally in the sense that each one of them has an equal chance of being offered the program, this fact is often overlooked because after randomization treatment group teachers have access to the intervention while control group teachers do not. This perception is especially acute if, within a school, some teachers receive the intervention and some do not. Therefore, school-level randomization is generally easier to "sell" than teacher-level randomization.

On the other hand, using teachers as the unit of assignment requires fewer participating schools and teachers for a given level of precision. It is well documented that estimates based on cluster randomization have less statistical precision than those based on individual randomization, because possible correlations of impacts across individuals within a cluster have to be accounted for in cluster randomization (Bloom 2005, Murray 1994). By analogy, for a given data structure, estimates based on school-level (cluster) randomization have less statistical precision than those based on teacher-level (individual) randomization, because possible correlations of impacts across teachers within a school have to be accounted for if schools are being randomized.9 In other words, fewer teachers are required in total to detect an impact of a given size with a given level of precision if the randomization is done at the teacher level instead of the school level. Note, however, that the difference in precision between these two options depends on the specific outcomes and analytical methods used in the evaluation and can vary from program to program. Nonetheless, a reduction in required sample size saves money and effort in the evaluation process.

Overall, whether to use the school or the teacher as the randomization unit depends on the specific features of the intervention being tested. If control-group contamination is not a major concern given the nature of the intervention, or there are other effective ways to prevent contamination from happening, and at the same time monetary constraints are important, then using the teacher as the randomization unit might serve the purposes of the program better. On the other hand, if the nature of the intervention is prone to contamination and the budget for the program is enough to support an evaluation with adequate power, then using the school as the randomization unit would be preferable.

Footnote 9: The standard error of the impact estimator for a cluster-randomized program (when no covariates other than treatment status are included) is

$SE_{CL} = \sqrt{\frac{1}{P(1-P)}\left(\frac{\tau^2}{J} + \frac{\sigma^2}{nJ}\right)}$,

while the standard error of the impact estimator for an individually randomized program is

$SE_{IN} = \sqrt{\frac{1}{P(1-P)}\left(\frac{\tau^2}{nJ} + \frac{\sigma^2}{nJ}\right)}$.

Here $P$ is the proportion of clusters (or individuals) that are randomly assigned to the treatment group; $J$ is the number of clusters; $n$ is the number of individuals within a cluster; $\tau^2$ is the cross-cluster variance; and $\sigma^2$ is the within-cluster variance (Bloom 2005). The proportion of the total population variance that lies across clusters as opposed to within clusters, $\frac{\tau^2}{\tau^2 + \sigma^2}$, is usually called the intra-class correlation (Fisher 1925).
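To illustrate the precision penalty implied by these formulas, the following sketch evaluates both standard errors for one hypothetical configuration; the number of schools, teachers per school, and intra-class correlation are illustrative assumptions, not values from the paper.

```python
import math

def se_cluster(tau2, sigma2, J, n, P=0.5):
    """Standard error of the impact estimator under cluster (e.g., school-level)
    random assignment, following the first formula in footnote 9 (Bloom 2005)."""
    return math.sqrt((tau2 / J + sigma2 / (n * J)) / (P * (1 - P)))

def se_individual(tau2, sigma2, J, n, P=0.5):
    """Standard error under individual (e.g., teacher-level) random assignment
    of the same J * n units."""
    return math.sqrt((tau2 / (n * J) + sigma2 / (n * J)) / (P * (1 - P)))

# Illustrative values: 40 schools, 4 teachers per school, and an intra-class
# correlation of 0.15 for the outcome (total variance normalized to 1).
icc = 0.15
tau2, sigma2 = icc, 1 - icc
J, n = 40, 4

se_cl = se_cluster(tau2, sigma2, J, n)
se_in = se_individual(tau2, sigma2, J, n)
print(f"SE under school-level assignment:  {se_cl:.3f}")
print(f"SE under teacher-level assignment: {se_in:.3f}")
print(f"Precision penalty (ratio):         {se_cl / se_in:.2f}")
```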

Unit of Analysis

As noted by McKinlay et al. (1989), Fisher's classical theory of experimental design assumes, without exception, that the randomization unit of an experiment is the unit of analysis. However, this is not true for cluster randomization trials because, as discussed before, the inferences of these trials are often intended to apply at the level of the individual subject. Most professional development intervention programs that use the school or the teacher as the unit of randomization and intend to estimate program impacts on student outcomes fall into this category. For these studies, the unit of analysis is often different from the unit of randomization. For example, a school-level randomized professional development trial can use the individual teacher as the unit of analysis if it is interested in estimating the program impacts on teacher behavior, or it can use the individual student as the unit of analysis if the eventual objective of the program is to improve student achievement.

This discrepancy between the unit of randomization and the analytic unit implies that standard statistical methods for sample size calculation and for impact analysis are not applicable. It is well known, for example, that the application of standard sample size approaches to cluster randomization designs may lead to seriously underpowered studies because those approaches ignore the intra-class correlation, and that the application of standard statistical methods could lead to spurious statistical significance for the same reason (Donner 1998). There now exist methods and tools that deal with these problems (for example, the MIXED procedure in SAS can adjust standard errors to reflect the nested structure of the data). It is therefore very important for the design and the analysis of a program to clearly identify the unit of analysis early on.
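Because the unit of randomization sits above the unit of analysis, the impact model needs to account for the nesting of students (or teachers) within randomized clusters; the SAS MIXED procedure mentioned above is one such tool. The sketch below shows a roughly equivalent multilevel model in Python with statsmodels; the data file and column names (achievement, pretest, treatment, school_id) are hypothetical stand-ins for whatever student-level file a particular study assembles.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student-level file: one row per student, with the school's
# randomly assigned condition (0/1) attached to every student in that school.
students = pd.read_csv("student_outcomes.csv")

# A random intercept for school absorbs the cross-cluster variance, so the
# standard error on `treatment` reflects school-level random assignment
# rather than treating students as independent observations.
model = smf.mixedlm(
    "achievement ~ treatment + pretest",  # prior achievement improves precision
    data=students,
    groups=students["school_id"],
)
result = model.fit()
print(result.summary())
```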

Precision of Impact Estimators

Another important design element is the precision of the estimated impacts of a study. The precision of impact is often expressed in terms of the smallest program effect that could be detected with confidence, or the minimum detectable effect (MDE). A minimum detectable effect is the smallest true program effect that has a certain probability of producing an impact estimate that is statistically significant at a given level (Bloom 1995). This parameter, which is a multiple of the impact estimator's standard error (see the first formula in footnote 9), depends on the following factors:

• The type of test to be performed: a one-tailed t-test is used for program impacts in the predicted direction; a two-tailed t-test can be used for any program effects;
• The level of statistical significance to which the result of this test will be compared (α);
• The desired statistical power (1 − β)—the probability of detecting a true effect of a given size or larger;
• The degrees of freedom of the test, which depend on the number of clusters (J) and the size of the clusters;
• The intra-class correlation—the proportion of the total population variance across clusters as opposed to within clusters; and
• The explanatory power of potential covariates, such as students' prior achievement, teacher characteristics, etc. (Bloom 2005).

The first three factors have their conventional values and are relatively easy to pin down.10 The minimum detectable effect size declines in roughly inverse proportion to the square root of the number of clusters randomized, while the size of the clusters randomized often makes far less difference to the precision of program impact estimators than does the number of clusters (for more details, see Schochet 2005). The last two factors, on the other hand, are hard to determine. These factors vary from outcome to outcome and from sample to sample. The conventional practice is to use similar measures from similar past studies as proxies for these factors in the precision calculation at the design phase of a study. However, it is often difficult to find good proxies, and judgment must be exercised when deciding on values for these factors. This is especially true for evaluations of professional development interventions that use teacher-level randomization, because measures of teacher-level intra-class correlation and of the explanatory power of teacher-level covariates are not common in past studies.

To assess the minimum detectable effect size for a research design, one needs a basis for deciding how much precision is needed. From a programmatic perspective, it might be whether the study can detect an effect that, judging from the performance of similar programs, is likely to be attainable. This "attainable effect" is especially difficult to decide on for professional development interventions. Unlike other types of educational interventions, professional development programs target teachers directly and affect student outcomes only indirectly, through teachers. The existing literature on professional development shows a wide range of effect sizes depending on the nature of the interventions and the study designs, and thus gives very little guidance about how much precision one should expect from a professional development intervention.

Footnote 10: Other than in efficacy studies, researchers usually employ a two-tailed test with a statistical power of 80% and a significance level of 0.05.
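As a concrete illustration of how these factors combine, the sketch below computes a minimum detectable effect size for a cluster-randomized design, using the conventional two-tailed test at α = 0.05 with 80 percent power (footnote 10) and the cluster-level standard error formula from footnote 9. The numbers of clusters, cluster sizes, and intra-class correlation are illustrative assumptions only.

```python
import math
from scipy import stats

def mdes_cluster(icc, J, n, P=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect size (in standard deviation units) for a
    cluster-randomized design with no covariates (Bloom 1995, 2005)."""
    # Standard error of the impact estimate when the outcome has unit variance.
    se = math.sqrt((icc / J + (1 - icc) / (n * J)) / (P * (1 - P)))
    df = J - 2  # degrees of freedom driven by the number of clusters
    multiplier = stats.t.ppf(1 - alpha / 2, df) + stats.t.ppf(power, df)
    return multiplier * se

# MDES shrinks roughly with the square root of the number of clusters (J),
# while cluster size (n) matters much less once the ICC is non-trivial.
for J in (20, 40, 80):
    print(f"J = {J:3d}: MDES = {mdes_cluster(icc=0.15, J=J, n=4):.2f}")
```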

Design Issue #4: What should be measured?

A fourth methodological issue to be considered in conducting research on the impact of professional development is deciding what to measure and how and when to measure it. According to Supovitz (2001), three common weaknesses of PD effectiveness studies are poor alignment between what is taught and the form by which it is tested, poor alignment between the content of what is taught and what is tested, and lack of a sufficient time lag between the PD intervention and the measurement of PD impact. There are several related measurement issues to be considered during the design stage that, if dealt with, may help overcome some of the weaknesses identified. These issues include:

• Measuring key mediators;
• Determining the alignment of the outcome measures with the intervention; and
• Determining the timing of measurement of outcomes.

Measurement of mediating variables

An important design decision concerns how much of the study's resources to devote to the measurement of anticipated mediating factors, such as implementation levels achieved and proximal outcomes (e.g., teacher knowledge and practice), in addition to distal outcomes (student achievement). While it is tempting in randomized field trials to focus only on the ultimate outcome of student achievement, excluding measures of proximal outcomes and other potential moderators and mediators can ultimately be costly in conceptual terms.

Measurement of mediating variables is especially critical in making use of study results to draw conclusions about the "theory of teacher change" and "theory of instruction" on which the PD intervention is based. For example, if a study of professional development does not find an overall impact on student achievement, and no teacher outcomes are measured, it is impossible to know where and to what extent the causal model broke down (see Exhibit 1, above). It may be that the professional development was effective in increasing teachers' knowledge or practices (so the theory of teacher change receives support), but these teacher changes did not result in higher student achievement (so the theory of instruction is not supported). Without a measure of the proximal outcomes of the professional development, the model cannot be fully explored or understood. Similarly, if dosage, or exposure to the professional development, is not measured, it will be unclear whether the PD was successfully delivered as intended but did not obtain the desired results, or whether it was not successfully implemented, and thus the theory of teacher change broke down at the first link in the chain.

For this reason, it is important to anticipate in advance what the most important features of the PD model are and to design measures to quantify them. The most critical factor will probably be dosage, but other aspects of the PD may be important to measure as well – in particular, the time allocated to the specific topics covered. The difficulty of measuring dosage is likely to vary with the form of professional development—hours of attendance at a training session are simple to measure, whereas participation in coaching activities in the school is more challenging.

Alignment of Outcome Measures

PD interventions will vary in the specificity of the intended outcomes for teacher knowledge, teacher practice, and student achievement. In particular, as discussed earlier, some approaches to PD may be designed to improve teachers' skills in implementing a highly specified set of instructional practices; other PD may be designed to strengthen teachers' content knowledge and pedagogical content knowledge, with desired changes in practice less clearly articulated. Similarly, some PD interventions may be designed to produce changes in relatively narrowly defined aspects of student achievement (e.g., student understanding of a particular concept in mathematics), while other PD interventions may be designed to produce broad-gauge changes in achievement.


Clearly, the outcome measures must be chosen to reflect the intended outcomes of the PD; but the desired degree of alignment can be difficult to establish, and it may be wise to choose multiple instruments that are more or less closely aligned with the specific focus of the PD, to provide some information on the generalizability and robustness of the findings.

Two extremes can exemplify the need for balance between alignment and generalizability in outcome measure selection. Assume two groups of researchers have conducted studies to determine whether professional development on early reading content and instruction leads to higher achievement. In both scenarios, the intervention was focused primarily on providing teachers with an understanding of phonemic awareness, phonics, and fluency instruction in early reading, and the researchers chose a teacher outcome measure that was created by the professional development team to assess teachers' understanding of the content just learned. The first group of researchers chose the language and examples in the assessment directly from the PD training materials, similar to an "end-of-chapter" test found in many textbooks. In addition, the researchers created a similar student achievement measure that exclusively contained the types of decoding items that the teachers had learned explicitly to address in their reading instruction. Both measures included a range of easy to difficult items, to ensure that ability at all levels was captured. Upon conducting the impact analyses, the researchers obtained an effect size of 1.20 for student achievement, and concluded that professional development in early reading affects reading achievement.

The second group of researchers studying the same intervention decided to rely on existing standardized assessments of teacher and student ability. These measures had been widely tested and validated, and were commonly used in the education domain as accountability tests, so the results were policy relevant. In addition, the data collection was low-burden for study participants because the tests were district-administered and data were available from secondary sources. The tests measured a variety of early reading and language arts outcomes, but did not provide subscores on decoding, which was a focus of the PD. Upon conducting an impact analysis, this group obtained null results.

The differences in results across the two groups of researchers may have little to do with the PD being studied and more to do with the alignment of measures with the intervention. The first group of researchers chose a strongly aligned test; the second group of researchers chose a weakly aligned test. If the intended outcomes of the PD are very narrowly and specifically designed, it may be sufficient to focus on a highly aligned set of measures; in most cases, though, it appears that a mix of measures varying in specificity will be required, to permit an examination of the generalizability of the results across measures.

Timing of Outcome Measurement

Issues of timing are central to the study of the impact of professional development. As shown in Exhibit 1, moving from the provision of PD to obtaining an impact on achievement involves traversing a number of causal links, and each of these may take time to unfold.


How long do teachers need to think about what they have learned in order to put it effectively into practice? Once practices are put into place, are they sustained over time? And how long does improved instruction take to produce detectable increases in students' learning? The answers to these questions will likely vary by the intensity, specificity, and form of the professional development received, and by the alignment of the outcome measure to the professional development. A focused week-long institute on phonics with a lot of modeling and training on specific instructional practices may begin to affect teachers' practices as soon as the teachers get back in the classroom. If the student test used to measure achievement is very sensitive to these practices, achievement gains might occur relatively quickly. In most studies, however, these conditions are not likely to be met, suggesting that a realistic study will require multiple waves of data collection, with the timing determined by features of both the PD and the measures.

CONCLUSION

For researchers interested in understanding the impact of professional development, randomized controlled trials overcome the perennial problem of selection bias, caused by the fact that districts typically make PD available to some teachers on a voluntary basis while at the same time mandating participation in other PD. Despite the promise of randomized controlled trials of professional development interventions, the task of designing such studies poses several significant design challenges. In this paper, we have discussed dilemmas that must be faced in determining the professional development treatment to be studied, the context in which to conduct the study, the study sample to be used, and the measurement of implementation and outcomes.

Given the current public investment in professional development and the demonstrated potential for professional development to improve student achievement, it is not surprising that increasing attention is being given to the design of rigorous studies of the impact of PD. To maximize what is learned from these studies, it is critical to give more attention to the vexing methodological issues involved.


References

Ball, D. L. (1996). Teacher learning and the mathematics reforms: What we think we know and what we need to learn. Phi Delta Kappan, 77(7), 500-508.
Ball, D. L., & Cohen, D. K. (1999). Developing practices, developing practitioners: Toward a practice-based theory of professional development. In G. Sykes & L. Darling-Hammond (Eds.), Teaching as the learning profession: Handbook of policy and practice (pp. 30-32). San Francisco, CA: Jossey-Bass.
Birman, B., Le Floch, K. C., Klekotka, A., Ludwig, M., Taylor, J., Walters, K., Wayne, A., & Yoon, K. (2007). State and Local Implementation of the No Child Left Behind Act, Volume II—Teacher Quality Under NCLB: Interim Report. Washington, DC: U.S. Department of Education, Office of Planning, Evaluation and Policy Development, Policy and Program Studies Service.
Blank, Rolf K., Diana Nunnaley, Andrew Porter, John Smithson, and Eric Osthoff. (2002). Experimental Design to Measure Effects of Assisting Teachers in Using Data on Enacted Curriculum to Improve Effectiveness in Instruction in Mathematics and Science Education. Washington, DC: National Science Foundation.
Bloom, Howard S. (1995). "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." Evaluation Review 19(5): 547-556.
Bloom, Howard S. (2005). "Randomizing Groups to Evaluate Place-Based Programs." In Learning More from Social Experiments: Evolving Analytic Approaches, edited by Howard S. Bloom. New York: Russell Sage Foundation.
Bloom, Howard S., and James A. Riccio. (2002). Using Place-Based Random Assignment and Comparative Interrupted Time-Series Analysis to Evaluate the Jobs-Plus Employment Program for Public Housing Residents. New York: MDRC.
Borko, H. (2004). Professional development and teacher learning: Mapping the terrain. Educational Researcher, 33(8), 3-15.
Boruch, R. F. (1997). Randomized Experiments for Planning and Evaluation: A Practical Guide. Applied Social Research Methods Series, 44. Thousand Oaks, CA: Sage Publications.
Boruch, Robert F., and Ellen Foley. (2000). "The Honestly Experimental Society: Sites and Other Entities as the Units of Allocation and Analysis in Randomized Trials." In Validity and Social Experimentation: Donald Campbell's Legacy, Vol. 1, edited by Leonard Bickman. Thousand Oaks, CA: Sage Publications.


Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Dallas, TX: Houghton Mifflin.
Carpenter, T. P., Fennema, E., Peterson, P. L., Chiang, C. P., & Loef, M. (1989). Using knowledge of children's mathematics thinking in classroom teaching: An experimental study. American Educational Research Journal, 26(4), 499-531.
Choy, S. P., Chen, X., & Bugarin, R. (2006). Teacher Professional Development in 1999-2000: What Teachers, Principals, and District Staff Report. Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Clewell, B. C., Campbell, P. B., & Perlman, L. (2004). Review of evaluation studies of mathematics and science curricula and professional development models. Submitted to the GE Foundation. Washington, DC: Urban Institute.
Coalition for Evidence-Based Policy. (2007). Developing an effective evaluation strategy: Suggestions for Federal and state education officials. Washington, DC: Author.
Cohen, D. K., & Hill, H. C. (1998). Instructional policy and classroom performance: The mathematics reform in California (RR-39). Philadelphia: Consortium for Policy Research in Education.
Cohen, D. K., & Hill, H. C. (2000). Instructional policy and classroom performance: The mathematics reform in California. Teachers College Record, 102(2), 294-343.
Cohen, D. K., & Hill, H. C. (2001). Learning policy: When state education reform works. New Haven, CT: Yale University Press.
Confrey, J. (2006). Comparing and contrasting the National Research Council report on Evaluating Curricular Effectiveness with the What Works Clearinghouse approach. Educational Evaluation and Policy Analysis, 28(3), 195-213.
Confrey, J., & Stohl, V. (Eds.). (2004). On Evaluating Curricular Effectiveness: Judging the Quality of K-12 Mathematics Evaluations. Committee for a Review of the Evaluation Data on the Effectiveness of NSF-Supported and Commercially Generated Mathematics Curriculum Materials, National Research Council. Washington, DC: National Academies Press.
Cook, Thomas H., David Hunt, and Robert F. Murphy. (2000). "Comer's School Development Program in Chicago: A Theory-Based Evaluation." American Educational Research Journal (Summer).
Cook, T. D., & Payne, M. R. (2001). Objecting to objections to using random assignment in educational research. In R. F. Boruch & F. Mosteller (Eds.), Evidence Matters: Randomized Trials in Education Research. Washington, DC: Brookings Institution Press.


Desimone, L., Porter, A. C., Garet, M., Yoon, K. S., & Birman, B. (2002). Does professional development change teachers' instruction? Results from a three-year study. Educational Evaluation and Policy Analysis, 24(2), 81–112.
Donner, A. (1998). Some aspects of the design and analysis of cluster randomization trials. Applied Statistics, 47(1), 95–113.
Donner, A., & Klar, N. (2000). Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold.
Elmore, R. (2002). Bridging the gap between standards and achievement: The imperative for professional development in education. [Online.] Available: http://www.ashankerinst.org/Downloads/Bridging_Gap.pdf
Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd.
Garet, M., Porter, A., Desimone, L., Birman, B., & Yoon, K. S. (2001). What makes professional development effective? Results from a national sample of teachers. American Educational Research Journal, 38(4), 915–945.
Good, T. L., Grouws, D. A., & Ebmeier, H. (1983). Active mathematics teaching. New York: Longman.
Gosnell, H. F. (1927). Getting Out the Vote: An Experiment in the Stimulation of Voting. Chicago: University of Chicago Press.
Grant, S. G., Peterson, P. L., & Shojgreen-Downer, A. (1996). Learning to teach mathematics in the context of systemic reform. American Educational Research Journal, 33(2), 502–541.
Guskey, T. R. (2003). What makes professional development effective? Phi Delta Kappan, 84(10), 748–750.
Hargreaves, A., & Fullan, M. G. (1992). Understanding teacher development. London: Cassell.
Hawley, W. D., & Valli, L. (1998). The essentials of effective professional development: A new consensus. In L. Darling-Hammond & G. Sykes (Eds.), The Heart of the Matter: Teaching as a Learning Profession (pp. 86–124). San Francisco: Jossey-Bass.
Hiebert, J., & Grouws, D. (2007). The effects of classroom mathematics teaching on students' learning. In F. K. Lester (Ed.), Second handbook of research on mathematics teaching and learning (pp. 371–404). Charlotte, NC: Information Age Publishing.


Kennedy, M. (1998). Form and substance of inservice teacher education (Research Monograph No. 13). Madison, WI: National Institute for Science Education, University of Wisconsin–Madison.
Knapp, M. S. (1997). Between systemic reforms and the mathematics and science classroom: The dynamics of innovation, implementation, and professional learning. Review of Educational Research, 67(2), 227–266.
Leviton, L. C., Goldenberger, R. L., Baker, C. S., & Freda, M. C. (1999). Randomized controlled trial of methods to encourage the use of antenatal corticosteroid therapy for fetal maturation. Journal of the American Medical Association, 281(1), 46–52.
Lieberman, A. (1996). Practices that support teacher development: Transforming conceptions of professional learning. In M. W. McLaughlin & I. Oberman (Eds.), Teacher learning: New policies, new practices (pp. 185–201). New York: Teachers College Press.
Lieberman, A., & McLaughlin, M. W. (1992). Networks for educational change: Powerful and problematic. Phi Delta Kappan, 73, 673–677.
Little, J. W. (1993). Teachers' professional development in a climate of educational reform. Educational Evaluation and Policy Analysis, 15(2), 129–151.
Loucks-Horsley, S., Hewson, P. W., Love, N., & Stiles, K. E. (1998). Designing professional development for teachers of science and mathematics. Thousand Oaks, CA: Corwin Press.
Loucks-Horsley, S., Stiles, K., & Hewson, P. (1996). Principles of effective professional development for mathematics and science education: A synthesis of standards (NISE Brief, Vol. 1). Madison, WI: National Institute for Science Education.
McCutchen, D., Abbott, R. D., Green, L. B., Beretvas, S. N., Cox, S., Potter, N. S., Quiroga, T., & Gray, A. L. (2002). Beginning literacy: Links among teacher knowledge, teacher practice, and student learning. Journal of Learning Disabilities, 35(1), 69–86.
McGill-Franzen, A., Allington, R. L., Yokoi, L., & Brooks, G. (1999). Putting books in the classroom seems necessary but not sufficient. Journal of Educational Research, 93(2), 67–74.
McKinlay, S. M., Stone, E. J., & Zucker, D. M. (1989). Research design and analysis issues. Health Education Quarterly, 16(2), 307–313.
Murray, D. M. (1998). Design and Analysis of Group-Randomized Trials. New York: Oxford University Press.


Murray, D. M., Hannan, P. J., Jacobs, D. R., McGovern, P. J., Schmid, L., Baker, W. L., & Gray, C. (1994). Assessing intervention effects in the Minnesota Heart Health Program. American Journal of Epidemiology, 139(1), 91–103.
National Commission on Teaching and America's Future. (1996). What matters most: Teaching for America's future. New York, NY: Author.
National Research Council. (2004). On evaluating curricular effectiveness: Judging the quality of K-12 mathematics evaluations. Washington, DC: National Academies Press.
O'Connor, R. E. (1999). Teachers learning Ladders to Literacy. Learning Disabilities Research and Practice, 14, 203–214.
Richardson, V., & Placier, P. (2001). Teacher change. In V. Richardson (Ed.), Handbook of research on teaching (4th ed., pp. 905–947). New York: Macmillan.
Saxe, G. B., Gearhart, M., & Nasir, N. S. (2001). Enhancing students' understanding of mathematics: A study of three contrasting approaches to professional support. Journal of Mathematics Teacher Education, 4, 55–79.
Schochet, P. Z. (2005). Statistical Power for Random Assignment Evaluations of Education Programs. Mimeo. Princeton, NJ: Mathematica Policy Research, Inc.
Showers, B., Joyce, B., & Bennett, B. (1987). Synthesis of research on staff development: A framework for future study and a state-of-the-art analysis. Educational Leadership, 45(3), 77–87.
Sloan, H. A. (1993). Direct instruction in fourth and fifth grade classrooms. Unpublished doctoral dissertation. Dissertation Abstracts International, 54(08), 2837A. (UMI No. 9334424)
Sprinthall, N. A., Reiman, A. J., & Thies-Sprinthall, L. (1996). Teacher professional development. In J. Sikula (Ed.), Handbook of research on teacher education (2nd ed., pp. 666–703). New York, NY: Macmillan.
Stevens, R. J., & Slavin, R. E. (1995). The cooperative elementary school: Effects on student achievement, attitudes, and social relations. American Educational Research Journal, 32, 321–351.
Stullich, S., Eisner, L., McCrary, J., & Roney, C. (2006). National Assessment of Title I Interim Report: Volume I: Implementation of Title I. Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance.

Supovitz, J. A. (2001). Translating teaching practice into improved student achievement. In S. Fuhrman (Ed.), National Society for the Study of Education Yearbook. Chicago, IL: University of Chicago Press.
Talbert, J. E., & McLaughlin, M. W. (1993). Understanding teaching in context. In D. K. Cohen, M. W. McLaughlin, & J. E. Talbert (Eds.), Teaching for understanding: Challenges for policy and practice (pp. 167–206). San Francisco: Jossey-Bass.
Wood, T., & Sellers, P. (1996). Assessment of a problem-centered mathematics program: Third grade. Journal for Research in Mathematics Education, 27, 337–353.
Yoon, K. S., Garet, M., Birman, B., & Jacobson, R. (2006). Examining the effects of mathematics and science professional development on teachers' instructional practice: Using professional development activity log. Washington, DC: Council of Chief State School Officers.
Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. (2007). Reviewing the evidence on how teacher professional development affects student achievement (Issues & Answers Report, REL 2007–No. 033). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest.


Figure 1: [Diagram not reproduced. Elements shown: the study context; the study sample (treatment group and control group); the intervention (PD features: focus, structure, other; other features: unit of intervention, non-PD components); and outcomes at Time 1 and Time 2, including teacher knowledge, teacher practice, student achievement, teacher retention, student retention, and teacher and student characteristics.]
