DR. BODILLY: First of all I want to say that when I read the paper, knowing Steve's reputation in the field, I became very intimidated and felt that I couldn't properly critique it: number one, because I would be afraid to go up against Steve in a debate, but number two, because after having read the paper I actually agreed with quite a bit of what it said.
So, what I am going to talk about is really comments on particular issues that he raised in his paper as opposed to confronting him on any faulty arguments.
I don't think there were very many faulty arguments in the paper.
I would like to talk about four different issues that were raised in the paper: the non-helpful distinction between quantitative and qualitative methods, the importance of method or analysis other than experimental design for understanding the effectiveness of program treatments, the need for a research agenda to avoid wasting valuable research resources, and whether proven effectiveness should be the sole focal point of the agenda.
So, first let me turn to the non-helpful distinction between qualitative and quantitative methods. When I think of research methodology for any specific project I think of at least four components that would be a part of that methodology.
The first is the design approach. It is basically a statement of a hypothesis or a research question, the naming of the dependent and independent variables and the means for isolating the treatment effect from other effects so as to properly test the hypothesis being posed, but methodology doesn't stop there. It also includes the selection of data sources and data-gathering techniques, the defining and measuring of the variables of interest and the data analysis techniques.
On Page 4 of Steve's paper he wonders if surveys are a quantitative or a qualitative methodology. The term "survey data" is indicative only of the source of information, data gathered through a survey instrument. Survey data could record a qualitative measure, something like whether a teacher believes professional development was useful or it could record a quantitative measure, the number of days of professional development the teacher received that year.
Steve says that surveys and qualitative interview studies do not intervene and are thus not experimental, but surveys and interviews refer to the data-gathering technique, not to the design, and experimental refers to the design and not the data-gathering technique.
So, survey and interview data can be used simultaneously with an experimental design, and you are going to see later on today several instances of people having done that, using experimental design and a pre/post survey to understand effects. Likewise a case study design can use highly quantified measures but lack controls for effects, and studies using statistical analytic techniques to control for confounds often use qualitative measures, for example, demographic characteristics such as race.
The bottom line is that the qualitative versus quantitative distinction hasn't helped very much because of the different combinations possible on those four different arrays of methodologies or even within those four different components. More importantly, Steve points out that quantitative versus qualitative methodology is not the issue in any of that. In measuring effects and causality, research design is important, and the issue of whether the design isolates and properly measures the effects of the treatment from other effects on the dependent variable is a key concern.
That is why there is a focus on random experimental design at this point in time when attempting to understand causality. When attempting to measure the direction and magnitude of the impact one would naturally turn to numerical measures. And so, since Steve was confused and I am confused, and I still admire Steve and his vast knowledge, if he can't understand the difference between qualitative and quantitative then I suggest we drop it and agree with him on that point.
Secondly, the importance of implementation and cost analysis to furthering the knowledge we have about education reforms. Steve argues that “identifying, testing, and warranting the effectiveness of strategies for instruction is currently the central task of applied research in education.”
He then further argues that this emphasis would lead to research based on experimental design but also on other designs with multiple data collection and analysis techniques and other measures. Accepting Steve's argument for the moment, I would then emphasize several concerns, the first being that implementation analysis, which I define as understanding what is being implemented, how, and why, is an essential part of the research endeavor for two important reasons.
The first reason is straightforward. Unless we guarantee control or ensure a specified treatment we cannot measure the differential effect of having or not having the treatment on students. Without control of the treatment we don't have an experiment. Many of the treatments I have observed failed to be controlled or even controllable to the extent that an experimental design would not be useful.
Sometimes the treatments were ill-defined to begin with, not easily distinguishable from normal practice or random practice. Alternatively they were specified but haphazardly applied, changed at the site by schools or teachers such that the treatment practice varied in uncontrolled fashion across sites. In these cases the treatment cannot be tested for effects whether using an experimental design or not. The understanding of implementation is important for a second reason. We are interested, I assume, in whether a program is effective because if it is we would like to spread the practice. If it is not we would hope the practice doesn't spread.
Tracking the why and how of implementation can help identify issues that would enable the treatment to be more spreadable, the “Jiffy” factor. For example, implementation analysis could identify the policy support and infrastructure needed to ensure sound practice.
Steve has a wonderful example in his paper of how a treatment can be tested in different environments to find out how hardy it is, under what conditions it maintains its effectiveness. This is a vital part of the research agenda.
Finally, accepting the premise that causality of instructional strategies is the key agenda focus, you need to be sure to track the cost of implementation and understand the long-term benefits of treatment.
No matter how effective the treatment is, if we do not know the cost, we cannot advise policy makers on their main decision-making function: to effectively use scarce resources to produce public goods. This statement not only draws attention to the need for analysis of costs and benefits but to the long-term danger of such effects. The current emphasis on short-term test score changes draws attention away from the longer term achievement goals of education continuation, graduation and attainment.
In terms of benefits these might actually be more important than immediate test scores. Steve refers to this obliquely on Pages 28 and 29 when he discusses the need to measure outcomes well and the necessary research involved in the development of solid measures.
I am simply emphasizing that even if we had solid measures for all outcomes we would still need to know which measures best capture the public good we wish to promote.
Next, defining the research agenda. Perhaps the greatest contribution of Steve's paper is that it draws attention to the need to define and prioritize the research agenda. As Steve notes, we cannot have experimental designs for every half-baked development. There is in fact a development process that would lead us to treatments deserving this level of research design, and that development process and the research that supports it are essential to fund. But which instructional strategies are the ones we should focus on? I don't know what the best process is for determining that, but some process is needed to avoid misusing funds on evaluating marginally or poorly developed treatments.
I would like to see at least some part of this forum focus on mechanisms to tell whether the treatment proposed for evaluation is at a stage ready to be evaluated by large experimental design, what concrete proofs are necessary before public funds are spent on these expensive and time-consuming evaluations; how do we know if they could be evaluated in other less expensive but sufficiently rigorous ways?
Alternatively a review system could routinely sift through evidence prior to an RFP and pick those treatments or classes of treatments that appear to be at the appropriate point of development and issue RFPs for them.
In sum, an experimental design in and of itself does not confer legitimacy or worthiness of public funds for research.
The treatment's readiness for evaluation, the scope of possible impact or its strength of possible impact on key groups need to be considered.
Finally, in the above few points I assumed that Steve was correct in identifying that strong curriculum and instructional strategies should be the focus of the research agenda, and he argued it so well I really had trouble not accepting that, but something about me is perverse and says not to swallow that argument quite so quickly. So, I am going to throw out a few examples that we might think about, other areas of inquiry that are important but do not fall into the narrow domain of proving out curriculum and instructional strategies.
I suspect, and I have some support in the literature, that while we can create curriculum and pedagogy that improve learning in subjects such as math and science we also need to have experts in the classroom. A short course in pedagogy and a good textbook cannot duplicate 4 years of study in college in these specific subject areas. This is especially true if those teachers not only did not study math and science in college but probably did not excel in them in high school.
If this is true, and that is an assumption, a questionable thing, that the quality of the staff will make a difference, then a significant area of research must focus on labor market issues: how to attract and retain high-quality teachers, principals and administrators. It also follows that policy research and analysis is needed to determine how to use personnel assignment policy to effectively match personnel skills to the needs of students and schools. This labor market and personnel assignment research does not easily fall into Steve's more confined domain of testing instructional strategies.
Rather it gets at the systemic or institutional practices that might be preventing strong attainment. Likewise access issues, especially issues for special education students, access to AP courses or gatekeeper courses and access to higher education, are areas of importance that do not fall into that domain. But understanding how the current system of assignment and access is working, what its impact is on different groups, and what policies are effective in changing these structures for the better are all vital to improving educational opportunities for students.
A further agenda item that is missed or made marginal by the focus on whether treatment produces an effect is the issue of how to scale the treatment up to be useful across sites or how to ensure that good practice is adopted. This is the knowledge utilization issue.
I would like to see at least some part of the research agenda be focused on this issue. Once you prove a treatment to be effective how do you ensure it travels usefully; how do we make sure the research findings are used in an effective manner and not diluted through poor implementation or not used due to lack of knowledge?
Note, this moves past the issue that Steve mentioned about under what conditions the treatment remains hardy. Instead it asks the question: how can we create the conditions that will allow it to be hardy? I am not sure how experimental design fits with the above three policy issue areas. I can imagine very mixed methods approaches to investigating them. I am more concerned that we not forget that significant underattainment might be due to structural or systemic factors that deny valuable educational resources to children or that reduce the usefulness of research itself in terms of changing practice.
My comments do not detract from Steve's paper but simply add a slightly different perspective. Steve's paper is common-sensical and provocative at the same time and is a wonderful beginning for what I hope will be vibrant discussions that lead us to a better understanding of appropriate methods and better allocation of scarce resources.
Thanks.
(Applause.)
DR. RAUDENBUSH: I think we mainly agree. So, I shouldn't say much more since we want to get the audience to have a chance to speak.
The only thing I might mention is I think it is really interesting to think about how a research agenda on interventions to improve teaching and learning in the classroom would articulate with a research agenda looking at things like resources, for example, how well teachers need to be trained or how they need to be trained or what they need to know, or research on school instruction. In a paper that David Cohen and Deborah Ball and I have [written] in Educational Evaluation and Policy Analysis, we talk about this, and we say that it is very hard to get straight on those other issues until we have brought to the forefront this business of what goes on in the classroom and how to improve classroom instruction, because schools need to be organized to support those things that go on in the classroom. We need to know what resources are necessary, or how resource effects modulate or moderate the effects of those interventions. Obviously it would be silly to say that there should only be one thing we should study. What I am trying to say, and I think what we are trying to say, is that we need to kind of redesign the research agenda so that our causal model has what goes on in the classroom at the core, and that affects kids’ outcomes. Then the other things, such as school organization and resources, sort of interact with or have an effect on that process, instead of having kind of this black box in the middle where these other things, governance, resources, and organization, kind of ultimately affect the outcome. I think that is all I am going to say.