Center for Education
The National Academies
500 Fifth St., NW
Washington, D.C. 20001
Tel: 202-334-2353
Fax: 202-334-2210
E-mail: cfeinq@nas.edu

DR. CONFREY: Thank you, Mary Ann. She is going to switch over to PowerPoint, and as she does I will just pose a quick framing question. It was something I was thinking about all morning: do causal effects equal effectiveness? That is really the question that we struggled with when we were doing the review of the evaluations of K-12 mathematics curricula.

Actually I am at Washington University. That is not her fault. I should have checked the bio. Okay, it is going to be an interesting challenge to do two different ones at a time.

I am going to do three things. I am going to introduce you to the report. I am going to talk about her study which was a very important study in terms of our thinking and then I am going to try to connect up the view of multiple methods that was in our report on evaluating curriculum effectiveness. I want to hook that up to the conversations we had this morning.

We ended up with about 600 submissions for the study that we did. We narrowed those down and eliminated the historical and background information, and we ended up identifying three types of studies, which are identified here: content analysis, comparative analysis, and case studies. The Huntley study was one of our comparative analyses.

We also set criteria that we thought a study needed to meet for us to review it in more detail. So, we ended up with about 147 studies that we studied in detail.

In order to do this work we really had to propose a framework that we could work with. In the framework we used, evaluating curricular effectiveness means looking at the program components (the mathematical content and the design elements), looking at implementation in terms of things like resources, processes, and contextual influences, and looking at student outcomes. We drew this as a triangle with feedback loops, and I think that is really important to remember given how much talk there has been here today about “cause” and “cause and effect.” In fact, if you look at the effect of No Child Left Behind on classrooms, the outcomes are affecting the choice of curriculum not as much as the curriculum is affecting the choice of outcomes. So, to ignore that feedback seems to me to be a naive notion of how schools are actually functioning.

After we had that framework we also talked about methodological choices. In the report we distinguished the three different types that we found, which were the content analysis, the comparative analysis, and the case study. We talked about experimental and quasi-experimental designs, and as I think Steve pointed out this morning, we had only one experimental study in the entire spectrum we looked at.

Once we did that, we set out criteria or dimensions that we thought were essential, which we learned from studying this body of knowledge. I do want to say that discussing methodology independent of the actual study of evaluation studies seems to me to be problematic. It is in reading all of these studies and taking seriously the challenges that were in front of the authors, some of which Mary Ann described beautifully, that one really begins to realize the complexity of the challenge of doing something like evaluating curricular effectiveness.

Now, where we came out was this. In order to advise the field, because we knew about the What Works Clearinghouse and some of the controversies out there, we said, "First we are going to define a scientifically valid study." We said such a study needed to address the components in the framework and conform to the methodological expectations of each of the categories that we reviewed.

We then defined curricular effectiveness. After defining a scientifically valid study, the question that remains, and it hasn't been addressed today, is: What is the definition of effectiveness that you can use? It is really the link between that definition and the quantitative and the qualitative, or the implementation that was already discussed, that you have to define in order to make sense of what these kinds of judgments imply for studies and for curriculum. So we defined it drawing on Messick's notion of validity: we defined it as an integrated judgment based on the interpretation of a number of scientifically valid evaluations that combine social values, empirical evidence, and theoretical rationale.

So, we put the values issue right on the table, as well as the theoretical issue, and said that you are going to have to make an integrated judgment. We recognized then that in order to make that kind of integrated judgment you had to use multiple methods of evaluation, each of which must be a scientifically valid study, as well as periodic synthesis across evaluations. That is an area we haven't talked much about, but it is very hard to do and really needs some attention in its own right.

So, we had this sort of general principle of multiple methods. Now let me look at the Huntley study in particular. One of the things that the committee valued particularly in the Huntley study was that they clearly articulated their program theory. When I put up that first picture of program, implementation, and outcomes, the point was that it is important to say not just what the program is but what the program is doing. How do you think it is going to work in relation to practice? How do you think the flow of influence on student outcomes is going to happen through implementation? They do that very clearly by their definition of their multiple views of algebra.

Secondly, a strength of the study is that it does address some of the implementation fidelity issues within a comparative study through the use of teacher interviews, because we know different teachers cover different amounts of curriculum. So opportunity to learn is the gross measure that you at least have to look at if you are going to look at curricular effectiveness.

Finally, they use multiple contrasting outcome measures. We defined a term in the study which we called curricular validity of measures, and we contrasted that with systemic alignment of measures, pointing out that unless your measure actually measures the curriculum whose effectiveness you are trying to check, you don't have an appropriate measure.

So, for instance one of the studies approved on the web site for What Works uses the FCAT exam in Florida to evaluate a seventh grade algebra program. That would not have curricular validity of measures because I have never seen a state test of seventh grade that measures algebra adequately.

Weaknesses in the design of the study: One of the things we were concerned with, and they thought it through but we reached a different conclusion than they did, was that we felt they used an incorrect unit of analysis and therefore got higher significance than they might have. We reran that test in their case just to look at it, but it is a problem. If you look at a classroom where all the students are taught by the same teacher and then you use your number of students to gauge your significance, there is a unit-of-analysis problem. Secondly, they had unnamed or varied comparison groups, and it is a problem when you compare a curriculum to an unnamed group. We ran a filter to look at the studies that named their comparison groups, and when studies do name them they are less likely to get positive, significant results, probably because an unnamed comparison may not be a curriculum at all.

Finally, there is a volunteer sample in the field test sites that they worked in, which they acknowledge very carefully in the report.
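The unit-of-analysis concern raised above can be illustrated with a small sketch. It uses the standard Kish design effect for equal-sized clusters, 1 + (m - 1) * ICC, to show how treating students as independent overstates the effective sample size when they share a classroom and teacher. The function names, the number of classrooms, the class size, and the intraclass correlation (ICC) below are all hypothetical values chosen for illustration; they are not taken from the Huntley study or from the committee's reanalysis.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_students: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_students / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    n_classrooms = 20
    students_per_class = 25
    icc = 0.2  # assumed share of outcome variance that lies between classrooms

    n_students = n_classrooms * students_per_class
    deff = design_effect(students_per_class, icc)
    n_eff = effective_sample_size(n_students, students_per_class, icc)

    print(f"students: {n_students}, classrooms: {n_classrooms}")
    print(f"design effect: {deff:.2f}")
    print(f"effective sample size: {n_eff:.1f}")
    # Standard errors computed as if students were independent are too small
    # by roughly this factor.
    print(f"standard error inflation: {math.sqrt(deff):.2f}")
```

With an ICC of zero the design effect is 1 and the student count can be used as is; as the ICC approaches 1 the effective sample size approaches the number of classrooms, which is why significance gauged from the student count can be overstated.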

One of the things that we noticed in there is the value of these multiple authors and perspectives. You can see in that particular research report that it is designed with a deep understanding of the complexity of variations in practice. The author teams represented math, math education, and methodology, and you can see the impact on the design. They recognized the interplay of the student sample, teacher quality, context of implementation, varied types of outcomes, and school policies. This is a complicated system that we are talking about working in.

Now, why do I think it is so important that we push this notion that effectiveness is an integrated judgment? One place it shows up is in this quote from their study. They say, "Our study does not provide information needed to answer the question about what mathematics is most worth learning but does suggest the kind of trade-offs that might be expected."

Really, these are often questions of values and questions of trade-offs. They also reported on and used their multiple methods to recognize the variation in implementation, actually explore it, and try to come up with explanations for the overall results. This is a really important thing because it is a question about average results.

The overall results suggest achievement patterns related to curricular content disguise very substantial differences in implementation and the results of different sites. They acknowledge that in their particular study, and they recognize that there is a need to see the larger picture, which is a huge issue for us and was in our review at the high school level. It is not just about what happens in terms of the results of that grade level; parents and people who have to judge effectiveness are also concerned about the transition to college. It is a nice study to represent that, because most universities use placement exams based on algebraic manipulation. So, even if what you value in the long term is whether people will stay in careers based on these other things, it is an issue.

The conclusion did say at one point that they believe their results do provide support for the basic reform position. What I would want to say is that this is a warranted conclusion within their study if you accept their design and statistical test, which we had some issues with, but just assume that was fixed. Even so, depending on one's values and theories about technological dependency, one could disagree with these conclusions. In other words, it has got to be a judgment. Curricular effectiveness has to be a judgment because of the issues of values that are implicit in the studies.

So, is it mixed methods? Well, yes, it certainly uses different methods, especially to explain the discrepancies, but I think it is important to go beyond that general description and ask how it fit with their model of the kind of research they were going to do.

Okay, now, one of the things that we pointed out in the report is that a comparative study, even a fixed comparative study, even a perfect comparative study, can't stand alone as a method. So, we argued for multiple methods, because you would have to have a content analysis. You have to know whether or not what is being taught is comprehensive, complete, and fair, engages the students, and uses interesting forms of assessment. So, a content analysis is necessary in addition to comparative analysis, and we called for that.

Okay, and we also called for case studies because of these issues that have been raised a lot today about implementation. So, when I compare our results and the approach we took to the approach that Professor Raudenbush used this morning I would say that both of them support multiple methods. Both identified the importance of the instructional core and both recognized the centrality of issues of equity and equal opportunity. Both recognized the problem of precise outcomes which I cannot overstate because we have very poor measures in the country for precise curricular outcomes at the level of being able to judge the quality of a curriculum like “does your approach to transformations work?”

I preferred the discussion in the paper of the expectation of research where it is viewed as a research program not a sequential process. I don't think a sequential process is the way to go in terms of saying how it is that these multiple methods need to relate to each other and I think our study shows some reasons for that.

Also, we recognize that you are going to have theory. What we need is general theories of evaluating curricular effectiveness like the one we put forward, but you also have theories very specific to your studies, and I think that some of this general discussion of what works tends to ignore the role of theories and values in those discussions.

In his talk this morning he identified three important areas for using these qualitative or descriptive correlational methods, which are laid out here, and suggested that those are requisite for conducting a randomized field trial. I think our study tends to show that a variety of other things really have to be part and parcel of recognizing the need for multiple methods. I have listed some of those up here, such as the last one: changing curricular focus, new topics, new technologies, new career paths. Those things influence fundamentally how you are going to define effectiveness, if in fact you want effectiveness to drive schools as they are currently being practiced. There are some other ones that we found from across the study, and I think that we underestimate the current controversy about the comparison.

I did a paper that is in review right now that compares and contrasts What Works and our study, and I think people are underestimating the seriousness of the distinction between these different approaches. One of these things is the importance of a multi-disciplinary perspective. Our committee could never have functioned as well as it did if we didn't have mathematicians and math educators and methodologists, because we had different things we cared about. Validity was front and center for the mathematicians. Breaking down by subgroups was different for different groups. There are different issues of importance, and I think that this also raises the question about contingencies and contextual variation and practices. I mean, I think we have to ask the question seriously about whether we really are talking about a causal situation, or whether we ought to step back from the notion of cause and effect in these complex settings and talk about likelihood and tendencies. So, certain curricula will have more likelihood in certain contexts to have a certain kind of impact, and it needs to be a much more nuanced set of distinctions. So we propose a sort of general question, which is written up here, about what we need to provide.

Steve suggested we need to evaluate claims about causal effects of interventions and teaching and learning in the nation's classrooms. I would revise that to say just that we need to evaluate claims about effectiveness, not necessarily causal effectiveness, because I think we have to define effectiveness as it relates to causal effectiveness. That doesn't mean I am opposed to causal effectiveness as an analysis, but it is one tool, and it is part of the definition of effectiveness that you need to get to. We need to think about it in terms of providing decision makers with adequate information to make informed judgments about how to improve teaching and learning in the particular context that they find themselves in, with the particular research that is available to them.

Thanks.
