Center for Education
The National Academies
500 Fifth St., NW
Washington, D.C. 20001
Tel: 202-334-2353
Fax: 202-334-2210
E-mail: cfeinq@nas.edu

MS. WINTERS: Mary Ann, do you want to take a moment to respond to anything? I can open it up to questions. Let me ask you if you would please come up to the microphone and state your name if you have a question that you would like to ask.

DR. TRACTENBERG: Rochelle Tractenberg from Georgetown. I had a question about the design of the study and then your comment about the claims that the study found support that the curricular reform sort of basically worked.

According to the test that you devised, which wasn't inherent in either instructional setting, the performance was extremely poor, like radically, these people would not pass the class; it is not possible that they would be able to go forward and it doesn't make sense to contemplate that the instructors were teaching to that level. So, my question I guess generally is how to reconcile the need to design an assessment that would allow you to assess effectiveness, but generally, with what you have to take into consideration which is that teachers are teaching to the test or they are teaching specifically to the curriculum and may be balanced. So for example you don't have any multi-variable methods in your multi-methods paper. So, for example covariates where you have your assessment that was created but you also have a covariable of what was the average performance on that instructor's test or something. So you’re basically integrating the context into the analysis. Could you sort of discuss that and the considerations for defining effectiveness and how you can study it?
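
[Editorial illustration, not part of the spoken remarks.] The covariate idea raised in this question can be sketched as a classic analysis-of-covariance adjustment: the raw difference in group means on the study assessment is corrected by the pooled within-group slope on a covariate, such as each student's score on the instructor's own test. This is a minimal sketch under stated assumptions. The function name, variable names, and all scores are hypothetical and invented for illustration; none of them come from the study.

```python
from statistics import mean

def adjusted_difference(y_treat, x_treat, y_ctrl, x_ctrl):
    """Covariate-adjusted difference in group means (one-covariate ANCOVA).

    y_* are outcome scores, x_* are covariate scores for the same students.
    All names here are hypothetical, chosen for this illustration.
    """
    def cross(xs, ys):
        # Sum of cross-products of deviations from the group means.
        mx, my = mean(xs), mean(ys)
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    # Pooled within-group slope of the outcome on the covariate.
    slope = (cross(x_treat, y_treat) + cross(x_ctrl, y_ctrl)) / (
        cross(x_treat, x_treat) + cross(x_ctrl, x_ctrl)
    )
    raw = mean(y_treat) - mean(y_ctrl)
    # Remove the part of the raw gap explained by the covariate gap.
    return raw - slope * (mean(x_treat) - mean(x_ctrl))

# Invented scores: the first group happens to score higher on the covariate.
x_treat = [70, 75, 80, 85]
y_treat = [43.0, 45.5, 48.0, 50.5]
x_ctrl = [60, 65, 70, 75]
y_ctrl = [35.0, 37.5, 40.0, 42.5]

raw_gap = mean(y_treat) - mean(y_ctrl)
adj_gap = adjusted_difference(y_treat, x_treat, y_ctrl, x_ctrl)
print(raw_gap, adj_gap)
```

In this made-up data the raw gap is larger than the adjusted gap because part of it reflects the covariate imbalance rather than the curriculum. Whether such a covariate could realistically have been collected is a separate matter, taken up in the response that follows.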

DR. HUNTLEY: The first thing you spoke about was the low performance of students on all forms of the assessment and this was true for both Core-Plus and students using the more traditional approach and we really struggled with this. How do we get the students to take this assessment seriously? For many students, they had been away from algebra for some time because algebra is integrated in the Core-Plus approach. So, they hadn't seen some of these particular concepts very recently before we gave the assessment. So, we really struggled with this and that was one of the motivators for our conducting a clinical interview study because many of us who administered the assessments knew, from talking with students after administration of the tests, that the students knew a lot more than what they showed on their papers.

So, this is a really tough issue and you also asked about covariates. It was really difficult to get information from the school administrators about the students' performance and it would have been a much worse headache to try to get information about students' tests that their teacher had administered. So, we recognized these issues. They are really tough challenges.

DR. CONFREY: And I may have not been clear. I was quoting them and my point was that that is why it is an integrated judgment. So, that wasn't my judgment on that. That was the conclusion they drew and my argument really was that it was warranted by the study by the way in which they set up a theoretical argument in their sense of what is important but somebody else could make a different notion and also that they did acknowledge the low performance overall.

So, they did try to sort of set it in that larger context. So, it is a very nuanced thing about how you draw conclusions in terms of how local and how broad you set the context.

DR. HUDGENS: Stacie Hudgens from Learning Point Associates. I brought props. So, I am sorry. I wanted to kind of go back. I should say I am a psychometrician and so I have a question in regard to the low scores that you saw and it also goes back to Dr. Raudenbush's manuscript when he was talking about how, if you measure the right outcome unreliably, you are more likely to find a new program ineffective even if it is effective.

So, I was kind of looking at the variance in your scores that you had which was incredibly high and at times where the standard deviation was double the average score.

My question is, and this could be just a global question in general, when you are talking about mixed methods really paying attention to the psychometrics or the outcome that you are looking at or that you are testing or you are framing the test measure, the outcome that you are looking for, and I think that that might be kind of ignored or it has kind of been touched on a little bit today, but if you could speak a little bit about that in addition to that and like I said that might be a global question that you could speak to on the psychometrics of your assessment because maybe they are good assessments needing a little bit of work. In addition to that, when you speak about effect and you are looking at effects in educational assessment, I keep seeing effects being presented without looking at effect size estimates that when you are able to maybe look at effect size estimates that calculate effect sizes on some of the key statistics that you did and if so what did those look like given the variance that you saw?
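
[Editorial illustration, not part of the spoken remarks.] One common effect size estimate of the kind asked about here is Cohen's d, the difference in group means divided by the pooled standard deviation. The sketch below assumes two independent groups. The function name and all scores are hypothetical and invented for illustration, not data from the study. It shows how the same mean difference shrinks as an effect size when the spread of scores is very large.

```python
import math
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Invented scores: both pairs of groups differ by 2 points on average.
low_spread_a = [12, 13, 14, 15, 16]
low_spread_b = [10, 11, 12, 13, 14]
high_spread_a = [0, 2, 14, 24, 30]
high_spread_b = [0, 1, 12, 21, 26]

d_low = cohens_d(low_spread_a, low_spread_b)
d_high = cohens_d(high_spread_a, high_spread_b)
print(round(d_low, 2), round(d_high, 2))
```

With the tightly clustered scores the 2-point difference is a large standardized effect, while with the widely scattered scores the same difference is a small one, which is why the variance matters when interpreting reported effects.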

DR. HUNTLEY: Okay, if I could redesign the study and now do it better given the lessons that we have learned I think we would have had one person devoted to the psychometric aspects of the study. In retrospect it is something that we did not pay enough attention to and I know that there are a lot of concerns about the study and that for instance we didn't look at the various forms of the assessment. For part one of the assessment we had four different forms that we claimed are roughly equivalent in terms of difficulty, but there are ways to assure ourselves that this is really the case and we didn't go that extra step, and I think if we had had more rigor behind that our conclusions would have been stronger.

So, I do recognize the pitfalls and in a perfect world I could redo the study and have more people involved with the expertise that we really needed and didn't have represented as fully as we should have.

In terms of the effects, no, we did not look at effect size. Person power is of course very, very time consuming. I mean every part of the study was time consuming.

DR. CONFREY: I also just want to point out that I think you are right about the psychometrics in these studies. We have to tie it into the issues of validity which really are poorly treated in most psychometric studies because it is really the math and math educators and methodologists combination that we felt on the committee we struggled with repeatedly in terms of how to think about this because, for instance, most tests that you do have reasonable statistics on in terms of reliability aren't valid at the level of subtests, which is the content strands, right, because they give them to all students and they don't have enough items at the level of content strands, but the level of content strands is the level that a curriculum person cares about because at that level you don't know what to fix or not to fix in a curriculum.

So, in effect that overall score is just a kind of broad result that doesn't necessarily tell you the hills and valleys that you need for revision of materials. So, I agree with you and I want to say that at the same time we have to put the issues of validity up in front and center in relation to the issues of psychometrics. The question on the effect size for us in our overall set of studies we didn't do it but the reason was because we felt that the variety of outcome measures was so varied across the different studies we were working with that effect size really was not an appropriate thing to use across the grades and levels and types of curriculum that we were working with. So, we chose not to do that.

PARTICIPANT: I really have two related comments. One is in terms of understanding the process, the classroom process that is induced by one or another curriculum. Have you thought about or can you envision a possible use of work samples or some kind of direct representation of what it is that students are actually doing from day to day? In a similar way, on the outcome side, in some areas of importance including teacher preparation, we have seen the emergence of portfolios of capstone work as one basis for assessing the outcomes of a program of instruction, and I guess I am interested in whether there is any hope of moving beyond these sort of short, simple task tests that have been sort of the gold standard for the gold standard to actually looking in detail at the things that students can do and building rubrics that could well be quantified to assess those performances.

DR. HUNTLEY: Those are excellent ideas. We conducted this study on a shoestring. I know there are funders in the room. With increased funding yes, we could collect more of this nuanced data. We did what we could within our research questions, the purpose and the amount of funding that we had. Yes, I am aware of these other sources of data and I find them very, very promising.

DR. CONFREY: And we did have some studies that in particular had sort of, not very many of them but some of them had focused on conceptual work. There was one by Ben-Chaim on proportional reasoning where they had used items that really had good task validity and looked at student work and tried to compare and contrast that but it is not done broadly and it is hard to get it robust in terms of getting your sample sizes up high enough. There are a few but not a lot.

MS. WINTERS: Before we take any more questions I would like to suggest we take a break and go get lunch which is right outside and come back in about 15 minutes for more questions.

(Brief recess.)

MS. WINTERS: We are ready to begin again and I am going to invite Karen to come up because I know she was ready with the next question.

DR. KING: Okay, I am Karen King from the National Science Foundation and Michigan State University. I will say from the beginning I went to graduate school with Mary Ann and I was there when the study was happening and so what I want to ask her to specifically comment on is, given that we are going to talk later on today about the role of mixed methods training and graduate programs, how doing this particular study, which wasn't a focus of any of the graduate students' own personal research for their dissertation, fit into the work and life of a graduate student and how that could impact the future of thinking about how to do mixed methods research in a graduate program.

DR. HUNTLEY: Thanks, Karen. Apparently all of us were driving Karen crazy when the study was going on complaining to her about various aspects of the study. It was difficult.

As Karen said, let me emphasize this was not my dissertation. This was not Chris Rasmussen's dissertation and it was not Roberto Villarubi's dissertation. It was not Jaruwan Sangtong's dissertation. So, we were all doing this on the side.

For some of us it was our graduate assistantship and for others, Jim Fey pulled us in because of our expertise to work on the project. So, it was really tough to balance this with our other obligations.

Some of us were teaching while this was going on. We were teaching undergraduate math classes or math education classes. It was really tough. At College Park, Maryland I felt very fortunate though in my training because we were trained in mixed methods. We were expected to know how to conduct quantitative study and qualitative study. I think it was a real strength of the program there, but it is true that this study really did take over lives for quite a while.

I mean I was still working on it at least a year after I finished my dissertation. So, it didn't stop when graduate school stopped for me.

DR. SAUL: Mark Saul, City University Research Foundation and NSF. I am really a classroom teacher incognito and so I have some questions involving practitioners which we haven't talked about very much today, and I am not sure how they involve methodology but I think you can tell me.

First of all, Mary Ann, I was very glad to see your unpacking a little bit the notion of fidelity of implementations. It is troublesome to me for exactly the reasons you talked about up there.

My question is what about the possibility of if you took those implementations that were most faithful you might be selecting for a certain type of teacher for example or a certain type of classroom. Maybe the new teacher needs more structure. Maybe the teacher who has depth of subject, content knowledge would require that. Is there a way in the methodology you can control for that?

Another thing which I reacted to was — we are talking about exploding dichotomies here a lot today — this context-free vs. contextualized has always been a problem for me, again, because there comes a time in a student's life when the mathematics becomes the context where it is not just that the mathematics has to be learned from something in the real world into that, into a representation but sometimes talking about the representation itself is the context a meaningful mathematical inquiry. This is I think a matter of values chosen and I am not sure how that relates to the methodology you chose.

The third thing relates to assessment which is to me the big black box here. There was some dichotomy between, well, mechanical manipulation versus something else and you talked about interviewing students to show that they frequently knew more. I would make the argument that they frequently know less than what shows on the test. If you have a kid who can factor X squared minus 1 but can't factor 2499 as 50 squared minus 1 squared you have a kid who doesn't know something although they can perform perfectly well algebraically on the test.
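
[Editorial illustration, not part of the spoken remarks.] The arithmetic in this example is an instance of the difference-of-squares identity. Written out as a worked step:

```latex
\begin{align*}
x^2 - 1 &= (x - 1)(x + 1), \\
2499 = 2500 - 1 = 50^2 - 1^2 &= (50 - 1)(50 + 1) = 49 \cdot 51.
\end{align*}
```

The point being made is that a student may carry out the first line symbolically without recognizing that the second line is the same structure with numbers in place of the variable.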

The fourth thing that struck me which is again from another point of view is this notion of causality which has been haunting us I think also all morning. It is very problematic. We had huge discussions about this at NSF which didn't resolve which is probably appropriate and the physical sciences don't worry much about causality. They talk about correlations and somebody else worries about the causality but social scientists are obsessed with the causality or maybe it is the policy setters who are obsessed with causality. I am wondering from one of Jere's comments maybe causality is likelihood of some curriculum dimension causing something and maybe we don't have to talk about the mechanism or maybe the methodology could unpack that mechanism. Anyway that is a lot.

DR. HUNTLEY: Controlling for fidelity, are you talking about that in terms of an evaluation study, you know, a comparative study?

DR. SAUL: Your study.

DR. HUNTLEY: Oh, my study of fidelity, my type of study where I am focusing on fidelity. I am going to have to do some very careful writing about the teachers that I select and what I am seeing and I am not very far along in my data collection yet in classrooms. I am in other parts of the research project further along but I have reports of people teaching from the authors. I am going straight to the authors and saying to someone like Glenda Lappan, “who are the people you think most faithfully implement your curricula?” So, it is a very special sample and so I am going to have to address that.

DR. SAUL: An issue very practically in funding curriculum is for whom are we writing this curriculum. Is this a curriculum for teachers who know a lot of mathematics? Is it a curriculum for teachers who need a lot of structure? And maybe you could help us learn about that.

DR. CONFREY: Also, I am not sure fidelity is the word you want. One of the most significant areas that came up across all these studies is the question of the interactions among professional development, teacher effects and classroom implementation and lots of people when we first did our evidence panel came to us and said, "Look, it is all professional development. It is all capacity of the teachers," and other people said, "No, no, it is all curriculum," and somebody else said, "It is pedagogy," right? So, I mean the question has something to do with teasing apart and using sophisticated enough methods to look at the interactions amongst those three.

It is also an issue of program theory. When Frank Wang came to talk to us from Saxon, he talked about the design of the Saxon curriculum which was in part one of the things that it has in it is this incremental practice but the other thing was they find somebody who they see as very effective out in the field and then they ask them to design and they designed it so that it didn't require a lot of homework. What he said straight out and we cited as a quote in the report and so if I don't say it accurately here, read it in the report and read it accurately at least, but basically he said, "We don't make huge assumptions about the extent of capability of classroom teachers to implement. We take a modest assumption about that," and so there is a certain amount of design of that curriculum that in effect doesn't assume high capacity.

Now, you contrast that with something like the TERC Investigations where the curriculum was designed to try to build in a lot of those examples of student work such that it would result in professional development that over time would show increasing ability for teachers to teach with it. You can't have one evaluation model do both those things because they are different program theories about how the program is supposed to be implemented and they also are going to spell out different kinds of outcomes.

So, it is a very important question but the term "fidelity" glosses over the notion that there are these interactions that have to do with your theoretical point of view.

The other question you asked that I just want to respond to is the question about causality. To me, and this is not a committee point of view; this is my own point of view as one member of the committee but we tried to allow for multiple points of view in the perspective that we put forth, and that is I would prefer to see us talk about modeling than talking about causality because the point has to do with this temporal sequence thing.

Causality requires you to make the argument that there is this temporal sequence and the question has come up a couple of times today. The question is, “Is education the same as every other social science?” I would answer that the answer is actually no. Certainly it shares things and certainly we can learn a lot from it but fundamentally education is about learning and learning is about feedback. So if our models don't take into account the role of feedback in the design and underlying epistemology — if we don't take those seriously in our designs, then in fact I think the comment from I think it was Dick this morning who said about those two different approaches, the sort of learning environment versus effectiveness has to diverge. But if in fact you recognize that that is the nature of the educational setting is a setting in which it essentially has to be a learning environment for us to be successful in what we are doing and efficacy and effectiveness have to be defined in that context. Then you can ask the question about whether causality is in fact the best, I am just going to say the best model to talk about it. Now, that is not to say that I disagree with statistics or probabilistic reasoning, because clearly those are models that we have to be concerned when we deal with representativeness and large samples, etc.

DR. BERCH: Dan Berch, National Institute of Child Health and Human Development. Mary Ann, I was concerned by something at the end of your talk and hope I am characterizing it correctly, namely I think it was in answer to the question what works and you thought the naive response was whichever one scores the highest. As I recall you said that the reason that is naive is because some curriculum might lead to better performance perhaps on some outcome measures than on other ones, better performance in others some of which came out in your study.

So, how do you go about making the decision to recommend a particular curriculum if you have multiple outcome measures and you don't have some pretty specified rationale, specification or whatever for the importance of those different measures that can help you in making such a decision or recommendation?

DR. HUNTLEY: I guess largely I think two values. I mean, you have to look at the data alongside your own values. Do you value students being proficient in simple manipulation to the exclusion that they are not learning about mathematizing problem situations? The world is very different than it was even 30 years ago. I mean applied math has come very, very far with the advent of calculators or with computing technology. So, the field is changing. Do we value simple manipulation? Do we value students being able to solve applied problems or do we value a mixture of both? I think that this is a complex question and it must be looked at in a context. We can't just look at the results. We have to look at the context including your values.

DR. CONFREY: I think along with that though, and one of the points that they make in their study is, and this goes to something that you were saying before about this, even their symbol manipulation in there is not that taxing or demanding. I mean they didn't go after the stuff like adding rational expressions or certain kinds of radicals or things that we know students have more trouble learning at that grade level which is course three. So, it is an appropriate question to ask at that grade level. They didn't ask that. So, I mean my answer to your question is one of the recommendations we made is we have to educate people on how to read these studies and make their decisions and weigh the evidence and that has got to be part of the agenda is how to help people to make these kinds of decisions.

I don't think we do that by casting this as an oversimplification. I think that that does a disservice to the field on how the field needs to think about these things. I mean a simple example really is that distinction we made between curricular validity and system alignment of measures because you are a superintendent. Pat is over there. You are a superintendent of schools and you have the question do you want scores so that your students graduate from high school or do you want a valid measure of a Core-Plus curriculum in terms of student outcomes from the use of say a Core-Plus curriculum. There isn't any way around the problem at least at this point in our culture. There is no way around that problem that can give us a simple definition of effectiveness.

So, what we have to be doing is being sure and our hope in doing the study was to say yes, we can't tell you which of these curricular programs are most effective, in fact that is beyond our charge, but we can try to help you learn how to design evaluations and draw the kinds of distinctions that will move us forward in having the right kind of conversations about making those decisions, in that that is what we feel we have to really draw on the existing field and draw from the work that people like Mary Ann and others have done to try and make that a possibility.

DR. BERCH: If I might make a quick 2 second response it is probably more appropriate for another forum but that is true values and there are very different values and if you are going to try to convince people that in some cases parents that they don't have the right values and don't understand what kids really need to build out math you also had better have data to demonstrate that that has an impact on their performance after school preparation for the work force setting. I am not sure that those data exist, but in any event --

DR. CONFREY: In the math field if you don't look at the effect of parents (especially in some of the higher income brackets) on what turns out to be seen as curricular effectiveness you are ignoring a huge event that is going on right now in the schools everywhere. The fact is, and this is why I was talking about the feedback cycle, we have this right now in Clayton outside of St. Louis where it is this particular curriculum and two things have happened at the same time. The parents have insisted they go to two choices for curriculum. So that decision is being made by school boards. At the same time the effects on the teachers are that the teachers feel deprofessionalized. So, now how in that system do you make a judgment about effectiveness when you start to think about the complexity of the interaction between deprofessionalizing your teaching group and yet offering choice relative to the parental concerns? And the thing that struck me most of all, because I have volunteered multiple times to go into that school board and do a report on the report, is that they didn't want to know about it. So, you know the question about hunger for evidence or how to make better decisions out there is I think a serious issue and it is not just a fringe issue.

MS. WINTERS: This will be our final question.

DR. MITCHELL: Thank you, Monica Mitchell from the National Science Foundation and New Visions for Public Schools. This is a question related to the issue of the practitioner and it deals with the fidelity of implementation if you want to pose that perspective differently, how the curriculum is being implemented in the classroom. On your study you mentioned that you questioned the reliability of the teacher interview data because of the trust factor.

Since you are now working on fidelity of implementation in another study how are you addressing trust with the teachers to ensure that you do have reliability in your data?

DR. HUNTLEY: That is a really good point. I have interviews with teachers as well as observations of their classroom practice plus I used a very similar protocol with the students to try to find out what they see are the big overarching characteristics of their math classroom. So, I am trying to triangulate the data but that doesn't get at the trust issue. I have only been in one teacher's classroom so far and I spent a week with this teacher. I knew the teacher prior to going there; I had met him briefly. Also these teachers that I am working with now are pretty close to authors. So, hopefully my relationship with the authors will help establish that trust, because if Teacher Joe thinks that I might not be genuinely interested and want to use the videotapes really to criticize his practice — I mean maybe it is naive of me but I am thinking that he would talk to the authors who recommended that he be videotaped and find some comfort from the authors about my study because the authors are very supportive of what I am doing right now. But it is an issue because it is the case for some of the other teachers in my fidelity study that I know them, so, I think I have earned their trust, but other teachers that is not the case. So, they are one step removed from me. So, trying to establish that really talking a lot to teachers during the teacher's free period and just trying to let the teacher know that I am really interested in what is going on. I am not there to evaluate but trying to establish that relationship. Going in for more than a day I guess really helped. I spent a whole week in my last data collection period and I think that that was really helpful. The teacher thought that I cared enough to see the ideas being developed over the course of a week. It wasn't a one-shot deal.

So, there are lots of things that I am trying to do to establish that trust but it is a huge issue.

DR. CONFREY: And I think in relation to the report one of our recommendations was the independence of evaluators. So, there is a double-edged issue about that relationship and we really argued that the summative evaluations have to be done by an independent evaluator and it allows me at least to end with a couple of quick points. One is the question of who is going to do this work. I think it is really an important question because once you separate it (I mean the majority of our studies were connected to the development of curriculum as part of the grant), and once you separate it from that, you have got to ask the question where is the incentive and the motivation to do this kind of research, especially if you are not a graduate student and somewhat of a captive audience, because a lot of them were similar to the situation that you described. The second is the cost. Usually within an RFP for curriculum I think evaluation studies are probably about 5 percent of the budget maybe or at least maybe on these about 5 percent of the budget.

So, if you want to start talking about different methods we are really going to have to take seriously both how do we work as a community to develop the instrumentation to do that and how do we actually support the costs that are involved in doing this work in a serious way.

MS. WINTERS: I want to say thank you to both of you and especially Mary Ann for coming and opening up your work and thank you very much, Jere for bringing your perspectives as well.

(Applause.)

MS. WINTERS: We will take a break and the next session is supposed to start at one-fifteen.

(Thereupon, at 1:03 p.m., the breakout session was adjourned.)

Copyright © National Academy of Sciences. All rights reserved. 500 Fifth St. N.W., Washington, D.C. 20001.