DR. KELLY: You’ll have to excuse the short bio describing me in the program. I was coming to this meeting as a participant, and in the slot was TBD. I found out from an email from Lisa that I was, the TBD is NRC for me, so I RSVP and IB here QED.
One of the reasons I guess why I’m here is because I show up to annoy the Academy every now and again. I used to be a program officer at the National Science Foundation for Knowing What Students Know and other things. I was the editor of this issue of Educational Researcher that came out on design research methods in education, and have a couple of NSF grants to look at methodology as a problem in education research. And it seems to me, before I launch into my prepared remarks, that I came here expecting to hear a lot of the political rhetoric about gold standards, I think I heard one reference to gold standard, and this notion of unreflectively taking what is presumed to be practice in medical research and then saying this is what we need to do in education. We haven’t heard that, which has been very good. In fact I saw a wonderful morning of qualitative research, I was delighted with everything that has happened here so far.
So maybe when the committee was thinking about this there are randomized clinical trials as intended and then there’s randomized clinical trials as carried out, and listening to the machinations, and these are all good human machinations, nothing wrong with it, it seems closer to me to engineering design studies in the field rather than medical drug trials, in which case the committee might think about this as a methodological issue, rather. I don’t think I know anybody who when they read Fisher is not impressed with the notion of randomized clinical trials to figure out whether you put the milk in the tea first or the water in the milk, whatever that simple experiment was. Now the logic of experimental design is clear, I don’t think you’ve got a group of educational researchers who are just degenerate, who don’t want to use this. In fact many of them were trained by people who were trained in this method. Allan Collins, Ann Brown, a number of people who were seriously trained in this method and understand it in all its limitations were the ones who were training the current generation of educational researchers who are not using this method. And part of the reason they’re not, as you see, is that as soon as you begin to try it out the system bites back, you get perturbations, it’s more like a perturbation study than it is a drug study in that sense. And you can imagine you’ve got an aspirin that’s decaying at random or not deciding to show up, very funny medical trial.
That’s cause and effect, you hit the cue ball off the black ball but the black ball disappears, or hides behind the other ball, develops an attitude.
If educational researchers believe, this is the implication for researchers, if they believe in randomized field trials or whatever this really is, and they believe it can be used successfully in educational contexts, and if they believe that they can draw causal claims from its use, and if program evaluation satisfies their career ambitions, then there may be some implications. Because what I hear a lot here is not research in the sense that I would think about it in terms of scientific research in education generally, it sounds like a lot of good program evaluation. It’s very interesting that I think it was a million dollars more spent on the evaluation than the program. I think we should be very, very clear here with the panel that this looks like a program evaluation study, and by the way, if you can carry out a randomized clinical trial in education as done with drug trials, that’s absolutely the way to go.
If this is going to be the direction what are the implications for researchers? You’re going to have a capacity problem. I was just referring to this capacity problem: most of the people who are trained in education research were not trained in this method, so they’re going to have to look at and understand the threats to validity, internal, external and ecological validity. They’ve got to worry not only about attrition of subjects, but if you’re looking at school settings you’ve got to look at random in-migration into your study. I mean I work with teachers in Virginia, they can have 10, 15 non-English speaking students show up at random in their classrooms, they may end up getting the classroom they didn’t expect. You’re working with a system that’s in flux and you’ve got to handle it, and it’s not going to be the same as doing a laboratory study in psychology or doing a drug trial. And of course drug trials in medicine are only part of the effort, and as you know if you study the compliance studies after the fact, the amount of uptake in the clinical practice of doctors from the results of the randomized clinical trials in medicine is actually very, very small. In fact there are studies all the time showing doctors don’t wash their hands, they continue to do routine physicals when they know it has no predictive validity, you know the stuff about the hormone replacement, and on and on.
Another consideration they’re going to have to face is the ethical implications, we’ve heard a lot about that today, and the political implications. Why? Because a lot of the qualitative research that they do is often done with themselves as the researcher so they sidestep a lot of these issues. If they decide to get into this field they’re going to have to face squarely all the issues that you’ve been talking about here, and it’s something that they haven’t I think in general had a training for and it’s a serious issue because you could mandate this but you need to have the horses in order to pull the cart.
This is an interesting thing because of the mandate, particularly from the Department of Education, to move towards I think it was three quarters of the studies being randomized clinical trials. It was an interesting thing in terms of deciding if you want to spend your career doing this, because there’s going to be a political influence. If you look at the National Reading Panel, which was cited here earlier, they threw out the qualitative research that was done, they looked at the randomized clinical trials. There was a study just published by Camilli and company at Rutgers showing that if you looked at the qualitative research you get very important stuff in order to determine effect. So if you’re coming into this field, and we have a huge problem attracting people into the field in mathematics education, for example, a lot of them being trained in qualitative methods, and if they look at what happened in the National Reading Panel they decide why bother. Or also consider that the determination of acceptable reports has a real political consequence, the ASCD just came out with a report where they cited some documents in the White House suggesting that certain reports could be deleted at the whim of politicians. If you go back to Shavelson’s presentation on the peer group that decides what science is, the peer group is now getting a political, federal political tinge to it that it didn’t have in the past.
Another thing is that if a person decides to get into this line of research, if they’re doing the funded research that is related to the political desires of people, then they may lose some freedom in the measures that they decide to use, because more and more states are beginning to look to the standardized tests that are coming out from the book companies, presumably that are aligned with the state standards, as the gold standard in this case for graduation. So if you’re going to do work in the schools you’re going to find yourself having to respond to these, they may not be the right measures for what you want to do, and it may have an impact on whether or not people want to get into that sort of work. And there hasn’t been a lot of comment on it here but as you know there was this panel that produced Knowing What Students Know and looked at assessment itself as a problem. And the unidimensional model that underlies item response theory may or may not reflect what people believe is occurring in learning due to the cognitive revolution, if you were to look at the How People Learn book. So we’ve got to think about whether or not we are in some ways prisoners to a projective geometry: it’s a very, very, very complex cognitive model and we project it onto a unidimensional structure, and that becomes the gold standard by which you decide that you have been effective.
Also there will be a shift, if this becomes more prominent, towards what you’d call a context of verification rather than one of discovery. I know the philosophers of science are going to tell me there’s very little difference, I accept that, but there’s a reasonable sense in which it keeps coming up, there’s a recent study by the Max Planck Institute. But a lot of people in education see the education research question as open and want to do exploratory and discovery oriented research, and this one is pushing more towards verification. There’s nothing wrong with verification, but as you can see in the example we just saw, as soon as you’re perturbing the situation, all these discovery questions, historical questions begin to show themselves. So the question is, sorry, I was going to point here to the second point, if you have been trained in qualitative research in education, at the moment in this model you’re being given a supportive role to somebody who’s doing an RFT, and that’s an interesting question in terms of how you will see the whole progress of science from discovery to verification.
As I note here the career implications of this are unknown but interesting, especially when you consider that I think the amount of new money in WECK(?) when I was there at the NSF might have been $7 million of new money. I mean that’s about two or three sites in randomized clinical trials, so you’ve got to think that if you are going to fund scientific research in education you’ve got to make a huge increase in the budget.
Implication six please. This is one in which I’m just alluding to the fact that in this special issue here of the Educational Researcher the people who set the ball rolling were trained as experimental psychologists, and I had the privilege of having a class from Lee Cronbach. So the people who were responding to randomized field trials weren’t misunderstanding it, it’s just that they found that when you try it out, it’s like the intended curriculum versus the enacted curriculum, the intended design versus the enacted design can be very different.
Barry Sloane asked questions from the audience, you’ve seen that other Irishman, he currently has the Irish endowed chair of the National Science Foundation which I vacated when I left. But he talks about a model building structure in which you’ve got model formulation, model estimation and validation. A question is whether this variant of randomized field trials as it occurs in education, and this is just as I was responding to it today, is occurring more actually in model formulation than it is in either estimation or validation. It’s an open question. By the way Barry’s paper is in this issue, as is the one by Brenda Bannan-Ritland, where she’s looking at a program of research as a design event. Looking at her model, which has an exploratory piece, a prototyping or enactment piece, some local impact and then broader impact, it looks to me like a lot of this work that we’ve just seen, while rhetorically it’s in the definitive trial, is actually in practice going on much more in terms of exploration and prototyping.
Her model also looks at the question of diffusion, and if you are to draw out the lessons from a lot of this, at least when I was listening to these presentations, I kept hearing oh, it’s important to prepare the ground, partnerships are important, money is important, all these things are important. Well, when we publish the research it would be great if we could also have available that team of people with partnership skills and all the money, because you just can’t go to the study and read the finding and then apply the finding. The finding wasn’t applied, this huge big social engineering event was what was applied. So I think we’ve got to think about the diffusion of innovations problem, because just because something is good, Beta is better than VHS, Mac is better than PC and so forth, that doesn’t mean that people are going to follow it. Democracy is a very good idea, California doesn’t like it.
Quote here from Yogi Berra, and I’m doing this one with respect, seriously, because I found a study by Tom Cook and company in AERJ, year 2000, as you know he had a paper in Evidence Matters, and it’s very fascinating to read. This is where he squarely and very honestly faces the problems of the intended design and what he called the achieved design. And as you read through it, and I know I don’t have an awful lot of time, you can see these slides later, huge questions come up, things don’t happen in a vacuum, let’s just go through these quickly. Validity questions, measurement questions come up. The next one, I’m going to go through it quickly, this is a study of the Comer schools, look at this one for example. The conclusions depend upon the validity of the implementation index, and the one we used meets all the usual psychometric criteria with respect to reliability and face validity, no other forms of validity as it turns out. It was created after writing a paper on the program together with its designer, it was checked out with both Yale and Prince George’s County officials. And then on another page, the preceding results on the implementation index depend on single items of inevitably questionable reliability. And on this interplay between the measures that you use and the type of data you can gather and the type of theory which you can generate from it, there’s some interesting work by Mystery(?) in physics education to show the data that is available and not available to you as a function of the measures that you use.
On this one I won’t go to anything except just read the last sentence: so causal inferences about the differences in achievement gain are especially dependent on the design used and the adequacy of the statistical control for selection. This is a reflection after the design was put in place rather than before it, which is interesting. Next slide.
People who follow this have to learn to live with a lesser form of certainty. You’ll notice here that there were two sites they were looking at, one in Prince George’s County, Maryland and one in Chicago, and you’ll notice that he lists all of these differences. We cannot be sure which of these many site differences, alone or in combination, account for the different results by site. However, all the presumptions are the Chicago site is superior. Notice also that he said most of the effects on students are not clear until the last two years of the secure program. In Prince George’s County the entire program lasted two years. So you’ve got some issues of logic that are not cured just because you did some randomization at the outset.
Look also at Cline and Mandinach, this was in a book that Dick Lesh and I edited, the handbook of research design, where they were looking at the effect of a systems thinking based curriculum, and they found out that after three years of funding they hadn’t got the curriculum materials or the teachers’ understanding to the point where they were happy, but everybody understood what systems thinking was, so they couldn’t do the randomized trial because they couldn’t get a good mature treatment. Next one please.
So in summary evidence does matter, but all the other things that you’ve come to understand matter also, continue to matter, and doing a randomized clinical trial, something like the drug trial, would be highly desirable. I’m glad actually I don’t live in a world in which it were that simple, but if it were my parenting would be much more straightforward.
We had a couple of people from Success For All, so I just want to leave you with that final quotation here, this is a different study also in AERJ, by Datnow and Castellano: teachers’ level of support for Success For All did not necessarily predict the degree of fidelity with which they implemented it. Almost all teachers made adaptations to the program in spite of the developers’ demands to closely follow the model. Teachers supported the continued implementation though many teachers felt that the program constrained their autonomy and their creativity. So if we lived in a world in which all this were possible and had the resources and the training and people to carry it out, we still haven’t as a field faced the diffusion of innovations problem, which is a crucial one I think in terms of trying to make a difference in a clinical field such as education.
Thank you.