The National Academies: Advisers to the Nation on Science, Engineering, and Medicine
NATIONAL ACADEMY OF SCIENCES NATIONAL ACADEMY OF ENGINEERING INSTITUTE OF MEDICINE NATIONAL RESEARCH COUNCIL

DR. SHAVELSON: Thank you Brian, Lisa. A couple things to set the record straight. Lisa was the co-editor on the book and you know that she probably did all of the work, so she really deserves the credit. I’ll take what isn’t the credit. I’m here to try to dredge up memories of our committee meetings and since several committee members including Jack and Bob Dehan(?) are here they may try to keep me honest as I supposedly report to you what the committee was thinking at the time we were doing this work. So I’m going to try to provide some background to the workshop based on the committee report and that’s what I’m going to focus on. I’ll get a little bit out of hand on my concluding comments probably. So I want to say a little bit about the background, focus on some of the pieces of the report that might be germane to the workshop, and then turn my attention more to the issue of research design and designs for conducting research, including randomized trials, and then conclude this thing.

I think that the Committee on Research in Education is quite appropriate. Our committee was given about six months to deliberate and write the report, and that means that there are probably a lot of things that need to be taken care of that we were unable to take care of in that short period of time in writing the report and I’m happy to see that this workshop is going on and that there have been other workshops and I assume there will be some more as well, because there are plenty of areas that need to be fleshed out.

One of the areas that we were asked to think about in the report was whether or not science or scientific inquiry was different in education, the social sciences, and the natural sciences. And I think the committee went in believing that there were differences, at least I thought that we would find differences, and we tried to argue those differences pretty hard. And every time we’d argue a difference somebody would give us a counter example in the natural sciences and say well, yeah, but, and there were enough yeah buts to the committee that we inevitably gave up trying to convince ourselves that there were big huge differences and that there was a different science for education or social science or the natural sciences, and figured that there’s as much variability within as there is between, which in Laurie’s quest of unambiguous causal effects he will find that that’s probably going to be true as well. And so we concluded that generally when we do science there’s certain general characteristics, which I’ll get to in a second, that are more common than they are different, that’s not to say that there aren’t differences, there are, but the overwhelming “main effect” really is substantial similarity.

The second thing that we said, and I think is important, is that each field of science, area of science, has because of the nature of the phenomenon that it studies, very different kinds of methods that they use to ferret out causal effects and mechanisms and other things. And that being the case one can expect variability in “the scientific method”, there isn’t the scientific method, but there are many methods that fit the phenomenon under study, and that’s certainly true in education and social sciences as well as the natural sciences.

Of course as in any field then what are the characteristics that shape what we think of as doing science in education and the committee considered that as well. Before I get to that I want to say something about the common features and these things are kind of apple pie and so on, but I want to talk about a couple of things in particular.

One has to do with this provide a coherent explicit chain of reasoning. One of the things that I think distinguishes science from non-science is the attempt of first of all, bring observation or empirical data to bear on the contentions of various views or various theoretical points of view or whatever. So one is that it’s got to be empirically testable, you got to go out and show me in some way.

The second piece of it is, and this is the really critical piece, is that you attempt to rule out all counter interpretations to your favorite interpretation. And that quest for ruling out counter interpretations is extraordinarily important, whether it be in what people call qualitative methods or in quantitative methods. That’s what we are about, it’s bringing evidence to bear and that evidence has to have an empirical base on alternative conceptions or contentions in trying to rule them out. That’s a characteristic that we feel is very important.

The other one I would point out is that one of the things the committee felt was important in education was the development of a scientific community that is critical not of one another but of one another’s work. It’s not to be critical of people, that’s dead wrong when people attack other people. What’s important, though, is that we be very critical of our work and make sure that we have ruled out counter interpretations or if we haven’t at least admit the ones that remain. That’s absolutely essential, so it’s essential to have a community, it’s essential to have a place in which you publicly display that information and the knowledge that you’ve generated so it can be debated in the public so that we know what’s going on. And I would emphasize that it’s just as important for those who fund research to allow this debate in the public to go on as it is for the research community to debate it.

So those were the kinds of things that we were concerned about in our committee report and the rest of the stuff. Of course I’m hoping, has the committee worked on the whole issue of replicability and generalization? I know somebody tried to get me to do something there and I was lucky enough not to have time to do it, but I want to come to that workshop please.

As far as critical features of research in education, of course we’re familiar with many of these things and anybody who’s doing work in the field realizes that this all comes to play, but in Washington you probably don’t realize that there are politics involved in education, I came here to tell you this from the West Coast just so you know. From what we see politics and values play a huge role in education research, they play a huge role in the alternative conceptions that get played out in the research design and they get contested but public education is a very public matter and in our country it’s an extraordinarily public and important matter and with the decentralization of education it makes our challenges even greater.

Of course there’s human volition and most people don’t behave the way they’re supposed to and that’s a problem for our work, it creates variability and uncertainty and it’s to be sure the research that we do can take into account that volition, it needs to take into account that volition, but it always presents very big challenges to the research that we’re doing.

The other thing, I won’t go through all this stuff, you guys, it’s in the report if you’re really interested in it, but the last thing I’d say about education research is relationships. It turns out that if we’re going to do research in education we have to in a strange way have the cooperation of the people that we’re studying. And in fact whether we’re studying them or collaborating with them it’s clear there also has to be a collaboration to have them at the table, they have a lot to contribute to the work that we do and our understanding of education. But these relationships are absolutely essential, without them no matter what research you’re doing it can be scuttled instantly and therefore we have to think about this as a community. There’s a report from the NRC, the SERP, Strategic Education Research Partnership, that is well worth thinking about because they consider that in some detail. I think that’s enough of that.

I have to confess here to you all that I am a randomizer, I am currently engaged in a randomized trial. And as I was preparing these notes for this talk emails were coming in because it’s a fairly large trial, it’s a trial, well, it’s not a large trial, it’s an unwieldy trial. We have classrooms from Hawaii to Maine in our study and we are dealing with human beings and human nature, and so these were the emails that were coming across my screen as I was writing this report and trying to implement this trial. One of our teachers’ sons came down with cancer, what’s not up there are the emails about school districts going on strike, scuttling people we had in the field, costing us a fortune to get them there to collect the data because they had to come back home because as they got to the school the school was on strike and the kids were gone. So this is the reality, if you want to know the education context this is the very true reality in the policy environment right now, the economic environment right now, it’s pretty tough being out in the field and having anything be halfway predictable in what you’d like to see happen. So this just gives you some ideas of what we’re dealing with.

Let me now turn to the issue of research design and begin to home in on the workshop today. One of the most important things I think the committee did was in part to say that it’s the question, dummy, not the design, and that is that the punishment should fit the crime. That is you ought to design a study to answer the question that you think is the important question, not create the question to fit the design. I think that was an important thing that needed to be said, I don’t think it was rocket science, but somebody needed to say it and we said it.

I think more importantly once we said it we said well what are research questions, and we realized there was an indefinite number of research questions and we were then in hot water. Because how could we give an example of what we meant if there were so many different questions, and we came up with a way of categorizing them, and I like the category system, you may not but I do. And that was to say that there seemed to be three kinds of questions that we ask. One is a descriptive question, what’s happening, and I could ask that in a materials science lab or I could ask that in the field of education. And we can ask the question what’s happening out there? We’d like to be able to describe certain characteristics, something like well what’s the achievement level of eighth graders in our country in science and how does it stack up with other countries, so that’s a descriptive question that I think there are whole sets of questions and oftentimes we don’t realize how important the descriptive questions are and we don’t get our descriptions quite adequate to build our theory on.

The next kind of question was a question of is there a systematic effect, is there some consistency, that is, is there a causal effect. By systematic, that’s the code word, remember at one time we weren’t allowed to talk about causality because it was so complicated, now everybody is talking about causality, but the idea is, is there a causal effect, that’s the question, ultimately the question you’d like to answer.

And the third question is if you have a systematic effect what’s the mechanism that creates the effect? Reducing class size seems to have a salutary effect in spite of what a colleague not too far from my office has to say, most people think that there probably was an effect in the Tennessee Study. But we’ll be darned if we can figure out why that effect occurred and why it persisted in the way that it did and it didn’t. So that’s the question of mechanism.

So let me peel the onion a little bit more and say a few more things. In terms of what’s happening one can think of statistical studies, like estimating eighth-grade achievement, estimating it not only for the United States but for other countries as we did in the TIMSS Study and so that would be one kind of question that you could ask. You could ask, which we did in a study I did on TIMSS, what’s the relationship between achievement in mathematics and achievement in science? And if you break down the areas of math and science tested in TIMSS, what are the relationships there and does it give us any insight from one country to the next. Or you can ask a descriptive question of what does school look like through the eyes of an inner city child who lives in a very low income area, surrounded by violence when they go to school, when they get up in the morning, they go to school, they go into school, they sit in classrooms and they come out of school, what does that life look like? Well there you need to have very rich descriptive study through the eyes of this child and that also is the kind of descriptive work that I think is absolutely essential that we conduct in education. So that gives you a sense of what I mean by descriptive questions.

If I turn my attention to is there a systematic effect we can ask questions about causality, and those are important questions. And we can ask well, can we ferret out causality when randomization is possible, and there are many cases where you can’t randomize, and if you can’t randomize then what are the other methods that are available to us to begin to approach the question is there a systematic effect. As you move away from the randomization the uncertainties increase assuming that each and every design is equally possible.

And then finally there’s a question of mechanism, a mechanism when you have a thorough understanding of phenomenon, and mechanism when you’re not so sure and what you do is you cobble things together to see whether or not you can create something that brings about a cause and effect relationship. So that’s what we meant by mechanism.

So if I think about a study about what’s happening then the study done by Holland and Eisenhart comes to mind and you’ll find this described a little bit more in our report. They were interested in the question of why do so few women who begin their careers in non-traditional areas, like science, end up either working or not working in those areas. And they had 23 women in their study, it was a small study, in two different small colleges that they picked up, and they did detailed studies of these students over several years. And for each student they modeled the kind of processes, their engagement in classes and so on, and came up with three variables that they thought were fairly important, views about the values of schoolwork, reasons for doing schoolwork, and perceived financial opportunity costs. These things turned out to be fairly important variables. What I like about the study is they didn’t stop there and just describe it, they said how well can these variables actually predict the real careers of women. So they waited around a while, which you know, funding agencies kind of get jitterish about, they waited around a while and found out what the women did. And in all 23 cases they were able to predict whether they stayed in the non-traditional or didn’t stay in the non-traditional path that they had pursued in terms of their later careers.

Now that’s pretty good prediction. Then they went back and did the usual regression prediction that many of us would have done and we used the school that they went to in their prior grades in trying to predict out of high school where they would end up and it didn’t do nearly as well. So what I like about that is there you have a highly qualitative study that brought longitudinal and empirical data to test, whether or not the models that they had developed were accurate, that was a nice study.

This is one on is there a systematic effect and the study was done, oh my goodness, I think I blew it, this is Susanna Loeb, a colleague of mine at Stanford, so I didn’t have her name up there but it’s also in the book. I want to say something about causality now and so on. When I think about causality, and Brian will correct me of course, I think about two pieces in causality. You have to establish consistent covariation, consistent correlation going on. And some people stop there and think that they’ve established causality and one can argue, and this is argued in philosophy of science and among statisticians and so on that that’s far enough, if that systematic effect is there, it’s consistent and so on, you have it. Others will argue that you need a consistent effect but you also need to understand the mechanism ultimately. And as I read the journal Science I see this question: we’ve got a systematic effect but we’ll be darned if we can figure out why it’s happening. And so it’s both the systematic effect and from my point of view I want to know ultimately why it’s happening if I can know why it’s happening.

And again, I point out that when you deal with causal assertions you’re always trying to rule out all the possible counter hypotheses that you know of at the time, and oftentimes in our research we don’t know all the counter hypotheses but as a research program moves along if you rule out a number of counter hypotheses there’s always somebody out there with a new one that you have to deal with and that becomes very important in the science that we do. I think that when you do these causal studies it’s important to have enough descriptive work so you understand the phenomenon and how to design the study well, and I think work in the area of reading has been a long time understanding what’s on the ground as well as how to design these studies, and there’s a number of programs of research in that area.

That’s it on causality, okay, Lisa, she wanted me to say causality, I said it.

So this is the Loeb and Page Study, which I really like because I’ve always had trouble with production function studies, but I’ve never understood economics anyway so that’s why I was a professor and didn’t make any money so I didn’t have to worry. Anyway, I like this study and the question they’re asking is that if teacher quality should be related to student outcomes then why is it that teacher salary, or teacher pay, seems at best to have a weak relationship to student outcomes. And Susanna and her colleague said well if you just do a production function study you’ve got input variables and you have output variables, and if you correlate the outcome variable of your interest, in this case it was dropout rates, with salaries they seem not to be highly correlated at all. So what’s the problem, shouldn’t salaries reflect to some degree quality after controlling for other things?

And they said well, it just might be that there are other things in the lives of teachers than salary that may have meaning, there may be local job markets where to teach you’d be a fool because your opportunities in Silicon Valley up to a few years ago were fantastic, now we have a lot of teachers, how to make teachers. So there must be alternative job opportunities as well as the nature of the occupation, if you’re going to have children and so on it may be that this occupation may fit better the life that you plan than another occupation. And so they added this in the production function model to see whether or not this could account for it and they tested these two alternative models, the usual model in which you had outputs as a function of inputs, and the second model in which you looked at opportunity costs, not just the usual type of inputs, and lo and behold they did find a very systematic effect and you see the results of the study. So here’s a real nice study where some theorizing and thinking about the local context conditions and taking those into account in the end allowed them to ferret out a systematic relationship there. And I point out this isn’t a randomized trial, but it’s a correlational study and I think very nice econometric analysis.

This is another study in terms of what is the mechanism, it was done on a non-contentious topic, Catholic schools and public schools, and the study was done by Bryk, Lee, and Holland, and it was a nice study because they were trying to figure out what is it that gives the advantage to Catholic schools, what is the nature of the Catholic schools and they had three rival hypotheses. Well, it’s a sector difference, whatever it is between the two sectors that goes on, one’s spiritual and private and maybe there’s something about that that has an impact. The second alternative was compositional effects, not only is it private but the composition of the Catholic schools. And finally school effects, the features of the organization of the school. And through a number of, both quantitative modeling but also through case studies to try and understand what the quantitative parameters meant they went and did case studies and took a hard look in schools that they could identify as effective schools to see what was going on there and in the end they felt that probably the predominant reason for this is that the coherence of the school life in terms of outside influences and the erratic kinds of things you saw in the emails were certainly at a much smaller level in the Catholic schools than they were in the public schools that were the comparison schools.

And in the end they say nonetheless issues of controlling for family differences in choice and policy implications abound, that is this is where you get into huge uncertainties with correlational data, that how do you really control for these things, it gets very difficult.

So let me conclude with a few observations. The first is that if I leave you with anything from the committee it is it’s the research question that matters, the design has to follow the research question and there are a lot of questions that are important for us to be doing research on in science, including the science of education, they could be descriptive, they could be causal, they could be studies of mechanism. If you’re looking for a systematic effect it seems to me that logically randomized trials should be the preferred method if they’re feasible to do. And again, the issue is one of feasibility and ethical considerations in the conduct of these studies. Nevertheless, we encounter difficulties when we do randomized trials, as we do in other research in education and I would be remiss not to try to point these out. One has to do with the fidelity of the treatment implementation, you think you’re doing treatment T but in fact it turns out to be T*, and it ain’t quite the treatment that you thought it was and you speak in your research report as if it was a homogeneous treatment.

The second thing is that there’s variability in treatment implementation and I’m doing a study in which we’re looking at the impact of formative assessment in science education trying to change the conceptual understanding kids have, the mental models kids have, of why things sink and float. And if you think for a minute that you think you know why things sink and float just think about that, very hard to teach, the Israelis dropped it from their curriculum and we may as well. But we have a treatment condition and we want to assume that all of the teachers are doing the wonderful things that we work with them to be able to do. Oh, and they’re really wonderful, Bob you’d love this stuff, it’s just great. I can assure you I’m going to do some observing and unfortunately two of our teachers are in Hawaii so I’m going to have to go to Hawaii, and I’m going to be observing in their classrooms and I know full well what I’m going to see. But the whole idea is what is the treatment and how do you characterize the treatment, and there’s variation from T to T* to T**, etc., etc., and you need to characterize that variability. So that when we talk about a treatment, the experimental group, it’s a fiction, it’s part of our imagination, what we have to do is pinpoint just what the treatment is that we’re talking about in these studies and it can be done, it’s not that it can’t be done, but we have to spend the extra time and money and it’s very costly to get that data.

Oftentimes in studies, and Jerry Pine is finishing up a study where he’s comparing traditional and more constructivist or inquiry science right now. What happened was that when we went out, this was a quasi experiment, when we went out and found the teachers and observed what they were doing the distribution of treatment implementations, if you think about variability within the treatment group and variability within the control groups, there was a fair amount of overlap. In fact if you were blindfolded some of the control teachers looked a lot like the experimental teachers. And this goes on, remember volition, teachers will do these things to you, so again you have to make sure that you have some handle on capturing the variability. I think we have ways of modeling it, I think we have neat ways of modeling it, but you have to model it to capture that because you get variability in the treatment implementation and you also have huge variability in the outcomes within each of the groups.

Adequacy of outcome measures, given my background, I’m hugely concerned what we take as a measure of achievement, kids’ understanding in science. I can tell you that the best measures, where you sum all of those things up, do a wonderful job of rank ordering but there’s so much more information that’s lost in that rank ordering of generability that we need to begin to pay much more serious attention. If you let me select the outcome measure I get to tell you who wins the race, or at least I can go a long ways in doing that.

The relevance of the control condition for policy, the control conditions in the Tennessee Study were a control group of a regular class size and a group with a teacher aide. Unfortunately in California the debate is now revolving around class size reduction versus paying teachers more and increasing the quality of the teaching force in California. But the problem is the control conditions or the alternative conditions didn’t cover the policy options, and this is really crucial because oftentimes you’re going to have to do these studies more than one time to incorporate what harebrained ideas people can come up with in terms of policy options.

And finally, of course, there are always the issues of external validity and we can’t forget those external validity issues. Right now the experiment I’m doing I’m asking teachers to use and incorporate formative feedback processes that depend a lot on their knowledge of science, depend on the active participation of students criticizing one another as parts of a community, it’s very hard to do that and whether or not that would generalize to a larger teacher population if we do get effects I don’t know. So this external validity question, these trials that we do, we have to pay particular attention to.

So all of what I’m saying in terms of randomized trials is that we have to understand the implementation of these trials that we carry out, it’s absolutely essential. And my former dissertation advisor and mentor Lee Cronbach said wisely at one point in time even policymakers need to understand something about why these trials had the effect that they did if they’re going to design reasonable policy. And it behooves us then as a community not to stop short and say gee, it works a little bit better, the effect size is .1 or .14, but we have to begin to give some vision of why it worked and in which particular contexts and how that might or should influence policy design, whether they will listen is a different issue.

I told Lee before he passed away that I was chairing this committee and that got a grunt out of him, Lee’s kind of that way, and he said I understand but just be very careful with randomized trials. When they fit they are the most powerful way you can do the research and you should use randomized trials. But he said be very clear with people about the nature of the context that you’re dealing with and the difficulty of doing the trials well. And so those are the kinds of words I guess that I would leave you with. Randomized trials, yes they’re important, yes if they’re feasible they’re the most powerful method we have for looking at systematic effects, but we have to understand the context, we have to understand the implementation, we have to understand what the treatment is we’re talking about. And obviously this workshop is quite timely.

Thank you.
