MR. WISE: Let me first mention that this session is being recorded. Those of you in the audience, if you have questions, if it is not too much trouble, if you could come speak into the microphone, say who you are, that would be helpful in generating a set of proceedings that actually reflects what was said.
I think first, we will ask the committee members, and Jack, let's start with you, if you have comments or questions.
MR. FLETCHER: I have a comment. I want to thank you for really an illuminating presentation. I found your paper terribly interesting.
One of the things I want to be really clear about relative to your presentation, as well as to this workshop agenda as a whole, is that our committee is not evaluating peer review at IES. We are looking at peer review broadly defined, in educational research.
I guess the comment I want to make is that I think that the suggestions that you make at the end about peer review at IES really should be broadly conceptualized and really apply to peer review of educational research at any of the major funding agencies or foundations. I think they are really a much broader consideration.
MR. HENLEY: Could you elaborate somewhat as to what you meant when you said that peer review erodes legitimacy and culture? How does it erode legitimacy?
MR. HACKETT: It doesn't sound like something I said. Do you remember what part of the talk it was in? I don't think I would ever think that peer review erodes. It actually builds.
No, I don't think peer review erodes. I wouldn't say that at all.
MR. FLETCHER: There was a line in there about being culturally erosive.
MR. HACKETT: Oh, culturally corrosive is earmarking. Earmarking erodes the culture of peer review. If I write a proposal and it gets funded and start a grant with $200,000 and get an earmark for $500,000, I start to wonder what I am about.
If you look at the trend in recent years of congressional earmarks for universities, there is a pretty steep upward incline. I think that is corrosive.
I write a proposal and go talk to my congressional delegation to see if I can get it. A sentence is all I need.
MR. FLODEN: The contrast that you make, or the tension you placed between reliability and validity, you made it as though this was almost a necessary fact.
I thought that perhaps it is more the case that there is going to be a tension if the things that you are most interested in measuring can't be measured reliably. That is what creates a tension. It is the joke about looking for your keys under the light.
If finding the scientific merit in a proposal were relatively straightforward in some field or on some topic, then reliability and validity wouldn't necessarily be in tension, because you are only drawn away from validity if you are trying to increase reliability, and that shifts you toward something that is not what you are trying to measure.
MR. HACKETT: Fair enough. You will hear also, and perhaps you have read also some papers that are greatly concerned with inter-rater agreement and low reliability.
The possibility that we could ever get such high validity accompanying that high reliability seems so remote that I don't consider it likely.
When we have reached that point where we can measure exactly what we need to measure to assess scientific quality, sure, those things will not be in conflict.
MR. STRAFF: Can peer review be used to make comparisons across disciplines in disparate areas of research?
MR. HACKETT: Yes. It is not always pretty. At NSF, there has been rising interest in the past several years, introduced by initiatives of various sorts, in bringing together panels.
It is almost a mini-talk in itself. Part of what goes on there is that I think that interdisciplinary peer review would work well if you could do it with established panels.
One of the problems is that bringing together people from different disciplines to evaluate proposals is difficult.
It is more difficult when they come together only once and never see each other again, which is what happens also in special initiative panels.
If you could put together a panel that worked together for a while, maybe had certain self-perpetuating elements, that is, where panelists nominate succeeding panelists, or identify the kinds of expertise they would like to add to the panel to address weaknesses that they felt in the past, I think a self-perpetuating interdisciplinary panel could work well.
In fact, having said that, the program that I was the manager for at NSF is an interdisciplinary program. It is a program called science and technology studies, which supports research in the history, philosophy and social studies of science and technology.
Now, that could sound like angels dancing on the head of a pin, but philosophies of science are so different than stories of technology.
Stories of technology are going to recount the origins and maybe characterize a particular material artifact, and philosophers of science can range from those still chewing on what Aristotle said, to folks addressing the philosophy of quantum physics.
You are putting together a panel of eight to ten people who are doing proposals across that diverse range with those diverse interests.
I think it worked reasonably well, because the group had a history together and a culture formed, so that people made allowances.
For example, when you hear the philosophers in this panel talk about a proposal, the proposals that they like -- yes, this is sensible, that one is going to work -- are the ones that are not going to be declined because they will say, for all those reasons, this is worth doing. It is obvious, it looks like it fits together well.
Those that they fight about, disagree about, that they are ripping and shredding and gnawing and chewing about, those are the ones that technologists say, those are good, which is precisely the opposite of what happens in others.
Because the panel exists together for a while, you get used to that, and it works. Where danger arises, I think, is when you have these one-shot interdisciplinary initiatives where you put together people who have never been in a room together, in fields they don't know anything about and say, okay, here are your proposals.
MR. FUCHS: My question is a large point. I wonder whether that doesn't sort of assume that you have a cohort of equally qualified applicants.
I think in reality, you know, there are subgroups whose application funding rate might be 50 percent, and then there are other subgroups whose success rate might be essentially zero, and it ought to be essentially zero.
MR. HACKETT: That could be. Does anyone have any statistics on that? Certainly success breeds success in science. Is that less corrosive?
What that means, then, is that a certain fraction of a scientific field is engaging in ritualistic behavior.
MR. FUCHS: I guess the point would be that there is probably some percentage of applicants that you want to discourage from clogging the system over and over again, because they probably don't have that much to offer.
MS. FALKENBERG: I appreciated what you said this morning about the tensions, and you talked at some length about the reliability and validity. I wonder if you could talk a little bit more about your opinion on the size of the panel.
You said at one point, if everybody agreed, you could obviously choose only one person to do that review. How do you balance the tension between having a disparate set of inputs versus having an unwieldy number of people you are trying to get input from? What is your sense of that?
MR. HACKETT: I would defer to the social psychologists in the room on this, because folks who study group process probably have a pretty good idea of what size works.
I think with larger than 10 or 12, air time gets to be a real problem. You have probably had this experience as well, which is that a point is made and pursued and objected to and such, and you really want to get in on step three of that, but you are sixth in line, and so, you can't.
You want a panel small enough that everybody can keep the ball up in the air. If you imagine the discussion is a ball, everybody wants to be able to put their energy into it and keep it moving in a direction.
If you have so many people that some can't, then it is going to move in a direction that they may not like.
I don't want to offend NIH representatives in the room, but my experience with NIH peer review, for example, is that we sap a lot in having people read prepared statements.
So, you end up with four or five people who are assigned reviewers, and there are maybe 12 or 14 reviewers in the room, panelists, in the room.
By the time the four or five have read a couple of single-spaced pages that they brought, you know -- but that is necessary, I think, with a large group.
I think maybe a smaller group, eight sounds right. Twelve to 14 gets to be too many and six or fewer seems to be too few. Again, social psychologists, people who study communication, probably know this in detail.
MS. KLINE: You said you had background in technology in your work at NSF. I was wondering, what success have you found with panels that use technology for review, and have there been studies of the interaction of having panels do in-person meetings as well as using technology?
MR. HACKETT: I don't know of any studies. I am wondering if Steve Breckler from NSF will later talk about some of this. He has been there for a long time and has had experience with this technology transition.
Some NSF panels -- I think virtually all NSF panels that meet at the foundation now use the FastLane interactive panel system.
So, everyone sits in front of the computer and the panel summary is prepared live, rather than being something that is done later, and it is often circulated.
So, if I am responsible for drafting the panel summary, I listen, take notes, and at some point later in the discussion, I phase out and start typing.
In the best use of this, you then circulate it to the panel electronically for comment and approval. So, you will see, while you are sitting here, it will pop up, this proposal, the summary is now available for peer approval.
What it does is, it forces you to multi-task. So, it is something better suited for people younger than I am. My daughter can do her homework, play a game, listen to music, and IM and e-mail simultaneously.
I also see people checking stocks and sending e-mail at the same time. I do wonder whether we need the younger people whose brains do this better.
I recently did a virtual panel, sitting at home, on my computer, and by telephone. It was about a half hour to get started, and that was not the computer's problem. It was the telephone.
It was a lot of, you know, Vivian, are you there? Are you there, Vivian? Then you would sort of go through the panel and it was, Ed's here, Tom's here, Jane's here, Bill's here. Vivian? Vivian? Are you there, Vivian?
Then it went okay, except for the crimp I got in my arm. Smart people had speaker phones. I don't have one at home. So, I had a phone and the computer, and the proposals and my notes. But I didn't have to fly to Washington.
It would be fun if somebody would collect even that grade of data about panel experiences, let alone something a little bit more formal about the technology.
Again, the reason I would like Steve to speak to it now or later, I think it takes a real burden off the program officer to have the panel summary prepared in real time and by the panel, rather than by the program officer.
When I used to do it, I would go home with my notes and the official panel notes that the panelists took and then, for the next several weeks, I would be taking and trying to form panel summaries, that are done live now.
MS. EISENHART: You suggested toward the end of your talk that a mixed panel that had some peer review and a strong manager might be a good approach to take. Did I understand that correctly?
MR. HACKETT: I was thinking of a mixed program that had some things that it did with a strong manager and some things that it did with peer review.
MS. EISENHART: My question was really, is there something about that approach that you find more attractive or better than having, say, multiple panels with different kinds of people on each panel, and there being some sort of staged or phased way of having each group make a decision?
Maybe those aren't mutually exclusive in the way you just explained it. It sounded like you were making a pitch for one particular way of approaching the process.
MR. HACKETT: What I was trying to make a pitch for was pluralism. I think when you look across the U.S. research funding system, it is pluralistic. There is a strong presumption that things are done by peer review, but there are some other alternatives.
Small grants for exploratory research at NSF, for example, can be awarded for up to $100,000 on a program officer recommendation.
MS. EISENHART: What about different strategies for conducting peer review? Do you have any advice or recommendations about that?
MR. HACKETT: By strategy, you mean for example, just using a mail or a panel proposal?
MS. EISENHART: Yes, or multiple panels versus one panel. When I talk about multiple panels, I mean situations where you have lay persons on a panel and then you have scientists on another panel, researchers --
MR. HACKETT: Oh, you mean in sequence rather than having them together.
MS. EISENHART: Yes, rather than having them all together.
MR. HACKETT: I don't have any experience with that. NSF -- this is not quite thirds, but it is easiest to think of them as thirds -- you know, NSF has a third that do only mail reviews, a third of the programs seem to do only panel reviews, and a third seem to do both.
A lot of these really go back to traditions in the discipline and maybe, if you really pushed to the level of consensus and what some call codification in a field -- so, in physics or math, you might be able to get by with just a set of mail reviews, because the standards are so clear and so well agreed upon that the experts who read them can apply the standards and provide you with advice.
In engineering, my colleagues looked at me as if I were crazy to say that I was sending things out for mail review and also arranging a panel.
In social sciences and in biological sciences, at NSF people think you are equally crazy if you don't have both mail review and a panel, because without the panel, how do you really know what the mail reviewers said, because the panel helps you interpret them.
Partly, I think this is something that one would arrive at through process. You try to lay down some additional principles, try to establish an organization that is committed to learning, being reflective about how it goes about its work.
Then you would try different things. If a problem seemed to be particularly well suited to involve advocacy groups and practitioners, you would assemble a heterogeneous panel that involved them, along with researchers, and let them apply the criteria together.
I think the NAPA report on NSF makes this point, and it is an obvious one that I sort of made on the fly, which is that when you look at the indifferent application of the societal benefits criteria at NSF, and probably the case at NIH, on the one hand you can say, oh, this is terrible, can't these people do their jobs right?
On the other hand, if you look at most researchers, they really don't know what the practical implications of their work are.
If you put experts in there, in the review, you can see that piece, and in education, this really does happen. I think you can get a complementary review that will be stronger than just having the one or the other.
I guess this is a long-winded way of saying, I can see a mixed panel as being probably a better option for education research than a sequential one.
MR. WISE: At this point, I would like to stop and thank Ed for getting us off to a good start.
[Applause.]