The National Academies: Advisers to the Nation on Science, Engineering, and Medicine
NATIONAL ACADEMY OF SCIENCES NATIONAL ACADEMY OF ENGINEERING INSTITUTE OF MEDICINE NATIONAL RESEARCH COUNCIL

MS. FALKENBERG: I have a question. I know we are going to hear later about training peers for the process. I wonder if we might hear more from the panel about the way their organizations go about training peer reviewers.

MR. BRECKLER: One of the natural mechanisms -- we don't have formal programs or courses or anything like that for training peer reviewers at NSF.

What we do rely on is our ad hoc review structure, the way it is sort of a system for peer reviewers. We rarely would ask somebody to sit on a panel who hasn't had experience in reviewing proposals for us before.

They don't necessarily have to have had grants, or even applied for grants, but having reviewed proposals before. So, that is one way.

The kinds of reviews that NSF encourages are the same kind of reviews that you would expect for a journal. So, NSF expects reviews with a lot of text and merit and thought put into them and analysis and commentary and so on. We sort of pride ourselves on producing a lot of those kinds of reviews.

So, what we will try to do is get reviewers to model their reviews on the kinds of reviews we would expect for journals.

Our disadvantage at NSF is that we can't provide model reviews. We can't even send copies of the other reviewers' reviews, like most journals do. That is a decided disadvantage because it is hard to train them.

For the most part, it is using the ad hoc review system as sort of training.

MS. FALKENBERG: Steve, could you talk a little bit more about how you would train those ad hoc people? The reason I want to know more about that just in general is because, when I became a reviewer 15 years ago, it was jump in, drink from the fire hose, you are right there on the panel.

I was interested to see how you bring along the people. Could you talk a little bit more about the ad hoc process?

MR. SLOANE: I am not sure it is totally different than it was. I would suggest that the technology actually helps in that regard. Once you are physically on location -- there are two systems within NSF, one for submitting your reviews, and another for panel kinetics, interacting with the panel system.

You get to see, once you have been in, what other reviewers are writing. I think when you listen to the reviews and get to see the quality of the other reviews, we are now talking about a mix of junior and senior people. We are leveraging the senior people into mentoring the junior people in the process.

The fact that we have a technology that supports that and doesn't change the fire hose scenario, so that you are getting access to raw data rather than looking over a senior person's shoulder while you are writing.

MR. BRECKLER: Otherwise, the methods -- because things are so centered around the programs at NSF, the methods that are used for nurturing the reviewer community really vary by program quite a lot.

There is a huge variance. Somebody said here earlier, commented on the value of having junior people, junior faculty come sit for just one round on a panel and then go home.

I did that once as a junior faculty member at an NIH panel, and it really was career changing. It gave me a good perspective.

In some of our psychology programs, we will always have one or two junior people on each panel as one-time guests. They are subjected to the full treatment. It is just that it doesn't last for three years. It only lasts for six months, and then they are gone, and that is another way.

MR. STANFIELD: We use a variety of approaches, and we are going to discuss this later, but just a couple of things, in terms of actually how to review the application and write a critique, most of that relies on interactions between the SRA and the reviewer.

Most of our reviewers have seen critiques. They might have seen their mentor's. So, they are not totally unfamiliar with it.

However, I would say the one difference from NSF is that we sort of like our critiques to be different from journal reviews.

Journal reviews tend to be rather technique focused. We don't like to see critiques that talk about the buffer being wrong. We like to have critiques that are a little broad for that. So, that is a mind shift that we try to get our reviewers to make.

We do more formalized training, and this is somewhat dependent on the SRA. Sometimes, the SRA and the chair will get together with the new reviewers, even before the study section meetings.

In terms of critiques, that is too late, because they wrote the critique when they were back at their university. So, we need support.

Sometimes that meeting will happen and they will take it through. Many people who have never been on a study section before are sort of anxious about it. They don't know exactly what to expect of it the first time.

The other thing that many SRAs do is, at the meeting, they start off the meeting in conjunction with the chair, in selecting applications to be reviewed.

It is sort of seen to represent the spectrum of quality of applications at the meeting. They start off the meeting with experienced, very good reviewers, just to sort of put some anchor points down and set a tone for how the reviews should be done, and we feel this helps the new folks get on board.

MS. CHIPMAN: Just to comment a little bit on the idea of showing people sample reviews, I think that would be a good idea in an education context, even if it is sort of not allowed by the process.

There are ways you could sanitize things or get somebody's permission to use, just so people can see examples.

Unfortunately, in education, I don't think there is a good strong tradition in terms of the quality of reviews. So, I think that would be important.

To talk about -- there were some things I thought about saying, especially after hearing the review before. In this NIE-NSF program, what happened was quite interesting, especially because NSF wanted everything reviewed twice, essentially.

I had to recruit every young cognitive psychologist in the country practically to get enough reviewers. This was a new field.

I, myself, am a co-author of the first paper ever published in the journal Cognitive Psychology. So, the cognitive psychologists were a relatively young population at that time.

There were a number of benefits to this. Many of these people learned what a proposal was supposed to be like, from the experience of reading a bunch of proposals for the first time.

They were people with degrees from good places, but they might not have seen proposals. The first complete proposal I ever saw was one I wrote myself.

It was nice that NIH provided a package that tells you the story of what a proposal was supposed to be like, and I tried to do what it said. I was successful. So, there was a lot of value there.

There was also value in that some of these people looked at these proposals and said, gee, I can do better than this and submitted the following year.

For instance, just to name one example, Susan Carey submitted the second year and initiated a long line of important research in the cognitive aspects of science education.

There were a number of people who were recruited to doing research on educational issues and making that a significant part of their career by being involved in the review process. I think it can work out fairly well.

One of the things that characterizes any competition, as we were paying the reviewers -- we were paying the field reviewers -- this cost a lot of money.

Like I said, the ratio of funding was typically about 10 proposals funded for 100 submitted on a project like, say, reading comprehension.

If we hadn't been paying the reviewers, we might have done 12 or 13 grants, and we weren't paying them enough.

So, you kicked into this sort of, well, if that is all they are going to pay me, I am only going to do that much work.

That was my personal opinion, although I didn't prevail, that we would have gotten better reviews if we hadn't paid people at all, because we were not paying them enough to pay for the time it took to try to get good reviews.

MR. SLOANE: Piggy backing on Susan's comments, with respect to the ROLE program at NSF, we had fostered successful proposals.

MR. FLODEN: I am wondering, when we are talking about the nurturing of reviews, how about the nurturing of individual people proposing?

I think Steve said something about a feedback loop through different cycles of the proposal. How does your system work to support people to write better proposals over time?

MR. SLOANE: Each proposer receives every single review, and they also receive the verbatim reviews. Occasionally we will call a reviewer and ask them if they had something that was either ad hominem.

Then, the final summary that is reviewed by each of the individual reviewers and signed off by them is also relayed to the prospective PI.

Then, the staff analysis that occurs beyond and above the formal review process. So, each of those three pieces.

MR. FLODEN: Then, what happens when a person re-submits? Does the panel get to see the comments from the first time around?

MR. SLOANE: Generally, the person will interact with program staff, so that they actually have a set of one-on-one conversations with the program officer.

Over and above that, people will actually pick up the phone and say, can we come in and talk personally.

MR. BRECKLER: This is one of those areas where there seems to be a lot of variability in practice across programs.

I think generally the approach at NSF is the same as NIH. That is what the program staff are there for. So, we will spend a considerable amount of time talking to people, particularly junior people, about the grant writing process, and we will do that before they ever submit a proposal.

Most of us will look at drafts of proposals and give them feedback, usually as to sort of form, whether this feels like an NSF proposal, in some sense.

We will also suggest lots of strategies that people can take. Each of us has a sort of script that we can start talking about strategies for writing good grant proposals.

Now, grant proposals, we can send successful examples of. It is the reviews that we can't send copies of. Most of us spend a lot of time on the road giving grant writing workshops, things like that. I think it is the same at NIH.

MS. FALKENBERG: I want to just piggy back on that one question, if I can, because this has to do with the NSF. Have you found -- I don't know if every area of the NSF does planning grants.

When I have been on panels, they do planning grants. So, a lot of people who have provided what I thought were marginal to possibly poor proposals to start with, did that, not so much because they didn't know how to write a grant, but they never had enough time and resources to put a good plan together.

I was wondering if you could speak to whether providing planning grants has improved proposals.

MR. BRECKLER: There are a couple of answers to that question. One is that NSF's grant proposal mechanism is a one size fits all. We don't have R01, R03, R-this, R-that. It is just a grant proposal.

In the psychology programs, we will very frequently take a marginal proposal from somebody who falls under this category of just needing to get a start, and turn it into a planning grant, to reduce the budget and give him $30,000 to get it going, with strong encouragement that they ultimately submit a better proposal.

We do have some planning grant programs, but they are very kind of narrowly focused special purpose kinds of things.

Personally, I think the idea of a planning grant is a great idea. NIH has more formal mechanisms for smaller grants that fall into that category. I have had tremendous success with them. I think it is a great mechanism that needs to be developed.

MR. STANFIELD: In terms of your question, yes, we are just like NSF. That is really the program official's responsibility, to help the investigator interpret the summary statement and help them plan their response.

We have rules put in place by Harold Varmus. You can come back in after an unfunded application with two amended applications, and within two years of the initial application, and then that is it. That is the end of the line.

Each of the amended applications, the applicant has an opportunity to say personally how they addressed the concerns outlined in the previous reviews. The previous reviews are given to the reviewers for the A-1 and A-2 review.

We try to get the reviewers to understand that we don't view what they are doing as mentoring or as tutoring the applicant. What they are doing is evaluating the proposal.

So, we try not to have reviewers put things in their critiques that are telling the applicant how they think they ought to rewrite the proposal.

MS. LAGEMANN: [Question off microphone].

MR. STANFIELD: Part of that may be history, the weighting of it. I guess the thing I would say is that, if you have a two-stage process that first established technical soundness, then it would seem to me that it would be -- it might be easier, then, to move on to importance.

I think partly with a single rating system, what the 35 points, in fact, does is, it lays out that we are not willing to fund something that is not first technically sound and then the issue of importance comes in. So, it is probably necessary but not sufficient. That is a bar that people must get over.

I would probably worry that if, for example, the points were reversed, that it is quite possible that things could score in the funding range and yet be technically marginal.

I think that is one effect that the 35 points -- I do think that one of the things I would say that I think happens is, remember, there is also this qualitative review, approved, disapproved.

The one thing that sometimes does happen is, we sometimes do have applications that have high scores, but that are disapproved.

I think, more often than not, when that happens, it is because people think that it has got all the components of a good application, but it is just not an important question.

I think you need to consider that there really are two ways in which reviewers are rating, the qualitative review, where it is kind of a thumbs up, thumbs down.

Even if it was maybe a great application in that respect, it maybe got all the points and only 10 or 20 points in importance, and the reviewer could basically say, it should not be funded, though, even though it got maybe 90 points. That sometimes does happen.
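
[Editorial illustration, not part of the transcript.] The two-signal rating described above, a point total plus a separate approve/disapprove judgment, can be sketched in a few lines. This is only a sketch under stated assumptions: the names `Review` and `is_fundable`, the 0-100 scale, and the cutoff of 80 are hypothetical and do not come from any agency's actual procedure.

```python
from dataclasses import dataclass


@dataclass
class Review:
    points: int      # summed points from the rating form (assumed 0-100 scale)
    approved: bool   # separate qualitative judgment: approve or disapprove


def is_fundable(review: Review, cutoff: int = 80) -> bool:
    """Both signals must pass; the cutoff of 80 is an assumed value."""
    return review.approved and review.points >= cutoff


# A high point total does not rescue a disapproved application.
high_score_but_disapproved = Review(points=90, approved=False)
print(is_fundable(high_score_but_disapproved))  # False
```

The point of the sketch is that the qualitative judgment acts as a veto: an application with 90 points that reviewers disapprove is still not fundable.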

MR. FLETCHER: We had a lot of conversation about the role of lay people on review panels. I think your group is an exception because you actually have statutory language that promotes that.

For example, it could be interpreted very easily to mean that every review panel had to have lay people on it. Do you actually have lay people review research grants for technical adequacy?

MR. DANIELSON: I would say no, if what you mean by lay people is just kind of --

MR. FLETCHER: A special ed teacher, for example.

MR. DANIELSON: No, no. That is part of the point -- that is partly because I heard the earlier conversation. Part of the point that I wanted to make is that in disability work, there is strong advocacy from -- I think in some of these programs, there probably wouldn't be a citizen that would be saying, I am insisting on being part of your review.

I think that, in the work that we do, there is very strong advocacy, and that is partly why the legislation requests this.

The way we have attempted to implement that is to say, that is fine, that we will attempt to ensure that there are people with disabilities and parents of children with disabilities and others that are part of the peer review, but that we can't lower the standard for what it means to be a qualified reviewer in order to do that. That is what we attempt to do.

There are other mechanisms, and I think earlier people talked about that, where people might -- in your words, lay people -- would be involved, in helping establish the priorities for the agency, for example, where we do engage consumers in that process.

The other thing that we do as well, though, is kind of at the end of the process, in evaluating the investments we are making, in our GPRA indicators we use, we of course look at rigor, but we also look at relevance.

We ask consumers to evaluate the investments we are making for the relevance of these investments, the kinds of issues they think are critical, whether they are a parent or a person with disabilities or a teacher.

Typically, it is kind of front line people that we ask, both at the beginning of the process and then, again, at the end, to make some judgements about the investments we make.

MR. FLETCHER: Related to that, Lou, more than any other federal research agency, it sounds like you have more intrusion of legislation and policy into the research process. You are reauthorized every five years, for example, you have formal statutory language around reviewers.

Does that process impact your ability to conduct peer review in any way, like getting clearances or things of that sort?

MR. DANIELSON: Well, the legislative language, I guess I wouldn't be honest if I said that it didn't have an impact. Of course, it has an impact.

Does it impact in a negative way? We try to make sure that it doesn't. Part of what -- I guess we all work in a political context.

The assistant secretaries come and go, and the beliefs of the assistant secretaries about that participation are a factor in sometimes making it a little harder for me to make sure that we have got qualified people on every panel -- you actually asked -- I think there was a second question.

MR. FLETCHER: I am thinking about the whole process of writing an announcement and competitions and clearances and things of that sort.

MR. DANIELSON: Thank you for asking that. Actually, one of the things that -- as I look at one of the things that, more than anything else, perhaps, would give us the capacity to get high quality reviewers, it would be our ability to have predictable schedules for conducting our reviews.

The biggest impediment of that is the unpredictable nature of the clearance process for announcing our competitions.

In fact, last year, we actually lined up reviewers for our competitions and then, because of slippage in getting them announced, we actually had to reschedule the reviews. This year, we didn't even attempt to schedule the reviews.

In fact, it is, I think, an important constraint, and one of the things that the commission, as you know, indicated was the issue of predictable schedules.

I think it is a very important one because, clearly, if you are contacting -- the best people are busy people, and if you are contacting them with relatively little notice, they have got to be really committed to participating in the review to change their schedules in order to do it.

Very often, unfortunately, that is what we are asking them to do. So, we start off with a very difficult problem.

I think, in part, that is why getting people to review is a challenge for us. I think if we had a much longer lead time on it, that a lot of very good people in the field would be a lot more cooperative. Yes, it is a very big issue.

MR. HENLEY: I have a question for Susan Chipman. How do you prevent friendships and enmities from influencing your program office's decisions?

MS. CHIPMAN: Well, I didn't get to the part about the pros and cons, but that obviously is going to be an issue. It is an issue all the time.

One of the problems with the peer review process is that you don't always know who the enemy relationships are in the field and so on.

Yesterday, when I was telling somebody that I was going to come here today, he told me a story about some review that DARPA was running, and DARPA is not necessarily doing things with lay people.

There was a group of reviewers, and it turned out this group of reviewers involved everybody who had funded this particular researcher throughout his entire career. They all decided together, it is time we quit.

He said, this is a researcher's nightmare. I said, no, the nightmare is the peer review panel full of research rivals. These are potential issues, obviously, but there are no magic answers to that.

MR. FLETCHER: Can we ask the same question of NSF, how they avoid charges of cronyism, and you have rotating people. How do you avoid criticism, that you rotate in, you get people funded, you rotate out and then your grants get funded, just to touch on the seamy part of it.

MR. SLOANE: On an individual review, the PI can make a list of people that they think would be good reviewers, and people they think would not be in a position to provide a good review.

MR. BRECKLER: Like many of the agencies, NSF has a very complex array of conflict of interest rules. It is probably the one area where they obsess at NSF, over detecting and preventing conflicts of interest.

Every year, the program officers have to go to a conflict of interest briefing and spend the day doing that, and there are all sorts of threats that hang over your head, you go to jail, all kinds of stuff. You have to declare them.

There are very formal rules about that, who can and can't be part of a review. The systems are in place that prevent you from influencing that, and it is part of the culture. The same issue comes up with having friends on panels.

MR. FLETCHER: I am thinking, for example, some of Checker Finn's criticisms of peer review, where we are all in it together, even this panel, for example.

MR. BRECKLER: What is surprising on the inside, though, is you don't know who your enemies are. As Barry said, I can ask somebody to suggest some reviewers and I will go to them and they will be the harshest, most critical, people, and I will be accused of not having used any of the people they had suggested.

At NSF, the main thing that keeps it honest is this process I described of the committee of visitors and opening the books.

Yet, another group comes in and says, look what this person did. These people weren't getting funded, and then they left and did this and did that. That keeps it quite honest.

Things happen, but generally it is pretty honest, as long as you keep the books open.

MR. STANFIELD: Just to clarify, at NIH, the practice is, applicants may suggest folks who should not be reviewers because they are just too close competitors, but we discourage folks from suggesting reviewers.

MS. WANG: I am Aubrey Wang from Educational Testing Service. The question is for Barry. I wonder if you could elaborate on the feasibility studies. You mentioned that NSF funds a lot of feasibility studies before you fund the full research, if I understand that correctly.

MR. SLOANE: I was contrasting NSF's culture and the focus at NSF on funding things that are innovative in nature, relative to the general medical model of research.

In the general medical model of research, there are four phases of research from feasibility to group consensus. It is understood that there is a ratio of about ten to one across those phases, in terms of input, output. So, there would be a lot more feasibility studies before you would ever have a study where NIH would be invested in having a consensus, for a large national randomized trial for a period of time.

So, there is a lot of different information that has been gathered from different types of studies before a consensus study would even have the thought of being brought to the table. I am broadly saying that we fund 10 feasibility studies for every phase two study.

MS. WANG: That is fine. Thanks for the clarification. Just in terms, then, what does the NSF do if -- thinking about what you mentioned, what sort of evidence do you need when you are ready to fund a full study in the EHR, especially.

MR. SLOANE: When you say a full study?

MS. WANG: I just mean a full randomized study.

MR. SLOANE: Usually, obviously you need a significant question. We pretty much would follow the first four principles of the six principles that the NRC would have in place, that you have thoroughly reviewed the literature, that you have sensible questions that need to be answered, that the state of the knowledge about an intervention, if it is an intervention study within that genre, puts you in a position to say that a randomized trial is a valuable thing to conduct, in other words, to map with the NRC's report which stated that the methodology should map to the question.

So, if you are going to use the randomized trial methodology, you should have a question that that randomized trial would allow you to gain some insight on.

Then, the fourth criterion, there is an explicit chain of reasoning between theory, question and data to be gathered, the metrics used, the type of analysis to be used, the evidentiary base that would accrue as a consequence of that process, and then that links back to theory.

MS. CHIPMAN: I really wanted to comment on the significance, technical quality question, going back to my experience in NIE days.

This was something where we did, often, get things that had -- and some of them had higher significance, point value -- that were rated as extremely significant but, in fact, were not technically competent.

Nothing is ever going to be significant, in reality, if it is not technically competent. In fact, it might have sort of negative significance.

This was the one case in which I took a really active role as a manager at NIE. I did shoot down a number of proposals like that.

I never advanced something that I thought was great into the funding category, but I did shoot down some things that I thought were really incompetent.

I think the basic problem is, it is much better, I think, to have the kind of global rating sheet that NSF has, because the true structure of the decision isn't a linear additive thing like that.

My favorite example is adequacy of facilities. That typically will get five or 10 points on one of these rating forms. That is because you don't want to make it a big thing.

If you don't have facilities to do the research at all, say, if you want to do an fMRI study and you don't have access to an MRI machine, forget it. It completely knocks it out.

So, the real structure of that decision in terms of how those factors interact is very complex, and the global one to five type rating, at least, allows a reviewer to express that.

Also, we have had a lot of talk about how the individual proposal could be rated for its significance by essentially naive consumers, but you actually have a complex scientific structure.

Say if you are doing research on reading comprehension, as we were, for years, the reading process is extremely complex.

So, if you understand all the science, this could become very significant to a practical application, but if you have no idea about the science that links the investigations to the application, you are not going to call it significant. So, that needs to be attended to as well.
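
[Editorial illustration, not part of the transcript.] Ms. Chipman's adequacy-of-facilities example contrasts a linear additive rating form with a decision in which one factor can knock a proposal out entirely. A minimal sketch of that contrast follows. The weights (35 for technical quality, 55 for significance, 10 for facilities), the 0-to-1 rating scale, and the function names are assumptions made for illustration; only the idea that facilities carry about 5 or 10 points comes from the discussion.

```python
# Hypothetical point weights for a rating form; they sum to 100.
WEIGHTS = {"technical": 35, "significance": 55, "facilities": 10}


def additive_score(ratings: dict, weights: dict) -> float:
    """Linear additive form: each factor contributes independently."""
    return sum(weights[name] * ratings[name] for name in weights)


def gated_score(ratings: dict, weights: dict, required: set) -> float:
    """Same sum, but a zero rating on any required factor knocks the proposal out."""
    if any(ratings[name] == 0 for name in required):
        return 0.0
    return additive_score(ratings, weights)


# An fMRI study that is otherwise excellent but has no access to a scanner.
proposal = {"technical": 1.0, "significance": 1.0, "facilities": 0.0}

print(additive_score(proposal, WEIGHTS))               # 90.0
print(gated_score(proposal, WEIGHTS, {"facilities"}))  # 0.0
```

Under the additive form the proposal loses only the 10 facilities points and still scores 90, while the gated form reflects the reviewer's actual judgment that the work cannot be done. A single global rating lets a reviewer express the second kind of judgment without a formula.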

MS. KLINE: I am Sue Kline from the U.S. Department of Education. I would like to build on a colleague's comment this morning.

Vinetta was asking about diversity, and the only person I heard -- I may have missed somebody -- who mentioned diversity in the selection of panelists was Lou, following congressional mandates.

I would like to ask about -- I do remember that NSF used to have, I think it was a set of criteria where it said diversity. I didn't hear anybody mention that.

I would like to ask about not only diversity in the selection of panelists from the others, but particularly any kinds of ways that you integrate diversity and equity considerations, societal equity considerations, in the criteria, as the criteria categories, as some of you mentioned, were fairly broad.

MR. BRECKLER: That is a good question and I am glad you asked it. There are multiple levels of answer at NSF. You can start from the bottom and work up.

At the individual program level, program officers, I said that there are individuals who are personally held responsible for the management of programs and the decisions they make.

Ensuring appropriate diversity in the panels and in the reviewers and in the grants that get made and so on, are the responsibility of individual program officers.

All the programs at NSF pay a lot of attention to this. In fact, the committee of visitors at NSF every three or four years spend a considerable amount of their time focusing specifically on the outcome end of accomplishing diversity.

That is, what kinds of efforts were made and what kinds of successes were achieved in ensuring diversity in the reviewer pool and the panel and so on.

What is harder to document is what kinds of proactive efforts are made to improve the quality and breadth and diversity of the programs.

NSF program officers spend most of their outreach hours going to people and places to try to improve the participation of people who ordinarily are not plugged into the NSF review process.

So, it is something that is sort of in the consciousness of all the program officers, but that is not enough.

To really succeed in improving the diversity, especially in a government agency, it has to become a major priority of the entire agency, of the institution. It has to be incorporated into the fabric of the institution itself.

So, at NSF, every proposal is evaluated on two criteria, in the simplest form. One is intellectual merit, which is all the things we think about in terms of evaluating the science, the scientific merit of a project.

Another -- I am not going to say it is the second -- another is what we call broader impacts. It is in the broader impacts where you will find a lot of attention being paid to diversity kinds of questions.

The reason I am glad you asked the question is that, at NSF, people misinterpret the broader impacts as meaning the applications, and that is not what NSF means by broader impacts.

In fact, I can read to you what NSF was thinking of as broader impacts, because I am speaking to this issue. So, one is integration of research and education, is pretty much a mantra at NSF. You have got to be able to integrate research and education.

The other one is under broader impacts, and I will read this because this is what all reviewers are asked to evaluate in every proposal.

How well does the activity broaden the participation of under-represented groups, for example, gender, ethnicity, disability, geographic and so on.

It is right there in the second -- I shouldn't say second, but other review criteria.

At NSF, in fact, the current policy of the foundation is that these two review criteria are of equal importance. A grant proposal will not be funded at NSF unless it succeeds on both. It can't have one and not the other.

It is frustrating to some people who are submitting grant proposals to us. If you submit a grant proposal to us now, and you fail to explicitly tell us what the broader impacts of the activity are, we will return the proposal unreviewed, period. So, NSF is taking this really seriously.

Diversity is one part of that. It is a very important part. There are other parts of this, disseminating results to the taxpayers who are paying for this in a form that they can appreciate and understand and those kinds of things.

So, it is not just applications. It is really speaking to the whole infrastructure of science.

MR. STANFIELD: Speaking for NIH, there are sort of two aspects of the question. One is the peer reviewer and one is the applications. I am going to talk in a little while about the reviewer selection, and I will address that there.

In terms of applications, NIH is very concerned about underserved populations and health disparities. We have a number of initiatives. That is one way that NIH can focus what research they fund.

The second way that happens is, in a sense, similar to NSF. Each institute has a different funding strategy on how they select what to fund.

Many of them focus on the pay line, and then they have other money that they then decide how to use. Many institutes call them the specials.

Sometimes the specials are young investigators. Sometimes the specials are addressing an underserved population.

Sometimes the applications that are special are for a minority researcher that the institute wants to encourage. So, that is the second way the NIH can respond to those concerns.

MS. EISENHART: I guess I am wondering a little bit what is broken about peer review. The reason that I ask that is, I think you did a really nice job illustrating for us the different ways that your organizations go about doing peer review.

It is very interesting hearing the differences and how to compare across the different groups, but for the most part, you are presenting a picture that is pretty good, I think.

All of you have said, well, there are small ways that this could be improved and there are ways that we could do better.

Earlier this morning, we heard about how the peer review process is inherently tension laden. There are contradictions and conflicts throughout that really can't be ignored, given the nature of the activity.

I am wondering, is there something to fix here, or in your view, are we doing what we can do, and there is something else that is a problem, like you aren't getting the quality of proposals that you want, or some other issue?

MR. STANFIELD: So, being bold, my opinion of something that is broken about the system at NIH is the fact that there is a lot of pressure from the community of researchers to pay grant applications strictly according to the outcome of peer review.

Yet, we know darned well, at the pay line, we aren't making discriminations that make any difference. So, the community is asking the system to make discriminations that it is just not capable of making.

That is why I believe that it is important for institutes to have a secondary layer of review, where the programmatic considerations come into play. At the margins, you are just not able to make discriminations to the level that we are being asked to.

MR. BRECKLER: In my view, I think this is precisely why NSF doesn't separate peer review and program, because the peer review is viewed as a source of input.

Whether it is perfect or not, it is one source of input, and there are scientists who are there to interpret it and make decisions about boundaries. But whether you talk about them as problems or just as characteristics, the tension they create, the arguments, the disagreements, the competing demands they place on peer review, that is also the way science is done. These tensions exist everywhere.

At NSF, at least, it is those tensions that we like to bring out, problems like any pressure being put on peer reviewers to produce final assessments, or things that have the appearance of consensus.

It is okay for our peer reviewers to disagree. The program officers actually kind of like that, because it suggests that there is something worth looking at. There is a disagreement enough that two peer reviewers coming at this get into an argument about the scientific merit or the broader impacts of a project.

They get heated about it and they are jumping up and down about it. We love that at NSF, because we want to understand why a proposal is provoking that kind of response.

It suggests that there is either nothing there or there is something there, and we need to look at it more carefully.

What is sometimes characterized as a disadvantage of peer review, I think, is used to advantage at NSF, trying to figure out whether there is something novel there, something innovative, something creative.

MS. CHIPMAN: I just want to say, while I represent the sector in which peer review usually isn't used, I don't think there is something wrong with peer review.

I think there are many equally good, equally bad ways to go about making the decisions that have to be made. I feel that it is the real strength of our system in our country, contrasted with other countries, when I talk to people, that we have multiple sources of decisions, multiple processes.

I think it is relatively unlikely that anybody who has got really great ideas is going to find it impossible to get funding.

If you have a unified system, and often the Presidential science advisors would like to unify everything under them, I am afraid you would see fads and fashions that would exclude people and so on.

I think, despite the unreliabilities, et cetera, et cetera, I think we have a pretty good total system.

MR. DANIELSON: I think that probably none of these systems would be considered broken, although there are probably people, principally applicants, who might see them as broken.

I can only speak to the one that we use. I think there are a lot of ways in which it could be improved, and mostly when I say that, improved from the perspective of applicants.

For example, improving the quality of feedback. We also currently don't have a process where people can have applications resubmitted and considered as a resubmission. They can resubmit, but it is just like an initial submission.

When we did the work groups, which are mainly people who apply to the program, that was one of the things that people indicated to us they felt would be a useful thing to have available to them: to be treated as a resubmission.

My point really is, I think there are ways that we can improve. I think among the ways that we can improve, some come with a steep price tag and some probably do not. This problem of resubmission would probably not cost us much to do.

Improving the quality of feedback will cost us a lot to do, at least to improve it to the level of the kind of feedback that NIH provides. So, that comes with a steep price tag.

I think those judgements are harder judgements to make and, to some degree, may be also judgements that, in this case, may be outside of my control, because if the price tag is steep enough, higher level people are going to have to make decisions if they are prepared to find that amount of money, and underwrite that.

I do think that probably for all of these systems, there are ways they can be improved. Some may come with steep price tags.

One of the surest ways to improve our system is to go to five-person panels as opposed to three-person panels, but that would come with a very steep price tag, and also the difficulty that we have trouble getting three. How would we ever get five?

MS. TOWNE: I just want to follow up on the question, how would you know what the system needed. For example, I heard a number of you, in your talks, make reference to a system being effective or successful.

I am wondering by what method you are making those kinds of claims? What are the outcome measures? How do you know that a particular peer review system is working or not, or needs improvement, and in what way?

MR. SLOANE: I think one way to think about the puzzle is that people can and do ask for their proposals to be re-reviewed or further processed.

The amount of that that occurs, I think, is one measure of whether we are doing a reasonable job.

MS. TOWNE: Responsiveness?

MR. SLOANE: Not responsiveness. I mean, 80 percent of the proposals that come to the NSF, or 80 percent of the proposals that come to my program, get returned. What proportion of those people are so peeved about the process that they actually go and do something about it?

Along the dimensions of fairness, adequacy of response, and adequacy of descriptions of the process itself, there are mechanisms for us to get data back that says whether we screwed this up or not. Not everybody will use those mechanisms, obviously, but we do get feedback.

Why we might say it is effective, I don't think anybody would say it works perfectly. In fact, this gentleman sitting to my left specifically noted that the inter-rater reliability on his panel was somewhere around .75.

About 50 percent of the variance in those decisions is unaccounted for. One alternative would be to toss a coin, but peer review leaves you with a lot of information that you wouldn't get by tossing a coin. I am going to stop.

MR. BRECKLER: Speaking of tossing a coin, we have discussed this idea, at least in the social and behavioral science center.

The scientist in you wants to do the experiments, and figure out whether that system is better than another system, if you can settle on a definition of what is the best outcome.

The problem is -- and NSF has struggled with this -- when GPRA started imposing on government agencies the requirement of making good on and accounting for investments, NSF had a really hard time trying to get a handle on what it meant by an outcome in a basic science investment.

That outcome may not be evident for decades and GPRA demands that that outcome be documented that year. That is really hard to do in science.

What is also hard to do is do proper experiments. Barry's suggestion could be taken quite seriously. Maybe the right experiment is to take a cohort of proposals and randomly assign some of them to peer review, and the other ones get randomly assigned to the flipping of a coin, and then we will see which ones do better. Nobody wants to do that experiment.

MR. SLOANE: Moreover, we would need a lot more proposals coming in for the tossing of the coin, so that we would be sure about the coin tossing.

MR. DANIELSON: Part of the question, too, is a distinction that I tried to make in my presentation. From the point of view of the citizen who is looking at whether these programs are funding research that is important and sound, I think, for the most part, when critics review what we fund, they generally judge it to be the sort of research we ought to be funding.

As part of our GPRA review, we actually had a second look at the integrity of the research we fund, and it comes out well.

Part of that, though, I think that -- I mean, that is one way one can look at it. The other way one can look at it is how the applicants feel about the process.

I think a lot of the way in which the process, maybe our process at least, does not measure up is in terms of how applicants feel about the process.

A lot of what we struggle to try to do is to try to respond -- seeing applicants as our customers, we try to respond to applicants, and will that -- there is the prospect that that will ultimately improve the quality of the research that we fund.

I believe that, for the research we are trying to fund, we may not actually need to improve the peer review. It is perhaps for other reasons, mostly having to do with the satisfaction of applicants, who feel that the process is not providing adequate feedback, along the lines of the things I have said.

MS. SCHNEIDER: I have been trying so hard not to be contentious, but on this one I am afraid I have to be. It seems to me that if the only criterion by which you decide whether the system is working or not is who complains about not getting money, then, in my mind, there are some issues of whether the work that we are doing is making a difference in the field.

I think, at least from my perspective, that this requires a different kind of perspective.

MR. DANIELSON: That is not what I am saying. I am saying that, on the first dimension, about whether this is making a difference, I think that the work that we are supporting on that dimension is fine.

On the second dimension of whether applicants are satisfied with the process, I think that it is not. I mean, in the process we use, we hear frequently from applicants that the feedback is inadequate, and that is a very big one. The issue of resubmission is probably a smaller one.

There are a variety of ways the process could be improved, but I think on the broader issue of societal benefit, I think if you laid it out to the citizens of the country, is this making a difference, and I think independent of that, if you just look at the history of investments and where we make big investments over a long period of time, you will see big impact.

I am just trying to make the point that I think there are a couple of different ways in which you can view this question.

I think they are both important perspectives, although it is possible that one could look at it and ignore the second part of it and say, I don't care if people are happy with the process, if it is ultimately benefitting society.

MS. SCHNEIDER: Then it seems to me that if you are talking about benefitting society, how are we going to know if it is benefitting society?

With that kind of philosophical question, by which I think some questions are probably going to have to come up in some set of criteria eventually, I think it is time to move on to another session.

[Applause.]
