The National Academies: Advisers to the Nation on Science, Engineering, and Medicine
NATIONAL ACADEMY OF SCIENCES NATIONAL ACADEMY OF ENGINEERING INSTITUTE OF MEDICINE NATIONAL RESEARCH COUNCIL

MR. HACKETT: Thanks, Laurie and Daryl, Lisa, and all of you for coming. I am on Arizona time. So, it is really early for me. I drank as much coffee as I could.

Daryl apologizes for not being here. Since I pretty much wrote the paper, you are not going to miss a whole lot.

Daryl said, look, when you think about how we really should do this, it is not from the podium of power. But if I don't stand at the podium, I can't see the PowerPoint and I have trouble following along. So, if you ever see him, tell him it wasn't slick.

I am not going to do a history. If you want a history, here is a capsule. Everything begins in about the 17th Century in England and in France. I bet if we look closely, we will find non-European precursors.

Grants peer review dates from about the 1930s, the National Advisory Cancer Board, the Office of Naval Research, the National Science Foundation. That is about all the history you are going to get this morning.

What this talk is really going to do is present some framing ideas more broadly about peer review and its place in science.

In fact, it is going to be a peculiar sort of framing. What I am going to do, I discovered last night, is some institutional analysis of peer review.

This institutional analysis is associated with Robert K. Merton. That is when you look at a piece of social activity and try to identify what purposes it serves and what principles guide it.

So, thinking about Bob Merton and my talk, I discovered, by accident, I had actually been involved in institutional analysis of peer review, but simply hadn't recognized it. So, that is what you are getting, if you want to know the category.

An institutional analysis of peer review asks what its intended and unintended consequences are. That is important, because when you read many policy documents on peer review, you will see people say, well, we need this system only to do the following: to provide expert advice on the allocation of support. It is not to advise scientists and researchers. It is not for this and not for that.

I think it has unintended consequences that are possibly more significant than its intended consequences.

I look at the values and principles that guide it, and I will try to argue that you will see these guides in competition. Again, this is a good old Mertonian idea of sociological ambivalence: through the peer review process we try to accomplish a variety of inconsistent things, and I will lay those out.

Finally, I want to wrap up by boldly explaining what I see as the choices and challenges facing educational sciences.

To set a context, there is a variety of ways that we could allocate funding for science. We can start by appreciating some of these. The place for peer review will be more obvious.

We can begin with a process of earmarking or pork barreling, where Congress directly allocates funding to institutions, service centers, states, and perhaps individual investigators.

This has the merit of being democratic, because it is part of the democratic process. I would imagine that a certain degree of distributional unfairness is what resulted in a peer review conference maybe 15 years ago.

I am going to talk about peer review maybe 15 years ago at Penn State. There was a representative from the Hill present.

This person was baffled to see that money was most heavily distributed in scientific research on the coasts. He thought this was a terrible wrong that needed to be righted immediately.

Somebody had to point out that there were large educational institutions that anchored both coasts.

So, when you look at the allocation of money from a congressional perspective, why isn't it distributed as senators are, about the same amount to each state, or as representatives are, proportional to the population? It is a real puzzle.

In the paper we talk about how a member of Congress, when we decided exactly how to slice them out, because Congress slices them out -- that is their job. So, you could go with a congressional earmarking sort of strategy, but that would not be particularly expert. You would have inexpert advice. The allocations would be based on political criteria.

It would be culturally corrosive. By that, I mean that merit and the allocation of resources according to the quality of what is done seems to be a principle that imbues all of science and, to allocate otherwise, would erode that.

So, you could use earmarking instead of peer review. Or you could, instead of Congress, let a strong manager make the decisions.

In other words, put somebody in charge of deciding, given a project or a goal to accomplish and say, here is your pot of money, here is your purpose. Go. Take whatever advice you would like on your own.

That has the merit of being flexible and responsive. A good idea comes in the door, a couple of pages of white paper, a little discussion, and you can put money into it and let it run.

It assumes that you have objectives and standards, that you know what you are doing. You don't need a committee to tell you what to do to improve the totals. It requires outcome accountability.

You need to hold this person, this strong manager, accountable for accomplishing the project. It may not work in all fields. Some fields are more amenable to this than others.

DARPA makes a distinction between projects that have a purpose to accomplish something, such as a technology, and they end when they are done.

The programs can exist in the same field, but if you try to build and sustain a field, you may not succeed on the projects.

There is concern about whether you could scale this up. It works with small pots of money and it is sort of a boutique program at the Department of Defense. It might not work if you tried to spend every nickel of that maybe $5 billion that way.

Finally, the system has to be, I don't want to say ruthless, but decisive, in that you really have to accept that you are going to fail, and cut your losses. The guy that walks in the door and doesn't succeed, you may have to turn it off after two or three years of the five-year award. Again, the processes may seem unfamiliar to those who work in the NIH and NSF context.

So, you can use direct allocation by Congress, you can put a strong manager in, or you can use a formula of some sort. About 10 years or so ago, someone said, why don't we simply look at the track records of investigators, and allocate according to who has published the best, who has produced the most students, substitute your favorite formulas.

The problem, of course, is that writing the formula itself becomes a political act, and there are probably as many formulas as there are people proposing them.

The system wouldn't encourage a great deal of risk taking and responsiveness. You would end up having support for whatever it was that people were in a track for accomplishing, rather than ideas that would be proposed and evaluated for different programs.

Then, how would you fit young folks and old folks into this framework? How do you get young people started who don't have a track record, and how do you gracefully pass your old folks who have pretty well run along the track?

Finally, if you put a formula in place, you invite investigators to game the formula, as they say. They figure out that it favors citations, and you and your friends cite one another a lot. If it favors publications, you slice papers and publish them in small units. Whatever the game requires, you provide it.

So, these are alternatives, none, perhaps, as well known as peer review. Two ideas to hold onto as we go into the discussion.

One is this notion of a boundary object. These ideas will come up a lot, and I would like to introduce them now.

I will start at least to talk about boundary objects that fit across social roles. A zoo, for example, is a place the public can come to look at the animals. It is also a research instrument for science, one that actually collects and studies specimens.

Museums had these dual purposes. Behind the galleries of the Smithsonian, there is a research institution, up against the boundary of public understanding and inquiry.

I would suggest to you, in the same way, peer review is a boundary process. It spans the role of science and policy and academe and governance.

If you go to the National Science Foundation, for example, you are in a government agency that has got a lot of academic tweedy types wandering around.

Some of them come in as rotating program officers. Some of them come in as panelists. Some come in with their cup in hand, asking for money.

Many of the values of academe are found in the National Science Foundation. At the same time, anyone who works there appreciates that they work for a federal agency. They are apprised of this early on in a briefing, where the lawyers upstairs like to remind you that if you do this, that or the other, you can go to jail.

So, you have to understand that peer review is a boundary process that spans a couple of worlds. If you keep that in mind, some of the things that follow make more sense.

Keep in mind also that, for the process to work well, it needs legitimacy. You need to ask yourself, what characteristics of peer review make it seem a legitimate, fair, reasonable and just allocation mechanism, so inhabitants of the various social worlds can intersect in the peer review process. Throughout, these two ideas will come up: boundary objects and legitimacy.

This notion of boundaries is not a trendy thing. Think about it in terms of community purposes, evidential standards, argumentative procedures, ethics, and even epistemic cultures: different sets of rules for establishing that we really know what we think we know. Sometimes these meet when disciplines come together in a panel.

So, the next chunk of the talk is to explain the various purposes of peer review. The one that we are the most familiar with, and the one that I think dominates a lot of the policy thinking at the expense of the others, is that it is a process for grading the brains, a phrase used by a science policy administrator.

It is a way for evaluating proposals that are submitted, and for allocating scarce resources. We are very familiar with this mechanism, and it is one that almost all of us have been a part of.

I would suggest that peer review does a variety of other things as well. The first of these is that the peer review process is a source of expert advice.

It provides its advice both to the agency that wants to know, should we fund this one or that one, and to the proposers.

In the course of receiving your reviews, your summary, your pink sheet from NIH, you get the distilled wisdom of some subsection of the community.

This aims both to improve science by making wise allocations and by advising, actually providing direct advice that flows through the peer review process to the people who are doing research.

If you look at this over time, it cumulatively shapes the field. You will see that the kinds of research being done are shaped by the peer review process.

What agencies decide is fundable and not fundable is also shaped. One thing that happens routinely in an NSF panel is that, toward the beginning or toward the end, or at dinner sometimes, there is some discussion of, well, what should we do differently. What are the new areas that we should move into. What are the new mechanisms that we should explore. Should we use more or less of this or that.

That way, peers are actually shaping the field. What NSF, NIH and other agencies decide they will fund provides a target for research.

So, more than just grading the brains, peer review provides expert advice. It is also a flywheel. It lends stability.

In this, it embodies what Kuhn in 1977 called the essential tension of science between tradition and innovation.

On the one hand, we want new ideas in science, new methods, new discoveries, but each claim for novelty is tested against the body of established knowledge, established procedure, to ask whether it is truly new, whether it is sound, whether it is reliable, whether we should build on it.

So, you try to get peer review to accomplish both ends at once; that is, both to protect the traditions of the discipline and to recognize and support originality.

A final point on this flywheel notion, the peer review process helps researchers sustain a course in their research.

Think about projects you have done. You usually reach a point where you say, this is a bad idea, how did this get funded, because of some insurmountable or seemingly insurmountable problem.

How is the instrument developed, how is the sample chosen, how do all these pieces fit together? But the fact that it has been through a peer review process, that eight or 10 other people thought this was a reasonable idea, worth doing, plausibly proposed, helps you stay the course and get over the bumps.

It may seem obvious, but compare it to another method. Suppose instead, because you are a member at a university with formula funding or direct allocation, you are given $100,000 a year for your research. Do anything you like.

You could imagine that each bump in the road could set you off in a new direction, and that might happen even more intensely if you needed to get three papers published at the end of this year. Otherwise, you would not look fundable according to the formula, in the upcoming year.

So, the peer review provides this kind of stability, where you made a public promise, really, to conduct a piece of research, and others have said, you know, we endorse this promise.

Yet another unseen function of peer review, to continue this institutional analysis, is that it is a motive for scholarly communication.

When you submit a proposal, you are putting ideas into circulation among a small group of influential researchers in a field.

Conflict of interest rules and confidentiality rules prevent them from using a lot of the ideas, but there is still something in the spirit of what you are proposing that they will retain and build upon.

In some cases, for example, a panelist will notice a piece of work that is done by a new investigator and invite the person to give a talk at an upcoming meeting, or to come to campus to give a talk, because it appears to be exciting new stuff.

Because of this communication function, the peer review process prepares the ground for the acceptance of new ideas.

You can imagine a new idea being rolled out. You see it first as a proposal, then in a colloquium on your campus, later a talk at a professional meeting. So, when the draft of a manuscript hits your desk you say, well, yes, you know, this actually does look like a good idea.

It might be three or four years since the idea might first have appeared in the proposal, but the ground is prepared in different places and different ways for this to be accepted.

Again, much more than grading the brains, peer review really sits at the heart of what a lot of people in the field think.

The peer review process also provides an entry point for social considerations. This is formalized, these days, in the number of proposals, its cascade of influence on peer review criteria at NSF and NIH, where they now take explicit and substantial account of societal benefits in the evaluation of proposals.

We can set aside, for the moment, the degree to which it is done, whether it is actually actively done. It is evident that it is still a problematic criterion, for reasons that won't surprise anyone.

Scientists actually are not generally the experts on how to apply their ideas. So, the fact that they can't do a very good job explaining it in the proposal, or in evaluating that in the review, doesn't surprise me greatly. It will take a while for that criterion to take hold.

As an entry point as well, if you think about the NIH system before, special interests -- not of a negative sort, but a positive sort -- tend to come to bear. These tend to have to do with hereditary diseases, AIDS, and underrepresented segments of the population whose health needs need to be addressed.

NIH began, a few years ago, to take more seriously conducting research on women and on children, to be sure that medical research accounted for the entire population of the United States, not just the portion of it that is male and white.

So, you end up with an explicit criterion that says, does this proposal take into account the diversity of populations that it should address.

When a proposal is funded by an NSF program officer to establish balance of some sort -- gender, or substance, ethnicity or geography, or the inclusion of an undergraduate college or historically black college or university, they, too, are bringing social considerations to bear on a process that otherwise could be seen as merely scientific.

Peer review, finally, is an enactment of professional authority. This is a complement of the principles I just described.

It is a place where scientists and researchers apply scientific criteria to the allocation of support. It creates a buffer, or establishes a boundary that separates science from other spheres.

We don't allocate highway construction money by peer review. We don't let companies that build highways all come to Washington, sit in a committee, and decide which highways they would like to build where, and to what specification.

Science, especially, is allowed to slice the melon for itself, to some degree. So, it has the practical importance of both the expert advice and of establishing a special position of science in society. It also has symbolic importance, in setting this endeavor apart from other parts of society.

So, the first part of the talk is intended to establish in your mind that peer review is far more than a system for grading brains, providing expert advice to evaluate proposals and make wise allocations. It does many more diverse things that are at the heart of building strong research communities.

The next piece of the talk outlines for you the values that Daryl and I see underlying this process. The theme is that these values are in tension.

In developing any particular peer review system, you have to decide which end of the value continuum is most important for which purposes or at which time, and why.

So, initially, we thought that there were homogeneous standards of what is good, true and beautiful in a society, but later analysis of cultures of different sorts found that you can find ambivalence in defined values and intentions.

People prefer to honor this value or that value, depending on where they are at. So, I would like to outline for you now some of the competing values in peer review.

First is that the peer review process is, at once, open and secret. It is certainly open as to the procedures, the criteria, the rating scales: everything is knowable and published. It is transparent. Someone is watching. It is accountable. Principles are intended to be applied to all in the same way.

But it is secret. A proposal is a confidential document. The fact that a proposal was even submitted is treated as confidential by NSF.

Reviews are sent to the investigators. They are not sent to anyone who asks to see them. To those outside the system, it is a mystery, how people get chosen to serve on NSF panels, or are chosen to receive proposals to review, which promotes the idea that it is an old boy network, where they nominate each other and exclude everyone else.

The record of a peer review meeting is passed along in fragmented form to the investigators. They get a kind of summary of the discussion.

The official minutes of the meeting are quite minimal, who was there, and who had a conflict of interest and the like. So, the system is asked to be both open and secret at the same time.

It is asked, at the same time, to be both effective and efficient. On the one hand, we want peer review to accomplish a lot. Look at the list of purposes that I provided at the outset.

We want it to do a very good job. We look at the criteria and we expect each criterion to be applied fully and fairly. When we look at the National Academy of Public Administration's report on peer review, they complained, effectively, that reviewers ignore certain criteria or apply them in idiosyncratic ways. So, it looks like the system is ineffective and people are not following the rules.

We also demand efficiency. At the outset, for example, we don't pay reviewers. If you send a mail reviewer a proposal, in exchange for the review -- it probably takes two or three hours -- the person receives no pay.

For the time you spend at a panel meeting for NSF, you receive about $280. That doesn't count the time that you spend preparing for the panel meeting writing those reviews.

For my program -- we can think about other programs later -- it would be not uncommon to handle 100 proposals in a round. A dozen panelists would each receive 100 proposals.

When I first served, they arrived in boxes. There were one or two full boxes of proposals. For all the time taken reading those before the two-day meeting, you got two days' worth of honorarium.

We also complain about how much time the peer review takes. It is a burden on the review community. It costs money for NSF to do.

At the same time, you notice we are asking the system to be very effective in accomplishing a variety of goals that require a great deal of skill, and to do so at almost no cost.
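The cost side of this can be put in rough numbers. The sketch below is only an illustration built on loud assumptions that the talk does not state outright: that the roughly $280 is paid per meeting day, that a panelist spends the same two to three hours on each of the 100 proposals that a mail reviewer would, and that a meeting day runs eight hours. The function name `effective_hourly_rate` is hypothetical.

```python
def effective_hourly_rate(proposals, hours_per_proposal, meeting_days,
                          honorarium_per_day, hours_per_meeting_day=8):
    """Honorarium divided by all hours worked (reading plus meeting).

    Every input is an assumption for illustration, not agency data.
    """
    reading_hours = proposals * hours_per_proposal
    meeting_hours = meeting_days * hours_per_meeting_day
    total_pay = meeting_days * honorarium_per_day
    return total_pay / (reading_hours + meeting_hours)


# Figures from the talk: 100 proposals, 2 to 3 hours each, a two-day panel, about $280.
for hours in (2, 3):
    rate = effective_hourly_rate(100, hours, meeting_days=2, honorarium_per_day=280)
    print(f"{hours} hours per proposal -> about ${rate:.2f} per hour")
```

Under these assumptions the honorarium works out to a few dollars an hour, which is the point of the comparison with lawyer rates.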

An example I put in the paper is buying a house. The first house I bought cost $80,000.

The purchase and sale agreement was a standard printed form that we paid a lawyer $600 to look through. Basically, that is all the lawyer does, is look through it and say, yes, a very standard document that is about two pages long.

We are looking here at a highly original, complicated document that runs 15 single-spaced pages, for which you receive very little in exchange.

You can have either, but not both. If you would like a good thorough review that follows my criteria precisely, you know, pay me lawyer rates and I will promise to do it.

We ask the system, at the same time, to be sensitive and selective. These are good old engineering terms.

On the one hand, a sensitive process is going to detect a signal wherever it exists. A selective process is going to filter out signals that shouldn't be noticed. You can't have both at the same time.

If you want to be sensitive to every iota of scientific quality that might be in your box of proposals, you may have to accept low selectivity, and basically fund some things that later earn you a golden fleece.

On the other hand, if you want to be absolutely positively sure that no proposal is funded that includes anything that anyone can raise questions about, the chances are that you will leave some good ideas on the cutting room floor.

Maybe I have read more of this stuff than you have, but if you read the literature and the commentary on peer review, you will see it repeatedly castigated for one or the other of these flaws, but no one realizes that you can't have both at the same time.
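The trade-off between sensitivity and selectivity can be seen in a toy example. The sketch below is an illustration only: the scores and the good-or-not labels are invented, and `fund_above` is a hypothetical helper, not any agency's procedure. It funds every proposal at or above a cutoff and counts the two kinds of error.

```python
# Each tuple is (review_score, is_good). All values are invented for illustration.
proposals = [
    (4.8, True), (4.5, True), (4.1, False), (3.9, True),
    (3.6, False), (3.2, True), (2.9, False), (2.4, False),
]


def fund_above(cutoff):
    """Return (missed_good, funded_bad) when every score >= cutoff is funded."""
    missed_good = sum(1 for score, good in proposals if good and score < cutoff)
    funded_bad = sum(1 for score, good in proposals if not good and score >= cutoff)
    return missed_good, funded_bad


for cutoff in (3.0, 4.0, 4.4):
    missed, bad = fund_above(cutoff)
    print(f"cutoff {cutoff}: good ideas missed = {missed}, "
          f"questionable ideas funded = {bad}")
```

With these invented numbers, the low cutoff misses no good ideas but funds two questionable ones, while the high cutoff funds no questionable ones but leaves two good ideas unfunded. No single cutoff brings both counts to zero.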

On the one hand, somebody will complain, I had a great idea, it later got me an award and you didn't fund it. Well, it seemed a little outlandish at the time.

On the other hand, look at this crazy idea, that you did fund. On the one hand, the system fails to do its job. On the other hand, it costs too much.

If folks could realize that these values are both desirable and inconsistent, they would recognize that they have to make choices. Maybe they could design a system that would work better.

We ask the system, at the same time, to be responsive and inertial. This is the essential tension again. On the one hand, every new theory, method, topic and need ought to be captured by the peer review system.

If it neglects something, if there is new interest in a topic or an area, or a new method that someone devises, and we don't see it represented in your grants portfolio, the peer review system gets rapped for being unresponsive to novelty and originality.

On the other hand, we ask the peer review system to guard the body of knowledge and to keep crackpot ideas, bad and questionable research ideas out of the body of public research.

At the same time we ask the system to be meritocratic. When it follows a meritocratic criterion, we are asking it to evaluate all the science, only the science and nothing but the science in the proposal.

This requires a great deal of boundary work on the part of reviewers because, of course, a proposal is an expressive document that uses rhetoric and invokes many different things in a blend.

So, we ask a reviewer to parse all that out and respond only to the science. At the same time, we are asking that this be done in a way that is fair, that meets societal criteria of fairness, which may not fit the criteria of merit.

Finally, we ask the system, at the same time, to be reliable and valid. If you remember your research methods course, a reliable measure has little random error. So, if you repeat the measure, you will get general agreement, or different raters, using the same standard, will get the same measurement.

A valid measure is one that measures precisely what it is supposed to measure and nothing else. So, if you are trying to measure intelligence, you are measuring intelligence and not socioeconomic status or height or cranial capacity.

Measurement theory tells us -- and some people who write about peer review remind us -- that reliability sets an upper bound on validity. If you have a certain amount of random error in a measure, you can't have more validity than that allows.
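In classical test theory this bound is usually stated as: the validity coefficient cannot exceed the square root of the product of the reliabilities of the measure and of the criterion. A minimal sketch of that textbook bound follows. The helper name `max_validity` and the sample reliabilities are assumptions chosen for illustration, not figures from the talk.

```python
import math


def max_validity(reliability, criterion_reliability=1.0):
    """Upper bound on a validity coefficient: r_xy <= sqrt(r_xx * r_yy)."""
    for r in (reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability * criterion_reliability)


# Invented reliabilities, with a perfectly reliable criterion assumed.
for r in (0.25, 0.49, 0.81):
    print(f"reliability {r:.2f} -> validity at most {max_validity(r):.2f}")
```

The bound tightens as random error grows: a measure with reliability 0.25 cannot correlate with its criterion above 0.50.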

When you think about what is going on in peer review, the scientific quality of a proposal has many elements. Perhaps they are in different conventions, have different levels of importance, and they interact with one another, something that is often not appreciated.

A valid review is asked to address all of these elements and form a weighted composite, with some thought given to the interaction.

A story I remember from graduate student days, where somebody is talking about a crecklock(?), whatever that is. This person goes into this story and explains that it is childlike, and a child should like this stuff. Well, do you like flour, I like flour, do you like eggs, I like eggs. I like all the pieces. Put them together, I don't like that stuff.

In the same way, the elements of a proposal may each, on its own, be good, but they may interact in ways that are unfortunate and lead to a decline. So, a valid measure is probably more than the components.

You are asking this valid measure to address all the elements. It gets even worse, because the variety of elements of a proposal is matched by the variety of competences of the reviewers.

Some reviewers are more sensitive, more informed, more intelligent about some elements and not others. So, a valid review, actually, is patched together from diverse reviewers assessing diverse facets of the proposal.

If you think about that for a second, if you are taking a complicated document and asking people with varied expertise to evaluate it, it is no wonder that they disagree. In fact, if you knew that they would agree, you only needed to ask one of them.

I think the pursuit of reliability in peer review would actually compromise validity. If, as a program officer, you wanted to be sure that you got reviews that agreed, one reviewer to the next, what you would do is get very precise criteria -- how many pages in the proposals, maybe something more imaginative than that, give very precise criteria to a homogeneous body of reviewers -- then you could be pretty sure they would agree.

If you try to evaluate a complicated, multi-faceted, multi-contextual document with pieces that interact in diverse ways, you are going to need something more than that.

So, it seems to me fair to say, in fact, that pursuit of validity in a review sets an upper limit on the reliability you can have, rather than reliability setting an upper limit on validity.

Precisely because you are trying to evaluate this complicated, multi-faceted document, you are driven to a set of reviewers that probably won't agree, but will be sensitive to different facets.
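This argument can be illustrated with a small, fully deterministic example. The sketch below is a toy under stated assumptions: the facet scores are invented, "true" quality is assumed to be the plain average of three facets, and each reviewer is assumed to judge only the one facet he or she is competent in. The names `pearson` and `panel_summary` are hypothetical helpers.

```python
import math
from itertools import combinations

# Invented facet scores (1 to 5) for five proposals, in the order
# (theory, method, significance). Illustrative numbers only.
proposals = [(5, 3, 4), (2, 4, 3), (4, 2, 2), (1, 3, 5), (3, 1, 3)]

# Assumption for this sketch: true quality is the plain average of the facets.
true_quality = [sum(p) / len(p) for p in proposals]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def panel_summary(facets_seen):
    """facets_seen[i] is the facet index that reviewer i is competent to judge.

    Returns (mean pairwise disagreement among reviewers,
             correlation of the panel's mean score with true quality).
    """
    scores = [[p[f] for f in facets_seen] for p in proposals]
    gaps = [abs(a - b) for row in scores for a, b in combinations(row, 2)]
    disagreement = sum(gaps) / len(gaps)
    panel_mean = [sum(row) / len(row) for row in scores]
    return disagreement, pearson(panel_mean, true_quality)


homogeneous = panel_summary([0, 0, 0])  # three reviewers with the same expertise
diverse = panel_summary([0, 1, 2])      # three reviewers with different expertise
print("homogeneous panel: disagreement %.2f, validity %.2f" % homogeneous)
print("diverse panel:     disagreement %.2f, validity %.2f" % diverse)
```

By construction, the homogeneous panel agrees perfectly but tracks true quality only weakly, while the diverse panel disagrees on every proposal and yet its average recovers true quality. Real reviews are noisier than this, so the example shows only the direction of the trade-off.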

This is the next section, choices and challenges. These are the choices and challenges I see facing us. You have got to choose which purposes and which values, of those that are listed, and there are probably more we could imagine, that you wish to emphasize in the program that is being developed.

There are some preliminary choices that confront us now in the design but, importantly, because this is a boundary process that involves not just administrators, but also researchers and, indeed, not just researchers, but also members of activist communities and practitioners and the lay public, it is one that you develop through a process, and that emerges with time and experience.

You can lay down a template to begin with, but don't expect that to be followed linearly. Rather, expect that this will arise through social action.

The policies and procedures that IES will write initially will be a score without an orchestra. You need to build a community and a culture for reviewers and program officers, which takes time and is initiated through action, not just by scripts, not just by the score.

Because it is a boundary process, there will be diversity and, importantly, this agency that is being born these days must be a robust learning organization, not merely a teaching organization.

In other words, it has to be willing to learn from its communities and reflect on their action, in order to develop workable procedures.

Throughout, the agency needs to consider the intended and unintended consequences of its actions, and practices that will be seen as legitimate in the diverse communities, each of which has different values.

Some decades ago, Leo Szilard, observing the low funding rates at the time, said that if they continued to get lower and lower, we would reach a point where all you could do is write proposals full time.

You can reach that point. If it takes a month to write a proposal and you get a funding success rate of about eight percent, you can do the arithmetic.
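That arithmetic can be sketched as follows. This is a minimal illustration, assuming every proposal has the same independent chance of success, so the expected number of proposals per award is one over the success rate. The function name `months_writing_per_award` is hypothetical.

```python
def months_writing_per_award(months_per_proposal, success_rate):
    """Expected months of proposal writing behind each funded award.

    Assumes each proposal succeeds independently with the same probability.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be greater than 0 and at most 1")
    expected_proposals = 1 / success_rate
    return months_per_proposal * expected_proposals


months = months_writing_per_award(months_per_proposal=1, success_rate=0.08)
print(f"about {months:.1f} months of proposal writing per funded award")
```

At one month per proposal and an eight percent success rate, that is about 12.5 months of writing for each award, more than a full year of proposal writing before any research is funded.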

For those of us who work in contract research organizations, it seems that you are writing proposals non-stop and doing the projects at night.

When I look at some of the funding rates in the fields of education, they approach this Szilard point, which will challenge the legitimacy of the process. The reviewers will say, why should we bother with this.

It will exceed the resolving power of peer review. What I mean by resolving power is this: remember, you are making these complicated evaluations, probably at a resolution of about a whole number.

If you give me a one to five scale, I can probably be pretty sure that this proposal is a five and not a four. I am probably not sure whether it is 4.5 or 4, and I am really not sure it is 4.45 and not 4.43. So, at some point we are going to draw a line that will exceed the resolving power of the instrument.

Another way to think of this is, go back to your 11th grade chemistry course. You remember the teacher hammering away at not carrying forward more significant figures than you had in your most crude measurement?

When you watch these long funding lists, where they carry the priority score out to the thousandth place and draw a line, the chemistry teacher is going to be offended.
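The resolving power problem can be made concrete with a toy funding list. The sketch below is illustrative only: the proposal names, scores and payline are invented, the resolving power of 0.5 is an assumed figure chosen to match the idea that half-point differences are already doubtful, and `too_close_to_call` is a hypothetical helper.

```python
# Invented priority scores on a 1 to 5 scale, carried to three decimals.
scores = {"A": 4.731, "B": 4.452, "C": 4.448, "D": 4.430, "E": 3.102}
payline = 4.450          # hypothetical funding cutoff
resolving_power = 0.5    # assumed: smaller gaps cannot be reliably distinguished


def too_close_to_call(scores, payline, resolving_power):
    """Names of proposals whose score lies within the resolving power of the payline."""
    return sorted(name for name, score in scores.items()
                  if abs(score - payline) < resolving_power)


funded = sorted(name for name, score in scores.items() if score >= payline)
print("funded by the line:", funded)
print("not distinguishable from the line:",
      too_close_to_call(scores, payline, resolving_power))
```

In this invented list the line funds A and B and rejects C by four thousandths of a point, yet A through D all sit within the assumed resolving power of the cutoff. Only E is clearly separated from it.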

Anyway, if you start to reach the Szilard point, lots of bad things are going to happen, not just for people writing proposals, but for the entire social process, the entire community that you are trying to establish.

Another challenge for IES is bridging policy and practice. I think it is an ideal agency to try to do this, certainly.

Better teaching and learning is a societal goal. Do research on it, people will admire you. Involving practitioners in this is an added dimension that actually would be difficult to enact at NIH and NSF, to the degree that you could do it at IES.

Many of these things do seem to engage the practitioner community pretty fluidly, but involving practitioners, advocates, the lay public in peer review, upsets delicate sensibilities that professionals begin to develop.

The story I think of is how ecologists try mightily to separate themselves from environmental activists, from environmentalists.

We ecologists study the processes of energy flow and the distribution and abundance of species, we are not conservationists and environmentalists, who try to conserve the environment. That is activism in society.

There are good reasons to want to be a scientist and not an activist. One of them is you are applying to a science agency for research support, rather than to an activist agency for activist funding.

Every so often, why results count, it also shows that you are making a difference at the bottom line. So, this has to be handled delicately, so that the professional sensibilities of a budding community of education researchers are not compromised by the engagement of activists and practitioners and such. At the same time, trying to build that bridge is invaluable.

I think there are pluralist possibilities in IES. I think it is possible for them to build a complex peer review process that includes not merely what we discussed as the traditional peer review, but also possibly elements of a strong manager, to work on certain specific objectives. That might be a way to partition it.

I think that is it. Thanks very much. Sorry about the technical difficulties. Questions? Comments?
