The National Academies: Advisers to the Nation on Science, Engineering, and Medicine
NATIONAL ACADEMY OF SCIENCES NATIONAL ACADEMY OF ENGINEERING INSTITUTE OF MEDICINE NATIONAL RESEARCH COUNCIL

Workshop on Understanding and Promoting Knowledge Accumulation in Education:

Tools and Strategies for Education Research

Day 1 – June 30, 2003

Remarks by Dr. Gary Natriello

DR. GARY NATRIELLO: Now, 15 years ago, I was involved in a study, and one of the things we found in the study that stuck with me is that many of the problems that students experience in school have to do with their failing to understand their assignment, and I can relate to that very clearly this afternoon, because the assignment that I came with is not the assignment that I now realize I have from the introduction, and it is perfectly understandable. I do remember a conversation in which we talked about incentives for data sharing, and then, subsequently, I received the agenda, which had much more context about the entire meeting, and so deciding to be terribly responsive for once in my life, I actually addressed all of the questions posed for the panel and put them on a handout for you.

So my talk won’t have to do with what is on the handout, but I wanted you to have the handout anyway, and you’ll now see the results of three years of extemporaneous public speaking - (laughter).

And you’ll notice on the handout that the thing that I spend the least time addressing of all the questions is the question on incentives and disincentives for data sharing. So we’ll expand on what is there a bit.

Let’s start with one very simple observation about data sharing, and that is that there are at least two roles involved, and I want to talk about both of those roles. One is the person who has the data who is going to share it and then the other is the person who they would like to share it with, and, of course, the sharing process is not complete unless both people play their part, and so when we talk about incentives and disincentives for making that relationship work, we really have to focus on both of those folks.

Before I start, I also want to observe that two people in this room came up to me at the beginning of the session, before it started, and apologized beforehand for the sense that they would be likely to fall asleep, and they were very convincing in explaining to me that it had nothing to do with my talk, but only with the time of the day and the heat in the room, and had they not been two people who had heard me speak before, I would have actually believed them.

Okay. Let’s start with disincentives for data sharing and try to be as forthright and honest as we can. I think there are a whole series of disincentives that good folks will overcome, diligent folks will overcome, good scouts will overcome, but, in fact, there are lots of not-good-scouts - and that is most of the folks - I would include myself -

Number one, a disincentive is it is not always clear how to prepare a data set to share. There are lots of different conventions. People have learned them or not learned them at different stages in their careers. The data sets are becoming more complex, both the quantitative data sets and the qualitative data sets. They are becoming larger. They are taking different forms, so if you learned how to archive punch cards, those strategies probably don’t work anymore, at least if you want people to be able to retrieve your data. If you have archived them on five-and-a-half-inch discs, that probably won’t work. So there are lots of problems, but the big issue is that we really lack a set of clear and unambiguous and comprehensive directions that someone who might be inclined to share data and prepare data for sharing could use. So that is one disincentive.

Second disincentive is it actually does require effort to prepare a data set for sharing, and that effort is over and above what is required for getting the data ready for analysis and publication, and I particularly enjoyed Barbara’s talk, as I know Barbara spends a lot of energy preparing data sets and makes them available to folks in all stages of their career, but most people don’t, and the reason most people don’t, I think, is that it is a lot of work. It is not a trivial task to take the data to that next step and truly get it ready so that someone who is not sitting in your office with you, in fact, can make good use of the data and not misinterpret it.

A third problem, which I think there is some sensitivity to, is that sharing data, at least potentially, leads to loss of exclusivity in reporting on the data, and there are investigators who like to hold onto the data as long as they possibly can, so that they are the only source of it, so that they have an advantage in getting the data out before the larger community, in fact, has access to it.

Fourth disincentive is that sharing the data leads to examination by the outside community of scholars. Now, I think we have heard Barbara talk and others have talked earlier about how the community of scholars, in fact, is the principle that we are trying to strengthen, but I think in a lot of areas of social-science research and a lot of areas of educational research that is not a well-accepted principle, and people are not used to having their work scrutinized carefully, and by that I mean all aspects of their work. So I think there is concern there that exposing one’s work and work products, in fact, subjects it to additional evaluations by folks on the outside in a way that simply publishing a final paper does not.

I think what comes with that is sort of a fifth disincentive which is it is now going to require, if we move into this data-sharing mode, greater attention to research at all of its stages and all of its phases; that is, there is no more shortcutting. There is no more putting stuff under the rug. There is no more papering it over in the publication process, and, again, sounds like more work.

So there are lots of disincentives for people who can avoid data sharing, if that is an option, as it is in many parts of social and educational research. There are lots of reasons why people might want to avoid that.

Let’s talk a little bit about disincentives on the other side; that is, the sharee side, the person who is going to make use of the data, and here I would argue there are at least three kinds of disincentives that we need to address. One is there are technical barriers, and no matter how elegantly we prepare the data and how easy we make it to access, it still requires an investment of time and energy to access that data.

Secondly, there are what I would call ownership issues; that is, depending upon how scholars are trained, they may feel that it is not their work, in fact, unless they own the data, which means they were there when it was collected. They shepherded it all the way along, and, in fact, they, in a sense, had custody of this evidence from the very beginning. So there’s a sense in which it is theirs, and it is a personal identification with the data, and I think we see a lot of areas, both in the social sciences and outside where that is true, and that ownership is an incentive for people to participate in the process. So we need to recognize that.

And then third on the sharing side, the sharing-from side, I would say are issues that I would call artistry issues, and those are a little different than ownership issues, but they are related to it, and that is there are people who really do want to have that fine-grain control and be able to tweak the data in its preparation at all stages in the process, and, in fact, they don’t quite trust the way in which others might do it. “They would do it in a way that is okay, but not the way that I would do it.”

And then, finally, connected with that there is this issue of getting a sense of my work being whole, that is, let’s not divide up the research process in 77 different pieces, and I get to work on this grand assembly line of - you know - position number 76, where the data has all been prepared and I am going to then pass it on to someone else.

Now, you can take these disincentives seriously or not. My sense is for some people they are quite serious. For other people, they are trivial, but I think if we want to bring a larger group of people into this process, we at least need to think about them in a kind of comprehensive way.

Okay. Let’s turn to incentives, and what you’ll see quickly is my notions of incentives have to do with addressing, in many cases, the disincentives. I mean, I do think there are some relatively straightforward things that can be done to minimize at least some of the disincentives.

Number one, I think that establishing standards for the preparation and archiving of public data sets I think would be quite useful. I think it is an appropriate kind of conversation for this group or other groups to have, and promoting them widely would be a tremendous service. It would require many fewer institutions to reinvent the wheel on their own. It is also a good time to be doing that sort of thing now, because, as some of you may know, there is this worldwide discussion of standards for all kinds of intellectual property. These data sets are but one of the many kinds of intellectual property that are going to be subject to those standards, particularly as digital archiving comes on in a fuller way. So it is a good time for us to be thinking about those standards.

Coupled with those standards, I think, might be a set of tools that would embody those standards and that would make it relatively easy for people to use this set of standard tools and archive the data. Again, the preparation of the tools, I think, would be a really important task for perhaps a group such as this.

Coupled with that would be a set of instructional opportunities, because no matter how good the standards are and good the tools are, there needs to be some place for people who haven’t done this before who would like to do it, would like to take their data to this next step to, in fact, learn how to do that from people who have the expertise.

In addition to that, I think Barbara has mentioned one of the things that has made it possible for her to do this kind of work is financial support. Building in financial support for a step of preparing data for archiving and public accessibility, I think, would be a legitimate call on resources and an important one to perhaps institutionalize.

Another strategy which it seems to me is well worth pursuing is figuring out how we recognize investments in data archiving and data sharing. We need something equivalent to authorship of a journal article connected with these kinds of products, and we need to think of them as valuable intellectual products, and that means not only do we need to think of them, but promotion committees need to think of them that way and funding agencies need to think of them that way, and we need to, in a sense, create that kind of authorship or creatorship, and I think that, right now, is pretty much missing. I think the people who have invested in doing this kind of work largely go unrecognized for it or if they are recognized, they are recognized for the publications that come from the work, not from the preparation of the archived data.

I think another incentive that we could work on and could create is developing communities of exchange around these issues, and that is professional communities that would reinforce and provide feedback to the creators of these data sets. So one of the incentives for creating the data set ought to be the learning that comes from interacting with all of those folks who were using it in a sustained way, so there is a real educational opportunity for the creator and for the developer of the data set.

On the sharing side - that is, the from side again, how will people use this - I would like to suggest that there are probably two things that we should really think carefully about. One, I think we want to think about - sorry. One of these is still on the making-available side, and that is, one of the things I mentioned about the making-available side was that it would subject one’s research to greater scrutiny and that there is a disincentive to doing that. I think one of the things we might want to revisit is the expectations that a lot of institutions have for junior scholars in this regard; that is, if we impose publication standards for junior scholars that are primarily quantitative standards, and in a lot of institutions they are quantitative standards - How much have you produced in a limited period of time? - then we create a disincentive for people to invest in doing very careful, high-quality work, if, in fact, at the end of the day, it is going to be a tallying and a count, and it seems to me setting expectations based upon what we actually think high-quality work takes in terms of resources and time and energy would be something worth thinking about.

Also on the from side, I think we have an issue in dealing with publication outlets, particularly as we think about data sharing. I think the norms, in fact, do not support replication and do not support data sharing. I can tell you that as a journal editor I receive many, many, many reviews back from reviewers who say, “I have seen something very similar to this done before, and unless there is some grand ‘new’ knowledge, I don’t really think you should be publishing it in the journal.” Journal space is too precious.

Now, in some cases, that is just a cover story for something that is not very well done or very exciting, but in other cases, it really is a statement that says, “We will not support replication, because it is too close to something else that has been done,” and I think the expectations are pretty clear on this.

So, for example, I have also had conversations with senior scholars who will say, “Well, you know, you published a paper of mine two years ago. I have done something which extends that paper, but I’m not going to send it to you because you are probably going to tell me it’s too similar to the paper that I have published before, even though it builds on the paper that I published before.”

So I think we need to create some greater sensitivity to the idea that, if we really, in fact, want replication, the second study that may be close to the first study is a very major contribution. The number of findings for which we actually have a second study that is convincing and well done is not all that great.

So I will leave it there, and encourage people to help me think through more incentives for data sharing and replication.
