BEHAVIORAL SCIENCES

SOCIAL SCIENCES

EDUCATION

NATIONAL STATISTICS

500 5th Street, NW
Washington, D.C. 20001
Tel: 202-334-2300
Fax: 202-334-2201
E-mail: dbasse@nas.edu

Roundtable Participant: Luna Levinson, Institute of Education Sciences

MS. LEVINSON: I'm Luna Levinson. I'm with the ERIC program, and I have been with the ERIC program since 2000. And that period covers the clearinghouse model era into the redesign of the new ERIC contract and now implementation into the first 10 months of that contract.

And I think that ERIC entered this conversation about new forms of abstracts in part because of our new contract. But as background to that, I think it is important to remember that the ERIC that we are now using is the ERIC that has been built over decades, in which the database was a bibliographic database, and there were the abstracts, which were a central part of offering free database information about education literature.

And so, in the new contract the main focus that the Department had in mind was to connect full text articles to these abstracts, but not lose sight of providing as much free information as we could to make things easy to use. And so, abstracts became one of the problems or one of the issues that we wanted to address.

And I think as perhaps I move later on and talk a little more about what's going on inside ERIC as far as abstracts at the present time, it's important to remember that the ERIC bibliographic database was built through a system of clearinghouses that wrote abstracts on the journal literature; that the authors themselves did not provide those abstracts.

And so, when it came to the CIJE database, they were very abbreviated abstracts in which you would have less information than a journal might provide in its own abstract. And then when it came to the gray literature, there were much longer abstracts.

So, the statement of work approached the whole issue of trying to connect full text to provide as much information as possible, not to have a truncated system that would force out of the database into another vendor, to perhaps buy an article, and at the same time not to compete with the publishing industry which was selling journal articles on its own.

So, we were attacking a number of things at once in a great, big contract, and we continue to discover things along the way. But that's my introduction by way of what I have been doing in ERIC now for several years.

DR. BRADBURN: Let me make an assertion, which people probably will not like, and then get you to react to it. I should say my background originally at least was as a psychologist, so my first publications were in the APA journals. So, I was acutely aware of the structure of articles in the APA journal.

And there they have a structure of abstracts, but not structured abstracts. But they do have a structured form of articles, and that is quite rigid for at least most of the APA journals. So much so, I say as an aside, that it drove me out of publishing in APA journals.

But the assertion I want to make is that journals' views about abstracts are in a way as Susan said, I don't want to say tantalize the potential readers of the journals to read the articles, but I put it this way. It's to sort of maximize the search process for people who are in that universe of discourse.

So, that the particular structure of abstracts, just take APA journals, is different for different journals. And I think this was sort of referred to about the Sociology of Education Journal. It's sort of oriented towards the people who are reading those journals, and will be familiar essentially with the discourse.

The other thing which is interesting, again, as an aside, and I don't know whether this is just peculiar to the APA journals, but when I said it was tantalizing, because one of the things that always annoys me about abstracts in the APA journals is they don't tell you what the result is.

They describe the experiment, and then it says we looked at the relationship between X, Y and so forth, but they don't tell you what they found, so you've got to go read the article to find it. Contrast that with the structured abstract, where they are essentially telling you how it came out.

The thing I wanted to pose to the panel, since those at least in your roles as dealing with databases that are covering lots of different journals, there is a sort of tension here between maximizing the work or the attraction for the people who are reading the journals, as contrasted with the people who don't ordinarily read the journals, but are trying to find out about some particular topic across journals.

And I think the structured abstract thing is a response, as we heard in the background of that, to the frustration of people who are trying to make it more efficient to look across journals. Whereas I think the aim of most journal editors, and of the policies developed by journals, is essentially to make it optimal for the people who read the journals regularly, so to speak.

Or if it's not just a single journal, at least a set of journals that cover it and share -- and when I say share the universe of discourse, they typically are sharing the same methods, they are looking at the same sort of topics, and there are a whole set of things they just take for granted, and then they are looking within that.

So, I guess in the function of abstracts, could you comment on how you think that tension works out in the kind of areas that you deal with?

MS. HARRIS: Let me take Gina's approach and answer your question by not answering your question. It is a tension that we deal with, and I think we hope that it is answered through the structure of what we are asking the abstract to accomplish, whether it is structured or not.

And so, that leads to what has been an ongoing and very important conversation amongst the council of editors. We have a governance system where the editors of our journals, of which there are 45, meet once a year and discuss issues. And the issue of abstracts and structured abstracts has actually been on the table since 1998, just to give you a sense of how long we have been talking about this.

And as a part of that conversation, it was exactly what you are speaking to: do we impose a structure that will drive like researchers into a pool of information that serves their need through the abstract itself? Or do we instead use the advantages of our search systems and our databases, as a controlled set of terminologies, to create the same and perhaps even more in-depth response to a researcher's need in a given area?

So, the controlled terms is one of the ways, and the categories of managing the content as a whole is one of our approaches to that very research issue, as opposed to having the abstract be the answer to that question.

As a result of the discussions we've had with the council of editors, and then there is also a decision-making body, a policymaking body, the Publications and Communications Board. We have actually examined a good number of research studies about structured abstracts and how useful they are, regardless of what your discipline is.

It starts out of course with the basis of medicine, which is where they emanated. And one of the more important considerations as it emerged in these discussions was that regardless of whether the abstract is structured or not, most abstracts are not terribly reliable in terms of reporting accurately what the article was stating.

There was one particular study that was done in 1998-1999 by Dr. Picten(?). He took actually The New England Journal of Medicine, JAMA, it was BMJ, Lancet was another, there were six really big players in the medical industry, and looked at how the abstract mapped to the actual article.

He was gracious enough to call them A, B, C, D, so he didn't know exactly which journal results came out, but between 18 and 68 percent of all abstracts were deficient. Either they omitted key information, or they distorted key information in the abstract compared to what was appearing in the article.

Now, at the time we didn't have an APA study, and there has subsequently been an APA study or study of APA journals which fared a little bit better. In the group of eight that were studied, the range of poor abstracts went from 8 percent to 18 percent, depending on which journal you were looking at specifically.

But the result of all this was that our concern became more the accuracy of what was in the abstract, regardless of whether it was structured or not, regardless of how long it was. And in fact, the concern was that the longer it was, the more opportunity for error. It sent us back to the position that 120 words was sufficient, and we left it at that.

So, this may not be answering your question, but it is the backdrop against which our current thinking resides. We have a test group of editors whose journal content overlaps with medical, and those would be rehab psychology, health psychology, psychopharmacology, and others. There are several that overlap medicine, and they are actually looking specifically at what kinds of structured abstracts that they see in their universe might well be used within those particular journals, because there is such overlap with medicine.

And there is a lot of pressure there, because that community is very accustomed to seeing a structured abstract. We had one journal, and that was Rehabilitation Psychology, who in fact used structured abstracts for about three years. But the new incoming editors decided to abandon that going forward. So, it was an experiment of one, not a broad-based study at all.

So, at the moment, that's sort of the environment in which we are working.

DR. BRADBURN: Why did they abandon it? Do you know?

MS. HARRIS: The interesting thing was that along with the other studies that we had looked at, we decided that the structured abstract neither informed nor improved the actual writing of the article or the substance of the science being presented. So, they took it from both the perspective of did the abstract help the researcher understand what was in the article? And the answer to that was given the errors that were involved, probably not.

The other approach that was taken was if you had a structured abstract, could that inform how authors actually wrote their articles over and above the fact that APA definitely has a very rigid structure? And that was proved to be not very useful either. It actually did not inform or improve the quality of the writing. So, given those two sides of the parameters, we're at a status quo right now.

DR. BRADBURN: Actually, this is a factual question, which I guess I have assumed the answer, but I just put it -- maybe Doug might know. In the medical journals, do they use the same structure for all journals, or do different journals use different structures?

MR. JOUBERT: No. When I was preparing for this, I can only look through the lens of working with a very specific -- my group is the Division of Computational Bioscience. And they use a certain section of literature which mainly has structured abstracts.

So, we have 100 core journals which have high impact factors, which our researchers love, because they are good for tenure and promotion. And I actually just went on the shelves and looked at those 100 journals, and about 60 percent of those, Nature and Science as prime examples, do not use structured abstracts. Within the context, there were a lot of different forms.

And the Medical Library Association, the MLA that I belong to, actually has been struggling with this as it relates to our professional meetings. And we have adopted structured abstracts for the last two years. And I'm on the planning committee this year, and it's greatly improved the way that I'm able to actually go through these abstracts and look for the content of the research.

But we allow for qualitative, ethnographic studies, case studies, projects. So, I think when you decide on which method you are going to use, there is a lot of flexibility within there.

MS. LEVINSON: Currently, the ERIC statement of work calls for the contractor to produce within the first year, a draft working plan for assessing and moving towards a common abstract form. It doesn't say necessarily a structured abstract, but to lay out a plan for considering the factors of moving towards a common abstract form.

And the group of people that are working on that are a subgroup of the ERIC steering committee, some of whom are here today, and through email they are called the SAWG, the Structured Abstract Working Group. And I think that it's a wonderful exercise, because what I see taking hold is a great deal of momentum among a small group of people that are researchers that are dedicated to testing out the idea, probably limited to a specific genre, empirical research, of whether in fact this helps clarity and vocabulary, if there are existing vocabularies that could be used for this. And if in fact it does help systematic discovery and retrieval of information.

And so, in that respect I think there is the parallel that I see with medicine, because when Annals of Internal Medicine published this first article, there were signatures of researchers who signed on to this, who weren't necessarily part of that ad hoc committee, but said we think this is good, we think this will facilitate assessment of the literature and move us quicker towards the kind of information that will help our clinical practice. So, in that regard, I see some parallels that are moving forward, and I imagine things will be tested within this group.

Beyond that, I think just my own reaction to this is that there is a great deal of merit to the idea. I also think that there is a lot that probably needs to be nailed down from a technical perspective that is probably specific to a database. The current design for ERIC architecture calls for machine-aided indexing that in all likelihood will not be searching full text, but will be searching the abstract.

And so, there would probably be some specific weighting and tooling and training that would have to go towards those abstracts if they were in a structured format versus a narrative. And so, that raises a larger question, if you have the structured ingredients, whether they are the nine ingredients, five or four or whatever they are, would discovery be basically the same in a narrative format versus a structured format?

Which might lead one to the larger conclusion that this is simply another way, another access point, whether you are using a thesaurus, whether you're using any other specific meta-data field to discover things. But I think that it's a wonderful idea that we have a group of people encouraged to talk about this and explore it.

DR. BRADBURN: Did I understand you correctly that in ERIC there are abstractors? That is, you're not depending on the abstract of the submission?

MS. LEVINSON: We've got a combination. Everything in ERIC going forward will have an abstract. To the maximum extent feasible we'll use author abstracts. But some things won't have abstracts, government literature for example.

DR. BRADBURN: But if you are having a common framework, wouldn't you have to take the author's abstract and put it into a format that fits into ERIC?

MS. LEVINSON: That may be the case. Again, we have to see what a plan looks like for that, and whether we would publish such a template, and how complex it would be.

DR. BRADBURN: I was sort of fascinated with the error rate in authors' abstracts. And that does, to my mind, call in the whole question if you are really going towards structured abstracts with the idea that that is going to help, particularly in meta-analyses or in systematic summaries of literature and so forth, and there is that kind of error rate.

It seems to me you would want to think more of having some abstractors or something to do it, to at least control the error rate, as you would with coders in coding materials. So, you would do re-checking and things like that.

MS. HARRIS: For the Psych Info product we actually have a staff of 90-some people. A good portion of those folks are abstractors, and do exactly that. But that's against a controlled index of words. So, we do exactly that.

DR. BRADBURN: It is interesting, because this morning we were sort of assuming that -- one of the arguments for structured abstracts was quality control, essentially both for the journal editors presumably, and for the kind of wider searches of literature and so forth. And I guess the assumption, which wasn't then made clear, but I guess now is made clear is that you've got to do a lot of quick quality control on the preparation of the abstracts or the whole system would fall apart.

That sort of brings us I guess to the question of resources. What kind of financial and intellectual resources are necessary to realize a database that contains structured abstracts? How much different is it? Is there any way of thinking about how much different it would be than the present system?

MS. HARRIS: Strictly from a production perspective, there really isn't much difference in producing a structured or a non-structured abstract as an add on to the article itself. It really is fairly marginal. But in point of fact, to sustain a database of the magnitude of Psych Info, which is several million records presently -- just to give you a sense of scale, we actually posted 160,000 additional records in 2004. That was the largest we had seen for the lifetime of the product, but it's growing exponentially.

But also just to be clear about it, we support 2,000 journals in Psych Info. And if we are presented with a structured abstract we actually strip out the headings and make it a more traditional abstract to fit the models that we use, the infrastructure of the product, which again is driven by fields and keywords -- not keywords. I'm very careful about that.

Keywords are a very different universe for us. These are actually controlled terms and categories that we structure all of the content against that.
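The heading-stripping step Ms. Harris describes can be pictured with a minimal Python sketch. This is an illustration only, not the actual production process: the heading list, the function name, and the assumption that each heading starts its own line followed by a colon are all hypothetical.

```python
import re

# Hypothetical heading labels; real structured abstracts vary by journal.
HEADINGS = (
    "Background", "Objectives", "Objective", "Design",
    "Methods", "Results", "Conclusions", "Conclusion",
)

# Match a heading only at the start of a line, followed by a colon.
# Longer labels come first so "Objectives" is tried before "Objective".
_HEADING_RE = re.compile(
    r"^\s*(?:" + "|".join(sorted(HEADINGS, key=len, reverse=True)) + r")\s*:\s*",
    flags=re.IGNORECASE | re.MULTILINE,
)


def flatten_structured_abstract(text: str) -> str:
    """Remove section headings and return one narrative paragraph."""
    without_headings = _HEADING_RE.sub("", text)
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(without_headings.split())


if __name__ == "__main__":
    sample = (
        "Objective: To test X.\n"
        "Methods: We did Y.\n"
        "Results: Z improved."
    )
    print(flatten_structured_abstract(sample))
```

Matching only at the start of a line is a deliberate choice in this sketch, so that a phrase such as "the results: ..." inside a sentence is left alone.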

DR. BRADBURN: That's like indexing terms.

MS. HARRIS: It's absolutely indexing. And I can share with you as an average, that's about a $6 million a year project for us to sustain that.

DR. BRADBURN: On the indexing, do you add new indexing terms, or have you got a sort of defined set of indexing terms?

MS. HARRIS: They are very defined. In preparation for this in the last two and a half days, I spoke with some of the index folks. They tell me that they look for a critical mass of terms that are being introduced into the literature. And once it reaches its critical mass, then it becomes an index term. But it's sort of like adding a new word to the dictionary. It takes a certain mass to accomplish that.

Now, we are looking at those words that are below that buffer, in order for us to look at new opportunities for other content areas that we might want to explore in sort of growing the program. But it doesn't reach the Psych Info record base until it is to a certain critical mass.
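The "critical mass" rule described here can be sketched as a simple frequency threshold. The following Python is a hypothetical illustration: the function name, the threshold value, and the idea of counting a term once per article are assumptions, not a description of how the index staff actually work.

```python
from collections import Counter


def promote_terms(records, controlled_terms, threshold):
    """Split candidate terms into those at or above a critical mass and those below it.

    records: one collection of candidate terms per article (hypothetical input).
    controlled_terms: terms already in the controlled vocabulary.
    threshold: number of articles a term must appear in to be promoted (assumed).
    """
    counts = Counter()
    for terms in records:
        # Count each term once per article so one article cannot inflate it.
        counts.update({t.lower() for t in terms})

    known = {t.lower() for t in controlled_terms}
    promoted = sorted(t for t, n in counts.items() if n >= threshold and t not in known)
    watch_list = sorted(t for t, n in counts.items() if n < threshold and t not in known)
    return promoted, watch_list


if __name__ == "__main__":
    articles = [
        ["Cyberbullying", "resilience"],
        ["cyberbullying"],
        ["cyberbullying", "telehealth"],
    ]
    print(promote_terms(articles, controlled_terms={"Resilience"}, threshold=3))
```

The second list returned corresponds to the words "below that buffer" that are watched but not yet added.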

DR. BRADBURN: If you added a term, can you retrofit old --

MS. HARRIS: I didn't ask that question. I wish I had. I would guess not. Well, probably not, because the record that is there does not include that term, so it wouldn't know how to search against it.

DR. BRADBURN: Oh, it's not that you are picking it up out of the article.

MS. HARRIS: We are assigning.

DR. BRADBURN: You're assigning it.

Doug, do you know how it works?

MR. JOUBERT: Well, at NLM, which is one aspect of NIH, and I don't work with that division, so I'm not terribly clear, but NLM processes about 500,000 new citations a year for PubMed, which is one of the NCBI databases. They have a team of about 1,000 indexers. And they are mainly contracted out through the United States.

I went to a presentation at the American Medical Informatics Association this year, and since 2002, they have been actually using NPI, which is a medical text initiative, which handles about 20 percent of their indexing. And as the recall and the precision, when they do their results to see if they are actually getting the articles that they wanted to get increases, they will actually reverse that ratio, and try to have more of it done at a machine level, which creates its own problems. I can give you a really good citation if you are interested in that. Librarians are interested in that.
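For readers less familiar with the two measures mentioned here, precision and recall can be computed as in the following minimal Python sketch. The example compares machine-assigned index terms with terms a human indexer assigned to the same article; the function name and the sample terms are hypothetical and are not drawn from any NLM system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items.

    precision = share of retrieved items that are relevant
    recall    = share of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical terms for one article.
    machine_terms = {"asthma", "child", "inhaler", "school"}
    human_terms = {"asthma", "child", "adherence"}
    p, r = precision_recall(machine_terms, human_terms)
    print(f"precision={p:.2f} recall={r:.2f}")
```

The same function applies to a literature search: `retrieved` would be the articles a query returns and `relevant` the articles a reviewer judged to be on topic.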

DR. BRADBURN: Luna, does ERIC use --

MS. LEVINSON: We are the wee little ones next to these giants. During the clearinghouse era ERIC accessioned about 30,000 new records a year. We have not begun accessioning new material yet. We haven't completed our list of journals. But I am sure that it will not be as large as what we are sitting next to, and certainly the contracted staff of information specialists and indexers is really quite, quite small by comparison.

DR. BRADBURN: Actually, this brings up a question which we didn't ask anybody to think about, but let me throw it out, and maybe people in the question and answer will comment on that. That has to do with the function that was mentioned about the relative value of structured abstracts against ordinary abstracts or whatever for essentially searching for relevant literature if you are in the mode of looking to do a review, or just because you want to find some things relevant to what you are interested in and so forth.

And that is whether you search on the abstract or whether you search on index, whether you have indexing terms. And it strikes me that at least for the function of finding articles, that indexing is probably a more reliable -- assuming that the database puts in resources like this in terms of indexing, and particularly if it's done in a judgmental way, not to deprecate mechanical forms of indexing. They can do fine, but at least at the moment relatively proportionally few can be done that way.

I'm a great believer in the value of good indexes. When I look at a book, I always look first at the index to see how it is structured in some way or other from the indexing.

And at least for the searcher -- of course that doesn't say once you have located an article, then the problem of finding out what is in. But I wonder, I guess the question that we all need as we go forward in thinking about this is given the problems of the accuracy of abstracts.

And even as we heard this morning, when you go to the articles, sometimes the information about the study that you really want in order to use it for like if you are doing something like what works, or you're making some sort of judgments about studies above and below some -- or have some method, or above or below some particular criteria, if the material isn't in the study either, then of course you are sunk.

But I just wonder in fact how you would go about thinking about the relative merits of that? How would you design a study to sort of say, well, let's compare doing a search using index terms, as opposed to doing a search using -- just searching abstracts for particular words, and the various sorts of the way some of the text ones do. And then what you could find out from the abstract as compared with having to dig into the article?

Does anybody know of any such studies?

MR. JOUBERT: There are about 20 studies that probably would be more interesting to someone with an information science perspective. I did bring citations with me. As a librarian, I always bring my citations. And they were probably -- some of them are good studies, and some of them are not so good.

One of the ones that was actually by McKibben(?), who is a Canadian librarian at McMasters, who works with Brian Haines(?), who was one of the original ad hoc for the Annals article, did not find a high precision level of recall for the articles that were done in a particular method.

What was more shocking is that clinicians were using abstracts only about 20 percent of the time, rather than going to the original article, which I found a little more scary.

When I was preparing for this, I was looking through the lens of someone who is in the clinical sciences. And I was re-reading Haines' original objectives: they wanted to facilitate peer review before publication, they wanted to assist clinical readers to find articles that were both scientifically sound and applicable to their practice, and they wanted to allow more precise computer literature searches.

This was written in 1990. I think the last issue is becoming a little less relevant as search algorithms become more complicated. But I think there is a unique aspect of clinical research rather than some other areas of research, because basically clinical research is translational research. At least at NIH it's done that way, and at freestanding academic health science centers, where you have core hard science bench researchers doing the groundwork that actually leads to discoveries in clinical medicine, and then that's what is actually used on the ward.

So, I think -- this is kind of off the subject, but I think that put in place a series of steps which on the surface seem simply an editorial position by the Annals of Internal Medicine, but which have had much broader implications for medical researchers.

DR. BRADBURN: Has the structured abstract movement or policy let's say in biomedical journals, does it go beyond the more clinical-oriented terms?

MR. JOUBERT: As I mentioned earlier, I was looking at the 100 high impact publications that are our core journals. It was unevenly matched. I did find I thought just because of the group that I work with, that there would be more structured abstracts in the harder sciences like the physics and chemistry and physical chemistry. And they tended to not have as many structured abstracts.

And I think that speaks to the sense of urgency that the ad hoc committee had in 1987, because as you are seeing now, there is just this huge amount of information being produced. And as someone said, you can't read everything. But at least in a clinical situation you need to make those sort of critical decisions about what therapy is going to work in a particular situation. And at least on a superficial level they felt that structured abstracts were the answer to some of those questions.

DR. BRADBURN: I think one thing I have come to an appreciation for is the magnitude of what is out there, so to speak, not just in the biomedical world, but what Psych Info does. Even though ERIC seems small compared maybe to these two, it's still pretty large if you think about an individual facing this level of potentially relevant material, which seemed to be the presenting question there.

And I think the background for the Mosteller, Nave, and Miech paper is how do you cope with finding all of this? This is of course a big problem in general in this electronically stored information and the whole information explosion, is how do you find and cope with all of this enormous amount of information?

And I guess the question we keep asking is, what is the value of structured abstracts against alternative strategies? Alternative strategies being I suppose searching on indexing terms, putting your money into better indexing, or putting your money into better abstracting in the sense that you have professional abstractors, rather than trusting the author to do the abstract.

MS. HARRIS: That would have huge financial implications.

MR. JOUBERT: You can create a number of tools. I have heard people mention keyword searching a number of times. And the librarian goes, no, use an index. There are subject terms that exist for actually finding relevant articles.

But I'm looking through the lens of someone as an information professional, and not necessarily on the other side of the spectrum. But you can create a lot of tools, and a lot of it is educating users on the effective way of using these tools.

MS. HARRIS: We are actually looking too at developing tools that I would describe -- so, this is only Susan's description, not APA's description -- as a decision tree type of methodology for looking at your content. So, that if you have a symptom, and of course this is behavioral, so if you have a behavior that you are trying to treat, it gives you options as to what the subtleties of that behavior might be that lead to a particular kind of intervention.

Whereas, most search engines chunk the information in a way that you never get to the subtleties of the decision-making tree. This is just my description of it. So, I think that while that is far more complex a database to be able to build, it certainly is one I think -- in our discipline I think it would serve very, very well, and perhaps that's true of education as well.

DR. BRADBURN: Actually, this brings up another issue which I don't think really got into our framing questions, but it's one that we had a discussion on when we had a little pre-thing here. And that is the relationship between abstracts, structured or otherwise, and the particular search engine that you are designing.

And as Susan just mentioned, think about alternative strategies for accomplishing purposes of finding things, and finding relevant things, and then knowing about them when you get them. That's more creative ways of doing search engines.

This leads you to bring in cognitive psychologists or other people who are interested in how people search for information, and trying to design search engines that are more in line with the way people coming to somebody would ask questions and try to find -- it gets into certainly the problem I have always had with keyword type things, because the ambiguities in language take you off into all kinds of different ways. So, these kinds of decision trees which say which meaning of resistance did you mean, or which meaning of whatever.

MR. JOUBERT: It's kind of a little off topic, but there is some interesting research being done on the Semantic Web. But as you just mentioned, cognitive mapping as it relates to language is a very complex task. And I don't think natural language processing will be at that level for a while.

DR. BRADBURN: One of the things I have been pushing in my previous role at NSF was research on just natural language processing, and understanding how meaning is encoded in language, and how ambiguities in language can be solved in some kind of machine way.

But I think it is worthwhile as we go forward in this discussion to think about not just structured abstracts as opposed to regular abstracts, but to think in the larger context of what structured abstracts are trying to do, and whether there are alternative ways of accomplishing the goals that are either more in line with -- well, first of all, one would be utilizing the capacities of information science and computers in ways that we may not be using very well now.

MR. JOUBERT: And I think it will address, probably not perfectly, but some of the issues that were addressed in the first session about including all these rich, descriptive ways to talk about research that are actually not confined, and that probably will occur in meta-data. And that will be under the hood.
