NOTE: This is an unedited verbatim transcript of the Symposium on Electronic Scientific, Technical, and Medical Publishing and its Implications prepared by CASET Associates and is not an official report of The National Academies or of the Committee on Science, Engineering, and Public Policy. Opinions and statements included in the transcript are solely those of the individual persons or participants at the symposium, and are not necessarily adopted or endorsed or verified as accurate by The National Academies.
******
THE NATIONAL ACADEMIES
COMMITTEE ON SCIENCE, ENGINEERING AND PUBLIC POLICY
SYMPOSIUM ON ELECTRONIC SCIENTIFIC, TECHNICAL AND
MEDICAL JOURNAL PUBLISHING AND ITS IMPLICATIONS
May 20, 2003
The National Academies
2100 C Street, N.W.
Washington, D.C.
Proceedings By:
CASET Associates, Ltd.
10201 Lee Highway, Suite 160
Fairfax, Virginia 22030
(703)352-0091
* * *
PROCEEDINGS
(8:35 am)
Agenda Item: Welcoming Remarks - Bruce Alberts, President, National Academy of Sciences
DR. ALBERTS: Good morning. I'm Bruce Alberts, the president of the National Academy of Sciences for the last 10 years. It's a pleasure for me to be here to welcome you to our home.
Unfortunately, I came back from Cleveland last night, got on an airplane, and I got a very bad cold. Traveling is not much fun. So, I may not be too coherent this morning, but fortunately, I'll be brief.
Let me just say a word about the academy, because I know not all of you are familiar. We got a charter as a private, independent organization existing in Washington, DC from Abraham Lincoln's Congress in 1863, which said we could exist as an honorary association of the nation's best scientists, at least that was the way it was supposed to be.
But to do that, we had an obligation, and that is the academy shall, whenever called upon by any department of the government, investigate, examine, and report upon any subject of science or art -- art actually meant technology in those days. But here is the catch, the academy shall receive no compensation whatsoever for any services to the government of the United States.
We had a rocky first 10 years, but it worked out very well, because it turned us into a massive volunteer organization. We do a lot of work, as I will tell you in a second.
We now call ourselves the National Academies. This name is about two years old, because under the same charter, two other honorary organizations were subsequently incorporated, the National Academy of Engineering, and the Institute of Medicine. Together, these three organizations have more than 5,000 members.
In World War I, an operating arm of these organizations, which was named the National Research Council, was set up, because it was realized that scientists, engineers don't have all the answers, and we need lawyers, teachers, others to put on these advisory panels that we established. And of course, World War I made that very obvious.
Today, we are a very active organization, more than one report every working day; 85 percent of these are requested by the US government. The critical issue for everyone who uses our advice is that they know they are getting independent advice. What the no compensation whatsoever clause turns out to mean is that the government will pay for the cost of the study, that is, it will pay for the airfare for the volunteers who serve on the panels, their hotels and their food while they are here, as well as for the staff to support them.
Despite the fact that the government pays for it, they understand that they have no control over the result. So, after the first meeting or so when they put their input into the panel's deliberations in an open session, the committee writes their own report, and the full text is released to the press and to the public as soon as the report is delivered to the government. We don't negotiate the answer, and that's what makes us valuable.
There are two kinds of reports, science for policy, which is most of what we do. I'll give one example of that. And policy for science, which is a very important function. And what this meeting is about, at least to some extent, is policy for science.
Science for policy, there was a dispute right after Pres. Bush came into office about the level of arsenic in drinking water that should be set by the Environmental Protection Agency. Gov. Whitman asked us to look at the science, and we did that, and discovered that arsenic was actually more dangerous than we had thought earlier, and therefore, she accepted the 10 parts per billion standard that Clinton had recommended just before he left office.
Policy for science, this is our huge report, a contribution by the academies after the September 11 event. This report called, "Making the Nation Safer: The Role of Science and Technology in Countering Terrorism," basically lays out a road map for what is now the Department of Homeland Security. This was very unusual, because we had more than 160 people involved in this one.
Well, this symposium involves policy for science, and some important aspects of this concerning scientific publication as far as I'm concerned are the validation aspect, the dissemination aspect, and the fact that specific obligations are created when one publishes, and I just want to go very quickly over those.
The validation function of publication of course is accomplished by peer review and editing to be sure that the science is valid. And we ensure that the data and the methods are complete and clearly presented. This is what distinguishes scientific journals from the mass of stuff that is on the Internet.
I really very much appreciate the editing and the refereeing function, because I get all kinds of scientific ideas mailed to me personally as president of the academy, which I have no time or ability to evaluate. There are a lot of things claimed to be science out there, and we need to discriminate between what is and what isn't good science, otherwise, we can't move forward.
Publication of course also has a dissemination function, and that's what this meeting is really -- that's what is really novel about this meeting, because we have a new way of disseminating, electronic publishing on the Internet. It greatly enhances the potential reach of the dissemination part of science. It allows us to reach nations and peoples who would otherwise not be reached. And by allowing powerful search engines to rapidly find desired specific information, we can, in principle, make much better use of the data and the information and the knowledge that is out there.
This latter function I assume will be talked about here. We could do a lot better. There is a great opportunity for what is called data mining, so, if we could really find what we really want. There are a lot of journals that are not yet up in available form. I get letters from our friends in India, for example, pointing out that their journals are not really abstracted by our services. Science is an international activity, and we need to do better about making the knowledge that is developed everywhere, available.
Well, we all know about this. I'm a biological scientist, by the way. This, I think is a wonderful effort by our government to take what was formerly the Medline services, that when I was in my laboratory at the University of California, San Francisco, I used to ration my students' use of Medline searches, because it cost me $100 every time they were -- they weren't too careful about how they used it, and it would generally cost me $100 for every session that one of them was involved in.
So, I, along with everybody else in the world, was overjoyed when this came to be a free service, basically, a contribution of the US government, using taxpayer monies, to make the biomedical literature really accessible. This, of course, is run by the National Library of Medicine.
Our own journal is the Proceedings of the National Academy of Sciences (PNAS). I see Nick Cozzarelli, our editor-in-chief this year. We have tried to set an example by making it as available as we can without going broke. And we make it immediately free to 130 developing nations on the Internet now. And we make it free to everyone after a six month delay.
Here are some of the countries. I couldn't get them all on the slide, but you get the idea. As these countries become more effectively connected to the Internet, this is going to be enormously valuable for them, and it will change their opportunity of their universities and their scientists to participate in this great international scientific effort.
We also publish, of course, our reports. And we want to do the same with them. This is a different kind of scientific literature. This is consensus studies of what the science that other people have done means in terms of arsenic in drinking water, climate change, thousands of other issues. This is about 10 years worth of our reports. All 2,800 books are up online, and we have now made the PDFs that we have for these, immediately free to 130 developing nations as well.
What about the obligation? So, I have talked about the validation. I have talked a little bit about dissemination. And I want to talk about the obligation part. We had a workshop here about a year ago or so. It was sponsored by the Board of Life Sciences, and its title was, "Sharing Publication-related Data and Materials: Responsibilities of Authorship in the Life Sciences."
Tom Cech, the head of the Howard Hughes Medical Institute, was the chair of this group. Tom, of course, is a Nobel Prize-winning scientist, and a very outstanding biomedical scientist.
And basically, they made these points. The publication of scientific information is intended to move science forward. More specifically, the act of publishing is quid pro quo in which authors receive credit and acknowledgement in exchange for disclosure of their scientific findings, providing these findings in a forum on which other scientists can build with further research. That's how science works.
An author, therefore, has the obligation to share reasonable data and materials -- this is life sciences, remember -- to enable others to verify and extend published findings. Therefore, publication creates obligations in science that need to be enforced. These need to be enforced through grant agencies that fund, and through the journals that publish the work.
And without this kind of sharing, the progress of science will be greatly slowed. So, these are all issues that come to my mind. I should point out although I have been a bureaucrat for 10 years, full-time job here in this building, before that I was a scientist for 30 years, so I'm telling you things that I understand from my own life as a biological scientist.
So, in ending, I just want to invite you to do two things. If you have not been here before, there is a wonderful statue of Einstein right out on Constitution Avenue. It's the last place where the school trips end, where they take the pictures of the kids sitting on his lap, and sometime during your stay here, please do that.
We also have a wonderful new building downtown. We actually inaugurated it about a week ago. It's the new Keck Center of the National Academies at 500 Fifth Street near the National Gallery of Art. And I urge you to come visit us there as well.
Thank you.
[Applause.]
It's now my pleasure to introduce Ted Shortliffe, who deserves a great deal of the credit for organizing this symposium. He is a member of COSEPUP, which is the Committee on Science, Engineering, and Public Policy. It's the highest order group we have doing mostly policy for science. He took on this obligation as part of his membership in that group.
Ted.
Agenda Item: Symposium Overview - Edward Shortliffe, Professor and Chair, Department of Medical Informatics, Deputy Vice President for Information Technology Health Sciences Division, Columbia University, and Symposium Chair
DR. SHORTLIFFE: Thank you, Bruce, and welcome to everyone here. We have a great turn out, and actually many more people planning to arrive. So, I wanted to reassure you that those barriers are going down at the break, so that we will use the entire auditorium. We were a little surprised to see them there when we arrived this morning, and we will be able to spread out a little bit more later in the day.
I'm pleased to have an opportunity to welcome you on behalf of COSEPUP, the Committee on Science, Engineering, and Public Policy of the National Academies, and its chair, Dr. Maxine Singer. This was a topic that was brought to COSEPUP as a potential issue worthy of a symposium and further study about a year and a half ago, and I'll tell you a little bit about that in a moment.
My own interest in this topic, I quickly became interested in participating as COSEPUP decided to take this on, because I come to this as both a medical scientist, and a computer scientist, and therefore have been aware of the fascinating interplay between scientific activities, scientific research, dissemination of information and technology actually not just since the Web, but going back some 20 or 30 years, certainly to the introduction of Medline.
In my early days in the National Library of Medicine on committees here, I became aware of a major foreign publisher that was upset that Medline was being provided at prices that seemed so low, that the commercial sector could not effectively compete, and therefore it was actually a foreign publisher who was lodging serious complaints about something that most of us had come to accept as a wonderful resource, just as Bruce was mentioning a moment ago.
It's hard to be in science, and not be aware of the changes that have occurred in the last 25 years or so. As an administrator in the dean's office at Stanford University in the 1990s, I became aware of what was happening in the library budgeting process.
I oversaw the library there, and the acquisitions budget going through the roof, and the cutting of more and more journals, and the challenges of wanting to move into digital collections. And simplistic questions about gee, if we can go digital, do we really need a physical structure anymore? Or can't we just use it just for old stuff? And actually, these are big issues that we need to address and handle.
And it is part of the reason that every institution is feeling challenged by being caught in a period of transition where we deal with the paper of the past and the present, and the recognition of a very different electronic world that is already upon us, but will be evolving even more in the future.
We see the publishers struggling with these same issues, trying to determine proper pricing mechanisms. Trying to decide whether to tie electronic access to paper subscriptions, the kinds of topics that we will be discussing over the next couple of days.
Now, today, I edit a journal, which is a traditional journal in the sense that it actually comes out on paper. On the other hand, we decided, given the topic and the kind of people that write for it, that it was quite reasonable for an international journal of this sort to be done totally electronically until the moment when the paper is produced in the final form.
And that experiment has been going very well. We do essentially all submission, all reviews, all interaction with authors and reviewers electronically. And everyone has sort of embraced that without any difficulty whatsoever. But we encounter significant debate among the editors, the editorial board, the authors, and the publisher about just what the access to this now electronic publication ought to be for our readers, not just in the Third World, but in the US. Who really owns the content that we are putting out, or who should own it? And what the costs should be.
So, when the council of the National Academy of Sciences asked us in COSEPUP to consider this topic a few years ago, I think they were driven by very much these same kind of soul searching questions going on within the academies; major publication activities such as those that Dr. Alberts just mentioned, obviously PNAS, and then all the National Academies Press publications already caught up in the transition to an electronic world, and with significant issues about how much the role of paper would be in the future, and what the pricing and access issues ought to be.
Now, there have been many symposia, and in fact even here at the National Academies, several reports on the general subject of electronic publishing and rights, and copyright. And so as we began to plan for this event, we wanted not to be duplicative, and instead to try to identify those areas in which there seemed to have been less attention, or where the answers were less clear.
We recognized that some of the answers we would all like to have need to be supported by data that are not yet readily available. And that will become clear, I think, as we discuss these topics today and tomorrow. But I was fortunate in having a tremendous steering committee that brought the expertise I lacked in knowing a tremendous amount about this topic.
All of them are involved in some way in the program, and you will get to meet them and hear from each and every one of them: Dan Atkins, from the University of Michigan; Floyd Bloom from Scripps Research Institute; Jane Ginsburg from the Law School at Columbia; Clifford Lynch from the Coalition for Networked Information; Jeff MacKie-Mason from the University of Michigan; Ann Okerson from the library at Yale University; and Mary Waltham, currently a publishing consultant.
All these individuals live lives closely connected to this topic, and worked hard, with many, many phone calls among ourselves to try to put together what we hope you will find to be both stimulating and novel over the next two days.
I'm not going to give long introductions to individuals, because all the bios are provided to you in the registration packets, and you can find out about people's backgrounds and interests and activities that way.
Among other things, I hope you find the next day or two to be provocative. I know that several of the speakers are planning to be provocative, and that is fine. We hope for that. You will note that we have also done something that is a little unusual for meetings here in that there is much more time for discussion than is sometimes allowed at meetings of this sort.
It's one of the reasons that this is almost a two day meeting. We felt that the topics were sufficiently controversial and difficult, and there were so many opinions and questions that might be asked that we really wanted to make it as easy for the audience to ask questions and make some comments as for the selected individuals who are on the program to do the talking.
So, you will see as you look at the program, significant time after every one of the five panels for discussion. And we have microphones here, and we encourage you all to feel free to participate at that time, although we will, so that everybody gets a chance, ask you not to take too much time at the microphone when you come up and ask a question or make a comment.
How we organized it, we tried to be a little different in the way we did this. Today, there are three panels dedicated to the sort of here and now. Tomorrow is much more looking to the future. The first panel is just looking at costs. What does it really cost to do publication in an electronic world? And it's interesting how little public data there are on that subject.
In the second panel, looking at the various business models for balancing cost, income, and access concerns. And in the third panel this afternoon, legal issues in the production and dissemination and use of materials in an electronic world.
And then tomorrow, two topics, both in the morning. First, what will publishing be in the future? And second, what will it mean to be a publication in the future? And we have some indications of this as our knowledge of this topic evolves. And then we have some sort of predictions or anticipations about where it is all headed.
After lunch tomorrow is an attempt to try to bring this all together. We have some excellent observers, who are going to be here listening to every word, and trying to help us summarize and talk about some of the key lessons. Our goal is to produce, when this is all done, not only the symposium itself, but a report based upon the symposium that will be available both electronically and in print form, and also to give some advice to the NAS council and National Academies, as they requested when they asked this to be put together and scheduled.
Now, because there was substantial interest in this event around the country from people who could not actually physically be here, the National Academies have decided to make this available over the Internet. You may have seen some messages to the effect that this is being Webcast today. So, there are a couple of implications of that.
[Administrative remarks.]
And with that introduction to the event, and my welcome to you all, I would like to start things off by introducing our keynote speaker, and that is Dr. James Duderstadt, president emeritus and university professor of science and engineering at the University of Michigan. Jim has been there about 35 years at the University of Michigan, although he, as a member of the faculty, has taken time off not only to be president, but also provost and dean.
His training was in electrical engineering from Yale, with a BS, and from CalTech with a PhD in engineering science and physics. He has worked in nuclear energy, both fission and thermonuclear fusion, and has received the National Medal of Technology and the E. O. Lawrence Award. He has been chair of the National Science Board. He is a director at Unisys and CMS Energy, and he too, is a member of COSEPUP at this time.
He's been very involved with the National Academies over the years. He has done studies for the National Academies, "Scholarship in the Digital Age," with the NRC, "The Impact of IT on the Research University," with the NRC, and he recently wrote a book with one of our steering committee members, Dan Atkins, as well as with Doug Van Houweling on higher education in the digital age.
So, he has thought a lot about the issues that are before us today, and we are delighted that Jim has agreed to kick things off.
Thank you, Jim.
Agenda Item: Keynote Address - James Duderstadt, President Emeritus and University Professor of Science and Engineering, Millennium Project, University of Michigan
DR. DUDERSTADT: I suggested to Ted that perhaps we regard this front section as first class, and those folks that arrive too late, we'll put back in coach, where most of us fly, I suspect, these days.
Well, it's a remarkable turn out on a crisp winter day in Washington. I suppose it's a good sign of just how important the issue of electronic publishing is.
The goal of this conference is to bring together experts from an array of constituencies, producers and users, to look at some of the technical changes that have occurred in electronic publishing, and how that influences decisions to publish or not; to identify the needs of the science, technical, and medical publishing enterprise itself as users of journals; to understand the responses of both the commercial and not-for-profit scientific, technical, and medical (STM) publishers; and to examine a very broad spectrum of proposals and activities underway that are attempting to respond to the needs of the community with these new technologies.
I think a major focus of today and tomorrow will be on looking at business models, and trying to establish the degree to which they address many of the challenges and concerns. But I would also suggest that during the discussions it is important to keep in mind the ongoing developments in the scientific enterprise itself, stimulating in part, and being stimulated by this kind of scholarly communication.
How is electronic publishing affecting the practice of scientific research, the communication of research results to scholars and others, perhaps including the public, the curation of data and evaluation of research and archiving of results?
The challenge is to identify the issues, the problems that the STM community really needs to control and resolve if it is to exploit the remarkable opportunities, and to cope with the challenges presented by this very rapidly evolving technology.
Now, I must confess that when Ted approached me about kicking off this discussion, I was a bit perplexed, because as a one time, and now once again scientist, and a has been university administrator, my basic approach to these issues has been one of avoidance -- trying to avoid the hassles of publishing as a scholar, and trying to avoid as an administrator, the costs of acquisition, and the maintenance of STM journals.
After some further, and I should acknowledge desperate thought, I was able to identify several potential hyperlinks between my own experience and the subjects you will discuss today and tomorrow. I continue to be a voracious consumer and occasional producer of electronic media. I now write books rather than papers, but I also dabble in multimedia, 3-D simulations, and virtual reality, and so forth.
At last count, I not only have a half a dozen computers in my office, but over a terabyte of data sitting on my desktop in various FireWire drives. And I don't have a clue on what is on most of those drives, but it just passed the terabyte.
During my last years as president of the university I built a major library of the future at Michigan we called the Media Union. And that houses part of the servers for the JSTOR project.
And I went through a rather unique experience several years ago. As you know, Michigan is very much in the news these days with a major court case, two court cases actually, in front of the Supreme Court. And I am the “et al.” in those cases.
As a result of the deposition experience and getting ready for that, they decided to put the entire electronic inventory of everything I had scribbled down, received, drafted during my roles in university leadership, online on the Web through our library, over 3,000 documents. And once again, I don't have the nerve to go to that Web site and see what's on it, but I understand some of it is boring, and some of it is rather disturbing.
Most important, over the last two or three years, Bruce and his colleagues in the National Academy, convinced me to co-chair with Bill Wulf, president of the National Academy of Engineering, an interesting research project trying to understand better the impact of digital technology on the future of the research university. And that will relate to some of the comments I'll make later today.
The current situation can perhaps be described as a chaos of concerns, the continuation of some disturbing trends that have evolved over the last couple of decades that access to STM information is increasingly expensive, and in some cases restricted. And yet, the amount of information generated at research institutes continues to grow.
Journal subscription prices continue to escalate, and yet university library budgets fail to keep pace, particularly in these days with the collapse of the equity market, and the collapse of the state budgets. Last week, I had a chat with Bill Gosling, who is head of the University of Michigan libraries, to get his sense as someone on the front lines, of what has been happening. He has noted that the price inflation in electronic publication resources has continued to run well ahead of the CPI. He estimates it running at 10-15 percent increases a year.
But even more dramatic has been the increase in the pricing for reference tools, increasing as much as 600 percent over the print cost of bound volumes. The complexity of dealing with various financial models, the traditional acquisition of the physical form of the journals, but then licensing schemes to have access, and provide access to broader communities to broader forms.
The nature of searching these days that requires full text searches against a large number of titles leads to what librarians call super sizing, where you have to now subscribe to the full stable of titles offered by certain publishers in order to do this kind of search.
It's clear that these new technologies have created very fundamental changes in the production, the management, the dissemination, and the use of all kinds of information. And if I were to categorize very simply the two camps of concerns, on the part of the publishers the critical question is how many copies of work will be sold or licensed if networks make possible planet-wide access? And the nightmare of course is that the answer is only one. One document can be replicated time and time again, to not only serve, but perhaps collapse the entire marketplace.
On the other side, the nightmare to consumers is that in our efforts to preserve the marketplace, we'll put in place an array of technical and legal protections that reduce access to what should be a public good, society's intellectual and cultural heritage.
There are a lot of reactions and counterreactions at the level of the universities. The first reaction is a budgetary one. They simply cancel any subscriptions. In some cases this is mandated by simply the necessity of limited resources. In some cases it's an effort to kind of use a 2 by 4 to get the attention of the publishing industry, although what cancellation generally does is simply drive up the costs even further.
Since libraries are at a disadvantage in trying to negotiate one by one, they are increasingly forming an array of consortia to negotiate subscription licenses and packages, SPARC, the Scholarly Publishing and Academic Resources Coalition, within the CIC, which is the provosts' name for the coalition involving the big ten universities, plus the University of Chicago. There is an effort made to coordinate the libraries.
In fact, during the 1990s, the Big Ten Athletic Conference, which actually is the name of the presidents that sit over those institutions, actually took time out from debating football revenue sharing, and actually tried to merge those libraries into one gigantic library with 70 million volumes, with common acquisition and so forth. Some steps have been taken toward that, but that's an example.
In other instances you have seen essentially rebellion at the grassroots. Editorial boards that have protested against the commercial publisher journal prices, and have essentially resigned and moved to scientific societies, less expensive publishers.
The complexity and shifting from a first sale approach characteristic of paper, to licensing has caused a good deal of experimentation.
There are a lot of other variants. One that kind of reminds me of one of the components of Ann Arbor's heritage is what I called the university microfilms approach, an edition of one. That is, to begin to make it acceptable that there may only be one physical copy of a document that say a young faculty member may use in tenure considerations, but with the ability to reproduce that from an online copy, at the user's expense.
One of the more interesting approaches you will discuss I suspect over today and tomorrow are what I would call open source strategies. Universities are historically about open inquiry and communication. And the success of the open software movement through Linux, the Apache Web server and so forth have given rise to a number of initiatives, the Open Knowledge Initiative, the MIT OpenCourseWare Project, and others that aim at developing new financial models for the open distribution of scholarly materials, perhaps building charges for dissemination into research grants that generate the information in the first place.
I might note that this is not only consistent with the traditions and values of academia, but also reinforces the definition of the university as a public good, an issue that university leaders are increasingly worrying about these days, when the rest of society tends to look at us more as a market commodity.
In summary then, advances in digital technology are producing radical shifts in our ability to reproduce, distribute, control, and publish information. And yet, as this becomes more a part of scientific activity, it tends to run headlong into the existing practices, policies, and laws that govern traditional publishing.
The issues are complex in part, because the stakeholders are so many, so varied, with different agendas. People who fund research want to see that the information is advanced and made available to the public. They regard it as a public good in essence.
Authors, editors, and reviewers of course don't charge for their labor. They are motivated to contribute to the public good, but of course they also have other rewards, not the least of which is tenure.
Publishers, as intermediaries, while they don't pay for content, they do add significant value, and provide the work in published form. Libraries, similarly, are intermediaries. They provide access to STM content. They pay for content, but they usually don't charge for providing access to it. And of course, the users may either pay for content in some cases, or obtain resources free through libraries.
The dilemma I think was in my mind, best stated a number of years ago by an individual some of you may know, John Perry Barlow, who with Mitch Kapor founded the Electronic Frontier Foundation. He portrays himself as a gentleman rancher and former songwriter for the Grateful Dead.
Barlow's enigma is stated as follows. "If our property can be infinitely reproduced, and instantaneously redistributed all over the planet without cost, without our knowledge, and without its even leaving our possession, how can we protect it? How are we going to get paid for the work we do with our minds? And if we can't get paid, what will assure the continued creation and the distribution of such work?" That was an article that he published in Wired magazine in the mid-1990s.
And it was followed on by another provocative article by Michael Crichton, in which he suggested that this technology is completely undermining all media, which he referred to as mediasaurus, like the dinosaurs. This was in 1995. He said, "Soon we will have AI based agents roaming the databases, downloading stuff I'm interested in, and assembling for me a front page." It sounds like Google News, doesn't it?
So, how do we approach the issue? Well, your conference is going to break it down into five different panels, and then a summary panel. But what I would like to do is make some particular observations in three areas: the changing nature of the practice of research in science and technology and medicine; just a brief comment about a couple of policy and legal issues; and then finally a comment about the extraordinary exponential evolution of this technology.
But first, I must step back, as all people that address this issue do, and make a general comment. I think we realize that the age of the 21st century is different than the industrial age of the late 19th and 20th centuries. We have an economy that is shifting away from material and labor-intensive products, to knowledge intensive products that increasingly depend on the creation and the application of new knowledge.
In a sense that means that intellectual property is becoming the most important asset of all as we approach the future. Again, returning to Barlow, "Notions of property, value, ownership, and the nature of wealth itself are changing more fundamentally than in any time since the Sumerians first poked cuneiform into wet clay and called it stored grain. Humanity now seems bent on creating a world economy based on goods that take no material form. In doing so, we may be eliminating any predictable connection between creators, and a fair reward for the utility or pleasure others may find in their works."
"Since it is now possible to convey ideas from one mind to another without ever making them physical, we are now claiming to own the ideas themselves, and not merely the expression. And since it is likewise now possible to create useful tools that never take physical form, we have taken to patenting abstractions, sequences of virtual events, mathematical formulae, the most unreal estate imaginable."
Well, it's clear that this creates a quite different world, both for those who generate knowledge, and for those who distribute it and use it. Now, back to the three perspectives.
First, the changing nature of science and technology research. One of your colleagues who will chair the fourth session, Dan Atkins, has just finished chairing a major National Science Foundation report, "Revolutionizing Science and Engineering Through Cyberinfrastructure." That is, trying to understand better, the changing nature of science itself as this technology becomes more and more pervasive.
As this report, which is now known as the Atkins report -- so, there is a certain fame associated with that -- is becoming more and more widely read, it points to the process of knowledge creation itself, experimentation, analysis, theory development, forming conclusions that is increasingly occurring entirely in the digital world.
And what that has done is shifted from kind of the sequential process of research, publication, validation, dissemination to more of a parallel flow model that is interactive, in which the process of publication and distribution actually becomes almost the process of research itself.
The key point of the report is that distributed network computing technology is providing a new kind of infrastructure for federating people, information, computational tools and services, and specialized facilities into virtual organizations, so-called collaboratories, grid communities as the Europeans call it, eScience, a cyber infrastructure.
The vision put forth by this report is to use this infrastructure to build more ubiquitous, comprehensive digital environments that become interactive and functionally complete for research communities in terms of the people, the data, information tools and instruments, and that operate at unprecedented levels of computational storage and data transfer capacity.
Part of the aim of this is to trigger the necessary public and private investments to create this cyber infrastructure, but nevertheless many elements of it are already in place, and it will significantly change the nature of scholarly activity. An example that we are all discussing today and tomorrow is scholarly publication.
A common quote many of you have seen before is that of Vannevar Bush more than half a century ago, who wrote, "Our methods of transmitting and reviewing the results of research are generations old, and by now are totally inadequate for their purpose." And yet, much remains the same as it did in Bush's time, except the volume of literature has increased vastly, as have prices.
In a sense, the reality today is that electronic publishing is becoming the dominant mechanism for publishing and reading scholarly materials. It opens vast possibilities of course, but on the other hand, it challenges existing practices and principles, including the way in which we handle intellectual property.
It will likely then add a new paradigm for scholarly communications capable of providing open online access to the work of scholars without payment, online repositories of high quality, certified materials, along with a stable economic model to sustain these resources, a challenge of today and tomorrow.
This will pose a particular challenge to libraries, shifting them from a focus on collecting and archiving knowledge resources, to rather assisting scholars to navigate. Today, the campus library has become somewhat less central to researchers' lives. In one sense you can ask your colleague when the last time they visited a library was. And the reason for that is the library has evolved from a place into a utility. It too is becoming a part of the Net.
Legal and policy issues, the second topic. Well, I have to confess that, having spent the last six years dealing with what looked like an army of lawyers to get ready to deal with this Supreme Court case, spending at last count between $15 million and $20 million on their services, I'm not in much of a mood to talk about legal matters, even if I knew much about them.
But nevertheless, it is clear that the digital infrastructure that we are talking about imperils a great many of our ongoing practices and policies and laws that have served so well, intellectual activity in this country and globally over the last two centuries, perhaps forcing the rethinking of many fundamental premises and practices associated with intellectual property.
Indeed, there is a concern that many of these will be challenged to the bedrock. I think from the publishers' point of view, a copyright is really the legal foundation of the business itself. The exclusive right to publish material for a length of time creates the basic legal mechanism that allows publishing costs to be recovered. It essentially shifts the financing of the publication from that of a patronage activity, to the marketplace.
And again, to quote Barlow, "We are sailing into the future on a sinking ship of our current intellectual property policies. This vessel, the accumulated canon of copyright and patent law, was developed to convey forms and methods of expression entirely different from the vaporous cargo it is now being asked to carry. It is leaking as much from within as from without."
The final topic, the evolution of digital technology. As I mentioned, a couple of years ago the National Academies decided to launch a project, forming a study group chosen from industry, higher education, and people that were involved in policy to understand better what the implications of digital technology were for the research university, but I suggest even more broadly for the research enterprise.
The concern was that while the opportunities and challenges of this technology were important, there was a sense that many of the most significant issues were neither well recognized, nor understood, particularly in the wake of the collapse of the dot-coms, which really has put many institutions and their leadership to sleep.
The first phase of the project was aimed at addressing three sets of issues: to identify those technologies likely to evolve in the near-term, that means a decade or less; second, to examine the possible implications of these for the research university; and third, to determine what role, if any there was for the federal government and other stakeholders in the development of policy, programs, and investments in the development to protect the role and the contribution of research universities.
A report was put out on this first effort. You notice how revolution tends to appear in all of these reports these days. It kind of tells you something. And what I'll do is I will just briefly mention some of the conclusions from that, because I think they apply very much to this particular discussion.
Although most people on the study group had some familiarity with the technology, I must say we were both surprised and to some degree, made uncomfortable by the future that the chief technologists from companies like AT&T, Bell Labs, IBM, Xerox PARC, and so forth put in front of us.
The fundamental conclusion was that this extraordinary evolutionary pace of digital technology shows no sign of slowing, and in fact, components of it may be on a superexponential evolutionary trend; so-called riding the exponential. It could be that processing power begins to move away from Moore's law, but wireless capability, storage, printing and so forth may be evolving even more rapidly.
Many examples of this, the enormous effort from the national laboratories has once again driven a very substantial investment in supercomputing. And technologies to achieve petaflop computers are now running several years ahead of where we thought, with the first of those computers likely to appear within the next three years.
I mentioned that I have about a terabit of data storage on my desk. That in itself is a surprise, but apparently that is also increasing at a doubling time of a year or less.
Bandwidth, maybe we can ask people that have their laptops with their Wi-Fi to sit behind the screen and look at this on the Net. But even that technology is moving very fast. My current computer has 55 megabit per second wireless capability, and our IBM colleagues tell us that's on the way to a gigabit per second.
Displays -- resolution is already much better than paper in the laboratory, soon to appear in the commercial sector. I've seen the new flexible displays and so forth, that have been announced over the last week.
Similarly, in software and system technology, whether it's algorithm improvements, new approaches to developing technology, such as the open software effort, or new technologies for the Internet, such as the semantic Web, where we build into Web-based documents the capacity for machine readability, all of that suggests this technology is not slowing down, that the kind of killer app surprises we have had in the past are likely to appear again.
The second conclusion, well stated by Marye Anne Fox, who is head of the Government-University-Industry Research Roundtable, and president of the North Carolina State University, is that the impact of the technology on the university will be profound, rapid, unpredictable, and discontinuous. In the words of Clay Christensen, it's a disruptive technology. It will affect the activities of the university, our teaching, our research, our outreach, how we are organized, how we define our faculty, and students, how we finance ourselves, how we manage and govern ourselves.
In that kind of an unpredictable future, the belief was that procrastination and inaction are the most dangerous courses of all during a time of rampant technological change.
Point three, and an interesting one, it was our belief that universities should begin the development of strategies for facing this kind of technology-driven change with a firm understanding of those key values, missions, and roles that need to be protected and preserved during a time of transformation. Traditions such as academic freedom, a rational spirit of academic inquiry, and liberal learning. Here again, you can see the degree to which these fundamental and early conclusions propagate into the broader research enterprise itself.
As Ted mentioned, this conference is really organized into a here and now series of panel discussions this morning on costs, on business models, on legal issues, and then moving tomorrow into the future. What does this new kind of cyber infrastructure-driven research enterprise suggest about the nature of publishing in the future? What is a publication? And then finally, pulling this all together.
Two other issues that I might suggest will come on the table from time to time that relate to this, but are not specifically woven into the program are other kinds of constraints. That posed by commercialization, as we see the soaring commercial value of much of the intellectual property that rolls out of our research laboratories, and increasingly out of our classrooms.
The tendency of institutions and individuals to exploit that raises very significant challenges to traditions such as openness and academic freedom. Don Kennedy pointed out in an editorial in Science magazine a couple of years ago that this may be the great enclosure, the restriction of the openness of research that we have to deal with.
The second one Bruce and his colleagues have been dealing with, within the academy goes under the title of homeland security and openness. Achieving an appropriate balance between scientific openness and the restrictions on public information necessary for national security, once again, an issue of great importance that weaves in and out of some of these discussions.
In conclusion then, I think the turnout today, and particularly the representation from various constituencies -- authors, publishers, libraries, readers, and users -- demonstrate a commitment to, and the compelling importance of, developing workable, sustainable models for scholarly communication in the digital age.
As you begin your discussions, I would like to leave you with a quote. You have heard it before, but I think it's perhaps appropriate to put it out on the table again as we begin the discussion.
"If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess, as long as he keeps it to himself. But the moment it is divulged, it forces itself into the possession of everyone, and the receiver cannot dispossess himself of it."
"That ideas should freely spread from one to another over the globe for the moral and mutual instruction of man, and the improvement of his condition seems to have been peculiarly and benevolently designed by nature when she made them like fire, expansible over all space without lessening their density at any point, and like the air in which we breathe, move and have our physical being, incapable of confinement or exclusive appropriation. Inventions, then, cannot in nature, be a subject of property." Thomas Jefferson, not a bad principle to start a symposium by.
Thank you very much.
[Applause.]
DR. SHORTLIFFE: With that great introduction, we are ready now to start with our first panel. And I will let the individual panelists be introduced by Floyd Bloom, but Floyd from Scripps and former editor of Science is here to get things underway.
Thank you, Floyd.
Agenda Item: Panel 1: Costs of Publication, Moderator: Floyd Bloom, The Scripps Research Institute, Opening Remarks
DR. BLOOM: While our panelists are taking their seats, I want to welcome you all again to this symposium, and thank my panelists for agreeing to take on the challenge of responding to the remarks you will hear shortly from Michael Keller in a very time-limited session.
As you heard, we are audiocasting this from the Web, so we are going to try to make every effort to be as precise as we can by following the time. We hope those who are listening to us on the Web will feel free to send in their questions. We are going to have a break in about 50 minutes, and following that break we will have an hour and a quarter for discussion. So, your questions from the Web site are most welcome.
As a scientist, as a reviewer, as an author, as an editor I don't think I ever considered costs in what I was doing. It was only when I became president of the society that sponsored a journal that cost reared its evil head. Cost concerns control quality and timeliness. So, it's natural that we start this symposium with the consideration of the costs of scientific and medical publishing.
We have to consider the costs of production, the costs of paper and ink and sending it out, and the costs of acquiring the content that will be produced, and that make our journal competitive with other journals. How much quality can we afford to pay for, and how much competition can we afford to let go by the wayside?
We have to be concerned with the explicit, rigorous peer review that makes a scientific journal carry its merit. We have to be concerned about the copy and quality of what we put out. Nothing irritates an editor so much as finding typos when the journal is received in the mail.
How about the era of online publication? How about the linking to the databases that give your publication equal access and immediacy, and carry it back into the perspective of the past? Why do these things cost so much, and which of these costs can we factor out and control them if we knew how to?
We have a variety of perspectives that you will hear from this morning: a librarian, publishers for both online and journals of large and small societies, and a commercial publisher as well. We hope from their perspectives that we will gain at least a broader understanding of what the costs are, how they vary across fields and forms of publication, and what if anything, as a proactive participation in this never ending challenge, we can do to control them.
We're going to start with an overview, to be delivered by Michael Keller, Stanford University librarian, and electronic publisher of the HighWire Press. He will be followed by Kent Anderson, the publisher for the New England Journal of Medicine, then Robert Bovenschulte, publisher for the American Chemical Society, and Bernard Rous, electronic publisher for the Association for Computing Machinery. Lastly, we will hear from Gordon Tibbitts, president of Blackwell Publishing USA.
Before I introduce Michael, I just have to say that Prof. Duderstadt's recollection of Wired magazine from 1995 gives me a recollection of 1995 as well. Exactly eight years ago today I was in the third month of learning how to be editor-in-chief of Science magazine. I was called into the office of the executive officer, who had the treasurer by his desk. I thought they were going to increase my budget.
They said, young man, let me remind you -- I was young eight years ago -- that Science is a not-for-profit publication. That also means we are not-for-loss. And the cost of paper is going up by $700,000 in the next six months. And the cost of mailing the magazine is going up 20 percent. So, where in this budget that you are responsible for do you want to make the cuts?
And I swallowed deeply, and I said, there has got to be some way that I can control these issues, and not have them control me. And four months later I met Michael Keller, and my world changed in ways that I never even knew how to predict.
Michael.
[Applause.]
Agenda Item: Overview Presentation - Michael Keller, CEO, HighWire Press
MR. KELLER: I want to start off by saying hello, mom. It's so rare.
Secondly, I deeply respect Dan Atkins, with whom I have worked for about 10 years I guess, more or less off and on, and Jim Duderstadt, the president emeritus of University of Michigan, but this is not a Michigan tie. This is a tie from my college, Hamilton College, a somewhat older and much smaller place in upstate New York.
And finally, I need to issue a disclaimer. Although the program says that I'm the CEO of HighWire Press, I'm not the CEO of HighWire Press. I may be the chairman of the board of HighWire Press. I may be the responsible officer at Stanford University of HighWire Press, but John Sack, my colleague in the back is really the CEO and the director of HighWire Press.
Let me start this presentation with a few slides composed by Michael Clark of the publishing division of the American Academy of Pediatrics, presented in a session of the HighWire publishers conference a few weeks ago. These slides give a good overview of the institutional marketplace for STM journals. And while these notions are not directly on the costs of publication of electronic journals, they do set the stage, I think, for many issues that will arise today.
I do not propose to read these slides. I'm going to stand here silently and let them appear.
Some of the information in Michael's slides is derived from a Morgan Stanley report promoting the value of Science as an investment opportunity in a report named, "Scientific Publishing: Knowledge is Power," issued September 30, 2002. Morgan Stanley of course, has Reed Elsevier as a client.
We do not live in a tidy world. In order to present some trends in the arena of costs of publications from the time just before we all got seriously into Internet publishing until the present time, I conducted a survey of a very small sample of not-for-profit STM publishers. I will present the digested results of that survey. But I must emphasize that these can only be regarded as illustrative of the changes in costs, not as definitive.
I will present as well, some categories of expense that must drive the pricing, but do not appear on the expense budgets of not-for-profit publishers. And I will mention as well, some new categories of expense that may, and I emphasize the tentativeness of the verb, may appear on publishers' budgets in the future, but do not now.
Based on the literature on the subject, including especially the reports and writings of Donald King, Carol Tenopir, and their associates, as well as in what I have learned over the recent years from the publishers receiving services from HighWire Press, I offer the following formulation of the main elements of expense budgets for some STM journal publications.
First, the process costs for the content of the journals come in several subcategories. By the content of journals, I mean primarily articles, reports of the results and methods of scholarly investigation. Other sorts of content such as news reports, policy and editorial statements, and other organizational content are included in the figures below -- yet to be seen -- but are incidental to the main content type, scientific articles, for the vast majority of journals emanating from STM research communities.
Thus, the cost budgets for Science and Nature, for instance, would have more elements than specified. And costs for secondary and tertiary publications include different elements than these. I do not cover these sorts in STM publishing.
Here are the categories on the content side. The costs for manuscript submission, tracking, and refereeing, operations performed manually before the World Wide Web, and now assisted by specialized software applications using Web communications among authors, editors, and referees.
Second, the costs of editing and proofing contents, operations increasingly integrated with the origination of the text from the minds of the authors, to their word processors on one side, and with the composition of pages on the other.
Third, the cost of composition of pages, increasingly more efficient as common, or at least related, environments for creation, editing, and page make-up are employed.
Fourth, the costs of processing special graphics, costs most publishers find increasing, because the means for collecting or constructing graphic images are more easily employed now than a decade ago, and the authoring tools are easier for authors to use. In addition, the fact of Internet publishing and its capacity to deliver more images, more color, more moving or operating graphics has made this expense grow for STM publishers in the last decade.
The second category of expense is a familiar one, but is also one of two targets for complete removal from the publishers' costs. It is of course that of the costs of paper, printing, and binding. The year-old survey undertaken by my organization, and involving over 10,000 respondents produced some telling results.
The first survey, reported in October 2001, showed that 75 percent of respondents preferred online retrieval, most of whom printed out articles locally they wanted to read. The second survey, conducted in February 2002, confirmed that finding. I'm going to present information about that survey a little bit later with a hyperlink to its site.
This suggests the possibility that as researchers educated and beginning their careers in the 1990s replace retiring older members of the STM research community, publishers might finally switch over to entirely Internet-based editions, and distribute no paper at all. This transformation would thus move the costs of printing and paper to the consumer desiring articles in that form, and remove binding from the equation altogether.
The third category of expense is that of distributing the physical volume, the costs of mailing. Obviously, this expense could disappear if we got to Internet only distribution.
The fourth category of expense is that of the Internet publishing services. These are new costs, and include lots of activities performed mainly by machines, though in some situations staff perform quality control pre- and post-publication to check and fix errors introduced through the publishing chain.
Errors corrected in the process result from incorrect use of SGML or XML codes, new non-standard characters, references that will not link, because the citation information is inaccurate, and various formatting issues that work on paper pages, but wreak havoc when a DTD is attempting to convert a coded file through SGML to an HTML utilization.
And the elements of these costs vary tremendously among publishers and Internet publishing services. At the high end: parsing supplied text into a rigorously controlled version of SGML or XML; making hyperlinks to data and metadata algorithmically; presenting multiple resolutions of images; offering numerous elaborate search and retrieval possibilities; supporting reader feedback and e-mail to authors; supporting alerting and prospective citing functions; delivering content for indexing to secondary publishers and distributors, as well as to Internet indexing services; and supporting individualized access control mechanisms would be included.
At the low end, that characterized by PDF only e-publishing, just access control, simple search, common access control mechanisms, and delivering content for indexing would be included.
The range of costs in this element is quite wide, though as you will see, the size of this category in the expense budget is relatively small.
The fifth category is that of publishing support, everything from catering of lunches, to finance offices, including facilities and marketing. All of these fall into this one.
Another category is the cost of reserves. Some organizations have money set aside for disasters, or to address opportunities. Some of these reserves are for capital projects, or to hedge against key suppliers failing. A great many not-for-profit organizations do not label reserves as such, but have investments or bank accounts whose earnings support various programs, but whose principal could be used in a reserve function as needed.
There are a couple of categories I include here for debate, though I am sure that they will arise in later sessions on pricing and business models. For the sake of argument, I present them here. The first debatable category is the cost of non-journal support for other organizational programs. By this I mean demands on journal expense budgets eventually driving pricing, and yielding revenue in excess of expenses, including overhead and reserves with expense items for application to conference calls, fellowships, and other non-publishing activities.
A similar cost item in the for-profit sector of STM publishing would be I suppose, demands of owners and equity holders for profits. Note, however, the tremendous difference in scope and intentions of these two last items. On the one hand, income in excess of expense in the not-for-profit sector, and on the other, profit in the for-profit sector.
Another related cost mainly in the for-profit sector and particularly prominent in these past years as mergers and acquisitions among the largest companies have occurred is precisely the cost of mergers and acquisitions. Another controversial category of costs is that of declining circulation. The cost of fewer subscriptions covering all the expenses, and in the case of a for-profit, the cost of profits, mergers, and acquisitions.
Here are the results of the statistically insignificant sample of six not-for-profit publishers' costs concatenated so the results can be compared. The data were presented in a wide variety of categories, so an element of interpretation is at work in these results. And I finally settled on presenting the data as a pair of ranges, one for the data from the early 1990s, and the other for recent data.
A cost not appearing on the expense budget is that of the intermediaries, the subscription agents who get from 5-10 percent of the prices for their work. Maybe this figures into the pricing structure, but one way or another for the institutional subscribers, it is a cost.
How might these data be interpreted? First, if you had read my notes and seen what I have gotten, you would see that the publishers have much tighter control now over their budgets than they did 10 years ago. The definitions of categories are better.
Second, it is clear that editorial costs have not changed much, but that printing, paper, and binding costs are down, at least on a unit basis. Also, the costs of Internet editions have entered the budgets at the level of 4.5-9.1 percent of the costs. Respondents to my inquiry indicate that their publishing budgets have doubled since the early 1990s, but that the individual subscriptions are almost half in many cases, apparently cannibalized by institutional Internet subscriptions.
Certainly, cost increases reported by these not-for-profit publishers are on the order of 6 percent annually. With a reduction in individual subscriptions, the number of copies of issues printed, bound, and mailed has gone down, but increases in costs have kept budget numbers from falling.
Manuscript submission tracking and refereeing support applications have reduced mailing costs, and made it possible for more manuscripts to be processed by existing staff. Increases in the size of the journal in page equivalents, increases in graphics and colors in some instances, and the adoption of advanced Internet features like supplemental information and so forth, have increased the costs of Internet publishing services about 50 percent higher per year than other costs.
Obviously, there is some sort of dynamic balancing act going on with regard to publishers' costs in this Internet era, some increasing, and some decreasing. What is most intriguing, however, is the possibility of removing from 25-32 percent of the costs of publishing by switching to electronic journals delivered over the network, and eliminating printing, binding, and mailing paper copy to any subscribers at all.
One might further observe this as a transfer of costs from the publishers to the readers who desire to read articles on paper. On the other hand, one might observe that this is not a new transfer of costs to readers, because many already photocopy like crazy, and presumably cover that time and cost somehow. Eliminating paper editions would also offer some promise of reducing prices to the institutions who are so clearly providing the publishers with the economic basis for publishing at all these days.
What changed condition or conditions would permit the removal of the printing, binding, and mailing costs? Simply put, the successful operation of true digital archives, protected repositories of bits and bytes for the contents of journals, would make that difference. A true digital archive or repository, in my view and that of most librarians, is one that is not merely an aggregation of content accessible to qualified readers or users, but one that preserves and protects the content, features, and functions of the original Internet edition of the deposited journals over many years, decades, and even centuries.
True digital archives will have their standards and operational performances publicly known and monitored by publishers, researchers, and librarians alike. Their operations and content then will be audited regularly. We do not yet know how costly automatic data migration will be over time, and therefore, we do not know the costs for these operations. It is not enough for a publisher, a library, or a third party, say an aggregator or other information business, to declare themselves to be an archive. They must prove it constantly.
What might be the annual cost of true digital archives? They range from the incredibly cheap, as in the case of LOCKSS. These are network caches, a design that involves publishers and libraries in willing partnerships enabling dozens, maybe hundreds of local caches on cheap magnetic memory, and using very ordinary CPUs. Our estimate is that each of these LOCKSS caches could operate for only tens of thousands of dollars per year.
Another model is that of the large, managed digital repository for multiple data formats and genres of publication. Our estimate is that Stanford can operate and maintain a very large one, perhaps a petabyte or two of data, for between $1-1.5 million per year, half for staff, and half for technology.
A petabyte, for those of us who are not members of the National Academies here, is 10 to the 15th power bytes. HighWire's database now is in excess of 2.5 terabytes, a terabyte being 10 to the 12th power bytes, to give an example of how very large a petabyte is.
Which institutions will undertake such large managed digital repositories? Almost certainly a few national libraries and university libraries will. But publishers or their Internet service providers could develop and run them as well.
Apparently, the European Union's laws require deposit of digital editions in one or more national libraries soon. And the Library of Congress, along with other US federal libraries promises to develop both its own digital repository, as well as to stimulate and support a distributed network of them. If publishers undertake digital repositories, their costs will enter in the expense budget, and of course drive prices higher.
Another cost about to hit home to many publishers is that of converting back sets of journals to digital form, providing some level of metadata for each article, or perhaps providing word indexing to the contents of each article, posting and providing access to the back sets. The costs of this are just now being encountered.
We have done a study on back sets of HighWire journals. Our estimate is that about 20 million pages could be converted, and that the costs of scanning and converting pages to PDF, keying headers, loading data to the HighWire servers, keying references, and linking references could approach $50 million, or about $150,000 per title.
If all this retrospective conversion of back sets occurred in one year, HighWire would have to spend about $250,000 in capital costs, and about $300,000 in initial staff costs, declining to annual staff expenditure of perhaps $250,000 or $275,000 thereafter.
On average, for the 120 publishers paying for services from HighWire that would mean about an additional $2,500 in new costs each year. In other words, the increase in annual costs to publishers for hosting and providing access to the converted back sets would be a fraction of 1 percent of their current expenses each year.
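As a rough check on the back set figures quoted here, the following Python sketch works through the arithmetic. It uses only the numbers stated in the talk (about 20 million pages, about $50 million in conversion costs, about $150,000 per title, 120 publishers, and ongoing staff costs of $250,000 to $300,000 a year). The variable names are illustrative, and the implied per-page cost and title count are derived here as assumptions, not figures given by the speaker.

```python
# Figures quoted in the talk (all approximate).
pages = 20_000_000
conversion_cost = 50_000_000          # dollars, total conversion cost
cost_per_title = 150_000              # dollars per journal title
publishers = 120
annual_staff_low, annual_staff_high = 250_000, 300_000  # dollars per year

# Derived values (not stated by the speaker).
cost_per_page = conversion_cost / pages
implied_titles = conversion_cost / cost_per_title
per_publisher_low = annual_staff_low / publishers
per_publisher_high = annual_staff_high / publishers

print(f"cost per page: ${cost_per_page:.2f}")
print(f"implied number of titles: {implied_titles:.0f}")
print(f"annual cost per publisher: ${per_publisher_low:,.0f} to ${per_publisher_high:,.0f}")
```

The upper end of the staff cost range, spread over 120 publishers, reproduces the roughly $2,500 per publisher mentioned in the talk.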
While the costs of back set conversion are high, our experience suggests that the pay off could be 5-10 times more use of articles in the back sets than is presently experienced. Articles running to the HighWire servers are read at the following rates: within the first three months of issue, about 95 percent of all articles get hits. That presumes that a hit means that somebody is actually reading something.
In the next three months, that is when the articles are 4-6 months old, about half of that, slightly less than 50 percent of all articles get hits. And when articles are 10 months or more old, on average only 7-10 percent of all articles get hits. But that rate of hits seems to persist no matter how old the online articles are.
We believe on the basis of citation analyses that only 10 percent of articles in print back sets older than the online set of digital versions of themselves get cited. That is cited, not necessarily read. That they should do so is entirely consistent with the commonly held belief since 2001 by publishers associated with HighWire that the version of record of their journals is the online version, making the sell for the entire run of their titles a logical next step, and many are taking it.
I digressed into benefits there, forgive me. At any rate, unless other sources of funds are forthcoming, the costs of back set conversion will become a temporary cost in the expense budgets.
There are chickens and eggs in expense budgets too that complicate understanding them over time. For instance, it is our observation at HighWire Press that publishers in the early adopter class, those who first define and desire advanced features, pay the cost of developing those innovations. Certainly at HighWire, those early adopters reap the benefits of innovation in attracting authors and readers. Eventually, many of the innovative features are generally adopted, and usually at lower cost of adoption than paid by the innovators to innovate.
In order to maintain a reputation as innovative, one has to continue adopting new features. On the other hand, some innovation leads to lower costs. HighWire recently announced reductions in prices thanks to some processing innovations recently published.
We had a feeling that the Internet editions have reduced calls upon librarians for help in finding relevant articles. But I can tell you that no Stanford science librarians are volunteering to reduce their staff size as a result.
Certainly, the Stanford e-journal study shows that readers of online journals value the search engines and strategies, the navigation devices, hyperlinks, and availability of various resolutions of images. Alerting services are particularly well thought of too. Certainly, these sorts of online functions lead users to be more self-sufficient in their search for information.
On the other hand, some at HighWire feel that the customer support has been transferred in some measure to publishers. Certainly, that is true in activating subscriptions, but my limited and statistically irrelevant survey did not show any significant increases in customer support provided by publishers over the past decade.
Let me close with a few observations and questions. A close study of the costs and benefits of electronic journal publishing from the birth of the World Wide Web would be a good thing to do. It would document in a neutral way, the profound transformation of an important aspect of the national research effort.
That there is likely to be as much change in the next 10 years as in the past does not obviate the need for the study. In the light of our current economic situation, such a study may help us develop new strategies or evolve current ones for accommodating needs of scientists and scholars to report their findings, and for assuring the long-term survival of the history of science, medicine, and technology. The National Research Council is ideally suited to conduct such a study.
University budgets are under considerable strain, and will be so for at least several more years. Will big deals for access to all journals from a single publisher survive? Will we see new deals for multiple titles, cut just right to fit true institutional needs, real institutional needs?
Finally, a comment on why there will be costs in any version of the present STM publishing system, regardless of the mode of presentation, whether published in print, electronic, or via the as yet to be invented selective extrasensory perception method. It's coming.
Why shouldn't there be a highly diffuse distribution scheme based on authors simply posting their articles on their own sites, or on an archive like the Los Alamos National Lab (LANL) arXiv archive, and let Google or more specialized search engines bring relevant articles to readers on demand? Who needs all this expensive apparatus anyway?
The answer lies partly in the strong need for peer review of content, expressed variously by most communities of science, and partly in the functions provided by good publishers that are valued and demanded by the scientific community itself. The e-journal survey mentioned earlier shows that the vast majority of respondents to that extensive survey, want, and have come to expect a wide array of features making their regular surveys for articles relevant to their work, easy to find, and to use. By implication, the readership focuses their attention in the constantly churning galaxy of new and old articles on a few journals with editorial policies and content that are known and trusted.
Providing relevant, reliable, and consistent levels of content in journals costs money. Highly distributed, diffuse STM publishing with sketchy peer review, dependent upon new search engines to replace the well articulated scheme of thematic journals and citations in a multidimensional web of related articles is a descent into information chaos.
Perhaps in the next decade the segment of STM publishing most at risk are the secondary publishers, the abstracting and indexing ones, and the tertiary publishers, those producing the review and prospective articles long after the leading edge researchers have made use of the most useful articles.
None of the alternative publishing experiments underway or about to get underway, including the LANL arXiv, operate independently of a larger STM journal publishing establishment, and none operate without costs. Taking the LANL arXiv in particular, while it is certainly an archive of articles to which many in physics, mathematics, and computer science go first and constantly, it has had negligible effect, if any at all, in the direction of reducing the number of peer reviewed articles published in these fields.
It may have improved the articles by exposing them in pre-print form to lots of readers, some of whom may have commented back to the author with helpful suggestions. Physics Letters and Physical Review have continued to grow, continued to publish peer reviewed articles, and those articles continue to be cited.
It must be observed that some communities use and read pre-prints reluctantly. While there are 200 articles in the British Medical Journal's Clinical Netprints, few have received online and public peer review from readers, and fewer, if any, have been cited in that form.
One note of interest: there are several experiments in journals depending nearly entirely on fees paid by authors. The New Journal of Physics, for example, has published between 25 and 50 articles in each of its five years of existence. It has not yet had sufficient citations to be indexed by the Science Citation Index.
The gyrations of the business plans of BioMed Central are instructive too. Now, after trying author fees alone, they are selling memberships to institutions so that authors from member institutions do not have to pay for publications. But the fees for memberships are very high.
The Public Library of Science will enter the list with its first articles in the fall of 2003. It too will depend upon authors' fees for support of its operations. And is starting with an admirable $9 million war chest from the Moore Foundation. It will charge $160 for print copies of its volumes.
The point of mentioning these efforts, all of which are involved, I believe, in the Open Archives Initiative standards, is that none, not one, has done away with the costs of publishing. Someone always pays. And none of these journals with new business models have become self-sufficient.
For the costs of peer reviewed publishing in science, technology, and medicine to disappear, the requirement for peer review and the demand for thoughtfully gathered, edited, illustrated, and distributed articles must disappear too.
How the experiments in business models might provide competitive pressure on traditional business models and pricing is a topic for discussion here, and examination over time. Experiments should be tried, but for me, the solution to the serious crisis lies in the not-for-profit societies whose purposes and fundamental economic model are very closely allied with the purposes and not-for-profit economic models of our research universities and labs.
Thank you very much.
[Applause.]
DR. BLOOM: Thanks very much, Michael.
We are going to continue for the next half hour by hearing from four different societies, different sized society home journal publications, and one commercial publisher. We are going to begin that discussion with Kent Anderson from the New England Journal of Medicine (NEJM).
Agenda Item: Comments by Panel Participants - Kent Anderson, Publishing Director, NEJM
MR. ANDERSON: Thank you, Floyd. And thanks to Mike Keller for a good introductory presentation.
My name is Kent Anderson. I'm from the New England Journal of Medicine (NEJM). I wanted to begin by giving you a quick overview of the New England Journal of Medicine. We live at the interface of scientific research and clinical practice. We have been published continuously for 191 years. We were published in paper first in the 19th century. We published online first in the 20th century. And we continue in both media in the 21st century.
We mainly serve clinicians, physicians who take care of patients. And we are a key translator and interpreter of new science to general and specialist physicians around the world.
When I was preparing these comments, I thought about how unique our situation is. We are a large circulation publication that relies on individual subscriptions, and we are owned by a small state not-for-profit medical society. So, I wanted to limit my comments to three areas. One, how the definition of publication is changing, and how that is modifying how we conduct our business of getting the journal out week to week. And finally, some of the cautionary notes about conducting a study of this due to the diversity of publications represented in the room, and among the users here.
Publication used to refer to the act of preparing and issuing the document for public distribution. It could also refer to the act of bringing a document to the public's attention. These definitions served us well for a good long time, for more than 400 years, or even longer.
Now, publication means much more. Now, it means a document that is Web-enriched, with links, search capabilities, and potentially other services nested in it. A publication may soon be expected to be maintained in perpetuity by the publisher. A publication now generates usage data.
A year ago if I had been asked to give this talk, I would have probably given a talk that would be much different, because costs in this area are emerging in such an unpredictable and rapid way that if I give this talk a year from now, it will probably be much different.
So, it's May 19, and I will give you what I know today. For publications serving physicians, costs for print have remained steady and risen here and there, and we try to control those. However, what we found behaviorally is that physicians are not willing to give up print. They are too pressed for time, and it's too convenient. I think there was recently a publication by King and Tenopir supporting this finding as well.
We also support significant costs for Web publishing, data analysis and reporting. We develop new services at a rapid clip. And we support the costs for new publishing modalities, our online only publication, early release articles, free articles to low income countries, free research articles after six months, and selective free articles online.
Our mission is to work at the interface of biomedical research and clinical practice. As medical and scientific findings become more complex and sophisticated, we are investing in editors, writers, and illustrators who can analyze and interpret these findings for a clinical audience so our readers understand precisely how medicine is evolving.
Our education mission is more complex now. We have invested heavily in new continuing medical education initiatives, new ways of illustrating and presenting articles, and new ways of helping users find what they are looking for. It is more important than ever in some ways, more difficult than ever, to know who our readers are.
We can no longer look at print distribution to point us to our readers. The Web amplifies and concurrently obscures readership. So, investments in market research, Web data analysis and warehousing, and attending medical meetings with physicians have increased significantly.
Customer service demands have escalated, and building systems to handle these changes, staffing to handle the new demands of service by e-mail, which typically leads to additional phone calls and mail service are adding to the cost of running a publication.
I think fundamentally, we are now on a software upgrade path with our services online, our content presentation, our content parsing, and search engines. For the New England Journal of Medicine we are currently at version 4.0. I think we will probably be going to version 4.1 sometime next year.
Online peer review tools in parallel with paper systems have recently added a new layer of cost, with no apparent end to the investment, because that is another software upgrade path we are on. We also have to find ways to modify the skill sets of our highly skilled editors and workers, to make sure that they can handle all these new inputs, and maintain the quality of the journal.
With all of these programs, ways of accessing the journal, making our information available, we have to tell people about this. Marketing these new programs and services is another expense that we typically don't recoup, but we execute despite that. We need to get the word out that there are new ways to access our information, and we do that as part of our mission.
E-mail systems are part of this distribution modality now, and we have to support those, build those, maintain the databases, and consistently send out e-mails to people without strain off the line.
The pressures to be fast are growing, yet we have to maintain high quality. We publish information about health, and if we make mistakes, they can be serious. These emerging demands have led to increased investments in systems and people to ensure that we can be responsive when the need is warranted.
Recently, we published a set of articles on SARS, Severe Acute Respiratory Syndrome, and we published those within two weeks or less of receipt, completely peer reviewed, edited, and illustrated papers. These were translated into Chinese within two days of their initial publication, and distributed in China in the thousands in print, where we hope they made a major difference.
As demands for faster publication mount, we will need to find ways to accomplish it without sacrificing quality, and this is already leading to significant investments in people and systems.
We have I think -- often in these discussions the people behind all this become obscured. We are fortunate. We have, I think like a lot of publications here, very talented editorial production, IT, and publishing people, and we want to keep it that way. And those people have to be paid.
Now for some closing comments on the viability of a study of this. I would just like to throw out a few questions. I think that if this is studied well, there has to be an acknowledgement of the diversity of the publishing landscape, even in the scientific, technical, and medical publishing area. If only a few publishers participate, the selection bias could drive us to the wrong answers.
We have to look at which cohorts we are trying to analyze. We have to clearly state what the null hypothesis is. What is the question we are asking? And what is a reasonable control group? Consumer Price Index and others are used, but those are very, very pooled numbers based on all sorts of different industries. What's a control group for science, which is growing at a rapid clip?
I think that a study of this nature could be valuable, but it needs to be well designed, rigorously conducted, and carefully interpreted.
Thank you.
[Applause.]
DR. BLOOM: We'll next hear from Robert Bovenschulte, who is the publishing director for the American Chemical Society (ACS), another small society-owned series of publications.
Agenda Item: Comments by Panel Participants - Robert Bovenschulte, Director, Publications Division, ACS
MR. BOVENSCHULTE: Good morning, and thank you, Floyd.
I hope it doesn't come as too big a shock to Mike Keller that I actually agree with a great many of the things he said, probably in fact the vast majority of them. Electronic publishing has not only revolutionized this industry, it has also tremendously changed the fundamental economics of what I will characterize as the business.
Even though I represent a not-for-profit publisher, in fact many of these issues do overlap both commercial and not-for-profit publishers. And having just completed a two year term as chairman of the International STM Association, I feel obliged at least in some of my comments, to represent a broader view than just the not-for-profits.
Well, what is driving this change? Our costs are going up very rapidly, and much more than would have happened if we had stayed with print alone. There are really two principal factors. They are obvious I think to all of you. One is of course the cost of new technology. And the other is the volume of publishing that is being done.
The good news is that with all of this expansion of cost, and commensurate expansion in price, although some would say an incommensurate expansion in price, we are delivering a tremendously more valuable product to our users. There are enormous functionalities that I think all of you are familiar with that are being conferred upon our scientists.
The access to information is swift, it is convenient, and it is improving productivity. We have in fact tangible anecdotes coming back from many of our end users about what they have been able to do as a result of having all of our content available to them on the Web.
It is implicit I think in some of the remarks that have already been made, and I want to make it very explicit that technology is not just about Web publishing. Technology now imbues all facets of publishing from author creation and submission, all the way through peer review, to production and editing, and to output, and to usage.
That's a very important dimension, because often when we say technology, we think it means Web, and it includes Web, but it goes really far beyond it. Why that is important is that the costs of the technology are not just the Web. They really apply to all the other systems that we have to create. And not only do we have to create them, we have to find some way in the fullness of time to integrate these systems into one. That is no small task.
I want to give you a few examples of the kinds of costs that may not be obvious if one is simply clicking on a Web journal. For the 31 journals published by the ACS, we have 186 editorial offices worldwide. All of those offices have to be supported. That's mainly in terms of the technology. And we have to develop new technologies that support the functioning of those offices, and make them more productive.
That means visits to these sites by technical people to provide support. It can't all be done on the phone. And this is a very, very costly enterprise. Furthermore, we have about 100 staff located out in Columbus, Ohio as part of Chem Abstracts, that is to say housed with Chem Abstracts, part of the publication division, responsible for editing and production. This is a very major expense, as you can imagine.
Now, turning to studies, in fact, a year ago the ACS conducted a study through the Seybold(?) Consulting Group, and it was really quite interesting. We were engaged in a very thorough assessment of the future of electronic publishing broadly speaking. And in particular, we were trying to put our hands around the cost drivers.
So, Seybold did this study, and we offered it to I think 16 or 18 publishers. Almost all of them took advantage of participating in the study. And the deal was that they would see the same study that we saw, even though we were paying for the work to be done, and their cost was coming up with the data.
Some interesting numbers. In terms of revenues generated by a publishing endeavor, both the median and the average cost among these 16 publishers was 5.5 percent going into just this IT function. Furthermore, large publishers allocated less than 2 percent, and ACS, to give you a benchmark, spends about 9 percent.
It is obvious then that larger publishers with a lot more revenue can spend a much smaller percentage, and yet they are far outspending the medium-sized and the smaller publishers in the total amount that they can invest in the IT operation.
To give you another idea of how this trend is going -- now, this was done a year ago -- and the predictions of the 16 publishers in aggregate were an average of a 21 percent increase in their IT spending year over year.
So, that's enough about technology. Let's look briefly then at the growth in the number of articles being published. The ACS has gone from 15,000 articles in 1993, to 23,000 in 2002. That's a 53 percent increase over that 9 year period.
At the moment, we are actually -- I say at the moment. I mean over the last two years or so, we are now experiencing double digit increases in submissions. And I think this is largely spurred by the fact that online submission makes submitting an article even easier. And we are receiving a much larger fraction of our submissions from outside the United States.
During this same nine year period, total costs increased at 64 percent, again, that's versus the 53 percent increase in articles published. But the cost per article published -- this is a very important point -- has increased only 7.4 percent. I'll give you the exact numbers. In 1993, for every article we published, it cost -- these are all costs that we can attribute to the journal operation -- $1,712. In 2002, the cost was $1,838.
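The percentages in this passage can be cross-checked against one another with a few lines of Python. The sketch below uses only the article counts and per-article costs quoted above; the variable names are illustrative. Total cost growth is reconstructed here as the product of volume growth and unit cost growth, which is an assumption about how the quoted 64 percent was arrived at.

```python
# Figures quoted in the talk.
articles_1993, articles_2002 = 15_000, 23_000
cost_per_article_1993, cost_per_article_2002 = 1_712, 1_838  # dollars

# Growth in volume and in unit cost over the nine-year period.
article_growth = articles_2002 / articles_1993 - 1
unit_cost_growth = cost_per_article_2002 / cost_per_article_1993 - 1

# Total cost = articles x cost per article, so the growth factors multiply.
total_cost_growth = (1 + article_growth) * (1 + unit_cost_growth) - 1

print(f"articles published: +{article_growth:.1%}")
print(f"cost per article:   +{unit_cost_growth:.1%}")
print(f"implied total cost: +{total_cost_growth:.1%}")
```

With the rounded article counts, the implied total cost growth comes out between 64 and 65 percent, consistent with the 64 percent quoted.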
So, I think that there has been quite a significant gain in publishing productivity, because I think that we would probably see similar numbers if we looked at other publishers, not just within the ACS. I think this is attributable in large measure to technology. Technology does cost a lot more, but it is seeming to produce efficiencies.
I want to say something about the digital repository. Mike Keller referred to this as the back sets. The ACS is one of the first publishers to create a full digital archive, PDF form only, of all of its journals. This was a very large upfront cost. I think in the future we will see that preserving that could be both the responsibility of the library community and the publisher.
At some point, Mike Keller’s notion of trying to move toward one system I think has a lot of merit. That is not going to be easy to achieve, however, and in the interim I think the publishers, even the commercial publishers, but certainly the not-for-profit publishers feel a very strong obligation to preserve that digital heritage.
And by the way, this isn't just about preservation. Talk about costs, there are enhancements, there are costs associated with the migration to new technologies. It is not as simple as just thinking that we spend a couple of million dollars one time, and we won't have to spend much more in the future on it. No, it goes on and on.
Ending print is certainly a worthwhile goal. And we have had a position at the ACS for five or six years now that we are doing nothing to retard the rate at which the community wants to dispense with print. And we would be very happy to reduce our price increases, possibly even hold them flat, or even a small reduction during a period when the print is being eliminated as a way of returning those costs to the community.
However, all of the information that we have -- this is specific now to the chemistry community, broadly defined -- is that the end users, and particularly the scientists who write for our journals are not ready to give up print. And for good reason, I think most librarians are not ready today to give up print, because of the preservation issue.
So, if that happens, if print does go away, there might be a savings of, as Mike Keller suggested, 15, 20, 25 percent of costs. What concerns me is that in very short order, with the rising volume of publication, the costs of handling those many more articles will in fact wipe out whatever transitory gains we have from saving on print.
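The concern in this paragraph can be made concrete with a small break-even sketch. It assumes, purely for illustration, that dropping print removes a fixed fraction of total costs in one step, that cost per article stays constant afterward, and that article volume grows at a steady annual rate. None of these simplifications comes from the talk, and the function name is hypothetical. The 25 percent saving and the growth from 15,000 to 23,000 articles over nine years are figures quoted above.

```python
import math

def years_until_savings_erased(print_share: float, annual_growth: float) -> float:
    """Years for volume growth to return total costs to their pre-saving level.

    Assumes total costs scale in proportion to article volume after a
    one-time reduction of `print_share` (0.25 means 25 percent).
    """
    return math.log(1 / (1 - print_share)) / math.log(1 + annual_growth)

# Annualized growth implied by 15,000 -> 23,000 articles over nine years.
acs_annual_growth = (23_000 / 15_000) ** (1 / 9) - 1

# Scenario 1: volume keeps growing at the historic annualized rate.
years_at_historic_rate = years_until_savings_erased(0.25, acs_annual_growth)
# Scenario 2: volume grows at 10 percent a year ("double digit" submissions).
years_at_ten_percent = years_until_savings_erased(0.25, 0.10)

print(f"historic annual growth: {acs_annual_growth:.1%}")
print(f"years to erase a 25% saving at that rate: {years_at_historic_rate:.1f}")
print(f"years to erase a 25% saving at 10% growth: {years_at_ten_percent:.1f}")
```

Under these assumptions a 25 percent saving is absorbed in about six years at the historic growth rate and in about three years at 10 percent annual growth, which is one way to read "in very short order."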
That concludes my remarks.
[Applause.]
DR. BLOOM: Next we will get the views of Bernie Rous from the Association for Computing Machinery.
Agenda Item: Comments by Panel Participants - Bernard Rous, Deputy Director/Electronic Publisher, ACM
MR. ROUS: I've been involved with the development of electronic products and digital production processes for over 20 years at ACM, that's the Association for Computing Machinery. And I would venture to assert that the costs of electronic publishing are not really well understood at all. I believe that there are some very good reasons why this is the case. And it would make a study of costs both very difficult to carry out, and very important to attempt.
So, first, electronic publishing is not a single thing. This has been mentioned a couple of times. You can take author created PDF files and mount them on a Web server with a simple index, and that is electronic publishing of a sort.
Or you can manage a rigorous online peer review tracking system, convert multiple submission formats to a single structured document standard, apply editing, digitally typeset and compose online page formats, and apply style specifications to generate Web displays with rich meta data, supporting sophisticated functions, build links to related works, to associated data sets integrated with multimedia presentations and applets that let the user interact and manipulate the data.
This too is electronic publishing. And it is miles apart, and many decimal points away from the first approach. In our own ACM digital library we have works that are produced at both ends of this enormous cost spectrum called electronic publishing.
A second reason why electronic publishing costs remain fuzzy comes as a consequence of living in a bimodal publishing world. Even some direct expenses can arguably be charged to either print or electronic cost centers. Where you put them often depends on your conceptual model. If you look at online offerings as an incremental add-on to the print, then you charge more costs to the print. On the other hand, if you look at print as a secondary derivative of your core electronic publishing process, more costs are likely to be charged to the digital side.
And when it comes to the indirect costs of staff and overhead, the same applies, with perhaps even greater leeway due to the guesswork that is involved in these types of cost allocations.
Thirdly, the decisions to charge costs to print, or to electronic publication are part of a political process. There are times when you want to isolate and protect an existing and stable print business, so you attribute any and all new costs to the digital side. You may also want to minimize positive margins on the digital side to avoid debate over the pricing of electronic products.
At other times, the desire to show that the online baby has taken wings and is self-sustaining, and has a robust future can tilt all debatable charges to the print side. And this is not to say that the books are being cooked. It's just the way you look at the business that you are running.
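The allocation effect described in the second and third points can be made concrete with a small sketch. All figures, the `CostPool` type, the model names, and the 75 percent share are hypothetical and chosen only for illustration; the sketch simply shows that the same total spending yields very different print and electronic cost figures depending on which side is treated as the core process.

```python
from dataclasses import dataclass


@dataclass
class CostPool:
    """Hypothetical annual costs, in thousands of dollars."""
    direct_print: float       # e.g. paper, printing, mailing
    direct_electronic: float  # e.g. hosting, platform development
    shared: float             # e.g. editing, composition, staff, overhead


def allocate(pool: CostPool, model: str, primary_share: float = 0.75) -> dict:
    """Split shared costs according to the chosen conceptual model.

    'print_primary' treats online as an add-on to print, so most shared
    costs are charged to print; 'electronic_primary' does the reverse.
    The primary_share value is an assumption, not a figure from the talk.
    """
    if model == "print_primary":
        to_print = pool.shared * primary_share
    elif model == "electronic_primary":
        to_print = pool.shared * (1.0 - primary_share)
    else:
        raise ValueError(f"unknown model: {model}")
    return {
        "print": pool.direct_print + to_print,
        "electronic": pool.direct_electronic + (pool.shared - to_print),
    }


if __name__ == "__main__":
    pool = CostPool(direct_print=400.0, direct_electronic=300.0, shared=400.0)
    for model in ("print_primary", "electronic_primary"):
        print(model, allocate(pool, model))
```

In both cases the total is the same 1,100; only the split between the two cost centers moves, which is the sense in which the books are not being cooked.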
Fourth, it is very difficult to compare print and electronic costs, because the products themselves are just not the same. The traditional average cost per printed copy produced is meaningless and very hard to compare to the costs of building, maintaining, operating, and evolving a digital resource as a single facility.
Fifth, accounting systems sometimes evolve more slowly than shifts in publishing process. New costs appropriate to online publications are sometimes dumped into pre-existing print line items.
Sixth, electronic publishing has not reached a steady state by any means. There is still lots of development going on, some of which lowers costs, and some of which raises them. So, these are some of the reasons why I say that the costs of electronic publishing remain somewhat obscure.
And in the remaining minute or two I would like to mention several components of electronic publishing costs that we did not fully anticipate when we went online. First of all, customer support costs have been phenomenally different than in print. Not only is there a larger volume and variety of customer complaints and requests for change, but the level of knowledge and the expertise required to answer them is much more expensive.
The cost of sales is higher. The product is different, and the market is shifting. We no longer sell title subscriptions. We license access to a digital resource. The high price tag for a global corporate license, or a large consortium means that more personal contact and hand holding is required to make the sale. And furthermore, such licenses are not simply sold. They are negotiated, and sometimes with governments. And this requires much more expensive sales personnel.
Third, digital services are built on top of good, clean meta data. Meta data costs are high. The richer the meta data, the higher the costs.
Fourth, subject classification is costly. The application of taxonomies is a powerful tool in organizing online knowledge. The costs for building what has been referred to as the semantic web are still largely unknown. There are a surprising number of opportunity costs. There are so many new features and new services, new ways of visualizing data and communicating knowledge. There is a lot more work that can be done than has been done so far.
And lastly, I would simply add that some upfront, one time guesstimates in electronic publishing turn out to be recurring costs, some of them with alarming frequency. At ACM we are in the middle of our fourth digital library interface release since 1997.
Thanks.
[Applause.]
DR. BLOOM: Our last panelist is Gordon Tibbitts, President of Blackwell Publishing USA.
Agenda Item: Comments by Panel Participants - Gordon Tibbitts, President, Blackwell Publishing USA
MR. TIBBITTS: I see that I have two very big challenges. One is that I'm the last panelist between you and a break, and that usually means that I better say just a very few things. And the second is that, of all things, what they asked me to speak about is cost, and I personally am not wildly inspired by costs. In fact, I think it's what stands between us and good things like dissemination of ideas.
I decided to just ask a simple question. I'm a little bit of a populist here. Is there a cost tipping point in e-journals, i.e., the place where all of a sudden the costs will dramatically drop, and everybody will run in, and there will be a brand new golden era?
And what I would like to do is just take you through, very quickly, five slides, and make a few points. I heard a wonderful thing earlier, and that's very good, because I prefer it: I heard that maybe we should say a few provocative things. I am a commercial publisher. I will try to represent commercial publishers, and perhaps I will be a little bit provocative. I must say that some of my colleagues on the panel have already been quite provocative.
Very quickly, this is just a look at two classes of commercial publishers. There really are two blends of commercial publisher out there. One kind is the very heavily society-oriented publisher. It's more of a service agency. I'm highly familiar with that one. That's the one on your left.
And then there is the commercial publisher who owns most everything, sometimes people think most everything on the planet. And I think that the major difference between the two is that in the case of the one on the left, there are quite a few more royalties paid out -- profit shares, royalties, stipends, money flowing back to the societies who are the gatekeepers for the information, for that peer reviewed information, and the societies who publish the journals. And on the other side, that is taken mostly in profit.
Just another point, an issue of scale. You will notice that marketing and sales, there is about a four point difference between the two in sales, but when you talk billions of dollars on the right, that four points translates into massive marketing dollars.
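The point about scale is simple arithmetic, sketched here. The revenue figures are hypothetical, since the slide values are not in the transcript, and the sketch assumes the four points are percentage points of revenue.

```python
def marketing_dollars(revenue: int, points: int) -> int:
    """Dollar value of a given number of percentage points of revenue."""
    return revenue * points // 100


if __name__ == "__main__":
    # Hypothetical revenues: a mid-sized publisher and a multi-billion-dollar one.
    for revenue in (50_000_000, 2_000_000_000):
        print(revenue, marketing_dollars(revenue, 4))
```

With these assumed revenues, four points is 2 million dollars for the smaller publisher and 80 million dollars for the larger one.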
What I did was I took a sharp pencil. My CFO was not happy, but I did it. And I went through and I took a look at what I call e-incremental costs, those things that I could separate out as being purely new costs that the business is incurring that are electronic. On the one hand, and from a marketing perspective, we contribute our data to SOROS and WHO and our data is in Eastern Bloc countries, and in the sub-Sahara for free. And you can look at that as a charity event, or really from a commercial publisher's standpoint. What it is, is a good marketing or PR activity.
Many of the societies though do have a fundamental bylaw in their society which is disseminate i |