The National Academies: Advisers to the Nation on Science, Engineering, and Medicine
NATIONAL ACADEMY OF SCIENCES NATIONAL ACADEMY OF ENGINEERING INSTITUTE OF MEDICINE NATIONAL RESEARCH COUNCIL

Workshop on Strategies for the Preservation of and
Open Access to Digital Scientific Data

Abstracts

Session 1: Opening Remarks

Academician Xu Guanhua, President, Ministry of Science and Technology of China, Goals of the National Facilities and Information Infrastructure for Science and Technology and Scientific Data Sharing Program

Session 2: International Perspectives

Dr. Yasuyuki Aoshima, UNESCO

As a result, UNESCO took several proactive measures in order to encourage Member States to establish a right of universal access to information and formulate policies and regulatory frameworks, which would determine the future orientations of the information society.

After a considerable round of negotiations, the General Conference of UNESCO, recognizing the importance of promoting equitable access to information and knowledge, especially in the public domain, and reiterating its conviction that UNESCO should have a leading role in encouraging access to information for all, multilingualism and cultural diversity on the global information networks, adopted at its 32nd session in October 2003 the Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace.

Prof. Michael Clegg, Inter-Academy Panel and the U.S. National Academies, Inter-Academy Panel Initiatives on Promoting Access to Scientific Information

The Inter-Academy Panel (IAP) is a global network of 90 science academies designed to promote their greater participation in science policy discussions. Toward that end, the IAP creates partnerships among its member institutions and works closely with other scientific organizations. Because of the importance of the issue of improving access to scientific data and information as a matter of science policy at the national and international levels, the IAP issued a policy statement on this topic in December 2003. It is currently launching a new initiative under the leadership of the U.S., Chinese, and Senegalese Academies of Science on access to scientific information in developing countries. This presentation will discuss the role and functions of the IAP and will describe in greater detail the IAP’s activities in promoting better access to scientific information worldwide.

Dr. Peter Schröder, Organisation for Economic Co-operation and Development, Towards International Guidelines for Access to Research Data from Public Funding

The ever-increasing use of information and communication technologies (ICT) is bringing dramatic changes to the way our global science system operates. Digitisation has become an essential part of the scientific process and the management of research. To sum these changes up in a simplified manner: yesterday’s scientists studied nature, but today’s scientists study digital data (digital data on nature, to be sure). Sir Isaac Newton did not need more than a pencil and a pad to process his observational data into groundbreaking scientific laws. But today the next step in physics demands a Large Hadron Collider that will produce 12 to 14 petabytes of digital data per year, the full capacity of about 16 million CD-ROMs, to be analysed by some 6,000 researchers, scattered around the world but tightly knit by the Grid computer network of our global science system.

In this way use of ICT has made collections of scientific data in many respects comparable to musical scores: to be used time and again for a diversity of performances by a diversity of artists for the different audiences of society. Optimum access to research data should enable researchers from all over the world to compose the full score for our knowledge-based international society.

Consequently, access to the gold mine of research data has quickly become a major issue in international science policy and research management. The traditional exchange arrangements between scientific colleagues no longer suffice to guarantee the necessary openness of access to digital data resources. Optimum access requires formal agreements on the conditions of access at the national and international levels. The main task of establishing an adequate regulatory framework lies within the research community: the national research councils, institutes and funding agencies. But the general principles on which data access regimes are built should be a responsibility of governments. Considering the international dimensions of the scientific effort in general and of access to data in particular, national data access regimes will only work when closely connected to international agreements.

At the ministerial-level meeting of OECD’s Committee for Scientific and Technological Policy on January 30, 2004, the ministers responsible for science policy endorsed a Declaration on Access to Research Data from Public Funding, including a draft set of principles and guidelines. The Declaration will be an important step towards further international scientific co-operation.

Carthage Smith, International Council for Science

The mission of the International Council for Science (ICSU) is to strengthen international science for the benefit of society. One of the key principles that underpins this mission is the Universality of Science, i.e., all scientists should have the possibility to participate, without discrimination and on an equitable basis, in legitimate scientific activities, whether they be conducted in a national, trans-national or international context. Scientific data and information are the input and product of scientific research, and the practices and policies that dictate their use must reinforce the universality of science and, in so doing, improve science for the benefit of everyone.

Since its inception, ICSU, whose membership includes both international disciplinary science Unions (26) and national interdisciplinary science bodies (101), has been involved in scientific data and information issues. A particular focus has been on the international and interdisciplinary issues relating to data production, management and access. When ICSU was established in 1932, the challenges were very different from those of today. While data exchange was logistically more difficult and much slower in 1932, it is paradoxical that the incredible advances in information and communication technologies that have taken place in the last decade have also, in many ways, made the exchange of data much more complex.

In order to address the key management and policy challenges, ICSU has established a number of specific interdisciplinary bodies for data and information, including the Committee on Data for Science and Technology (CODATA), a co-organizer of this workshop, and the International Network for the Availability of Scientific Publications (INASP), also represented here. ICSU also developed the major international research programmes on global environmental change and is currently planning major new scientific initiatives such as the International Polar Year 2007-8, which produce, collect, analyze, and disseminate large amounts of diverse scientific data from and for many sources around the world.

With regard to data policy, ICSU is a strong advocate of “full and open access” to scientific data and “universal and equitable access” to scientific publications. ICSU has been very actively involved with CODATA, UNESCO and other international science partners in the UN World Summit on the Information Society (WSIS). On behalf of the international science community, we argued very strongly for recognition of the critical role and needs of science in the information society. In particular, an agenda for action, Science in the Information Society, was developed, which highlighted the key issues relating to preservation of and access to scientific data and information. This agenda was very influential, and its main recommendations were incorporated into the formal documents that were agreed by Heads of State at the end of phase I of WSIS (Geneva, December 2003).

In the light of the rapidly changing international scientific (and political) landscape and in parallel to WSIS, ICSU has also just completed an assessment of the international needs for scientific data and information: production, management, access and dissemination. While the report of this far-reaching assessment has not yet been published, it addresses many of the future challenges related to data preservation and access and identifies clear actions for ICSU, the scientific community, policy makers, funders and other stakeholders. It is clear that many of the existing practices, mechanisms, structures, and policies relating to scientific data and information need to be overhauled if the optimum benefit from data and information is to be obtained for both science and society.

Session 3: Overview of Thematic Areas in China

Zhang Xian’en, Ministry of Science and Technology of China, China’s Open Access to Scientific Data Program

Academician QIN Dahe, Meteorology Bureau of China, Earth sciences, Environmental and Natural Resources

Academician LIU Depei, Chinese Academy of Medicine and Chinese Academy of Engineering, Life Sciences and Public Health

Academician HU Qiheng, Chinese Association of Science and Technology of China, Information, Journals and Digital Libraries

Session 4: Policy and Management

Policy and Legal Issues

Peter Weiss, J.D., U.S. National Weather Service, Borders in Cyberspace: Maximizing Social and Economic Benefit from Public Investment in Data

Many nations are embracing the concept of open and unrestricted access to public sector information—particularly scientific, environmental, and statistical information of great public benefit. Federal information policy in the United States is based on the premise that government information is a valuable national resource and that the economic benefits to society are maximized when taxpayer funded information is made available inexpensively and as widely as possible. This policy is expressed in the Paperwork Reduction Act of 1995 and in Office of Management and Budget Circular No. A-130, “Management of Federal Information Resources.” This policy actively encourages the development of a robust private sector, improved access to critical information in the academic and research sector, and offers to provide publishers with the raw content from which new information services may be created, at no more than the cost of dissemination and without copyright or other restrictions.

In a number of countries, particularly in Europe, publicly funded government agencies treat their information holdings as a commodity to be used to generate revenue in the short-term. They assert monopoly control on certain categories of information in an attempt—usually unsuccessful—to recover the costs of its collection or creation. Such arrangements tend to preclude other entities from developing markets for the information or otherwise disseminating the information in the public interest. The world scientific and environmental research communities, and especially developing nations, are particularly concerned that such practices have decreased the availability of critical data and information. And firms in emerging information dependent industries seeking to utilize public sector information find their business plans frustrated by restrictive government data policies and other anticompetitive practices.

Recent economic research and initiatives at the European Commission, UNESCO, and the OECD, as well as in individual countries, are examined with a view towards creating an international framework for open and global data sharing.

LIN Xin, Ministry of Science and Technology, A Framework of the Legislation System for Open Access to Scientific Data

Institutional and Management Issues

Dr. Raymond McCORD, Oak Ridge National Laboratory, U.S., Managing the Impacts of Change on Archiving Research Data

The archiving of scientific data and information is made more difficult by the evolving changes associated with research accomplishments. Research discoveries lead to a continual series of revisions to sampling schemes, measurement methods, and scientific objectives. All of these changes add to the scope and complexity of information that must be recorded and logically organized as part of a successful data archive. Recording and communicating these changes for future data users is facilitated by additional supporting information and a futuristic evaluation of the rules that define the information. Most of the available information technology (hardware, software, and implementation methodology) originates from business applications, which are designed to accommodate fundamentally different patterns of change. Managers of scientific data archives will need to adapt the traditional designs of information systems to meet these special features of research data and its users. Management needs to encourage extra effort during initial design and later operations to accommodate the future changes that will occur.

Local and Regional Issues

Prof. Menas KAFATOS, School of Computational Sciences, George Mason University, U.S., Local and Regional Earth System Science Applications and Associated Infrastructure: The Mid-Atlantic Geospatial Information Consortium

The VAccess/Mid-Atlantic Geospatial Information Consortium (MAGIC) is a federated consortium of seven universities or educational institutions supported by the National Aeronautics and Space Administration (NASA). It focuses on the use of Earth observing and other data for local and regional applications that cut across different focus areas. These include public health, carbon sequestration, regional effects of climate change, severe weather and other natural disasters, urban and watershed pollution, and wetlands. MAGIC also includes the requisite infrastructure, such as an HRPT antenna for real-time monitoring of tropical storms, metadata search engines, standards, spectral libraries and grid-based technologies for seamless data and metadata access. The open-source and federated nature of the data presents interesting data access challenges.

WANG Qinmin, Fujian Political Consultative and Department of Science and Technology, Fujian Province, Digital Fujian

Session 5: Breakout Panel Discussions on Crossdisciplinary Issues

5-1: Policy and Legal Issues

This discussion will expand on the issues examined in Mr. Weiss’ previous presentation—Borders in Cyberspace: Maximizing Social and Economic Benefit from Public Investment in Data. It will address additional details not covered in the first presentation.

Jerome Reichman, Duke Law School, U.S., and Paul Uhlir, U.S. National Academies, Global Trends to Restrict Access to Data from Government-Funded Research

Scientific data produced from government-funded research constitute a fundamental element of the modern research infrastructure and, if well managed, can greatly accelerate scientific progress at the national and international levels. Newly emerging possibilities for greatly enhancing this role of scientific data resources in the digital environment truly constitute another “endless frontier.” High-level policy attention is necessary in order to maximize the inherent value of data collections and to minimize the negative effects of forces that can undermine the full realization of the value of those data.

In this first presentation we review the many economic, legal, and technological pressures that have been placed already on public-domain scientific data throughout the world. From an economic perspective, the trends to privatize the government’s public-good functions and to commercialize more of the academic sector’s research activities have been underway for over two decades throughout the world. While these trends can support significant research advances and economic benefits, they are not without costs. Moreover, we see a further continuation of such privatization and commercialization of upstream public-domain resources as potentially having greater associated costs than benefits.

In addition, recent changes to the international copyright law regime and the widespread use of licensing contracts of adhesion in commerce, as well as exclusive licensing agreements on onerous terms for research tools and technologies in academia, are further diminishing the availability of public-domain data in science. Moreover, these highly protectionist legal mechanisms are increasingly enforced by more effective digital rights management technologies. Such developments are intensifying the tensions that already exist between the policies that favor sharing of scientific data and the perceived need to restrict access to and uses of data in pursuit of increased commercial opportunities. Restrictions on access to and dissemination of potentially sensitive research data and information are further constraining the availability of substantial amounts of material in the public domain. Finally, the recent enactment of a powerful new database protection statute in Europe and proposals for equivalent legislation in the United States and in other countries might be expected to push these tensions into other areas of public research that up to now have been less affected by the proprietary pressures from the commercialization and privatization trends.

John Willinsky, University of British Columbia, Canada

With both policy and legal issues, there is a rising sense that research and scholarship fall within the general public’s right to know, which supports initiatives for increasing and opening access to research. To begin with the policy issue, there are two aspects to consider: government policies can determine how scientific knowledge circulates, and policies are themselves affected by the research that is consulted in their formation. While my work has been largely on policymakers’ access to research, shifts are taking place in policies affecting science, and these shifts are motivated by the basic human right to know, as recognized, for example, by the U.N. Universal Declaration of Human Rights. They also are motivated by greater demands for accountability and transparency in the government’s administration of funding in areas such as government research grants. In Canada, for example, one of the principal granting councils for the social sciences and humanities is transforming itself into a “knowledge council,” which gives a high priority to the public impact and awareness of research. Now, in terms of our study of policymakers’ actual use of research, online access is having a substantial impact, increasing the amount of research consulted, even as the policymakers are largely restricted to “open access” or free materials due to budgetary restrictions and the limited number of subscriptions held. Online access has also expanded policymakers’ circle of consultation, as they are relying less on a small set of academics to advise them. The role of research in policymaking has recently been challenged in the United States, as the Union of Concerned Scientists has taken the government to task for a misreading of the literature in setting standards for the environment. Greater public access to the research literature would seem to support more informed and rational policymaking.

In terms of legal issues, two pertinent areas of law are Freedom of Information (FOI) and copyright. In the United States, for example, there are measures afoot to bring federally funded scientific research within the purview of FOI, while charges have recently been laid in New York against a major drug company for suppressing research unfavorable to its medication. These are also instances of rising public expectations of science: that people have a right to know what is known. The copyright issue with scholarly publishing is occasionally portrayed as a matter of protecting authors from plagiarism, although open access considerably reduces the likelihood of getting away with plagiarism. The more substantial legal issue concerns the basic principle of copyright, namely, to protect the interests of the author and the public. Here a new argument can be introduced, pointing to how open-access publishing, made possible by online technologies, best serves the interests of the author and the public, and suggesting that publishing models that depend on subscriptions and copyright control infringe unnecessarily on both the author’s and the public’s rights. In short, increasing access to research has much to contribute to the policy and legal issues entailed in scholarly publishing.

5-2: Institutional and Economic Issues

Dr. Belinda Seto, National Institute of Biomedical Imaging and Bioengineering, Data Sharing Policy of the National Institutes of Health

Data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health. The National Institutes of Health (NIH) endorses the timely sharing of research data to serve these and other important scientific goals, particularly research data from NIH-supported studies for use by other researchers. This presentation will outline the guidelines associated with the recently published NIH Data Sharing Policy pertaining to final research data, and will discuss as a case study the evolution of expectations and requirements for pre-publication data sharing associated with the Human Genome Project and genetics-related research data. Resources supported by the NIH National Library of Medicine that are available to assist investigators in the sharing and accessing of research data and information will also be highlighted.

Roberta Balstad Miller, Center for International Earth Science Information Network, Operating a Twenty-first Century Data Center

In the past several decades, large-scale data resources have assumed an increasingly larger role in scientific research, particularly research on Earth and its environment. There are a number of reasons for this, including advances in computational technologies, software, and observational capabilities, and a growing emphasis on empirical and interdisciplinary research. One of the consequences of the increasing dependence on data resources across fields of science is that in the coming century, the scientific community must devote a larger share of its resources and energies to data management and preservation than it has in the past.

A recent Priority Area Assessment on Scientific Data and Information, presented to the strategic planning committee of the International Council for Science (ICSU) earlier this month, emphasized that the scientific community needs to develop strategies for data management over time periods of decades to centuries. The report also stressed the critical importance of professional management of data. That is, it is no longer sufficient for the scientists who analyze scientific data to be responsible for managing those data; professional data managers, with professional expertise, are needed. Finally, the report recommended that there be a common international approach to data and information management. I will look at each of these points in terms of my own experience in running a data center, the Center for International Earth Science Information Network (CIESIN) at Columbia University.

  1. Planning for long-term data management
     1. The role of data centers vs. archives
     2. The importance of obtaining regular scientific advice on data and archiving decisions
     3. The need for long-term financial support for data center and archival operations
  2. Professional data management
     1. Technological drivers of data center operations
     2. Financial implications: hardware, software, and training
     3. Rapid pace of change and the need to update data, software and hardware
     4. Need for career incentives and rewards for data production
  3. Common strategies, standards, and software
     1. Benefits of an international approach to scientific data and observations
     2. Need for software interoperability

Ted Bergstrom, University of California at Santa Barbara

The technology of the production and distribution of information is very different from that of producing and distributing consumer goods like automobiles or shoes. The marginal cost of providing an automobile or a pair of shoes to one more consumer is about the same as the average cost per consumer of producing these goods. In contrast, while there are costs to gathering or creating information, the marginal cost of supplying this information to an additional consumer is small. In fact, with electronic access this marginal cost is almost zero. Because of this simple technological fact, efficient pricing methods are very different for information goods than for ordinary physical goods. Prices for ordinary commodities serve the dual purpose of repaying the producers for the costs they have incurred and of restricting consumers from consuming goods that are worth less to them than the costs to others of providing these goods. With information goods this does not happen. The social cost of allowing an extra reader online access to information is almost zero. Thus it will not be possible to repay the costs of producing information in the first place by pricing at marginal cost. If, on the other hand, prices are set high enough so that revenue from users will repay the total supply cost of the information, then some potential users will be excluded from access to this information even though they would benefit.
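The pricing dilemma described above can be made concrete with a small numerical sketch. All figures below (the fixed cost, the zero marginal cost, and the readers' valuations) are hypothetical values chosen only for illustration; none of them come from the abstract.

```python
# Illustrative sketch only: every number here is an assumption, not source data.
FIXED_COST = 600.0     # one-time cost of gathering or creating the information
MARGINAL_COST = 0.0    # cost of serving one more online reader (assumed to be zero)
# Hypothetical willingness to pay of six potential readers.
VALUATIONS = [400.0, 300.0, 250.0, 150.0, 50.0, 20.0]


def outcome(price, valuations=VALUATIONS):
    """Return (readers served, producer revenue) at a single uniform price."""
    buyers = [v for v in valuations if v >= price]
    return len(buyers), price * len(buyers)


def break_even_price(valuations=VALUATIONS, fixed_cost=FIXED_COST):
    """Lowest uniform price whose revenue covers the fixed cost, or None.

    If exactly k readers share the cost, each must pay fixed_cost / k, which
    only works when the k-th highest valuation is at least that share.
    """
    ranked = sorted(valuations, reverse=True)
    shares = [fixed_cost / k for k in range(1, len(ranked) + 1)
              if ranked[k - 1] >= fixed_cost / k]
    return min(shares) if shares else None


served_mc, revenue_mc = outcome(MARGINAL_COST)   # everyone reads, nothing is recovered
price_be = break_even_price()                    # cheapest cost-recovering price
served_be, revenue_be = outcome(price_be)        # some readers are now excluded
excluded = [v for v in VALUATIONS if MARGINAL_COST < v < price_be]

print(f"marginal-cost pricing: {served_mc} readers, revenue {revenue_mc}")
print(f"break-even pricing at {price_be}: {served_be} readers, revenue {revenue_be}")
print(f"excluded readers who would still have benefited: {excluded}")
```

Under these assumed numbers, pricing at marginal cost serves every reader but recovers none of the fixed cost, while the lowest cost-recovering price shuts out readers whose valuation exceeds the near-zero cost of serving them.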

I intend to discuss the performance of alternative institutions for provision of scientific information that have been prominent in the academic community, with a particular focus on academic journals, along with a short history of the development of these institutions over the last century. These include non-profit scientific societies, university presses, government agencies, and for-profit publishers. I also discuss the benefits and costs of the non-profit versus the for-profit business model and the practice of some of the “non-profit” societies of using profits from their journals to subsidize other activities.

5-3: Management and Technical Issues

Dr. Raymond McCORD, Oak Ridge National Laboratory, U.S., Managing the Impacts of Programmatic Scale and Enhancing Incentives for Data Archiving

The structure and content of scientific data and information can be very complex. Successful archiving of data requires that the variation in this complexity be minimized. The efficient operation of information systems and effective communications with future data users are enhanced by minimizing the variation in the logic, concepts, and keywords used in the metadata. Some of the complexity is inherent to the variety of measurements and materials included in the research and cannot be avoided. Additional complexity occurs as archived information is aggregated into more extensive systems and accessed by broader user communities. This presentation will address management and institutional issues that must be considered to avoid unnecessary complexity and uncertainty in the archived information. The impacts of many dimensions of programmatic scale (volume, diversity, longevity of data and research programs) will be considered. Institutional impediments and incentives also affect the willingness of scientists to contribute information to archives. The documentation and archiving of data should be integrated with the publication process as part of the “modern scientific method” and should receive similar incentives. Management should reinforce these practices by insisting on early planning for data archiving and providing specific rewards for these activities. Other management issues include protecting initial discovery opportunities, supporting long-term stewardship of data (answering questions after the project is completed), and providing “cross training” of archive personnel in both scientific and information disciplines.

Anne Linn, U.S. National Academies, Involving the Private Sector in the Environmental Enterprise

Addressing environmental (atmosphere, ocean, land) issues requires expensive observations taken from all parts of the world and participation by the government, academic, and private sectors. Ten years ago, the roles of these sectors in the United States were straightforward: government agencies collected observations and disseminated information to the public; academia used government-collected data for research; and private sector organizations used government-collected observations for developing information products targeted to paying customers. However, advances in science and technology have made it possible for private sector organizations and academia to perform many government tasks, including collecting data, running models, and disseminating information. These overlapping activities create potential inefficiency and friction, especially because the sectors have different goals and require different data policies. U.S. government agencies and environmental science researchers require full and open access to data (i.e., data are available without restriction for any use for no more than the cost of reproduction). On the other hand, private sector organizations must restrict access to data in order to generate a financial return. Two recent National Research Council reports provide guidelines for resolving these different data policy requirements and for easing friction between the sectors. These reports found that it is counterproductive to establish rigid boundaries between the sectors or to prescribe what each sector should do. Although decisions have to be made on a case-by-case basis, the private sector’s greatest contributions to the environmental enterprise are likely to be the development and distribution of value-added products and the collection of certain observations.

Feng Zukang, Protein Data Bank, U.S., Data Integration and Management: A PDB Perspective

The Protein Data Bank (PDB) is the single worldwide repository for the structures of biological macromolecules and currently contains over 25,000 entries. The PDB provides a rich history from which to explore the practices of biological data management, as it contains a data set that has many characteristics found in other biological data—diversity, complexity, variable quantity and variable quality of annotation.

A database resource is only as good as the data it contains. Data quality concerns data representation, acquisition, annotation, and distribution. The data representation used by the PDB is the macromolecular Crystallographic Information File (mmCIF) standard, defined by the mmCIF dictionary. The mmCIF dictionary contains the notions of relations (categories), attributes (data items), primary and secondary keys (mandatory data items), and so on. Although the dictionary was written in STAR (Self-defining Text Archival and Retrieval) format, the ontology and its derivations are independent of STAR or any other particular file format. It can be automatically converted to XML DTD/Schemas, SQL Schema, CORBA IDL and so on. The PDB has been actively involved in various aspects of automated and accurate data acquisition, annotation and distribution. It provides integrated software systems for building robust automated data pipelines.
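To illustrate the category and data-item structure described above, the following is a minimal sketch that groups simple `_category.attribute value` lines by category. The sample block and its values are invented for illustration, and the function name `parse_simple_items` is hypothetical. The parser deliberately ignores `loop_` tables, quoted strings, and multi-line values, so it is not a real mmCIF reader and is not the PDB's own software.

```python
from collections import defaultdict

# Invented sample in the mmCIF "_category.attribute value" style.
# The entry name and cell values are hypothetical.
SAMPLE = """\
data_EXAMPLE
_entry.id          EXAMPLE
_cell.length_a     58.39
_cell.length_b     86.70
_cell.angle_alpha  90.00
"""


def parse_simple_items(text):
    """Group single-valued data items by their category (relation).

    Assumes one item per line; loop_ tables, quoting and multi-line
    values are out of scope for this sketch.
    """
    categories = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip the data_ block header, comments and blank lines
        parts = line.split(None, 1)
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # "_cell.length_a" -> category "cell", attribute "length_a"
        category, _, attribute = tag[1:].partition(".")
        categories[category][attribute] = value
    return dict(categories)


items = parse_simple_items(SAMPLE)
for category, attributes in sorted(items.items()):
    print(category, attributes)
```

Because each category behaves like a relation with named attributes, the same grouped structure could in principle be mapped to an SQL table or an XML element, which is the kind of format independence the abstract attributes to the dictionary.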

Biological data management concerns more than just the technical aspects; there are sociological and political issues as well. A key element for success is good communication among those running the resource, who need to have diverse skill sets, and among every member of the team and the communities they represent. Community feedback must be treated seriously and lead to a prioritized set of action items to be addressed with the resources available. The technology must take advantage of the latest innovations in hardware and software. These technological developments, however, must be introduced so as to enable and not disrupt the users of the resource. It is critical to maintain an interactive dialog with the user community about desired new functionalities and the feasibility of their implementation. Beyond all else is the need for good data and a robust data representation that is flexible enough to meet the needs of a changing science.

Helen Doyle, Public Library of Science, Launching a New Open Access Journal

The rapid spread of the Internet and concurrent innovations in electronic publishing have dramatically transformed scientific publishing and the dissemination of technical information and data. Electronic publishing has spawned an entirely new economic model for the sharing of scholarly research, just as the Internet has transformed other areas of commerce and information management. This model, known as open-access publishing, is predicated on the fact that an article published on the Internet can be read by ten readers or ten million with virtually no additional cost to the publisher. Many new open access publishers, including the U.S.-based Public Library of Science and UK-based BioMedCentral, generate revenue through one-time publication charges, which are generally paid from the author’s research grant or from institutional or library funds, rather than by charging subscriptions or fees for readers. Covering the cost of publication once through publication charges then allows the content to be made free to all readers.

New open access journals face many challenges, just as any new journal trying to compete with the most prestigious ones in their field would. In addition to questions about the viability of the open access economic model, these challenges include: convincing authors to submit their best work; establishing a rigorous peer review system and respected editorial board; creating a high-quality production; and marketing the journal to the appropriate audiences. Conversely, there is also great opportunity for new electronic open access journals to be innovative in their publishing practices: managing manuscripts and the peer review process electronically; developing interactive web functionalities for different content and audiences; and encouraging the creative re-uses of the content allowed by a more liberal copyright license. These challenges and opportunities will be discussed using the Public Library of Science’s new open access journal, PLoS Biology, launched in October 2003, as an example.

Demonstrations of Open-Access Initiatives

Open Journal Systems: A Live Demonstration of Open-Source Journal Management Software, John Willinsky, University of British Columbia, Canada

One element that is critical to increasing access to research and scholarship is the effective use of the Internet to reduce publishing costs while improving publishing quality. To illustrate how that can be done, this demonstration will feature an example of open-source software designed specifically for that purpose. Using a demonstration journal set up on the Web for this purpose, Open Journal Systems (OJS) will be used to illustrate how such software works (http://pkp.ubc.ca). Developed by the Public Knowledge Project at the University of British Columbia, OJS is a highly flexible editor-operated software that establishes a Web site for the journal that is used to manage all of the editorial and publishing processes, as well as indexing each article and establishing an archive for the journal. At this point over 30 journals from around the world have installed OJS on an association, department or university library server, and are using it to manage and publish journals in three languages. This real-time demonstration of OJS will establish how it can reduce the time and energy devoted to the clerical and managerial tasks associated with editing a journal, while improving the record-keeping and efficiency of editorial processes. This demonstration will also cover how OJS supports the entire range of online journal publishing processes, from setting up the journal website, through the author’s submission to the peer review, editing, publication, archiving, and indexing of the journal. OJS also helps to manage the people aspects of organizing a journal, including keeping track of the work of editors, reviewers, and authors, notifying readers, and assisting with the correspondence. It will point out how OJS can improve the scholarly and public quality of journal publishing through a number of innovations, from making journal policies more transparent to improving indexing.

International Network for the Availability of Scientific Publications, Pippa Smart, INASP, U.K.

It has been recognised that research and access to research results play a vital role in the development of all countries, and that transitional and developing countries often cannot obtain and make use of research which would benefit them. In response to this "information divide" the International Network for the Availability of Scientific Publications (INASP) was established in 1992 as a programme of ICSU to facilitate networking, to promote awareness of information development and to improve access to information. INASP now performs a range of activities to enhance the sharing and use of information, particularly using online technology. INASP works with a range of communities including researchers, publishers, librarians and health workers, and works closely with relevant organisations and associations. In response to requests for support and assistance, the Programme for the Enhancement of Research Information (PERI) was launched in 2002. This programme provides (1) access to global information, (2) increased visibility for local publications, (3) training in the use and management of online information, (4) support for publishers and editors, and (5) research and networking support. There continue to be a number of challenges for information access, and INASP is continually developing its activities to respond to requests from partners and to develop sustainable methodologies.

Session 6: Crossdisciplinary Issue Reports

This plenary session will consist of rapporteur reports from the three main crossdisciplinary areas discussed in Session 5 above.

Session 7: Future Opportunities

Shuichi Iwata, University of Tokyo and President, CODATA, Future Role of CODATA

The future role of the Committee on Data for Science and Technology (CODATA) is discussed after a brief review of its history, objectives, and biggest impacts in the last 5 years. In addition to traditional activities such as meetings, workshops, and publishing for improving the quality, reliability, management, and accessibility of data of importance to all fields of science and technology, CODATA is now working to enhance such international activities by (1) project-oriented approaches to highlight models to follow up on fruitful ideas created and proposed in the traditional activities; (2) articulating issues in global access to scientific and technical data and identifying missions of CODATA through intensive commitments to global and societal activities, such as the World Summit on the Information Society (WSIS), as well as ones for science and technology; and (3) expanding human dimensions to enhance data flows beyond borders, disciplines, organizations, and generations, and to extract manifold values from data. As an expansion of current activities of CODATA, potential actions for the future are given as CODATA’s mandate in the 21st Century, and plans for collaboration with ICSU and other organizations, namely ICSTI and INASP, are reported as new challenges for CODATA to strengthen the “human dimension.”

Prof. Liu Chuang, Institute of Geographic Sciences and Natural Resource Research, Chinese Academy of Sciences and Co-Chair of the CODATA Preservation and Archiving Task Group, Future Role of CODATA’s Task Group on the Preservation and Archiving of S&T Data in Developing Countries

Liu Yanhua, Ministry of Science and Technology of China, Summary Remarks

Session 8: Breakout Panel Discussions on Thematic Issues

8-1: Life Sciences and Public Health Data

Jerome Reichman, Duke Law School, U.S., and Paul Uhlir, U.S. National Academies, A Contractually Reconstructed Research Commons for Scientific Data in a Highly Protectionist Intellectual Property Environment

If the economic, legal, and technological pressures on public-domain scientific data that were discussed in our first presentation continue unabated, they will undermine some of the potential benefits of those data to scientific research and impose opportunity costs across the entire research enterprise. These pressures, which are especially pronounced in the area of biomedical research, could elicit one of two types of responses. One is essentially reactive, in which the public scientific community continues to adjust as best it can on an ad hoc basis, without organizing a response to the increasing encroachment of a commercial and proprietary ethos on data produced by government-funded research. The other would require a science policy response to the challenge by formulating a strategy that would enable the scientific community to take more active control of its basic data supply. The idea is to reinforce, by voluntary means, a public space in which the data sharing ethic in public science can be promoted and insulated from some of the excessive privatization and commercialization trends, without necessarily impeding socially beneficial commercial opportunities. In this presentation we will review some institutional and legal approaches that are now being considered in the United States and Europe, and that the Chinese science policy community might consider as well in addressing this challenge in biomedical and all other types of publicly funded research.

Dr. Belinda Seto, National Institute of Biomedical Imaging and Bioengineering, U.S.

The National Institutes of Health (NIH) launched the Roadmap for Medical Research initiative in 2003. The roadmap is a framework of priorities, a vision for a more efficient, innovative and productive research system, and a set of initiatives that are central to extending the quality of healthy life for people in this country and around the world. Among the research priorities of the NIH roadmap is the reengineering of the clinical research enterprise. The reengineering efforts consist of multiple components, including integration of clinical research networks, in part to advance data sharing goals.

The National Institute of Biomedical Imaging and Bioengineering, a component of the NIH, is addressing specific issues in the clinical research network initiative. Barriers to creating a successful network can include fundamental differences in the informatics infrastructure and communication tools used at various research sites. To the extent that commonalities can be implemented and data and tools shared, studies can be initiated more quickly. From the perspective of the Institute, there is a need to create imaging databases and repositories where researchers can access such data. However, database access and data mining require more user-friendly informatics tools. Approaches that combine images, genomic, gene expression, and patient medical records data will ultimately deliver patient-specific information at a time and place where clinical decisions are made regarding risk, diagnosis, treatment, and follow-up.

The overall strategy involves the development and standardized validation of application specific software for integration and knowledge extraction of heterogeneous clinically relevant data. Specific issues are:

  • Quantitative data integration, knowledge extraction, and clinical interpretation,
  • Linking imaging and other databases with software tools,
  • Managing software in the scientific and clinical workflow,
  • Industry-academic partnerships for software development and dissemination,
  • Database development specifically for software validation and regulatory approval,
  • Standards related to inter-operability of imaging and other databases and including results of quantitative analysis of metadata.

Feng Zukang, Protein Data Bank, U.S., Protein Data Bank: A Key Biological Resource

The Protein Data Bank (PDB) is the single international repository of three-dimensional data for biological macromolecules and currently contains over 25,000 entries. The PDB began during the late 1960s and early 1970s with community discussions about the need for such a resource. Protein crystallography was still in its infancy, but it was apparent to the producers of these structures as well as the potential users that every structure contained valuable information that needed to be archived and maintained for posterity. In June 1971, the two communities attended the Cold Spring Harbor Symposium on Quantitative Biology and agreed that the time was right to create the PDB. In October 1971, the PDB was established at Brookhaven National Laboratory as an archive for biological macromolecular crystal structures. In the 1980s the number of deposited structures began to increase dramatically. This was due to improved technology for all aspects of the crystallographic process, the addition of structures determined by nuclear magnetic resonance (NMR) methods, and changes in the community's views about data sharing. By the early 1990s the majority of journals required a PDB accession code, and government funding agencies had adopted the guidelines published by the International Union of Crystallography (IUCr) requiring data deposition for all structures. The archive's growth has been accompanied by increases in both data content and the structural complexity of individual entries over the years.

In October 1998, the management of the PDB became the responsibility of the Research Collaboratory for Structural Bioinformatics (RCSB). In general terms, the vision of the RCSB is to create a resource based on the most modern technology that facilitates the use and analysis of structural data and thus creates an enabling resource for biological research. Our mission is to provide the most accurate, well-annotated data in the most timely and efficient way possible to facilitate new discoveries and advances in science.

Dr. James Edwards, Global Biodiversity Information Facility, Denmark, Open Access to Scientific Data on Biological Diversity: an Urgent Need for China

Biological diversity (also called biodiversity) is generally divided into three categories: genetic-level diversity, species-level diversity, and ecosystem-level diversity. This presentation will focus on the two latter categories.

China’s species-level biodiversity is immense. For example, it contains about 30,000 different plant species, and nearly 500 mammal species. In fact, it has been estimated that the 15 so-called “megadiverse” countries, which include China, contain 70% of the world’s species of plants and animals within their borders. However, unlike some of the other megadiverse countries—for example Costa Rica and Mexico—China has not initiated systematic efforts to develop computerised databases about its biota, or to access the great wealth of biodiversity information about China that is contained in the world’s natural history collections. As a result, China cannot currently use this considerable body of knowledge for informed decision making, land management, or research.

The Global Biodiversity Information Facility (GBIF) is an international consortium aimed at making the world’s biodiversity data openly available over the Internet. Begun in 2001, GBIF’s members include 40 countries and 24 international organisations, each of which agrees to set up a computer node to share primary biodiversity data. Control of the data, including the decision on what information to make available, resides with the data providers in each country or organisation. GBIF’s role is to aid the data providers in setting up their databases and to provide a portal (www.gbif.net) that allows users to search all the databases at once.
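
The division of labour described above, in which each provider keeps control of its own data while a portal searches all providers at once, can be sketched roughly as follows. This is only an illustration in Python: the function names, the record fields, and the sample records are hypothetical, and the sketch does not represent GBIF's actual software or protocols.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Provider = Callable[[str], List[Record]]


def make_provider(name: str, records: List[Record]) -> Provider:
    """Wrap a provider's own records behind a search function.

    The provider decides what it holds and exposes; the portal only
    ever sees what the search function returns.
    """
    def search(species: str) -> List[Record]:
        return [dict(r, provider=name) for r in records if r["species"] == species]
    return search


def portal_search(providers: List[Provider], species: str) -> List[Record]:
    """Query every provider in turn and merge the results."""
    results: List[Record] = []
    for search in providers:
        results.extend(search(species))
    return results


# Hypothetical sample data, invented purely for illustration.
museum_a = make_provider("museum_a", [
    {"species": "Ailuropoda melanoleuca", "country": "CN"},
    {"species": "Panthera tigris", "country": "IN"},
])
herbarium_b = make_provider("herbarium_b", [
    {"species": "Ginkgo biloba", "country": "CN"},
])
museum_c = make_provider("museum_c", [
    {"species": "Ailuropoda melanoleuca", "country": "CN"},
])

hits = portal_search([museum_a, herbarium_b, museum_c], "Ailuropoda melanoleuca")
for hit in hits:
    print(hit["provider"], hit["species"], hit["country"])
```

A real portal would query remote nodes over the network and would have to handle differing data formats and unavailable providers; the sketch leaves all of that out.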

Currently, the GBIF data portal is serving nearly 24 million records containing information about specimens in natural history collections, as well as observational data. These records are being served by 63 data providers from around the world. Even though China is not yet a member of GBIF, the portal already contains more than 45,000 records of plants and animals, representing more than 9,000 species that were collected in China.

The kind of data being served by GBIF can be a very valuable resource for many scientific and societal problems, including tracking invasive species, predicting the spread of emerging infectious diseases, optimal design of protected areas, and even making decisions about where to undertake field trials of genetically modified crops. Some innovative examples of how other megadiverse countries have used these data will be presented.

China has been much more successful at developing and archiving ecosystem-level biodiversity data. The Chinese Ecological Research Network (CERN) is a consortium of 33 field research stations and one synthesis centre. The Network was established in 1988, and currently provides access to a wide range of ecological and environmental data, including more than 3000 historical datasets (www.cern.ac.cn). CERN has also developed a comprehensive Data Sharing Policy. It is to be hoped that China will give similar attention to developing and archiving species-level biodiversity information.

8-2: Earth Sciences, Environmental and Natural Resources Data

Dr. Raymond McCord, Oak Ridge National Laboratory, U.S., Special Considerations for Archiving Data from Field Observations

Scientific data from field investigations are fundamentally different from laboratory observations. Laboratory studies are conducted under controlled conditions, whereas field observations are collected from incompletely controlled environments. Data archives for field observations must include additional design features to accommodate, but minimize and rationalize, this additional complexity. Special techniques are needed to accommodate the following events:

  • Revisions of geographic descriptions, and changes and corrections to spatial coordinates and place names;
  • Temporary adaptations in measurement methodologies;
  • The occurrence of unmeasurable parameters (for example, no water sample from a dry well, or biological populations too large to count); and
  • Continually growing and evolving reference lists for measurements (for example, taxonomic lists for biological communities, or chemical constituents found in soil and water samples).

Adoption of standardized codes from discipline-specific reference lists is very helpful. Standardized conventions for recording unusual measurements may also be defined for some disciplines. The additional complexity of field observations can be successfully managed in a data archive with good planning and by the elimination of spurious variations in metadata codes.
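
One way to picture the points above is a record structure that stores an explicit status code alongside each value and checks parameter names against a reference list. The following Python sketch is an assumption-laden illustration: the `Status` codes, the `PARAMETER_CODES` list, and the sample well records are all hypothetical and do not come from any particular archive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """Hypothetical standardized codes for how a measurement turned out."""
    MEASURED = "measured"
    NO_SAMPLE = "no_sample"        # for example, a dry well
    TOO_NUMEROUS = "too_numerous"  # for example, a population too large to count


# Hypothetical reference list; in practice this list grows and evolves.
PARAMETER_CODES = {"NO3": "nitrate in water", "PH": "pH of water"}


@dataclass(frozen=True)
class Observation:
    site_id: str
    parameter: str
    status: Status
    value: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject spurious variations in codes at the point of entry.
        if self.parameter not in PARAMETER_CODES:
            raise ValueError(f"unknown parameter code: {self.parameter}")
        if self.status is Status.MEASURED and self.value is None:
            raise ValueError("a measured observation needs a value")
        if self.status is not Status.MEASURED and self.value is not None:
            raise ValueError("an unmeasured observation must not carry a value")


def mean_measured(observations: List[Observation]) -> Optional[float]:
    """Average only real measurements; unmeasurable cases are skipped, not zeroed."""
    values = [o.value for o in observations if o.status is Status.MEASURED]
    return sum(values) / len(values) if values else None


records = [
    Observation("WELL-01", "NO3", Status.MEASURED, 2.0),
    Observation("WELL-02", "NO3", Status.NO_SAMPLE),
    Observation("WELL-03", "NO3", Status.MEASURED, 4.0),
]
print(mean_measured(records))
```

Keeping the status separate from the value means a dry well is never confused with a zero reading, and validating codes on entry is one simple way to keep spurious metadata variants out of the archive.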

Prof. Paul Richards, Columbia University, U.S., Uses of Seismic Data and the Importance of Open Access to Major Data Centers in Seismology

Scientists and engineers study seismic signals for four main reasons: (1) to make earthquake catalogs and bulletins (giving basic information for all other studies); and to study (2) earthquake hazard, (3) the physics of the earthquake source, and (4) the structure of the Earth's interior. The great progress in seismology in all these fields has been stimulated principally by the availability of better and better data, which can still be expected to improve in future decades.

China has increasingly excellent data sets of seismic waveforms, which will surely yield new insights into earthquake physics, tectonics, and the Earth's internal structure. New methods of locating earthquakes have recently been applied to limited datasets. They indicate the potential for China to produce one of the best bulletins of seismicity in the world covering a large region (more than 10,000,000 square km). Bulletins are a starting point for hazard management, as well as for scientific projects in the study of Earth structure and earthquake physics. But at present there are handicaps, in that station coordinates are not made easily available, and waveform data are accessible for only a very limited number of stations.

Experience in the United States indicates that data centers that are not open are rarely able to attract researchers who apply state-of-the-art methods of data analysis, and, in turn, such data centers find it difficult to maintain a high quality of operation. The best research is usually associated with centers that make their data openly and easily available. Users draw attention to errors that inevitably arise in the data, they find ways to correct the data, they share information about how to use the data center effectively, and they contribute to new ways to process the data. Information from users of the data is needed to provide guidance on data center management. From this perspective, it appears that an important part of providing international scientific leadership in seismology is making seismic data easily available to all interested potential users.

Raymond J. Willemann, GEM Technologies, U.S., Existing Infrastructure for International Exchange of Seismic Data

Knowledge of earthquake hazard can advance as a result of unrestricted sharing of seismic data, including seismic station information, bulletins, and waveforms. The infrastructure to arrange for and carry out international data exchange has existed and been used successfully for many years. The International Association for Seismology and the Physics of the Earth’s Interior (IASPEI) includes commissions to discuss specific arrangements and establish data format standards. The International Seismological Centre (ISC) collects, merges and redistributes seismic bulletin data. The Federation of Digital Seismic Networks (FDSN) helps broadband seismic networks to coordinate their activities. The United States and China help to fund and participate in each of these organizations. But the amount of data that the Chinese have been able to share lags far behind most other countries with similarly extensive earthquake monitoring. Two recent changes offer an opportunity to improve this situation. First, IASPEI recently passed a resolution urging all seismic networks to share information about all seismic stations. China might respond to this IASPEI resolution by starting to send the ISC a computer-readable bulletin each from the CEA that is complete for a network of several hundred seismic stations in China. Second, the FDSN has broadened the definition of membership to include many more regional and local networks of broadband seismic stations. In response, numerous provincial networks in China might join the FDSN. This would provide the networks both an opportunity to contribute data from selected stations to the FDSN archive and access to software and other assistance from the FDSN to establish their own data centers to distribute their own data on the Internet.

8-3: Scientific Information, Journals and Digital Libraries

Helen Doyle, Public Library of Science, An Open-Access Future

The open-access movement has gained momentum over the past several years, with increased visibility and recognition from the various stakeholder communities, including research and publishing communities. Since the Budapest Open Access Initiative began collecting signatures in February 2002, more than 3,500 individuals and organizations have signed on with their support for free access to information. The Directory of Open Access Journals at Lund University, which now contains over 1,100 journals, recently announced the launch of its second phase, allowing sophisticated searching of the full text of the Directory’s articles. The U.K.-based open-access publisher BioMedCentral now publishes over 100 open-access journals. Since its launch in October 2003, PLoS Biology, the first peer-reviewed, open-access journal of the Public Library of Science, has demonstrated remarkable strength as a competitive new journal: submissions are increasing; readership, measured as visits to the site and as downloads of individual articles, is increasing; and PLoS Biology’s reputation as a high-quality, peer-reviewed journal is growing among scientists, publishers, librarians, and other stakeholder groups.

In addition to a transformation of the economics of scientific publishing, open-access publishing also represents a modernization of traditional copyright laws that are based on an outdated print-based economic model. In the open-access definitions used in both the Bethesda Principles and the Berlin Declaration, an open-access article can be reused and redistributed freely and without permission from the publisher, for any responsible purpose. The authors retain their copyright. In the case of PLoS journals, the copyright license is the Creative Commons Attribution License, which preserves the authors' right to be acknowledged for the original work.

Many journals that are labeled open access are in fact free access, meaning that the restrictions on use and distribution are the same as for many subscription based journals. It is worth noting that researchers themselves virtually never benefit financially from publication of their peer-reviewed articles. Several recent policies that may appear to be a liberalization of subscription policies are in fact small concessions to the growing demand for greater access from the scientific community that produces the articles, concessions made at little economic risk to the publishers.

Sharing data, reagents, and ideas is fundamental to the scientific process itself. Fundamental to research is the incremental building up of knowledge, data, ideas, and experiments. Open-access publishing, both the unfettered distribution and searching afforded by on-line free access and the unlimited creative re-uses permitted by less restrictive copyright licenses, will facilitate the advance of science and medicine.

Prof. Ted Bergstrom, University of California at Santa Barbara, U.S.

Pippa Smart, International Network for the Availability of Scientific Publications, U.K.

The International Network for the Availability of Scientific Publications (INASP) was established by ICSU in 1992, to provide support to networking and partnerships with the aim of bridging the increasing information divide between the developed and developing world. It now operates a range of programmes to support access to information for researchers, health workers and rural development workers. Access to research information is facilitated through its Programme for the Enhancement of Research Information (PERI). This programme not only provides access to international research, but also facilitates activities to support national publications to increase their visibility and long-term sustainability. INASP recognises that research information is provided in many different forms (datasets, publications, etc.), but the scholarly journal remains one of the prime vehicles for accrediting and disseminating research information. It also appreciates that both national and international information is of importance to research. The Journals OnLine (JOL) project supports a methodology to enable national publications to have an online presence, to increase their visibility and promote communication with readers and authors. It has been particularly successful in Africa with the AJOL service, which now includes 184 journals and also supports full text online publishing. This service has been operational since 1998, and has recently re-launched on a new platform. The new software enables individual publishers to load, edit and correct their own content to further support development and use of online communication. The JOL methodology and software are available for other countries and regions to adopt for their own publications.

Lulama Makhubela, National Development Agency, South Africa, Information, Journals, and Digital Libraries: Can Developing Countries Become Key Players in the Information Society?

The connection between CODATA’s work in the area of preservation of and open access to scientific resources and the core business of the South African National Development Agency (NDA) should be understood within a context in which scientific knowledge in all domains (natural, social, and humanities) should lead to development. A critique of the availability, affordability, and accessibility of information, journals, and digital libraries in developing countries generally and South Africa specifically will be given. The central thesis of this presentation poses the fundamental question of whether developing countries will become key players in the Information Society. Aspects covered in the presentation focus on social exclusion, access to information and data, the centrality of language in knowledge diffusion, and the economics of information within the South African context in which the NDA operates. The presentation will suggest strategies for the preservation of and open access to data that will assist in addressing the plight of the poorest of the poor in our communities.

John Willinsky, University of British Columbia, Canada

The big question facing the circulation of scientific information is what will become of the emerging open access movement. At issue are a growing number of arguments in support of increasing access to research through a variety of open access models. As the last speaker in this panel, I will review those arguments for increasing access to scientific information that have not been raised by the previous panelists, with a thematic focus on how increasing access to knowledge is in the interests of both people and knowledge. I come to this as one who conducts research on scholarly publishing and who develops open-source publishing solutions for increasing access to research. The case for increasing access has components that can be roughly categorized as epistemological, historical, development, political, public, economic, and legal in nature, and is directed toward faculty members, librarians, policymakers, and the public.

The epistemological argument for open access, for example, has to do with how dependent a knowledge claim is on being fully open to review and critique. Anything that unduly restricts the circulation of knowledge, especially among “legitimate” participants in its construction, reduces that body of knowledge’s claims to validity and reliability. If the current subscription publishing model can be shown, as I believe it can, to contribute to declining levels of access, then that model is not what we might call epistemologically conducive to the development of knowledge.

This concern with exploring new publishing models leads to the historical argument, which draws on precedents from an earlier era of publishing innovation, using Isaac Newton as a leading instance. Newton is well known for being a highly secretive scientist and a reluctant author. Nevertheless, after he had tracked the birth and emergence of the scientific periodical, with the launching of the Philosophical Transactions in 1665, for only a few years, he understood that this publicly circulated, relatively inexpensive 16-page journal represented something important to science. He allowed one of his letters, on optics, to be published in the Transactions. It was a move he came to regret and did not repeat, but this early experience in open access went a long way in shaping what became the norms of the scientific article.

The development (or developing countries) argument for increasing access has everything to do with the parallel development in China of both economic growth and scientific papers published (with an increase by a factor of 10 since the 1980s). Developing countries are suffering a knowledge gap, even as their university population grows, and universities in the West are contributing to that gap, as their work becomes increasingly expensive to access (with some generous exceptions negotiated by INASP and the WHO). The public and political arguments for open access to research are about people’s basic right to know, especially in matters of publicly funded research and scholarship. The value of exercising that right is affirmed by the health revolution brought about by public access to medical information. Public access to research also speaks to the greater accountability demanded of professionals (physicians, educators, lawyers, etc.) and the increasing role of interest groups in selectively presenting information to the public, against which full access to the research would act as a safeguard. As time permits and questions arise, additional arguments having to do with the vanity of authors, the future of the library, and the economics of open access will be presented.
