Strategies for Preservation of and Open Access to Digital Scientific Data in China

The National Academies

Policy and Global Affairs Division

Board on International Scientific Organizations

U.S. National Committee for CODATA

in collaboration with

Chinese National Committee for CODATA

and

CODATA Task Group on Preservation and Archiving of S&T Data in Developing Countries

AGENDA

Background

As a major producer of scientific, technical, and medical (STM) data and information, and as a partner for international cooperative research, China potentially has a great deal to offer to the world’s knowledge base. Although China’s scientific and technical research capabilities are rapidly improving, significant problems remain with regard to its digital archiving and access policies that inhibit more rapid progress and improved international cooperation.

Since 2000, the U.S. National Committee for CODATA and the Chinese National Committee for CODATA have held a series of bilateral meetings with senior science officials and data managers from both countries to discuss various data management and policy issues (see http://www7.nationalacademies.org/usnc-codata/China_US_Data_Seminars.html). Particularly noteworthy in this regard was the high-level data access initiative announced in February 2003 by Xu Guanhua, the Chinese Minister of Science and Technology, and supported by the National People's Congress. The initiative, which was substantially attributed to the results of the bilateral CODATA meetings (private communication from Liu Chuang, 2002), includes "creating a law to ensure that scientific information is communicated more widely, and co-ordinating efforts by government departments to develop information centres and databases to facilitate the communication of scientific and technological information" (Jia Hepeng, "China urges its researchers to share data," SciDevNet, 14 March 2003). This new policy toward greater openness with publicly funded STM information appears to be part of a broader effort to liberalize the national public information regime, as evidenced by the current drafting of a Freedom of Information law that is expected to be completed in 2004.

A fourth bilateral CODATA meeting of data experts, on scientific resource sharing policy, was held in October 2003. This meeting laid some of the groundwork for this project, reconfirmed the commitment of the Chinese science policy community to promoting greater openness regarding Chinese research data and literature, and identified the priority areas for initial focus. The proposed workshop is intended to help support those advances.

Indeed, the effective long-term preservation of and open access to these information resources in all countries increases in importance as an essential component of the global public research infrastructure, which can now be integrated through the Internet. The challenges in storing and maintaining access to these growing collections of data and information are substantial, even in more economically developed countries. Moreover, although many of the challenges that require sustainable solutions are the same for digital data and information across all disciplines, others are distinct or unique to certain disciplines or types of information. Some solutions may be based on extending or emulating existing successful models, while others may benefit from entirely new approaches that are context-dependent.

China faces substantial hurdles in this regard. Although many of its STM data resources, and especially its journals, still reside in paper formats, China already has significant digital information preservation and access needs that in many cases are not being successfully addressed. Factual databases and the STM literature can provide an important research, economic, and policy tool for China, just as they do in more economically developed countries, for capacity building in science and education, for supporting sustainable development of commerce and industry, and for promoting good governance. Resolving the many different problems in the preservation and open archiving of digital S&T data and information successfully today will provide great benefits for future generations; the costs of inaction will be incalculable, but certain to be substantial. At the same time, it is important to recognize that even the most economically developed countries have encountered various difficulties with the preservation and open archiving of digital data and information, so it could be counterproductive to push for digital archiving in the Chinese context without careful consideration and sustainable long-term plans.

Intellectual Merit of the Proposed Activity

The proposed workshop in June 2004 in Beijing will build on the results of the bilateral CODATA meetings and on the new major scientific data access policy initiative noted above, as well as on the results of previous U.S. National Research Council/National Academies (and other) reports on various aspects of digital information preservation and archiving in different discipline contexts. The workshop will begin by describing some of the key scientific data and information resources that require the highest priority attention for long-term preservation and open access in China. It will examine various models of open archiving that might be adopted or adapted for use within the Chinese context on a realistic and sustainable basis. It will also provide some much-needed focus on these generally under-appreciated problems, and generate additional high-level attention by bringing together scientific and digital archiving experts, national science policy and funding officials, and representatives of public and private donor and development organizations, who will be able to integrate the results of this project into their future planning.

There are many difficult issues and factors that need to be addressed in successfully implementing open archives of digital scientific information resources in China, as in other developing countries. These include scientific and technical, institutional and management, legal and policy, and economic and financial aspects. The key issues and factors in all of these areas have been identified through the series of bilateral U.S.-Chinese CODATA meetings, through an informal data archiving subgroup of the USNC/CODATA between 2001 and 2003, in consultations with the CODATA Task Group on the Preservation and Archiving of Scientific and Technical Data in Developing Countries since October 2002, and in a data archiving workshop co-organized by the CODATA Task Group and the USNC/CODATA with the South African National Research Council in May 2002 in Pretoria. The proposed workshop will review and discuss these aspects and sustainable digital archiving models, summarized briefly below, from the Chinese perspective.

1. Scientific and technical aspects. Developing new repositories for scientific data and information that can be openly accessed and shared requires accommodating the needs and practices of different scientific disciplines, as well as encouraging the development of interdisciplinary research values and methods. For factual databases, differences in nomenclature and taxonomy exist within, as well as across, scientific communities and countries. In addition, names and concepts may change over time, and preserving data requires preserving historical contexts. Within a specific region there likely will be differences among the mandates and objectives of individual archives. Archives for different types of data (e.g., observational vs. experimental, physical science vs. biological science, human subjects or not) may have disparate procedures and metrics for data quality, and differing criteria for appraising value and selecting data for preservation. Digital archives of the STM literature present fewer discipline-specific problems than heterogeneous data collections, although they do raise other significant institutional and policy issues, as discussed further below.

In addition to the preservation and access issues related to the practices of different kinds of scientific research, the development of digital archives also depends largely on discipline-independent technology and infrastructure requirements. These include, among others: the development and adoption of metadata standards and practices appropriate to the different goals of preservation and access; flexible search and retrieval capabilities; technological and semantic interoperability; and appropriately accommodating the evolution in technology (hardware and software), as well as data and information collected in proprietary formats and commercial databases. Some of the technological issues of digital preservation are still challenging in the developed world, and they can pose especially difficult hurdles in China.

2. Institutional and management aspects. The institutional approaches that may be considered for open access repositories of STM data and information have been greatly expanded and enhanced as a result of increasing access to global digital networks. Institutional considerations fall into two broad categories: a) establishing open access data centers and journal repositories in developing countries to serve high-priority development objectives; and b) promoting access by users in the developing world to data resources and STM journals archived in OECD countries, with emphasis on information of particular relevance to development needs. Models to consider within category (a) include: (i) extending existing digital repositories for STM data and information that are already organized in the OECD countries, (ii) forming new repositories at the national or provincial level as an open or shared resource, and (iii) creating distributed networks of data and information centers or nodes throughout the country and internationally. Examples of these models and their particular applicability to the Chinese digital archiving context will be reviewed and discussed.

Establishing and maintaining open digital STM data and information archives requires the implementation of effective operational procedures and practices. The archives need to manage the selection, appraisal, and retention process. Other operational issues for digital STM archives include properly managing the volume of bits, which is enormous and growing, even in the developing world; coping with the diversity of sources, formats, and documentation; and maintaining a sufficiently long time horizon for access in the face of continually changing definitions, digital media and formats, and hardware and software obsolescence. Planning and developing requirements for open archives must accommodate the continual change and evolution in the practice of science; the local variability in focus, practice, and available technology and other physical and human infrastructure; and the differing mandates and objectives of various STM data and information producers, as well as a diversity of potential customers, including scientists, educational institutions, businesses, policy-makers, and ordinary citizens.

Effective management of open digital STM archives also depends on an investment in training and education. The dependency of science on technology changes the role of archiving. Properly documenting data and information for long-term preservation and access must become part of the daily practice of scientists. Promoting these changes to established practice requires collaboration among scientists, managers, educators, and the other archive customers. The challenges posed by the relatively complex management requirements for successful digital archiving must be addressed for all such functions.

3. Intellectual property and policy aspects. With regard to STM data resources, most research databases and data centers in China are managed directly by government ministries and subject to a relatively restrictive state information regime based on official secrecy requirements. This is a major challenge to the adoption of an open-access model, since these policies are founded on deeply rooted political, institutional, and cultural factors, some of which apply to the overall public information regime and some of which are exacerbated in the scientific context by perceived political or economic sensitivities of the subject matter (e.g., domestic disease statistics, biodiversity information, environmental degradation or resource exploitation, or the disclosure of many otherwise inexpedient facts). From a purely good governance standpoint, however, it is precisely for many of these reasons that many types of scientific data should be made openly available and usable, especially within the country, and not just for research purposes. As noted above, the recent high-level approval of the liberalization of the government's laws and policies regarding access to government-produced and government-funded academic research data has made this a very propitious time to examine the data access and archiving policies and practices in China.

The case for change in access policies to governmental STM data can be made at many levels, both internally and externally. The most effective approach is one based on the realization of self-interest, although a comparison with the policies of other countries can be effective if approached properly. Particularly auspicious is the trend over the past decade by many developing countries to adopt Freedom of Information laws (see, e.g., www.freedominfo.org), including especially the current FoI initiative in China. Of course, there are nonetheless legitimate public-policy reasons for limiting access to certain types of data, including well-justified national security restrictions, the protection of privacy and confidentiality, or private (as opposed to government) intellectual property rights.

A related problem exists in getting scientists to contribute the data produced in the course of their research to public repositories. Additional barriers may include: the lack of an appropriate data center in which to deposit the data, no requirement by the funding source to deposit the data or to share them openly, poor data management practices, insufficient recognition of the importance of these activities by the scientist’s institution and a lack of effective incentives or rewards, inadequate funding to prepare the data sufficiently to make them usable by others, and a lack of training to do so.

With regard to STM journals, these too are mostly published by government or government-sponsored organizations in China. Since they are meant to be read by the research community, they do not have many of the same official secrecy constraints as the underlying data. They do, however, have copyright protection and are still published almost exclusively in print form, so their open availability digitally raises new IP and policy issues for them.

4. Economic and financial aspects. Just as great a barrier to the establishment of well-managed digital repositories, let alone ones operated under an open access regime, is the lack of adequate funding. Digital data and journal archiving and distribution are not among the most pressing recognized priorities in China, despite the importance, and considerable potential contributions, of well-managed scientific information resources to research capacity building and to social and economic development. Yet creative and well-planned approaches need not cost much to make them work. Moreover, the potential social and economic returns from openly available data and information that are relevant to high-priority problems can more than offset the relatively small financial costs of operating the archives. Once this premise is accepted at the policy level, the pressure to recoup costs may be substantially alleviated. Establishing a digital archive requires developing public and private cost models and incentives to maintain the information. The optimal outcome is to eliminate, or minimize as much as possible, any user fees for those least able to pay for, but likely to exploit, the data and information, recognizing that any fees levied on Chinese researchers may pose an insurmountable barrier to implementing socially and economically beneficial uses. Different funding models will be examined in the workshop.

Broader Impacts of the Proposed Activity

As discussed above, the proposed workshop will explore in detail the various scientific and technical, institutional and management, legal and policy, and economic and financial aspects that need to be addressed in successfully implementing open archives of digital scientific information resources in China. The workshop will review and discuss these aspects and sustainable archiving models from the Chinese perspective. In addition, many of the issues identified and discussed during the workshop will likely apply throughout the developing world. One of the tasks of the workshop will be to identify follow-up activities that might be taken toward improving open access and preservation for digital scientific data and information selected for discussion, not only in China, but in other developing countries. The U.S. National Committee for CODATA and the CODATA Task Group expect to organize similar workshops in other developing countries, building on the experience from the Beijing workshop. Most important from the perspective of the U.S. scientific community, the policies and practices promoted for implementation in China may be expected to improve access to STM data and information resources in future cooperative activities between Chinese and U.S. researchers.

Planned Activities

This project will provide an international and interdisciplinary forum to promote a deeper understanding of, and requirements for, long-term preservation and open access to digital scientific information resources in China. A 2.5-day workshop will be held at the Chinese Academy of Sciences in Beijing on 22-24 June 2004, pursuant to the following statement of task:

  1. Identify research areas in which preservation of and open access to digital scientific information requires high priority attention in China, and provide the underlying rationales for the areas chosen.
  2. Identify and discuss the scientific and technical, institutional and economic, legal and policy, and management factors relevant to providing open access to digital scientific information resources (both data and STM literature), including an examination of different possible models and their potential benefits and shortcomings in China, and drawing on examples of other digital archiving and access regimes in related areas.
  3. Review and discuss the current status of access and archiving regimes for the types of scientific information identified in task 1.
  4. Identify follow-up activities that might be taken toward improving open access and preservation for each major type of digital scientific information selected for discussion in task 1, taking into consideration the results of the discussions under tasks 2 and 3.

The workshop will be organized jointly by the U.S. and Chinese National CODATA Committees and the international CODATA Task Group on the Preservation and Archiving of Scientific and Technical Data in Developing Countries. The Chinese Academy of Sciences, the Chinese Association for Science and Technology, and the Chinese Ministry of Science and Technology will provide local logistical support and assist in meeting planning. Finally, the International Council for Scientific and Technical Information will provide additional substantive expertise.

An international planning committee will be formed under the auspices of the U.S. National Academies with expertise in the scientific, technical, institutional, management, policy, and financial factors associated with digital archiving and open access approaches. The planning committee will be responsible for helping to develop the workshop program, identify speakers and topics, and generally oversee the planning activities leading up to the workshop. This work will be done by e-mail, fax, and phone. A local organizing committee consisting of individuals from the collaborating organizations for the workshop also will be formed to help with the local logistics, arrangements, and program development.

The workshop will have approximately 100 core invited attendees consisting of invited speakers, experts in the priority areas selected for discussion in task 1, government science officials and policy makers, and representatives of public and private donor and development institutions.

Workshop Summary

A summary of the workshop will be prepared by the National Academies. The report will be published in hard copy and online on the National Academies Press Web site, with links to the other collaborating organizations' Web sites. The report will be translated into Chinese.
