News

Criteria for the Trustworthiness of Data Centres

Jens Klump
Helmholtz Centre Potsdam German Research Centre for Geosciences
jens.klump@gfz-potsdam.de

doi:10.1045/january2011-klump

 

Abstract

The use of persistent identifiers to identify data sets as part of the record of science implies that the data objects are persistent themselves. Scientific findings, historical documents and cultural achievements are to a rapidly increasing extent being presented in electronic form, in many cases exclusively so. However, besides the invaluable advantages offered by this form, it also carries serious disadvantages. The rapid obsolescence of the technology required to read the information, combined with the frequently imperceptible physical decay of the media themselves, represents a serious threat to preservation of the information content. Since research projects only run for a relatively short period of time, it is advisable to shift the burden of responsibility for long-term data curation from the individual researcher to a trusted data repository or archive. But what makes a data repository trustworthy? The trustworthiness of a digital repository can be tested and assessed on the basis of a criteria catalogue. These catalogues can also be used as a basis to develop a procedure for auditing and certification of the trustworthiness of digital repositories.

Introduction

The rapid decay of URLs pointing to research resources was an important part of the motivation to use persistent identifiers instead of ephemeral URLs (see e.g. Wren, 2008; Lawrence et al., 2001). Surely, if we use persistent identifiers to identify digital objects as parts of the record of science these objects themselves need to be persistent and kept in long-term digital repositories and archives. How can the trustworthiness of a particular repository in a network of data repositories (e.g. DataCite data publication agents, World Data System, ESA Ground Segment, and others) be assessed?

In recent years, scientific findings, historical documents and cultural achievements have to a rapidly increasing extent been presented in electronic form, in many cases exclusively so. Besides the invaluable advantages offered by this form, it also carries serious disadvantages. In paper documents content and representation come together as one unit, whereas in digital formats the content is separate from its representation and requires additional information and technology for the user to access the information. However, the underlying technology is still undergoing further development at an exceptionally fast pace. The rapid obsolescence of the technology required to read the information, combined with the frequently imperceptible physical decay of the media themselves, represents a serious threat to preservation of the information content. This makes our digital assets particularly vulnerable. Given the tasks outlined above, only data centres prepared for long-term preservation can be considered to be trustworthy custodians of our digital heritage.

But what makes a data repository trustworthy? This paper will discuss the fundamentals of criteria catalogues for assessing the trustworthiness of an archive for digital research data and how these criteria can be transferred into audit and certification of research data repositories and archives.

 

Reference Model and Criteria Catalogues

In the project "Publication and Citation of Primary Scientific Data" (STD-DOI), which laid the conceptual and technical foundations for DataCite, the question arose how to assess the trustworthiness of digital repositories. At the same time other groups started to investigate the issue of trustworthiness of digital archives. To help assess repositories, tools and metrics have been developed by various preservation organizations. To achieve a confluence of approaches in the definition of criteria for trustworthiness of digital archives, members of the digital archiving community developed "Ten Principles for Minimum Requirements for Trustworthy Digital Preservation Repositories" (Center for Research Libraries (CRL) et al., 2007).

As early as 1994 it became apparent that criteria for the assessment of trustworthiness of digital archives were needed (Dobratz et al., 2008; Task Force on Archiving of Digital Information, 1996). In 1995 the International Organization for Standardization (ISO) approached the Consultative Committee for Space Data Systems (CCSDS) to develop a formal standard for the long-term preservation of data from space missions. In preparing a draft standard it became clear that a reference model was needed as a base for further standard-building activities and that a reference model would solve cross-domain problems regarding the long-term preservation of digital materials (Rank et al., 2010). The outcome of this process was the Open Archival Information System Reference Model (OAIS-RM), known to most as the "OAIS model". This document went through several consultation and review phases and was published as an international standard (ISO 14721:2003). This standard is currently under review and a draft recommended practice was published in October 2009 (CCSDS, 2009).

Although designed for the curation of space data, the OAIS model aims to be as context-neutral as possible and deliberately avoids jargon from both the IT and archival professions. In this way, OAIS became a lingua franca for archival information systems that has since become widely adopted because it enables effective communication among projects on a national and international scale. With its general approach and universal applicability the OAIS model also served as a reference model for criteria catalogues for the assessment of the trustworthiness of digital archives. Among these, the most widely known catalogues are:

  • Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) (Ambacher et al., 2007)
  • Catalogue of Criteria for Trusted Digital Repositories (nestor Catalogue) (Dobratz et al., 2006, 2009)

The underlying principles of all of the above-mentioned criteria catalogues are derived from the fundamental concepts of quality management, as formulated in the ISO 9000 family of standards. These standards are designed to help organizations ensure they meet the needs of customers and other stakeholders (ISO, 2000). Key concepts in ISO 9000 that also apply to assessing the trustworthiness of digital archives are the documentation and transparency of activities surrounding the digital archive, the adequacy of the activities to the stated goals and the requirements of the designated user community, and the measurability of the degree of compliance of the archive activities with the criteria for trustworthiness (Dobratz et al., 2008).

The initiatives described above do not operate in isolation from each other. While the OAIS model has already been transferred into an ISO standard, activities to derive an international standard criteria catalogue for trustworthy digital repositories are still under way. Currently, TRAC is work in progress in the ISO technical committee ISO TC20/SC13 and CCSDS (ISO/DIS 16363). The nestor Criteria Catalogue has been published as a draft standard by the German Institute for Standardization, DIN (DIN 31644). This activity is not in competition with ISO/DIS 16363 but is intended to complement the work on the ISO draft through the international standardisation structures of ISO and its national members. In summer 2010 representatives of the respective working groups in CCSDS, DANS and DIN signed a "Memorandum of Understanding" to strengthen cooperation between these initiatives (Giaretta et al., 2010).

 

Translating Criteria into Practice — Auditing and Certification of Digital Archives

Each digital repository has its own targets and specifications. On the other hand, the criteria catalogues for trusted digital repositories have to take a general approach and thus remain at a high level of abstraction. For application to a specific domain and archive instance, the evaluation criteria have to be translated into the specified context and aligned to the needs of the designated user community. At this point, where abstract criteria are translated into specific use cases, the principle of applicability becomes important.

An example of the translation of abstract criteria for the trustworthiness of digital repositories into a specific application is the set of "European LTDP Common Guidelines" that the European Space Agency Ground Segment Coordination Body (ESA GSCB) drew up for its ground segment data centres (Albani et al., 2010).

In a network of data repositories it is quite likely that not all repositories operate on the same technical level. Yet it may be important to define criteria for auditing the performance of the networked repositories. As the example of the CCSDS has already shown, the need to preserve data from space missions is particularly pressing. At the same time, space science has a long record of curating data. Data from space missions are not held in a central archive but are, at least initially, distributed among mission-specific data systems. In this setting the need arose to find common guidelines for the long-term preservation of these valuable scientific assets.

Within the European Space Agency (ESA), the ESA Centre for Earth Observation (EO) is the largest European EO data provider. It also operates as the reference European centre for EO payload data exploitation. Long-term preservation of these data and of the ability to discover, access and process them is a fundamental issue and a major challenge at programmatic, technological and operational levels. To harmonise its approach to long-term data preservation among participating data centres, the ESA Ground Segment Coordination Body (ESA GSCB), in cooperation with nestor, formulated a set of "European LTDP Common Guidelines".

The ESA "Common Guidelines" document directly addresses ESA ground segment data centres. Its criteria are referenced against the nestor Criteria Catalogue and other relevant standards (e.g. metadata encoding, security). Its structure follows the data life cycle. Early in the design process for the Common Guidelines ESA GSCB recognised that not all data centres operate on the same technical level. At the same time, the requirements towards long-term preservation may differ from case to case. To accommodate these differences among data centres the ESA Common Guidelines introduce three different levels of compliance. Each criterion is graded as essential, important, or optional. The criteria are then combined into profiles, or levels of compliance, with an entry level followed by two more advanced levels. To allow for future developments in long-term digital preservation the grading scheme and levels of compliance can be extended to allow for even more advanced levels.
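The grading scheme described above can be illustrated with a small sketch. The criterion names, the level names ("entry", "intermediate", "advanced") and the rule that maps grades to levels are assumptions made purely for illustration; the Common Guidelines define their own criteria and profiles, which may combine grades differently.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    ESSENTIAL = 1
    IMPORTANT = 2
    OPTIONAL = 3


@dataclass(frozen=True)
class Criterion:
    name: str        # hypothetical criterion label
    grade: Grade     # grading as described in the text
    fulfilled: bool  # outcome of the audit for this criterion


# Assumed rule (not taken from the Guidelines): each level requires all
# criteria of the listed grades to be fulfilled.
LEVELS = [
    ("entry", {Grade.ESSENTIAL}),
    ("intermediate", {Grade.ESSENTIAL, Grade.IMPORTANT}),
    ("advanced", {Grade.ESSENTIAL, Grade.IMPORTANT, Grade.OPTIONAL}),
]


def compliance_level(criteria):
    """Return the highest level whose required criteria are all fulfilled."""
    reached = None
    for level_name, required_grades in LEVELS:
        required = [c for c in criteria if c.grade in required_grades]
        if all(c.fulfilled for c in required):
            reached = level_name
        else:
            break  # higher levels build on lower ones
    return reached


audit = [
    Criterion("documented ingest procedure", Grade.ESSENTIAL, True),
    Criterion("standardised metadata encoding", Grade.IMPORTANT, True),
    Criterion("automated format migration", Grade.OPTIONAL, False),
]
print(compliance_level(audit))
```

Because the levels are held in a plain list, a further, more advanced level could be appended without changing the function, which mirrors the extensibility the Guidelines aim for.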

An approach similar to the European LTDP Common Guidelines is proposed in the European Framework for Audit and Certification of Digital Repositories, which was outlined in a Memorandum of Understanding between CCSDS, DANS and DIN (Giaretta et al., 2010). This framework defines three levels of trustworthiness:

  • Basic Certification through the Data Seal of Approval (DSA).
  • Extended Certification through DSA plus additional publicly available self-audit with an external review based on ISO 16363 (TRAC) or DIN 31644 (nestor).
  • Formal Certification after full external audit and certification based on ISO 16363 (TRAC) or DIN 31644 (nestor).
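The three levels listed above can be read as a simple decision rule. The following is a minimal sketch under stated assumptions: the `Evidence` fields and the function name are hypothetical, and the sketch assumes that a full external audit is sufficient for the formal level regardless of the other evidence, which the Memorandum itself may specify differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Hypothetical record of what a repository can demonstrate."""
    has_dsa: bool = False                     # Data Seal of Approval awarded
    reviewed_public_self_audit: bool = False  # public self-audit against ISO 16363 or DIN 31644, externally reviewed
    full_external_audit: bool = False         # full external audit against ISO 16363 or DIN 31644


def certification_level(evidence: Evidence) -> Optional[str]:
    """Map the available evidence to 'basic', 'extended', 'formal' or None."""
    if evidence.full_external_audit:
        return "formal"  # assumption: the external audit alone is decisive
    if evidence.has_dsa and evidence.reviewed_public_self_audit:
        return "extended"
    if evidence.has_dsa:
        return "basic"
    return None


print(certification_level(Evidence(has_dsa=True)))
```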

With a wider adoption of standard-based long-term data curation we will see more applications of criteria catalogues to specific data repositories.

 

Conclusion

The need for criteria to assess the trustworthiness of digital repositories was recognised by memory institutions and by data centres many years ago. This resulted in a number of initiatives aimed at developing criteria catalogues for trusted digital archives. Data centres, in particular those organised in networks of several data repositories and archives, have shown interest also in auditing and certification of their trustworthiness as long-term digital repositories. The need for certification has led to the initiation of standardisation processes through ISO and national standardisation bodies. The standardisation process and regular exchange between the main initiatives has aided a confluence of these activities, which will lead to a harmonisation of the criteria catalogues. In addition, growing adoption of criteria catalogues for auditing of archives and networks of archives has provided useful feedback on further development of criteria catalogues and auditing procedures for the certification of trusted digital archives.

 

Acknowledgements

The author would like to thank his colleagues in the nestor working group "Trusted Archives", in the project "Publication and Citation of Scientific Primary Data", and at ESA GSCB for the interesting and fruitful discussions. The author gratefully acknowledges support by the German Research Foundation (DFG) through the project "Publication and Citation of Scientific Primary Data" (STD-DOI), by the German Federal Ministry for Education and Research through nestor, and by ESA.

 

References

[1] Albani, M., V. Beruti, M. Duplaa, C. Giguere, C. Velarde, E. Mikusch, M. Serra, J. Klump, and M. Schroeder (2010), Long term preservation of earth observation space data - European LTDP Common Guidelines (Version 1.1), European Space Agency, Ground Segment Coordination Body, Frascati, Italy. Available from: http://earth.esa.int/gscb/ltdp/EuropeanLTDPCommonGuidelines_Issue1.1.pdf

[2] Ambacher, B. et al. (2007), Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), CRL Center for Research Libraries, Chicago, IL. Available from: http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf

[3] CCSDS (2009), Audit and certification of trustworthy digital repositories, Draft Recommended Practice, Red Book, Consultative Committee for Space Data Systems, Greenbelt, MD. Available from: http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206520R1/Attachments/652x0r1.pdf

[4] Center for Research Libraries (CRL), Digital Curation Centre (DCC), Digital Preservation Europe (DPE), and Competence Network for Digital Preservation (nestor) (2007), Ten Principles, Available from: http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re

[5] Digital Curation Centre (DCC), and Digital Preservation Europe (DPE) (2007), DCC and DPE Digital Repository Audit Method Based on Risk Assessment (DRAMBORA), Digital Curation Centre, Edinburgh, UK. Available from: http://www.repositoryaudit.eu/download

[6] DINI AG Elektronisches Publizieren (2006), DINI-Certificate Document and Publication Services 2007 (Version 2.0), Deutsche Initiative für Netzwerkinformation (DINI), Göttingen, Germany. Available from: http://nbn-resolving.de/urn:nbn:de:kobv:11-10075687

[7] Dobratz, S. et al. (2006), Catalogue of Criteria for Trusted Digital Repositories, Die Deutsche Bibliothek, Frankfurt (Main), Germany. Available from: http://edoc.hu-berlin.de/series/nestor-materialien/8/PDF/8.pdf

[8] Dobratz, S. et al. (2009), Catalogue of Criteria for Trusted Digital Repositories, nestor materials, Deutsche Nationalbibliothek, Frankfurt (Main), Germany. Available from: http://nbn-resolving.de/urn:nbn:de:0008-2010030806

[9] Dobratz, S., P. Rödig, U. M. Borghoff, A. Schoger, and B. Rätzke (2008), The Use of Quality Management Standards in Trustworthy Digital Archives, In: Proceedings of the Fifth International Conference on Preservation of Digital Objects Joining up and working: Tools and Methods for Digital Preservation, A. Farquhar (Ed.), 8 pp., British Library, London, UK. Available from: http://nbn-resolving.de/urn:nbn:de:kobv:11-10092248

[10] Giaretta, D., H. Harmsen, and C. Keitel (2010), Memorandum of Understanding to Create a European Framework for Audit and Certification of Digital Repositories.

[11] ISO (2000), ISO 9000:2000: Quality management systems — Fundamentals and vocabulary, Standard, International Organization for Standardization (ISO), Geneva, Switzerland. Available from: http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=29280

[12] Lawrence, S., F. Coetzee, E. Glover, D. Pennock, G. Flake, F. Nielsen, R. Krovetz, A. Kruger, and L. Giles (2001), Persistence of Web References in Scientific Research, IEEE Computer, 34(2), 26-31. doi:10.1109/2.901164

[13] Rank, R. H., C. Cremidis, and K. R. McDonald (2010), Archive Standards: How Their Adoption Benefit Archive Systems, In: Standard-Based Data and Information Systems for Earth Observation, L. Di and H. K. Ramapriyan (Eds.), pp. 127-142, Springer Berlin Heidelberg, Heidelberg, Germany. doi:10.1007/978-3-540-88264-0_8

[14] Sesink, L., R. van Horik, and H. Harmsen (2008), Data Seal of Approval, Data Archiving and Networked Services (DANS), Den Haag, The Netherlands. Available from: http://www.datasealofapproval.org/

[15] Task Force on Archiving of Digital Information (1996), Preserving Digital Information, Commission on Preservation and Access and the Research Libraries Group, Mountain View, CA. Available from: http://www.rlg.org/legacy/ftpd/pub/archtf/final-report.pdf

[16] Wren, J. D. (2008), URL decay in MEDLINE - a 4-year follow-up study, Bioinformatics, 24(11), 1381-1385. doi:10.1093/bioinformatics/btn127

 

Data Seal of Approval awarded to data repositories

30 May 2011

Researchers seeking to share or reuse data through a repository need to be sure that the repository takes appropriate measures to ensure the long-term availability and integrity of the data. This is why DANS developed the Data Seal of Approval (DSA) quality mark, which formulates 16 guidelines related to trustworthy data management and stewardship. Since 2009 the DSA has been managed and further developed by an international board.

We are pleased to announce that six repositories have now acquired the DSA: the Archaeology Data Service (ADS, UK), the DANS Electronic Archiving System (EASY, NL), the Inter-university Consortium for Political and Social Research (ICPSR, USA), the Platform for Archiving CINES (PAC, FR), the Language Archive of the Max Planck Institute for Psycholinguistics (NL), and the UK Data Archive.
 
To receive the Data Seal, a repository must first complete an online self-assessment that gauges its adherence to the guidelines. The assessment is then peer reviewed by the DSA Board before the Data Seal of Approval is awarded. Once the Data Seal is granted, the repository may display the seal on its Web site with a link back to the assessment to provide transparency to repository users.
 
The DSA has received a positive international response, as it is a relatively lightweight mechanism that functions as a first step towards the more stringent certification methods currently under development in Europe. What’s more, it can be applied to all disciplines. After receiving the DSA, Stewart Jeffry from ADS blogged: ‘We are delighted that we have achieved a distinction that reflects so well on all the hard work that our curatorial team have put into ensuring that the ADS conforms to internationally recognised best practice in the area.’

The Data Seal of Approval Board

How to Appraise & Select Research Data for Curation

A Digital Curation Centre (DCC) and Australian National Data Service (ANDS) ‘working level’ guide by Angus Whyte (DCC) and Andrew Wilson (ANDS)
This guide will help you develop a managed approach to appraising and selecting datasets for curation. It provides working knowledge of current approaches, issues and challenges, and of the roles of research groups and institutional data services in addressing these. This guide should interest researchers responsible for managing data or who work in data-intensive fields, and those supporting them at research group level, or in institutional repositories, data centres or archives.

UK Data Archive self-assessment against the DSA

The Data Seal of Approval is an informal series of guidelines which digital repositories can use to self-assess their practices to ensure that they are aware of issues surrounding digital preservation and have practices in place to control against those issues.

The UK Data Archive has been involved in the creation of the Data Seal of Approval (DSA) because it believes that the complex international standards which currently exist in this area are not the best place for repositories to start to assess their practices. The DSA is a 'low-level' series of guidelines which does not replace more formal standards, both existing and forthcoming, but provides a way for smaller and more specialist organisations to reach a recognised benchmark in digital repository management.

The Archive has self-assessed against the DSA. An external organisation also checked that the self-assessment of the UK Data Archive was honest and reliable. Details of the assessment and the 'seal' itself will be posted in due course.

Twitter: Jen's Things

Just been doing a self-audit for the ADS using the Data Seal of Approval. Think we need to make more of our documentation public.

To SEAL a Contract: PANGAEA Goes back to the Basics

Back in 2009, the Alfred Wegener Institut (www.awi.de) considered how to approach the self-assessment of PANGAEA (www.pangaea.de) in order to be awarded the Data Seal of Approval. It immediately became obvious that some work would have to be done: although PANGAEA has been and still is an innovative and noteworthy exception in a landscape of otherwise mostly “Empty Archives” (Nature 461, 160-163 (2009), doi:10.1038/461160a), its development and operation have been based on common sense, good scientific practice and best IT management practice, as opposed to more rigid long-term policies, rules and contracts. It can be argued that without this approach PANGAEA could not have contributed so much to innovation in the field during its ca. 15 years of existence.

However, certification such as the DSA necessarily requires stable (and versioned) policies, documentation, and contracts with data producers and consumers. While this may not sound like an insurmountable problem, it poses difficulties in the case of PANGAEA, because it is developed and operated by departments of two independent institutions: the Alfred Wegener Institut (AWI), a Helmholtz centre (www.helmholtz.de), which through its mission is a provider of large-scale infrastructures, such as research ships, for its designated scientific community, and MARUM (www.marum.de), the centre for marine environmental science at the University of Bremen.

It turned out that, in order to write up enforceable policies and to assign responsibilities unambiguously for the individual tasks described or stipulated in the DSA guidelines, a new cooperation agreement covering the shared responsibility for PANGAEA would have to be concluded between AWI and the University of Bremen, describing the mission and governance of PANGAEA. Based on this contract and its supervision structures, an organizational chart and an allocation of duties to the departments of both institutions can be fleshed out in sufficient depth.

The draft of the cooperation agreement is on its way now, to be followed by a draft for the assignment of business duties by the end of this year, 2010.

The Alliance for Permanent Access aims to develop a shared vision and framework for a sustainable organisational infrastructure for permanent access to scientific information

At Luxembourg key players in the area of Audit and Certification signed a Memorandum of Understanding to set up a European Framework for Audit and Certification of digital repositories.

The signatories were:

  • David Giaretta in his capacity as Chair of the CCSDS/ISO Repository Audit and Certification Working Group (RAC),
  • Henk Harmsen in his capacity as Chair of the Data Seal of Approval (DSA) Board and
  • Christian Keitel in his capacity as Chair of the DIN Working Group "Trustworthy Archives – Certification".

For more information see the Repository Audit and Certification wiki

The Language Archive

The Language Archive (TLA) is a unit of the Max Planck Institute for Psycholinguistics based on three essential pillars:

  • Data archive holding resources on languages and cultures collected by a wide variety of researchers in various countries worldwide.
  • Management and Access Tools developed and maintained with the help of a wide variety of projects.
  • Archiving and Software Experts in archiving and software development embedded in international collaboration projects.

Although the TLA will be primarily grounded on the research needs of the Max Planck Gesellschaft (MPG), Berlin Brandenburgische Akademie der Wissenschaften (BBAW) and Koninklijke Nederlandse Akademie van Wetenschappen (KNAW) it has an open policy. It participates in national and international projects and collaborations and contributes to the currently emerging eResearch infrastructures and is committed to advance and promote international standards that facilitate interoperability.

The primary goal of TLA is to store and preserve digital language resources, to give access to researchers and other interested users, and to develop and integrate new technologies advancing language research. According to a UNESCO study, about 80% of our recordings about languages and cultures are endangered and urgent measures are called for to safeguard this valuable data. In addition, as in almost all data-oriented research disciplines, it is becoming apparent that the data management and preservation problem has not been solved yet. Therefore TLA will be open to all requests for depositing language-related data, provided the material is suitable and will be made available for research purposes. In most cases data curation is required to transform data into formats based on open standards and so increase the chances of long-term interpretability. TLA will develop and maintain advanced software to allow archive managers to organize and maintain a consistent and coherent digital archive. State-of-the-art software will also allow users to easily create, access and enrich the data stored at TLA. TLA will also develop or integrate cutting-edge technology that introduces new computational methodologies to the study of languages. Deposits and access to the data must be based on clear legal and ethical statements, and the archive must be subject to regular quality assessment procedures that demonstrate its reliability, for instance by obtaining the Data Seal of Approval.

Making Datasets Visible and Accessible: DataCite’s First Summer Meeting

Session 3: Trustworthiness of Data Centres: A Technological, Structural and Legal Discussion

Tuesday 8 June opened with a talk by Henk Harmsen, Data Archiving and Networked Services (DANS), who gave an overview of the DANS Data Seal of Approval [7]. The Seal of Approval is a minimum set of 16 requirements that DANS considers necessary for a responsible data centre, including three for data producers, three for data consumers, and ten for data repositories. Approval consists of a self-assessment, which must be made publicly available via the repository’s Web site, followed by a review by the Data Seal of Approval Board. Henk noted that self-assessment was simple to implement, taking no longer than a day. By September 2010, he added, a tool would be available to streamline the self-assessment process.

[7] DANS Data Seal of Approval Web site http://www.datasealofapproval.org/

Video: Session 3 / Trustworthiness of Data Centres - A technological, a structural and a legal discussion
The data seal of approval / Henk Harmsen, DANS
source: http://datacite.org/news.html#n6

Excerpt from DPC Annual Report 2008-2009

The UK Data Archive has been involved in the evolution of the informal Data Seal of Approval (DSA) assessment process. DSA was established by a number of European institutions committed to durability in the archiving of research data. The DSA is not a formal certification, rather a series of guidelines which demonstrate best practice for organisations wishing to guarantee the durability of research data, as well as promoting goals relating to durable archiving in general. In relation to the DSA, the UK Data Archive was also represented at the European Commission sponsored ‘Digital Preservation of Scientific Information in a Trusted Environment’ workshop in Luxembourg, as part of the ongoing CESSDA-PPP project.

The UK Data Archive has also published a well-received booklet and website entitled Managing and Sharing Data: a best practice guide for researchers. The first printing of the booklet was “sold-out” within a couple of months, and a new edition is due to be published later in 2009. From a digital preservation point of view the booklet demonstrates that the research life-cycle and the digital preservation life-cycle need to be well integrated. However, it is designed to help researchers and data managers across all research disciplines and research environments make sure that research data are of the highest quality and have the greatest potential for long-term re-use. A programme of training in Data Management and Sharing was also established last year, with the most popular events being those on consent, confidentiality and disclosure.

The UK Data Archive is playing a significant role in the JISC-funded Keeping Research Data Safe 2 project. This project will extend previous work on digital preservation costs for research data, including the original Keeping Research Data Safe study. Our main role to date has been to undertake a detailed review of the activity model published in the KRDS report and to carry out a detailed review of activity costs within the UK Data Archive. These cost data, along with data from other organisations, will also provide guidelines for other organisations which wish to produce their own cost profiles based on their own, often different, institutional mandates.

Minor amendments have been made to the UK Data Archive’s Preservation Policy (v.3.10) and it remains available on its website.

Work has been proceeding on the UKDA Secure Data Service. This new service will allow controlled, restricted access to potentially disclosive microdata files for Approved (or Accredited) Researchers, subject to various conditions of eligibility and purpose of use.