The Need to Formalize Trust Relationships in Digital Repositories

E-Content

© 2008 Fran Berman, Ardys Kozbial, Robert H. McDonald, and Brian E. C. Schottlaender. The text of this article is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/).

EDUCAUSE Review, vol. 43, no. 3 (May/June 2008): 10–11

Fran Berman, Ardys Kozbial, Robert H. McDonald, and
Brian E. C. Schottlaender

Fran Berman is Director of the San Diego Supercomputer Center and Professor and High Performance Computing Endowed Chair in the Department of Computer Science and Engineering at the University of California, San Diego. Ardys Kozbial is Technology Outreach Librarian at the University of California, San Diego Libraries. Robert H. McDonald is Director of Strategic Initiatives in the Digital Preservation Initiatives Group at the San Diego Supercomputer Center. Brian E. C. Schottlaender is The Audrey Geisel University Librarian at the University of California, San Diego.


Many disparate groups—data managers, university administrators, computer scientists, technology educators, and librarians—are concerned about the deluge of digital data brought about by the Information Age. And well they might be. An EMC-sponsored research team from International Data Corporation (IDC) posits that 281 exabytes (281 billion gigabytes) of digital information existed in the world in 2007 and that by 2011, the aggregate amount of digital data will be 1.8 zettabytes (1,800 exabytes).1
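
The unit conversions and the growth implied by these two figures can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units and steady compound growth between 2007 and 2011; the function and constant names are illustrative and do not come from the IDC report.

```python
# Figures quoted in the paragraph above (IDC estimate and projection).
EXABYTES_2007 = 281
EXABYTES_2011 = 1_800  # 1.8 zettabytes

# Assumption: decimal (SI) prefixes, as the "281 billion gigabytes" gloss implies.
GIGABYTES_PER_EXABYTE = 10**9
EXABYTES_PER_ZETTABYTE = 10**3


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


growth_factor = EXABYTES_2011 / EXABYTES_2007
annual_rate = implied_annual_growth(EXABYTES_2007, EXABYTES_2011, 2011 - 2007)

print(f"2007 volume: {EXABYTES_2007 * GIGABYTES_PER_EXABYTE:,} GB")
print(f"2011 volume: {EXABYTES_2011 / EXABYTES_PER_ZETTABYTE} ZB")
print(f"Growth factor over four years: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.0%}")
```

Under those assumptions, the projection amounts to a more than sixfold increase in four years, or growth of a little under 60 percent per year.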

The unrelenting increase in both the volume of digital data and its primacy in modern life, work, entertainment, and scholarship, combined with the challenge of developing and supporting adequate data infrastructure, requires new methods for meeting the long-term management, stewardship, and preservation requirements of data in the Information Age. The diversity of constituencies concerned with digital preservation, coupled with the current general need for reliable resources for preservation infrastructure, suggests that collaborative relationships that cross institutional and sector boundaries will provide important and promising ways to deal with the data preservation challenge. These collaborations have the potential to spread the burden of digital preservation, create the economies of scale needed to support it, and mitigate the risks of data loss.

Piloting a Data Preservation Grid with Chronopolis

Successful preservation partnerships require that roles and responsibilities of the partners be clearly defined. The Chronopolis pilot, a collaboration among the San Diego Supercomputer Center (SDSC), the University of California, San Diego Libraries (UCSDL), the University of Maryland Institute for Advanced Computer Studies (UMIACS), and the National Center for Atmospheric Research (NCAR), with support from the Library of Congress (LC), provides a model that collaborators might employ to formalize trust relationships among preservation partners.2

Conceptually, Chronopolis is a data grid configured for the purpose of using data replication for data preservation.3 The intent of the data grid is to provide a distributed, trustworthy repository that contains multiple copies of valued data collections, with varying degrees of access to those collections at each participant site. Each participant in Chronopolis can play any or all of several different roles relative to each data collection and can serve different roles for different data collections.

The Chronopolis data grid hosts at least three geographically distinct copies of the data in order to protect it. Conceptually, Chronopolis is technology-independent and seeks to use the most appropriate software available for each component. The pilot uses an integrated software system to support the data grid that incorporates, among other components, the Storage Resource Broker (SRB)4 and the Producer-Archive Workflow Network (PAWN).5
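
The three-copy rule can be expressed as a simple check over a replica catalog. The following is a minimal sketch, not the SRB or PAWN interface: the `Replica` record, the site labels, and the collection names are all hypothetical, and "geographically distinct" is approximated by distinct site labels.

```python
from dataclasses import dataclass

# "At least three geographically distinct copies", per the text above.
MIN_DISTINCT_SITES = 3


@dataclass(frozen=True)
class Replica:
    """One stored copy of a collection (hypothetical record, not an SRB type)."""
    collection: str
    site: str  # assumption: one label per geographic location


def distinct_sites(replicas, collection):
    """Return the set of sites holding at least one copy of `collection`."""
    return {r.site for r in replicas if r.collection == collection}


def is_adequately_replicated(replicas, collection, minimum=MIN_DISTINCT_SITES):
    """True when copies of `collection` exist at `minimum` or more sites."""
    return len(distinct_sites(replicas, collection)) >= minimum


# Placeholder catalog for illustration only.
replicas = [
    Replica("collection-a", "site-1"),
    Replica("collection-a", "site-2"),
    Replica("collection-a", "site-3"),
    Replica("collection-b", "site-1"),
    Replica("collection-b", "site-1"),  # a second copy at the same site
    Replica("collection-b", "site-2"),
]

for name in ("collection-a", "collection-b"):
    status = "ok" if is_adequately_replicated(replicas, name) else "needs replication"
    print(f"{name}: {status}")
```

Counting distinct sites rather than copies is the point of the sketch: two copies in the same location do not protect against a failure at that location.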

The roles of participants within the Chronopolis pilot include the following:

  • Users. Users utilize Chronopolis resources and services for data management and preservation of their data collections. Users may provide access to their constituencies through the Chronopolis environment or through their own institution.
  • Partners. Partners support the installation of servers for Chronopolis at their sites, register their data collections into Chronopolis, and use the Chronopolis environment to replicate their data collections.
  • Providers. Providers constitute the federated Chronopolis data grid and play specific roles for a given collection: Core Center, Replication Center, or Deep (Write Once) Archive. Providers deploy distributed storage infrastructure at their sites and work as a team to provide infrastructure for preservation tools and services.
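
Because a participant can hold several roles for one collection and different roles for different collections, the roles listed above are naturally recorded per participant and per collection. The sketch below is one possible data model under that assumption; the `RoleRegistry` class and the institution and collection names are hypothetical and do not describe the pilot's actual assignments.

```python
from collections import defaultdict
from enum import Enum


class Role(Enum):
    USER = "User"
    PARTNER = "Partner"
    CORE_CENTER = "Core Center"
    REPLICATION_CENTER = "Replication Center"
    DEEP_ARCHIVE = "Deep (Write Once) Archive"


# The three roles the article lists for providers.
PROVIDER_ROLES = frozenset(
    {Role.CORE_CENTER, Role.REPLICATION_CENTER, Role.DEEP_ARCHIVE}
)


class RoleRegistry:
    """Tracks which roles each participant plays for each collection."""

    def __init__(self) -> None:
        # Maps (participant, collection) to the set of roles held.
        self._roles = defaultdict(set)

    def assign(self, participant: str, collection: str, role: Role) -> None:
        self._roles[(participant, collection)].add(role)

    def roles_of(self, participant: str, collection: str) -> set:
        # .get avoids creating empty entries for unknown pairs.
        return set(self._roles.get((participant, collection), ()))

    def providers_for(self, collection: str) -> set:
        """Participants holding any provider role for `collection`."""
        return {
            participant
            for (participant, coll), roles in self._roles.items()
            if coll == collection and roles & PROVIDER_ROLES
        }


# Placeholder assignments for illustration only.
registry = RoleRegistry()
registry.assign("inst-a", "collection-1", Role.PARTNER)
registry.assign("inst-a", "collection-1", Role.USER)
registry.assign("inst-b", "collection-1", Role.CORE_CENTER)
registry.assign("inst-b", "collection-2", Role.REPLICATION_CENTER)
registry.assign("inst-c", "collection-1", Role.DEEP_ARCHIVE)

print(sorted(role.value for role in registry.roles_of("inst-a", "collection-1")))
print(sorted(registry.providers_for("collection-1")))
```

Keying the registry on the participant and collection pair, rather than on the participant alone, is what lets one institution act as a Core Center for one collection and a Replication Center for another.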

The Chronopolis pilot currently includes users (the California Digital Library [CDL] and the Inter-university Consortium for Political and Social Research [ICPSR]), partners (CDL and ICPSR), and providers (SDSC, UCSDL, UMIACS, and NCAR), who are together laying the groundwork for further collaboration. Additional participants will be added later in 2008. The pilot provides an opportunity to explore collaboration across institutional boundaries and to establish expectations for all roles (user, partner, and provider).

Building Formalized Trust

In order for Chronopolis participants to work together with clear expectations of outcomes, notions of trust must be embodied formally. David Maister, Charles Green, and Robert Galford posit four components of trust: (1) credibility; (2) reliability; (3) intimacy; and (4) self-interest.6 Of these four components, self-interest is weighted most heavily because the authors believe this is where the greatest risk for breaching trust lies. In credible and reliable collaborations, there can be enough intimacy to share information. What is done with this shared information reflects self-interest.
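
The book cited in note 6 condenses these four components into what its authors call the trust equation. A sketch of that formula follows, using this article's labels; the single-letter symbols are shorthand introduced here, and the book's own term for the denominator is self-orientation rather than self-interest.

```latex
% T = trust, C = credibility, R = reliability, I = intimacy,
% S = self-interest (self-orientation in the book's wording)
T = \frac{C + R + I}{S}
```

Read this way, the three terms in the numerator add to one another, while self-interest divides the whole sum, which is one way to understand the statement that it carries the most weight.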

In the case of Chronopolis, self-interest is shared among collaborators because the data collections hosted by Chronopolis are replicated and geographically distributed among multiple collaborators. Thus, any single institution’s important information becomes its collaborators’ important information and vice versa. The identification of data collections to be ingested into Chronopolis is thus driven both by enlightened self-interest and by an interest in preserving a collaborator’s digital information.

In the business world, trust is usually enforced by a contractual agreement tied to monetary incentives or penalties. In the higher education domain, trust is more informal and is generally the product of personal relationships rather than formal agreements. But a federated preservation environment demands more—namely, formalization using policy-based trust mechanisms.

In the Chronopolis pilot, each participating institution must have a formal trust relationship with the others. The nature of these relationships depends on the roles the participants play with respect to one another on the data grid; if they play multiple roles with respect to one another, they have multiple relationships. General trust relationships among the collaborator institutions are currently formalized via Memoranda of Understanding (MOUs), and service-oriented trust relationships are formalized via Service Level Agreements (SLAs). These types of agreements provide specifications of the expectations and commitments required to work together closely and successfully.
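
One way to picture this arrangement is as a checklist: a general agreement for every pair of institutions, plus a service agreement for every service one institution provides to another. The sketch below assumes exactly that reading; the `Agreement` record, the helper functions, and the institution and service names are hypothetical and are not drawn from the pilot's actual documents.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations


class AgreementType(Enum):
    MOU = "Memorandum of Understanding"
    SLA = "Service Level Agreement"


@dataclass(frozen=True)
class Agreement:
    kind: AgreementType
    parties: frozenset      # the two institutions bound by the agreement
    scope: str = "general"  # for an SLA, the service it covers


def required_agreements(institutions, service_relationships):
    """One MOU per pair of institutions, one SLA per service relationship.

    `service_relationships` is an iterable of (provider, consumer, service).
    """
    required = set()
    for a, b in combinations(sorted(institutions), 2):
        required.add(Agreement(AgreementType.MOU, frozenset({a, b})))
    for provider, consumer, service in service_relationships:
        required.add(
            Agreement(AgreementType.SLA, frozenset({provider, consumer}), service)
        )
    return required


def missing_agreements(required, signed):
    """Agreements that are required but not yet signed."""
    return required - set(signed)


# Placeholder data for illustration only.
institutions = {"inst-a", "inst-b", "inst-c"}
services = [("inst-a", "inst-b", "storage hosting")]
signed = [
    Agreement(AgreementType.MOU, frozenset({"inst-a", "inst-b"})),
    Agreement(AgreementType.SLA, frozenset({"inst-a", "inst-b"}), "storage hosting"),
]

required = required_agreements(institutions, services)
for gap in sorted(
    missing_agreements(required, signed), key=lambda g: sorted(g.parties)
):
    print(f"Missing {gap.kind.name} between {' and '.join(sorted(gap.parties))}")
```

Two institutions that interact in more than one role simply appear in more than one service relationship, so they end up with more than one agreement, matching the point that multiple roles imply multiple relationships.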

For example, SDSC and the UCSDL developed both an MOU and an SLA to specify their interactions. Although both institutions reside on the same campus (UCSD), they are separate organizational entities with different administrative reporting relationships. SDSC and UCSDL’s MOU describes each institution’s intent to fulfill common goals in building the preservation environment described in the Chronopolis pilot and to share the experience needed in order to build grid-based storage architecture and metadata for digital preservation. This agreement is complemented by a more specific SLA that provides support for running the UCSDL’s production instance of SRB at SDSC. The SLA outlines the participation of each entity and specifically states the requirements necessary for operating such a production storage environment. The SLA has been in effect since 2003, with a renewal in 2006 and options to renew every three years going forward. SDSC also has separate MOUs with UMIACS and NCAR, reflecting their individual interactions with respect to Chronopolis. At this juncture, these agreements are being extended to cover all of the Chronopolis pilot collaborations and will provide the basis for a more permanent set of formalized trust agreements.

Formal trust agreements can create the foundation on which certification for trustworthy digital repository status will be built. Sections of both the CRL/OCLC/NARA Trustworthy Repositories Audit & Certification (TRAC) document (http://www.crl.edu/PDF/trac.pdf) and the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) Toolkit (http://www.repositoryaudit.eu/download/) are devoted to describing appropriate institutional governance and formal trust criteria for the parties involved with the sustainability and governance of trustworthy digital repositories. Chronopolis is currently undertaking an audit of pilot participants in order to include such criteria in its MOUs and SLAs, providing an appropriate version of trust for data-grid participants.

Conclusion

The successful preservation of valuable digital assets will require the expertise and collaboration of many institutions, both public and commercial, to help craft the reliable, economically sustainable, and trusted environments necessary for housing, managing, and safeguarding our global knowledge over time. With successful data preservation and access as the ultimate objectives, the implementation of structural mechanisms such as MOUs and SLAs can be a means by which independent trust is achieved.

Notes

1. John F. Gantz (Project Director), “The Diverse and Exploding Digital Universe,” March 2008, http://www.emc.com/digital_universe.

2. We are grateful to our colleagues on the Chronopolis pilot (including David Minor, Reagan Moore, Chris Jordan, Richard Moore, Arcot Rajasekar, Luc Declerck, and others), as well as our collaborators at CDL, ICPSR, UMIACS, NCAR, and LC, for their help and support with this article.

3. R. W. Moore, F. Berman, D. Middleton, B. Schottlaender, J. Jaja, and A. Rajasekar, “Chronopolis: Federated Digital Preservation across Time and Space,” in Local to Global Data Interoperability: Challenges and Technologies (Piscataway, N.J.: IEEE, 2005), http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1612488.

4. See the SRB main page on the SDSC website: http://www.sdsc.edu/srb/index.php/Main_Page.

5. See the “Overall Architecture and Main Components” page on the ADAPT website: http://www.umiacs.umd.edu/research/adapt/architecture.html.

6. David H. Maister, Charles H. Green, and Robert M. Galford, The Trusted Advisor (New York: Free Press, 2000).