- The Texas Digital Library is a consortium of 20 institutions that offers its members virtual storage and data access, along with long-term preservation of digital collections and data management.
- This article highlights how the TDL addresses the specific needs and goals of four collections at three very different universities — Baylor, Trinity, and Texas A&M.
- The TDL is an exemplary consortial solution: its members work collaboratively to leverage best practices in curation, metadata, usability, and data integrity for the long haul.
Pattie Orr is vice president for Information Technology and dean of University Libraries at Baylor University. Debra Hanken Kurtz is executive director of the Texas Digital Library. Diane Graves is assistant vice president for Information Resources and university librarian at Trinity University.
The Lone Star State is known for its independence and individuality. Sometimes those qualities extend all the way into higher education, where building collaborative projects has posed Texas-sized challenges that some smaller, more densely populated states have not faced. But despite the perception that Texans are devoted to a go-it-alone pioneer spirit, the state has some remarkable examples of mutually beneficial collaboration across miles, time zones, and institutions of various sizes and types.
One such example is the Texas Digital Library (TDL), a shared-access infrastructure project launched in 2005 by a group of Texas research libraries.1 Although it was initially intended to support the needs of the state's PhD-granting institutions, over the past decade the TDL has grown into a significant collaborative response to emerging challenges in digital curation and scholarship. By pooling resources and expertise, the TDL has built a shared infrastructure that lets academic libraries offer access to digital scholarship and collections and support emerging modes of academic publishing. The TDL is now developing additional services, including preservation, storage, and data management, to meet the needs of a wider range of institutions.
Here, we describe how the TDL plans to address the needs of digital collectors (libraries) and the challenges of managing large data sets (for researchers) at three very different institutions, each with unique needs and goals.
Case Studies: Three Institutions, Four Collections, Countless Stories
The TDL consortium currently includes 20 academic institutions of varying sizes, missions, and collection strengths. As a representative sample, we highlight three of these universities and their projects.
Baylor University, Waco
Baylor is Texas' oldest continuously operating university, a nationally ranked private Christian university and research institution, and one of the TDL's pioneering members. With more than 15,000 students, it offers undergraduate, graduate, and professional degrees and certifications.
Archival collections spanning the 19th and 20th centuries present unique preservation challenges to the Baylor Libraries, in part due to their significant size and variety of formats. Perhaps most significantly, Baylor's collections sometimes pose geographic challenges, particularly when it comes to fostering collaborations with researchers located around the world, as in the case of the Browning Letters Project.
Since 1918, Baylor has been home to a significant portion of the famous correspondence between the husband-and-wife Victorian poets Robert Browning and Elizabeth Barrett Browning (figure 1). The Baylor collection is housed at the Armstrong Browning Library, which is the library of record for the Brownings and includes a museum and 62 stained-glass windows that illustrate scenes from the poets' works.
Source: Armstrong Browning Library
Figure 1. Part of a love letter from the Browning collection
Researching the Brownings' copious correspondence poses a geographic challenge for Baylor librarians and Browning scholars everywhere: the letters are scattered across multiple institutions around the world. Besides Baylor, institutions holding portions of the correspondence include Wellesley College in Massachusetts; the Harry Ransom Center at the University of Texas at Austin; Ohio State University; and, in the UK, Balliol College and the Bodleian Libraries at the University of Oxford. Today, nearly 100 years after Baylor acquired its letters, the Browning Letters Project is bringing the entire correspondence back together, virtually. The letters held in Texas and those physically housed at other institutions are now readily available through the project, a high-visibility collaboration presented via Baylor's Digital Collections. Baylor's membership in the TDL enhances collaboration among the universities and provides a logical long-term storage solution.
More recently, Baylor Libraries, working closely with a faculty scholar and collector, undertook the Black Gospel Music Restoration Project. Featuring digital copies of vinyl recordings (figure 2), cassettes, and reel-to-reel tapes, some of them quite rare, the archive provides worldwide access to America's unique gospel music history and its wide range of performers. The TDL lets this collection of large, often unique audio files be stored securely and made discoverable to researchers, performers, and fans.
Source: Gospel Music Restoration Archive
Figure 2. Vinyl recording of "Old Ship of Zion" gospel song
Trinity University, San Antonio
Trinity is a small, private, primarily residential liberal arts school. With a full-time enrollment of fewer than 2,500 students, Trinity is limited in its ability to support large research collections by its mission, if not by its resources. Still, the university has strong academic programs and close ties to its home city and local history. Undergraduate research is a strong focus, and faculty members welcome opportunities for their students to engage with unique materials.
For example, the university's Claude and ZerNona Black Papers shed light on the civil rights movement in South Central Texas. Spanning the 20th century and extending into the early 21st, the "personal papers of the Reverend Claude William Black, Jr. and his wife ZerNona Stewart Black document their civil rights, community activism, and Baptist ministry activities." The collection includes church records, correspondence, printed content, audio and visual recordings, and sermons. (Figure 3, a photo from the collection, shows the March to Zion.) The collection not only enriches historians' understanding of the midcentury civil rights struggle but also demonstrates the significance of black churches in the Southwest in achieving racial equality.
Source: Claude and ZerNona Black Papers
Figure 3. Photo of the March to Zion
TDL storage gives Trinity a secure, affordable solution that would otherwise be out of its reach. The university's participation in the TDL will let its students and researchers explore this little-known chapter in the history of the civil rights movement.
Texas A&M, College Station
A member of the Association of Research Libraries (ARL), Texas A&M University (TAMU), College Station, has a strong applied research mission. Described by U.S. News & World Report as "an academic and athletic powerhouse in central Texas,"2 TAMU has more than 40,000 students. The school offers undergraduate, graduate, and professional programs and serves as the flagship campus of a system with locations throughout the state. Four of the state's five ARL member institutions (TAMU, College Station; the University of Texas at Austin; the University of Houston; and Texas Tech University) founded the TDL and subsidize TDL membership for their system schools.
The TAMU Libraries Office of Scholarly Communications is collaborating with the TDL to define the requirements for a Texas data repository. Given emerging federal mandates that require public access to research publications and data derived from federally funded research, TAMU anticipates needing a data repository that can curate and preserve a wide variety of digital data sets, ranging in size from a few megabytes to terabytes. In a 2013 survey of TAMU faculty, 21 percent of the researchers estimated that their research had generated more than a terabyte of data over the previous five years.3 The TAMU Libraries plan to develop research-curation systems and workflows that not only make research publicly accessible but also support interdisciplinary research addressing society's grand challenges. The team's hypothesis is that enabling efficient sharing of research data and associated scholarly products, before those materials are made publicly accessible, will strengthen interdisciplinary research among loosely coupled networks of researchers working on related scholarship. The TAMU Libraries team is using the TDL project to test this hypothesis through pilot projects with Texas A&M researchers, focusing on interdisciplinary fields such as energy resources and sedimentary systems.
The pilot projects will work with TAMU researchers studying the Eagle Ford Shale, a Texas shale formation being developed through hydraulic fracturing (fracking), and will address questions of sustainability; figures 4 and 5 show the formation's size and extent. The research will produce a wide range of digital data, including images, numerical data sets, digital maps, and three-dimensional seismic data sets.
Source: Energy Information Administration
Figure 4. The Eagle Ford Shale, which produces both oil and gas in Texas
Figure 5. Alternate map of the Eagle Ford Shale
Addressing Unique Collections and Common Needs
Despite their diversity, these projects present a common challenge for their host institutions: all must identify ways to store content securely for the very long term.
The difficulties inherent in long-term storage of digital content give librarians, data managers, and researchers nightmares. Cautionary tales of data lost to obsolete formats (think HyperCard or even floppy disks) remind us that data and content must migrate as standards and formats change. Newer media such as CDs and DVDs also have short shelf lives: the National Archives reports that their life expectancy is only two to five years, although they are often cited as lasting up to five times that long.4 And although central campus providers offer large-scale storage, measured in petabytes, at cost-effective prices, these solutions provide no real preservation service beyond daily backups.
Even data stored in a stable format can become corrupted, and data in an unstable format almost certainly will. The TDL seeks solutions that will help its member institutions manage and preserve the collections and data they place with the TDL, without putting additional burdens on members' local IT staff or infrastructure.
In many cases, it is not possible for all relevant material to coexist physically, but each collection gains value as a digital whole. (The Browning collections, with their wide geographic distribution, offer the most significant example of virtual coexistence's potential.) For decades, scholars have had to find travel funds to visit repositories and use their collections. Researchers who were unaware of relevant materials held in distant repositories, or who lacked the funds to travel to them, often missed those materials entirely.
Technologies now offer libraries, archives, and museums, as well as the institutions that house them, opportunities to make their unique materials widely accessible to the general public and researchers alike. With that potential, however, come new challenges. Although local archivists make every effort to preserve and conserve original formats, fragile media, such as those in Trinity's Black collection and Baylor's Black Gospel recordings, often prevent easy access, even by on-site users. (Figure 6 shows cassette tapes holding audio recordings from the Claude and ZerNona Black Papers.) Trinity found itself challenged not only by the materials' fragile condition but also by the need to maintain and repair the equipment needed to play them. How many of us still have projectors that show Super 8 film or players for small reel-to-reel tapes, much less their replacement parts and lamps? How many old turntables and styluses must Baylor keep and maintain, and for whose use (have you ever seen a millennial try to use an LP turntable)? And while streaming provides a fabulous means of sharing current-semester video and audio content for curricular support, streaming storage for archival purposes is cost-prohibitive for even the wealthiest institutions.
Figure 6. Cassette tapes of audio recordings (Claude and ZerNona Black Papers)
Like these rare and archival materials, data sets present preservation challenges, though of their own kind. Too often, they are difficult for anyone other than the researcher who created them to use or access. Data sets require appropriate software and operating systems, sometimes already past end of life or soon to be, to make their content accessible. Providing this multitude of access technologies quickly becomes difficult to scale, so many repositories instead require contributors to submit data in tab-delimited or similar standard formats that preserve only the raw data. Texas A&M's interdisciplinary project will generate massive amounts of raw data, which must be encoded in a standardized way and described with a consistent metadata scheme for indexing and access. Although the software used for analysis might be state of the art in 2014, that will not be true for long, which raises the question of data portability and readability. With that question comes the issue of data integrity and protection from corruption or loss over time.
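To make the idea of a standard format plus a consistent metadata scheme concrete, here is a minimal Python sketch that exports tabular data as tab-delimited text and produces a sidecar metadata record. The required fields (loosely modeled on Dublin Core elements), the function name, and the example values are hypothetical assumptions for illustration, not the TAMU or TDL workflow.

```python
import csv
import io
import json

# Hypothetical minimal metadata scheme, loosely modeled on Dublin Core elements.
REQUIRED_FIELDS = ("title", "creator", "date", "description")


def export_dataset(rows, columns, metadata):
    """Return (tsv_text, metadata_json) for a tabular data set.

    rows: list of dicts keyed by column name
    columns: ordered list of column names
    metadata: dict that must contain a non-empty value for every REQUIRED_FIELDS entry
    """
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")

    # Write the raw data as plain tab-delimited text with a header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

    # Store the column list with the descriptive metadata so the file can be
    # interpreted without the original analysis software.
    record = dict(metadata, columns=list(columns))
    return buf.getvalue(), json.dumps(record, indent=2, sort_keys=True)
```

For example, two rows of made-up values, `[{"sample_id": "A1", "depth_m": 2500.0}, {"sample_id": "B2", "depth_m": 3100.5}]`, export as a three-line tab-delimited file plus a JSON record; a record missing any required field is rejected before anything is written.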
Providing Collaborative Solutions
These two challenges — the long-term preservation of digital assets and the management of research data — represent significant needs across the spectrum of TDL academic libraries. Perhaps more significantly, it makes sense for the TDL to tackle these problems consortially, as a cooperative approach and a shared infrastructure make solutions to these problems possible, affordable, and desirable for many different types of libraries.
The TDL's leadership decided on a two-pronged approach to the preservation challenge. The first piece, currently being implemented, relies on DuraCloud as an immediate solution. Based both on the TDL's existing relationship with DuraCloud's parent organization, DuraSpace, and on the product's track record, the leadership decided that DuraCloud's consortial offering and its affordability relative to other options made it the logical first step for members.
Members can upload content into DuraCloud in one of three ways:
- The sync tool lets systems administrators write scripts that will ingest content from predetermined network directories.
- The intuitive web interface lets users designate content for public viewing.
- For large and/or complicated collections, members can work with TDL systems staff to move content to DuraCloud via command line scripting.
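A sync script of the kind described in the first bullet has to decide which files in a watched directory are new or have changed since its last run. The Python sketch below shows one way to make that decision with MD5 checksums and a saved manifest. It is not DuraCloud's sync tool and calls no DuraCloud API; the function names and manifest format are hypothetical, and the upload step itself is left out.

```python
import hashlib
import os


def md5_of_file(path, block_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_upload(watch_dir, previous_manifest):
    """Return (changed, manifest).

    changed: sorted relative paths (using "/") that are new or whose checksum
             differs from previous_manifest
    manifest: dict of relative path -> MD5 for every file now in watch_dir
    previous_manifest: dict saved from the last run (empty on the first run)
    """
    manifest = {}
    for root, _dirs, names in os.walk(watch_dir):
        for name in names:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, watch_dir).replace(os.sep, "/")
            manifest[rel_path] = md5_of_file(full_path)
    changed = sorted(
        rel_path
        for rel_path, checksum in manifest.items()
        if previous_manifest.get(rel_path) != checksum
    )
    return changed, manifest
```

On a first run, pass an empty manifest so every file is reported; save the returned manifest and pass it back on the next run so that only new or modified files are queued for upload.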
Out of the box, DuraCloud provides five preservation services:
- The duplicate on change service synchronizes content between storage providers.
- The bit integrity checker verifies that content held within DuraCloud has maintained bit integrity.
- The upload tool is a graphical interface for adding files to DuraCloud.
- The retrieval tool is a command line utility for transferring content in DuraCloud to local servers.
- The chunker and stitcher tools are command line tools for breaking up and reconstituting large files retrieved from DuraCloud.
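The bit integrity checker and the chunker and stitcher tools rest on two simple ideas: record a checksum for each piece of content, and recompute it later to detect changes. The following Python sketch illustrates both on an in-memory byte string. It is a generic illustration with hypothetical function names and an arbitrary chunk size, not DuraCloud's implementation or command-line interface.

```python
import hashlib


def chunk(data, chunk_size):
    """Split bytes into fixed-size chunks (the last may be shorter) and
    record an MD5 checksum for each chunk and for the whole object."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = {
        "object_md5": hashlib.md5(data).hexdigest(),
        "chunk_md5s": [hashlib.md5(piece).hexdigest() for piece in chunks],
    }
    return chunks, manifest


def stitch(chunks, manifest):
    """Reassemble chunks in order, verifying each chunk and the stitched
    result against the manifest; raise ValueError on any mismatch."""
    if len(chunks) != len(manifest["chunk_md5s"]):
        raise ValueError("wrong number of chunks")
    for index, (piece, expected) in enumerate(zip(chunks, manifest["chunk_md5s"])):
        if hashlib.md5(piece).hexdigest() != expected:
            raise ValueError(f"chunk {index} failed its integrity check")
    data = b"".join(chunks)
    if hashlib.md5(data).hexdigest() != manifest["object_md5"]:
        raise ValueError("stitched object failed its integrity check")
    return data
```

Per-chunk checksums pin a damaged transfer to a single chunk, while the whole-object checksum confirms that the chunks were stitched back together in the right order.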
The second piece, still in the planning stages, is participation in the Digital Preservation Network (DPN). The DPN's advantages include strong redundancy: multiple copies of content will be maintained in independent, geographically dispersed repositories. Built by and for the academy, DPN seeks to preserve the scholarly and cultural heritage record in perpetuity. Each node will negotiate agreements with contributing organizations; these agreements include succession rights to ensure that content remains available through the network "in the event of dissolution or divestment of content by the original depositor and/or archive."5
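Keeping several independent copies also makes it possible to detect a damaged copy by comparing checksums across repositories. This Python sketch flags replicas that disagree with the majority. The node names and the majority-vote rule are illustrative assumptions; the article does not describe how DPN nodes actually audit one another.

```python
from collections import Counter


def audit_replicas(reported):
    """Given {node_name: checksum} for one object, return (consensus, suspects).

    consensus: the checksum reported by a strict majority of nodes, or None
               if no value has a strict majority
    suspects:  sorted nodes that disagree with the consensus; with no majority,
               every node is flagged for manual review
    """
    if not reported:
        return None, []
    checksum, votes = Counter(reported.values()).most_common(1)[0]
    if votes * 2 <= len(reported):
        return None, sorted(reported)
    suspects = sorted(node for node, value in reported.items() if value != checksum)
    return checksum, suspects
```

With three copies, a single corrupted replica is outvoted and can be restored from a good copy; with only two copies that disagree, there is no majority and a person has to investigate.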
The TDL has partnered with the University of Texas Libraries (UT Libraries) and the Texas Advanced Computing Center (TACC) to create the Texas Preservation Node (TPN), one of the DPN's first five nodes. The TDL will use its DuraCloud preservation service as a staging area to prepare content for ingestion into DPN, and it will work with the UT Libraries' Office of Fiscal Services to negotiate contracts with depositors that include succession rights agreements. The TDL has also formed a preservation working group from its membership to determine policies and best practices for preparing content preserved in DuraCloud. The goal is to ensure that content is packaged and described so that DPN can retrieve a copy whenever the contributing member requests one.
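The article does not say which packaging format the preservation working group will adopt. One widely used convention for packaging digital content together with its checksums is the BagIt format (RFC 8493), and the Python sketch below writes and verifies a minimal BagIt-style bag under that assumption. It omits optional parts of the specification, such as bag-info.txt and tag manifests, and the function names are hypothetical.

```python
import hashlib
import os


def make_bag(bag_dir, payload):
    """Write a minimal BagIt-style bag.

    payload: dict mapping relative file paths (using "/") to bytes.
    Creates bag_dir/bagit.txt, bag_dir/data/<files>, and
    bag_dir/manifest-sha256.txt with one "checksum  data/path" line per file.
    """
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    manifest_lines = []
    for rel_path in sorted(payload):
        content = payload[rel_path]
        target = os.path.join(data_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(content)
        checksum = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{rel_path}\n")

    # The bag declaration and manifest are UTF-8 text files with LF line endings.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8", newline="\n") as fh:
        fh.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8", newline="\n") as fh:
        fh.writelines(manifest_lines)


def verify_bag(bag_dir):
    """Return the payload paths whose SHA-256 no longer matches the manifest."""
    failures = []
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), encoding="utf-8") as fh:
        for line in fh:
            expected, rel_path = line.rstrip("\n").split(maxsplit=1)
            with open(os.path.join(bag_dir, *rel_path.split("/")), "rb") as data_fh:
                actual = hashlib.sha256(data_fh.read()).hexdigest()
            if actual != expected:
                failures.append(rel_path)
    return failures
```

Because each bag carries its own manifest, any copy can be checked independently of the system that created it, which suits content staged in one service and retrieved later by another.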
Ultimately, the preservation service's two prongs are not mutually exclusive; rather, the DuraCloud option informs how the DPN option will be implemented and helps build momentum for its adoption in Texas.
Academic institutions and their IT and library leaders increasingly find themselves grappling with legitimate but challenging demands concerning rights and accessibility as they store, support, and provide access to research data sets. The TDL is no exception; its participants realized that they needed to develop solutions that would benefit TDL members and relieve them of the expense, overhead, and other demands of managing data at the institutional level.
The TDL thus formed a working group, chaired by Bruce Herbert, a distinguished faculty member in geology and director of digital services and scholarly communications for the University Libraries at TAMU. The group's task is to identify and assess tools and recommend best practices for metadata application, preparation and ingestion of data sets, and compliance reporting. Initially, working group members hoped that DSpace, the open-source institutional repository software that TDL hosts, would accommodate data. However, preliminary testing of the most recent version (4.1) revealed that the interface required too many steps to facilitate easy adoption by faculty.
The working group thus surveyed the existing landscape for other tools; it is currently investigating Dataverse, the Purdue University Research Repository (PURR), and Figshare as possible homes for research data. The first two options are open source and can be implemented at TACC or hosted at their home institutions (Harvard and Purdue, respectively). The working group will assess Dataverse and PURR for ease of use, including ingestion, citation, and ability to share the data. TDL systems staff has installed the two tools to determine what is required to maintain the service for its 20 members and to provide a test environment for the working group.
The group will also investigate Figshare, the only commercial solution under consideration. Figshare is hosted in Amazon's cloud service, which could integrate well with the TDL's DuraCloud preservation service; it also has some visualization and compliance reporting features. In addition, Figshare offers EZID licenses as part of its subscription. EZID is a service for creating and managing unique, persistent identifiers (such as DOIs and ARKs) that can be assigned to almost anything: data, text, and even physical objects. Figure 7 shows a TAMU student accessing digital resources.
Figure 7. TAMU student using digital resources, including Vireo, at a University Libraries' public
Conclusion: The TDL as a Distinct Response
Many, if not most, higher education and research institutions grapple with questions of archival storage, big data access, security, preservation, and portability. TDL's member institutions could have tackled such questions on their own, but the advantages of a consortial solution impelled them to work collaboratively. By doing so, they not only gained the expertise of IT and library professionals from each institution but also received value that far exceeds virtual storage. The TDL draws on its members' strengths to provide services that include best practices in curation, metadata, usability, and data integrity for the long haul. By working together, TDL members seek to preserve the scholarly record of Texas (and beyond) for future generations.
- "Texas Institutions to Develop Joint Digital Library," Tech Watch, EDUCAUSE Review, vol. 40, no. 5 (2005), 6.
- "Texas A&M University—College Station," Education Rankings and Advice, Best Colleges, U.S. News & World Report, 2014.
- Dr. Bruce E. Herbert, personal communication, Dec. 11, 2014.
- "Frequently Asked Questions (FAQs) about Optical Storage Media: Storing Temporary Records on CDs and DVDs," Records Management, Initiatives, U.S. National Archives and Records Administration [http://www.archives.gov/records-mgmt/initiatives/temp-opmedia-faq.html].
- "The Case for Building a Digital Preservation Network," EDUCAUSE Review Online, August 5, 2013.