UC Libraries and HathiTrust: Partnership Details

  1. Why is UC participating in the HathiTrust?
  2. How will the HathiTrust be governed?
  3. What services will the HathiTrust offer users?
  4. What services will be offered in the future?
  5. How is HathiTrust different from Google Book Search?
  6. How does UC’s Digital Preservation Program fit in?

1. Why is UC participating in the HathiTrust?

  • To provide an academic counterweight to Google and Open Library by assuming direct management responsibility for our digital objects and data. By participating in the project, UC has the opportunity to partner with a major set of colleague institutions to create a shared service for access to mass digitized materials that is ambitious in scope, optimized for use by students and scholars, and designed with long-term preservation in mind.  It gives us the opportunity to collaborate with partners in technological development and innovation, combining the best of UC/CDL and University of Michigan/Committee on Institutional Cooperation digital preservation and curation expertise.
  • To exploit the full potential of aggregation, especially for public domain content, providing greater service to users through combined content and access. Users will be able to access UC books that have been digitized by Google and the Internet Archive (IA) through a single platform. In addition to UC materials, users will be able to access unique content from other partners, creating a large-scale service that none of us could build independently.
  • To accelerate full text access to UC mass digitized content. A large number of UC’s digitized Google files currently reside at UC, and efforts to date have focused on managing the files, embedding links to Google Book Search within systemwide CDL services such as the Melvyl Catalog and the WorldCat Local pilot project, and planning for integration with bibliographic records in OCLC. The intent has been to develop alternative access services in the future. Now, by participating in the HathiTrust, we will be able to deliver full text searching and other services on an accelerated time frame.
  • To help fulfill the promise of preservation.  HathiTrust creates a network of cross-organizational partners that extends our overall reach, and balances the technical standards we want to promote with a healthy technical and geographic diversity necessary for a robust preservation infrastructure.
  • To build a sustainable model for both UC and CDL services including preservation, access, collection management, and resource sharing.  The ability to de-duplicate mass digitized files across partner institutions will lead to lower overall storage costs.  Shared access and preservation services with multiple partners should also lead to lower overall costs.  In addition, we could pursue cost avoidance for physical collections through cooperative shared print initiatives.  This will require more experience with and analysis of the utility of digital surrogates.
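The storage argument in the last point can be illustrated with a small calculation. The following is a minimal sketch, not a description of how HathiTrust actually de-duplicates content: it assumes that each digitized volume carries some shared bibliographic identifier (the identifiers and sizes below are hypothetical) and that only the largest copy of each work would be retained.

```python
from collections import defaultdict


def storage_after_dedup(volumes):
    """Return (total_bytes, deduplicated_bytes) for (identifier, size) pairs.

    Volumes that share an identifier are treated as copies of the same work
    (an assumption made for illustration), and only the largest copy is
    counted in the deduplicated total.
    """
    largest = defaultdict(int)
    total = 0
    for identifier, size in volumes:
        total += size
        largest[identifier] = max(largest[identifier], size)
    return total, sum(largest.values())


# Hypothetical holdings: two partners digitized the same work ("work-001").
holdings = [("work-001", 300), ("work-001", 280), ("work-002", 150)]
total, deduplicated = storage_after_dedup(holdings)
```

In practice the matching rule is the hard part; deciding when two digitized volumes are interchangeable is the kind of analysis of digital surrogates the point above refers to.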

2. How will the HathiTrust be governed?

3. What services will the HathiTrust offer users?

The University of Michigan’s MBooks repository forms the core of the current services, which will evolve in a more collaborative way under the new organization. One benefit to UC is that we could immediately take advantage of the services already deployed within the existing infrastructure, including:

  • Full text search. Consolidated full text indexes for all content in the repository point to downloadable full text where available, or to local copies and Google Book Search copies. Full text indexing and discovery could presumably be optimized for research and scholarly use in ways that Google’s services are not. The University of Michigan is currently experimenting with Solr (an open source enterprise search server) for large-scale search.
  • Online viewing. Page-turner application for online viewing where permitted by copyright (to be redeployed and further developed via an API).
  • Print on demand. The University of Michigan currently offers print on demand services to local users of its public domain content in HathiTrust.  There may be potential for this capability to be extended to UC’s public domain content in the future.
  • Access for print-disabled users. These services are also slated to be redeployed via an API.
  • Copyright management and rights clearance. The University of Michigan has already developed an infrastructure for researching and documenting copyright status.
  • Preservation certification. Certification is provided through Trustworthy Repositories Audit and Certification (TRAC); the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA), which is in progress; and Preservation Metadata Implementation Strategies (PREMIS) conformance.
  • Bibliographic data distribution. Options for bibliographic data distribution include OAI, Z39.50, and APIs.
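As an illustration of the OAI option for bibliographic data distribution, the following minimal sketch builds a ListRecords request URL as defined by the OAI-PMH 2.0 protocol. The endpoint URL and the set name are hypothetical placeholders, not documented HathiTrust values; only the verb and argument names come from the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH 2.0, resumptionToken is an exclusive argument: it is
        # sent with the verb alone, without metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


first_page = list_records_url(BASE_URL, set_spec="example-set")
```

A harvester would fetch this URL, read the records from the XML response, and repeat the request with the returned resumptionToken until none is supplied.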


4. What services will be offered in the future?

Future services could include:

  • Shibboleth for authentication.
  • APIs to enable local development by HathiTrust partners.
  • Institutional localization and/or branding.
  • Anything else the partners jointly envision.

The HathiTrust Functional Objectives provide a look at current development plans: http://www.hathitrust.org/objectives

5. How is HathiTrust different from Google Book Search?

HathiTrust provides a new platform for the expert curation of, and consistent access to, materials long associated with research libraries. The trust and reliance developed over decades in providing essential print collections will extend to HathiTrust as a valued source for scholarly materials. In addition, the HathiTrust provides the following benefits:

  • Access to content not in Google Book Search, such as digital collections unique to each institution, providing unified full text discovery and access with a common user experience.
  • Ability for partners to build tools and develop standards for distributed discovery and access across similar repositories.
  • Preservation and digital curation services.
  • Copyright research and the ability to surface public domain content that Google may continue to restrict.
  • Services better optimized for scholarly use, including the association of accurate bibliographic data such as date of publication for journal volumes with the correct digitized files.
  • A test bed for development that we would not otherwise have access to, providing UC with a greater pool of content for experimentation.

6. How does UC’s Digital Preservation Program fit in?

The UC Digital Preservation Program (DPP) has a full range of digital preservation and curation expertise that complements and enhances the HathiTrust partnership. The DPP intends to maintain, augment, and extend our existing preservation and digital curation services and systems through the following:

  • The DPP is committed to preserving the digital assets that support UC’s research, teaching, and learning missions (e.g., UCTV, web archived content, UC library collections, eScholarship, and electronic theses and dissertations) in addition to mass digitized content.
  • The DPP will continue to work with campus partners on an evolving set of digital curation services that are designed to meet UC’s unique needs.
  • As a HathiTrust partner we will be directly involved in creating a common preservation infrastructure that will leverage our core technologies.
  • Many of UC’s mass digitized books, like our licensed collections in Portico, are non-unique and therefore amenable to a shared solution in which materials held in common can be considered for deduplication. This allows the DPP to focus its repository efforts on unique UC assets and on the increasing pressure to curate and preserve complex digital objects from museums and scientific laboratories.
