A library uses LOCKSS software to turn a low-cost PC into a digital preservation appliance called a LOCKSS Box that performs the following four functions:

  • It collects content from the target web sites using a web crawler similar to those used by search engines.
  • It continually compares the content it has collected with the same content collected by other LOCKSS Boxes, and repairs any differences.
  • It acts as a web proxy or cache, providing browsers in the library's community with access to the publisher's content or the preserved content as appropriate. It can also serve content identified by metadata (OpenURLs) via link resolvers.
  • It provides a web-based administrative interface that allows the library staff to target new journals for preservation, monitor the state of the journals being preserved, and control access to the preserved journals.

Collecting

Before a LOCKSS Box can preserve e-content, the following must happen:

  • The publisher has to give permission for the LOCKSS system to collect and preserve the journal. They can do this by adding a page to the journal's web site containing a permission statement, and links to the issues of the journal as they are published.

Image:LOCKSS_howitworks1.png

  • The LOCKSS Box has to be told where to find this page, and how far to follow the chains of web links. This is accomplished using a LOCKSS Plugin, which is a list of parameters specific to each publishing platform, supplied by the Stanford University LOCKSS team. The Plugin is distributed automatically to authorized LOCKSS Boxes.
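The two preconditions above can be illustrated with a minimal sketch: check the starting page for a permission statement, then follow links only to a configured depth. Everything named here (`PERMISSION_TEXT`, `crawl`, the in-memory `site` dictionary, and the `max_depth` parameter) is hypothetical and stands in for whatever a real LOCKSS Plugin specifies; it is not the LOCKSS crawler.

```python
from collections import deque

# Placeholder wording; the real permission statement is defined by LOCKSS.
PERMISSION_TEXT = "PERMISSION GRANTED (placeholder)"


def crawl(site, start_url, max_depth, permission_text=PERMISSION_TEXT):
    """Breadth-first crawl from the permission page.

    site maps url -> (body, list_of_links) and stands in for the web.
    Links are followed at most max_depth hops from start_url.
    """
    start_body, _ = site[start_url]
    if permission_text not in start_body:
        return {}  # no permission statement, so nothing is collected

    collected = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in collected or url not in site:
            continue
        body, links = site[url]
        collected[url] = body
        if depth < max_depth:
            queue.extend((link, depth + 1) for link in links)
    return collected


site = {
    "/manifest": (PERMISSION_TEXT, ["/issue1"]),
    "/issue1": ("Issue 1 contents", ["/issue1/article1"]),
    "/issue1/article1": ("Article text", ["/issue1/article1/supplement"]),
    "/issue1/article1/supplement": ("Supplement", []),
}
pages = crawl(site, "/manifest", max_depth=2)
```

With `max_depth=2` the sketch collects the permission page, the issue, and the article, but stops before the supplement, which is three hops away.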

Preserving and Auditing

LOCKSS Boxes at libraries around the world collect content directly from the publisher’s web site.


Image:LOCKSS_howitworks2.png


LOCKSS software then compares the copies stored in libraries' LOCKSS Boxes to the content available on the publisher's web site to establish the content’s authoritative version. This version is used to repair lost or corrupted copies.


Image:LOCKSS_howitworks3.png


The LOCKSS Boxes then communicate over the Internet to continually audit the content they are preserving. If the content in one LOCKSS Box is damaged or incomplete, that LOCKSS Box will receive repairs based on the content held by other LOCKSS Boxes. This cooperation between the LOCKSS Boxes avoids the need to back them up individually. It also provides reassurance that the system is performing its function and that the correct content will be available to readers when they try to access it. The more organizations preserve given content, the stronger the guarantee that they will all have continued access to it.
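The audit-and-repair idea can be sketched as a comparison of content hashes across boxes, with damaged or missing copies replaced by the copy that most boxes agree on. This is a deliberately simplified majority vote, assumed for illustration only; the function names and the in-memory `boxes` dictionary are hypothetical, and the actual LOCKSS audit protocol is more elaborate than this.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of one preserved copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes):
    """Compare copies across boxes and repair the ones that disagree.

    boxes maps a box name to its copy (bytes), or None if the copy is lost.
    Returns the names of the boxes that were repaired.
    """
    votes = Counter(digest(c) for c in boxes.values() if c is not None)
    if not votes:
        return []
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        return []  # no clear majority; this sketch declines to repair

    good_copy = next(
        c for c in boxes.values()
        if c is not None and digest(c) == winning_digest
    )
    repaired = []
    for name, content in list(boxes.items()):
        if content is None or digest(content) != winning_digest:
            boxes[name] = good_copy
            repaired.append(name)
    return repaired


boxes = {
    "library_a": b"article v1",
    "library_b": b"article v1",
    "library_c": b"article v1",
    "library_d": b"artic\x00e v1",  # corrupted copy
    "library_e": None,              # lost copy
}
repaired = audit_and_repair(boxes)
```

After the call, every box holds the agreed copy, and `repaired` names the two boxes that needed it. The sketch also shows why more participating boxes give a stronger guarantee: a majority is harder to lose.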

Providing Access

Authorized readers from an institution can access content stored and preserved by LOCKSS when the publisher's copy is unavailable for any reason (subscription canceled, network congestion, publisher server down). Library readers have perpetual, seamless access to content for as long as the LOCKSS Box is maintained.

Content served from a LOCKSS Box will look the same as content served from the publisher with one important exception. If the publisher's site is unavailable, content that normally changes whenever the reader presses the browser "reload" button (for example, ads) will instead be constant.


Image:LOCKSS_howitworks4.png


Readers access content from an institution's LOCKSS box via one of two methods when the original web site is unavailable.

  • Method One: LOCKSS Boxes provide transparent access to the content they preserve. Institutions often run web proxies, to allow off-campus users to access their journal subscriptions, and web caches, to reduce the bandwidth cost of providing Web access to their community. Each institution integrates its LOCKSS Box into these systems, intercepting requests from the community's browsers to the journals being preserved. When a request for a page from a preserved journal arrives, it is first forwarded to the publisher. If the publisher returns content, that is what the browser displays. Otherwise the browser displays the preserved copy.
  • Method Two: The LOCKSS team is implementing infrastructure so libraries can deliver content to users by making that content a target for journal-level SFX resolution.
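The publisher-first fallback described in Method One can be sketched as follows. The function `serve`, the `fetch_from_publisher` callable, and the `preserved` dictionary are hypothetical names used only to show the decision order; this is not the LOCKSS proxy code.

```python
def serve(url, fetch_from_publisher, preserved):
    """Return (source, content) for a request to a preserved journal.

    The request is forwarded to the publisher first. If that fails or
    returns nothing, the preserved copy is served instead.
    """
    try:
        content = fetch_from_publisher(url)
    except OSError:  # covers connection errors and timeouts
        content = None

    if content is not None:
        return ("publisher", content)
    if url in preserved:
        return ("preserved", preserved[url])
    return ("unavailable", None)


def publisher_up(url):
    # Stand-in for a successful request to the publisher.
    return b"live page"


def publisher_down(url):
    # Stand-in for a publisher whose server cannot be reached.
    raise ConnectionError("publisher server down")


preserved = {"/journal/issue1": b"preserved page"}
live = serve("/journal/issue1", publisher_up, preserved)
fallback = serve("/journal/issue1", publisher_down, preserved)
```

Here `live` comes from the publisher and `fallback` comes from the preserved copy, so the reader sees a page in both cases.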

For a visual overview of how LOCKSS provides access, see Providing Access or download the PDF version.

Administering

Library staff administer their LOCKSS Box via a Web user interface. The interface lets them target new content for preservation, monitor the preservation of existing content, control access to the appliance, and perform a wide variety of other functions. The LOCKSS Team at Stanford University provides technical support.

Format Migration

The LOCKSS network preserves the content in its original format and dynamically migrates the content to a newer format, if required, when a reader requests the preserved content. This approach, called Migration On Access, offers at least five significant advantages.

1. Preserving the content in its original format satisfies archival requirements. It also allows the LOCKSS system to be frugal with storage space. We know of no preservation system that discards the original bits after migrating them to a new format. Migrating and keeping both the original and the migrated copy multiplies the storage requirements for a preservation system by the number of migrations.

2. Preserved content is migrated by the most recent, and presumably best, technology available at the time the reader requests access.

3. Preserved content is rarely accessed. Performing migration only when and if it is needed reduces the resource cost.

4. Content can be migrated directly from the original to the current format, minimizing the effects of format conversion artifacts.

5. The format converters, once developed, can themselves be preserved to document the original format.

Here’s a simple description of how Migration On Access works in the LOCKSS system.

Every piece of Web content is in a format (PDF, Flash, HTML, etc.). Eventually, browsers might no longer be able to display content in a particular format. When a reader requests content that is preserved in a LOCKSS Box, and the reader’s Web browser cannot display the content, LOCKSS migrates the content ‘on the fly’ and delivers the migrated copy to the reader.

Every request from a Web browser to a Web server can include a list of formats that the browser is capable of displaying (the HTTP Accept header), each with a numeric preference value. A browser that can no longer display an old format will indicate this by setting the preference value to zero.

If a LOCKSS box receives a request for some preserved content but the browser indicates in this way that it cannot display the preserved content in its original format, the LOCKSS box selects a converter that can convert the original format to one of the formats that the browser has indicated it can display (by setting the preference value to something greater than zero). The LOCKSS box uses the converter to create a temporary access copy, which it delivers to the browser.
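The selection step can be sketched by parsing the preference values from an HTTP Accept header and picking a converter whose output the browser rates above zero. The function names and the `converters` table are hypothetical, and the parsing is simplified (it ignores parameters other than `q`); this is an illustration of the idea, not the LOCKSS implementation.

```python
def parse_accept(header):
    """Parse an Accept header into {media_type: q}; q defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_format(original, accept_header, converters):
    """Return (format_to_deliver, converter_or_None).

    converters maps (source_format, target_format) to a converter.
    """
    prefs = parse_accept(accept_header) or {"*/*": 1.0}  # empty: accept all

    def q_for(fmt):
        if fmt in prefs:
            return prefs[fmt]
        wildcard = fmt.split("/")[0] + "/*"
        if wildcard in prefs:
            return prefs[wildcard]
        return prefs.get("*/*", 0.0)

    if q_for(original) > 0:
        return (original, None)  # browser can still display the original

    candidates = [
        (q_for(target), target, conv)
        for (source, target), conv in converters.items()
        if source == original and q_for(target) > 0
    ]
    if not candidates:
        return (None, None)
    _, target, conv = max(candidates, key=lambda c: c[0])
    return (target, conv)


converters = {("image/gif", "image/png"): "gif_to_png"}
choice = choose_format("image/gif", "image/png, image/gif;q=0", converters)
```

In this example the browser rates `image/gif` at zero, so the sketch selects the hypothetical `gif_to_png` converter and would deliver a temporary `image/png` access copy; the preserved original is left untouched.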

For additional information see http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html

OAIS

LOCKSS software is based on Association for Computing Machinery (ACM) award-winning technology. It provides an OAIS-compliant (see the OAIS Formal Statement of Compliance), open source, peer-to-peer, decentralized digital preservation infrastructure. It is format-agnostic, preserving all formats and genres of web-published content, provided the content has an authoritative version. The intellectual content, which includes the historical context (the look and feel), is preserved. Content preserved by libraries in their LOCKSS Box becomes a part of their collection, and they have perpetual access to all of it.

Auditing

Audit processes have the potential to reveal how well a system is meeting its stated goals. The LOCKSS system has been, or is currently being, evaluated by: (1) librarians who are using the LOCKSS system to preserve content; (2) the Center for Research Libraries (CRL); and (3) the Library of Congress.

Librarians audit their LOCKSS Box via a web page interface, which displays the preservation status of the content in that local LOCKSS Box.

The CRL is becoming a digital repository certification agency; a CRL TRAC test audit of LOCKSS can be found at http://www.crl.edu/content.asp?l1=13&l2=58&l3=142&l4=71

The Library of Congress and the LOCKSS Program staff are working together to complete the Certification and Accreditation process under the Federal Information Processing Standard for categorizing security risks of federal information and systems (FIPS 199). This NIST Computer Security Division standard is available at http://csrc.nist.gov/publications/fips/fips199/FIPS-PUB-199-final.pdf