2nd Open Archive Initiative Workshop at CERN

To: Multiple recipients of list <epc-l@iucr.org>
Subject: 2nd Open Archive Initiative Workshop at CERN
From: Brian McMahon <bm@iucr.org>
Date: Mon, 21 Oct 2002 09:31:40 +0100 (BST)
Attached a report and commentary on the meeting I have just been to in
Geneva. 
 - Brian

Second Workshop on the Open Archives Initiative: Gaining Independence
with e-Prints Archives and OAI

Geneva, 17-19 October 2002


The Open Archives Initiative (OAI) is a movement growing out of the
successful implementation of the preprint server initially for high-energy 
physics (arXiv.org), and a desire to disseminate scholarly publications
such as theses, conference proceedings and courseware from the web servers 
of the institutions sponsoring such publications. The possibilities of
self-publication of refereed articles are also attractive, especially
where library budgets are continually squeezed in the face of
ever-increasing subscription rates by what are considered high-cost
commercial publishers.

Arising from previous meetings, the Protocol for Metadata Harvesting
(OAI-PMH) is a technical method for disseminating descriptive metadata
about resources that an institution wishes to advertise or make
available. Such resources are typically e-prints (theses, conference
proceedings, informal reports, preprints), but might be anything at all
(database records, physical museum specimens...). The protocol is designed 
to allow a client (the "harvester") to query a server (the "data
provider") for the metadata formats it supports, and for individual
records or sets of records in the desired formats. As a base level for
interoperability, servers are required to provide metadata conforming to
the Dublin Core standard, but the negotiation facility allows metadata of 
arbitrary complexity to be exchanged between capable servers and clients.
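
The request side of this exchange can be sketched briefly. The following is
a minimal illustration in Python, not a full harvester: the verbs
(ListMetadataFormats, GetRecord, ListRecords) and the oai_dc prefix for
Dublin Core come from the protocol, but the base URL, the record identifier
and the set name "theses" are hypothetical, and the helper oai_request is
my own naming.

```python
from urllib.parse import urlencode

# Hypothetical data-provider base URL; substitute a real repository's.
BASE_URL = "http://repository.example.org/oai"


def oai_request(base_url, verb, **args):
    """Return the HTTP GET URL for a single OAI-PMH request."""
    params = {"verb": verb}
    params.update(args)
    return base_url + "?" + urlencode(params)


# 1. Ask the data provider which metadata formats it supports.
list_formats = oai_request(BASE_URL, "ListMetadataFormats")

# 2. Ask for one record in Dublin Core, the mandatory base format.
#    The identifier below is invented for illustration.
one_record = oai_request(
    BASE_URL,
    "GetRecord",
    identifier="oai:repository.example.org:1234",
    metadataPrefix="oai_dc",
)

# 3. Ask for a whole set of records ("theses" is a hypothetical set name).
record_set = oai_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="theses"
)

for url in (list_formats, one_record, record_set):
    print(url)
```

A harvester would fetch each URL and parse the XML response; a richer
format would be requested simply by changing metadataPrefix to one of the
prefixes that the server advertises.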

Version 2.0 of the OAI-PMH standard has just been released, as a stable
version built on the experience of pilot schemes running the earlier
versions 1.0 and 1.1.

It is expected that service providers will emerge who collect the metadata 
offered by contributing institutions, and who then may layer value-added
services on top of the harvested metadata to link together and impose
organisation on the distributed resources made visible in this way. One
element of the protocol designed to facilitate this is the <friends>
container type which can point to related repositories.
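
As an illustration of how a service provider might follow such pointers,
the sketch below extracts the base URLs listed in a <friends> container
from an Identify response. The sample response and the function name
friend_urls are invented for the example, and the friends namespace URI is
quoted as I understand it from the OAI implementation guidelines; it
should be checked against the published schema before use.

```python
import xml.etree.ElementTree as ET

# Namespace of the <friends> container (assumed from the OAI guidelines).
FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"

# Invented, heavily abbreviated Identify response for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://a.example.org/oai</baseURL>
        <baseURL>http://b.example.org/oai</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""


def friend_urls(identify_xml):
    """Return the base URLs of the repositories named in <friends>."""
    root = ET.fromstring(identify_xml)
    # Only baseURL elements in the friends namespace are matched, so the
    # repository's own baseURL (in the OAI-PMH namespace) is skipped.
    tag = "{%s}baseURL" % FRIENDS_NS
    return [el.text.strip() for el in root.iter(tag) if el.text]


print(friend_urls(SAMPLE_IDENTIFY))
```

Applied repeatedly, starting from one known repository, this would let a
harvester discover further data providers without a central registry.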

The majority of presentations at this workshop concentrated on the
establishment of institutional repositories of digital documents
interconnecting via the OAI-PMH. In most cases, the institution would
be a University, and the maintainer of the repository the University
Library. In part this is because many of the resources considered suitable 
for management in this realm have by tradition been curated by the
libraries. They feel that this new role is one that they are well
qualified to perform, though in practice not all university libraries have 
sufficient technical resources at their disposal to implement even the
modest programming requirements.

The most highly developed of the library-based schemes have front-end
applications to allow direct submission of electronic content by the
organising faculty staff; some also have hooks that could permit peer
review and the development of home-grown journal publishing
operations. The DSpace project developed at MIT in collaboration with
Hewlett-Packard is an impressive implementation, and the software engine
will shortly be released as an Open Source package. Other infrastructure
packages, such as the established eprints.org software of the University
of Southampton and the powerful system driving document organisation,
translation and serving built over many years at CERN, are also, or will
soon be, available for Open Source download and development. The intention
is to encourage the adoption and use of powerful and reasonably
standardised tools in building properly federated data repositories. The
presentation of the DSpace project drew attention to the need for a
vertical integration by discipline of the horizontally federated
institutional resources, a distinction that was implicit in many other
presentations.

In some of the breakout sessions to discuss general concerns, this
dichotomy again became apparent. One approach towards its resolution was
to charge professional and learned societies with the organisation of
their discipline-specific content through construction and dissemination
of their own controlled vocabularies and relevant metadata formats.

Through its interoperability, the OAI-PMH can certainly support
value-added discipline-specific metadata records; but the learned society
is faced in the first instance with the problem of locating the records in 
which it has an interest from the large volume of metadata that is
harvestable across a wide range of providers, much of which may be
incomplete or of a low standard. The answer to this problem seemed to be
the creation of middle-tier service providers to harvest promiscuously and 
annotate the records they retrieve, prior to re-export. In fact this is a
similar function to that provided by "traditional" abstracting and
indexing services, the distinction being that such agencies would be ready 
(one supposes as a matter of honour) to re-export the metadata that they
had freely harvested.
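
The middle-tier role can be pictured with a small sketch. It assumes that
harvested Dublin Core records have already been parsed into dictionaries
mapping element names to lists of values; the vocabulary, the sample
records and the function names are all hypothetical, and a real service
would use a far richer matching scheme than the substring test shown here.

```python
# Hypothetical controlled vocabulary: keyword -> discipline term.
VOCABULARY = {
    "diffraction": "crystallography",
    "unit cell": "crystallography",
    "quark": "high-energy physics",
}


def is_usable(record):
    """Reject incomplete records: require an identifier and a title."""
    return bool(record.get("identifier")) and bool(record.get("title"))


def annotate(record, vocabulary=VOCABULARY):
    """Return a copy of the record with discipline terms added as subjects."""
    text = " ".join(
        record.get("title", []) + record.get("description", [])
    ).lower()
    subjects = list(record.get("subject", []))
    for keyword, term in vocabulary.items():
        if keyword in text and term not in subjects:
            subjects.append(term)
    annotated = dict(record)
    annotated["subject"] = subjects
    return annotated


# Invented records standing in for a promiscuous harvest.
harvested = [
    {"identifier": ["oai:a.example.org:1"],
     "title": ["Powder diffraction study of a hypothetical oxide"]},
    {"identifier": ["oai:b.example.org:7"], "title": []},  # incomplete
]

# Filter, annotate, and hold the result ready for re-export.
for_reexport = [annotate(r) for r in harvested if is_usable(r)]
print(for_reexport)
```

A learned society could then harvest from the middle tier using only the
subject terms of its own vocabulary, rather than trawling every provider.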

It is difficult for institutions - at least universities - to
self-publish the results of scholarly research as traditional
peer-reviewed articles, since the institution cannot call upon outsiders
to provide the reviewing service. (A manifesto on academic independence by 
Professor J.-C. Guedon of the University of Montreal looked ahead to the
days of confederal editorial boards organised by groups of universities -
but while an Ivy League board might flourish, one wonders about the
academic credibility of boards convened by lesser-known and less respected 
institutions.)  Yet the institutions want all the publications of their
faculty members to be available (free of cost) from their own web
servers. The preferred approach was to retain copyright or negotiate with
journal publishers a copyright waiver that allowed the institutions to
host (and deliver) such articles. While the IUCr, for example, has been
happy to allow authors to mount their published articles on their own web
pages, a consistent policy of allowing free redistribution by federated
institutional servers would severely threaten the journals' basis for
subscription.

It is also clear that in designing their institutional repositories,
universities do not wish to become real publishers and to shoulder the
costs of administering peer review and document markup. Yet they do want
cost-free access to high-quality articles. There is a confidence in some
disciplines that authors are sufficiently skilled in editorial tasks to be 
fully entrusted with document markup; hence the SPARC journal Documenta
Mathematica can claim annual production costs for its 700-page journal of
around EUR 200. It is fortunate that its Managing Editor, an academic,
gives his time freely. Of course, such apparent philanthropy hides the
true costs of production, but the community represented here felt this to
be acceptable since the requirement to publish is an integral part of the
academic endeavour.

The open-access archive was, however, seen by Guedon in his elegant essay
as an important development that could restore the openness and continuity 
of scientific communication that is sometimes characterised by the
idealised Age of Letters. New web technologies allow and indeed encourage
post-publication feedback and recommendations (the model for this is the
amazon.com retailer site). Indeed, the possibility arises to move away
from the discrete article-based method of contributing ideas to a
distributed forum of discussion towards which any qualified person may
contribute. The model here is of distributed open-source software
development projects, where small and large contributions are made, but
access control and detailed logging permit open scrutiny and evaluation
of the contributions. In this Utopian ideal, extended and continuous
evaluation of open contributions restores full independence to scholars,
unfettered by commercial publishers, who provide resources for the
dissemination of information but exercise a certain amount of control in
return.

Absent from this meeting was a sense of real concern about the long-term
preservation of (and access to) the resources within disparate autonomous
repositories, though the various national-level funding and coordinating
bodies represented demonstrated that this is increasingly a matter of
concern at national levels, at least in some countries. An innovative role 
for "service providers" was seen to be the possibility of polling
open-archive repositories and notifying them when it was deemed
appropriate to migrate the digital objects they held to another format or
representation in an effort to prolong their longevity. Indeed, such
services could in principle perform the migration function and return to
the repository the new representation of the object (presumably at some
cost).

The reluctance of institutional repositories to become fully-fledged
self-publishers left the SPARC (Scholarly Publishing and Academic
Resources Coalition) presentations appearing as a complementary rather
than integral initiative. The Budapest Open Access Initiative to
promote new publishing models to allow open access to information also
appeared as a rather loosely-associated development. Both SPARC and the
BOAI do however see the metadata harvesting protocol as an important
technical facilitator of their goals.

The Elsevier Scirus science-centric web search engine was presented as an
application that could certainly exploit OAI-distributed metadata to good
effect, allowing the construction of a science resource finder able to
span both formal journal publications and less formal web documents.

I feel that the IUCr should certainly consider implementing an OAI-PMH
based data server, and perhaps also run harvester software. Among the
possible applications are:

  * By offering metadata records in the PubMed (and other) formats we
    could optimise the transfer of our metadata to arbitrary linking
    partners.

  * We could harvest non-published materials such as theses, providing
    in the first instance a web catalogue of theses in crystallography,
    subsequently perhaps providing links from article reference lists to
    theses and other such reports.

  * It could provide a possible route for limited access to databases
    such as CSD, which do not currently offer web access.

  * More speculatively, it is a technique we might persuade generators of
    crystallographic data sets (synchrotron laboratories, service
    crystallography facilities) to adopt so as to auto-catalogue and
    provide access to such primary data.
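
To give a flavour of the second application, the sketch below picks
crystallography theses out of a ListRecords response in the oai_dc format.
The namespace URIs are those of OAI-PMH 2.0 and Dublin Core; the sample
response, the function name thesis_catalogue and the matching rule (a
dc:type containing "thesis" and a dc:subject containing "crystallograph")
are assumptions made for illustration, since real repositories label
their records far less consistently.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented ListRecords response containing two records.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>Crystal structure of a hypothetical compound</dc:title>
     <dc:type>Thesis</dc:type>
     <dc:subject>Crystallography</dc:subject>
    </oai_dc:dc>
   </metadata>
  </record>
  <record>
   <header><identifier>oai:example.org:2</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>A hypothetical preprint on detectors</dc:title>
     <dc:type>Preprint</dc:type>
     <dc:subject>Particle physics</dc:subject>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""


def thesis_catalogue(xml_text, subject_stem="crystallograph"):
    """Return (identifier, title) pairs for theses on the given subject."""
    root = ET.fromstring(xml_text)
    entries = []
    for record in root.iterfind(".//oai:record", NS):
        types = [e.text.lower()
                 for e in record.iterfind(".//dc:type", NS) if e.text]
        subjects = [e.text.lower()
                    for e in record.iterfind(".//dc:subject", NS) if e.text]
        is_thesis = any("thesis" in t for t in types)
        on_topic = any(subject_stem in s for s in subjects)
        if is_thesis and on_topic:
            ident = record.findtext("oai:header/oai:identifier", namespaces=NS)
            title = record.findtext(".//dc:title", namespaces=NS)
            entries.append((ident, title))
    return entries


print(thesis_catalogue(SAMPLE))
```

The resulting pairs would be enough to seed a simple web catalogue, with
each identifier resolving back to the holding repository.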

The presentations of the workshop (including slides and video recordings
of the talks) are available at http://doc.cern.ch/age?a02333


_________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography            e-mail:  bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England                   bm@iucr.ac.uk