Note that this technical infrastructure section does not include details of ‘where we are now’, as this information was not gathered explicitly as part of the consultation. Much of the background to the current situation is, however, implicit in the following sections and in section 3.5 on exploiting the Web.
As previously discussed, a renewed vision for scholarly communication in 2013 has yet to be developed. More work is needed to develop scenarios of how research and learning will be carried out in the future; the technical infrastructure required to support such scenarios could then be elaborated with more confidence. In the meantime, what is self-evident is that scholarly communication in the future will be more Web based than it is now.
If repositories are to become more aligned with the Web, the processes of deposit, discovery, access, curation and preservation of content will need to be better integrated with Web based services and tools. Repository software and added value services will need to take a RESTful approach and should, as far as possible, use the existing Web architecture, i.e. HTTP as the protocol, URIs as identifiers and HTML for representations. Only where scholarly communication has ‘special requirements’ over and above other content on the Web should there be a need for specialised, non-Web architecture solutions.
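To make the RESTful principle concrete, the following is a minimal sketch, using only the Python standard library, of a repository item that is identified by an HTTP URI and returned as an ordinary HTML page. The item data, the `/items/{id}` URI layout and the names `ITEMS`, `handle_get`, `ItemHandler` and `serve` are hypothetical illustrations, not part of any existing repository platform.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: local identifier -> item metadata.
ITEMS = {
    "1234": {
        "title": "An example article",
        "creator": "Smith, J.",
        "fulltext": "/items/1234/article.pdf",
    },
}


def handle_get(path: str) -> tuple[int, str, str]:
    """Map a GET on a URI path to (status, media type, HTML body).

    Only paths of the form /items/{id} are recognised; everything else
    yields a 404, as an ordinary Web server would.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "items" or parts[1] not in ITEMS:
        return 404, "text/html", "<p>Not found</p>"
    item = ITEMS[parts[1]]
    title = escape(item["title"])
    body = (
        f"<html><head><title>{title}</title></head><body>"
        f"<h1>{title}</h1>"
        f"<p>{escape(item['creator'])}</p>"
        f"<a href=\"{escape(item['fulltext'])}\">Full text</a>"
        "</body></html>"
    )
    return 200, "text/html", body


class ItemHandler(BaseHTTPRequestHandler):
    """Serves handle_get() results over plain HTTP."""

    def do_GET(self) -> None:
        status, media_type, body = handle_get(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", f"{media_type}; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Run a local test server (blocks until interrupted)."""
    HTTPServer(("localhost", port), ItemHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/items/1234` in a browser would return the item page. The point of the design is that no repository-specific protocol is involved: browsers, crawlers and other Web tools can use the item as they would any other Web resource.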
In a presentation at a Talis Xiphos Research Day meeting in 2008 [1], Andy Powell (Eduserv Foundation) argued that we have not got repository architecture right: it needs to be based on Web architecture rather than on the current focus on specialised harvesting protocols (OAI-PMH), institutional collections and aggregators. He went on to consider how Web 2.0 might further influence ‘getting scholarly content on the Web’ by exploiting the social networks associated with content. Paul Walk (UKOLN) has also undertaken work, still in progress [2], to propose an architecture to support repositories.
This Review will not attempt to rehearse the arguments or pass judgement on this debate. Instead, it raises some questions and considerations about options for the way forward, and mentions some other technical issues at a policy level.
Although we may wish to move to a Web based architecture for repositories, there is already a significant deployed base of ‘legacy repositories’ within UK institutions. Repository software developers need to consider the questions that face many other deployed information systems. Is there a way to layer a Web based approach onto this legacy deployment? Would such an approach require the development of new software platforms? Should it be applied at the level of the aggregator (e.g. Intute) or at the local level (the institutional repository)?
Recommendation 9: Explore options for moving to a more Web based architecture for repositories, taking into account the need to carry existing ‘legacy repositories’ forward.
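One way of layering a Web based approach onto a legacy deployment is a thin adapter that reads what the existing repository already exposes and re-presents it as Web pages. The sketch below assumes the legacy system offers an OAI-PMH `GetRecord` response with simple Dublin Core (`oai_dc`) metadata; the sample record and the function name `record_to_html` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# XML namespaces used in OAI-PMH responses carrying simple Dublin Core.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def record_to_html(oai_xml: str) -> str:
    """Render the oai_dc metadata in an OAI-PMH response as an HTML page."""
    root = ET.fromstring(oai_xml)
    dc = root.find(".//oai_dc:dc", NS)
    if dc is None:
        raise ValueError("no oai_dc metadata in response")
    titles = [e.text or "" for e in dc.findall("dc:title", NS)]
    creators = [e.text or "" for e in dc.findall("dc:creator", NS)]
    identifiers = [e.text or "" for e in dc.findall("dc:identifier", NS)]
    title = titles[0] if titles else "Untitled"
    # Only identifiers that are already HTTP URIs become links.
    links = "".join(
        f'<li><a href="{escape(i)}">{escape(i)}</a></li>'
        for i in identifiers
        if i.startswith(("http://", "https://"))
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1>"
        f"<p>{escape('; '.join(creators))}</p>"
        f"<ul>{links}</ul></body></html>"
    )


# Invented, trimmed-down GetRecord response for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example thesis</dc:title>
      <dc:creator>Jones, A.</dc:creator>
      <dc:identifier>http://repository.example.ac.uk/42/</dc:identifier>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""
```

Such an adapter could run at the aggregator or at the institution, which is one concrete form of the local-versus-aggregator question above. Note that it inherits whatever inconsistencies exist in the underlying metadata.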
Concentrated ‘global’ collections of content such as Flickr and Slideshare are held up by Powell as examples of successful repositories that “promote the social activity that takes place around content as well as content management and disclosure activity” (see slide 17 of Powell’s presentation). However, current practice in research and teaching suggests that most social activity takes place at a group level. Typically, a research group, whether within one institution or across several, works together under a funding grant, sharing knowledge and work in progress. Similarly, social interaction connected with teaching and learning is typically based on a cohort of students working together as a group. It would be useful to explore whether repositories associated with research groups and learning groups would be more likely to promote social activity that encourages tagging, embedded comments and re-use. Note that the JISC Faroes project [3] is doing this with learning materials for language teachers.
On the other hand, ‘group facilities’ could be layered onto global collections of content, much as groups form on Facebook and MySpace, although such an approach might not deliver the functionality of a product designed specifically for small groups.
Some types of scholarly communication have characteristics that distinguish them from other content on the Web, and these characteristics may be incompatible with establishing concentrated collections of scholarly content through simple deposit mechanisms. The most obvious is copyright, which surrounds the deposit of journal articles and learning materials. Another consideration is how concentrated content stored by a third party could be integrated with data in local institutional systems such as a CRIS or REF system. The prospects for centralised collections of datasets seem less contentious than for journal articles.
Recommendation 10: Explore the concentration of different types of content at different levels (small group, such as a lab, research group or student cohort; discipline; and global) and how these different levels might facilitate social networking effects.
Other technical issues include modelling the relatively complex objects that will increasingly be found in repositories. There is already significant complexity in content such as journal articles that exist in several versions, with related conference proceedings and associated data, or images with associated rich metadata. Complexity will increase with the collection and re-use of datasets. Two initiatives since the original Roadmap have addressed this problem area: the Scholarly Works Application Profile (SWAP) [4] and Open Archives Initiative Object Reuse and Exchange (OAI-ORE) [5].
As outlined in the project wiki [6], SWAP was intended to address issues that arose from the ePrints UK project, in particular the inconsistent linking to full text within the Dublin Core metadata created for repository items. This inconsistency makes automated harvesting of full text unpredictable, and is currently hampering the work of Intute Repository Search. The resulting SWAP was based on the Functional Requirements for Bibliographic Records (FRBR). It could be argued that the development process for SWAP focused too closely on modelling complex objects rather than on the other functional requirements, resulting in a solution that is over-complex for most institutional repository requirements. However, if SWAP could be implemented through a user-friendly interface, the resulting rich metadata would enable a range of added value services. Does SWAP deliver sufficient gain for the pain?
Currently SWAP has not been implemented, and the plan for community acceptance and take-up has not been progressed. However, other application profiles based on the FRBR/SWAP model are being formulated for other content types. One option would be to take forward a simple deployment of SWAP at the item or copy level, retaining the cataloguing rules to ensure a consistent approach to linking to full text. This could serve as an interim solution while the benefits of deploying the full SWAP description set (work, expression, manifestation and agent) are explored.
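The difference between the full model and a copy-level deployment can be sketched as follows. The classes mirror the FRBR-style chain of entities SWAP uses (scholarly work, expression, manifestation, copy, plus agents), but the field names, the accepted media types and the linking rule in `links_to_fulltext` are illustrative assumptions, not the SWAP specification or its cataloguing rules.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str


@dataclass
class Copy:
    uri: str  # assumed rule: a direct link to the full-text file


@dataclass
class Manifestation:
    media_type: str  # the file format, e.g. "application/pdf"
    copies: list[Copy] = field(default_factory=list)


@dataclass
class Expression:
    version: str  # e.g. "preprint" or "published version"
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class ScholarlyWork:
    title: str
    creators: list[Agent] = field(default_factory=list)
    expressions: list[Expression] = field(default_factory=list)


# Hypothetical set of media types accepted as "full text".
FULLTEXT_TYPES = {"application/pdf"}


def copy_level_records(work: ScholarlyWork) -> list[dict]:
    """Flatten the work -> expression -> manifestation -> copy chain
    into one simple record per copy."""
    return [
        {
            "title": work.title,
            "creators": [a.name for a in work.creators],
            "version": expr.version,
            "format": man.media_type,
            "fulltext": c.uri,
        }
        for expr in work.expressions
        for man in expr.manifestations
        for c in man.copies
    ]


def links_to_fulltext(record: dict) -> bool:
    """Illustrative cataloguing rule: the link is an absolute HTTP(S) URI
    and the recorded format is one of the accepted full-text types."""
    return (
        record["fulltext"].startswith(("http://", "https://"))
        and record["format"] in FULLTEXT_TYPES
    )


# Invented example: one article with a preprint and a publisher version.
work = ScholarlyWork(
    title="An example article",
    creators=[Agent("Smith, J.")],
    expressions=[
        Expression("preprint", [Manifestation(
            "application/pdf",
            [Copy("http://repository.example.ac.uk/7/preprint.pdf")])]),
        Expression("published version", [Manifestation(
            "text/html", [Copy("http://publisher.example.com/article/7")])]),
    ],
)
records = copy_level_records(work)
```

In this example the preprint record passes the rule while the publisher's HTML page does not. This is the kind of predictable check that a harvester such as Intute Repository Search would depend on, and it works on the flat copy-level records without requiring the full entity model to be catalogued.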
Recommendation 11: Explore deployment of a cut-down version of SWAP, possibly at the copy level, retaining the cataloguing rules to ensure a consistent approach to linking to full text. Evaluate whether use of SWAP is consistent with a Web architecture approach to repositories.
The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) [7] defines standards for the description and exchange of aggregations of Web resources (otherwise known as complex digital objects). Such aggregations might include text, images, multimedia and, increasingly, complex datasets. The intention of OAI-ORE is to describe such aggregations in a predictable way so that applications can manipulate them to support, for example, authoring, deposit, exchange, visualization, reuse and preservation. A primary motivation for ORE is to enable the re-use of data. While ORE focuses on the relations within aggregations of resources and SWAP focuses on research papers, there is a potential overlap between the functionality enabled by SWAP and by ORE.
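At its core, an ORE description is a Resource Map that describes an Aggregation, which in turn aggregates a set of Web resources. The sketch below builds those core statements for a hypothetical article, dataset and image and writes them out as N-Triples using the ORE vocabulary namespace. All URIs are invented, and the function names are illustrative; the specification also calls for metadata about the Resource Map itself (such as its creator and modification date), which is omitted here for brevity.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def resource_map(rem_uri: str, agg_uri: str, parts: list[str]) -> list[Triple]:
    """Return the core triples of an ORE Resource Map describing one
    Aggregation of the given Aggregated Resources."""
    if rem_uri == agg_uri:
        # ORE requires the Resource Map and the Aggregation to have distinct URIs.
        raise ValueError("Resource Map and Aggregation need distinct URIs")
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", agg_uri),
        (agg_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    triples += [(agg_uri, ORE + "aggregates", p) for p in parts]
    return triples


def to_ntriples(triples: list[Triple]) -> str:
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


# Hypothetical aggregation: an article with its data and a figure.
REM = "http://repository.example.ac.uk/rem/7"
nt = to_ntriples(resource_map(REM, REM + "#aggregation", [
    "http://repository.example.ac.uk/7/article.pdf",
    "http://repository.example.ac.uk/7/data.csv",
    "http://repository.example.ac.uk/7/figure1.png",
]))
```

Because every part is an ordinary URI, an application that understands the ORE vocabulary can fetch, re-use or preserve the article and its data together, which is the re-use scenario described above. Relationships between the parts, such as which version of a paper each file represents, are where SWAP-style modelling would overlap.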
Recommendation 12: Explore use of OAI-ORE to enable applications to handle complex objects. Demonstrate how OAI-ORE facilitates the re-use of research outputs and research data. Clarify different roles of OAI-ORE and SWAP.
Turning to software platforms, the majority of UK repositories depend on two main software platforms, ePrints and DSpace. Both are open source, although they follow different development models: DSpace looks to a community of developers to contribute code, whereas ePrints development is led by the University of Southampton. Extensions to both platforms will be required to deliver additional functionality to UK IRs, such as compliance with SWAP, integration with existing institutional systems, management of datasets and a Web oriented architecture. Over time there may be more involvement of commercial suppliers in providing repository software, and more integration with existing library management systems, but for the foreseeable future there will be significant reliance on the two platforms. While these products have their own development plans, UK contributions of development effort and funding need to be targeted to ensure that strategic deliverables are prioritised.
Recommendation 13: Target UK contribution to ePrints and DSpace in terms of development effort and funding to ensure strategic deliverables are prioritised.
In reconsidering the repository architecture in the light of recent Web initiatives, issues of outsourcing and scalability arise. What repository functions can be outsourced at the network level or, to put it in Web 2.0 terminology, what can be done ‘in the cloud’? Repositories need to grow to accommodate multiple types of material, and the inclusion of data will require much more storage. It seems likely that data from ‘big science’ will be stored at some stage in its life-cycle in network services such as Amazon S3.
Other parts of the infrastructure might also be outsourced (preservation, metadata creation, disclosure to search engines, linking data), with national procurement of infrastructure. Services could carry some local badging so that the institutional brand is preserved. This would immediately free up considerable local attention and resource, which is currently spent solving problems, developing solutions and building services in a massively redundant way.
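One design choice that keeps the outsourcing option open is to hide where content is stored behind a small interface, so that bulky material such as datasets could later move to a network service without changing repository code. The sketch below is a hypothetical illustration: `BlobStore`, `LocalDirStore`, `MemoryStore` and `fixity_ok` are invented names, and keys are assumed to be SHA-256 digests of the content so that integrity can be re-checked wherever the bytes are held.

```python
import hashlib
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """What the repository needs from storage, wherever it lives."""

    def put(self, data: bytes) -> str: ...
    def get(self, key: str) -> bytes: ...


def content_key(data: bytes) -> str:
    """Content-addressed key: the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class LocalDirStore:
    """Keeps each object as a file named by its content key."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        key = content_key(data)
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class MemoryStore:
    """In-memory stand-in, e.g. for tests."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_key(data)
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def fixity_ok(store: BlobStore, key: str) -> bool:
    """Re-hash the stored bytes to check they are unchanged, which matters
    when the bytes are held by a third party."""
    return content_key(store.get(key)) == key
```

A class wrapping a network object store, for example Amazon S3 through a vendor SDK, could implement the same `put`/`get` pair and be substituted without changing calling code. That backend is not shown here, and its cost, jurisdiction and preservation terms would still need separate evaluation.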
Institutions need ongoing advice on the use of identifiers, both for repository content and for researchers and institutions. This might be part of the infrastructure offered at a national level.
Recommendation 14: Explore use of cloud computing to support repository storage and services. Consider what repository infrastructure is best located at the local institutional level and what is better outsourced to help reduce costs.
The development of SWORD, a profile of the Atom Publishing Protocol for depositing items in repositories, has proved successful. Developers from various repository platforms came together to specify and develop a simple protocol. This might prove a valuable pattern of development for other repository-related applications. SWORD now needs to be taken forward to demonstrate agreement on the metadata passed by the protocol when depositing into multiple repositories, e.g. deposit into ESRC and into a local institutional repository.
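The deposit pattern SWORD standardises is essentially an HTTP POST of a package to a collection URI, with headers describing the package. The sketch below builds, but does not send, such requests for the same package to two repositories, as in the ESRC and institutional repository case. Header names follow SWORD 1.3 as we understand it and should be checked against the specification and the target servers; the collection URIs, credentials, packaging identifier and function names are hypothetical.

```python
import base64
import hashlib
import urllib.request

# Example packaging identifier (assumed here); check which formats each
# target repository actually accepts.
PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def sword_request(collection_uri: str, package: bytes, filename: str,
                  user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD 1.3-style deposit: an HTTP POST of
    the package to a collection URI with descriptive headers."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        collection_uri,
        data=package,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": f"filename={filename}",
            # Assumed hex-encoded MD5 of the package; servers may differ.
            "Content-MD5": hashlib.md5(package).hexdigest(),
            "X-Packaging": PACKAGING,
            "Authorization": f"Basic {token}",
        },
    )


def deposit_requests(targets: dict[str, tuple[str, str]], package: bytes,
                     filename: str) -> list[urllib.request.Request]:
    """One request per target collection; targets maps collection URI to
    (user, password)."""
    return [sword_request(uri, package, filename, user, pw)
            for uri, (user, pw) in targets.items()]


# Hypothetical collection URIs for a funder service and a local IR.
TARGETS = {
    "https://funder-repository.example.org/sword/deposit/articles": ("alice", "secret1"),
    "https://eprints.example.ac.uk/sword-app/deposit/archive": ("alice", "secret2"),
}
reqs = deposit_requests(TARGETS, b"PK\x03\x04 example zip bytes", "article.zip")
```

Each request could then be sent with `urllib.request.urlopen`. The sketch deliberately sends identical package bytes to every target; the open question raised above is whether the metadata inside that package means the same thing to each receiving repository.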
Recommendation 15: Follow SWORD development pattern for other repository related applications. Demonstrate use of SWORD to deliver deposit to multiple repositories.
[1] Powell, A. Web 2.0 and repositories: have we got our repository architecture right? Presentation at Talis Xiphos Research Day, June 2008. http://www.slideshare.net/eduservfoundation/repositories-and-web-20-have-we-got-our-repository-architecture-right
[2] Walk, P. Repository architecture #83. Presentation at JISC Repositories Architecture meeting, July 2008. http://www.slideshare.net/paulwalk/repositories-architecture-83
[3] http://www.elanguages.ac.uk/researchcommunity/projects/faroes.html
[4] http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
[5] http://www.openarchives.org/ore/
[6] http://www.ukoln.ac.uk/repositories/digirep/index/Functional_Requirements#Conclusions_from_Eprints_UK
[7] http://www.openarchives.org/ore/
