Providing access to data, and enabling its sharing and re-use, will make ‘new’ research possible. The physical and earth sciences will work like the biology/bioinformatics communities, drawing on large datasets managed at community level. Data will be made accessible to research groups for re-use by means of Web-based ‘research environments’.
A national strategy needs to be in place to ensure discovery, access, re-use and preservation of data. Each of these processes has differing requirements and differing solutions that need to be taken into account. Key indicators will be put in place to measure progress in terms of data stored, and to measure the impact of sharing and re-use of data.
There will be a network of data centres for access to and preservation of research data, working together at international, national, institutional and departmental levels. There will be considerable differences in the requirements and roles of different-sized HE institutions. This network will be linked up with commercial sector and public sector data centres. Some of these data centres may use Web-based facilities (e.g. Amazon S3). Transitory data stores will be available to support short-term collaboration.
Data deposit will be built into the experimental research workflow. Deposit will be integrated into tools being used within labs. Repositories will be integrated into Open Science initiatives and will be used to make data available within collaborating groups.
There will be improved provision and quality of supporting information (metadata), but metadata will only be created where necessary. The future creation of metadata will take into account the rich versus light-weight metadata debate, resulting in various different levels of richness of metadata being created depending on the requirements for re-use and preservation.
There will be increasing demand for repositories to provide geodata as Web services or in more manageable ways/formats. This will enable users to create mash-ups and combine data more easily.
We have only just scratched the surface. Technically, the goals are achievable, but in every other respect there is a lot of ground to cover.
Certainly there is much activity by UK (and international) players as regards the curation and preservation of data. In the UK there are initiatives that include the JISC Digital Curation Centre, the Research Information Network, and the UK Research Data Service; internationally there is involvement from the Australian National Data Service, the US National Digital Information Infrastructure and Preservation Program, the Library of Congress, the National Science Foundation etc. As the infrastructure for research data is still in the planning stage, there are many lessons that can be learnt from the past experience of setting up repositories for research papers. It would be useful to clarify who is taking responsibility for inputting ‘lessons learnt’ from JISC programmes into the wider discussions on data repositories.
Licensing issues for geodata have not really been simplified in practice. Licensing remains as big an issue for geodata as before, and so do standards, although that may change now that KML (Keyhole Markup Language) is becoming more prevalent as a data-sharing format.
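To illustrate why a format such as KML makes geodata easier to share and combine, the following is a minimal sketch of writing a handful of point records as a KML document using only the Python standard library. The function name `points_to_kml` and the sample sites are hypothetical and are not drawn from any repository discussed here; a real service would also need to carry licensing and provenance metadata alongside the data.

```python
import xml.etree.ElementTree as ET

# Namespace of the OGC KML 2.2 standard.
KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points):
    """Serialise (name, longitude, latitude) tuples as a KML string.

    Hypothetical helper for illustration only.
    """
    # Emit KML elements without a namespace prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Invented sample records, not real deposited data.
sample = [("Site A", -3.19, 55.95), ("Site B", -1.26, 51.75)]
kml_text = points_to_kml(sample)
print(kml_text)
```

The resulting text can be saved with a `.kml` extension and opened in a KML-aware viewer, which is the kind of lightweight re-use that mash-ups depend on.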
The ‘repository community’ needs to be involved in current research data initiatives to ensure that policy lessons learnt in establishing research output repositories are carried over to establishing research data repositories. Policies and processes for populating data repositories need to be influenced by the experience of existing institutional repositories. Existing JISC services or projects such as UKOLN or DCC might take a lead role here.
There is likely to be a hybrid of institutional, national and international research data centres as well as subject and disciplinary data centres. Several things need attention: deposit of legacy data needs to be considered; mandates need to be put in place and enforced; a framework of rewards and incentives needs to be established for sharing and re-use of ‘my data’; and new grassroots researchers who are not yet established need support, as they are more likely to bring about cultural change.
Milestone 6: Clarify responsibility for feeding existing repository implementation experience into current planning activity for research data centres. Potential candidates for this role are DCC, UKOLN and JISC. Taking the experience of previous JISC programmes and existing IRs on board, the interaction between different types of research data centre should be defined at an early stage. Work in this area needs to be undertaken in collaboration with the UK Research Councils. There is a need to establish metrics for populating research data centres and for measuring impact.
Meeting the objectives for discovery, access, re-use and preservation of data may require different solutions. Similarly, different types of content will have different requirements (big science/small science; different disciplines).
Milestone 7: Ensure the multiple objectives that management and re-use of research data support (e.g. discovery, access, re-use and preservation of data) are taken into account in proposed solutions. Ensure the requirements of different research data types (big science, small science, different disciplines) are taken into account in proposed solutions.
Most smaller institutions will not be able to sustain the costs of a digital data repository. Nor will they be able to ensure that appropriate finding aids can locate deposited material. Ingest of diverse, idiosyncratic data requires specialised data archives at national level. There needs to be a national strategy involving the largest HE institutions and the national data archives.
Milestone 8: Formulate a national strategy to take account of differing roles for different types of HE institution.
There is already a lot of activity in this area, and it will increase over the next few years. Outcomes need to be targeted at the appropriate stakeholder groups on an ongoing basis. In particular, what is relevant to the existing research repository manager? What is still being debated at the policy level? What can be acted on now?
There will be opportunities to incorporate training of personnel in data management into repository projects. The barriers between domain experts and informaticians/data managers must be broken down, with training and career incentives and opportunities put in place, for work at this interface to begin. Different communities need to be brought together to address the problem.
ShareGeo will hopefully give us the opportunity to continue to educate the community on sharing and reusing geospatial data within current licensing and security models.