Colleagues in university libraries in the United States recently published an ambitious and well-thought-out shared service model for research data curation.

It rests on a pertinent observation: looking after the huge panoply of file formats and software types that constitute research data is beyond the remit of any single institution. The expertise needed sits in many different places.

The model also highlights the need to focus on data at the moment researchers wish to publish a dataset. Curation is not something to be put off until tomorrow.

At our own data archive, 4TU.ResearchData, our day-to-day work is not primarily the curation of datasets that have already been deposited with us.

Rather, the focus is on the moment researchers (some from Dutch universities, some from further afield) send us their initial dataset. Often it arrives with limited metadata and next to no documentation.

It’s the role of our data moderators to liaise with the researchers, improving the quality of the metadata and adding enough documentation to explain the codes, acronyms and software that accompany the datasets. This work is crucial: it ensures that the data is reusable by other researchers.

Of course, we still need to think about long-term curation to ensure the data remains readable in a technical sense. But our greater challenge is making sure the data is contextualised.

So it’s good that such work is at the heart of the Data Curation Network. The suggested workflow fully acknowledges the need to work with moderators who can appraise the data and provide the necessary curation expertise at the moment of deposit.