Authors: Wilma van Wezenbeek, Alastair Dunning, Marta Teperek
Date: 3 August 2018
- Interim version of the FAIR Data Action Plan: https://doi.org/10.5281/zenodo.1285290
- Interim version of the FAIR Data Report: https://doi.org/10.5281/zenodo.1285272
- TU Delft welcomes these two very thorough reports, with plenty of valid recommendations. TU Delft is particularly pleased to see the importance of long-term curation, data stewardship and disciplinary frameworks for FAIR highlighted in the Action Plan
- Careful attention needs to be paid to roles and responsibilities, particularly with regard to the role of data stewards. Data stewards are not the only keepers of research data; researchers must assume individual responsibility as well.
- There should be an explicit action on funders to recognise the costs of long-term data curation and preservation as eligible costs on grants (this is a particular problem for current EU funding mechanisms)
- The general public are not addressed in the document. The Action Plan should address how the FAIR principles can be understood to help maintain broad trust in science
- Attention also needs to be paid to the implications of implementing FAIR data for those working between academia and industry.
- There should be greater clarity between tasks that occur at national level, and those that occur at international level.
- There is nothing about the governance of the FAIR Principles. Do the Principles require a broader governance model so they can represent all the necessary stakeholders?
- Consideration should be given to renaming DMPs as Output Management Plans to allow for the management of code, models etc.
- There is overlap between many of the recommendations; they should be merged to provide a more concise document
Specific comments on the FAIR Data Report:
Page 10, “The key arguments can be summarised in three categories”
- Agreed that these are important, though I miss the “peer advantage”: sharing your data not per se because you want or need to act with integrity, or want to transform research, but because immediate and open sharing helps research move further (perhaps this is just semantics?).
Page 23, “…stewardship in the wider science community”.
- In this context, ‘science’ could be more clearly defined. The LERU open science roadmap provides a good example
“..and the international scientific unions”.
- It is indeed preferable not to limit discussions to data-related groups or venues. Data-related discussions should also happen at normal “community” conferences and venues.
Page 24, “It would be useful to define use cases taking advantage of FAIR beyond their current data sharing capacities to convince such communities to engage more fully with a FAIR ecosystem.”
- Perhaps there is room for a clear recommendation here?
Page 28, “..motivation to join the movement”
- The word “movement” might have some negative connotations
Page 32, “A set of case study examples should be developed and maintained to demonstrate that providing FAIR data can increase the impact of facilities by increasing data reuse and thereby return on investment in the facility.”
- Perhaps there is room for a clear recommendation here?
Page 47, “Researchers should also preferably deposit in certified repositories”
- Perhaps Rec 18 on this page could be combined with Rec 10, so that researchers know which certification is the “right” one (although this is challenging), and related to what is said in the following section about researcher awareness. To know whether a journal is eligible, researchers use(d) the impact factor; now they need something to identify the right certified data repository (e.g. CoreTrustSeal)
Comments on FAIR Data Action Plan
Rec.1, page 3:
The FAIR principles should be consulted on and clarified to ensure they are understood to include appropriate openness, timeliness of sharing, assessability, data appraisal, long-term stewardship and legal interoperability.
Stakeholders: Global coordination fora; Research communities; Data services.
- We feel that policy makers & research funders should be listed as stakeholders here as well. In the end their policy requirements (and their definitions) will be the key drivers for researchers and academic institutions (key content producers)
Rec. 2: Mandates and boundaries for Open
- Perhaps it would be worth adding that with the use of the principle “as open as possible”, the slider shifts once other criteria have been made explicit (e.g. national security or endangered species)
- A further recommendation could be to create specific rights statements for data, along the lines of http://rightsstatements.org/
Concrete and accessible guidance should be provided to researchers in relation to sharing sensitive and commercial data as openly as possible.
Stakeholders: Data stewards; Data services; Institutions; Publishers.
- Much more needs to be done to help the sharing of sensitive and commercial data, for instance development of shared practices and protocols for how academia and industry share and manage research data. Best practice should be documented.
- Policy makers & research funders should be part of this as well => their endorsement of the guidance is necessary
Rec. 4: Components of a FAIR data ecosystem
- While these are all infrastructure-related elements, it is strange that Research Communities are not identified as key stakeholders in every action
Rec. 8: Cross-disciplinary FAIRness
Case studies for cross-disciplinary data sharing and reuse should be collected. Based on these case studies, mechanisms that facilitate the development of frameworks for interoperability and reuse should be developed.
Stakeholders: Global coordination fora; Data stewards
- Research communities are also key stakeholders here – they are the key actors and need to be willing to work with the data stewards and the global coordination fora for the case studies to be identified and developed.
Rec. 9: Develop robust FAIR data metrics
- It should be clarified who the data metrics are for – different stakeholders require different metrics
- Any reference to rewarding FAIR practices is missing – perhaps add this reference to Rec. 14
Rec. 10: Trusted Digital Repositories
At an appropriate point, the language of the CoreTrustSeal requirements should be reviewed and adapted to reference the FAIR data principles more explicitly (e.g. in sections on levels of curation, discoverability, accessibility, standards and reuse).
Stakeholders: Global coordination fora; Data services; Institutions
- This is a very important point. Perhaps, in line with previous valid points in the action plan, it would be useful to say that the metrics of FAIRness need to be developed with the research communities, and that the language (and the requirements themselves) might have to take into account disciplinary differences. This might be particularly important when it comes to very confidential datasets, very large datasets etc.
Rec. 11: Develop metrics to assess and certify data services
“Certification schemes are needed to assess all components of the FAIR data ecosystem. ….”
- More elements need to be taken into account here indeed – see the report by Bilder and Neylon
Rec. 12: Data Management via DMPs
- The introduction text to this recommendation reads: “…The DMP should be regularly updated to provide a hub of information on the FAIR data objects.”. Given that recommendations 3 and 16 discuss the dependencies between datasets and other types of research outputs (crucially, code), and that the very introduction talks about ‘FAIR data objects’, shouldn’t DMPs be rebranded to “Output Management Plans”?
- While this is of course a useful recommendation, reading the associated questions and seeing the reference: “This applies to the relatively small, informal projects of individual scientists ..”, there is a worry about over-administering the process. This section is called “Creating a culture of FAIR data”. Perhaps the focus should be on a transition to a culture where researchers understand autonomously that their data should be FAIR. That has a slightly different angle to it.
Data Management Plans should be living documents that are implemented throughout the project. A lightweight data management and curation statement should be assessed at project proposal stage, including information on costs and the track record in FAIR. A sufficiently detailed DMP should be developed at project inception. Project end reports should include reporting against the DMP.
Stakeholders: Funders; Institutions; Data stewards; Research communities
- The statement at the project proposal stage should also outline any reasons for not making data available.
Rec. 13: Professionalise data science and data stewardship roles
Key data roles need to be recognised and rewarded, in particular, the data scientists who will assist research design and data analysis, visualisation and modelling; and data stewards who will inform the process of data curation and take responsibility for data management.
Stakeholders: Funders; Institutions; Publishers; Research communities.
- Researchers, the creators of content, together with the data stewards, have the responsibility for data management. Data Stewards support researchers in data management, but given that they are not the creators of the data, they can only help researchers (who are willing to get help) to manage their research data. They cannot enforce good data management on researchers if the researchers are not willing to cooperate. In addition, data stewards should definitely be included as stakeholders in this point.
Professional bodies for these roles should be created and promoted. Accreditation should be developed for training and qualifications for these roles.
Stakeholders: Institutions; Data services; Research communities.
- Again, given that the discussion is also about data stewards, they should also be recognised as key stakeholders for this action point.
Rec. 14: Recognise and reward FAIR data and data stewardship
Credit should be given for all roles supporting FAIR data, including data analysis, annotation, management, curation and participation in the definition of disciplinary interoperability frameworks.
Stakeholders: Funders; Publishers; Institutions.
- Research communities and data stewards are key stakeholders for this action. Crediting FAIR data is dependent on those creating and re-using data outputs. Do they correctly credit resources they re-used? Do they appropriately credit those who helped them collate, analyse or manage the data?
Evidence of past practice in support of FAIR data should be included in assessments of research contribution. Such evidence should be required in grant proposals (for both research and infrastructure investments), for career advancement, for publication and conference contributions, and other evaluation schemes.
Stakeholders: Funders; Institutions; Publishers; Research communities.
- One element which would be worth mentioning here is hiring criteria. Best practice could be shared
The contributions of organisations and collaborations to the development of certified and trusted infrastructures that support FAIR data should be recognised, rewarded and appropriately incentivised.
Stakeholders: Funders; Institutions.
- This is particularly important for institutions which provide necessary elements of FAIR ecosystem components, but cannot get funding for these through traditional EC funding mechanisms (institutional costs are considered as overheads, which are, however, capped at 25% and cannot be spent post-project, e.g. on long-term archiving)
Rec. 15: Policy harmonisation
Concerted work is needed to update policies to incorporate and align with the FAIR principles to ensure that policy properly supports the FAIR data Action Plan.
A funders’ forum at a European and global level should do concrete work to align policies, DMP requirements and principles governing recognition and rewards.
- These two points are interdependent and need to be coordinated, e.g. policymakers on national and institutional levels need to take into account recognition and rewards, and need to do it jointly with the funding bodies; institutional requirements and funders' requirements should be aligned as well. These two points should be merged into one, and both funders and policy makers should be listed as key stakeholders.
Rec. 16: Broad application of FAIR
- This recommendation is vague and repeats ideas elsewhere.
- Why not try to use the same principles for publications (research articles), so that publications are also seen as part of the “EOSC ecosystem”?
Rec. 17: Selection and prioritisation of FAIR Data Objects
- These actions should be put high on institutional priority lists; they would help both researchers and research leaders
When data are to be deleted as part of selection and prioritisation efforts, metadata about the data and about the deletion decision should be kept.
Stakeholders: Research communities; Data stewards; Data services.
- This feels like an unnecessary burden on the community. Attention should be focused on archiving data that are needed to validate research.
Rec. 18: Deposit in Trusted Digital Repositories
Concrete steps need to be taken to ensure the development of domain repositories and data services for interdisciplinary research communities so the needs of all researchers are covered.
Stakeholders: Data services; Funders; Institutions.
- Researchers should be recognised as key stakeholders here, as they are the ones whose needs have to be addressed.
Rec. 19: Encourage and incentivise data reuse
- This is a weak section and could be absorbed into others.
- It misses reference to academic rewards systems – if researchers are encouraged to re-use, they should also be appropriately recognised for this in academic evaluation criteria (projects might not be necessarily “novel”). There should be a cross-reference with Rec. 14
Rec. 21: Use information held in Data Management Plans
DMPs should be explicitly referenced in systems containing information about research projects and their outputs (CRIS). Relevant standards and metadata profiles, should consider adaptations to include DMPs as a specific project output entity (rather than inclusion in the general category of research products). The same should apply to FAIR Data Objects.
Stakeholders: Standards bodies; Global coordination fora; Data services.
- Logically, these should be appropriately rewarded (Rec. 14) and consequently, Funders and Policy makers should be considered as stakeholders here.
DMPs themselves should conform to FAIR principles and be Open where possible.
Stakeholders: Data services; Research communities; Policymakers.
- Funders are a key stakeholder here as well, as they can mandate that DMPs are openly available
Skills and roles for FAIR
- The introduction says: “Data stewards who manage data, ensure that it is FAIR and prepare it for long term curation are also essential.” Data Stewards support researchers in data management, but researchers themselves are the ultimate data producers. Data Stewards cannot enforce good data management without the will and cooperation of researchers. Please rephrase to: “Data stewards who support researchers in data management and help ensure that it is FAIR and prepare it for long term curation are also essential.”
Rec. 29: Implement FAIR metrics
- Seems repetitive with Rec. 9
- Please provide reference to Rec. 31
Rec 30: Monitor FAIR
- Here also I feel that we are overstretching ourselves slightly. Funders need to monitor a great deal, and yes, any requirements or compliance issues need to be tracked, but researchers and funders have more related issues to tackle under the open science or “just science” umbrella. Is that all now captured under FAIR? Or should it be the other way round?
Rec. 31: Support data citation and next generation metrics
- Provide reference to software citation recommendations.
- There should be an additional action on publishers to scrap the limit on the number of possible citations. For example, in Science reports, the maximum number of citations is 30. As a result, researchers not only struggle to cite all relevant literature within this limit, but are also disinclined to cite datasets. Citation limits might have been justified in the print era. However, in the digital era they are anachronistic and only serve the interests of publishers, who wish to further boost the impact factor of their most prominent venues.
Rec. 32: Costing data management
- There should be an explicit action on funders to recognise the costs of long-term data curation and preservation as eligible costs on grants.