On 25 October 2017 I attended the conference “Better Science Through Data”, organised jointly by Springer Nature and the Wellcome Trust. I flew from the Netherlands to attend this one-day conference for two main reasons:

  • I was part of the Programme Committee, and so I was naturally curious to see how the event worked out.
  • The idea of this conference is that it is for researchers, by researchers. I find that quite refreshing compared with most meetings on research data management. Those feature mostly librarian speakers and a librarian audience, and I sometimes feel they are a bit detached from the day-to-day reality of research data management. So, as an ex-researcher now working at the library, I like opportunities that allow me to hear what the research community has to say.

Below are my key reflections on the meeting.

Making data FAIR at the end starts at the beginning of the research project

The common theme of the conference was that making research data Findable, Accessible, Interoperable and Re-usable (FAIR) at the end of a project requires good data management from the very start. In other words, sharing messy data is a challenge.

Iain Hrynaszkiewicz, the Head of Data Publishing at Springer Nature, mentioned that the recognition of day-to-day data management as key to making data FAIR was reflected in the conference’s change of name. Previous editions of this annual conference were titled “Publishing Better Science Through Data”. This year, for the first time, the conference was simply called “Better Science Through Data”: good data stewardship spans the whole research lifecycle and is not limited to data publication.

All participants also received a copy of the first formal publication about FAIR data principles (doi: 10.1038/sdata.2016.18).

Keynotes

There were four keynote speakers at the conference.

Aled Edwards – society before the pride of individuals

The first speaker was Aled Edwards, the Director of the Structural Genomics Consortium (SGC) and a professor at the University of Toronto. Aled discussed the important tension between progress in science and the desire to patent discoveries [Alastair Dunning wrote about another example of this problem at TU Delft]. He mentioned that funding bodies, government organisations and policymakers suggest that patents drive innovation, but that he had yet to see scientific evidence for this. On the contrary, Aled’s talk provided strong and compelling evidence that doing science without making research data and findings available wastes resources, mainly through duplicated research efforts. Therefore, the mission and vision of the Structural Genomics Consortium is to do patent-free research, similar to the approach of McGill University’s Montreal Neurological Institute (MNI) and Hospital in Canada.


Kirstie Whitaker – human approach to data sharing

The second keynote was by Dr Kirstie Whitaker, a Neuroimaging Researcher at the University of Cambridge and a Research Fellow at the Alan Turing Institute. I had already seen Kirstie’s talk at the Open Science in Practice Summer School, but I really enjoyed seeing it again. I cannot stress enough how important Kirstie’s honest approach to data sharing in research is, and I think attending her talk should be compulsory for every researcher (and for every librarian!). Kirstie not only explained clearly what ‘reproducible’ research means but also addressed the barriers to reproducible research and how to tackle them.


Her practical advice to researchers was to start small: for example, by publishing protocols on protocols.io, and by using GitHub and Jupyter notebooks to manage and comment their own code (“Comments are your friend!”). She also suggested that one should not be afraid to ask for help. No one is perfect and there is always room for improvement, especially for researchers who are only starting to work reproducibly.

Finally, Kirstie stressed the important message that reproducible research is not the same as open research. There might be legitimate reasons preventing one from making research “open”, but every researcher should be aiming to make research reproducible.


Esther Crawley – tensions between protecting participants and openness

The third keynote was from Esther Crawley, a Professor of Child Health at the University of Bristol and a Consultant Paediatrician. Esther explained that even if data cannot be made openly available, they should be made accessible and access conditions need to be transparent. She praised research data services at the University of Bristol, which provided her with a mechanism for data sharing on request. In addition, in the Bristol model, every request for data access is considered by an independent Data Access Committee. Esther mentioned that the biggest advantage of having a Data Access Committee external to the research group is that all the decisions are unbiased and free of any potential vested interests.

Finally, Esther advised anyone wishing to embark on research projects involving human participants to carefully consider data sharing when designing consent forms – these will determine what can and what cannot be done with the data in the future. At TU Delft we strongly support this notion and we work closely with the Human Research Ethics Committee to ensure that researchers receive appropriate guidance and training on data management and sharing when planning their projects involving human participants.


Jez Cope – “a cool seminar by a librarian”

I had to leave halfway through the last keynote by Jez Cope, the Research Data Manager at the University of Sheffield, so I can only comment on the first half of his presentation, but all of Jez’s slides are available. What I really liked about Jez’s presentation was that he spoke about libraries and the services they offer in support of Open Research in a very inclusive way, focusing on researchers and their perspective. Jez started by explaining that modern libraries are not museums of books; they are there to help and support researchers on their journey to Open Research. Libraries can help researchers measure the impact of their research, make their work publicly available or make their data visually attractive. Researchers in the audience seemed to enjoy his presentation, as is best evidenced by a tweet from Aled Edwards:

[Tweet from Aled Edwards]

Better data in practice – talks from researchers

The core of the conference consisted of lightning talks by researchers who spoke about their daily data management practice. There were thirteen talks, and I would like to highlight the two that inspired me most.

Pierre Montagano – dealing with Dependency Hell

Pierre Montagano, the Director of Business Development at Code Ocean, noted that reproducibility in computational research is often difficult not only because the code is not available, but also because of dependencies on different programming languages, language versions, operating systems and much more. With Code Ocean, one can not only share the code (and get a DOI to make the code citable) but also make it easy for anyone to re-run the analysis without needing to install anything or worry about dependencies.


Danielle Robinson – share and sync data wherever you are

Danielle Robinson, the Scientific and Partnerships Director at Code for Science & Society, explored the issue of research reproducibility further, focusing on the data files themselves. She introduced a tool called Dat, which supports data sharing, versioning and syncing and, importantly, can access datasets from multiple sources. In addition, Dat can automate metadata creation, making datasets searchable.

The tool certainly looked interesting to me, and I was keen to try it out myself because many researchers at TU Delft have complained about being unable to access their data files from wherever they are. Unfortunately, the desktop application for Windows is not yet available. I will definitely keep an eye on it!

Excellent engagement example

I also want to emphasise that the way the conference was organised was exemplary in enabling participation and engagement not only by people on-site but also by those following the meeting remotely. On top of taking questions from the audience, Iain Hrynaszkiewicz, who chaired the event, also managed questions submitted online via Slido.

This allowed not only the ‘shy’ participants to speak up, but also prompted discussions with those who were not able to attend the meeting in person.

In addition, all talks were recorded live and made available on Facebook [personally, I find the choice of platform somewhat bizarre, especially as Facebook requires people to create an account before viewing videos, which goes against the spirit of openness and sharing].

Finally, the use of Twitter was excellent: the dedicated hashtag #scidata17 allowed the exchange of comments and spurred interesting discussions. For example, conversations on Twitter revealed that there were controversies around Esther Crawley’s research and, as Ross Mounce pointed out, it was good to learn about the other side of the story.

So, big congratulations to Iain Hrynaszkiewicz and his team at Scientific Data. I think anyone organising events can learn a lot from you. It was also a great pleasure to be part of the Programme Committee. Thank you!