Version control and beyond: speaking of facts, files and folders

By: Hilde van Zeeland · 20 January 2017
Category: Research Data

In December 2016, at the request of PE&RC, we gave a course on version control to a group of PE&RC PhD students. The term ‘version control’ refers to managing different versions of the same file. In practice, version control typically takes place in a context where the relationships between files also matter, as does the organisation of files in folders and subfolders.

Preparing for the course

Given this broad context, the first questions we asked ourselves were: What do we cover in this course? What version control issues do researchers encounter and possibly struggle with? To make sure we got it right, we first surveyed a group of researchers at PE&RC, asking them what issues they encountered in their day-to-day work. Many researchers responded. Their responses allowed us to identify five main areas that they struggled with, listed below with examples from the survey:

  • What is a logical way to structure my folders and files? ‘My organisation of folders seemed intuitive when I made it at the beginning of my PhD, but now it’s a bit of a mess.’
  • How do I keep track of the relationships between files? ‘I have had old versions of graphs in new versions of papers. And if I need to revise a paper again after two months, I struggle to find the latest version, not necessarily of the paper file, but of the scripts and figures.’
  • How do I decide what files to keep or get rid of? ‘I struggle to retrieve the last version among many, all having the same names and opened dates.’
  • What are ways to automatically synchronise files between computers? ‘I have multiple backups, but manual syncing between them makes it difficult to keep track.’
  • How can I keep track of file changes made by multiple co-authors? ‘Extra problems arise when several people work on the same manuscript.’

These responses showed that researchers struggled with more than the organisation of files and folders. The topics of file synchronisation and collaboration also deserved attention. We decided to set up the course as follows:

  • Part 1: Managing files and folders, including tips for structuring files/folders
  • Part 2: Applications and platforms, including practical information about synchronisation tools (e.g. SURFdrive, OneDrive) and collaboration tools (e.g. Word Online, Overleaf)

Once we had sent the researchers an invitation outlining this course setup, the available places filled up quickly.

What came up during the course

During the course, we were happy to find that the content we had prepared matched the needs of the participants. They were eager to discuss the topics covered. They also asked quite a few questions, which reflected their engagement but also showed that our presentation had covered only some of the issues they struggled with. Their questions were diverse, ranging from the safety of popular file synchronisation tools (such as Google Drive) to the difficulty of keeping track of integrated code and databases (such as in R). We might include some of these topics in our next version control course (which we plan to give soon, but for which we do not have a date yet).

One issue that we found particularly interesting was that of literature organisation. A participant explained that she found it difficult to keep track of what she had read, where she had read it, and how it built on her earlier reading. While the reference management tools Mendeley and EndNote allow users to tag sources (identifying themes), simple tagging does not provide an overview of the contents of the articles or of how they relate to each other. This researcher therefore decided to use mind mapping software. By mind mapping, she could visually organise the information she had read in different articles. Rather than adding tags to categorise articles, she categorised facts and linked the articles covering these facts. (You can find a wide range of mind mapping solutions here.)
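
To make the difference between tagging articles and organising facts more concrete, here is a minimal sketch in Python. It is not the participant's method or the way any particular mind mapping tool works; the names `facts_to_articles`, `record` and `articles_sharing_facts`, as well as the example facts and article labels, are hypothetical and chosen only for illustration. The idea is simply that each fact points to the articles that state it, so that articles become linked through the facts they share.

```python
from collections import defaultdict

# Hypothetical fact-centred index: each fact maps to the set of
# articles that state it (the reverse of tagging articles with themes).
facts_to_articles = defaultdict(set)


def record(fact, article):
    """Note that `article` states `fact`."""
    facts_to_articles[fact].add(article)


def articles_sharing_facts(article):
    """Return the other articles that state at least one fact also stated by `article`."""
    related = set()
    for sources in facts_to_articles.values():
        if article in sources:
            related |= sources - {article}
    return related


# Placeholder entries, invented for this example.
record("fact A", "Article 1")
record("fact A", "Article 2")
record("fact B", "Article 2")
record("fact B", "Article 3")

print(sorted(articles_sharing_facts("Article 2")))
```

In this sketch, asking for the articles related to "Article 2" returns the two articles that share a fact with it. A mind map presents the same kind of links visually rather than as a lookup table.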

How can we keep track of facts?

For us, this is a new area. In addition to the version control issues identified above, it clearly plays an important role in the day-to-day work of researchers. Every researcher writes literature reviews as part of their articles, so they all need to keep track of facts, as well as of which articles state which facts. As a library, we teach how to use tools for managing references. But how to use tools to do science is another matter. This blog post is an invitation to share your insights on how to use tools to organise facts and to do science. We are looking forward to your comments.

This blog was written by Hugo Besemer and Hilde van Zeeland.
Image: Courtesy of GitHub
