Interoperability in data and models – why and how
Communicating with each other requires a shared language, if necessary with hands and feet.
A key element of interoperability (the I in FAIR) is understanding each other’s data, both from data to person and from data to machine. To increase interoperability, it is therefore indispensable to add context and a shared vocabulary to the dataset and/or code. Two words can mean the same thing to a human, but when left unexplained they are simply different words to a computer.
What is interoperability in general? And how do WUR researchers apply it? This blog post presents two WUR research cases in which interoperability is an essential element. The examples were presented in a webinar [only available for WUR users] on September 22, 2020.
Well begun is half done: understanding the headers
Lieke Melsen evaluates the openly shared CAMELS hydrometeorological dataset, which she uses for her research on the complexity of hydrological models. Understanding the data requires three separate files that explain the headers (two ASCII and one Excel). ASCII files can be read by most machines, but Excel files generally cannot be read as easily. Humans are able to open the three files and understand the headers in the CAMELS dataset, but how would a computer ever know that it has to open three files and interpret them? It takes experience and expert knowledge to understand and work with the dataset.
Lieke’s solution is the NetCDF format. In a NetCDF file the metadata are part of the data file, the dimensions are consistent among the variables, and the file contains global attributes (e.g. provenance link, NetCDF version used and version history). NetCDF therefore solves many problems in data sharing and offers the opportunity to deliver interoperable data from the start. It does not solve everything yet, but many models used in hydrological studies work with NetCDF as input and output files, which helps in sharing correct metadata.
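The idea of keeping metadata inside the data file can be sketched without a NetCDF library. The minimal Python example below mimics the layout of a NetCDF file (dimensions, variables with attributes, global attributes) in a plain dictionary and checks that every variable matches its declared dimensions. It is an illustration only and not the NetCDF format itself; the variable names, units, values and attribute names are hypothetical and are not taken from the Camels dataset.

```python
import json

# Hypothetical self-describing dataset, laid out like a NetCDF file:
# named dimensions, variables that declare their dimensions and units,
# and global attributes describing the file as a whole.
dataset = {
    "dimensions": {"time": 3},
    "variables": {
        "precipitation": {
            "dimensions": ["time"],
            "attributes": {"units": "mm/day", "long_name": "daily precipitation"},
            "data": [0.0, 4.2, 1.3],
        },
        "streamflow": {
            "dimensions": ["time"],
            "attributes": {"units": "m3/s", "long_name": "daily mean streamflow"},
            "data": [12.1, 15.8, 14.0],
        },
    },
    "global_attributes": {
        "source": "hypothetical example, not real measurements",
        "history": "v1: created for illustration",
    },
}


def inconsistent_variables(ds):
    """Return the names of variables whose data length does not match their dimensions."""
    bad = []
    for name, var in ds["variables"].items():
        expected = 1
        for dim in var["dimensions"]:
            expected *= ds["dimensions"][dim]
        if len(var["data"]) != expected:
            bad.append(name)
    return bad


def describe(ds):
    """Build one descriptive line per variable from the embedded metadata."""
    return [
        f"{name} [{var['attributes']['units']}]: {var['attributes']['long_name']}"
        for name, var in ds["variables"].items()
    ]


# Data and metadata travel together: a round trip through one file format
# (JSON here, standing in for NetCDF) keeps both intact.
restored = json.loads(json.dumps(dataset))
print(inconsistent_variables(restored))
print("\n".join(describe(restored)))
```

Because the units and descriptions sit next to the values, a program needs no separate header files to interpret the data. In practice a dedicated NetCDF library would be used instead of a dictionary.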
“Not one system is going to put structure in your data, because it doesn’t understand your research. The structure comes from you.” (Patrick Vandewalle)
Press the button
Maarten Voors integrates interoperability into his research on investors in agriculture and mining in Kenya. The study combines nine different sources, e.g. OpenStreetMap, estimated rainfall from TAMSAT and elevation data from CGIAR. Since all data were spatial, they were merged in the project based on geolocation. To merge them, Maarten had to deal with numerous data sets, formats, programming languages and file versions, which clearly called for a way to integrate them. It took a lot of time to manually transform the data into one consistent data set that could answer Maarten’s research questions and go from data to table or graph with ‘one press on the button’. Maarten’s data will be published in Harvard Dataverse, including metadata, processing steps and output code. In this way Maarten hopes that others (including his future self) will have easier access to the data.
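Merging on geolocation can be sketched as follows. This minimal Python example assumes, for illustration only, that each source is a list of records with a latitude and a longitude, and joins them by rounding coordinates to a common grid (one decimal degree, roughly 11 km near the equator). The source names, field names, grid size and values are hypothetical; the actual project pipeline is not described here and may well work differently.

```python
from collections import defaultdict


def grid_key(lat, lon, decimals=1):
    """Snap a coordinate to a grid cell by rounding (assumed cell size: 0.1 degree)."""
    return (round(lat, decimals), round(lon, decimals))


def merge_by_location(sources, decimals=1):
    """Combine records from several sources that fall into the same grid cell.

    sources maps a source name to a list of records; every record has
    'lat' and 'lon' plus any number of other fields. If one source has
    several records in the same cell, the last one wins in this sketch.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for rec in records:
            key = grid_key(rec["lat"], rec["lon"], decimals)
            for field, value in rec.items():
                if field not in ("lat", "lon"):
                    # Prefix with the source name so the origin stays traceable.
                    merged[key][f"{source_name}.{field}"] = value
    return dict(merged)


# Hypothetical records from two sources with slightly different coordinates.
sources = {
    "rainfall": [
        {"lat": -1.29, "lon": 36.82, "mm": 58.0},
        {"lat": 0.52, "lon": 35.27, "mm": 102.5},
    ],
    "elevation": [
        {"lat": -1.31, "lon": 36.79, "m": 1795},
        {"lat": 0.49, "lon": 35.31, "m": 2100},
    ],
}

merged = merge_by_location(sources)
for location, values in sorted(merged.items()):
    print(location, values)
```

Keeping such a step in code rather than doing it by hand is what makes the ‘one press on the button’ workflow repeatable.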
- There is an incredible amount of data available, and interoperability is key in combining data sets from different sources.
- It is not all about coding for others; it is also a reminder for yourself of how you did something in the past.
- Speaking the same language is key: use vocabularies and ontologies (preferably FAIR vocabularies) so that both humans and machines understand the data.
- It is advised to start from existing ontologies, and to share ontologies where possible. See also one of our previous blog posts.
- Datasets may come with descriptions and ‘readme’ files that are understandable to human readers but not actionable for machines. The less human interaction needed to combine datasets, the better.
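The points above about shared vocabularies can be made concrete with a small sketch. In this minimal Python example, two datasets that use different local headers for the same quantity are mapped to shared term identifiers, after which a program can tell that the columns match without a human reading a ‘readme’ file. The headers, values and identifiers are hypothetical; the `ex:` prefix is a placeholder and does not refer to a real ontology.

```python
# Hypothetical mapping from local column headers to shared vocabulary terms.
HEADER_TO_TERM = {
    "prcp": "ex:precipitation_amount",
    "rain_mm": "ex:precipitation_amount",
    "q": "ex:streamflow",
    "discharge": "ex:streamflow",
}


def annotate(row, mapping=HEADER_TO_TERM):
    """Rename local headers to shared terms and report headers without a mapping."""
    annotated, unknown = {}, []
    for header, value in row.items():
        term = mapping.get(header.strip().lower())
        if term is None:
            unknown.append(header)  # needs a human decision or a new vocabulary entry
        else:
            annotated[term] = value
    return annotated, unknown


# Two rows from different (hypothetical) sources describing the same quantities.
row_a = {"prcp": 4.2, "q": 15.8}
row_b = {"Rain_mm": 1.3, "Discharge": 14.0, "flag": "x"}

a, unknown_a = annotate(row_a)
b, unknown_b = annotate(row_b)

print(set(a) == set(b))  # both rows now use the same shared terms
print(unknown_b)         # headers that still lack a shared meaning
```

Headers that cannot be mapped are reported rather than guessed, so the remaining human interaction is limited to the cases that really need it.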
Interoperability at WUR
Because interoperability is domain-specific, WUR has no standards available. Scientific communities are working to assign unique identifiers to properties in the data. Within the DDHT (Data Driven and High Tech) research program and the DT (Digital Twin) research program at WUR, specific projects work on vocabularies and ontologies. By investing in these research projects, WUR aims to create an environment in which relevant vocabularies become findable and datasets are provided with semantic information derived from ontologies. As a researcher, you can join these communities and apply the standards, ontologies and vocabularies once they are available.
Don’t know where to start? WDCC* can help researchers either to connect with each other to set up a community and align standards, or to get in touch with experts within their science group or an external community (e.g. ELIXIR or the Research Data Alliance).
Within WUR, the Wageningen Data Competence Center is your first entry point for questions on Data Science, Research Data Infrastructure or Data Management. Please contact us when you have questions on Data or Code Sharing, licensing or data sharing agreements. For all questions you may contact: firstname.lastname@example.org
Further reading in a WDCC Use Case: improving access and interoperability of open data for the agri-food sector with the AgroDataCube. Also spotlighted on our Data Portal.