Lecture 2 - Reproducible & FAIR data science

ANSCI 4940 - Spring 2025

Ass. Prof. Dr. Miel Hostens

Reproducible & FAIR data science

What are the FAIR principles

FAIR data are data that meet the principles of findability, accessibility, interoperability, and reusability (FAIR).[1][2] The acronym and principles were defined in a March 2016 paper in the journal Scientific Data by a consortium of scientists and organizations.[1]

Get to know the principles

  • Check the reproducibility advice in the tutorials on the bovi-analytics website.

  • Discuss with your entire team and with Dr. Miel Hostens (contact him at his desk) how this will be reflected in your team project.

Findable

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.

  • F1. (Meta)data are assigned a globally unique and persistent identifier

  • F2. Data are described with rich metadata (defined by R1 below)

  • F3. Metadata clearly and explicitly include the identifier of the data they describe

  • F4. (Meta)data are registered or indexed in a searchable resource
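As a rough illustration of F1 to F4, the sketch below checks a small metadata record for the pieces the Findable principles ask for. The record, its field names, the DOI, and the catalog name are all made up for this example, and treating a DOI as the persistent identifier is an assumption; other identifier schemes would work just as well.

```python
import re

# Hypothetical metadata record; every value below is a placeholder, not a real dataset.
record = {
    "identifier": "https://doi.org/10.1234/example-dataset",  # made-up DOI (F1, F3)
    "title": "Example milk yield records",
    "description": "Daily milk yield per cow, for illustration only.",
    "keywords": ["dairy", "milk yield"],
    "indexed_in": ["hypothetical-data-catalog"],  # F4
}

# Assumption: the persistent identifier is a DOI written as an https://doi.org/ URL.
DOI_URL = re.compile(r"^https://doi\.org/10\.\d{4,9}/\S+$")


def findability_gaps(meta):
    """Return a list of the Findable principles this record does not yet satisfy."""
    gaps = []
    if not DOI_URL.match(meta.get("identifier", "")):
        gaps.append("F1/F3: no persistent identifier in the metadata")
    if not all(meta.get(key) for key in ("title", "description", "keywords")):
        gaps.append("F2: descriptive metadata is incomplete")
    if not meta.get("indexed_in"):
        gaps.append("F4: not registered in a searchable resource")
    return gaps


print(findability_gaps(record))
```

A check like this could run before a team project dataset is shared, so that missing identifiers or descriptions are caught early.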

Accessible

Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorisation.

  • A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

  • A1.1. The protocol is open, free, and universally implementable

  • A1.2. The protocol allows for an authentication and authorisation procedure, where necessary

  • A2. Metadata are accessible, even when the data are no longer available
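One way to picture A1 is retrieving metadata by identifier over HTTPS, an open and standardised protocol. The sketch below only builds the request and does not send it. The DOI is made up, the helper name `build_metadata_request` is hypothetical, and the `Accept` value is a media type that DOI content negotiation services commonly accept; the optional bearer token stands in for whatever authentication a real repository might require (A1.2).

```python
import urllib.request


def build_metadata_request(doi, token=None):
    """Build (but do not send) an HTTPS request for the metadata behind a DOI."""
    headers = {
        # Ask the resolver for machine-readable citation metadata instead of a web page.
        "Accept": "application/vnd.citationstyles.csl+json",
    }
    if token is not None:
        # A1.2: authentication only where it is needed.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"https://doi.org/{doi}", headers=headers)


request = build_metadata_request("10.1234/example-dataset")  # made-up DOI
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Because the identifier resolves through a registry rather than pointing at a file, the metadata can remain reachable even if the data themselves are withdrawn, which is the situation A2 describes.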

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

  • I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

  • I2. (Meta)data use vocabularies that follow FAIR principles

  • I3. (Meta)data include qualified references to other (meta)data
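The Interoperable principles can be illustrated with a dataset description written as JSON-LD using schema.org terms, so that other tools can interpret the fields without custom parsing. This is only a sketch: the dataset name and both DOIs are made up, and schema.org is one possible shared vocabulary among several.

```python
import json

# Hypothetical dataset description using schema.org terms (a shared vocabulary, I1/I2).
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example milk yield records",
    "identifier": "https://doi.org/10.1234/example-dataset",  # made-up DOI
    # I3: a qualified reference says how this dataset relates to the other one.
    "isBasedOn": "https://doi.org/10.1234/example-raw-sensor-data",  # made-up DOI
}

serialized = json.dumps(dataset, indent=2)
print(serialized)
```

The `isBasedOn` property is what makes the reference "qualified": it names the relationship to the other dataset instead of listing a bare link.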

Reusable

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

  • R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

  • R1.1. (Meta)data are released with a clear and accessible data usage license

  • R1.2. (Meta)data are associated with detailed provenance

  • R1.3. (Meta)data meet domain-relevant community standards
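For R1.1 and R1.2, a minimal sketch of recording a license and basic provenance alongside a data file might look like the following. The function name `describe_for_reuse`, the field names, the team name, the DOI, the date, and the file content are all hypothetical; `CC-BY-4.0` is the SPDX identifier for the Creative Commons Attribution 4.0 license. R1.3 (community standards) depends on the domain and is not covered here.

```python
import hashlib
from datetime import datetime, timezone


def describe_for_reuse(data, license_id, creator, derived_from, created):
    """Bundle license (R1.1) and provenance (R1.2) details for a data file."""
    return {
        "license": license_id,
        "provenance": {
            "creator": creator,
            "derived_from": derived_from,
            "created": created.isoformat(),
            # The checksum lets a later user confirm they hold the same bytes.
            "sha256": hashlib.sha256(data).hexdigest(),
        },
    }


reuse_info = describe_for_reuse(
    data=b"cow_id,milk_kg\n1,32.5\n",  # made-up file content
    license_id="CC-BY-4.0",  # SPDX identifier for Creative Commons Attribution 4.0
    creator="Hypothetical Team A",
    derived_from="https://doi.org/10.1234/example-raw-sensor-data",  # made-up DOI
    created=datetime(2025, 1, 1, tzinfo=timezone.utc),  # placeholder date
)
print(reuse_info["license"], reuse_info["provenance"]["created"])
```

Storing a record like this next to the data gives a later user the license terms and the origin of the file without having to ask the original team.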