If you build a data set and nobody can find it, is it useful? Not as much as it could be. With trust in science under siege from partisan actors and impartial pathogens, the accessibility and transparency of — and trust in — scientific information must be improved.
Enter the FAIR Data Principles. In 2014, scientists realized that data management and stewardship could benefit from a set of shared guidelines, and dozens of international researchers gathered to draft new recommendations. The resulting principles — which established that data should be findable, accessible, interoperable and reusable (FAIR) — were published ten years ago1. The original publication has around 16,000 citations, and governments, funders and publishers around the world now ask that data be hosted and shared in FAIR-compliant ways.
A decade on, however, even the founders acknowledge that the FAIR principles are an imperfect tool. Barend Mons, a molecular biologist at Leiden University in the Netherlands who conceived the initiative, says that FAIR was always meant to be a set of general principles, “and so, by definition, cannot address the specifics of every application”. Fortunately, other researchers have taken the framework and extended it to cover the broader data ecosystem2, including the algorithms, tools and workflows that drive contemporary research.
Making every discipline FAIR
At its core, FAIR is meant to ensure that data are produced, analysed, stored and shared in ways that promote transparency and reproducibility. “The more the data are understandable by people other than the creators, the more we are able to determine not only the trustworthiness of the data set itself, but also its alleged creators,” says Mons.
The ideal data set should be properly documented and simple for both computers and people to find and use. It should also be easy to integrate with other data. To accomplish this, scientists must design workflows before data are collected, and create and maintain a detailed metadata file, an often overlooked component that contains contextual information about the data set, such as where and when it was created. The initiative also prioritizes data-management plans, including choosing appropriate licences and persistent identifiers (the unique labels ascribed to different resources), so that any information created during a project is findable and usable long after the research is over.
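As a rough illustration of what such a metadata file might hold, the Python sketch below builds a small record and checks that nothing essential is missing. The field names, the `missing_fields` helper and every value (including the identifier) are assumptions made up for this example; they do not follow any particular metadata standard or come from the FAIR publication itself.

```python
import json
from datetime import date

# Hypothetical minimal field set for this sketch; not a formal metadata standard.
REQUIRED_FIELDS = ("identifier", "title", "creators", "created", "location", "licence")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# All values below are placeholders for illustration, not a real data set.
record = {
    "identifier": "doi:10.1234/example-dataset",  # persistent identifier (placeholder)
    "title": "Example field measurements",
    "creators": ["A. Researcher"],
    "created": date(2024, 5, 1).isoformat(),  # when the data were created
    "location": "Example field site",  # where the data were created
    "licence": "CC-BY-4.0",  # terms under which others may reuse the data
}

# Refuse to continue if any contextual information is missing.
gaps = missing_fields(record)
if gaps:
    raise ValueError(f"Metadata incomplete, missing: {gaps}")

# Serialize the record as JSON, a format that both people and machines can read.
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

In practice, a repository or a discipline-specific schema would usually dictate the exact fields; the point here is only that the record is created alongside the data and checked for completeness, rather than reconstructed after the project ends.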

“It’s a lot to think about, and I can see why it might seem really daunting for some scientists to consider,” says Amelia Jiménez-Sánchez, a data-integrity researcher at the University of Barcelona in Spain. But FAIR is like cooking, she says: once you have the right ingredients — or familiarize yourself with FAIR practices — it becomes easier to make a meal. “Eventually, it just becomes a part of how you do your work.”
Users have tailored those practices to their disciplines. Carnegie Mellon University in Pittsburgh, Pennsylvania, has released FAIR guides for chemistry, mathematics, neuroscience and psychology. Other initiatives have focused on astronomy, materials science, genetics and single-cell genomics data. For fields without dedicated FAIR resources, researchers in the Netherlands have published ‘ten simple rules’ for kick-starting conversations about FAIR practices3.
Recognizing that discipline-specific resources didn’t exist in his field, Eliu Huerta, a theoretical physicist at the Argonne National Laboratory in Lemont, Illinois, began adapting FAIR principles for high-energy physics. Today, Huerta is part of a collaboration called FAIR4HEP, aimed at helping researchers in the field to improve their data-sharing practices. In 2022, he co-wrote a study evaluating data from the Large Hadron Collider at CERN, Europe’s particle-physics laboratory near Geneva, Switzerland, for its ‘FAIRness’4. Among other things, the study “provides a domain-agnostic, step-by-step set of checks to guide in the process of making a dataset FAIR”, it says — a process the authors call FAIRification. The web-based FAIR Data Self-Assessment Tool from the Australian Research Data Commons, a company building research-data infrastructure in Melbourne, likewise offers “practical tips on how to enhance FAIRness” of your data.
Expanding beyond data
The FAIR guidelines also apply to software. The FAIR-USE4OS guidelines5 extend FAIR principles to open-source software projects, for instance, and initiatives such as FAIR4RS focus on research software6.
“Data are data, but there’s also the entire system of infrastructure that is built around it to store, share and analyse that information, and those tools need to be FAIR and reproducible too,” says Natalie Cooper, a macroecologist at the Natural History Museum in London.

Last year, Cooper edited a guide to reproducible code on behalf of the British Ecological Society that is rooted in FAIR principles. Code and data share many features, so many of the recommendations remain the same. But the practice she has found most helpful in her own work is code review, which Cooper now does before submitting anything for publication. During the review, colleagues exchange protocols, test them for reproducibility and suggest ways to improve efficiency. “You just make comments to each other, and hopefully you can improve each other’s code,” Cooper says. “It can be a really positive experience.”
Neil Chue Hong, founding director of the Software Sustainability Institute at the University of Edinburgh, UK, helped to establish the FAIR4RS principles. Chue Hong says that increased reliance on software is among the largest changes to data science over the past few decades, such that almost every branch of research now uses software in some way. As a result, the institute advocates for training scientists in best practices for using research software. “It’s now very hard to analyse or visualize data without software, and at the same time, it’s very hard for software to exist without high-quality data,” he says.
Just as data should come with a metadata or README file that contains information about the data set itself, software and algorithms should also have good documentation, including which version a person used. That’s especially true for artificial-intelligence research. For instance, Hugging Face, a model-sharing service based in New York City, encourages researchers to create ‘model cards’ that provide key information about AI tools, including their intended use, performance metrics, training data and limitations.
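To make the idea concrete, the Python sketch below assembles a bare-bones, plain-text model card covering the items mentioned above, plus a version number. It is only an illustration: the `ModelCard` class, its field names and all the values are hypothetical, and the layout is not the format or API of any particular model-sharing service.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """Hypothetical container for the key facts a model card might record."""

    name: str
    version: str
    intended_use: str
    training_data: str
    limitations: str
    metrics: dict = field(default_factory=dict)

    def to_text(self) -> str:
        """Render the card as plain text, one labelled fact per line."""
        lines = [
            f"Model: {self.name}",
            f"Version: {self.version}",
            f"Intended use: {self.intended_use}",
            f"Training data: {self.training_data}",
            f"Limitations: {self.limitations}",
            "Performance metrics:",
        ]
        # Sort metric names so the output is stable from run to run.
        for metric, value in sorted(self.metrics.items()):
            lines.append(f"  {metric}: {value}")
        return "\n".join(lines)


# Placeholder values for illustration only; none of these are real results.
card = ModelCard(
    name="example-classifier",
    version="1.2.0",
    intended_use="Sorting example images into two classes for teaching purposes",
    training_data="Example image collection (placeholder)",
    limitations="Not evaluated outside the example collection",
    metrics={"accuracy": 0.9},
)

card_text = card.to_text()
print(card_text)
```

Keeping the version in the card itself means that anyone rerunning an analysis can tell which release of the tool produced the original results.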
