
Table of Contents

  • 1. Summary
  • 2. Introduction
  • 3. AI for Pathogen Detection
  • 4. AI for Environmental Forecasting
  • 5. AI for Metascience
  • 6. The UKRI Dark Data Prize
  • 7. Acknowledgements
  • 8. Authors

Summary

  • The AI for Science Strategy has launched an AI for Science Mission focused on drug discovery, and the government has committed to selecting further missions. DSIT and UKRI should consider three additional options:
    • AI for Pathogen Detection. Building on existing strengths in the public sector and private sector, this mission would align with the UK Biological Security Strategy by using AI for real-time pathogen surveillance to protect the NHS.
    • AI for Environmental Forecasting. Building on research programmes across UK universities, this mission would use AI to deliver faster, cheaper and higher-resolution forecasts. Improved forecasting could reduce grid balancing costs associated with wind and solar power while supporting construction productivity and farm profitability.
    • AI for Metascience. Leveraging Britain’s position as a global leader in metascience, this mission would complement the existing AI Metascience Fellowship by supporting the development of AI tools to improve the quality, integrity and efficiency of research.
  • These missions focus on areas where the UK already has strong capabilities and where progress would support wider national priorities. Each combines clear opportunities for AI-driven scientific progress with wider societal benefits and a strong case for government action.
  • The AI for Science Strategy commits to launching pilot programmes for collecting 'dark data' – an umbrella term which includes null results, data from failed experiments, and wider data that is not publicly shared. To support this objective, DSIT and UKRI should establish a UKRI Dark Data Prize, an academic recognition prize for researchers who publish scientifically valuable dark data.
  • Together, these initiatives would help the UK build frontier capability in AI-driven science while strengthening its position as a global scientific leader.

Introduction

Walk down Broadwick Street in Soho today and you might notice a black water hand pump commemorating John Snow, a physician and pioneer of epidemiology. In 1854, he identified a pump in that very location as the source of a devastating cholera outbreak. Snow also used statistical evidence to demonstrate that households supplied by sewage-contaminated water from the Southwark and Vauxhall company were more likely to die from cholera than those served by the cleaner Lambeth company. This helped him challenge the dominant “miasma” (bad air) theory regarding the spread of cholera, and to correctly propose that cholera spreads through faeco-oral transmission.

While Snow’s story is well known among doctors, the statistical work by William Farr which enabled it is less so. Snow’s evidence relied on data from the General Register Office (GRO), which recorded births, marriages and deaths. When Farr joined the GRO in 1839, he brought with him an explicit agenda to curate data useful for public health. Farr worked to make it routine for the GRO to record causes of death across England and Wales, making it possible to measure mortality rates from specific diseases at the district level. It was through Farr’s efforts to compile this high-quality data that Snow was able to challenge the paradigm of “miasma” and pioneer the entirely new scientific field of epidemiology.

Today, as we enter a new era of AI for Science, we have the opportunity to replicate William Farr’s impact across a new set of breakthroughs. Just as the GRO created the data infrastructure that made Snow’s work possible, DSIT could now curate the datasets, provide compute access and build the pipelines that reshape science.

Background

The AI for Science Strategy sets out an ambitious vision for how the UK can use artificial intelligence (AI) to strengthen its position as a global leader in scientific research. The strategy argues that, if the UK moves quickly, it can retain and expand its scientific leadership while supporting the creation of new companies and economic growth.

Achieving this requires action across three key pillars: data (making high-quality datasets available and improving access to existing ones), compute (expanding compute capacity and ensuring researchers can access it) and people and culture (building a strong UK ecosystem of AI for Science talent).

To deliver on this vision, the government has launched AI for Science Mission One: “We will accelerate drug discovery to develop trial-ready drugs within 100 days by 2030 and contribute to deploying new treatments faster.” Action 15 of the strategy commits the government to selecting additional AI for Science missions.

With scarce resources, it is not possible for the government to focus on every discipline. The AI for Science Strategy rightly outlines a focus on areas where there is existing UK strength, alignment with wider UK strategy and opportunities for AI-driven progress.

Mission selection should also consider where government intervention is most valuable. This includes areas where the government is well placed to curate datasets needed to train AI models, or where AI could help produce public goods.

This report sets out options for three further missions, focused on:

  • AI for Pathogen Detection
  • AI for Environmental Forecasting
  • AI for Metascience

The report also proposes establishing a UKRI Dark Data Prize – an academic recognition prize for researchers who publish scientifically valuable papers showing null results. This is linked to Action 5 of the AI for Science Strategy, which commits to pilot programmes to collect “dark data”, an umbrella term including null results, data from failed experiments and other data that is not publicly shared.

The mission on metascience and the Dark Data Prize support the strategy’s first objective: developing frontier capability in AI-driven science. The missions on pathogen detection and environmental forecasting support the strategy’s second objective: ensuring the UK retains its position as a global scientific leader.

AI for Pathogen Detection

The Case for a mission

The COVID-19 pandemic caused an estimated £250 billion in economic losses in the UK, contributed to the cost-of-living crisis, led to around 200,000 deaths and significantly increased NHS elective waiting lists. The UK cannot afford a similar shock again. Meanwhile, seasonal pathogens, including influenza, norovirus and seasonal coronaviruses, continue to place heavy pressure on the NHS each winter. Early detection is one of the most effective ways to reduce the health and economic costs of infectious disease outbreaks. Detecting pathogens earlier enables faster public health responses, reduces transmission and limits disruption to the health system and wider economy.

Improved pathogen detection could help identify outbreaks of influenza, respiratory syncytial virus (RSV), norovirus and seasonal coronaviruses earlier, allowing more effective containment and response. This could reduce the cycle of winter pressures that have become a persistent feature of NHS operations. Enhanced surveillance would also strengthen UKHSA’s ability to monitor the spread of diseases such as HIV and tuberculosis, supporting the goals of the UK’s national action plans in these areas.

The benefits would extend beyond public health. Earlier detection of infectious diseases affecting livestock and crops could reduce economic losses in agriculture. During the 2001 foot-and-mouth disease outbreak, infection spread for three weeks before detection, ultimately costing the public sector £3 billion and the private sector £5 billion. Earlier detection could significantly reduce the scale of such losses.

Advances in genomic sequencing and epidemiological surveillance have already improved the ability to detect emerging pathogens. However, with the scale and complexity of modern biological data, artificial intelligence offers an opportunity to transform pathogen surveillance by enabling the continuous analysis of genomic, epidemiological and environmental data at unprecedented speed and scale.

Because the benefits of early detection are widely shared across society, pathogen surveillance has the characteristics of a public good. Private firms may struggle to capture sufficient returns from investment in early-warning systems, creating a clear role for government leadership.

A national mission focused on AI for pathogen detection could therefore significantly strengthen the UK’s ability to detect and respond to emerging biological threats.

Why the UK is well positioned

The UK already has strong capabilities in pathogen detection across the public, academic and private sectors. These provide a strong foundation for developing AI-enabled pathogen surveillance systems. 

The UK Health Security Agency (UKHSA) runs the metagenomics Surveillance Collaboration and Analysis Programme (mSCAPE), a world-first initiative aiming to use metagenomics to identify known and previously unknown pathogens. The Wellcome Sanger Institute hosts the Genomic Surveillance Unit, which works with international partners to detect emerging infectious disease threats. In the private sector, Oxford Nanopore Technologies, one of the UK’s most successful life sciences spin-outs, develops sequencing technologies that are central to modern pathogen surveillance and is working with partners on early-warning systems for pandemics.

These capabilities provide a strong foundation for an AI-enabled pathogen detection system. However, the UK likely lags behind the United States and China in the scale of investment in genomic surveillance and in the maturity of programmes such as wastewater sequencing. A focused mission could help close this gap.

The Opportunity for AI

AI tools can analyse genomic data at a scale that is impossible for human researchers alone and can integrate information from multiple sources, including genomic sequences, epidemiological patterns and travel data.

It also creates the possibility of continuous pathogen monitoring, where emerging threats are detected automatically through the analysis of large and rapidly updating datasets. AI tools could help identify unusual genetic signatures, detect anomalies in disease patterns and flag potential outbreaks earlier than existing surveillance systems.

A mission on AI-enabled pathogen detection would directly support the “Detect” pillar of the UK Biological Security Strategy. The 2025 Biological Security Strategy Implementation Report identifies the exploration of AI-driven analytics as a priority for improving pathogen surveillance.

Funding mechanisms already exist to support this work. The Integrated Security Fund, which addresses the UK’s highest-priority national security risks, has allocated £15 million to biosecurity for 2025–26. A national AI pathogen detection mission could build on this funding to scale the UK’s early-warning capabilities.

Example mission target

"We will harness AI to identify both known and previously unknown pathogens within 7 days of outbreak emergence in 100% of outbreaks by 2030, to protect the NHS against infectious disease outbreaks, epidemics and pandemics."

A target of detecting an outbreak within seven days of emergence would align with the aims of the 7-1-7 alliance, which has set a target for nations to be able to detect a new outbreak within 7 days of emergence, notify public health authorities within 1 more day, and launch a response within the next 7 days. 

The 7-1-7 target was first put forward by Thomas Frieden, former Director of the US Centers for Disease Control (CDC), and has been adopted by the World Health Organisation. Adopting this target would position the UK as a global leader in epidemic preparedness.

Delivering the Mission 

Achieving this objective would require scaling and integrating existing surveillance programmes into a more comprehensive early-warning system, building on proposals such as the Surveillance Observatory for Nucleic Acid Recognition (SONAR).

Government action would be required across data, compute and people and culture:

Data

Secure pathways should be developed to allow vetted researchers access to relevant government datasets for developing AI-enabled pathogen detection tools. Where possible, datasets should be made available in modern, cloud-native formats.

This could include data from:

  • The PATH-SAFE programme (historical data)

Compute

Access to large-scale compute remains a constraint for many AI-for-science projects. Priority access to the AI Research Resource (AIRR) should therefore be offered to academic teams, non-profit organisations and companies developing AI-enabled pathogen detection systems, subject to appropriate security vetting.

People and Culture

An AI for pathogen detection placement programme should be established to fund placements for early-career researchers with machine-learning expertise within initiatives such as the Biothreats Radar, mSCAPE and the Animal and Plant Health Agency’s Genomics for Animal and Plant Disease Consortium (GAP-DC). These placements would help build interdisciplinary teams combining expertise in machine learning, genomics and infectious disease epidemiology.

AI for Environmental Forecasting

The Case for a mission

Accurate environmental forecasting is critical for economic resilience and public safety. Weather and environmental conditions affect infrastructure, agriculture, energy systems and the wider economy. As climate change increases the frequency and severity of extreme weather events, improving forecasting capability will become increasingly important. Improved environmental forecasting could also support several major national priorities.

In construction, weather-related disruption is a major constraint on productivity. A 2024 survey found that 70% of UK construction managers experienced weather-related delays in the previous year. Higher-resolution forecasts could support better planning of activities such as concrete pours, groundworks and crane operations, helping reduce delays and supporting the government’s goal of building 1.5 million homes.

In the energy system, improved forecasts could reduce the gap between predicted and actual electricity generation from renewable sources. This would support the National Energy System Operator’s (NESO) objective of reducing balancing costs in the grid. For example, Open Climate Fix’s Quartz Solar model, now used in NESO’s control room, applies machine learning to improve short-term solar forecasts and has delivered approximately £30 million per year in savings since its integration.

Improved forecasting would also benefit agriculture. A 2025 survey of UK farmers found that 74% had experienced financial losses due to extreme weather, while 52% reported uncertainty about crop planning because of unpredictable weather patterns. More accurate forecasts could help farmers make better decisions about planting and harvesting and take steps to protect livestock from extreme weather events.

The UK already benefits from world-leading forecasting institutions, but advances in artificial intelligence offer an opportunity to significantly improve the speed, resolution and cost of environmental prediction systems. A national mission focused on AI for environmental forecasting could enable faster, higher-resolution forecasts while reducing the computational costs associated with traditional physics-based models.

Accurate forecasts benefit the whole economy, but the underlying data and infrastructure are largely held by public institutions. Government action can therefore play a role in enabling the development of AI-driven forecasting systems.

Why the UK is well positioned

The UK already hosts some of the world’s leading institutions in weather and environmental science.

Britain hosts both the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Met Office, which runs the AI for Numerical Weather Prediction (AI4NWP) programme. Alongside these specialist institutions, there is deep academic expertise in AI and environmental science at universities including Leeds, Bristol, Exeter, Reading, Cambridge and Oxford, as well as a focused programme at the Alan Turing Institute.

In the private sector, several UK-based companies are developing AI tools for environmental prediction. Climate X, a London-based start-up that raised £14 million in 2024, uses environmental data for AI-assisted climate risk modelling. Ötzi, a spinout from the University of Cambridge, is also developing AI systems for predicting environmental risks. Google DeepMind has developed weather forecasting models that achieve state-of-the-art performance.

Together, these capabilities provide a strong foundation for advancing AI-driven environmental forecasting in the UK.

The Opportunity for AI

Machine learning models offer several advantages over traditional physics-based forecasting systems. Frontier models such as Aardvark, developed at the University of Cambridge, can achieve comparable forecasting performance to traditional models while requiring far less computing power. This can make them orders of magnitude faster.

AI models can also support new forecasting approaches. For example, the Met Office’s FastNet model retains high spatial resolution while covering large geographic areas. Meanwhile, models such as DeepMind’s GenCast reduce the computational costs associated with ensemble forecasting, where multiple potential future scenarios are simulated to improve prediction accuracy. These advances suggest that AI could significantly expand the capability of environmental forecasting systems.

Example mission target

"We will harness AI to improve the quality of environmental forecasting, ensuring that warnings regarding extreme weather at a spatial resolution of 1.5km can be shared 15 days in advance by 2030."

Currently, the Met Office provides extreme weather warnings up to 7 days in advance. An existing Met Office model achieves a spatial resolution of 1.5 km with a forecast length of 5 days. DeepMind’s latest model can forecast cyclones up to 15 days in advance but has a spatial resolution of 28 km. The proposed target would combine the Met Office’s current frontier of spatial resolution with DeepMind’s current frontier of forecast length.

Delivering the Mission

Government action would be required across data, compute and people and culture:

Data

Historical environmental datasets are critical for training machine learning forecasting models. With this in mind, government should:

  • Ensure that open data shared by the Met Office is available in modern cloud-native formats such as Zarr to enable easier use in machine learning systems.
  • Provide targeted funding to the Centre for Environmental Data Analysis (CEDA) to convert historical environmental archives into formats suitable for machine learning, such as Kerchunk or Zarr. CEDA holds environmental records dating back to 1853.
  • Review and aim to reduce the three-to-six hour delay in sharing data from the Met Office’s UKV 2 km model, which currently limits the ability of external organisations to develop AI-assisted nowcasting systems.
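The value of cloud-native formats such as Zarr lies in their chunked layout: a large array is stored as independently addressable chunks, so a machine learning pipeline can fetch only the region it needs rather than downloading a monolithic archive. The sketch below is a deliberately simplified, hypothetical illustration of that idea in plain Python (the real Zarr format adds metadata, compression and multi-dimensional chunking):

```python
# Minimal sketch of a chunked, Zarr-style layout. Each chunk of a large
# array is stored under its own key, so a reader can retrieve just the
# chunks covering a region of interest. Illustrative only.

CHUNK = 4  # values per chunk

def write_chunked(store: dict, name: str, values: list) -> None:
    """Split `values` into fixed-size chunks stored under separate keys."""
    for i in range(0, len(values), CHUNK):
        store[f"{name}/{i // CHUNK}"] = values[i:i + CHUNK]

def read_range(store: dict, name: str, start: int, stop: int) -> list:
    """Read values [start, stop) by touching only the chunks that overlap."""
    out = []
    for c in range(start // CHUNK, (stop - 1) // CHUNK + 1):
        out.extend(store[f"{name}/{c}"])
    offset = start - (start // CHUNK) * CHUNK
    return out[offset:offset + (stop - start)]

store = {}
write_chunked(store, "temperature", list(range(20)))
# Reading values 5..8 touches only chunks 1 and 2, not the whole array.
print(read_range(store, "temperature", 5, 9))
```

The same principle is what lets a forecasting model stream a single region and time window from a decades-long archive held in object storage.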

Compute

Access to large-scale compute remains a constraint for many AI-for-science projects. Priority access to the AI Research Resource (AIRR) should therefore be provided to academic teams, non-profit organisations and companies developing AI-driven environmental forecasting models. Access could be tied to the development of open-weight models that can be adapted to different forecasting contexts.

People and culture

Where appropriate, government should set out funding calls to support large-scale, team-based science across research groups working on AI for Environmental Forecasting.

AI for Metascience

The Case for a mission 

The UK is emerging as a global leader in metascience, the study of how science itself is conducted and how it can be improved. The creation of the UK Metascience Unit and growing investment in this field have positioned Britain at the forefront of efforts to improve the quality, integrity and efficiency of research.

Artificial intelligence offers powerful tools for accelerating this work. AI systems can analyse large volumes of scientific literature, identify methodological weaknesses and automate parts of the research process. Used well, these tools could improve the reliability and productivity of scientific research.

However, AI also introduces new risks for the scientific ecosystem. Many AI tools used in research are developed by private companies and remain closed-source, limiting transparency and access. At the same time, AI systems can amplify existing problems in the research system, including poor reproducibility, publication bias and the growing volume of low-quality research. A national mission focused on AI for metascience could help ensure that artificial intelligence strengthens rather than undermines the integrity and productivity of scientific research.

Improving the integrity and productivity of research could also deliver significant economic benefits. The UK government plans to invest £86 billion in research and development. Improving the quality and reliability of research would increase the return on this investment and support economic growth.

Poor reproducibility has previously slowed innovation in areas such as biomedical research. For example, weak research foundations have impeded the commercialisation of biomarker tests and slowed the development of cancer treatments. By improving research reliability, AI for metascience tools could help increase the success rate of scientific translation and support the creation of new companies.

It is difficult to predict the areas of science which will deliver the greatest societal benefits in the future. AI for metascience tools offer the benefit of improving science across a wide range of disciplines, and could have cross-cutting benefits across AI for Science missions. For example, improved research synthesis and replication could accelerate progress in areas such as drug discovery, pathogen detection and environmental forecasting.

Challenges in the research system

Some evidence suggests that scientific research has become less disruptive over time, with fewer discoveries fundamentally reshaping fields. Metascience has identified several structural challenges that may limit the integrity, novelty and efficiency of research. Artificial intelligence interacts with many of these challenges. In some cases, it could help address them; in others, it risks amplifying existing problems.

Key issues include:

  • The replication crisis: Many scientific findings cannot be reproduced by independent researchers. Researchers at pharmaceutical companies Bayer and Amgen have reported being unable to replicate 75% and 89% of studies respectively in some areas of biomedical research. In machine learning research, problems such as data leakage (where testing data inadvertently enters training data) can undermine reproducibility.
  • Publication bias and the “file drawer problem”: Academic incentives often reward positive results, meaning studies with null results are less likely to be published. Furthermore, machine learning models often demonstrate inconsistent performance, and researchers are prone to selectively reporting the best performances across test runs. This distorts the scientific record and reduces the availability of “dark data”, which is particularly important for training AI models used in scientific discovery. 
  • Paper mills and low-quality research: Businesses that sell authorship on low-quality papers have begun using AI tools to produce large volumes of unoriginal research using publicly available datasets. This risks flooding the scientific literature with unreliable work.
  • Opacity in AI-assisted research: AI-assisted analysis may be difficult to interpret, making scientific errors harder to detect. Researchers may also rely on models that produce accurate predictions without correctly explaining underlying phenomena (sometimes described as the prediction–explanation fallacy).
  • Potential effects on research behaviour: Emerging evidence suggests that AI tools influence researchers to focus on areas where large datasets already exist, potentially reducing the originality of research.

These challenges suggest that AI could either improve or weaken the research ecosystem, depending on how it is deployed. A mission focused on AI for metascience would aim to ensure that AI strengthens, rather than undermines, the integrity and productivity of scientific research.
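The data-leakage failure mode described above can be made concrete with a small sketch on hypothetical toy data: normalising features with statistics computed over the full dataset before splitting lets information about the held-out test set leak into the training pipeline.

```python
import random
import statistics

# Toy illustration of data leakage in a machine learning pipeline.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)]
train, test = data[:80], data[80:]

# Leaky preprocessing: normalisation statistics computed on ALL data,
# so properties of the test split inform the training pipeline.
mean_all, std_all = statistics.mean(data), statistics.stdev(data)
leaky_test = [(x - mean_all) / std_all for x in test]

# Correct preprocessing: statistics come from the training split only.
mean_tr, std_tr = statistics.mean(train), statistics.stdev(train)
clean_test = [(x - mean_tr) / std_tr for x in test]

# The two versions of the "same" test point differ, showing that the
# leaky pipeline used information it should not have had access to.
print(leaky_test[0] != clean_test[0])
```

Automated checks for exactly this pattern in published machine learning pipelines are one example of the reproducibility tooling a metascience mission could fund.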

Why the UK is well positioned

The UK already hosts many of the institutions driving progress in metascience. The Cochrane Collaboration was founded in 1993, conducting systematic reviews to support evidence-based healthcare. The UK Reproducibility Network launched in 2019 as a network of local groups focused on promoting academic rigour at UK universities. The same year, the Research on Research Institute was launched, conducting applied metascience research and supporting science policymakers globally. The UK Metascience Unit, founded in 2024, is applying the tools of science to study the practice of science itself. The University of Cambridge already hosts an AI for Metascience programme, which includes evaluation of LLMs for peer review. 

These capabilities provide a strong foundation for a national mission focused on developing AI tools to improve scientific research.

The Opportunity for AI

Artificial intelligence offers powerful tools for improving the functioning of the research system. For example:

  • AI-assisted literature synthesis: Large language models can analyse large volumes of research papers rapidly, which may eventually enable continuously updated systematic reviews that synthesise evidence across large bodies of literature. Tools such as otto-SR, Elicit and pitts.ai show progress towards this capability.
  • Automated replication of research: Because large language models perform well on coding tasks, they may eventually enable automated replication of studies where data and code are publicly available. A benchmark measuring progress towards this goal has already been established.
  • AI-assisted peer review and research evaluation: AI tools could reduce the burden of bureaucracy, giving researchers more time to focus on research, as noted by the REF-AI project. LLM-assisted peer review tools similar to Refine.ink could eventually support faster assessment of grant applications or research papers submitted to journals. 
  • Detection of errors and fraud: Tools like Sleuth AI could eventually automate "scientific sleuthing", supporting the research community to self-regulate by identifying errors and fraud more quickly. In disciplines which use reporting guidelines, AI tools could eventually automate compliance checks.
  • Improved statistical robustness: By performing data analysis at speed and scale, AI tools could make multiverse analysis (where all reasonable analytical approaches are executed to test the robustness of results) faster and cheaper. 
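As a concrete illustration of the multiverse idea above, the sketch below (using hypothetical toy data) runs the "same" analysis under every combination of defensible analytical choices, so a reader can see how much the headline estimate depends on those choices:

```python
import itertools
import statistics

# Toy multiverse analysis: vary two analytical choices (outlier handling
# and choice of estimator) and record the result of each combination.
data = [1.2, 3.4, 2.2, 9.8, 2.9, 3.1, 2.5, 8.7, 3.0, 2.6]

outlier_rules = {
    "keep_all": lambda xs: xs,
    "drop_gt_2sd": lambda xs: [
        x for x in xs
        if abs(x - statistics.mean(xs)) <= 2 * statistics.stdev(xs)
    ],
}
estimators = {"mean": statistics.mean, "median": statistics.median}

results = {}
for (o_name, rule), (e_name, est) in itertools.product(
        outlier_rules.items(), estimators.items()):
    results[(o_name, e_name)] = est(rule(data))

# The spread across the four results is a direct measure of how robust
# the conclusion is to reasonable analytical choices.
for spec, value in results.items():
    print(spec, round(value, 2))
```

AI tools would extend this pattern from a handful of hand-coded choices to the full space of defensible specifications, which is what makes the approach expensive today.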

A growing ecosystem of tools targeting metascientific problems is already emerging:

  • RegCheck explores whether a research paper has deviated from its preregistration. 
  • penelope.ai reformats research papers to match the requirements of specific academic journals, which can vary greatly. 
  • The Robyn Dawes Institute is developing AI tools to evaluate scientific evidence. 

Adoption of tools like this remains at an early stage. A national mission focused on AI for metascience could accelerate the development and diffusion of various tools that would improve the functioning of the research system, in particular by supporting “open-weight” tools which may be cheaper for smaller academic institutions to adopt, ensuring that they are not left behind.

Example mission target

"We will advance AI for Metascience tools, so that 30% of outputs in Main Panels A, B and C will be rated as world-leading in originality, significance and rigour in the Research Excellence Framework 2029."

In REF 2021, close to 20% of outputs in these panels were rated as world-leading. The proposed target would therefore represent a substantial improvement in research quality.

The Research Excellence Framework groups academic disciplines into four Main Panels. Panels A (Medicine, Health and Life Sciences), B (Physical Sciences, Engineering and Mathematics) and C (Social Sciences) are the areas where metascience tools are most likely to influence research methods and evaluation. Panel D (Arts and Humanities) is therefore excluded from this target.

While the subjectivity inherent in research assessment creates some risk of gaming, the independence from DSIT of both Research England (which oversees the REF) and the REF evaluation panels, together with the significant ambition of the target, would reduce the risk this poses to the underlying mission of driving improvements in research quality.

Delivering the Mission

To accelerate the development of AI for metascience tools, DSIT and UKRI should launch a programme of innovation prizes and follow-on funding. The programme would aim to identify promising prototypes, support their development and accelerate their adoption across the research ecosystem. Government should also take further action in data, compute and people and culture.

Innovation prizes for AI for metascience tools

Government should begin by issuing a “Request for Products” outlining priority areas where AI tools could improve the research system. This process could draw on expertise from: the UK Metascience Unit, AI Metascience Fellows, the Metascience Alliance, the Incubator for Artificial Intelligence (i.AI), and experts in the priority areas of the AI for Science Strategy. The request could be modelled on Y Combinator’s “Request for Startups”, which identifies promising areas for innovation. 

Following this, innovation inducement prizes should be launched to support the development of open-weight prototypes that meet these criteria. Prize competitions are particularly effective where the ideal inventor is unknown, drawing in non-traditional innovators is desirable, up-front R&D costs are low and products are close to technology readiness level (TRL) 3.

Winning teams should receive access to compute through the AI Research Resource (AIRR) to support further development.

The most promising prototypes should then be supported through contracts for innovation, providing funding and continued compute access. These contracts should require temporary open-weight access for one year to accelerate adoption across the research sector while preserving long-term commercial incentives.

Data

Government should build on the 2025 UKRI Data Sandpit for Metascience by enabling secure access to data that could support the development of AI tools for research evaluation and synthesis. Relevant data may include UKRI grant application data and Research Excellence Framework data.

UKRI should also consider reforms to its open access policy to encourage greater transparency in scientific publishing. For example, adopting Plan U, which mandates that research is published as preprints, could help shift the system towards more open peer review, which could support the training of AI tools.

Compute

Innovation prize winners should receive priority access to compute via the AI Research Resource (AIRR) to support the development and testing of AI for metascience tools.

People and Culture

Government should also support a culture of experimentation with AI tools within the research system. This could include organising AI for Metascience hackathons in partnership with leading institutions in priority areas of the AI for Science Strategy, such as:

  • The Wellcome Sanger Institute (for engineering biology),
  • The UK Atomic Energy Authority (UKAEA) (for fusion energy),
  • The Henry Royce Institute (for materials science),
  • The Academy of Medical Sciences (for medical research) and
  • The National Quantum Computing Centre (NQCC) (for quantum technologies).

The UKRI Dark Data Prize

Action 5 of the AI for Science Strategy commits to launching pilot programmes for collecting dark data. To deliver on this commitment, DSIT and UKRI should work in partnership to establish the "UKRI Dark Data Prize", an academic recognition prize rewarding researchers who publish papers that report null results yet demonstrate significant scientific value.

The prize should be funded by a consortium of private AI for science companies, leveraging their commercial interest in dark data. DSIT should use its convening power to bring these companies together and resolve demand fragmentation.

Academia runs on a prestige economy which currently does not reward individuals for sharing dark data. By increasing the potential prestige associated with publishing null results, a prize could incentivise the sharing of this data, accelerating AI for Science while also providing broader scientific benefits.

Why dark data matters

“Dark data” refers to scientifically valuable data that is not publicly shared, including null results, data from failed experiments, and data from incomplete or abandoned research projects.

If made available, this data could be highly valuable. First, it would help researchers avoid pursuing scientific dead ends, reducing duplication of effort. Second, it would improve the accuracy of secondary research, including systematic reviews and meta-analyses. This could be particularly valuable in healthcare, where secondary research informs NHS clinical guidelines and ultimately affects patient outcomes.

Third, dark data is increasingly important for the development of AI systems for scientific discovery. Many AI-for-science organisations aim to build models capable of generating and evaluating hypotheses or experimental plans. However, models trained only on published research inherit the biases of the academic literature, which tends to overrepresent positive findings. The absence of dark data therefore leads to incomplete and potentially misleading training data.

Proposed Prize Design

Clear eligibility criteria are essential to ensure the prize encourages valuable data sharing while limiting opportunities for gaming. Eligible publications should:

  • be pre-registered, with primary outcomes unchanged after preregistration
  • be published open-access or as pre-prints (to ensure broad access)
  • report null results for primary outcomes
  • have at least one UK-based author
  • be relevant to the priority areas of the AI for Science Strategy (engineering biology, fusion energy, materials science, medical research or quantum technologies)

Entries should be assessed by a panel of scientific experts and representatives from AI-for-science companies coordinated by DSIT and UKRI. Judging criteria should include:

  • completeness of shared data
  • adherence to FAIR principles (Findable, Accessible, Interoperable and Reusable)
  • the potential usefulness of the data for training AI-for-science models
  • the overall scientific quality of the research (aligned to achieving a 4* output rating in the REF)

To maximise prestige and visibility, the prize should be delivered in partnership with leading scientific organisations, including: the Royal Society, the Wellcome Sanger Institute, the UKAEA, the Henry Royce Institute, the Academy of Medical Sciences and the NQCC.

The prize could be awarded annually, with one winner in each of the five priority areas of the AI for Science Strategy. Because recognition prizes are primarily driven by prestige rather than financial value, the monetary awards could remain modest: £50,000 for each category winner and £10,000 for runners-up. 

Acknowledgements

Thank you to Julia Willemyns, Alys Key, Lauren Gilbert, Charlie Harris, Pranay Shah, Cassidy Nelson, Richard Moulange, Simon Grimm and Ben Nelmes for their valuable advice on this report.

Contact Us

For more information about our initiative, partnerships, or support, get in touch with us at:

[email protected]