OpenBind’s first data and model release marks a milestone for AI-enabled drug discovery

The UK‑led OpenBind initiative has reached a major milestone by announcing the release of its first publicly available dataset and predictive AI model, a groundbreaking step toward accelerating the discovery of new medicines using artificial intelligence. The release shows that engineering the production of AI-ready data is not only feasible but essential to evolving AI tools for scientific fields, all of which suffer from a lack of data. With this release, both high‑quality, standardised experimental data and a newly trained predictive model, OpenBind v1, will become freely accessible to researchers worldwide, for immediate use in therapeutic discovery and to drive the next generation of AI models.

While AI has introduced a step‑change in predictive accuracy for protein structures, its impact on drug discovery has remained muted, limited above all by the global shortage of reliable experimental data that measure, in atomic detail, how molecules relevant to drug discovery bind to disease‑related proteins. OpenBind aims to fill this critical gap. Led by Diamond Light Source, the collaboration of structural biologists and AI specialists – supported in its foundation phase by the Department for Science, Innovation and Technology (DSIT) – is the first initiative to generate these essential datasets at industrial scale, openly and continuously, and designed specifically for AI.

This first release demonstrates that OpenBind’s pipeline is now operational, having generated 800 high-quality measurements in only seven months; in the past, datasets of this size took years to produce and release. The integrated operation combines automated chemistry, robust binding measurements and high-throughput crystallography at Diamond’s XChem Fragment Screening facility with an engineered data release process and AI model training on the UK’s Isambard-AI compute cluster. It lays the groundwork for transformative progress in drug discovery, with future data tranches planned to address global‑health challenges such as COVID‑19, malaria, dengue, Zika, and cancer, where rapid development of new treatments remains vital.

Read more on the Diamond website

Image credit: Stuart March – DNDi

Diamond will host a pioneering AI-driven drug discovery consortium

Diamond will be the base for OpenBind, an AI-driven drug discovery centre which will make the UK a world-leader in drug innovation and advancement.

With its unparalleled XChem facilities, Diamond will be a global hub for AI-driven drug discovery. This will lead to the prospect of tackling previously untreatable diseases and dramatically reducing the cost of drug discovery and development. The project is backed by up to £8 million of investment from DSIT’s newly established Sovereign AI unit, a key driver in the government’s AI Opportunities Action Plan.

The consortium will close critical data gaps by using new AI models to find potential new drugs and help create better treatments for diseases. It will also help scientists use engineering biology to solve bigger problems, like making enzymes that can break down plastic waste.

The main aim is to create the world’s largest collection of data on how drugs interact with proteins, the building blocks of the body. Using automated chemistry and high-throughput X-ray crystallography, the consortium will generate more than 500,000 protein-ligand structures over a period of five years. This is twenty times greater than anything collected in the last 50 years.
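As a rough back-of-the-envelope check, the stated target can be turned into an average production rate. The sketch below is illustrative only: it assumes production is spread evenly over the five years and reads "twenty times greater" as a simple factor of 20, neither of which is stated in the announcement.

```python
# Figures quoted in the announcement
TARGET_STRUCTURES = 500_000   # protein-ligand structures
YEARS = 5
SCALE_FACTOR = 20             # "twenty times greater", read here as a plain ratio (assumption)

# Average rates, assuming even production across the period (assumption)
per_year = TARGET_STRUCTURES // YEARS
per_day = per_year / 365

# Size of the earlier collection implied by the factor of 20
implied_prior_total = TARGET_STRUCTURES // SCALE_FACTOR

print(f"{per_year:,} structures per year, about {per_day:.0f} per day")
print(f"Implied size of the earlier collection: {implied_prior_total:,}")
```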

OpenBind will offer a core dataset that will drive progress across scientific and technological areas, including predicting molecular structures, designing new molecules and improving research workflows. It will work in tandem with other new methods in order to reduce trial-and-error experimentation, guide better decision-making, and support more efficient exploration of chemical possibilities.

At Diamond Light Source, a joint venture between the UK government through STFC and the Wellcome Trust, we are proud to be at the forefront of the UK’s ambition to lead the world in AI-driven drug discovery. OpenBind represents an exciting step forward in harnessing our unique capabilities to generate the high-quality data that AI needs to revolutionise healthcare, helping to cement the UK’s position as a global hub for bioscience innovation.

Professor Gianluigi Botton, CEO of Diamond Light Source

The consortium will be led by some of the world’s leading scientific minds including Professor Frank von Delft, principal scientist of the macromolecular crystallography I04-1 beamline and the XChem facility at Diamond, as well as the University of Oxford’s Professor Charlotte Deane and Nobel laureate David Baker, head of the Institute for Protein Design at the University of Washington.

Read more on the Diamond website

Image: Professor Frank von Delft, Diamond’s principal scientist of the MX I04-1 beamline and the XChem facility

New strategy for targeting cancer-causing protein previously considered “undruggable”

A cancer-causing protein long thought to be resistant to medication could soon be the target of new drugs, thanks to the work of Quebec researchers who used synchrotron light to find and exploit its weak spot.

Dr. Steven LaPlante, a professor at Quebec’s Institut National de la Recherche Scientifique (INRS), and his team studied a type of protein called Ras, “which is highly related to a good percentage of the cancers that are out there,” especially those of the head, neck and urinary tract. Ras proteins act as a molecular “switch,” flipping between active and inactive modes; they play a critical role in cell signaling and growth regulation and are often mutated in cancers. Major pharmaceutical companies have studied Ras for years, trying to develop new medications, says LaPlante, but have only recently begun to make some breakthroughs.

LaPlante, who worked in the pharmaceutical industry before joining INRS, said he wanted to take a new approach to the problem, “to start everything from scratch, like making a nice cake – you start from scratch and when you do that, you really have control over how to optimize every segment (of the process) and make a really good cake.”

Using the Canadian Light Source (CLS) at the University of Saskatchewan, LaPlante and his team gathered atomic-level, 3D information about the protein; they discovered a “pocket” in it that appears to be an ideal target for molecular drug treatment. But, he added, it is “a cryptic pocket – it’s there sometimes and not there other times,” depending on the state of the protein.

The researchers found that, when the Ras protein is in its mutated, cancer-causing state, “molecules snuggle inside the pocket.” “Using crystallography, we were able to look at the mutant proteins to better understand what their structures are,” says LaPlante. Their work was recently published in the journal ACS Omega.

Read more on the CLS website

Breaking boundaries in biomedicine: APS enables protein design

From growth hormones to cancer drugs, small molecules play a crucial role in our health. Monitoring them is essential to keeping us healthy; it enables physicians to calculate dosages and patients to monitor their medical conditions at home, for example.

Monitoring small molecules depends on sensing where they are, and in what concentrations. While scientists have developed sensors to detect some small molecules, these sensors are used primarily in research and drug discovery and can only detect a limited range of molecules with particular qualities. There is a compelling need for sensors that can detect and signal the presence of diverse small molecules of different shapes, sizes, flexibility and polarity. 

Using artificial intelligence (AI), a team of scientists led by Nobel Prize winner David Baker at the University of Washington has created a computational method for generating proteins that bind and signal a wide range of small molecules with great effectiveness. Baker won the 2024 Nobel Prize in Chemistry for computational protein design.

The research described here, published in Science and conducted in part at the Advanced Photon Source (APS), exemplifies that approach. The APS is a U.S. Department of Energy (DOE) Office of Science user facility at DOE’s Argonne National Laboratory.

The sensor design problem

Creating a protein sensor for small molecules is very difficult. The protein must first bind to the small molecule, then signal its presence. 

The team solved both problems with modular design strategies. Their AI-generated proteins consist of identical repeating subunits surrounding a central cavity. The cavity holds a pocket where the small molecule binds.

The subunits, being modular, are easily disassembled. In this way, the small molecule binding proteins can be treated like Lego blocks and be connected to well-established signaling proteins (such as split green fluorescent protein, or GFP), to make a full sensing protein device. When a small molecule binds in the pocket, the subunits reassemble, which leads to the signaling module sending a signal that the small molecule is present.

First step: Binding

The team chose a diverse spectrum of ligands (molecules that bind to protein receptors to send signals between cells), including cholic acid, a biomarker for liver disease; methotrexate, a cancer drug, which requires regular monitoring; thyroxine, a human hormone that indicates thyroid conditions; and a cyclic peptide.

The scientists constructed a machine learning algorithm based on AlphaFold2 (a protein structure predictor whose developers, John Jumper and Demis Hassabis, shared the Nobel Prize in Chemistry with Baker) and other machine learning protein design algorithms to generate thousands of proteins to bind the small molecules.

After computational design, the team used machine learning methods to choose the best designs for experimental tests, then tested those proteins in the laboratory and identified binders to particular ligands.

To confirm the accuracy of their design approach, the Baker team turned to the APS. They used the ultrabright X-ray beams to collect data on the atomic structure of the binding proteins. Using the Northeastern Collaborative Access Team (NE-CAT) beamlines at 24-ID at the APS, the team determined the structures of crystals formed from one of the designed proteins. 

“Prediction algorithms are excellent tools, but without verification of the structures, there’s no proof that the predictions match reality,” said Kay Perry of Cornell University, staff scientist at NE-CAT. “X-ray crystallography remains one of the best ways to make that confirmation, and the team was able to do so in this case.”

Second step: Signaling

The next challenge was turning the binding proteins into signaling proteins. The scientists took advantage of their modularity to create two different types of signaling events. 

The team built ligand-induced dimerization proteins from the binders. Linna An, the first author of this study, said the technology can be used in many health-related applications, such as regulating the release of drugs in cancer therapies.

In a different type of signaling event, the scientists fused the binding proteins to a newly designed nanopore, a protein that creates a channel allowing ion flow. The fused unit was constructed in such a way that when a small molecule blocked the binding pocket, the whole nanopore was blocked, preventing the flow of ions and causing a loss of current. Loss of current signaled the presence of the small molecule.
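The nanopore readout described above, in which a drop in ion current signals a bound molecule, can be illustrated with a minimal sketch. This is not the authors' analysis method: the function name, the 50% drop threshold, the minimum event length and the example trace are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def find_blockades(
    current_pa: List[float],
    open_pore_pa: float,
    drop_fraction: float = 0.5,   # hypothetical: current must fall by at least 50%
    min_samples: int = 3,         # hypothetical: ignore dips shorter than 3 samples
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, where the current
    stays below the blockade threshold for at least min_samples samples."""
    threshold = open_pore_pa * (1.0 - drop_fraction)
    events: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(current_pa):
        if value < threshold:
            if start is None:
                start = i          # a candidate blockade begins
        else:
            if start is not None and i - start >= min_samples:
                events.append((start, i))
            start = None           # current recovered: pore is open again
    # Handle a blockade that lasts until the end of the trace
    if start is not None and len(current_pa) - start >= min_samples:
        events.append((start, len(current_pa)))
    return events


# Made-up trace in picoamps: one sustained blockade and one brief dip
trace = [100, 99, 101, 40, 38, 41, 39, 100, 98, 45, 100]
events = find_blockades(trace, open_pore_pa=100.0)
print(events)
```

With these illustrative settings, only the sustained drop is reported as a binding event; the single-sample dip near the end of the trace is treated as noise.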

Read more on the APS website

Image: The crystal structure of CHD_r1 (gray) is very similar to the computational design model (colored).

Credit: Linna An, et al., Science.

Room-temperature serial crystallography experiments with microsecond pulsed beams

Scientists can now scan thousands of protein crystals at room temperature using X-ray microsecond pulses at the ESRF’s serial crystallography beamline, ID29. This capability is of utmost importance for time-resolved studies and drug discovery research at physiological conditions. The results are published in Communications Chemistry.

Studying macromolecular complexes at room temperature has always been challenging because of X-ray damage to the biological samples. Usually this is mitigated by collecting diffraction data at cryogenic conditions, but under these conditions functional dynamics are hindered.

Serial crystallography offers an alternative way to collect data at physiological conditions with limited X-ray damage and to visualise functional dynamics that become untrapped. Serial femtosecond crystallography at X-ray free electron lasers (XFELs) allows scientists to decode macromolecular structures by acquiring data from tiny protein crystals at room temperature, outrunning the damage thanks to extremely short pulses on the femtosecond timescale. The transfer of the same technology to 3rd generation synchrotrons has often been limited by longer exposure times, flux and spatial resolution.

At the ESRF, thanks to the Extremely Brilliant Source, the ID29 beamline today has a flux density of more than 10¹⁴ ph/s/µm², three times higher than 3rd generation synchrotron sources. With this, scientists can deliver X-rays in very short pulses, with microsecond time resolution, and at a very high repetition rate for macromolecular structure determination at room temperature.
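To give a sense of scale, a flux density of 10^14 ph/s/µm² can be converted into photons delivered per pulse. The sketch below is a simple unit conversion, not a description of the beamline's actual operating parameters: it treats the quoted figure as constant during the pulse, and the pulse durations and the 1 µm² area are hypothetical values chosen for illustration.

```python
def photons_per_pulse(
    flux_density_ph_s_um2: float,
    pulse_us: float,
    area_um2: float = 1.0,
) -> float:
    """Photons delivered to an area during one pulse, assuming the flux
    density is constant for the whole pulse (an idealisation)."""
    # Convert the pulse duration from microseconds to seconds by dividing by 1e6
    return flux_density_ph_s_um2 * area_um2 * pulse_us / 1_000_000


FLUX_DENSITY = 1e14  # ph/s/µm², lower bound quoted in the article

# Hypothetical pulse durations, in microseconds
for pulse_us in (1.0, 10.0, 100.0):
    n = photons_per_pulse(FLUX_DENSITY, pulse_us)
    print(f"{pulse_us:6.1f} µs pulse -> {n:.1e} photons per µm²")
```

Under these assumptions, even a one-microsecond pulse delivers on the order of 10^8 photons to each square micrometre of sample.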

Combined with a slightly polychromatic beam, this makes it possible to measure complete reflections, and ultimately accurate structure factors, from thousands of microcrystals, even from low-redundancy datasets. This combination reduces sample consumption to only a few microliters of crystal slurry, in contrast to the larger amounts frequently needed for serial experiments, and allows complete data to be collected in a fraction of the time.

“Our beamline is the first in the world at a high energy 4th generation synchrotron which is designed to use the high flux density to study macromolecules at room temperature, with a microsecond time resolution”, explains Daniele de Sanctis, scientist in charge of ID29 together with Shibom Basu, EMBL scientist. “The technique, called serial microsecond crystallography (SµX), allows researchers to use less sample to achieve comprehensive structural detail of proteins under physiological conditions and also to visualise molecular movies in action on this time domain. Our work initiates a new future of time-resolved serial microsecond crystallography experiments at 4th generation storage rings, that will ultimately complement X-ray free electron laser (XFEL) experiments.”

A versatile sample environment

One distinctive feature of serial crystallography is the set-up. How do scientists deliver a slurry of hundreds to thousands of microcrystals to the beam? This is a constantly evolving field, and ID29 can accommodate different kinds of sample delivery methods with its flexible setup. The researchers applied the unique beam of ID29 to different sample delivery methods: fixed targets (foils and chips) and three different types of high viscosity extruders, demonstrating that the structures obtained show no evident sign of radiation damage. The data quality allows the electron density of ligated molecules to be identified unambiguously.

Read more on the ESRF website

Image: Daniele De Sanctis, scientist in charge of ID29 at the ESRF, and Shibom Basu, from the EMBL, on the beamline.

Credit: S. Candé.

Cancer Research Horizons and Diamond Light Source establish drug discovery partnership

Cancer Research Horizons, the innovation arm of Cancer Research UK, is partnering with Diamond Light Source, the UK’s national synchrotron, to build a world-leading fragment-based drug discovery programme.

Diamond Light Source accelerates electrons to near light speed, producing bright light that is directed into research instruments known as beamlines. Cancer Research Horizons and its drug discovery site at Newcastle University have already been using Diamond’s beamlines and XChem facility for fragment-based screening, a powerful approach to identify chemical entities that can be developed rapidly into potent candidates.

The new partnership will build on this existing relationship to improve the throughput, running and analysis of these experiments. By leveraging their combined expertise and resources, the partnership aims to accelerate the drug discovery process and help bring new cancer treatments to patients faster.

Under the agreement, Cancer Research Horizons will fund two on-site postdoctoral research assistants dedicated to optimising the delivery of its in-house and industry-partnered projects. In return, Diamond will provide early access to any proprietary developments to its platform.

The partnership will establish a governance framework to enable Cancer Research Horizons to provide feedback on the industrialisation of Diamond’s Fragment Screening platform. This initiative aims to enhance its appeal to Cancer Research Horizons’ pharmaceutical and biotech partners, driving broader industry engagement.

Read more on the Diamond website