The Long Read: The AI revolution

For what was once a purely technical subject, machine learning has hardly been out of the news. Since late 2022, the world has had to come to terms with the impact of a number of groundbreaking, generative artificial-intelligence (AI) models – notably the ChatGPT chatbot from the US company OpenAI, and text-to-image systems such as Midjourney, developed by the US company of the same name. Everyday conversations can hardly avoid the debate over whether we are living amid a fantastic new industrial revolution – or the end of civilisation as we know it.

All this popular controversy can distract from a quieter – but no less important – machine-learning revolution taking place in the scientific realm. Arguably this began in the 1990s, with greater computing power and the development of so-called neural networks, which attempt to mimic the wiring of the brain, and which helped to popularise AI as an overarching term for machines that ape human thinking. The real acceleration, however, has taken place in the past decade or so, thanks to the storage and processing of “big data”, and to experiments with layered neural networks – what has come to be called deep learning.

Synchrotron users – who are among the world’s largest producers of scientific data – stand to be great beneficiaries of this revolution. Machine learning has the potential to streamline experiments, reduce data volumes, speed up data analysis and obtain results that would otherwise be beyond human insight. “We’ve been amazed in many ways by the results we could produce,” says Linus Pithan, a materials and data scientist based at the German synchrotron DESY, who ran an autonomous crystal-growth experiment with colleagues at the ESRF’s ID10 beamline last year. “The quality of the online data analysis was astonishing.”

Formerly a member of the ESRF’s Beamline Control Unit, where he helped develop the new BLISS beamline-control system, Pithan is well placed to test the potential of machine learning in synchrotron science. The flexibility of BLISS made it possible for him and his colleagues to integrate their own deep-learning algorithm, which they had trained beforehand to reconstruct scattering-length density (SLD) profiles from the X-ray reflectivity of molecular thin films. Unlike the forward operation – calculating a reflectivity curve from an SLD profile – this inverse problem can be painfully tedious to solve even for an experienced analyst: the data are inherently ambiguous, because they do not include the phase of the scattered X-rays. It is a demanding task for a machine too, which is why, at the beamline, Pithan’s group made use of an online service known as VISA to harness the ESRF’s central computer system.
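
To see what the network had to learn to invert, consider a minimal Python sketch of the standard forward model – the Parratt recursion, with Névot–Croce roughness damping. The function name and interface below are illustrative, not the group’s actual code; the last line makes the ambiguity concrete, since the detector records only the intensity |r|² and the phase of r is discarded.

```python
import numpy as np

def parratt_reflectivity(q, sld, thickness, roughness):
    """Forward model: reflectivity |r(q)|^2 of a layer stack (Parratt recursion).

    q         : momentum-transfer values (1/A), shape (Nq,)
    sld       : scattering-length densities (1/A^2) of
                [ambient, layer_1, ..., layer_n, substrate]
    thickness : thicknesses (A) of the n internal layers
    roughness : rms roughnesses (A) of the n+1 interfaces
    """
    q = np.asarray(q, dtype=complex)[:, None]        # (Nq, 1)
    sld = np.asarray(sld, dtype=complex)[None, :]    # (1, n+2)
    # z-component of the wavevector inside each layer
    kz = np.sqrt((q / 2) ** 2 - 4 * np.pi * sld)     # (Nq, n+2)

    # Fresnel coefficient of each interface, damped by Nevot-Croce roughness
    r = (kz[:, :-1] - kz[:, 1:]) / (kz[:, :-1] + kz[:, 1:])
    r = r * np.exp(-2 * kz[:, :-1] * kz[:, 1:] * np.asarray(roughness) ** 2)

    # recurse from the substrate interface back up to the surface
    R = r[:, -1]
    for n in reversed(range(len(thickness))):
        phase = np.exp(2j * kz[:, n + 1] * thickness[n])
        R = (r[:, n] + R * phase) / (1 + r[:, n] * R * phase)

    # the detector sees only the intensity: the phase of R is lost here,
    # which is what makes the inverse problem ambiguous
    return np.abs(R) ** 2

# example: a 320 A organic film (SLD ~ 1e-5 A^-2) on a silicon substrate
q = np.linspace(0.01, 0.3, 300)
curve = parratt_reflectivity(q, sld=[0.0, 1.0e-5, 2.07e-5],
                             thickness=[320.0], roughness=[3.0, 3.0])
```

Running this forward calculation is fast; the hard part, which the deep-learning model takes over, is going the other way – from a measured curve back to the stack parameters.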

The success of the automation was immediately apparent (Figure 1). From the reflectivity measurements, the deep-learning algorithm could output SLD profiles and thin-film properties such as layer thickness and surface roughness in real time, and thereby stop in-situ molecular-beam deposition at any desired sample thickness between 80 Å and 640 Å, with an average accuracy of 2 Å [1]. “The machine-learning model was able to ‘predict’ results within milliseconds,” says Pithan. “In a way, we transferred the time that is traditionally needed for the manual fitting process to the point before the actual experiment, where we trained the model. So by the time of the experiment, we were able to get results instantaneously.”
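
The decision loop that such speed enables can be sketched in a few lines. Everything below is hypothetical – acquire_curve, model.predict and stop_deposition are stand-ins for the real BLISS and beamline hooks, not code from the experiment – but it shows why millisecond inference matters: the prediction is fast enough to sit inside the acquisition loop itself.

```python
import numpy as np

TARGET = 320.0     # A: any set point in the demonstrated 80-640 A range
TOLERANCE = 2.0    # A: the average accuracy reported in [1]

def grow_to_target(acquire_curve, model, stop_deposition):
    """Hypothetical closed-loop sketch: deposit material until the inferred
    film thickness reaches TARGET, then halt the molecular-beam source.

    acquire_curve()   -> latest reflectivity curve (stand-in for the beamline)
    model.predict(x)  -> (thickness, roughness, sld) inferred in milliseconds
    stop_deposition() -> stand-in for the deposition/shutter control
    """
    while True:
        # reflectivity spans many decades, so models are typically fed log-scaled curves
        curve = np.log10(acquire_curve())
        thickness, roughness, sld = model.predict(curve)
        if thickness >= TARGET - TOLERANCE:
            stop_deposition()  # millisecond inference keeps this decision real-time
            return thickness
```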

The ESRF has been anticipating a rise in machine learning for many years. It forms part of the facility’s data strategy, and is one of the reasons for the ESRF’s engagement in various European projects that support the trend: PaNOSC, a cloud service to host publicly funded photon and neutron research data; DAPHNE, which aims to make photon and neutron data conform to the “FAIR” principles (findable, accessible, interoperable and reusable); and most recently OSCARS, which promotes European open science. Vincent Favre-Nicolin, the head of the ESRF’s algorithms and scientific data analysis group, is wary of claiming that machine learning is always a “magical” solution, and points out the toll it can take on computing resources. “But for some areas it makes a real difference,” he says.

Read more on the ESRF website

Image: Painstaking manual segmentation of ESRF tomographic data reveals the vasculature of a human kidney for the Human Organ Atlas project. It also provides valuable training data for deep-learning algorithms that will be able to do the same job much faster.