From Sequence to Structure: A Fast Track for RNA Modeling

In Biology 101, we learn that RNA is a single, ribbon-like strand of nucleotides that is copied from our DNA and then read like a recipe to build a protein. But there’s more to the story. Some RNA strands fold into complex shapes that allow them to drive cellular processes like gene regulation and protein synthesis, or catalyze biochemical reactions. We know that these active molecules, called non-coding RNAs, are present in all life forms, yet we’re just starting to understand their many roles – and how they can be harnessed for applications in environmental science, agriculture, and medicine.

To study – and potentially modify – the functions of non-coding RNAs, we need to determine their structure. Scientists from Lawrence Berkeley National Laboratory (Berkeley Lab) and the Hebrew University of Jerusalem have developed a streamlined process that predicts the structure of an RNA molecule down to the atomic level. Members of the research community can come to Berkeley Lab’s Advanced Light Source (ALS) user facility knowing nothing more than the molecule’s nucleotide sequence and get a structure, or they can do it themselves using the team’s open-source software.

“We were looking at the bigger picture with structure prediction, like how we can go from A to Z rather than working on A, B, and D. That’s what we try to do at Berkeley Lab, make it user friendly,” said Michal Hammel, a staff scientist in Berkeley Lab’s Molecular Biophysics and Integrated Bioimaging (MBIB) division. Hammel co-developed the process, called SOlution Conformation PrEdictor for RNA (SCOPER), with MBIB colleague Scott Classen and Hebrew University collaborators Dina Schneidman-Duhovny and Edan Patt.

A paper describing SCOPER was recently published in Biophysical Journal.

Historically, it has ranged from difficult to impossible to accurately determine the three-dimensional atomic blueprint of a folded RNA, because these molecules rarely form the neat crystals needed for imaging with X-ray crystallography. And because the twists and folds of the RNA strand move around as the molecule functions, there are actually multiple correct structures.

In recent years, artificial intelligence (AI) tools like AlphaFold have become very accurate at generating protein structure predictions based on amino acid sequence, making life a lot easier for scientists worldwide and greatly accelerating the pace of drug discovery. These algorithms have been expanded to RNA structures, but the accuracy remains middling. Getting a reliable model currently involves combining the outputs of multiple computational tools and imaging data. It’s a long process, and still fraught with uncertainty.

SCOPER has simplified it significantly. Say you want to study a new RNA: First, put the nucleotide sequence into one of the open-source, AI-based structure prediction tools available today. Then, take your sample to a small angle X-ray scattering (SAXS) facility for characterization. Better yet, let Hammel and his colleagues at the ALS’s SAXS beamline get that data for you.

Take the SAXS data and predicted structures, and put them through SCOPER’s pipeline. The first step uses an existing program to generate possible flexible arrangements of the RNA from the predicted static structures. Next, a new machine learning program, developed and trained on existing atomic structures by Patt, refines the structures by adding the placements of magnesium ions. Inside cells, positively charged magnesium ions interact with negatively charged RNAs to keep them folded stably. Their presence also helps elucidate structure when using SAXS.

Next, SCOPER generates simulated SAXS data for the theoretical structures and compares them with the real-world SAXS data to determine which structures best match the molecule in solution.
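That final selection step amounts to scoring each candidate structure’s theoretical scattering profile against the measured one. Here is a minimal sketch of the idea in Python, assuming the profiles have already been computed on a common q-grid; this is a generic chi-square fit with an analytically fitted scale factor, not SCOPER’s actual code:

```python
import numpy as np

def chi_square(i_exp, sigma, i_calc):
    """Reduced chi-square between an experimental SAXS profile (i_exp,
    with errors sigma) and a profile computed from a candidate structure.
    A scale factor c is fitted analytically, as is standard in SAXS
    profile matching."""
    w = 1.0 / sigma**2
    # Least-squares optimal scale factor for i_calc against i_exp
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    return np.sum(((i_exp - c * i_calc) / sigma) ** 2) / len(i_exp)

def best_fit(i_exp, sigma, candidate_profiles):
    """Return the index of the candidate whose theoretical SAXS profile
    best matches the measured data (lowest chi-square), plus all scores."""
    scores = [chi_square(i_exp, sigma, i_c) for i_c in candidate_profiles]
    return int(np.argmin(scores)), scores
```

Because the scale factor is fitted, a candidate whose profile has the right shape but a different overall intensity still scores well, which matters when absolute calibration is uncertain.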

Read more on ALS website

Image: These renderings show RNA structures that were used to evaluate the accuracy of the new SCOPER process. The AI-generated initial structure prediction based on sequence (blue) is pictured with the refined structure generated by SCOPER (red), which includes the placement of magnesium ions (violet).

Credit: Michal Hammel/Berkeley Lab

Using Machine Learning to Find Better Electrochemical Catalysts

Hydrogen may be the most common element in the universe, but that doesn’t mean it’s easy to obtain when we need it as an energy source and storage medium. “Green hydrogen,” as it’s known, is generated by splitting water into its component atoms through electrolysis, but that requires an electrolyzer built with materials that can catalyze the reaction, some of which are rare and expensive.

Finding alternative electrocatalysts is therefore an important goal in the quest for a carbon-neutral energy grid. But it’s a big job because so many chemical possibilities must be evaluated. Researchers from the University of Toronto and Carnegie Mellon University turned to the artificial intelligence technique of machine learning to efficiently screen thousands of possible catalysts and identify some likely choices.  Their work appeared in the Journal of the American Chemical Society.

While most commercial electrolysis uses alkaline water electrolyzers, a promising alternative is the proton exchange membrane (PEM) electrolyzer, which uses a solid polymer electrolyte membrane to separate out hydrogen gas at higher pressures and current densities than are possible with alkaline electrolyzers. At present, however, the only oxygen evolution reaction (OER) catalyst that can endure the extremely acidic environment at the anode of a PEM electrolyzer is iridium oxide (IrO2), which is expensive because it is in high demand for many other uses. In the current work, the researchers explored the prospects for an OER catalyst based on ruthenium in the form of RuO2, a far less expensive and more abundant alternative.
    
A disadvantage of ruthenium in the OER process is its tendency to become overoxidized, forming soluble Ru species that limit its catalytic lifetime and stability. To overcome this problem, the experimenters sought metal oxides that could be alloyed with RuO2 to create a more robust and stable OER catalyst. They used a neural-network computational pipeline, applied to density functional theory calculations, to efficiently screen a large set of mixed metal oxides and isolate likely candidates.

After training a neural network model on 36,465 metal oxide structures, the investigators substituted 46 different elements into the lattice while keeping the rutile oxide structure intact. This led to a set of 2,070 hypothetical candidates, which were then evaluated for Pourbaix electrochemical stability. The investigators note that Pourbaix stability provides an excellent benchmark for gauging the electrochemical stability of catalysts prior to reaction.
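The screening loop described above can be sketched as follows. The surrogate model, the stability cutoff, and the candidate formulas are illustrative assumptions for the sketch, not the authors’ actual pipeline:

```python
# Sketch of a surrogate-model screening step: a trained model predicts a
# Pourbaix decomposition energy for each element-substituted rutile oxide,
# and candidates above a stability cutoff are discarded. The cutoff value
# and the predictor are placeholders, not the published workflow.

def screen_candidates(candidates, predict_decomposition_energy,
                      stability_cutoff=0.5):
    """Keep candidates whose predicted Pourbaix decomposition energy
    (eV/atom) falls below the cutoff, i.e. those expected to survive
    acidic OER conditions, ranked most stable first."""
    survivors = []
    for structure in candidates:
        energy = predict_decomposition_energy(structure)
        if energy < stability_cutoff:
            survivors.append((structure, energy))
    survivors.sort(key=lambda pair: pair[1])  # most stable first
    return survivors
```

In practice the predictor would be the trained neural network standing in for expensive density functional theory calculations, which is what makes screening thousands of hypothetical oxides tractable.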
    
Further calculations narrowed the candidate set down to the Ru-Cr-Ti-Ox group, particularly Ti and Cr dopants, so the research team focused on these for experimental validation. They synthesized catalysts with various dopant amounts and characterized them using techniques including in situ X-ray absorption near-edge spectroscopy (XANES) at the 9-BM and 20-BM beamlines of the Advanced Photon Source, a U.S. Department of Energy (DOE) Office of Science user facility at DOE’s Argonne National Laboratory.

Read more on APS website

Image: The AI-accelerated workflow for catalyst design in this work, starting from design and synthesis, through characterization, and ending with testing the catalyst in a real electrolyzer for hydrogen production.

ALS Machine Learning Models at Beamtimes around the World

From the types of samples to the techniques used to study them, user experiences at beamlines around the world can vary, but one commonality connects them: beamtime is precious. At different facilities, users encounter different beamline controls and varying availability of computing infrastructure to process their data. Beyond needing to familiarize themselves with different equipment and software setups, they also need to ensure that they’re collecting meaningful, consistent data no matter where they are. For the past several months, the ALS Computing group has been traveling around the world for beamtime. Their firsthand experience is informing the development of a suite of tools aimed at lowering the barriers to advanced data processing for all users.

Today’s beamtime experience

As a beamline scientist at the ALS, Dula Parkinson has helped numerous users with microtomography, a technique that can yield ten gigabytes of data in two seconds. “In many cases, users won’t have done this kind of experiment or analysis before, and they won’t have the computing infrastructure or software needed to analyze the huge amounts of complex data being produced,” he said.

Computational tools and machine-learning models can help users at every stage, from adjusting the experimental setup in real time to processing the data after the experiment has concluded. Eliminating these bottlenecks can make limited beamtime more efficient and help users glean scientific insights more quickly.

As a former beamline scientist himself, Computing Program Lead Alex Hexemer has first-hand knowledge of the user experience. He was instrumental in the creation of a dedicated computing group at the ALS in 2018, which continues to grow in both staff numbers and diversity of expertise. A current focus for the group is to advance the user experience with intuitive interfaces.

Computing approach to beamtime

Recently, Hexemer and two of his group members, Wiebke Koepp and Dylan McReynolds, traveled to Diamond Light Source, where they worked with Beamline Scientist Sharif Ahmed to test some of their tools during a beamline experiment. “It is always useful to see other facilities from the user’s perspective,” McReynolds said. “We want our software to be usable at many facilities, so getting to test in other environments was very valuable.”

The computational infrastructure is an essential complement to the beamline instrumentation. To standardize their experiments across different microtomography beamlines, the team performed measurements on a reference material—sand with standardized size distributions. Each scan captures a “slice” from the sample; the slices then need to be reconstructed into three-dimensional images that contain 50 to 200 gigabytes of data.

Read more on ALS website

Image: The ALS Computing group performed experiments and tested their machine learning models at Beamline 8.3.2. Clockwise from back left: Tanny Chavez, Dylan McReynolds, Raja Vyshnavi Sriramoju, Seij De Leon, Dula Parkinson, Wiebke Koepp.

The Long Read: The AI revolution

For what was once a purely technical subject, machine learning has hardly been out of the news. Beginning in late 2022, the world has had to come to terms with the impact of a number of groundbreaking, generative artificial-intelligence (AI) models – notably the ChatGPT chatbot by the US company OpenAI, and text-to-image systems such as Midjourney, developed by the US company of the same name. Everyday conversations cannot avoid the debate over whether we are living amid a fantastic new industrial revolution – or the end of civilisation as we know it.

All this popular controversy can detract from a quieter – but no less important – machine-learning evolution taking place in the scientific realm. Arguably this began in the 1990s, with greater computing power and the development of so-called neural networks, which attempt to mimic the wiring of the brain, and which helped to popularise AI as an overarching term for machines that ape human thinking. The real acceleration, however, has taken place in the past decade or so, thanks to the storage and processing of “big data”, and experiments with layered neural networks – what has come to be called deep learning.

Of this revolution, synchrotron users – who are among the world’s largest producers of scientific data – stand to be great beneficiaries. Machine learning has the potential to streamline experiments, reduce data volumes, speed up data analysis and obtain results that would otherwise be beyond human insight. “We’ve been amazed in many ways by the results we could produce,” says Linus Pithan, a materials and data scientist based at the German synchrotron DESY, who ran an autonomous crystal-growth experiment at the ESRF’s ID10 beamline with colleagues last year. “The quality of the online data analysis was astonishing.”

Formerly a member of the ESRF’s Beamline Control Unit, where he helped develop the new BLISS beamline control system, Pithan is well placed to test the potential of machine learning in synchrotron science. The flexibility of BLISS was necessary for him and his colleagues to integrate their own deep-learning algorithm, which they had trained beforehand to reconstruct scattering-length density (SLD) profiles from the X-ray reflectivity of molecular thin films. Unlike the forward operation – calculating a reflectivity curve from an SLD profile – this inverse problem can be painfully tedious to solve even for an experienced analyst: the data are inherently ambiguous, because they do not include the phase of the scattered X-rays. Indeed, it is a demanding task for a machine too, which is why at the beamline Pithan’s group made use of an online service known as VISA to harness the ESRF’s central computer system.

The success of the automation was immediately apparent (Figure 1). From the reflectivity measurements, the deep-learning algorithm could output SLD profiles and thin-film properties such as layer thickness and surface roughness in real time, and thereby stop in-situ molecular beam deposition at any desired sample thickness between 80 Å and 640 Å, with an average accuracy of 2 Å [1]. “The machine-learning model was able to ‘predict’ results within milliseconds,” says Pithan. “In a way, we transferred the time that is traditionally needed for the manual fitting process to the point before the actual experiment where we trained the model. So by the time of the experiment, we were able to get results instantaneously.”
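The closed-loop idea Pithan describes can be sketched as a simple monitoring loop: a pretrained model maps each new reflectivity curve to a thickness estimate in milliseconds, and deposition is halted once the prediction reaches the target. The function names here are hypothetical placeholders, not part of BLISS or the team’s actual code:

```python
# Hedged sketch of ML-in-the-loop deposition control. The callables
# acquire_reflectivity, predict_thickness, and stop_deposition stand in
# for beamline data acquisition, the pretrained inverse model, and the
# growth-chamber interlock; none are real APIs from the article.

def monitor_growth(acquire_reflectivity, predict_thickness,
                   stop_deposition, target_thickness_angstrom):
    """Poll for new reflectivity curves and halt film growth when the
    model's predicted thickness reaches the target (in angstroms)."""
    while True:
        curve = acquire_reflectivity()        # latest XRR measurement
        thickness = predict_thickness(curve)  # ML inference, ~milliseconds
        if thickness >= target_thickness_angstrom:
            stop_deposition()
            return thickness
```

The point of the design is the one Pithan makes: the expensive fitting work is moved to model training before the experiment, so each in-loop decision costs only a fast forward pass.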

The ESRF has been anticipating a rise in machine learning for many years. It forms part of the data strategy, and is one of the reasons for the ESRF’s engagement in various European projects that support the trend: PaNOSC, which is a cloud service to host publicly funded photon and neutron research data; DAPHNE, which aims to make photon and neutron data accord with “FAIR” (findable, accessible, interoperable, reusable) principles; and most recently OSCARS, which promotes European open science. Vincent Favre-Nicolin, the head of the ESRF algorithms and scientific data analysis group, is wary of claiming that machine learning is always a “magical” solution, and points out the toll it can take on computing resources. “But for some areas it makes a real difference,” he says.

Read more on ESRF website

Image: Painstaking manual segmentation of ESRF tomographic data reveals the vasculature of a human kidney for the Human Organ Atlas project. It also provides valuable training data for deep-learning algorithms that will be able to do the same job much faster.

New AI-driven tool streamlines experiments

Researchers at the Department of Energy’s SLAC National Accelerator Laboratory have demonstrated a new approach to peer deeper into the complex behavior of materials. The team harnessed the power of machine learning to interpret collective excitations – the coordinated swinging of atomic spins within a system.

This groundbreaking research, published recently in Nature Communications, could make experiments more efficient by providing real-time guidance to researchers during data collection. It is part of a DOE-funded project, led by Howard University and including researchers at SLAC and Northeastern University, that uses machine learning to accelerate materials research.

The team created this new data-driven tool using “neural implicit representations,” a machine learning development used in computer vision and across different scientific fields such as medical imaging, particle physics and cryo-electron microscopy. This tool can swiftly and accurately derive unknown parameters from experimental data, automating a procedure that, until now, required significant human intervention.

Peculiar behaviors

Collective excitations help scientists understand the rules of systems, such as magnetic materials, with many parts. When seen at the smallest scales, certain materials show peculiar behaviors, like tiny changes in the patterns of atomic spins. These properties are key for many new technologies, such as advanced spintronics devices that could change how we transfer and store data. 

To study collective excitations, scientists use techniques such as inelastic neutron or X-ray scattering. However, these methods are not only intricate, but also resource-intensive given, for example, the limited availability of neutron sources. 

Machine learning offers a way to address these challenges, although even then there are limitations. Past experiments used machine learning techniques to enhance the accuracy of X-ray and neutron scattering data interpretation. These efforts relied on traditional image-based data representations. But the team’s new approach, using neural implicit representations, takes a different route. 

Read more on SLAC website

Machine learning enhances X-ray imaging of nanotextures

Using a combination of high-powered X-rays, phase-retrieval algorithms and machine learning, Cornell researchers revealed the intricate nanotextures in thin-film materials, offering scientists a new, streamlined approach to analyzing potential candidates for quantum computing and microelectronics, among other applications.

Scientists are especially interested in nanotextures that are distributed non-uniformly throughout a thin film because they can give the material novel properties. The most effective way to study the nanotextures is to visualize them directly, a challenge that typically requires complex electron microscopy and does not preserve the sample.

The new imaging technique detailed July 6 in the Proceedings of the National Academy of Sciences overcomes these challenges by using phase retrieval and machine learning to invert conventionally collected X-ray diffraction data – such as that produced at the Cornell High Energy Synchrotron Source, where data for the study was collected – into real-space visualizations of the material at the nanoscale.

The use of X-ray diffraction makes the technique more accessible to scientists and allows for imaging a larger portion of the sample, said Andrej Singer, assistant professor of materials science and engineering and David Croll Sesquicentennial Faculty Fellow in Cornell Engineering, who led the research with doctoral student Ziming Shao.

“Imaging a large area is important because it represents the true state of the material,” Singer said. “The nanotexture measured by a local probe could depend on the choice of the probed spot.”

Read more on the CHESS website

Artificial intelligence deciphers detector “clouds” to accelerate materials research

A machine learning algorithm automatically extracts information to speed up – and extend – the study of materials with X-ray pulse pairs.

X-rays can be used like a superfast, atomic-resolution camera, and if researchers shoot a pair of X-ray pulses just moments apart, they get atomic-resolution snapshots of a system at two points in time. Comparing these snapshots shows how a material fluctuates within a tiny fraction of a second, which could help scientists design future generations of super-fast computers, communications, and other technologies.

Resolving the information in these X-ray snapshots, however, is difficult and time intensive, so Joshua Turner, a lead scientist at the Department of Energy’s SLAC National Accelerator Laboratory and Stanford University, and ten other researchers turned to artificial intelligence to automate the process. Their machine learning-aided method, published October 17 in Structural Dynamics, accelerates this X-ray probing technique and extends it to previously inaccessible materials.

“The most exciting thing to me is that we can now access a different range of measurements, which we couldn’t before,” Turner said.

Handling the blob

When studying materials using this two-pulse technique, the X-rays scatter off a material and are usually detected one photon at a time. A detector measures these scattered photons, which are used to produce a speckle pattern – a blotchy image that represents the precise configuration of the sample at one instant in time. Researchers compare the speckle patterns from each pair of pulses to calculate fluctuations in the sample.
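The pair-comparison idea can be illustrated with a generic speckle cross-correlation, in which the degree of correlation between the two patterns reflects how much the sample fluctuated between the pulses. This is a textbook measure for the sake of illustration, not the team’s machine-learning method:

```python
import numpy as np

def speckle_correlation(pattern1, pattern2):
    """Normalized cross-correlation of two speckle patterns
    (2D intensity arrays). A value near 1.0 means the sample
    configuration was essentially unchanged between the two pulses;
    values near 0 mean it fully decorrelated."""
    a = pattern1 - pattern1.mean()
    b = pattern2 - pattern2.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))
```

Tracking how this correlation decays as the delay between the pulse pair grows is what reveals the timescale of the sample’s fluctuations – provided, as the article explains, that the speckle pattern itself can be cleanly extracted from the detector signal.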

“However, every photon creates an explosion of electrical charge on the detector,” Turner said. “If there are too many photons, these charge clouds merge together to create an unrecognizable blob.” This cloud of noise means the researchers must collect tons of scattering data to yield a clear understanding of the speckle pattern.

“You need a lot of data to work out what’s happening in the system,” said Sathya Chitturi, a Ph.D. student at Stanford University who led this work. He is advised by Turner and coauthor Mike Dunne, director of the Linac Coherent Light Source (LCLS) X-ray laser at SLAC. 

Read more on the SLAC website

Image: A speckle pattern typical of the sort seen at LCLS’s detectors

Credit: Courtesy Joshua Turner

What drives rechargeable battery decay?

How quickly a battery electrode decays depends on properties of individual particles in the battery – at first. Later on, the network of particles matters more.

Rechargeable lithium-ion batteries don’t last forever – after enough cycles of charging and recharging, they’ll eventually go kaput, so researchers are constantly looking for ways to squeeze a little more life out of their battery designs.

Now, researchers at the Department of Energy’s SLAC National Accelerator Laboratory and colleagues from Purdue University, Virginia Tech, and the European Synchrotron Radiation Facility have discovered that the factors behind battery decay actually change over time. Early on, decay seems to be driven by the properties of individual electrode particles, but after several dozen charging cycles, it’s how those particles are put together that matters more.

“The fundamental building blocks are these particles that make up the battery electrode, but when you zoom out, these particles interact with each other,” said SLAC scientist Yijin Liu, a researcher at the lab’s Stanford Synchrotron Radiation Lightsource and a senior author on the new paper. Therefore, “if you want to build a better battery, you need to look at how to put the particles together.”

Read more on the SLAC website

Image: A piece of battery cathode after 10 charging cycles. A machine-learning feature detection and quantification algorithm allowed researchers to automatically single out the most severely damaged particles of interest, which are highlighted in the image.

Credit: Courtesy Yijin Liu/SLAC National Accelerator Laboratory

I am doing science that is more important than my sleep!

NSLS-II #LightSourceSelfie

Dan Olds is an associate physicist at Brookhaven National Laboratory, where he works as a beamline scientist at NSLS-II. Dan’s research involves combining artificial intelligence and machine learning to perform real-time analysis on streaming data while beamline experiments are being performed. Often these new AI-driven methods are critical to success during in situ studies of materials, including next-generation battery components, accident-safe nuclear fuels, catalytic materials, and other emerging technologies that will help us develop clean energy solutions to fight climate change.

Dan’s #LightSourceSelfie delves into what attracted him to this area of research, the inspiration he gets from helping users on the beamline and the addictive excitement that comes from doing science at 3am.

Game on: Science Edition

After AIs mastered Go and Super Mario, Brookhaven scientists have taught them how to “play” experiments at NSLS-II

Inspired by the mastery of artificial intelligence (AI) over games like Go and Super Mario, scientists at the National Synchrotron Light Source II (NSLS-II) trained an AI agent – an autonomous computational program that observes and acts – how to conduct research experiments at superhuman levels by using the same approach. The Brookhaven team published their findings in the journal Machine Learning: Science and Technology and implemented the AI agent as part of the research capabilities at NSLS-II.

As a U.S. Department of Energy (DOE) Office of Science User Facility located at DOE’s Brookhaven National Laboratory, NSLS-II enables scientific studies by more than 2,000 researchers each year, offering access to the facility’s ultrabright X-rays. Scientists from all over the world come to the facility to advance their research in areas such as batteries, microelectronics, and drug development. However, time at NSLS-II’s experimental stations – called beamlines – is hard to get, because nearly three times as many researchers would like to use them as any one station can handle in a day, despite the facility’s 24/7 operations.

“Since time at our facility is a precious resource, it is our responsibility to be good stewards of that; this means we need to find ways to use this resource more efficiently so that we can enable more science,” said Daniel Olds, beamline scientist at NSLS-II and corresponding author of the study. “One bottleneck is us, the humans who are measuring the samples. We come up with an initial strategy, but adjust it on the fly during the measurement to ensure everything is running smoothly. But we can’t watch the measurement all the time because we also need to eat, sleep and do more than just run the experiment.”

Read more on the Brookhaven website

Image: NSLS-II scientists, Daniel Olds (left) and Phillip Maffettone (right), are ready to let their AI agent level up the rate of discovery at NSLS-II’s PDF beamline.

Credit: Brookhaven National Lab