Large research facilities at PSI such as the X-ray free-electron laser SwissFEL and the Swiss Light Source SLS (especially after its upgrade to SLS 2.0) deliver enormous amounts of data. Artificial intelligence helps researchers evaluate these data efficiently and exploit the full research potential of the facilities.
Proteins are the workhorses of life. These tiny molecular machines are found in every cell and play a role in nearly all biological processes, from metabolism to cellular communication. Their diversity is enormous: the human body alone contains hundreds of thousands of different proteins, each with its own function. Proteins are important drug targets, so understanding their structure and function is a central task of biological research. One challenge in drug development is to find an active agent that, ideally, interacts with just one type of protein and no others.
To achieve such a feat, one must first understand the language of proteins. This language is based on a kind of alphabet of 20 building blocks that play the role of letters. In proteins, however, these building blocks are not letters but amino acids. Each protein is built from a specific sequence of amino acids, and this sequence largely determines its properties. Researchers would now like to know which sequence leads to which property. This is where large language models such as GPT-4 come into play. The AI chatbot ChatGPT, which has been causing a stir since its launch in 2022, is built on models of this kind, including GPT-4; both were developed by the company OpenAI. Such a model is trained on an extensive dataset of human-written texts to learn the patterns and structures of language. When a user enters a question or task, the model produces a response based on the contexts and patterns it learned during training. In this way it can write poems, stories and even programming code.
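To make the alphabet analogy concrete, here is a minimal illustrative sketch in Python of how a protein sequence can be turned into the kind of token list a language model reads: each of the 20 standard amino acids, written with its one-letter code, is mapped to a number. The names `AMINO_ACIDS`, `TOKEN_ID` and `encode` are hypothetical and are not taken from the model Hidber uses.

```python
# The 20 standard amino acids, written with their one-letter codes.
# This ordering is an arbitrary choice for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: each amino acid "letter" gets an integer ID.
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str) -> list[int]:
    """Map a protein sequence to integer token IDs, one per amino acid."""
    seq = sequence.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        # Reject characters outside the 20-letter alphabet.
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [TOKEN_ID[aa] for aa in seq]


if __name__ == "__main__":
    # A short made-up fragment, encoded letter by letter.
    print(encode("MKTAY"))
```

Real protein language models typically also reserve extra tokens, for example for padding or for positions the model is asked to predict, which this sketch leaves out.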
Flurin Hidber is a doctoral candidate supervised by Xavier Deupi, an expert in bioinformatics and protein structure at PSI, and he uses AI in protein research. Hidber works with a sophisticated model similar to ChatGPT; instead of generating human-like language, however, it is trained to predict amino acids in protein sequences. This goes beyond merely mimicking the predictive capabilities of AI language models: the model's predictions provide valuable insights into the structure and function of proteins. Pharmaceutical researchers could use these insights to tailor medications and significantly shorten the laboratory trial-and-error process, which ultimately yields only a small proportion of drug candidates with promising properties.
An ambitious goal
Deupi and Hidber are working towards an ambitious goal: being able to determine the precise amino acid sequence that leads to a desired protein property. One focus of their research is light-sensitive proteins, a speciality of Deupi’s group and a research subject at SwissFEL. These proteins occur in many organisms, from microbes to humans, and have medical potential. Hidber’s use of AI to predict the properties of light-sensitive proteins solely on the basis of the sequence of their building blocks represents a significant advance in this field.
Through the precise prediction of the light-absorption properties of proteins, Hidber’s work could pave the way for the development of molecules with tailored properties – a step that could have a profound impact on optogenetics. This scientific technique employs light to control and monitor the activity of certain cells in living organisms, such as nerve cells in the brain. Researchers insert genes for light-sensitive proteins into these cells so they can precisely influence the cells’ behaviour by irradiating them with light.
This technology could contribute to the understanding and treatment of neurological diseases, since it provides a tool for investigating and controlling the activity of specific brain cells with unprecedented precision.
Looking ahead, Deupi and Hidber aim to reverse the prediction: rather than inferring a protein's properties from its sequence, they want to design new proteins with properties tailored to specific requirements, for example proteins that react to light of a particular colour. Such designs could then be tested experimentally, and ideally confirmed, by colleagues in the laboratory.
Protein dynamics is also at the heart of Cecilia Casadei's research. The physicist has developed a new algorithm that makes the evaluation of measurements at X-ray free-electron laser facilities such as SwissFEL more efficient. Proteins often perform ultrafast movements, and investigating these movements precisely is crucial for a better understanding of how proteins work. In the long run, this can provide valuable information about disease processes and enable the development of novel medical approaches.
Image: Xavier Deupi (left) and Flurin Hidber from the research group for Condensed Matter Theory want to better understand how the function of proteins is related to their structure. They are targeting light-sensitive proteins in particular.
Credit: Paul Scherrer Institute/Markus Fischer; AI image generation: Studio HübnerBraun/Midjourney