Designing better RNA medicines with artificial intelligence

Wayne Doyle

Key highlights

Artificial intelligence is becoming a key component of the RNA medicine design process
Neural networks are used to select optimal antigens or to predict RNA potency and stability
Generative models can design novel sequences that outperform traditional sequence optimization approaches
A major challenge in implementing AI for RNA design is obtaining high-quality data for model training

Introduction

RNA-based medicines – from mRNA vaccines to gene editing therapies – are transforming drug development, but designing effective RNA drugs remains a complex challenge. Designing a potent therapeutic requires optimizing the nucleotide sequence to be stable and efficiently translated. Historically, RNA design has relied on simple rule-based approaches such as selecting the codons that appear most frequently in the genome or choosing UTRs that are associated with highly expressed genes. However, the design space for an RNA sequence is extremely large. For example, there are more than 10⁶³⁰ potential designs for the SARS-CoV-2 spike antigen, which is far more than the number of atoms in the known universe¹. Because the design space is so large, traditional approaches to RNA design are unlikely to discover sequences that fully maximize RNA stability and translation.
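To give a feel for how quickly the design space grows, here is a minimal Python sketch that counts the synonymous coding sequences for a protein by multiplying the number of codons available for each amino acid in the standard genetic code. The peptide and the function names are hypothetical and chosen only for illustration. The count covers the coding region alone, so UTR choices would enlarge it further.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_designs(protein: str) -> int:
    """Exact number of coding sequences that encode `protein`."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


def log10_synonymous_designs(protein: str) -> float:
    """log10 of the same count, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)


peptide = "MKTLLV"  # hypothetical six-residue peptide, not from any real antigen
print(count_synonymous_designs(peptide))
print(round(log10_synonymous_designs(peptide), 2))
```

Because the count is a product over residues, every additional amino acid multiplies the number of options, which is why a protein with more than a thousand residues reaches numbers with hundreds of digits.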

With the rapid growth in the capabilities of artificial intelligence (AI) and machine learning (ML), we are now able to overcome the limitations of traditional RNA design, allowing for the manufacture of highly effective RNA medicines. In this eBlog, we explore the different ways that AI is being applied to improve sequence design and develop potent medicines.

Deep learning for antigen selection and RNA function predictions

The first challenge in developing an RNA medicine is determining what protein should be encoded in the therapeutic. To help in the selection of the target protein, there has been rapid development of deep learning algorithms, a type of ML built on neural networks, which are models loosely inspired by the human brain. One example of their application is personalized neoantigen therapies, where a patient-specific peptide needs to be identified to trigger an immune response against a tumor². Deep learning models such as NeoaPred³ and DeepHLApan⁴ have been developed to predict epitopes that are likely to be immunogenic, providing drug developers with targets that can undergo further sequence optimization.

One common area for optimization is RNA structure, a key determinant of RNA stability and function. RNA folds into complex secondary and tertiary structures, and that folding depends on both the sequence of the RNA and its local environment. Traditional methods for predicting RNA secondary structure have used thermodynamic models⁵, which are highly accurate in buffer conditions but cannot predict how the structure will change in a lipid nanoparticle (LNP) or after protein binding in a cell. To overcome these limitations, several approaches based on deep learning have been developed. These algorithms, such as DeepFoldRNA⁶ and RhoFold⁷, are typically trained on known crystal structures of RNA and have outperformed traditional folding algorithms in the accurate prediction of RNA structures.
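For readers who want a sense of how classical secondary structure prediction works, the sketch below implements Nussinov-style base-pair maximization in Python. This is a deliberately simplified stand-in: real thermodynamic tools minimize free energy using measured energy parameters, whereas this toy only maximizes the number of allowed base pairs. It shares the same dynamic-programming idea of building the answer for a long stretch from the answers for shorter ones. The function name and the `min_loop` default are assumptions made for this example.

```python
# Pairs allowed in this toy model: Watson-Crick pairs plus the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `seq`.

    `min_loop` is the minimum number of unpaired bases a hairpin loop
    must contain between two paired positions.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some earlier position k.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # a small hairpin: three G-C pairs around an AAA loop
```

The limitation described above applies equally to this sketch: the result depends only on the sequence, with no way to account for the cellular environment, which is the gap that data-driven models aim to close.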

A final area where deep learning has seen great success is in predicting the level of protein expression from an RNA sequence, enabling drug developers to test several sequences in silico before moving to more time-consuming and expensive in vitro and in vivo testing. For any given therapeutic, the amount of protein produced depends on the codons that are selected in the coding sequence as well as the regulatory elements that are present in the 5’ and 3’ UTRs. A recent publication from Sanofi⁸ described an integrated model, combining several small language models, that accurately predicted the translation and half-life of different RNA sequences.

Generative models for RNA design

The deep learning models described above have proven extremely effective at predicting how a given RNA sequence will fold or be translated; however, they require that a preliminary sequence already exists. Instead of evaluating or iteratively optimizing existing sequences, generative AI can be used to develop novel sequences that have never been observed in nature. Several approaches to generative modeling have been developed, including:

Generative Adversarial Networks (GANs): These models employ two neural networks: a generator that creates RNA sequences and a discriminator that tries to identify whether an RNA sequence is artificial. This creates a feedback loop that enables the generator to iteratively improve at creating effective sequences. This approach was used by RNAGEN⁹ to design short RNAs that bind to proteins. Although promising for shorter RNAs, GANs can struggle with longer sequences such as full-length mRNAs.
Language-model based generators: In addition to predicting how a sequence will function, transformer-based large language models, which rely on self-attention mechanisms, can also be used to generate de novo sequence designs. One example is GenerRNA¹⁰, which can generate novel sequence designs that are consistent with the “grammar” of RNA to ensure that they remain structurally stable.
Variational Autoencoders (VAEs): VAEs can create new sequences that are variations on the data they are trained on. One example of a VAE in practice is RfamGen¹¹, which can design functional RNAs based on sequence and secondary structure information.

Importantly, generative models are rarely used alone. They are often coupled with predictive models that evaluate the generated sequences in silico before direct experimental validation. For example, a generative model will design several sequences that are then scored by a separate predictive model for stability or potency. Only the top-scoring RNAs move forward to testing in cells or other model systems, decreasing the time and cost needed for early-stage therapeutic development. This approach can be used to select lead candidates or be part of a larger reinforcement learning process where the generative models are given feedback to improve their effectiveness.
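The generate-then-score workflow can be sketched in a few lines of Python. Both components here are placeholders and not real models: `propose_sequences` stands in for a generative model by sampling random synonymous codons, and `gc_fraction` stands in for a trained predictive model by using GC content as an arbitrary score. All function names, the peptide, and the default parameters are hypothetical.

```python
import random
from itertools import product

# Standard genetic code, written with RNA bases in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(cds: str) -> str:
    """Translate a coding sequence (length divisible by 3) into protein."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def propose_sequences(protein: str, n: int, rng: random.Random) -> list:
    """Placeholder generator: sample a random synonymous codon per residue."""
    return [
        "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein) for _ in range(n)
    ]


def gc_fraction(cds: str) -> float:
    """Placeholder scorer: a real pipeline would call a trained predictor."""
    return sum(base in "GC" for base in cds) / len(cds)


def design(protein: str, n_candidates: int = 200, top_k: int = 5, seed: int = 0) -> list:
    """Generate candidates, score each one, and keep the top-scoring designs."""
    rng = random.Random(seed)
    candidates = set(propose_sequences(protein, n_candidates, rng))
    # Sort by score (highest first), breaking ties alphabetically.
    ranked = sorted(candidates, key=lambda s: (-gc_fraction(s), s))
    return ranked[:top_k]


leads = design("MKTLLV")  # hypothetical peptide
for cds in leads:
    print(cds, round(gc_fraction(cds), 2))
```

Swapping in a learned generator or a learned scorer changes only the two placeholder functions; the surrounding loop of propose, score, and select stays the same.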

Data is the major challenge for the successful use of AI

Although AI is showing great promise for the design of RNAs that are stable and produce high levels of protein, its effectiveness depends on the data used to train the models. A major challenge in the field is that obtaining reproducible, high-quality, and multidimensional¹² data is difficult, if not impossible, from public repositories¹³. Although limited in availability, cell-type specific data on key aspects of RNA biology can be a critical determinant of a model's success in designing an effective therapy. At Eclipsebio, we help our partners solve this challenge through our eVERSE portfolio of datasets. eVERSE is the only source of truly multidimensional data, including reproducible assessments of RNA secondary structure, miRNA binding, and more.

Conclusion

The combination of AI and RNA biology is ushering in a new era in the design of safe and effective RNA medicines. Deep learning models are being actively used to predict how RNA sequences will fold and function with impressive accuracy, while generative models are proposing novel sequences tailored for optimal protein production. By automating complex design tasks and learning from multidimensional sequencing datasets, AI is accelerating the development of RNA vaccines and therapies from concept to clinic.

Contact us today to learn more about our capabilities to support your AI model training and how we empirically validate your RNAs with comprehensive, sequencing-based characterization.

References
