Key highlights
- Artificial intelligence (AI) is transforming RNA research and drug discovery, including the design of RNA-based medicines and selection of siRNA targets
- AI success depends on three pillars: compute, algorithms, and data
- Multidimensional data remains a major limiting factor in the use of AI to develop therapeutics
- eVERSE and other repositories that contain RNA-specific data are becoming critical for the success of AI in drug discovery
Introduction
AI is already transforming the way we approach biology, from structure prediction and gene expression modeling to the design of entirely new therapeutics. Across all domains, progress in AI depends on three foundational pillars: compute, algorithms, and data.
In many industries, compute and algorithms have become increasingly accessible. But in RNA research and drug discovery, data remains the limiting factor. Until that gap is closed, AI will fall short of its full potential in applications like small-molecule target discovery, siRNA design, and RNA-based drug development.
In this eBlog, we review the three pillars of AI and how each is contributing to the application of machine learning models for RNA drug discovery.
Pillar 1: scalable and accessible compute
The barrier to accessing high-performance compute has lowered significantly in recent years. Cloud-based platforms offer scalable infrastructure for training deep learning models, and specialized GPUs (like NVIDIA A100s) can be rented by the hour. In academic and commercial biotech settings, this has opened the door to large-scale modeling projects that were once out of reach.
In RNA biology, this capability allows for computationally intensive tasks like predicting RNA folding, simulating degradation kinetics, and modeling translation efficiency across thousands of constructs. The compute infrastructure exists; the challenge is feeding the models that run on it with accurate and consistent input.
Pillar 2: powerful and evolving algorithms
AI models used in biology today span a wide range of techniques, from convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to Bayesian models, transformers, and graph neural networks. Many of these are well-documented and open-source, with active development communities and pretrained baselines available.
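To make concrete how sequence-based models such as CNNs or RNNs consume RNA, here is a minimal sketch of one-hot encoding, a common way to turn a sequence into numeric input. The function name `one_hot` and the `ACGU` column order are illustrative assumptions, not part of any specific library or of the pipelines discussed in this eBlog.

```python
# Illustrative sketch only: names and column order are assumptions.
ALPHABET = "ACGU"  # assumed column order for the encoding


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element 0/1 rows (A, C, G, U)."""
    # Accept DNA-style input by mapping T to U.
    seq = seq.upper().replace("T", "U")
    rows = []
    for base in seq:
        if base not in ALPHABET:
            raise ValueError("unsupported base: " + repr(base))
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


if __name__ == "__main__":
    for row in one_hot("AUG"):
        print(row)
```

Each row marks exactly one base, so a sequence of length N becomes an N x 4 matrix that a neural network can take as input.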
RNA-focused applications of these algorithms are already being explored, such as:
- Predicting siRNA efficacy and minimizing off-target effects
- Designing antisense oligonucleotides (ASOs) with reduced toxicity
- Modeling mRNA stability and translation rates
The underlying mathematics and model architectures are not the bottleneck. But across use cases, one pattern emerges: performance is ultimately constrained by the quality and depth of training data.
Pillar 3: limited data
Of the three pillars, data is by far the most challenging in RNA AI work.
Most publicly available RNA datasets were not generated with machine learning in mind. They may have inconsistent protocols, limited replication, low sequencing depth, or missing metadata. Integrating them across studies or timepoints often introduces batch effects that are hard to disentangle from biology.
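The issues listed above can be screened for before a dataset enters a training set. The following is a minimal sketch, assuming each dataset is described by a simple record. The thresholds, field names, and required metadata keys are hypothetical placeholders, not recommended values or an existing schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical thresholds and metadata keys; adjust per assay and model.
MIN_REPLICATES = 2
MIN_READS_PER_SAMPLE = 20_000_000
REQUIRED_METADATA = ("cell_type", "assay", "protocol_version")


@dataclass
class Dataset:
    name: str
    replicates: int
    reads_per_sample: int
    metadata: Dict[str, str] = field(default_factory=dict)


def readiness_issues(ds: Dataset) -> List[str]:
    """Return the reasons a dataset may be unsuitable for model training."""
    issues = []
    if ds.replicates < MIN_REPLICATES:
        issues.append("limited replication")
    if ds.reads_per_sample < MIN_READS_PER_SAMPLE:
        issues.append("low sequencing depth")
    # A key counts as missing if it is absent or has an empty value.
    missing = [key for key in REQUIRED_METADATA if not ds.metadata.get(key)]
    if missing:
        issues.append("missing metadata: " + ", ".join(missing))
    return issues


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = Dataset("study_A", 1, 8_000_000, {"cell_type": "example_line"})
    for issue in readiness_issues(example):
        print(issue)
```

A check like this flags only per-dataset problems. Batch effects across studies need to be assessed separately, for example by comparing shared control samples.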
Protein-focused AI has made dramatic leaps thanks to large, curated datasets such as the Protein Data Bank (PDB) structures used to train AlphaFold. RNA biology, however, lacks a comparable resource. For models to accurately predict how RNA behaves, or how it performs in various cellular contexts, they need input data that is reproducible, deeply sequenced, and tied to functional outcomes.
Creating this kind of data isn’t easy. It requires standardized protocols, thoughtful experimental design, and awareness of the downstream applications that AI will serve. But it's essential.
A path forward: purpose-built RNA datasets
Some efforts are starting to address this need. For example, our team has developed a growing data resource called eVERSE, an RNA genomic data platform built to support machine learning applications. It includes deeply sequenced, reproducible datasets from multiple RNA assays, with standardized metadata and quality metrics, providing information on structural accessibility, RNA-binding protein (RBP) regulation, and ribosome binding. While it's just one approach, it reflects the kind of effort the field needs more of: RNA data that is designed not just for analysis, but for learning.
Ultimately, AI in RNA biology will only be as effective as the data it's trained on. Building robust, generalizable models requires more than compute or clever algorithms. It demands datasets that capture the full biological complexity of RNA, from structure to function to cellular response.
As the field matures, the need for high-quality RNA data will only grow. Whether from public efforts, academic consortia, or biotech-driven initiatives, solving the data bottleneck is key to unlocking what AI can do for RNA therapeutics.