Traditional genome analysis methods often overlook short sequences, but recent research shows that small genomic elements can have significant implications for health and disease. Their peptide products are difficult to validate biochemically because of their small size, and conservation-based annotation is less effective for short sequences. This blog post explores small functional open reading frames and how ribosome profiling can identify them.
Open reading frames (ORFs) are regions of the transcriptome that have a start codon followed by an in-frame stop codon. In other words, if a ribosome were to start translating at the start codon and translocate three bases at a time from codon to codon, it would eventually reach the stop codon, allowing peptide synthesis to terminate. Small ORFs (smORFs, fewer than 100 codons long) have received particular interest recently for their roles in gene regulation and the generation of small peptides1,2,3. Within the larger category of smORFs, four main classes are known to have functional roles: upstream ORFs (uORFs, located in the 5’ UTR), downstream ORFs (dORFs, located in the 3’ UTR), internal ORFs (intORFs, located within the coding sequence of a gene), and ORFs on long noncoding RNAs (lncORFs)1,2,3.
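The ORF definition above maps directly onto a simple scan: find each ATG, then step through the sequence three bases at a time, as a translocating ribosome would, until an in-frame stop codon appears. The Python sketch below is illustrative only. It assumes ATG is the only start codon, scans a single strand of a transcript, and counts length in sense codons (stop codon excluded); the names `find_orfs` and `Orf` are hypothetical.

```python
"""Illustrative sketch: enumerate ORFs on one strand and flag smORFs.

Assumptions (hypothetical, for illustration): ATG is the only start codon,
U is treated as T, and an ORF runs from an ATG to the first in-frame stop.
"""
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
SMORF_MAX_CODONS = 100  # smORF cutoff quoted in the text


@dataclass(frozen=True)
class Orf:
    start: int  # 0-based index of the A in ATG
    end: int    # 0-based, exclusive; includes the stop codon
    frame: int  # reading frame on the transcript (start % 3)

    @property
    def codons(self) -> int:
        # Sense codons only: total codons minus the stop codon.
        return (self.end - self.start) // 3 - 1

    @property
    def is_smorf(self) -> bool:
        return self.codons < SMORF_MAX_CODONS


def find_orfs(seq: str) -> list[Orf]:
    seq = seq.upper().replace("U", "T")
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon from the start codon until an in-frame stop.
        for j in range(i, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                orfs.append(Orf(start=i, end=j + 3, frame=i % 3))
                break
        # ATGs with no downstream in-frame stop are not reported.
    return orfs
```

Because every ATG is scanned, nested start codons that share a stop (as in `ATGATGTAA`) are reported as separate ORFs; a real annotation pipeline would typically collapse or rank such candidates.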
uORFs in the 5’ UTR are the most abundant class of smORFs; they are found on almost 50% of annotated coding genes in humans1,2. Their main functional role is to regulate translation of the downstream annotated coding sequence (CDS), likely by preventing scanning ribosomes from reaching the CDS. Several mechanisms have been proposed for this regulation, including ribosome dissociation, ribosome stalling, and triggering of nonsense-mediated decay of the transcript3. These mechanisms cast the uORF itself, rather than its peptide product, as the regulator. In line with this function, a uORF’s position is generally more conserved than its actual sequence, and RNA structures surrounding the uORF often influence its efficacy. However, recent work has revealed that the resulting peptide product can also have functional roles, such as allosteric protein modulation and immune system signaling.
dORFs in the 3’ UTR are less common than uORFs, likely because ribosomes dissociate from the transcript after reaching the CDS stop codon, which hinders their ability to reinitiate at the dORF2. Unlike uORFs, dORFs appear to enhance the translation efficiency of the associated CDS. The exact mechanism remains unknown but may involve transcript looping and enhanced recruitment of translation initiation factors2.
intORFs, found within the CDS, are an intriguing class of ORFs because they overlap known coding sequences. Eukaryotic translation is primarily monocistronic, producing a single protein per transcript. However, a CDS can harbor intORFs that are either out of frame with the annotated protein or in frame with it, in which case they yield a truncated version of that protein. Although these ORFs are much smaller than an average coding sequence (<100 codons versus >400 codons for a typical CDS), they appear to be translated as efficiently as canonical coding sequences1,2.
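Because the four smORF classes are defined by where an ORF sits relative to the annotated CDS, they can be assigned from transcript coordinates alone. The sketch below is a simplified illustration with a hypothetical function name, `classify_orf`. It assumes 0-based, half-open coordinates on the transcript, treats any transcript without an annotated CDS as an lncRNA, and labels ORFs that straddle a UTR/CDS boundary as "other" rather than forcing them into one of the four classes.

```python
"""Illustrative sketch: assign an ORF to a smORF class by its position.

Assumptions (hypothetical): 0-based, half-open [start, end) transcript
coordinates; cds is None when the transcript has no annotated CDS, which
this sketch treats as an lncRNA.
"""
from typing import Optional, Tuple


def classify_orf(orf_start: int, orf_end: int,
                 cds: Optional[Tuple[int, int]]) -> str:
    if cds is None:
        return "lncORF"  # no annotated CDS on this transcript
    cds_start, cds_end = cds
    if orf_end <= cds_start:
        return "uORF"  # entirely within the 5' UTR
    if orf_start >= cds_end:
        return "dORF"  # entirely within the 3' UTR
    if cds_start <= orf_start and orf_end <= cds_end:
        # Inside the CDS: compare the reading frame with the annotated protein.
        in_frame = (orf_start - cds_start) % 3 == 0
        return "intORF (in frame)" if in_frame else "intORF (out of frame)"
    return "other"  # crosses a UTR/CDS boundary
```

For example, with a CDS at positions 50 to 350, an ORF at 102 to 200 starts one base off the CDS frame and is labeled an out-of-frame intORF, while one starting at 110 is in frame.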
lncORFs are found on long noncoding RNAs (lncRNAs), transcripts that have traditionally been assumed not to encode proteins. lncRNAs carry a median of six ORFs each, and these ORFs often overlap, which makes them difficult to characterize. Their translation efficiency is generally low, so it is unclear whether they generate functional peptide products or act as sponges for the translation machinery that regulate coding transcripts. Several micropeptides originating from lncRNAs have been associated with cancer, angiogenesis, and methylated RNA recognition, suggesting that at least some lncORFs generate functional peptide products1,2,3.
The mere presence of an in-frame start and stop codon in a region of a transcript does not mean that the sequence associates with ribosomes. By chance alone, short sequences are very likely to contain an in-frame start and stop codon1. Genomic analyses have identified millions of potential smORFs, whose translation must be validated with a technology such as ribosome profiling1.
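A quick calculation shows why chance ORFs are so common. In a random sequence where all 64 codons are equally likely, 3 of them are stop codons, so each codon after an ATG ends the ORF with probability 3/64. ORF length therefore follows a geometric distribution with a mean of about 21 codons, and fewer than 1% of chance ORFs reach 100 codons. The Python sketch below computes these values and checks them by simulation; the uniform, independent-codon null model is an assumption for illustration, not a description of real transcripts, and the function names are hypothetical.

```python
"""Back-of-the-envelope sketch: how long are ORFs that arise by chance?

Assumption (null model, for illustration only): codons are independent and
all 64 are equally likely. Real transcripts are not random sequences.
"""
import random

P_STOP = 3 / 64  # TAA, TAG, TGA out of 64 codons
STOPS = {"TAA", "TAG", "TGA"}


def p_orf_at_least(n_codons: int) -> float:
    """P(a chance ORF has >= n_codons sense codons, ATG included)."""
    # The ATG is codon 1; the next n_codons - 1 codons must all be non-stop.
    return (1 - P_STOP) ** (n_codons - 1)


def expected_orf_codons() -> float:
    """Mean sense-codon length: the ATG plus a geometric run of non-stops."""
    return 1 + (1 - P_STOP) / P_STOP


def simulate_orf_lengths(length: int, seed: int = 0) -> list[int]:
    """Sense-codon lengths of all ATG-initiated ORFs in a random sequence."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=length))
    lengths = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                lengths.append((j - i) // 3)  # codons from ATG up to the stop
                break
    return lengths


if __name__ == "__main__":
    print(f"expected ORF length: {expected_orf_codons():.1f} codons")
    print(f"P(ORF >= 100 codons): {p_orf_at_least(100):.4f}")
    lengths = simulate_orf_lengths(500_000)
    short = sum(n < 100 for n in lengths) / len(lengths)
    print(f"{len(lengths)} chance ORFs; {short:.1%} are shorter than 100 codons")
```

Under this null model, a random ATG opens an ORF roughly every 64 nucleotides, and almost all of those ORFs fall below the 100-codon smORF cutoff, which is why sequence alone cannot distinguish translated smORFs from chance ones.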
Ribosome profiling, offered for example as Eclipsebio’s eRibo Pro, is a powerful tool for determining which transcripts are being translated and where ribosomes are bound3. In ribosome profiling, RNA is digested with a nuclease so that only the fragments protected by bound ribosomes, known as ribosome footprints, remain. These footprints are sequenced and mapped to the genome or transcriptome to determine where ribosomes are acting. Importantly, because ribosomes translocate one codon (3 nucleotides) at a time, reads from a region undergoing active translation align with 3-nucleotide periodicity. Various computational tools screen ORFs for this periodic signal as evidence of active translation4,5. This approach has uncovered fundamental insights into smORF biology. For instance, it revealed that translation of one of GCN4’s uORFs in yeast suppresses GCN4 translation; under starvation conditions, however, the uORFs are bypassed, enabling GCN4 translation3.
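The periodicity check can be illustrated with a toy scoring function. This Python sketch is not the method used by the tools cited above4,5. It assumes each read is given by the 0-based transcript position of its 5’ end, applies a single hypothetical P-site offset of 12 nt to every read (real pipelines calibrate offsets per read length), and reports two simple statistics: the fraction of P sites in each reading frame, and the share of spectral power at a 3-nt period from a plain discrete Fourier transform. All function names are hypothetical.

```python
"""Toy sketch: score 3-nt periodicity of ribosome footprints over one ORF.

Assumptions (hypothetical): reads are 0-based transcript positions of their
5' ends, and every P site sits a fixed P_SITE_OFFSET downstream of the 5' end.
"""
import cmath
from collections import Counter

P_SITE_OFFSET = 12  # hypothetical single offset, for illustration only


def p_site_profile(read_5p_ends: list[int], orf_start: int, orf_end: int) -> list[int]:
    """Per-nucleotide P-site counts across [orf_start, orf_end)."""
    counts = Counter(pos + P_SITE_OFFSET for pos in read_5p_ends)
    return [counts.get(pos, 0) for pos in range(orf_start, orf_end)]


def frame_fractions(profile: list[int]) -> list[float]:
    """Fraction of P sites in frames 0, 1, 2 relative to the ORF start."""
    totals = [sum(profile[f::3]) for f in range(3)]
    n = sum(totals)
    return [t / n if n else 0.0 for t in totals]


def period3_power_fraction(profile: list[int]) -> float:
    """Share of non-constant DFT power at the frequency bin nearest 1/3 per nt."""
    n = len(profile)
    if n < 6:
        return 0.0
    power = {}
    for k in range(1, n // 2 + 1):  # skip k = 0, the constant (mean) term
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(profile))
        power[k] = abs(coeff) ** 2
    total = sum(power.values())
    if total == 0:
        return 0.0
    return power.get(round(n / 3), 0.0) / total


if __name__ == "__main__":
    # Toy data: 30 reads whose P sites fall on the first base of each codon
    # of a 90-nt ORF spanning transcript positions 100-189.
    reads = list(range(88, 176, 3))
    profile = p_site_profile(reads, orf_start=100, orf_end=190)
    print(frame_fractions(profile))  # [1.0, 0.0, 0.0]
    print(round(period3_power_fraction(profile), 3))  # 1.0
```

A perfectly phased profile like the toy example scores 1.0 on both measures, whereas a profile with a 2-nt repeat has essentially no power at the 3-nt period. Any threshold for calling an ORF translated would need to be calibrated on real data.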
Small open reading frames (smORFs) represent an often-underestimated facet of gene regulation, yet they can substantially influence health and disease, and they have recently emerged as promising targets for therapeutic intervention. Eclipsebio’s eRibo Pro service streamlines the complex process of ribosome profiling and ORF detection analysis, freeing you to concentrate on the biology of your system or the development of innovative therapeutics without the burden of establishing lab infrastructure and training your team. Additionally, consider pairing this service with eSHAPE, our service for assessing RNA secondary structure through chemical probing, to gain deeper insight into the interplay between RNA structure and smORF function. Contact us today to start your project and discover new insights into ORF biology.
1. Couso and Patraquim (2017) Nat Rev Mol Cell Biol
2. Wright, Yi, Weissman, and Chen (2021) Trends Cell Biol
3. Kute et al. (2021) Front Genet
4. Calviello et al. (2016) Nat Methods
5. Choudhary, Li, and Smith (2020) Bioinformatics