K562 miR-eCLIP Summary Report

Thank you for trusting Eclipsebio with your analysis needs. This interactive report provides an overview of your results, organized across several tabs. For each table and plot, click the information button for more details. The menus in the top right corner of each plot can be used to modify the plot (zoom, hide samples, and save a copy). This report is optimized for viewing on a desktop. The plots have been tested on several browsers, but in some cases they may not render correctly; if plots do not load, please use a different browser.

If you have any questions about this experiment or the analyses please reach out to us at services@eclipsebio.com. We are happy to help!

Experiment Summary

This report summarizes a miR-eCLIP experiment performed on 7 samples: Sample1_IP, Sample2_IP, Sample3_IP, Sample4_IP, Sample5_IP, Sample6_IP, and Sample7_IP. For miR-eCLIP experiments, the standard eCLIP protocol [PMID 27018577] was modified to enable chimeric ligation of miRNA and mRNA, following the bioRxiv preprint by Manakov et al., 2022 [10.1101/2022.02.13.480296]. While plated, approximately 20 × 10⁶ cells were UV crosslinked at 400 mJ/cm² with 254 nm radiation, pelleted, snap frozen on dry ice, and stored at -80°C until use. Cell pellets were then lysed in 1 mL of eCLIP lysis mix and sonicated (QSonica Q800R2) for 5 minutes (30 seconds on / 30 seconds off) at 75% amplitude, followed by digestion with RNase I (Ambion). A primary mouse monoclonal AGO2/EIF2C2 antibody (sc-53521, Santa Cruz Biotechnology) was incubated for 1 hour with magnetic beads pre-coupled to the secondary antibody (M-280 Sheep Anti-Mouse IgG Dynabeads, Thermo Fisher 11202D) and added to the homogenized lysate for overnight immunoprecipitation (IP) at 4°C. Following the overnight IP, 2% of each sample was taken as the paired size-matched input, and the remainder was magnetically separated and washed with eCLIP high-stringency wash buffers. Chimeric ligation was then performed on-bead at room temperature for 1 hour with T4 RNA ligase (NEB). IP samples were then dephosphorylated with alkaline phosphatase (FastAP, Thermo Fisher) and T4 PNK (NEB), and an RNA adapter was ligated to the 3′ ends. IP and input samples were cut from the membrane from the AGO2 protein band to 75 kDa above it. The western blot was visualized using an anti-AGO2 primary antibody (50683-RP02, SinoBiological) at a 1:2000 dilution, with a TrueBlot anti-rabbit secondary antibody (18-8816-31, Rockland) at a 1:6000 dilution. RNA adapter ligation, IP-western, reverse transcription, DNA adapter ligation, and PCR amplification were performed as previously described. Sequencing was performed as single-end 122 nt reads (SE122) on the NextSeq 2000 platform.

After sequencing, samples were processed with Eclipsebio's proprietary analysis pipeline (v1). UMIs were extracted from read sequences using umi_tools (v1.1.1). Next, 3' adapters were trimmed from reads using cutadapt (v3.2). Reads were then mapped to a custom database of repetitive elements and rRNA sequences. Reads that did not map to this database were mapped to the genome (UCSC GRCh38/hg38) using STAR (v2.7.7a). PCR duplicates were removed using umi_tools (v1.1.1). AGO2 eCLIP peaks were identified in the IP samples using the peak caller CLIPper (v2.0.1), and IP-versus-input fold enrichments and p-values were calculated for each peak. To find chimeric reads, miRNAs from miRBase (v22.1) were "reverse mapped" with bowtie (v1.2.3) to any reads that mapped neither to repetitive elements nor to the genome. The miRNA portion of each chimeric read was then trimmed, and the remainder of the read was mapped to the genome using STAR (v2.7.7a). PCR duplicates were removed using umi_tools (v1.1.1), and miRNA target clusters were identified using CLIPper (v2.0.1). Each cluster was annotated with the names of the miRNAs whose chimeric reads support that target. Peaks were annotated using transcript information from GENCODE release 41 (GRCh38.p13), with the following priority hierarchy used to assign the final annotation when features overlap: protein-coding transcript (CDS, UTRs, intron), followed by non-coding transcript (exon, intron).
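The chimera-splitting step described above can be sketched in Python as follows. This is a minimal illustration, not Eclipsebio's pipeline: exact prefix matching stands in for the bowtie alignment, the miRNA is assumed to sit at the 5' end of the read, and the 18-nt minimum for the target remainder is an assumption borrowed from the trimming length cutoff. The function name `split_chimera` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumption: reuse the 18-nt floor from adapter trimming for the target part.
MIN_TARGET_LEN = 18


def split_chimera(read: str, mirnas: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (miRNA name, target remainder), or None if the read is not a
    usable chimera. Exact prefix matching is a stand-in for bowtie alignment,
    and the miRNA is assumed to occupy the 5' end of the read."""
    best_name, best_len = None, 0
    for name, mature in mirnas.items():
        dna = mature.upper().replace("U", "T")  # miRBase lists RNA; reads are DNA
        # Keep the longest miRNA that matches the start of the read.
        if read.startswith(dna) and len(dna) > best_len:
            best_name, best_len = name, len(dna)
    if best_name is None:
        return None
    target = read[best_len:]
    if len(target) < MIN_TARGET_LEN:
        return None  # too short to map uniquely to the genome
    return best_name, target
```

For example, a read made of the let-7a-5p sequence followed by 22 nt of target sequence returns the miRNA name together with that 22-nt remainder, which would then go to the genome alignment step.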

miR-eCLIP Workflow

Quality Control Metrics

The following table shows important quality control metrics for each sample. Two libraries are made for each sample: one containing immunoprecipitated (IP) RNA and the other containing input RNA. Each sample ("Sample") therefore has two rows, one per library, and the "IP or Input" column identifies the library type.

The following columns describe the processing of the libraries:

Initial reads: the number of sequenced reads for each library.
% Pass trim: reads first underwent adapter trimming, which removes adapter sequences and discards reads shorter than 18 nucleotides. The percentage of sequenced reads that passed this trimming step is listed in this column.
% Repetitive elements: trimmed reads are aligned to Eclipsebio's custom database of repetitive elements, which includes rRNAs, tRNAs, snoRNAs, and other features. The percentage of trimmed reads that mapped to this database and were filtered out is listed in this column.
% Uniquely aligned to the genome: reads remaining after repetitive-element filtering were aligned to the genome, and the percentage of those reads that map uniquely to the genome is listed in this column.
% PCR duplicates: PCR duplicates are uniquely mapped reads that map to the same genomic coordinates and carry an identical unique molecular identifier (UMI). The percentage of mapped reads identified as PCR duplicates is listed in this column.
Final nonchimeric reads: this column lists the number of uniquely mapped, deduplicated, nonchimeric reads that were used for this analysis.
AGO2 clusters: AGO2 clusters are found using the peak calling tool CLIPper. The clusters are identified from nonchimeric reads in the IP samples and are not input normalized.
AGO2 peaks: A peak is defined as a cluster with a log₂(fold enrichment over the matched input) > 3 and p-value < 0.001.
Final chimeric reads: The number of chimeric reads that mapped to the genome after trimming the miRNA component of the chimeric read.
% Chimeras: the percentage of genome-mapped reads that are chimeric, i.e., chimeric reads divided by all reads that mapped to the genome (chimeric + nonchimeric).
Chimeric clusters: chimeric clusters are found using the peak calling tool CLIPper. The clusters are identified from the genome-mapped portion of chimeric reads in the IP samples and are not input normalized.
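Several of the column definitions above reduce to simple arithmetic. The sketch below, with hypothetical function names, shows UMI-aware duplicate removal, the % Chimeras ratio, and the AGO2 peak thresholds. It assumes each alignment has already been reduced to its chromosome, 5' position, strand, and UMI; real tools such as umi_tools also handle UMI sequencing errors, which this sketch ignores.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    chrom: str
    start: int   # assumed: the read's 5' position on its strand
    strand: str
    umi: str


def dedup(alns: Iterable[Alignment]) -> List[Alignment]:
    """Keep one read per (coordinates, UMI); the others are PCR duplicates."""
    seen = set()
    kept = []
    for a in alns:
        key = (a.chrom, a.start, a.strand, a.umi)
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept


def pct_duplicates(n_mapped: int, n_dedup: int) -> float:
    """% PCR duplicates among uniquely mapped reads."""
    return 100.0 * (n_mapped - n_dedup) / n_mapped


def pct_chimeras(n_chimeric: int, n_nonchimeric: int) -> float:
    """% Chimeras = chimeric / (chimeric + nonchimeric) genome-mapped reads."""
    return 100.0 * n_chimeric / (n_chimeric + n_nonchimeric)


def is_peak(log2_fold_enrichment: float, p_value: float) -> bool:
    """AGO2 peak: log2(fold enrichment over input) > 3 and p-value < 0.001."""
    return log2_fold_enrichment > 3 and p_value < 0.001
```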

Sample	IP or Input	Initial reads	Final chimeric reads	% Chimeras	Chimeric clusters
Sample1_IP	IP	320,917,637	993,564	0.31%	19,947
Sample2_IP	IP	235,229,989	876,608	0.37%	18,340
Sample3_IP	IP	306,779,036	1,040,592	0.34%	20,969
Sample4_IP	IP	284,173,064	1,192,868	0.42%	23,170
Sample5_IP	IP	272,072,956	1,241,972	0.46%	24,145
Sample6_IP	IP	242,028,592	1,113,529	0.46%	21,546
Sample7_IP	IP	234,744,724	886,809	0.38%	18,018
Sample1_Input	Input	55,156,032
Sample2_Input	Input	41,475,423
Sample3_Input	Input	53,869,196
Sample4_Input	Input	52,392,618
Sample5_Input	Input	52,393,435
Sample6_Input	Input	47,084,819
Sample7_Input	Input	43,247,006

This tab summarizes the identified miRNA target peaks from the chimeric reads.

miRNA Target Counts

miRNA target peaks are found using the peak calling tool CLIPper. Reproducible peaks are those in which every replicate contains at least three chimeric reads that overlap the peak and carry miRNAs of the same seed family. The bar plot below shows the number of peaks detected in each sample, as well as the number of reproducible peaks for each set of replicates. Hover over each bar to see the number of peaks in each set. All plots in this report are interactive, meaning you can zoom in/out, pan to a certain location, and select elements from the legend to remove them. Plots can also be downloaded as separate files. Hover over the top right corner of a plot to see the interactive options.
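The reproducibility rule can be expressed as a per-peak check. In this sketch the seed family of a miRNA is assumed to be defined by nucleotides 2-8 of the mature sequence (TargetScan's convention); the report does not state the exact family definition used. The function names and the input layout (a list of miRNA names per replicate for one peak) are hypothetical.

```python
from collections import Counter
from typing import Dict, List

MIN_READS = 3  # at least three chimeric reads per replicate (from the report)


def seed_family(mature_seq: str) -> str:
    """Assumed family key: nucleotides 2-8 of the mature miRNA."""
    return mature_seq[1:8]


def is_reproducible(reads_by_replicate: Dict[str, List[str]],
                    mature: Dict[str, str]) -> bool:
    """reads_by_replicate maps replicate -> miRNA names of the chimeric reads
    overlapping one peak. The peak is reproducible if a single seed family
    reaches MIN_READS reads in every replicate."""
    shared = None
    for names in reads_by_replicate.values():
        counts = Counter(seed_family(mature[n]) for n in names)
        passing = {fam for fam, c in counts.items() if c >= MIN_READS}
        shared = passing if shared is None else shared & passing
    return bool(shared)  # False if no replicate data or no shared family
```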

miRNA Target Gene Counts

The bar plot below shows the number of genes containing at least one significantly enriched miRNA target peak in each sample, as well as the number of genes containing at least one reproducible miRNA target peak for each set of replicates. Hover over the bars to see the number of genes in each peak set.

miRNA Target Seed Match Distribution

The bar plot below shows the seed match distribution of miRNA target peaks and reproducible miRNA target peaks for each set of sample conditions. Each miRNA target peak was annotated with the top miRNA contributing to it, and seed match information was then added to each peak according to TargetScan's seed match definitions, allowing for up to one mismatch in the seed region.

The following seed matches are classified as "Canonical":

7mer-m8: An exact match to positions 2-8 of the mature miRNA (the seed + position 8)
7mer-1A: An exact match to positions 2-7 of the mature miRNA (the seed) followed by an 'A'
8mer: An exact match to positions 2-8 of the mature miRNA (the seed + position 8) followed by an 'A'
6mer: An exact match to positions 2-7 of the mature miRNA (the seed)

The following seed matches are classified as "Noncanonical":

6mer offset: An exact match to positions 3-8 of the mature miRNA
3' compensatory: a site in which strong 3' pairing (consequential miRNA-target complementarity outside the seed region) compensates for an imperfect seed match (Friedman et al., 2009).
Centered: a site that lacks perfect seed pairing and 3'-compensatory pairing but instead has 11-12 contiguous Watson-Crick pairs to miRNA positions 4-15. In TargetScan, these sites are identified only in the reference species and therefore carry no information about conservation.
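For the exact-match canonical sites and the offset 6mer, classification amounts to searching the target for the reverse complement of the relevant miRNA positions. The hypothetical `site_type` function below reports the most stringent site present. Unlike the report's analysis, it does not allow a seed mismatch and does not detect 3' compensatory or centered sites.

```python
from typing import Optional

_COMP = {"A": "U", "C": "G", "G": "C", "U": "A"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(_COMP[b] for b in reversed(rna))


def site_type(mirna: str, target: str) -> Optional[str]:
    """Most stringent exact site for `mirna` within `target` (both 5'->3')."""
    m = mirna.upper().replace("T", "U")
    t = target.upper().replace("T", "U")
    s2_8 = revcomp(m[1:8])  # pairs with miRNA positions 2-8
    s2_7 = revcomp(m[1:7])  # pairs with positions 2-7 (the seed)
    s3_8 = revcomp(m[2:8])  # pairs with positions 3-8
    # Ordered from most to least stringent; the 'A' sits opposite position 1.
    checks = [
        ("8mer", s2_8 + "A"),
        ("7mer-m8", s2_8),
        ("7mer-1A", s2_7 + "A"),
        ("6mer", s2_7),
        ("6mer offset", s3_8),
    ]
    for name, motif in checks:
        if motif in t:
            return name
    return None
```

For let-7a-5p (UGAGGUAGUAGGUUGUAUAGUU), the 8mer motif this builds is CUACCUCA, the familiar let-7 site.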

Hover over each bar to see the number of peaks and % of peaks containing each type of seed match for each peak set.

miRNA Abundance in Chimeric Reads

The following plot shows the percentage of chimeric reads containing each miRNA, with one point per miRNA for each sample. Hover over a point to see which miRNA it corresponds to and the percentage of chimeric reads that contain it.

Top miRNA Strand Enrichment

The following plot shows miRNA 5p/3p ratios for the 5 most abundant miRNAs, where abundance is calculated as the sum of the 5p and 3p arm reads. The color scale shows the magnitude of the 5p/3p ratio, which is reported on a log scale: orange indicates a positive ratio (more 5p reads than 3p reads), and blue indicates a negative ratio (more 3p reads than 5p reads). The shape also shows strand enrichment: 5p-enriched miRNAs are drawn as upright triangles and 3p-enriched miRNAs as upside-down triangles. miRNAs labeled "No strand specificity" had a 5p/3p ratio between -1 and 1. Hover over the triangles to see the percentage of chimeric reads containing that miRNA, as well as its 5p/3p ratio.
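The abundance ranking and strand call can be sketched as follows. The report does not state the log base or how arms with zero reads are handled, so the log₂ transform and the pseudocount of 1 below are assumptions; the -1 to 1 band for "No strand specificity" comes from the text. The function name `strand_summary` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

PSEUDOCOUNT = 1.0  # assumed; avoids division by zero when an arm has no reads


def strand_summary(counts: Dict[str, Tuple[int, int]], top_n: int = 5
                   ) -> List[Tuple[str, int, float, str]]:
    """counts maps miRNA name -> (5p reads, 3p reads). Returns
    (name, abundance, log2 5p/3p ratio, label) for the top_n most abundant."""
    # Abundance is the sum of the 5p and 3p arms, as in the report.
    ranked = sorted(counts.items(), key=lambda kv: sum(kv[1]), reverse=True)
    out = []
    for name, (n5p, n3p) in ranked[:top_n]:
        ratio = math.log2((n5p + PSEUDOCOUNT) / (n3p + PSEUDOCOUNT))
        if ratio > 1:
            label = "5p enriched"
        elif ratio < -1:
            label = "3p enriched"
        else:
            label = "No strand specificity"
        out.append((name, n5p + n3p, ratio, label))
    return out
```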

miRNA Target Feature Distribution

The following bar plot depicts the feature distribution of significantly enriched miRNA target peaks in each sample, as well as reproducible miRNA target peaks for each set of replicates. Peaks are annotated according to the following hierarchy: CDS, 5'UTR or 3'UTR, miRNA, intron, and other. Hover over the bars to see the number of peaks and percentage of peaks per feature in each peak set.
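Hierarchical annotation picks, for each peak, the highest-priority feature among those it overlaps, and the bar heights are then simple tallies. This sketch uses the order listed above; the report does not say how a peak overlapping both a 5'UTR and a 3'UTR is resolved, so placing 5'UTR first is an assumption. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Priority order from the report; 5'UTR ahead of 3'UTR is an assumption.
PRIORITY = ["CDS", "5'UTR", "3'UTR", "miRNA", "intron"]


def annotate(overlapping: Iterable[str]) -> str:
    """Return the highest-priority feature overlapping a peak, else 'other'."""
    found = set(overlapping)
    for feature in PRIORITY:
        if feature in found:
            return feature
    return "other"


def feature_distribution(peaks: List[List[str]]) -> Dict[str, Tuple[int, float]]:
    """Map feature -> (number of peaks, percentage of peaks) for one peak set."""
    counts = Counter(annotate(p) for p in peaks)
    total = sum(counts.values())
    return {f: (n, 100.0 * n / total) for f, n in counts.items()}
```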

miRNA Target Width Distribution

This box plot shows the width distribution of significantly enriched miRNA target peaks in each sample, as well as of reproducible miRNA target peaks for each set of replicates. Hover over a box to see the minimum, lower fence, q1, median, q3, upper fence, and maximum peak widths for each peak set.
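The hover statistics correspond to a standard box plot summary. The sketch below assumes Tukey-style fences (the most extreme widths lying within 1.5 × IQR of the quartiles) and linear-interpolation quartiles; the plotting library's exact quartile method is an assumption. `box_stats` is a hypothetical name.

```python
import statistics
from typing import Dict, List


def box_stats(widths: List[float]) -> Dict[str, float]:
    """Min, Tukey fences, quartiles, and max of a list of peak widths.
    Quartiles use linear interpolation (statistics 'inclusive' method)."""
    q1, median, q3 = statistics.quantiles(widths, n=4, method="inclusive")
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [w for w in widths if lo_limit <= w <= hi_limit]
    return {
        "min": min(widths),
        "lower_fence": min(inside),  # smallest width within 1.5 x IQR of q1
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_fence": max(inside),  # largest width within 1.5 x IQR of q3
        "max": max(widths),
    }
```

With widths of 10, 20, 30, 40, 50, and 1000 nt, the 1000-nt peak lies beyond the upper fence, so the upper fence is 50 while the maximum is 1000.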

miRNA Target Metagene

The metagene plot shows the average number of miRNA target peaks along the 5'UTR, CDS, and 3'UTR for peaks called in each sample, as well as for reproducible miRNA target peaks for each set of samples. To create the metagene, each peak set was downsampled to the size of the smallest peak set to account for differences in the number of significant peaks. Gene lengths were then normalized, and the average number of peaks along each normalized region was calculated and plotted for each peak set. Hover over the line to see the average number of peaks at that position.
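The downsampling and length normalization steps can be sketched as follows. Each peak is assumed to be already described by its region and its relative position (0 to 1) within that region; the 10 bins per region and the fixed random seed are illustrative choices, not the report's settings. Because all sets are downsampled to the same size, the returned bin counts are directly comparable; the exact averaging the report applies is not stated. Names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

REGIONS = ("5'UTR", "CDS", "3'UTR")

# A peak: (region, relative position within that region in [0, 1]).
Peak = Tuple[str, float]


def metagene(peak_sets: Dict[str, Sequence[Peak]], bins_per_region: int = 10,
             seed: int = 0) -> Dict[str, List[int]]:
    """Downsample every peak set to the smallest set's size, then count peaks
    in each length-normalized bin, laid out as 5'UTR | CDS | 3'UTR."""
    rng = random.Random(seed)
    n = min(len(p) for p in peak_sets.values())  # size of the smallest set
    profiles = {}
    for name, peaks in peak_sets.items():
        sampled = rng.sample(list(peaks), n)  # downsample without replacement
        counts = [0] * (len(REGIONS) * bins_per_region)
        for region, rel_pos in sampled:
            # Clamp rel_pos == 1.0 into the last bin of its region.
            b = min(int(rel_pos * bins_per_region), bins_per_region - 1)
            counts[REGIONS.index(region) * bins_per_region + b] += 1
        profiles[name] = counts
    return profiles
```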

miRNA Target Motifs

This plot shows the five most significant motifs found by HOMER de novo motif analysis, along with the feature distribution of miRNA target peaks containing each motif. The top portion of the plot shows each motif as a dot, plotting the percentage of peaks containing that motif against its -log₁₀(p-value). The bottom portion is a bar plot comparing the feature distribution of all peaks (left side) with that of peaks containing the selected motif (right side). Click a motif in the scatterplot to populate the bar plot with the feature distribution of peaks containing that motif, and click off the point to deselect it. The drop-down menu in the bottom left corner can be used to switch between peak sets. For this plot, only motifs called from reproducible peaks are shown (if applicable).

miRNA Target Gene Enrichment

This word cloud shows significantly enriched GO terms and KEGG pathways identified using the tool clusterProfiler. The analysis looks for functional categories or pathways that are over-represented in the set of genes containing at least one miRNA target peak. GO terms are split into three categories: Biological Process, Cellular Component, and Molecular Function. The drop-down menu in the bottom left corner can be used to switch between peak sets. For this plot, only terms enriched among genes with reproducible peaks are shown (if applicable).
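clusterProfiler's over-representation analysis is based on a hypergeometric test followed by multiple-testing correction. The sketch below is a minimal stand-in, not clusterProfiler itself: it computes the one-sided hypergeometric p-value for one term and Benjamini-Hochberg adjusted p-values for a list of terms. Here N is the number of background genes, K the number annotated to the term, n the number of genes with a miRNA target peak, and k the overlap. Function names are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N background genes, K of which
    belong to the term (one-sided over-representation test)."""
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```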

We used DESeq2 to identify statistically significant differences between sample groups. With DESeq2, we first normalize the raw counts for each genomic peak across all libraries using the median-of-ratios method (also known as relative log expression, RLE). This accounts for differences in sequencing depth and other library-level factors that could affect the results. Next, DESeq2 shares information across all peaks to estimate the variability (dispersion) of each peak's counts under a negative binomial model; borrowing strength across peaks makes these estimates more reliable when only a few replicates are available. Finally, the estimated log2 fold change between conditions for each peak is divided by its standard error to compute the Wald test statistic, which indicates whether an observed difference is likely to be real or just due to chance. The resulting p-value for each peak is adjusted for multiple comparisons using the Benjamini-Hochberg false discovery rate (FDR) procedure, DESeq2's default.
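Two pieces of the DESeq2 workflow are simple enough to sketch directly: median-of-ratios size factors and the two-sided Wald p-value. This is an illustration, not DESeq2: the log₂ fold change and its standard error are taken as inputs, whereas DESeq2 obtains them by fitting a negative binomial model with shrunken dispersions. Function names are hypothetical.

```python
import math
import statistics
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[i][j] = reads for peak i in
    library j. Peaks with a zero in any library are skipped, because their
    geometric mean is zero."""
    n_libs = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_libs)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_libs
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor = median ratio of a library's counts to the per-peak means.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def wald_pvalue(log2_fold_change: float, std_error: float) -> float:
    """Two-sided Wald p-value: z = LFC / SE compared to a standard normal."""
    z = log2_fold_change / std_error
    return math.erfc(abs(z) / math.sqrt(2))
```

Dividing each library's counts by its size factor gives the normalized counts used for the group averages.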

For each differential analysis performed, a table and a volcano plot are provided below. The table lists the coordinates of each chimeric peak and the gene and miRNA associated with it. The average columns list the average normalized chimeric read counts in each sample group, and the associated fold change and -log₁₀-transformed significance are shown as well. The volcano plot shows fold change versus significance, with points colored by the sample group in which they are enriched.

The following files are provided with this analysis:

File	Description
mir_targets.bam	BAM file containing unique miRNA target sequence alignments to the genome, after the miRNA portion of the reads has been trimmed. Nonchimeric reads were filtered out and PCR duplicates were removed.
mir_targets.neg.bw, mir_targets.pos.bw	bigWig tracks for the negative (neg) and positive (pos) genomic strands.
mir_targets_reproducible_clusters.bed	This file is only included if the experiment contained replicate samples. It lists miRNA target clusters found to be reproducible and significant in all replicates, in BED format.
report.html	HTML report containing plots, enriched GO terms and KEGG pathways, HOMER motif analysis, and repetitive element mapping information for enriched peaks.