K562 miR-eCLIP Summary Report

This interactive report provides an overview of miR-eCLIP analysis results, organized into tabs. Each table and plot has a button that can be clicked for more information. The menus in the top right corner of each plot can be used to modify the plot (zoom, hide samples, and save a copy). This report is optimized for viewing on a desktop. Plots have been tested on several browsers, but in some cases they may not render correctly; if plots do not load, please use an alternate browser.

If you are interested in performing your own miR-eCLIP experiment please contact us.

Experiment Summary

This report summarizes a miR-eCLIP experiment performed on the K562 cell line. Seven replicate samples were run using Eclipsebio's proprietary miR-eCLIP technology to identify miRNA binding sites. Full method details are available upon request.

Quality Control Metrics

The following table shows important quality control metrics for each sample. Two libraries are made for each sample, one containing immunoprecipitated (IP) RNA and the other containing input RNA. For each sample ("Sample"), two rows are included that list the metrics for each library. The identity of the library is included in the "IP or Input" column.

The following columns describe the processing of the libraries:

Initial reads: the number of sequenced reads for each library.
% Pass trim: reads first underwent adapter trimming to remove both adapters and sequences shorter than 18 nucleotides. The percentage of sequenced reads that passed this trimming step is listed in this column.
% Repetitive elements: trimmed reads are aligned to Eclipsebio's custom database of repetitive elements. This database includes rRNAs, tRNAs, snoRNAs, and other features. The percentage of trimmed reads that were filtered out at this step is listed in this column.
% Uniquely aligned to the genome: the remaining reads from the repetitive element filtering were aligned to the genome, and the percent of those reads that map uniquely to the genome is listed in this column.
% PCR duplicates: PCR duplicates were defined as uniquely mapped reads that map to the same read coordinates and have an identical unique molecular identifier (UMI). The percentage of mapped reads that were identified as PCR duplicates is included in this column.
Final nonchimeric reads: this column lists the number of uniquely mapped, deduplicated, nonchimeric reads that were used for this analysis.
AGO2 clusters: AGO2 clusters are found using the peak calling tool CLIPper. The clusters are identified from nonchimeric reads in the IP samples and are not input normalized.
AGO2 peaks: a peak is defined as a cluster with a log₂(fold enrichment over the matched input) > 3 and p-value < 0.001.
Final chimeric reads: the number of chimeric reads that mapped to the genome after trimming the miRNA component of the chimeric read.
% Chimeras: the percentage of chimeric reads that mapped to the genome out of the total number of reads that mapped to the genome (chimeric + nonchimeric).
Chimeric clusters: chimeric clusters are found using the peak calling tool CLIPper. The clusters are identified from the genome-mapping component of chimeric reads in the IP samples and are not input normalized.
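
Two of the rules above lend themselves to a short illustration: the PCR duplicate definition (same read coordinates and identical UMI) and the AGO2 peak thresholds. The following is a minimal sketch only, not the pipeline's implementation. The `Read` record, the function names, and the choice to treat strand as part of the read coordinates are assumptions made for this example.

```python
from typing import Iterable, NamedTuple


class Read(NamedTuple):
    """Hypothetical record for one uniquely mapped read."""
    chrom: str
    start: int
    end: int
    strand: str
    umi: str


def deduplicate(reads: Iterable[Read]) -> list[Read]:
    """Keep the first read seen for each (coordinates, UMI) combination."""
    seen = set()
    kept = []
    for read in reads:
        # Assumption: strand counts as part of the read coordinates.
        key = (read.chrom, read.start, read.end, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def percent_pcr_duplicates(reads: Iterable[Read]) -> float:
    """Percentage of mapped reads removed as PCR duplicates."""
    reads = list(reads)
    if not reads:
        return 0.0
    return 100.0 * (len(reads) - len(deduplicate(reads))) / len(reads)


def is_ago2_peak(log2_fold_enrichment: float, p_value: float) -> bool:
    """Apply the thresholds quoted above: log2 fold enrichment > 3 and p < 0.001."""
    return log2_fold_enrichment > 3 and p_value < 0.001
```

Both thresholds are strict inequalities, so a cluster with a log₂ fold enrichment of exactly 3 would not be called a peak under this reading.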

Sample	IP or Input	Initial reads	Final chimeric reads	% Chimeras	Chimeric clusters
Sample1_IP	IP	320,917,637	993,564	0.31%	19,947
Sample2_IP	IP	235,229,989	876,608	0.37%	18,340
Sample3_IP	IP	306,779,036	1,040,592	0.34%	20,969
Sample4_IP	IP	284,173,064	1,192,868	0.42%	23,170
Sample5_IP	IP	272,072,956	1,241,972	0.46%	24,145
Sample6_IP	IP	242,028,592	1,113,529	0.46%	21,546
Sample7_IP	IP	234,744,724	886,809	0.38%	18,018
Sample1_Input	Input	55,156,032
Sample2_Input	Input	41,475,423
Sample3_Input	Input	53,869,196
Sample4_Input	Input	52,392,618
Sample5_Input	Input	52,393,435
Sample6_Input	Input	47,084,819
Sample7_Input	Input	43,247,006
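
As a worked check of the "% Chimeras" column, the sketch below implements the definition given above (chimeric reads over chimeric plus nonchimeric mapped reads). The table shown here does not list the nonchimeric count, so that definition cannot be recomputed from these columns alone. For comparison, dividing "Final chimeric reads" by "Initial reads" reproduces the reported values to two decimal places. That is an observation about the numbers in this table, not a statement of how the pipeline computes the column. The function and variable names are assumptions for this example.

```python
def percent_chimeras(chimeric: int, nonchimeric: int) -> float:
    """Definition from the column description: chimeric / (chimeric + nonchimeric)."""
    total = chimeric + nonchimeric
    if total == 0:
        raise ValueError("no mapped reads")
    return 100.0 * chimeric / total


def percent_of_initial(initial_reads: int, chimeric: int) -> float:
    """Chimeric reads as a percentage of initial reads, rounded to two decimals."""
    return round(100.0 * chimeric / initial_reads, 2)


# (initial reads, final chimeric reads, reported % chimeras), copied from the table.
TABLE_ROWS = {
    "Sample1_IP": (320_917_637, 993_564, 0.31),
    "Sample2_IP": (235_229_989, 876_608, 0.37),
    "Sample3_IP": (306_779_036, 1_040_592, 0.34),
    "Sample4_IP": (284_173_064, 1_192_868, 0.42),
    "Sample5_IP": (272_072_956, 1_241_972, 0.46),
    "Sample6_IP": (242_028_592, 1_113_529, 0.46),
    "Sample7_IP": (234_744_724, 886_809, 0.38),
}

for sample, (initial, chimeric, reported) in TABLE_ROWS.items():
    print(sample, percent_of_initial(initial, chimeric), reported)
```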

This tab summarizes the identified miRNA target peaks from the chimeric reads.

miRNA Target Counts

miRNA target peaks are found using the peak calling tool CLIPper. Reproducible peaks are identified by filtering for peaks where each replicate contains at least three chimeric reads with miRNAs of the same miRNA seed family overlapping the peak. The bar plot below shows the number of peaks detected in each sample, as well as the number of reproducible peaks for each set of replicates. Hover over each bar to see the number of peaks in each set. All plots in this report are interactive, meaning you can zoom in/out, pan to a certain location, and select elements from the legend to remove. Plots can also be downloaded as a separate file. Hover over the top right corner of the plot to see the interactive options.
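
The reproducibility filter described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: each replicate is represented by a hypothetical mapping from miRNA seed family to the number of chimeric reads overlapping the peak, and the function names are invented for this example.

```python
from typing import Mapping, Sequence

MIN_CHIMERIC_READS = 3  # threshold quoted in the text


def reproducible_families(
    per_replicate_counts: Sequence[Mapping[str, int]],
    min_reads: int = MIN_CHIMERIC_READS,
) -> set[str]:
    """Seed families supported by at least `min_reads` chimeric reads in every replicate."""
    if not per_replicate_counts:
        return set()
    candidates = set(per_replicate_counts[0])
    return {
        family
        for family in candidates
        if all(rep.get(family, 0) >= min_reads for rep in per_replicate_counts)
    }


def is_reproducible(per_replicate_counts: Sequence[Mapping[str, int]]) -> bool:
    """A peak is kept if at least one seed family passes in all replicates."""
    return bool(reproducible_families(per_replicate_counts))
```

For example, a peak with counts `[{"let-7": 5, "miR-21": 1}, {"let-7": 3}, {"let-7": 4, "miR-21": 7}]` would be kept on the strength of the let-7 family, because miR-21 falls below three reads in two of the three replicates.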

miRNA Target Gene Counts

The bar plot below shows the number of genes containing at least one significant miRNA target peak per sample, as well as the number of genes containing at least one reproducible miRNA target peak for each set of replicates. Hover over the bars to see the number of genes in each peak set.

miRNA Target Seed Match Distribution

The bar plot below shows the seed match distribution of miRNA target peaks and reproducible miRNA target peaks for each set of sample conditions. miRNA target peaks were annotated with the top miRNA that contributed to each peak. Then, seed match information was added to each peak according to seed match definitions from TargetScan, allowing for up to one mismatch in the seed region.

The following seed matches are classified as "Canonical":

7mer-m8: An exact match to positions 2-8 of the mature miRNA (the seed + position 8)
7mer-1A: An exact match to positions 2-7 of the mature miRNA (the seed) followed by an 'A'
8mer: An exact match to positions 2-8 of the mature miRNA (the seed + position 8) followed by an 'A'
6mer: An exact match to positions 2-7 of the mature miRNA (the seed)

The following seed matches are classified as "Noncanonical":

6mer offset: An exact match to positions 3-8 of the mature miRNA
3' compensatory: A site in which strong 3' pairing (consequential miRNA-target complementarity outside the seed region) compensates for an imperfect seed match (Friedman et al., 2009).
Centered: A site that lacks perfect seed pairing and 3'-compensatory pairing but instead has 11-12 contiguous Watson-Crick pairs to miRNA positions 4-15. These are identified only in the reference species and therefore include no information about conservation.
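
The position-based definitions above (8mer, 7mer-m8, 7mer-1A, 6mer, and 6mer offset) can be expressed as a small classifier. This is a minimal sketch assuming exact matches only; it does not implement the one-mismatch allowance, 3' compensatory sites, or centered sites, and it is not the annotation code used for this report. The function names and the order in which site types are tested (most extensive match first) are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def classify_seed_match(mirna: str, target: str) -> str:
    """Classify the best exact seed match of `mirna` within `target` (both 5'->3')."""
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")

    # The target site pairs antiparallel to the miRNA, so we search for the
    # reverse complement of the relevant miRNA positions (1-based in the text).
    match_2_8 = reverse_complement(mirna[1:8])  # positions 2-8
    match_2_7 = reverse_complement(mirna[1:7])  # positions 2-7 (the seed)
    match_3_8 = reverse_complement(mirna[2:8])  # positions 3-8

    # Test the most extensive site type first; the 'A' sits opposite miRNA position 1.
    if match_2_8 + "A" in target:
        return "8mer"
    if match_2_8 in target:
        return "7mer-m8"
    if match_2_7 + "A" in target:
        return "7mer-1A"
    if match_2_7 in target:
        return "6mer"
    if match_3_8 in target:
        return "6mer offset"
    return "no seed match"


LET_7A = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p
print(classify_seed_match(LET_7A, "AAACUACCUCAGG"))  # 8mer
```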

Hover over each bar to see the number of peaks and % of peaks containing each type of seed match for each peak set.

miRNA Abundance in Chimeric Reads

The following plot shows the percentage of chimeric reads containing each miRNA. Each miRNA is depicted as one point for each sample. Hover over a point to see which miRNA it corresponds to, and what percentage of chimeric reads contain that miRNA.

Top miRNA Strand Enrichment

The following plot shows miRNA 5p/3p ratios for the five most abundant miRNAs, where abundance is calculated as the sum of the 5p and 3p arms. The ratio is reported on a log scale, so it can be positive or negative. The color scale shows the magnitude of the 5p/3p ratio: orange indicates a positive ratio (more 5p reads than 3p reads) and blue indicates a negative ratio (more 3p reads than 5p reads). The shape also illustrates the strand enrichment: 5p-enriched miRNAs are shown as an upright triangle and 3p-enriched miRNAs as an upside-down triangle. miRNAs with "No strand specificity" had a 5p/3p ratio between -1 and 1. Hover over the triangles to see the percentage of chimeric reads containing that miRNA, as well as the 5p/3p ratio.
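
The strand calls described above can be illustrated with a short sketch. Because the ratio can be negative, it must be log-transformed; the base (2) and the pseudocount used here are assumptions for this example, as are the function names and the choice to treat the boundaries -1 and 1 as "No strand specificity".

```python
import math


def log2_strand_ratio(reads_5p: int, reads_3p: int, pseudocount: float = 1.0) -> float:
    """log2(5p/3p); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((reads_5p + pseudocount) / (reads_3p + pseudocount))


def strand_call(ratio: float) -> str:
    """Label a miRNA using the -1 to 1 window quoted in the text."""
    if ratio > 1:
        return "5p enriched"
    if ratio < -1:
        return "3p enriched"
    return "No strand specificity"


print(strand_call(log2_strand_ratio(7, 1)))  # 5p enriched
```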

miRNA Target Feature Distribution

The following bar plot depicts the feature distribution of significantly enriched miRNA target peaks in each sample, as well as reproducible miRNA target peaks for each set of replicates. Peaks are annotated according to the following hierarchy: CDS, 5'UTR or 3'UTR, miRNA, intron, and other. Hover over the bars to see the number of peaks and percentage of peaks per feature in each peak set.
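
A hierarchy of this kind means that a peak overlapping several feature types is assigned to the highest-priority one. The sketch below shows the idea; the function name and the input format (a set of overlapping feature labels) are assumptions, and because the text places 5'UTR and 3'UTR on the same tier, the order between those two is arbitrary here.

```python
from typing import Iterable

# Priority order from the text; the relative order of the two UTR types is an assumption.
FEATURE_PRIORITY = ["CDS", "5'UTR", "3'UTR", "miRNA", "intron"]


def annotate_peak(overlapping_features: Iterable[str]) -> str:
    """Return the highest-priority feature a peak overlaps, or 'other'."""
    overlaps = set(overlapping_features)
    for feature in FEATURE_PRIORITY:
        if feature in overlaps:
            return feature
    return "other"


print(annotate_peak({"intron", "CDS"}))  # CDS
```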

miRNA Target Width Distribution

This boxplot shows the miRNA target peak width distribution of significantly enriched peaks in each sample, as well as reproducible miRNA target peaks for each set of replicates. Hover over the boxplot to see the minimum, lower fence, q1, median, q3, upper fence, and maximum peak widths for each peak set.
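
For readers unfamiliar with the hover statistics, the sketch below computes them for a list of peak widths. It assumes the common convention that the fences are the most extreme observations within 1.5 × IQR of the quartiles, and it uses the "inclusive" quartile method from Python's standard library. The plotting library behind this report may compute quartiles slightly differently, so treat the function as illustrative.

```python
from statistics import quantiles


def box_stats(widths: list[float]) -> dict[str, float]:
    """Summary statistics shown on a boxplot hover (needs at least two values)."""
    data = sorted(widths)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences: the most extreme observations still within 1.5 * IQR of the box.
    lower_fence = min(w for w in data if w >= q1 - 1.5 * iqr)
    upper_fence = max(w for w in data if w <= q3 + 1.5 * iqr)
    return {
        "min": data[0],
        "lower fence": lower_fence,
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper fence": upper_fence,
        "max": data[-1],
    }


print(box_stats([20, 30, 40, 50, 60, 70, 80, 90, 500]))
```

In this made-up example the 500-nucleotide peak lies beyond the upper fence, so it would be drawn as an outlier while the upper fence stays at 90.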

miRNA Target Metagene

The metagene plot shows the average number of miRNA target peaks along the 5'UTR, CDS, and 3'UTR for peaks called in each sample, as well as reproducible miRNA target peaks for each set of samples. To account for differences in the number of significant peaks, each peak set was downsampled to the size of the smallest peak set. Gene lengths were normalized, and the average number of peaks at each normalized position was calculated and plotted for each peak set. Hover over the line to see the average number of peaks at that position.
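
The two preparation steps described above, downsampling every peak set to the size of the smallest one and placing peaks on a length-normalized axis, can be sketched as follows. This is a simplified illustration: it ignores strand, treats each region (5'UTR, CDS, or 3'UTR) separately, counts peaks per bin rather than averaging across genes, and uses hypothetical function names and a fixed random seed.

```python
import random


def downsample(peak_sets: dict[str, list[float]], seed: int = 0) -> dict[str, list[float]]:
    """Randomly reduce every peak set to the size of the smallest set."""
    rng = random.Random(seed)
    smallest = min(len(peaks) for peaks in peak_sets.values())
    return {name: rng.sample(peaks, smallest) for name, peaks in peak_sets.items()}


def relative_position(peak_midpoint: int, region_start: int, region_end: int) -> float:
    """Position of a peak within a region, scaled to the interval [0, 1]."""
    return (peak_midpoint - region_start) / (region_end - region_start)


def bin_counts(relative_positions: list[float], n_bins: int = 10) -> list[int]:
    """Count peaks in equal-width bins along the normalized region."""
    counts = [0] * n_bins
    for position in relative_positions:
        index = min(int(position * n_bins), n_bins - 1)  # put 1.0 in the last bin
        counts[index] += 1
    return counts
```

Fixing the seed keeps the downsampling repeatable, which matters when the same plot needs to be regenerated.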

miRNA Target Motifs

This plot shows the five most significant motifs found by HOMER de novo motif analysis, as well as the feature distribution for miRNA target peaks containing each motif. The top portion of the plot shows each motif as a dot, where the percentage of peaks containing that motif is plotted against the -log₁₀(p-value) for that motif. The bottom portion of the plot is a bar plot showing the feature distribution for all peaks (left side) compared to the feature distribution of peaks containing the selected motif (right side). Click on one of the motifs in the scatterplot, and the bar plot will populate with the feature distribution for peaks containing the selected motif. Click away from that point to deselect the motif. The drop-down menu in the bottom left corner can be used to navigate between different sets of peaks. For this plot, only motifs called from reproducible peaks are shown (if applicable).

miRNA Target Gene Enrichment

This word cloud shows significantly enriched GO terms and KEGG pathways identified using the tool clusterProfiler. This analysis looks for enriched functional categories or pathways in the set of genes that contain at least one miRNA target peak. The GO terms are split into the following three categories: Biological Process, Cellular Component, and Molecular Function. The drop-down menu in the bottom left corner can be used to navigate between different sets of peaks. For this plot, only terms and pathways enriched among reproducible peaks are shown (if applicable).
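
Over-representation analyses of this kind are commonly based on a hypergeometric test: given a background of N genes, of which K carry a given GO term or pathway annotation, how surprising is it to see k annotated genes among the n genes with a miRNA target peak? The sketch below computes that upper-tail probability. The function and parameter names are assumptions, and the report does not state which settings or background gene set were used.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the target list, k: target-list genes annotated to the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: all 5 target genes fall in a 5-gene term out of a 10-gene background.
print(enrichment_p_value(k=5, n=5, K=5, N=10))
```

P-values from many terms would still need multiple-testing correction before being called significant.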

We used DESeq2 to identify statistically significant differences between sample groups. With DESeq2, we first normalize the raw counts for each genomic peak across all libraries using relative log expression (RLE). This accounts for differences in sequencing depth and other factors that could affect the results. Next, we use information from all the peaks to estimate the variability (dispersion) between samples for each peak under DESeq2's negative binomial model for count data. This step is essential because it determines how reliably differences between groups can be detected. Finally, we divide the log₂ fold change between conditions for each peak by its standard error, which depends on the estimated dispersion, to obtain a Wald test statistic. This test helps us determine whether the observed differences are likely real or just due to chance. We obtain a p-value from this test for each peak and adjust the p-values for multiple comparisons using the FDR procedure.
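
The three steps above can be illustrated in plain Python. This is a teaching sketch, not DESeq2 itself: it shows RLE (median-of-ratios) size factors, a two-sided Wald p-value from a log₂ fold change and its standard error, and a Benjamini-Hochberg adjustment. The choice of Benjamini-Hochberg as the FDR procedure is an assumption (the text does not name one), the dispersion estimation that produces the standard error is omitted, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, median


def size_factors(counts: list[list[int]]) -> list[float]:
    """RLE size factors; counts[i][j] is the raw count for peak i in library j."""
    n_libs = len(counts[0])
    log_ratios: list[list[float]] = [[] for _ in range(n_libs)]
    for row in counts:
        if min(row) <= 0:
            continue  # peaks with a zero count have no finite geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_libs
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Raises StatisticsError if every peak contains a zero count.
    return [math.exp(median(ratios)) for ratios in log_ratios]


def wald_p_value(log2_fold_change: float, standard_error: float) -> float:
    """Two-sided p-value for the statistic log2FC / SE under a standard normal."""
    z = log2_fold_change / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Dividing each library's raw counts by its size factor gives the normalized counts that the averages in the results table would be based on.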

For each differential analysis performed, a table and a volcano plot are provided below. The table lists the coordinates of each chimeric peak and the gene and miRNA associated with it. The average columns list the average normalized chimeric read counts in each sample group. The associated fold change and -log₁₀-transformed significance are shown as well. The volcano plot shows the fold change versus the significance, with genes colored by the sample group in which they are enriched.

The following files are provided along with this analysis:

File	Description
mir_targets.bam	BAM file containing unique miRNA target sequence alignments to the genome, after the miRNA portion of the reads has been trimmed. Nonchimeric reads were filtered out and PCR duplicates were removed.
mir_targets.neg.bw mir_targets.pos.bw	bigWig tracks for the negative (neg) and positive (pos) strands of DNA.
mir_targets_reproducible_clusters.bed	This file is only included if the experiment contained replicate samples. miRNA target clusters found to be reproducible and significant in all replicates with the same format as above.
report.html	HTML report containing plots, enriched GO terms and KEGG pathways, HOMER motif analysis, and repetitive element mapping information for enriched peaks.