This repository contains all data and scripts used to generate the numbers and figures for the circRNA detection tool benchmarking paper published in Nature Methods.
The `data` folder contains

- the `Supplementary_Table_2_all_circRNAs.txt.gz` file, which contains all predicted circRNAs in the untreated sample in this study with their annotation.
- the `Supplementary_Table_3_selected_circRNAs.txt` file, which contains all 1560 circRNAs selected for validation, with their initial detection information (tool, BSJ count), their primer information (including FWD and REV primer sequence), results from three validation methods (Cq value with and without RNase R, Cq difference, and amplicon sequencing percent on-target amplification), validation metrics, and annotation information. This file was generated by the `01_calculate_val_rates.R` script.
- the `Supplementary_Table_4_all_circRNAs_treated.txt.gz` file, which contains all predicted circRNAs in the RNase R treated sample in this study with their annotation.
- the `Supplementary_Table_5_RNase_R_enrichment_seq.txt.gz` file, which contains the RNase R enrichment factor calculated based on RNA sequencing data for each circRNA. This file was generated by the `01_calculate_val_rates.R` script.
- the `Supplementary_Table_6A_precision_values.txt` file, which contains the validation metrics (per-methods precision, compound precision, theoretical number of TP circRNAs) for each tool. This file was generated using the `01_calculate_val_rates.R` script.
- the `Supplementary_Table_6B_sensitivity_values.txt` file, which contains the validation metrics (per-methods precision, compound precision, theoretical number of TP circRNAs, estimated sensitivity) for each tool. This file was generated using the `01_calculate_val_rates.R` script.
- the `Supplementary_Table_6_tool_ranking.txt` file, which is a summary of `Supplementary_Table_6A_precision_values.txt` and `Supplementary_Table_6B_sensitivity_values.txt`. This file was generated using the `01_calculate_val_rates.R` script.
- the `Supplementary_Table_7_combo_2tools.txt` file, which contains the number of circRNAs in the intersection and union of each combination of two tools, per cell line sample (only for circRNAs with BSJ count ≥ 5). This file was generated by the `03_combination_tools.R` script.
- the `Supplementary_Table_8_combo_3tools.txt` file, which contains the number of circRNAs in the intersection and union of each combination of three tools, per cell line sample (only for circRNAs with BSJ count ≥ 5). This file was generated by the `03_combination_tools.R` script.
- the `Supplementary_Table_9_top_tool_combinations.txt` file, which contains a list of the top performing combinations of two tools. The list was composed by selecting the top 5 performing combinations in terms of the total number of detected circRNAs (union between both tools) and the weighted compound precision, for each cell line. This file was generated by the `03_combination_tools.R` script.
- the `details` folder, which contains some files needed for the following scripts to generate some of the Supplementary Figures and Tables. `circ_db_hg38.txt` is a table with all circRNAs in all circRNA databases from a previous publication.
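
The supplementary tables can be inspected without unpacking them first. The following is a minimal Python sketch (not part of the repository, whose scripts are written in R) that streams rows from a plain or gzip-compressed text table. It assumes the files are tab-delimited with a header row. The function name `iter_table` is hypothetical, and no column names are assumed.

```python
import csv
import gzip
from pathlib import Path


def iter_table(path, delimiter="\t"):
    """Yield one dict per row from a plain or gzip-compressed delimited file.

    Assumption: the first line is a header and fields are tab-separated.
    """
    path = Path(path)
    # Pick the opener from the extension; mode "rt" makes gzip return text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        # DictReader uses the header line as keys for every following row.
        yield from csv.DictReader(handle, delimiter=delimiter)
```

For example, `next(iter_table("data/Supplementary_Table_2_all_circRNAs.txt.gz"))` would return the first row as a dictionary, whose keys show the column names actually present in the file. Because rows are yielded one at a time, the large tables do not need to fit in memory.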
The `data_analysis` folder contains

- the `01_calculate_val_rates.R` file, which contains the calculations of the validation metrics (per-methods precision, compound precision, theoretical number of TP circRNAs, estimated sensitivity) and generates `Supplementary_Table_3_selected_circRNAs.txt`, `Supplementary_Table_6A_precision_values.txt`, `Supplementary_Table_6B_sensitivity_values.txt`, `Supplementary_Table_6_tool_ranking.txt`, and `Supplementary_Table_5_RNase_R_enrichment_seq.txt`.
- the `02_calculations_paper.R` file, which contains all calculations reported in the manuscript.
- the `03_combination_tools.R` file, which contains all calculations for the union and intersection of two or three tools and generates `Supplementary_Table_7_combo_2tools.txt`, `Supplementary_Table_8_combo_3tools.txt`, and `Supplementary_Table_9_top_tool_combinations.txt`.
- the `04_annotation_and_validation.R` file, which contains all calculations described in the paragraph "Comparing precision values in function of circRNA annotation".
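
To illustrate the kind of calculation behind the tool-combination tables, here is a minimal Python sketch. It is not the code in `03_combination_tools.R`. It counts the union and intersection of circRNAs for every combination of two or three tools in a single sample, after keeping only circRNAs with BSJ count ≥ 5. The tool names, circRNA IDs, counts, and function names are all hypothetical toy values.

```python
from itertools import combinations

MIN_BSJ_COUNT = 5  # threshold mentioned in the README (BSJ count >= 5)


def filter_by_bsj(detections, min_count=MIN_BSJ_COUNT):
    """Map each tool to the set of circRNA IDs meeting the BSJ threshold.

    detections: {tool_name: {circ_id: bsj_count}} for one sample.
    """
    return {
        tool: {circ_id for circ_id, n in counts.items() if n >= min_count}
        for tool, counts in detections.items()
    }


def combine_tools(circ_sets, k=2):
    """Count union and intersection sizes for every combination of k tools."""
    rows = []
    for tools in combinations(sorted(circ_sets), k):
        sets = [circ_sets[t] for t in tools]
        rows.append({
            "tools": tools,
            "n_union": len(set.union(*sets)),
            "n_intersection": len(set.intersection(*sets)),
        })
    return rows


# Hypothetical toy input for one sample (not real data).
detections = {
    "tool_A": {"chr1:100-200": 12, "chr1:300-400": 5, "chr2:50-90": 2},
    "tool_B": {"chr1:100-200": 7, "chr2:50-90": 9},
    "tool_C": {"chr1:300-400": 6, "chr2:50-90": 4},
}
filtered = filter_by_bsj(detections)
pairs = combine_tools(filtered, k=2)    # all combinations of two tools
triples = combine_tools(filtered, k=3)  # all combinations of three tools
```

The tables in this repository report these numbers per cell line sample, so the same calculation would be repeated for each sample.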
The `figure_generating` folder contains the R scripts and R markdowns to generate all Figures and Supplementary Figures in the manuscript.
One of the collaborators noticed a mistake in the published data and figures. This mistake has now been rectified in this GitHub repository. The corrections have been submitted to Nature Methods, and we are currently waiting for the online publication to be updated. In summary, an accidental one-nucleotide shift changed the BSJ position of approximately 5% of the circRNAs. All main conclusions and the majority of the figures remain the same.
In detail: 55,238 out of 1,137,055 (~5%) circRNAs identified in the paper were accidentally shifted by one nucleotide at both the start (-1) and the end position (+1), and were therefore wrongly annotated. For example, circRNA chr18:8718424-8720495 in the original data became chr18:8718423-8720496. These wrongly annotated circRNAs came from three tools: KNIFE, NCLscan, and NCLcomparator. The shift was introduced by an incorrectly applied ‘correction’ from 1-based to 0-based annotation. This error has now been fixed.
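
The effect of the shift can be reproduced on the example above. This is a minimal Python sketch, not the repository's correction code. It assumes circRNA IDs have the form `chrom:start-end`, and the helper names `parse_circ_id` and `shift_bsj` are hypothetical. It applies the erroneous shift (start -1, end +1), undoes it, and shows why a shifted ID no longer matches the same circRNA reported by another tool.

```python
def parse_circ_id(circ_id):
    """Split 'chrom:start-end' into (chrom, start, end) with integer positions."""
    chrom, span = circ_id.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def shift_bsj(circ_id, start_delta, end_delta):
    """Return the circRNA ID with start and end moved by the given offsets."""
    chrom, start, end = parse_circ_id(circ_id)
    return f"{chrom}:{start + start_delta}-{end + end_delta}"


original = "chr18:8718424-8720495"       # example ID from the README
shifted = shift_bsj(original, -1, +1)    # the erroneous shift
restored = shift_bsj(shifted, +1, -1)    # undoing the shift

# Matching by exact ID: the shifted call looks like a different circRNA.
tool_x = {original}                      # hypothetical unaffected tool
n_unique_with_error = len(tool_x | {shifted})
n_unique_after_fix = len(tool_x | {restored})

# Fraction of affected circRNAs, from the numbers given above.
affected_fraction = 55238 / 1137055
```

Here `shifted` is `chr18:8718423-8720496`, as in the example, and `restored` equals the original ID. With the error, the two calls count as two unique circRNAs; after the fix they count as one, which is why the overlap among tools increased.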
This mistake had an impact on:
- the annotation of a subset of circRNAs
- the overlap among tools
- the amplicon sequencing precision (subgroup BSJ count ≥ 5), which is slightly higher for KNIFE and NCLcomparator. Their compound precision is therefore also slightly higher.
- the sensitivity has changed as there is more overlap among the tools than initially measured. The set of true positive circRNAs is thus 949 unique circRNAs (instead of 957) (Sup Table 6B). This also slightly changes the tool ranking (Sup Table 6).
- all tables and sup tables
- the following figures (most of them are only small changes):
  - main panels: 2C, 2D, 3A, 3B, 4A, 5B
  - sup figures: 4, 5, 6, 14, 21, 22, 23, 24, 25, 27, 29, 30, 33, 36, 37, 38, 40
Vromman, M., Anckaert, J., Bortoluzzi, S. et al. Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision. Nat Methods 20, 1159–1169 (2023). https://doi.org/10.1038/s41592-023-01944-6