
[circRNA] circRNA identification: CIRI2

2021-11-18  jjjscuedu

Bioinformatic analysis of circRNAs falls into two groups, depending on the identification method:

split-alignment-based approaches: prediction tools built around the back-spliced junction (BSJ) sequences produced under the intron-driven circularization model, such as find_circ, CIRCexplorer, CIRI and MapSplice;

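pseudoreference-based approaches: tools that infer back-spliced junction sequences from the genome annotation and then match them against the annotated exon sequences to predict new circRNAs, such as KNIFE and NCLscan.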

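CIRI2 takes BWA-MEM alignments as input and supports de novo, BSJ-based detection. Building on CIRI, it uses an improved maximum likelihood estimation (MLE) model to decide whether a candidate BSJ read could have come from more than one possible region, which effectively controls false positives caused by mismapping or by repetitive sequences in the genome. CIRI2 also achieved the highest average F1 score on the test data while using less memory and run time than the other detection tools. CIRI2 requires data from samples treated with RNase R.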

===Download====

URL: https://sourceforge.net/projects/ciri/files/CIRI2/

=====Alignment====

bwa mem -t 40 -T 19  Nitab-v4.5_genome_Scf_Edwards2017.fasta CK_0_1_1.fq.gz CK_0_1_2.fq.gz >CK_0_1.sam

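Note: -T is the minimum alignment score for an alignment to be output; the default is 30. Many posts explain that, in tests on a wide range of data, 19 turned out to work best and improves the sensitivity of CIRI, so the same value is used here.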
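The alignment step can be sketched as a small helper that prints the commands for a sample instead of running them, so they can be checked first. It assumes reads are named <sample>_1.fq.gz and <sample>_2.fq.gz as in the command above; the helper name align_cmd is hypothetical. It also prints a bwa index call, since bwa mem needs an indexed reference.

```shell
# Sketch only: print the bwa commands rather than executing them.
REF="Nitab-v4.5_genome_Scf_Edwards2017.fasta"
THREADS=40

# Build the bwa mem command line for one paired-end sample.
# Assumption: reads follow the <sample>_1.fq.gz / <sample>_2.fq.gz naming.
align_cmd() {
    sample="$1"
    printf 'bwa mem -t %s -T 19 %s %s_1.fq.gz %s_2.fq.gz > %s.sam\n' \
        "$THREADS" "$REF" "$sample" "$sample" "$sample"
}

# The reference has to be indexed once before bwa mem can use it.
echo "bwa index $REF"

# Add more sample names to the list as needed.
for s in CK_0_1; do
    align_cmd "$s"
done
```

Piping the printed lines into bash would run them; printing first makes it easy to spot a wrong file name before a long alignment starts.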

====Identification====

perl CIRI2.pl -F Nitab-v4.5_genome_Scf_Edwards2017.fasta -I CK_0_1.sam -O CK_0_1 -T 30 -A Nitab-v4.5_gene_models_Scf_Edwards2017.gtf

Where (note that -T here sets the number of CIRI2 threads, unlike the -T score threshold of bwa mem):

-I, --in  input SAM file name (required; generated by BWA-MEM)

-O, --out output circRNA list name (required)

-F, --ref_file FASTA file of all reference sequences

-A, --anno input GTF/GFF3 formatted annotation file name (optional)

The output is shown in the figure below:

Output format:

Column 1: circRNA_ID

Column 2: chromosome of a predicted circRNA

Column 3: circRNA_start

Column 4: circRNA_end

Column 5: circular junction read count of a predicted circRNA

Column 6: unique CIGAR types of a predicted circRNA. For example, if a circRNA has three junction reads, read A (80M20S, 80S20M), read B (80M20S, 80S20M) and read C (40M60S, 40S30M30S, 70S30M), then it has two SM types (80S20M, 70S30M), two MS types (80M20S, 40M60S) and one SMS type (40S30M30S). Thus its SM_MS_SMS should be 2_2_1.
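The counting in the Column 6 example can be reproduced with a short awk sketch. It is a simplified illustration, not CIRI2's own code: it assumes one CIGAR string per line and only recognizes pure S/M patterns. The function name cigar_types is hypothetical.

```shell
# Count unique SM, MS and SMS CIGAR types from CIGAR strings on stdin
# (one per line) and print them as SM_MS_SMS.
# Assumption: only plain soft-clip/match patterns are classified;
# anything else is ignored.
cigar_types() {
    awk '
        /^[0-9]+S[0-9]+M$/        { if (!($0 in sm))  { sm[$0] = 1;  n_sm++ } }
        /^[0-9]+M[0-9]+S$/        { if (!($0 in ms))  { ms[$0] = 1;  n_ms++ } }
        /^[0-9]+S[0-9]+M[0-9]+S$/ { if (!($0 in sms)) { sms[$0] = 1; n_sms++ } }
        END { printf "%d_%d_%d\n", n_sm, n_ms, n_sms }
    '
}

# CIGAR strings of reads A, B and C from the example above.
printf '%s\n' 80M20S 80S20M 80M20S 80S20M 40M60S 40S30M30S 70S30M | cigar_types
```

For the three example reads this prints 2_2_1, because the duplicate CIGAR strings of reads A and B are counted only once.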

Column 7: count of non-junction reads of a predicted circRNA, i.e. reads that map across the circular junction but are consistent with linear RNA rather than with back-splicing

Column 8: ratio of circular junction reads, calculated as 2*#junction_reads/(2*#junction_reads+#non_junction_reads). #junction_reads is multiplied by two because a junction read is generated from the two ends of the circular junction but counted only once, while a non-junction read comes from one end. Note that non-junction reads may still come from another, larger circRNA, so the junction_reads_ratio based on them may be an inaccurate estimate of the relative expression of the circRNA.
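As a quick check of the Column 8 formula, the ratio can be computed by hand with awk. This is a minimal sketch; the function name junction_ratio and the three-decimal rounding are choices made here for illustration, and NA is printed when both counts are zero.

```shell
# junction_ratio <junction_reads> <non_junction_reads>
# Computes 2*junction / (2*junction + non_junction).
junction_ratio() {
    awk -v j="$1" -v n="$2" 'BEGIN {
        d = 2 * j + n
        if (d == 0) { print "NA"; exit }   # avoid division by zero
        printf "%.3f\n", 2 * j / d
    }'
}

# Example: 10 junction reads and 5 non-junction reads give 20/25.
junction_ratio 10 5
```

With 10 junction reads and 5 non-junction reads this prints 0.800.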

Column 9: type of a circRNA according to positions of its two ends on chromosome (exon, intron or intergenic_region; only available when annotation file is provided)

Column 10: ID of the gene(s) in which an exonic or intronic circRNA is located

Column 11: strand info of a predicted circRNA (new in CIRI2)

Column 12: all of the circular junction read IDs (split by ",")
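Given the column layout above, a common follow-up is to keep only circRNAs supported by a minimum number of junction reads (Column 5). The sketch below assumes the output is tab-delimited with a single header line; the function name filter_circ and the threshold of 2 are illustrative, not recommendations from CIRI2.

```shell
# filter_circ <min_junction_reads> <ciri2_output_file>
# Keeps the header line and every row whose Column 5 (junction read
# count) is at least the given minimum.
# Assumption: tab-delimited file whose first line is a header.
filter_circ() {
    awk -F'\t' -v min="$1" 'NR == 1 || $5 + 0 >= min' "$2"
}

# Usage (CK_0_1 is the file written by the -O option above):
#   filter_circ 2 CK_0_1 > CK_0_1.min2.tsv
```

Because awk's default action is to print, the one-line condition is the whole filter; the remaining columns pass through unchanged.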
