Overview of CNV Detection Methods and Software for Whole-Exome Sequencing (WES) Data
Summary (based on limited reading):
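1. Most methods are depth-based: they partition the targets into intervals and test each interval, using FPKM or an FPKM-like measure computed from the number of bases in a region.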
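2. Most tools use control samples as a reference baseline, but the reference itself can have high variance. That variance may come from bias or from real CNVs, and the algorithm cannot remove it during processing. Most control-based algorithms are also likely to be inaccurate on common CNVs. For a CNV with a population frequency of 50%, for example, building the reference may treat the single-copy state as the normal diploid state, so that a truly diploid sample may be called as having three copies.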
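3. Exome data are usually corrected for coverage depth using GC content and mappability. In GC correction, for example, the normalized depth of a window w is generally its raw depth divided by the depth of the windows that share the same GC content.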
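4. CNV calling is usually preceded by quality control that removes strongly biased intervals, based on criteria such as per-interval coverage, overall sample coverage, and extreme GC content.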
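5. Common denoising methods are PCA and SVD (usually removing the top k components as noise). Applying them can also remove the signal of common CNVs, which is why some tools perform poorly on common CNVs. Some tools do account for common CNVs. One of the main improvements in CLAMMS is that it also models the characteristics of common CNVs, and it removes batch effects using the metrics produced by Picard rather than the depth files. CODEX2 can detect common CNVs across all samples even without normal controls (it still needs a batch of samples called together), and it performs well on the data in its paper.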
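6. Common CNV calling algorithms are HMM and CBS. Newer methods also use machine learning; other approaches are used less often and I find them harder to follow. Called regions generally span multiple exons. A few tools reach single-exon resolution, but they are rare and their recall is not great (deletions are relatively easier to detect).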
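The normalization steps in points 2, 3, and 5 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of any tool listed here: the function names (`gc_normalize`, `log2_ratio`, `svd_denoise`), the GC bin count, the use of the median as the per-bin and per-window summary, and the per-window centering before SVD are all choices made for this example.

```python
import numpy as np


def gc_normalize(depth, gc, n_bins=20):
    """Divide each window's depth by the median depth of windows in the same GC bin.

    depth: raw depth per window; gc: GC fraction (0..1) per window.
    Binning GC into n_bins equal-width bins is an assumption of this sketch.
    """
    depth = np.asarray(depth, dtype=float)
    gc = np.asarray(gc, dtype=float)
    # Map GC fraction to a bin index; GC == 1.0 falls into the last bin.
    bins = np.minimum((gc * n_bins).astype(int), n_bins - 1)
    out = np.full(depth.shape, np.nan)
    for b in np.unique(bins):
        mask = bins == b
        med = np.median(depth[mask])
        if med > 0:  # windows in a zero-depth bin stay NaN
            out[mask] = depth[mask] / med
    return out


def log2_ratio(sample, reference_panel, eps=1e-9):
    """log2 ratio of one sample against the per-window median of a reference panel.

    reference_panel has shape (n_reference_samples, n_windows).
    eps is a small pseudocount that avoids log2(0).
    """
    sample = np.asarray(sample, dtype=float)
    baseline = np.median(np.asarray(reference_panel, dtype=float), axis=0)
    return np.log2((sample + eps) / (baseline + eps))


def svd_denoise(matrix, k=1):
    """Remove the top k singular components from a (samples x windows) matrix.

    Each window is mean-centered across samples first, so any signal shared
    by many samples (batch effects, but also common CNVs) tends to be removed.
    """
    matrix = np.asarray(matrix, dtype=float)
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0  # drop the k strongest components as presumed noise
    return (u * s) @ vt
```

The common-CNV pitfall from point 2 shows up directly in `log2_ratio`: if most reference samples carry a single-copy deletion at a window (normalized depth around 0.5), the median baseline is 0.5, and a truly diploid sample with depth 1.0 gets a log2 ratio of about +1 and looks like a gain. Likewise, `svd_denoise` cannot tell a batch effect from a CNV shared by many samples, which matches the observation in point 5.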
Paper titles:
1. CopyDetective: Detection threshold-aware copy number variant calling in whole-exome sequencing data.
2. Detection of copy-number variations from NGS data using read depth information: a diagnostic performance evaluation.
3. Copy Number Variation Detection Using Total Variation.
4. A highly sensitive and specific workflow for detecting rare copy-number variants from exome sequencing data.
5. Copy number variation profiling in pharmacogenes using panel-based exome resequencing and correlation to human liver expression.
6. A machine-learning approach for accurate detection of copy number variants from exome sequencing.
7. Atlas-CNV: a validated approach to call single-exon CNVs in the eMERGESeq gene panel.
8. CODEX2: full-spectrum copy number variation detection by high-throughput DNA sequencing.
9. Clinical analysis of germline copy number variation in DMD using a non-conjugate hierarchical Bayesian model.
10. Preprocessing Sequence Coverage Data for More Precise Detection of Copy Number Variations.
11. Integrative DNA copy number detection and genotyping from sequencing and array-based platforms.
12. WISExome: a within-sample comparison approach to detect copy number variations in whole exome sequencing data.
13. Anaconda: AN automated pipeline for somatic COpy Number variation Detection and Annotation from tumor exome sequencing data.
14. ExCNVSS: A Noise-Robust Method for Copy Number Variation Detection in Whole Exome Sequencing Data.
15. Accurate clinical detection of exon copy number variants in a targeted NGS panel using DECoN.
16. Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort.
17. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing.
18. CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data.
19. A Sparse Model Based Detection of Copy Number Variations From Exome Sequencing Data.
20. DeAnnCNV: a tool for online detection and annotation of copy number variations from whole-exome sequencing data.
21. CopywriteR: DNA copy number detection from off-target sequence data.
22. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.
23. CODEX: a normalization and copy number variation detection method for whole exome sequencing.
24. Combinatorial approach to estimate copy number genotype using whole-exome sequencing data.
25. Assessing copy number from exome sequencing and exome array CGH based on CNV spectrum in a large clinical cohort.
26. Detection of internal exon deletion with exon Del.
27. cnvCapSeq: detecting copy number variation in long-range targeted resequencing data.
28. Inferring copy number and genotype in tumour exome data.
29. cnvOffSeq: detecting intergenic copy number variation using off-target exome sequencing data.
30. Identification of copy number variants from exome sequence data.
31. PatternCNV: a versatile tool for detecting copy number changes from exome sequencing data.
32. EXCAVATOR: detecting copy number variants from whole-exome sequencing data.
33. CoNVEX: copy number variation estimation in exome sequencing data using HMM.
34. Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation.
35. Modeling read counts for CNV detection in exome sequencing data.
36. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth.
37. An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis.
38. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling.
39. Copy number variation detection and genotyping from exome sequence data.
40. CONTRA: copy number analysis for targeted resequencing.
41. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.
42. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing.
43. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data.
44. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV.
45. CNV-seq, a new method to detect copy number variation using high-throughput sequencing.