SBS analysis: biomarker development from low-depth cfDNA sequencing

2023-10-20  信你个鬼

Basic hypothesis:

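Because dividing cells release DNA fragments into the circulation as cell-free DNA (cfDNA), cfDNA may reflect the mutational signatures of somatic tissues.
Point mutations in plasma whole genome sequencing (WGS) are hard to identify with conventional mutation calling because of low sequencing coverage and low mutant allele fractions.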

Main content of the paper:

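In this proof-of-concept study of plasma WGS at 0.3–1.5× coverage from 215 patients and 227 healthy individuals, the authors show that both pathological and physiological mutational signatures can be identified in plasma. Applying machine learning to the mutation profiles distinguished patients with stage I–IV cancer from healthy individuals with an area under the curve (AUC) of 0.96. Interrogating mutational processes in plasma may enable earlier cancer detection and might allow assessment of cancer risk and etiology.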

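The study developed a pipeline, Pointy, that extracts point mutations from low-coverage plasma WGS.
Code (not open source; available on request for academic use only, with a low approval rate): https://doi.org/10.5281/zenodo.6666951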

Basic workflow of the software:

[Figure: Pointy workflow]

Overall study design:

[Figure: overall study design]

Main results

Result 1: Pointy data description and normalization

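Data: 37 samples from the PGDX cohort, with a median sequencing depth of 31.0 × 10⁶ reads and a duplication rate of 0.37%.

These data were downsampled to a target of 0.3× (10M paired-end reads) before analysis.
Coverage and related statistics of the data: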

[Figure: coverage statistics of the downsampled PGDX data]
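The downsampling step can be illustrated with a minimal Python sketch. It assumes that the median depth (31.0 × 10⁶) and the 10M target are counted in the same unit, and that downsampling keeps each read pair independently with a fixed probability (similar in spirit to `samtools view -s`). The 100 bp read length and the 3.1 × 10⁹ bp genome size in the coverage example are illustrative values, not figures from the paper, and all function names are hypothetical.

```python
import hashlib


def downsample_fraction(total_reads: int, target_reads: int) -> float:
    """Fraction of reads (or pairs) to keep to reach the target; capped at 1."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return min(1.0, target_reads / total_reads)


def keep_pair(read_name: str, fraction: float, seed: int = 42) -> bool:
    """Keep/drop decision derived from a hash of the read name.

    Both mates of a pair share a name, so they are always kept or dropped together.
    """
    digest = hashlib.sha1(f"{seed}:{read_name}".encode()).digest()
    # Take the top 53 bits so the division is exact and u stays in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction


def mean_coverage(n_reads: int, read_length: int, genome_size: float = 3.1e9) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


frac = downsample_fraction(31_000_000, 10_000_000)
print(f"keep fraction: {frac:.3f}")                        # -> 0.323
print(f"coverage: {mean_coverage(10_000_000, 100):.2f}x")  # -> 0.32x
```

Hashing the read name, rather than drawing a fresh random number per record, keeps both mates of a pair together, which matters for paired-end data.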

GC correction

The PGDX cohort was sequenced in two batches. A PCA of the SBS profiles of the cohort's healthy samples revealed a clear batch effect, which was likely driven by GC content:

[Figure: PCA of SBS profiles of healthy PGDX samples]

Moreover, the per-sample contribution of PC1 differed significantly between the two sequencing batches:

[Figure: per-sample PC1 contribution by sequencing batch]
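The batch check described above amounts to running PCA on per-sample SBS96 proportions and then comparing PC1 scores between batches. What follows is a minimal numpy sketch. It assumes the profiles are normalized to proportions first. The two "batches" are synthetic, and a hypothetical shift in 16 channels stands in for a GC-related artifact.

```python
import numpy as np


def normalize_profiles(counts: np.ndarray) -> np.ndarray:
    """Raw SBS96 counts (samples x 96) -> per-sample proportions."""
    return counts / counts.sum(axis=1, keepdims=True)


def pca_scores(x: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA via SVD of the column-centred matrix; returns sample scores."""
    centred = x - x.mean(axis=0)
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic healthy samples from two batches; batch B has a hypothetical
# artefact that inflates the first 16 channels.
rng = np.random.default_rng(0)
base = rng.dirichlet(np.ones(96))
shifted = base.copy()
shifted[:16] *= 1.5
shifted /= shifted.sum()
counts = np.vstack([
    rng.multinomial(5000, base, size=20),
    rng.multinomial(5000, shifted, size=20),
])
batch = np.array(["A"] * 20 + ["B"] * 20)

pc1 = pca_scores(normalize_profiles(counts))[:, 0]
print(f"mean PC1, batch A: {pc1[batch == 'A'].mean():+.4f}")
print(f"mean PC1, batch B: {pc1[batch == 'B'].mean():+.4f}")
```

The sign of PC1 is arbitrary. With real data, the per-batch PC1 scores could be compared with a rank test such as the Wilcoxon rank-sum test.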

The authors therefore corrected each sample's SBS profile for GC content.
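This post does not describe how the GC correction was done. As one hypothetical illustration, the sketch below removes a linear trend against per-sample GC content from each of the 96 SBS channels. This is a swapped-in technique, not necessarily the authors' method, and a LOESS fit is a common non-linear alternative. The function name and the synthetic data are assumptions.

```python
import numpy as np


def gc_correct_profiles(props: np.ndarray, gc: np.ndarray) -> np.ndarray:
    """Remove a linear GC trend from each SBS channel (hypothetical approach).

    props: samples x 96 matrix of per-sample SBS proportions
    gc:    per-sample GC fraction (e.g. mean GC of the aligned fragments)
    """
    gc_c = gc - gc.mean()
    # Least-squares slope of every channel on centred GC, computed at once.
    slopes = gc_c @ (props - props.mean(axis=0)) / (gc_c @ gc_c)
    corrected = props - np.outer(gc_c, slopes)
    # Guard against small negative values, then renormalise each row to 1.
    corrected = np.clip(corrected, 0.0, None)
    return corrected / corrected.sum(axis=1, keepdims=True)


# Synthetic example: channel 0 rises with GC and channel 1 compensates,
# so each row still sums to 1.
rng = np.random.default_rng(1)
gc = rng.uniform(0.38, 0.45, size=50)
props = np.full((50, 96), 1 / 96)
props[:, 0] += 0.1 * (gc - gc.mean())
props[:, 1] -= 0.1 * (gc - gc.mean())

corrected = gc_correct_profiles(props, gc)
print("corr(GC, channel 0) before:", round(np.corrcoef(gc, props[:, 0])[0, 1], 3))
print("channel 0 range after:", corrected[:, 0].max() - corrected[:, 0].min())
```

Because the slopes across the 96 channels sum to zero, subtracting the fitted trend keeps each row summing to 1 before any clipping.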

Result 2: Detection of mutational signatures in colorectal cancer (CRC) plasma

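Mutational signatures were analyzed mainly with the MutationalPatterns R package.
In CRC samples, the plasma Pointy signatures with the largest contributions were SBS1 (aging) and SBS54 (probable SNP contamination), with 339 (13.0%) and 379 (15.0%) mutations, respectively (panel a).
Pointy signatures that differed between CRC and healthy samples included SBS1 and SBS21 (microsatellite instability, MSI).
The authors also correlated these signature contributions with ctDNA fraction (panel c) and tumor mutation burden (TMB) (panel d); both correlations were significant.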

[Figure: plasma mutational signatures in CRC (panels a–d)]
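The refitting step behind these per-signature contributions can be sketched in Python. MutationalPatterns' `fit_to_signatures` finds the non-negative linear combination of reference signatures that best reconstructs a sample's mutation matrix, which is a non-negative least-squares (NNLS) problem. The sketch below solves the same kind of problem for one sample with `scipy.optimize.nnls`. It is an analogue, not the package's code, and the three random signatures are placeholders for the COSMIC SBS matrix.

```python
import numpy as np
from scipy.optimize import nnls


def fit_signatures(profile: np.ndarray, signatures: np.ndarray) -> np.ndarray:
    """Non-negative least-squares refit of one sample.

    profile:    length-96 vector of mutation counts
    signatures: 96 x k matrix, one reference signature per column
    returns:    length-k vector of non-negative contributions (mutations)
    """
    contributions, _rnorm = nnls(signatures, profile)
    return contributions


# Placeholder signatures (random, NOT the COSMIC matrix) and a noise-free profile.
rng = np.random.default_rng(0)
sigs = rng.dirichlet(np.ones(96), size=3).T  # shape (96, 3)
true = np.array([300.0, 0.0, 150.0])
profile = sigs @ true

fitted = fit_signatures(profile, sigs)
print(np.round(fitted, 1))                 # absolute contributions (mutations)
print(np.round(fitted / fitted.sum(), 3))  # relative contributions
```

Absolute contributions correspond to the "number of mutations" figures quoted above. Dividing by their sum gives the percentage form.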

Result 3: Colorectal cancer detection

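Cancer and healthy samples were classified from their SBS mutation profiles: the SBS profiles were reduced with PCA, and the principal components were used as input to the machine learning algorithms.
The authors tested four algorithms: xgboost, random forest (RF), support vector machine (SVM), and logistic regression.
Each was evaluated with nested 10-fold cross-validation repeated 10 times, and random forest performed best.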

[Figure: classifier performance for CRC detection]
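The classification setup (PCA on SBS profiles, then a classifier, evaluated with nested cross-validation) can be sketched with scikit-learn. This minimal sketch rests on several assumptions. The inner loop tunes only the number of principal components, which is a hypothetical choice. The profiles are synthetic. The defaults use fewer folds and repeats than the paper's 10-fold × 10 repeats so the run stays short. Fitting PCA inside the pipeline ensures it never sees the held-out fold.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def nested_cv_auc(X, y, n_repeats=2, outer_folds=5, inner_folds=3, seed=0):
    """Repeated nested CV; returns the mean outer-fold ROC AUC.

    Inner loop: choose the number of PCs (hypothetical tuning grid).
    Outer loop: score the tuned model on held-out samples.
    """
    pipe = Pipeline([
        ("pca", PCA()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ])
    grid = {"pca__n_components": [5, 10]}
    aucs = []
    for r in range(n_repeats):
        inner = StratifiedKFold(n_splits=inner_folds, shuffle=True, random_state=seed + r)
        outer = StratifiedKFold(n_splits=outer_folds, shuffle=True, random_state=seed + 1000 + r)
        search = GridSearchCV(pipe, grid, cv=inner, scoring="roc_auc")
        aucs.extend(cross_val_score(search, X, y, cv=outer, scoring="roc_auc"))
    return float(np.mean(aucs))


# Synthetic SBS96 proportions: "cancer" samples over-represent 8 channels.
rng = np.random.default_rng(0)
base = rng.dirichlet(np.ones(96))
shifted = base.copy()
shifted[:8] *= 2.0
shifted /= shifted.sum()
counts = np.vstack([
    rng.multinomial(2000, base, size=60),
    rng.multinomial(2000, shifted, size=60),
])
X = counts / counts.sum(axis=1, keepdims=True)
y = np.array([0] * 60 + [1] * 60)

auc = nested_cv_auc(X, y)
print(f"mean outer-fold AUC: {auc:.3f}")
```

To compare the four algorithms as the paper did, the `("rf", ...)` step could be swapped for other scikit-learn classifiers, such as `LogisticRegression` or `SVC(probability=True)`, while the outer loop stays unchanged.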

Result 4: Signatures across multiple cancer types

Cohort 2: the DELFI cohort (plasma WGS data).

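Fig. 4a: across the whole cohort, the proportion of patients with ≥1 detected signature ranged from 0.38 (pancreatic cancer) to 0.85 (NSCLC).
Fig. 4b: by stage, the detection rate of ≥1 signature ranged from 0.70 for stage I disease to 0.75 for stage IV disease.
Fig. 4c: among patients with stage I–IV colorectal cancer, a plasma signal was detected in 21 of 27.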

[Figure 4: signature detection in the DELFI cohort]
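The detection rates in Fig. 4 are proportions of patients with at least one detected signature, computed per cancer type or per stage. This is a minimal pandas sketch. The column names are hypothetical, and the toy table reproduces only the CRC count (21 of 27, about 0.78).

```python
import pandas as pd


def detection_rate(df: pd.DataFrame, group_col: str = "cancer_type") -> pd.Series:
    """Fraction of patients with >= 1 detected signature in each group."""
    detected = df["n_signatures_detected"] >= 1
    return detected.groupby(df[group_col]).mean()


# Hypothetical toy table: 27 CRC patients, 21 with at least one signature detected.
toy = pd.DataFrame({
    "cancer_type": ["CRC"] * 27,
    "n_signatures_detected": [1] * 21 + [0] * 6,
})
rates = detection_rate(toy)
print(round(rates["CRC"], 3))  # -> 0.778
```

Passing `group_col="stage"` to a table that has a stage column would give the per-stage rates in Fig. 4b.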

Result 5: Cancer classification across cancer types

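Samples were classified as healthy or cancer with a random forest model, using nested 10-fold cross-validation and 500 iterations. For overall detection across all cancer types and stages (n = 199), the AUC was 0.96 (95% CI 0.94–0.98).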

[Figure: cross-cancer classification performance]
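The AUC above is reported with a 95% CI, but this summary does not say how the interval was computed. A percentile bootstrap over samples is one common option. The sketch below assumes that method and uses synthetic scores in place of the random forest's cross-validated predictions. The function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_with_bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """ROC AUC with a percentile-bootstrap (1 - alpha) confidence interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(y_true, y_score)
    boots = []
    while len(boots) < n_boot:
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if np.unique(y_true[idx]).size < 2:
            continue  # resample has one class only; AUC undefined, draw again
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return auc, lo, hi


# Synthetic scores standing in for cross-validated classifier outputs.
rng = np.random.default_rng(1)
y = np.array([0] * 100 + [1] * 100)
scores = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
auc, lo, hi = auc_with_bootstrap_ci(y, scores, n_boot=1000)
print(f"AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```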

Overall, the paper uses SBS mutation profiles to build a range of classification models for cancer detection.

Further reading: SBS classification

Ref: The repertoire of mutational signatures in human cancer. Nature. 2020 Feb;578(7793):94-101. doi: 10.1038/s41586-020-1943-3. Epub 2020 Feb 5

We developed classifications for each type of mutation. For SBSs, the primary classification comprised 96 classes (available at https://cancer.sanger.ac.uk/cosmic/signatures/SBS) constituted by the 6 base substitutions C>A, C>G, C>T, T>A, T>C and T>G (in which the mutated base is represented by the pyrimidine of the base pair), plus the flanking 5′ and 3′ bases. In some analyses, two flanking bases 5′ and 3′ to the mutated base were considered (producing 1,536 classes) or mutations within transcribed genome regions were selected and classified according to whether the mutated pyrimidine fell on the transcribed or untranscribed strand (producing 192 classes). We also derived a classification for DBSs (78 classes; available at https://cancer.sanger.ac.uk/cosmic/signatures/DBS). Indels were classified as deletions or insertions and—when of a single base—as C or T, and according to the length of the mononucleotide repeat tract in which they occurred. Longer indels were classified as occurring at repeats or with overlapping microhomology at deletion boundaries, and according to the size of indel, repeat and microhomology (83 classes; available at https://cancer.sanger.ac.uk/cosmic/signatures/ID)
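The 96-class scheme described above can be made concrete with a small Python function. It maps a single-base substitution and its trinucleotide context to a COSMIC-style label such as `A[C>T]G`. To apply the pyrimidine convention, it reverse-complements any context whose reference base is a purine. The function names are hypothetical. Real pipelines typically do this with existing tools, for example MutationalPatterns' `mut_matrix`.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_class(ref_context: str, alt: str) -> str:
    """Return the SBS96 label (e.g. 'A[C>T]G') for one substitution.

    ref_context: 3 reference bases, 5'->3', with the mutated base in the middle
    alt:         substituted base, on the same strand as ref_context
    """
    ref_context, alt = ref_context.upper(), alt.upper()
    if len(ref_context) != 3 or not set(ref_context) <= set("ACGT"):
        raise ValueError("ref_context must be 3 bases from ACGT")
    if len(alt) != 1 or alt not in "ACGT" or alt == ref_context[1]:
        raise ValueError("alt must be a single base different from the reference")
    if ref_context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand
        # (reverse complement) so the mutated base is a pyrimidine.
        ref_context = ref_context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    five, ref, three = ref_context
    return f"{five}[{ref}>{alt}]{three}"


def all_sbs96_classes() -> list:
    """All 96 labels, grouped by substitution type, then 5' base, then 3' base."""
    subs = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{five}[{ref}>{alt}]{three}"
            for ref, alt in subs for five in "ACGT" for three in "ACGT"]


print(sbs96_class("AGC", "A"))   # -> G[C>T]T
print(len(all_sbs96_classes()))  # -> 96
```

The count of 96 follows from 6 pyrimidine substitutions × 4 possible 5′ bases × 4 possible 3′ bases. Counting the labels of all SNVs in a sample gives the 96-channel profile used in the analyses above.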

Reference
Genome-wide mutational signatures in low coverage whole genome sequencing of cell-free DNA
Nat Commun. 2022 Aug 23;13(1):4953. doi: 10.1038/s41467-022-32598-1
