Absolute quantification of somatic DNA alterations in human cancer --- ABSOLUTE

2020-04-12, by 林枫bioinfo

1. Overview

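ABSOLUTE is an algorithm from the Broad Institute, the well-known US biomedical research institute. It infers tumor purity and the ploidy of the malignant cells from somatic copy-number variation (CNV) and single-nucleotide variant (SNV) data of a tumor sample. A corresponding R package is available and can be used directly. The paper was published in Nature Biotechnology in 2012 and has since been cited more than a thousand times.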

The paper first discusses why inferring absolute copy number is difficult:

Inferring absolute copy number is more difficult for three reasons: (i) cancer cells are nearly always intermixed with an unknown fraction of normal cells (tumor purity); (ii) the actual DNA content of the cancer cells (ploidy), resulting from gross numerical and structural chromosomal abnormalities, is unknown [9–13]; and (iii) the cancer cell population may be heterogeneous, perhaps owing to ongoing subclonal evolution.

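It then describes the inference framework of ABSOLUTE:
Consider a cancer tissue sample that is a mixture of cells. A fraction α of the cells are cancer cells (assumed to be monogenomic, i.e., all cancer cells share the same copy-number profile), and the remaining fraction 1 - α are normal cells (diploid). For each locus x in the genome, let q(x) be the integer copy number of that locus in the cancer cells. Let τ be the mean ploidy of the cancer cells, defined as the average of q(x) over the whole genome.
In the mixed sample, the average absolute copy number at locus x is then αq(x) + 2(1 − α), and the mean ploidy D of the sample is ατ + 2(1 − α).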

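The relative copy number at locus x is therefore:

$$R(x) = \frac{\alpha q(x) + 2(1-\alpha)}{D} = \frac{\alpha}{D}\,q(x) + \frac{2(1-\alpha)}{D}$$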

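Because q(x) is an integer, R(x) takes discrete values. Its smallest possible value is 2(1 − α)/D, which occurs at homozygously deleted loci (q(x) = 0), where the remaining signal comes only from the DNA of normal cells.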
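The discreteness of R(x) is easy to see numerically. The following is a minimal sketch in R, not code from the ABSOLUTE package: the function name relative_copy_number and the values α = 0.6, τ = 3 are assumptions chosen only for illustration.

```r
## Toy illustration of the mixture model; NOT part of the ABSOLUTE package.
## relative_copy_number() is a hypothetical helper name.
relative_copy_number <- function(q, alpha, tau) {
  D <- alpha * tau + 2 * (1 - alpha)   # mean ploidy of the mixed sample
  (alpha * q + 2 * (1 - alpha)) / D    # R(x) for integer copy number q
}

alpha <- 0.6   # assumed tumor purity
tau   <- 3     # assumed mean ploidy of the cancer cells
q     <- 0:5   # integer copy numbers in the cancer cells

R <- relative_copy_number(q, alpha, tau)
print(round(R, 3))        # evenly spaced levels, one per integer q
print(round(diff(R), 3))  # constant spacing, equal to alpha / D
```

The levels are evenly spaced by α/D, the lowest one (q = 0) equals 2(1 − α)/D, and a locus whose copy number equals τ sits at R = 1.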
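SNV data provide further supporting information, because the allelic fraction of a somatic mutation also depends on α and q(x).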
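One way to see why SNVs help is to apply the same mixture model to a point mutation. This is a sketch derived from the definitions above rather than a formula quoted from the source, and the symbols s (the number of mutated copies per cancer cell, with s ≤ q(x)) and f (the expected allelic fraction) are introduced here as assumptions. If the mutation is present in every cancer cell and absent from normal cells, the mutated copies make up α·s of the total αq(x) + 2(1 − α) copies:

```latex
f = \frac{\alpha\, s}{\alpha\, q(x) + 2(1-\alpha)}
```

For example, with α = 0.6, q(x) = 3 and s = 1, f = 0.6 / 2.6 ≈ 0.23. Observed allelic fractions therefore constrain α and q(x) independently of the copy-number signal.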

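The ABSOLUTE algorithm optimizes α and τ, i.e., tumor purity and the mean ploidy of the cancer cells, under its models.
The full algorithm is complex and I do not fully understand it yet. For now I use the R package directly and will revisit the details later.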
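To give a feeling for what "inferring α and τ" means, the relative copy-number relation can be inverted by hand in an idealized case. The sketch below is not the ABSOLUTE algorithm: it assumes noise-free data and assumes that the lowest observed level really is a homozygous deletion (q = 0), which real data do not tell you. The function name infer_purity_ploidy and its arguments b (lowest level) and delta (spacing between adjacent levels) are hypothetical.

```r
## Idealized inversion of R(q) = (alpha * q + 2 * (1 - alpha)) / D.
## NOT the ABSOLUTE algorithm; assumes noise-free levels and that the
## lowest level b corresponds to q = 0 (homozygous deletion).
infer_purity_ploidy <- function(b, delta) {
  ## b     = 2 * (1 - alpha) / D   (level at q = 0)
  ## delta = alpha / D             (spacing between adjacent levels)
  alpha <- 2 * delta / (b + 2 * delta)
  D     <- alpha / delta                     # mean ploidy of the mixed sample
  tau   <- (D - 2 * (1 - alpha)) / alpha     # mean ploidy of the cancer cells
  list(alpha = alpha, tau = tau, D = D)
}

## Levels generated with alpha = 0.6 and tau = 3 (so D = 2.6)
fit <- infer_purity_ploidy(b = 0.8 / 2.6, delta = 0.6 / 2.6)
print(fit)
```

If the lowest level is assigned to a different integer copy number, the same data yield a different (α, τ) pair, which is one reason the example workflow in the next section includes a manual review step for choosing among solutions.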

2. Usage

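The R package can be downloaded from the official site at https://software.broadinstitute.org/cancer/cga/absolute_download (registration is required first).
The example code is at https://software.broadinstitute.org/cancer/cga/absolute_run and is reproduced below. It is built around three main functions: DoAbsolute(), a wrapper defined in the script that sets the value of each parameter; RunAbsolute(), which performs the actual run; and CreateReviewObject(), which consolidates the results. Because the code is designed to process only one sample per run, multiple samples can be processed in parallel with the help of suitable R packages (foreach in this example).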

## Wrapper: set the parameters for one sample (scan) and run ABSOLUTE on it
DoAbsolute <- function(scan, sif) {
  registerDoSEQ()
  library(ABSOLUTE)
  plate.name <- "DRAWS"
  genome <- "hg18"
  platform <- "SNP_250K_STY"
  primary.disease <- sif[scan, "PRIMARY_DISEASE"]
  sample.name <- sif[scan, "SAMPLE_NAME"]
  sigma.p <- 0
  max.sigma.h <- 0.02
  min.ploidy <- 0.95
  max.ploidy <- 10
  max.as.seg.count <- 1500
  max.non.clonal <- 0
  max.neg.genome <- 0
  copy_num_type <- "allelic"
  seg.dat.fn <- file.path("output", scan, "hapseg",
                          paste(plate.name, "_", scan, "_segdat.RData", sep=""))
  results.dir <- file.path(".", "output", scan, "absolute")
  print(paste("Starting scan", scan, "at", results.dir))
  log.dir <- file.path(".", "output", "abs_logs")
  if (!file.exists(log.dir)) {
     dir.create(log.dir, recursive=TRUE)
  }
  if (!file.exists(results.dir)) {
     dir.create(results.dir, recursive=TRUE)
  }
  sink(file=file.path(log.dir, paste(scan, ".abs.out.txt", sep="")))
  on.exit(sink(), add=TRUE)  ## restore console output even if RunAbsolute() fails
  RunAbsolute(seg.dat.fn, sigma.p, max.sigma.h, min.ploidy, max.ploidy, primary.disease,
              platform, sample.name, results.dir, max.as.seg.count, max.non.clonal,
              max.neg.genome, copy_num_type, verbose=TRUE)
  invisible(NULL)
}
arrays.txt <- "./paper_example/mix250K_arrays.txt"
sif.txt <- "./paper_example/mix_250K_SIF.txt"
## read in array names
scans <- readLines(arrays.txt)[-1]
sif <- read.delim(sif.txt, as.is=TRUE)
library(foreach)
## library(doMC)
## registerDoMC(20)
foreach (scan=scans, .combine=c) %dopar% {
  DoAbsolute(scan, sif)
}
obj.name <- "DRAWS_summary"
results.dir <- file.path(".", "output", "abs_summary")
absolute.files <- file.path(".", "output",
                            scans, "absolute",
                            paste(scans, ".ABSOLUTE.RData", sep=""))
library(ABSOLUTE)
CreateReviewObject(obj.name, absolute.files, results.dir, "allelic", verbose=TRUE)
## At this point you'd perform your manual review and mark up the file 
## output/abs_summary/DRAWS_summary.PP-calls_tab.txt by prepending a column with
## your desired solution calls. After that (or w/o doing that if you choose to accept
## the defaults, which is what running this code will do) run the following command:
calls.path <- file.path("output", "abs_summary", "DRAWS_summary.PP-calls_tab.txt")
modes.path <- file.path("output", "abs_summary", "DRAWS_summary.PP-modes.data.RData")
output.path <- file.path("output", "abs_extract")
ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "allelic")
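In the script above the doMC lines are commented out, so the foreach loop has no parallel backend registered and processes the samples one after another. As a rough sketch of the one-sample-per-call pattern run in parallel, the following uses mclapply() from R's bundled parallel package. The function run_one_sample is a hypothetical stand-in for DoAbsolute() so that the example runs without ABSOLUTE installed, and the scan names are made up.

```r
library(parallel)  # ships with base R

## Hypothetical stand-in for DoAbsolute(scan, sif); it only returns a message.
run_one_sample <- function(scan) {
  paste("finished", scan)
}

scans <- c("scanA", "scanB", "scanC")  # made-up sample names

## mclapply() forks worker processes; forking is not available on Windows,
## where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

results <- mclapply(scans, run_one_sample, mc.cores = n_cores)
names(results) <- scans
print(results)
```

Each worker handles a whole sample, which matches the script's layout of one output directory and one log file per scan.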

References:
http://www.broadinstitute.org/cancer/cga/ABSOLUTE
https://www.genepattern.org/modules/docs/ABSOLUTE/2
https://www.jianshu.com/p/468077752689
https://www.jianshu.com/p/388fb14989df
