pyclone的输入格式：通过GenomicRanges准备输入

2020-07-29 本文已影响0人果蝇的小翅膀

问题：

在准备pyclone的输入的时候，需要点突变和对应的拷贝数信息。点突变的信息可以通过varscan获得，而拷贝数的信息可以通过sequenza获得，但是如何整合点突变和拷贝数的信息？

附pyclone的输入格式：

Tsv files for pyclone:
-mutation_id - A unique ID to identify the mutation.
-ref_counts - The number of reads covering the mutation which contain the reference (genome) allele.
-var_counts - The number of reads covering the mutation which contain the variant allele.
-normal_cn - The copy number of the cells in the normal population. For autosomal chromosomes this will be 2 and for sex chromosomes it could be either 1 or 2.
-minor_cn - The minor copy number of the cancer cells. Usually
-major_cn - The major copy number of the cancer cells.

实例：

在Bioconductor中的 GenomicRanges这个包主要用来处理这样的问题,可以参考其中的说明。

首先准备实例数据

library(GenomicRanges)
library(magrittr)
#准备实例数据
con<- textConnection("chr     start    end   rs   ref_counts  var_counts
1           100          100      rs01   100      10
1           200          200      rs02   200      20
1           300          300      rs03   300      30")
snps <- read.delim(con, head=TRUE, sep="")

con <- textConnection("chr     start    end      minor_cn  major_cn
1           10          150      0     2
1           250          450      1     2
1           400          600      2     2")  
cnvs <- read.delim(con, head=TRUE, sep="")

准备数据

通过GRanges函数将数据读取到GRanges中

gsnps = GRanges(seqnames = snps$chr ,
               ranges = IRanges(snps$start , snps$end ),
               strand = "+" )
#metadata columns can be added to a GRanges object,将表格中的其他信息添加到gsnps中
mcols(gsnps) = snps

gcnvs = GRanges(seqnames = cnvs$chr,
               ranges = IRanges(cnvs$start , cnvs$end ),
               strand = "+" )
#metadata columns can be added to a GRanges object
mcols(gcnvs) = cnvs

通过findOverlaps寻找它们之间的交集。根据overlaps可以看到它们的交集情况,最后得到的merge就是pyclone的输入了。

overlaps =findOverlaps(gsnps, gcnvs )
overlaps
#Hits object with 2 hits and 0 metadata columns:
#      queryHits subjectHits
#     <integer>   <integer>
# [1]         1           1
# [2]         3           2
  -------
#queryLength: 3 / subjectLength: 3

#合并信息。
merge = cbind( mcols(gsnps[queryHits(overlaps), ]) , mcols(gcnvs[subjectHits(overlaps) ,]) )

#整合信息
merge = as.data.frame(merge) %>%
     dplyr::mutate(mutation_id = rs,
                normal_cn = 2,
                ) %>%
  dplyr::select(mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn)

merge
# mutation_id ref_counts var_counts normal_cn minor_cn major_cn
#1        rs01        100         10         2        0        2
#2        rs03        300         30         2        1        2

参考文献：

pyclone的输入格式：通过GenomicRanges准备输入

问题：

实例：

猜你喜欢

热点阅读