pyclone的输入格式:通过GenomicRanges准备输入
2020-07-29 本文已影响0人
果蝇的小翅膀
问题:
在准备pyclone的输入的时候,需要点突变和对应的拷贝数信息。点突变的信息可以通过varscan获得,而拷贝数的信息可以通过sequenza获得,但是如何整合点突变和拷贝数的信息?
附pyclone的输入格式:
Tsv files for pyclone:
-mutation_id - A unique ID to identify the mutation.
-ref_counts - The number of reads covering the mutation which contain the reference (genome) allele.
-var_counts - The number of reads covering the mutation which contain the variant allele.
-normal_cn - The copy number of the cells in the normal population. For autosomal chromosomes this will be 2 and for sex chromosomes it could be either 1 or 2.
-minor_cn - The minor copy number of the cancer cells. Usually
-major_cn - The major copy number of the cancer cells.
实例:
在Bioconductor中的 GenomicRanges这个包主要用来处理这样的问题,可以参考其中的说明。
- 首先准备实例数据
library(GenomicRanges)
library(magrittr)
#准备实例数据
con<- textConnection("chr start end rs ref_counts var_counts
1 100 100 rs01 100 10
1 200 200 rs02 200 20
1 300 300 rs03 300 30")
snps <- read.delim(con, head=TRUE, sep="")
con <- textConnection("chr start end minor_cn major_cn
1 10 150 0 2
1 250 450 1 2
1 400 600 2 2")
cnvs <- read.delim(con, head=TRUE, sep="")
准备数据
- 通过GRanges函数将数据读取到GRanges中
gsnps = GRanges(seqnames = snps$chr ,
ranges = IRanges(snps$start , snps$end ),
strand = "+" )
#metadata columns can be added to a GRanges object,将表格中的其他信息添加到gsnps中
mcols(gsnps) = snps
gcnvs = GRanges(seqnames = cnvs$chr,
ranges = IRanges(cnvs$start , cnvs$end ),
strand = "+" )
#metadata columns can be added to a GRanges object
mcols(gcnvs) = cnvs
- 通过findOverlaps寻找它们之间的交集。根据overlaps可以看到它们的交集情况,最后得到的merge就是pyclone的输入了。
overlaps =findOverlaps(gsnps, gcnvs )
overlaps
#Hits object with 2 hits and 0 metadata columns:
# queryHits subjectHits
# <integer> <integer>
# [1] 1 1
# [2] 3 2
-------
#queryLength: 3 / subjectLength: 3
#合并信息。
merge = cbind( mcols(gsnps[queryHits(overlaps), ]) , mcols(gcnvs[subjectHits(overlaps) ,]) )
#整合信息
merge = as.data.frame(merge) %>%
dplyr::mutate(mutation_id = rs,
normal_cn = 2,
) %>%
dplyr::select(mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn)
merge
# mutation_id ref_counts var_counts normal_cn minor_cn major_cn
#1 rs01 100 10 2 0 2
#2 rs03 300 30 2 1 2
参考文献: