[R]bioconductor之GenomicInteracti

2020-10-19  本文已影响0人  小贝学生信

GenomicInteractions包是Utilities for handling genomic interaction data,可用于分析chiapet与Hi-C数据。本次主要学习分析Hi-C数据的相关用法
https://www.bioconductor.org/packages/release/bioc/html/GenomicInteractions.html
https://www.bioconductor.org/packages/release/bioc/vignettes/GenomicInteractions/inst/doc/hic_vignette.html

一、安装包,加载实例数据

1、安装包

BiocManager::install("GenomicInteractions")
library(GenomicInteractions)

2、示例文件

#示例文件地址
hic_file <- system.file("extdata", "Seitan2013_WT_100kb_interactions.txt",
                        package="GenomicInteractions")
hic_file
#[1] "C:/Users/Dell/Documents/R/win-library/3.6/GenomicInteractions/extdata/Seitan2013_WT_100kb_interactions.txt"

#查看该txt文件
a <- read.table(file = hic_file, header = T)
dim(a)
a[1:6,1:12]
a[1:6,1:12]
#创建GenomicInteractions对象
hic_data <- makeGenomicInteractionsFromFile(hic_file,
                                            type="homer",
                                            experiment_name = "HiC 100kb",
                                         description = "HiC 100kb resolution")
seqlengths(hic_data) <- c(chr15 = 103494974, chr14 = 125194864)
head(hic_data[,1:2])

如下图,GenomicInteractions对象类似Genomicranges对象也是有两部分组成:左边为interactions的两个序列坐标信息,右边为metadata,暂时只看两列信息。


head(hic_data[,1:2])

二、regions/bins相关统计操作

如上所述,interactions at a resolution of 100kb. 因此基本regions的宽度为100000

summary(width(regions(hic_data)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   89536  100000  100000   99991  100000  100000

Some anchors are shorter than 100kb due to the bin being at the end of a chromosome.

regions(hic_data)
anchors(hic_data)
anchors(hic_data,type="first")
anchorOne(hic_data)

anchors(hic_data,type="second") 
anchorTwo(hic_data)
summary(calculateDistances(hic_data,method="midpoint"))
#    Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
#  100000   4600000  18700000  24875054  39200000 116600000        34

三、interactions相关统计操作/mcols

mcols(hic_data)
mcols(hic_data)$counts
mcols(hic_data)$InteractionID
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2))
hic_data$p.value <- exp(hic_data$LogP)
plot(density(hic_data$p.value))
hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
hic_data$fdr <- hic_data$FDR.Benjamini..based.on.3.68e.08.total.tests.
plot(density(hic_data$fdr))
hic_data$p.value & hic_data$fdr
#mcols(hic_data)$counts
head(interactionCounts(hic_data))
#447000 reads totally
sum(interactionCounts(hic_data))
#23276 interactions in total
length(interactionCounts(hic_data))
#平均interactions覆盖reads数
mean(interactionCounts(hic_data))
hist & density
p1 <- plotCisTrans(hic_data) +
  labs(title=" ")
sum(is.trans(hic_data_subset))
p2 <- plotCisTrans(hic_data_subset) +
  labs(title=" ")
library(patchwork)
p1 + labs(title = "hic_data")| 
  p2 + labs(title = "hic_data_subset")

cis在同一染色体,trans在染色体。可以看到在示例数据中大部分都是cis-interaction


plotCisTrans
p3 <- plotCounts(hic_data, cut=30)
#cut可以避免极端值影响绘图
p4 <- plotCounts(hic_data_subset, cut=30)
p3 + labs(title = "hic_data")| 
  p4 + labs(title = "hic_data_subset")
plotCounts

四、Interactions Annotation

1、注释文件

data("mm9_refseq_promoters")
data("thymus_enhancers")
mm9_refseq_promoters
thymus_enhancers
annotation.features <- list(promoter = mm9_refseq_promoters, enhancer = thymus_enh)
annotation.features
annotation.features

2、annotateInteractions注释

annotateInteractions(hic_data_subset, annotation.features)
head(regions(hic_data_subset))
head(regions(hic_data_subset)$promoter.id)
head(regions(hic_data_subset)$enhancer.id)

如上即为所有的bins注释了promoter与enhancer的重叠概况。
Each anchor may overlap multiple features of each type, so the columns containing feature names or IDs are stored as lists.

head(regions(hic_data_subset))
regions(hic_data_subset)$node.class
table(regions(hic_data_subset)$node.class)
#  distal enhancer promoter 
#     989      275      890

如上分为三大类,当某一个bin同时注释到enhancer与promoter时,结果也只能注释一类,按照创造annotation.features对象时的顺序给出一个。
Any anchors which are not annotated with any of the given features will be assigned the class “distal”

3、Interaction types

如上共有三种node,两两组合共有6种可能

plotInteractionAnnotations(hic_data_subset, legend = TRUE)
plotInteractionAnnotations
length(hic_data_subset[isInteractionType(hic_data_subset, "promoter", "distal")])
# [1] 492
length(hic_data_subset[is.pd(hic_data_subset)])
# [1] 492
hic_data_ep <- hic_data_subset[isInteractionType(hic_data_subset, "promoter", "enhancer")]
max(interactionCounts(hic_data_ep))

most_counts <- hic_data_ep[which.max(interactionCounts(hic_data_ep))]
most_counts

min_pval <- hic_data_ep[which.min(hic_data_ep$p.value)]
min_pval

教程最后还介绍了在注释promoter,enhancer的基础上,利用Gviz包介绍了interactions可视化的方法,感觉还比较复杂,下次再专门学习下。

上一篇 下一篇

猜你喜欢

热点阅读