ATAC-seqR语言做生信ATACSeq 开放染色质分析

Week4— chromVAR:预测染色质可及性相关的转录因子

2018-12-09  本文已影响17人  六六_ryx

第四周 2018— 06.11-06.17
原文: chromVAR: Inferring transcription factor-associated accessibility from single-cell epigenomic data
期刊:Nat Methods ,14(10): 975–978.
发表日期:2017 October ;
doi: 10.1038/nmeth.4401.
关键词:

概述

ChromVAR是一个R包,可用于鉴定不同细胞类型的转录因子基序。它通过估计染色质可及性的激活和关闭来分析稀疏染色质可及性数据,能够对scATAC-seq图谱进行精确地聚类,并鉴定与染色质可及性变化相关的已知的和新的序列基序。不仅可用于单细胞也可用于bulk细胞的表观数据,从而鉴定细胞类型和细胞特异性的反式调控因子。另外chromVAR对测序深度低和多种细胞类型或样本的数据具有较高的灵敏性。

输入数据

ChromVAR 包的输入数据包括3个:
1)比对后的reads;
2)chromatin accessibility peaks;
3)一组代表motif位置的权重矩阵(PWM)或基因组注释的染色质特征数据集

他们从cisBP 数据库提取了一系列人和鼠的PWMs,cisBP数据库包含多样而全面的已知TF基序集合。用户也可以自己提供TF 基序或其他类型的基因组注释,如增强子模块,ChIP-seq peaks, GWAS 疾病注释等。

工作流程

具体使用方法

安装

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("chromVAR")
# BiocManager::install("GO.db")
# unix
library(BiocParallel)
register(MulticoreParam(8)) # Use 8 cores
# windows
register(SnowParam(SnowParam(workers=1, type = "SOCK")))
# 如果不想用多核,建议用SerialParam注册该选项
register(SerialParam())

应用

chromVAR的应用包括以下几个方面:

library(chromVAR)
library(motifmatchr)
library(SummarizedExperiment)
library(Matrix)
library(ggplot2)
library(BiocParallel)
library(BSgenome.Hsapiens.UCSC.hg19)
register(SerialParam())
data(example_counts, package = "chromVAR")
set.seed(2017)
example_counts <- addGCBias(example_counts, genome = BSgenome.Hsapiens.UCSC.hg19)
counts_filtered <- filterSamples(example_counts, min_depth = 1500, 
                                 min_in_peaks = 0.15, shiny = FALSE)
counts_filtered <- filterPeaks(counts_filtered)
motifs <- getJasparMotifs()
motif_ix <- matchMotifs(motifs, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)
dev <- computeDeviations(object = counts_filtered, 
                                 annotations = motif_ix)
variability <- computeVariability(dev)
plotVariability(variability, use_plotly = FALSE)
Variability
sample_cor <- getSampleCorrelation(dev)

library(pheatmap)
pheatmap(as.dist(sample_cor), 
         annotation_row = colData(dev), 
         clustering_distance_rows = as.dist(1-sample_cor), 
         clustering_distance_cols = as.dist(1-sample_cor))
Clustering
tsne_results <- deviationsTsne(dev, threshold = 1.5, perplexity = 10, 
                               shiny = FALSE)
# plotDeviationsTsne 绘图
tsne_plots <- plotDeviationsTsne(dev, tsne_results, annotation = "TEAD3", 
                                   sample_column = "Cell_Type", shiny = FALSE)
tsne_plots[[1]]
tsne_plots[[2]]
Cell / sample similarity
diff_acc <- differentialDeviations(dev, "Cell_Type")
head(diff_acc)
##                           p_value p_value_adjusted
## MA0025.1_NFIL3       6.667785e-02     1.175006e-01
## MA0030.1_FOXF2       1.802723e-02     4.166773e-02
## MA0031.1_FOXD1       3.825026e-03     1.077708e-02
diff_var <- differentialVariability(dev, "Cell_Type")
head(diff_var)
##                        p_value p_value_adjusted
## MA0025.1_NFIL3       0.4330854        0.9135025
## MA0030.1_FOXF2       0.7011659        0.9652404
inv_tsne_results <- deviationsTsne(dev, threshold = 1.5, perplexity = 8, 
                                    what = "annotations", shiny = FALSE)
ggplot(inv_tsne_results, aes(x = Dim1, y = Dim2)) + geom_point() + 
  chromVAR_theme()
Motif / kmer similarity
kmer_ix <- matchKmers(6, counts_filtered, genome = BSgenome.Hsapiens.UCSC.hg19)
kmer_dev <- computeDeviations(counts_filtered, kmer_ix)
kmer_cov <- deviationsCovariability(kmer_dev)
plotKmerMismatch("CATTCC",kmer_cov)
Kmers and sequence specificity of variation
de_novos <- assembleKmers(kmer_dev, progress = FALSE) #no progress bar
de_novos

pwmDistance函数可以看de novo motif与已知motif的匹配关系,最终会返回3个列表,dist是每对motifs间的距离,strandmotif匹配的链,offset 是motifs间的偏移。

dist_to_known <- pwmDistance(de_novos, motifs)
closest_match1 <- which.min(dist_to_known$dist[1,])
dist_to_known$strand[1,closest_match1]
library(ggmotif) # Package on github at AliciaSchep/ggmotif. Can use seqLogo alternatively
library(TFBSTools)
# De novo motif
ggmotif_plot(de_novos[[1]])
# Closest matching known
ggmotif_plot(toPWM(reverseComplement(motifs[[closest_match1]]),type = "prob"))
De novo kmer assembly
getAnnotationSynergy(counts_filtered, motif_ix[,c(83,24,20)])
getAnnotationCorrelation(counts_filtered, motif_ix[,c(83,24,20)])

##                      MA0139.1_CTCF MA0107.1_RELA MA0140.2_GATA1::TAL1
## MA0139.1_CTCF            1.0000000   -0.41950962          -0.27827092
## MA0107.1_RELA           -0.4195096    1.00000000          -0.08943439
## MA0140.2_GATA1::TAL1    -0.2782709   -0.08943439           1.00000000

相关资料

上一篇 下一篇

猜你喜欢

热点阅读