免疫组库数据分析||immunarch教程:探索性数据分析
immunarch — Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R
我们已经讲到,immunarch分析免疫主库数据是很有好的,那么有多友好呢?真的可以5行代码出博士论文吗?我们来看看这五行代码:
install.packages("immunarch") # Install the package
library(immunarch); data(immdata) # Load the package and the test dataset
repOverlap(immdata$data) %>% vis() # Compute and visualise the most important statistics:
geneUsage(immdata$data[[1]]) %>% vis() # public clonotypes, gene usage, sample diversity
repDiversity(immdata$data) %>% vis(.by = "Status", .meta = immdata$meta) # Group samples
为了系统地了解这个包,我们引入一个包:pacman。
library(pacman)
p_functions(immunarch)
[1] "apply_asymm"
[2] "apply_symm"
[3] "bunch_translate"
[4] "coding"
[5] "cross_entropy"
[6] "dbAnnotate"
[7] "dbLoad"
[8] "entropy"
[9] "fixVis"
[10] "gene_stats"
[11] "geneUsage"
[12] "geneUsageAnalysis"
[13] "get_genes"
[14] "getKmers"
[15] "immunr_dbscan"
[16] "immunr_hclust"
[17] "immunr_kmeans"
[18] "immunr_mds"
[19] "immunr_pca"
[20] "immunr_tsne"
[21] "inc_overlap"
[22] "inframes"
[23] "js_div"
[24] "kl_div"
[25] "kmer_profile"
[26] "noncoding"
[27] "outofframes"
[28] "public_matrix"
[29] "publicRepertoire"
[30] "publicRepertoireApply"
[31] "publicRepertoireFilter"
[32] "pubRep"
[33] "pubRepApply"
[34] "pubRepFilter"
[35] "pubRepStatistics"
[36] "repClonality"
[37] "repDiversity"
[38] "repExplore"
[39] "repLoad"
[40] "repOverlap"
[41] "repOverlapAnalysis"
[42] "repSample"
[43] "repSave"
[44] "select_barcodes"
[45] "select_clusters"
[46] "spectratype"
[47] "split_to_kmers"
[48] "top"
[49] "trackClonotypes"
[50] "vis"
[51] "vis_bar"
[52] "vis_box"
[53] "vis_circos"
[54] "vis_heatmap"
[55] "vis_heatmap2"
[56] "vis_hist"
[57] "vis_immunr_kmer_profile_main"
[58] "vis_seqlogo"
[59] "vis_textlogo"
也是就是这个R包有59个函数(功能),但是我们并不需要全部记住,常用的有:
-
repExplore
- to compute basic statistics, such as number of clones or distributions of lengths and counts. To explore them you need to pass the statistics, e.g. count, to the .method. -
repClonality
- to compute the clonality of repertoires. -
repOverlap
- to compute the repertoire overlap. -
repOverlapAnalysis
- to analyse the repertoire overlap, including different clustering procedures and PCA. -
geneUsage
- to compute the distributions of V or J genes. -
geneUsageAnalysis
- to analyse the distributions of V or J genes, including clustering and PCA. -
repDiversity
- to estimate the diversity of repertoires. -
trackClonotypes
- to analyse the dynamics of repertoires across time points. -
spectratype
- to compute spectratype of clonotypes. -
getKmers
and kmer_profile - to compute distributions of kmers and sequence profiles
在immunarch中统计只是一条命令,而可视化半条命令就够了。每个分析函数的输出可以直接传递到vis
函数:用于可视化的一般函数。下面是用法示例。几乎所有可视化的分析结果都支持根据元数据表中的各自属性或使用用户提供的属性对数据进行分组。分组可以通过传递.by
参数或同时传递.by
和.meta
参数给vis
函数来实现。
library(immunarch) # Load the package into R
data(immdata) # Load the test dataset
immdata$meta
exp_vol <- repExplore(immdata$data, .method = "volume")
p1 <- vis(exp_vol, .by = c("Status"), .meta = immdata$meta)
p2 <- vis(exp_vol, .by = c("Status", "Sex"), .meta = immdata$meta)
p1 + p2
同样,您可以将.by作为一个字符向量传递,它与数据中的样本数量精确匹配,每个值都应该对应于样本的属性。它将用于根据所提供的值对数据进行分组。注意,在这种情况下,您应该将NA传递给.meta。
exp_vol <- repExplore(immdata$data, .method = "volume")
by_vec <- c("C", "C", "C", "C", "C", "C", "MS", "MS", "MS", "MS", "MS", "MS")
p <- vis(exp_vol, .by = by_vec)
p
你要说,哎,我想提出数据自己画图。
exp_vol <- repExplore(immdata$data, .method = "volume")
exp_vol
Sample Volume
A2-i129 A2-i129 6443
A2-i131 A2-i131 6375
A2-i133 A2-i133 6300
A2-i132 A2-i132 6721
A4-i191 A4-i191 5058
A4-i192 A4-i192 5763
MS1 MS1 5301
MS2 MS2 7043
MS3 MS3 6310
MS4 MS4 7313
MS5 MS5 5588
MS6 MS6 7278
如果数据是分组的,将执行比较组均值的统计检验,除非提供.test = F。在只有两组的情况下,采用Wilcoxon秩和检验(R函数wilcox)。exact = F)参数进行测试,以测试两组之间是否存在平均秩值的差异。当大于两组时,采用Kruskal-Wallis检验(R函数kruskar .test),相当于秩次方差分析(ANOVA),检验来自不同组的样本是否来自同一分布。显著的Kruskal-Wallis检验表明,至少一个样本随机地占优势于另一个样本。经过多次比较调整后的p值绘制在组的顶部。p值的调整使用Holm方法(也称为Holm- bonferroni校正)。您可以执行命令?p.adjust
在R控制台看到更多。
如果对默认出的图不满意,你可以用fixVis
打开一个shiny窗口,一点一点调整直到可以发表。
# 1. Analyse
exp_len <- repExplore(immdata$data, .method = "len", .col = "aa")
# 2. Visualise
p1 <- vis(exp_len)
# 3. Fix and make publication-ready results
fixVis(p1)
对于基本的探索性分析,比如比较每个指repertoire or distribution的reads/ UMIs数量,可以使用repExplore函数。
exp_len <- repExplore(immdata$data, .method = "len", .col = "aa")
exp_cnt <- repExplore(immdata$data, .method = "count")
exp_vol <- repExplore(immdata$data, .method = "volume")
p1 <- vis(exp_len)
p2 <- vis(exp_cnt)
p3 <- vis(exp_vol)
p1
p2 + p3
加入分组信息即可得到统计结果。
# You can group samples by their metainformation
p4 <- vis(exp_len, .by = "Status", .meta = immdata$meta)
p5 <- vis(exp_cnt, .by = "Sex", .meta = immdata$meta)
p6 <- vis(exp_vol, .by = c("Status", "Sex"), .meta = immdata$meta)
p4
p5 + p6
评价样本多样性的方法之一是评价克隆性(Clonality,主要区别于clonotypes)。repClonality 是指最常见或最不常见的克隆性的数量。有几种方法来评估克隆性,让我们来看看它们。clonal.prop
方法计算细胞克隆池占用曲目的比例:
imm_pr <- repClonality(immdata$data, .method = "clonal.prop")
imm_pr
Clones Percentage Clonal.count.prop
A2-i129 18 10.0 0.0027556644
A2-i131 28 10.0 0.0042728521
A2-i133 9 10.3 0.0014077898
A2-i132 113 10.0 0.0164987589
A4-i191 4 11.5 0.0007773028
A4-i192 8 10.4 0.0013738623
MS1 2 10.1 0.0003700278
MS2 66 10.0 0.0092372288
MS3 2 10.6 0.0003095496
MS4 176 10.0 0.0236336780
MS5 3 13.2 0.0005303164
MS6 150 10.0 0.0202456472
attr(,"class")
[1] "immunr_clonal_prop" "matrix"
第一种方法考虑的是最丰富的细胞克隆型:
imm_top <- repClonality(immdata$data, .method = "top", .head = c(10, 100, 1000, 3000, 10000))
imm_top
10 100 1000 3000 10000
A2-i129 0.08011765 0.17282353 0.3491765 0.5844706 1
A2-i131 0.06670588 0.15647059 0.3467059 0.5820000 1
A2-i133 0.10505882 0.18294118 0.3655294 0.6008235 1
A2-i132 0.02388235 0.09423529 0.3118824 0.5471765 1
A4-i191 0.17176471 0.32047059 0.5122353 0.7475294 1
A4-i192 0.11541176 0.22141176 0.4325882 0.6678824 1
MS1 0.19164706 0.30894118 0.4817647 0.7170588 1
MS2 0.04458824 0.11470588 0.2770588 0.5123529 1
MS3 0.16482353 0.23011765 0.3575294 0.5928235 1
MS4 0.02329412 0.07517647 0.2415294 0.4768235 1
MS5 0.20611765 0.29188235 0.4521176 0.6874118 1
MS6 0.02835294 0.08235294 0.2460000 0.4812941 1
attr(,"class")
[1] "immunr_top_prop" "matrix"
imm_top %>% vis()
而稀有(rare )方法处理的是最不多产的克隆型:
imm_rare <- repClonality(immdata$data, .method = "rare")
imm_rare
imm_rare
1 3 10 30 100 MAX
A2-i129 0.6991765 0.8215294 0.8710588 0.9267059 0.9604706 1
A2-i131 0.6969412 0.8243529 0.8995294 0.9436471 0.9869412 1
A2-i133 0.6800000 0.8100000 0.8690588 0.8974118 0.9357647 1
A2-i132 0.7088235 0.8831765 0.9643529 0.9951765 1.0000000 1
A4-i191 0.5355294 0.6484706 0.7189412 0.7707059 0.8697647 1
A4-i192 0.5976471 0.7487059 0.8342353 0.8675294 0.9156471 1
MS1 0.5780000 0.6692941 0.7438824 0.7929412 0.8571765 1
MS2 0.7788235 0.8891765 0.9322353 0.9718824 1.0000000 1
MS3 0.7272941 0.7825882 0.8071765 0.8385882 0.8652941 1
MS4 0.8109412 0.9343529 0.9725882 1.0000000 1.0000000 1
MS5 0.6112941 0.7001176 0.7575294 0.7809412 0.8284706 1
MS6 0.8107059 0.9208235 0.9703529 0.9897647 1.0000000 1
最后,用homeo方法评估克隆空间的稳态,即。,给定大小的无性系所占曲目的比例
imm_hom <- repClonality(immdata$data,
.method = "homeo",
.clone.types = c(Small = .0001, Medium = .001, Large = .01, Hyperexpanded = 1)
)
imm_hom
Small (0 < X <= 1e-04) Medium (1e-04 < X <= 0.001)
A2-i129 0 0.8634118
A2-i131 0 0.8858824
A2-i133 0 0.8597647
A2-i132 0 0.9542353
A4-i191 0 0.7135294
A4-i192 0 0.8183529
MS1 0 0.7248235
MS2 0 0.9244706
MS3 0 0.8061176
MS4 0 0.9683529
MS5 0 0.7520000
MS6 0 0.9625882
Large (0.001 < X <= 0.01)
A2-i129 0.09705882
A2-i131 0.09011765
A2-i133 0.07600000
A2-i132 0.04576471
A4-i191 0.15623529
A4-i192 0.08611765
MS1 0.12082353
MS2 0.06411765
MS3 0.05917647
MS4 0.03164706
MS5 0.07647059
MS6 0.03741176
Hyperexpanded (0.01 < X <= 1)
A2-i129 0.03952941
A2-i131 0.02400000
A2-i133 0.06423529
A2-i132 0.00000000
A4-i191 0.13023529
A4-i192 0.09552941
MS1 0.15435294
MS2 0.01141176
MS3 0.13470588
MS4 0.00000000
MS5 0.17152941
MS6 0.00000000
attr(,"class")
[1] "immunr_homeo" "matrix"
vis(imm_top) + vis(imm_top, .by = "Status", .meta = immdata$meta)
vis(imm_rare) + vis(imm_rare, .by = "Status", .meta = immdata$meta)
vis(imm_hom) + vis(imm_hom, .by = c("Status", "Sex"), .meta = immdata$meta)
到这里,我们已经写完一篇博士论文了。
参考:
https://immunarch.com/articles/web_only/v3_basic_analysis.html