单细胞测序rna_seq

2021.10.1-10.5immunarch学习笔记

2021-10-04  本文已影响0人  千容安

>>准备工作


1.加载程序包
library(ggplot2)
library(dplyr)
library(dtplyr)
library(data.table)
library(patchwork)
library(immunarch)

2.数据上传

必须是“/”

把单个数据(txt格式)上传

把文件夹名字命名成txt就可以

把所有数据放在文件夹里一次性批量上传
immdata<-repLoad("C:/Users/Administrator.DESKTOP-4UQ3Q0K/Desktop/2021.9.15/2021.9.15/Report/0_CloneTypes/TRA/txt")

查看一下数据




>>基础分析


一、repExplore - 计算基本统计数据,如克隆数长度和计数的分布

①CDR3的种类 -volume
exp_vol <- repExplore(immdata$data, .method = "volume",.col="aa") #.col: 核苷酸用nt,氨基酸用aa
p1 <- vis(exp_vol, .by = c("Sample"), .meta = immdata$meta)
p2 <- vis(exp_vol, .by = c("Sample", "Group"), .meta = immdata$meta)
p1 + p2

这里meta下有"Sample"和"Group"
CDR3克隆总数和种类

此组数据无group数据,只有sample,故用代码:repExplore(immdata$data, "volume") %>% vis() 可得第一幅图

CDR3的种类

②CDR3的数目 -clone
repExplore(immdata$data, "clone") %>% vis()

CDR3的数目

③CD3长度的分布-lens
repExplore(immdata$data, "lens") %>% vis() Visualise the length distribution of CDR3

CD3长度的分布 repExplore的用法

如果对默认出的图不满意,可以用fixVis打开一个shiny窗口,一点一点调整

④.method = "count" 计算克隆型丰度的分布

count

二、repClonality - 计算repertoire的克隆性

①.method = "clonal.prop" #克隆数所占百分比
imm_pr <- repClonality(immdata$data, .method = "clonal.prop")
imm_pr

clonal.prop

②.method = "top" #计算丰度最多几个的克隆型总的克隆数占全部克隆数的比例
使用“. head”定义索引区间
imm_top <- repClonality(immdata$data, .method = "top", .head = c(10, 100, 1000, 3000, 10000))
imm_top

10 100 1000 3000 10000
3.clonotypes.TRA 0.72695083 0.99731363 0.9997267 1.0000000 1.0000000
4.clonotypes.TRA 0.47442090 0.99038907 0.9968098 0.9991586 1.0000000
5.clonotypes.TRA 0.01678667 0.10878299 0.5148891 0.8712953 0.9862775
6.clonotypes.TRA 0.60817466 0.99323669 0.9987712 1.0000000 1.0000000
7.clonotypes.TRA 0.99835715 0.99994302 1.0000000 1.0000000 1.0000000
8.clonotypes.TRA 0.01157174 0.03364117 0.1266189 0.2486339 0.4939564
attr(,"class")[1] "immunr_top_prop" "matrix" "array" `

vis(imm_top) + vis(imm_top, .by = "Sample", .meta = immdata$meta)

将“top”可视化

③.method = "rare" #计算丰度最少几个的克隆型总的克隆数占全部克隆数的比例 用' . bound '来定义边界
imm_rare <- repClonality(immdata$data, .method = "rare")
vis(imm_rare) + vis(imm_rare, .by = "Sample", .meta = immdata$meta)

将“rare”可视化

③设置“homeo”来分析相对丰度(也称为克隆空间内稳态),定义为特定丰度的克隆群体占据的repertoire比例。.clone.types :一个命名的数值向量,其边界为标记掉克隆群的半闭区间。(不同相对丰度水平的Clonotypes,它们的克隆数占全部克隆数的比例)
imm_hom <- repClonality(immdata$data,.method = "homeo",.clone.types = c(Small = .0001, Medium = .001, Large = .01, Hyperexpanded = 1))
vis(imm_hom) + vis(imm_hom, .by = c("Sample", "Group"), .meta = immdata$meta)

将“homeo”可视化

三、repOverlap - 计算repertoire 的重复性

①number of public clonotypes (.method = "public") - 重叠相似性的经典衡量标准,"public" and "shared" 是为了研究者的方便而存在的同义词。
②overlap coefficient (.method = "overlap") - 重叠相似性的规范化度量度 : 交叉点的大小除以两组较小的大小

输入代码
imm_ov1 <- repOverlap(immdata$data, .method = "public", .verbose = F)
imm_ov2 <- repOverlap(immdata$data, .method = "overlap", .verbose = F)

p1 <- vis(imm_ov1)
p2 <- vis(imm_ov2)

p1 + p2

public和overlap
vis(imm_ov1, "heatmap2") 将图1用热力图表示

③Jaccard index (.method = "jaccard") 测量有限样本集之间的相似性 : 交集的大小除以样本集的结合大小。index is conceptually a percentage of how many objects two sets have in common out of how many objects they have total.(索引在概念上是两个集合共有多少个对象的百分比,从总共有多少个对象中可以看出这两个集合共有多少个对象。)

④Tversky index (.method = "tversky")- 将一个变体与原型进行比较的集合上的非对称相似性度量。如果使用默认参数,则与Dice的系数类似。

⑤cosine similarity (.method = "cosine")- 衡量两个非零向量之间的相似性,度量它们之间夹角的余弦。

⑥Morisita’s overlap index (.method = "morisita") - 衡量群体中个体分散的统计指标。它用于比较样品之间的重叠。index measures how many times it is more likely to randomly select two sampled points from the same quadrat (the dataset is covered by a regular grid of changing size) then it would be in the case of a random distribution generated from a Poisson process. Duplicate objects are merged with their counts are summed up.(指数衡量从同一个样方中随机选取两个采样点(数据集被一个大小不断变化的规则网格复盖)的可能性的多少倍,然后它将是泊松过程产生的随机分布。对重复对象进行合并,并对其计数进行归纳。)

⑦incremental overlap - overlaps of the N most abundant clonotypes with incrementally growing N 增量重叠——N个最丰富的重写类型与增量增长的N的重叠(.method = "inc+METHOD", e.g., "inc+public" or "inc+morisita")



四、repOverlapAnalysis - 对相似数据的子集进行聚类。

"mds" - 多维尺度重排分析(多维缩放)
输入:
repOverlapAnalysis(imm_ov1, "mds")

报错:

Standard deviations (1, .., p=4):
[1] 0 0 0 0
Rotation (n x k) = (6 x 2):
[,1] [,2]
3.clonotypes.TRA 0.9185710 10.128352696
4.clonotypes.TRA -0.8835021 -10.058512066
5.clonotypes.TRA -654.8558395 0.038515119
6.clonotypes.TRA -2.5666597 -0.417012565
7.clonotypes.TRA 1.2436410 0.299973334
8.clonotypes.TRA 656.1437892 0.008683481

绘图:repOverlapAnalysis(imm_ov1, "mds")%>% vis()

mds

repOverlapAnalysis(imm_ov1, "tsne") %>% vis() # t-Stochastic Neighbor Embedding

tsne

使用K-means对MDS结果组件进行聚类
repOverlapAnalysis(imm_ov1, "mds+kmeans") %>% vis()

mds+kmeans

五、geneUsage - V-基因和J-基因统计量的估计

E.g., if you plan to use TRBV genes of Homo Sapiens, you need to use the hs.trbv string in the function, where hs comes from the alias column and trbv is the gene name

imm_gu <- geneUsage(immdata$data, "hs.trbv", .norm = T)
vis(imm_gu, .by = "Sample", .meta = immdata$meta)

geneUsage

计算前两个样品的分布:
imm_gu <- geneUsage(immdata$data[c(1, 2)], "hs.trbv", .norm = T)
vis(imm_gu)

3和4样品

vis(imm_gu, .grid = T)

grid = T将数据单独展示

在vis括号里加上,.plot = "box"改变图表类型变作箱式图
vis(imm_gu, .by = "Sample", .meta = immdata$meta,.plot = "box")

boxplot

geneUsageAnalysis - to analyse the distributions of V or J genes, including clustering and PCA.分析V或J基因的分布,包括聚类和PCCA。
imm_gu <- geneUsage(immdata$data, "hs.trbv", .norm = T)
imm_gu_js <- geneUsageAnalysis(imm_gu, .method = "js", .verbose = F)
imm_gu_cor <- geneUsageAnalysis(imm_gu, .method = "cor", .verbose = F)

p1 <- vis(imm_gu_js, .title = "Gene usage JS-divergence", .leg.title = "JS", .text.size = 3)
p2 <- vis(imm_gu_cor, .title = "Gene usage correlation", .leg.title = "Cor", .text.size = 3)
p1 + p2

“js” - Jensen-Shannon Divergence.
“cor” - correlation.



“cosine” - cosine similarity.
“hclust” - clusters the data using hierarchical clustering.

imm_gu_js[is.na(imm_gu_js)] <- 0
vis(geneUsageAnalysis(imm_gu, "cosine+hclust", .verbose = F))

可以添加 clustering
imm_cl_pca <- geneUsageAnalysis(imm_gu, "js+pca+kmeans", .verbose = F)
imm_cl_mds <- geneUsageAnalysis(imm_gu, "js+mds+kmeans", .verbose = F)
imm_cl_tsne <- geneUsageAnalysis(imm_gu, "js+tsne+kmeans", .perp = .01, .verbose = F)
p1 <- vis(imm_cl_pca, .plot = "clust")
p2 <- vis(imm_cl_mds, .plot = "clust")
p3 <- vis(imm_cl_tsne, .plot = "clust")
p1 + p2 + p3



六、repDiversity -估计repertoires的多样性.

# Chao1 diversity measure
div_chao <- repDiversity(immdata$data, "chao1")

# Hill numbers
div_hill <- repDiversity(immdata$data, "hill")

# D50
div_d50 <- repDiversity(immdata$data, "d50")

# Ecological diversity measure
div_div <- repDiversity(immdata$data, "div")

p1 <- vis(div_chao)
p2 <- vis(div_chao, .by = c("Sample", "Group"), .meta = immdata$meta)
p3 <- vis(div_hill, .by = c("Sample", "Group"), .meta = immdata$meta)

p4 <- vis(div_d50)
p5 <- vis(div_d50, .by = "Status", .meta = immdata$meta)
p6 <- vis(div_div)

p1 + p2
chao

p3+p6

p4

d50
imm_raref <- repDiversity(immdata$data, "raref", .verbose = F)

p1 <- vis(imm_raref)
p2 <- vis(imm_raref, .by = "Status", .meta = immdata$meta)

p1 + p2
raref
repDiversity(immdata$data, "raref", .verbose = F) %>% vis(.log = TRUE)
上一篇 下一篇

猜你喜欢

热点阅读