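Single-cell analysis, cell-cell interaction part 5: comparing interactions between groups with Differential NicheNet
Commonly used cell-cell communication tools:
- CellphoneDB: a public, manually curated repository of receptors, ligands and their interactions. It also takes subunit architecture into account and can describe heteromeric complexes. (ligand-receptor + multimers)
- iTALK: selects highly expressed ligands and receptors based on their average expression and plots the results as a circos plot. (ligand-receptor)
- CellChat: takes the gene expression data of cells as input and models intercellular communication by combining ligand-receptor interactions with their cofactors. (ligand-receptor + multimers + cofactors)
- NicheNet/NicheNet multi-sample analysis: predicts ligand-target links between interacting cells by combining their expression data with prior knowledge of signaling and gene regulatory networks. (ligand-receptor + signaling pathways)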
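Other cell-interaction tools include Celltalker, SingleCellSignalR, scTensor and SoptSC (these are also based on ligand-receptor interactions).
I have previously written about the standard NicheNet pipeline. In practice, however, cell-interaction analyses are mostly about comparing interactions between samples or conditions. I usually use CellChat for this, but NicheNet can also compare interactions across samples, and in my experience it does so even better.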
0. Read in the expression data of interest, the NicheNet ligand-receptor network and the ligand-target matrix
Load the required packages
library(nichenetr)
library(RColorBrewer)
library(tidyverse)
library(Seurat)
Read in the demo data
seurat_obj = readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
p1=DimPlot(seurat_obj, group.by = "celltype") # user adaptation required on own dataset
p2=DimPlot(seurat_obj, group.by = "pEMT") # user adaptation required on own dataset
p1|p2
table(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT)
#                High  Low
# CAF             396  104
# Endothelial     105   53
# Malignant      1093  549
# Myeloid          92    7
# myofibroblast   382   61
# T.cell          689    3
This demo compares the pEMT-high niche with the pEMT-low niche; the workflow is the same for any other pair of groups.
seurat_obj@meta.data$celltype_aggregate = paste(seurat_obj@meta.data$celltype, seurat_obj@meta.data$pEMT,sep = "_") # user adaptation required on own dataset
DimPlot(seurat_obj, group.by = "celltype_aggregate")
seurat_obj@meta.data$celltype_aggregate %>% table() %>% sort(decreasing = TRUE)
## .
## Malignant_High T.cell_High Malignant_Low CAF_High myofibroblast_High Endothelial_High
## 1093 689 549 396 382 105
## CAF_Low Myeloid_High myofibroblast_Low Endothelial_Low Myeloid_Low T.cell_Low
## 104 92 61 53 7 3
celltype_id = "celltype_aggregate" # metadata column name of the cell type of interest
seurat_obj = SetIdent(seurat_obj, value = seurat_obj[[celltype_id]])
Read in the NicheNet ligand-target matrix (25345 target genes x 688 ligands) and the ligand-receptor network
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
## CXCL1 CXCL2 CXCL3 CXCL5 PPBP
## A1BG 3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF 5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M 6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1 8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05
lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))
lr_network = lr_network %>% mutate(bonafide = ! database %in% c("ppi_prediction","ppi_prediction_go"))
lr_network = lr_network %>% dplyr::rename(ligand = from, receptor = to) %>% distinct(ligand, receptor, bonafide)
head(lr_network)
## # A tibble: 6 x 3
## ligand receptor bonafide
## <chr> <chr> <lgl>
## 1 CXCL1 CXCR2 TRUE
## 2 CXCL2 CXCR2 TRUE
## 3 CXCL3 CXCR2 TRUE
## 4 CXCL5 CXCR2 TRUE
## 5 PPBP CXCR2 TRUE
## 6 CXCL6 CXCR2 TRUE
table(lr_network$bonafide)
# FALSE TRUE
# 10629 1390
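Why are so many pairs FALSE? As defined above, bonafide is FALSE for every pair that only comes from the ppi_prediction or ppi_prediction_go sources, i.e. pairs predicted from protein-protein interaction data rather than taken from curated ligand-receptor databases, and these predicted pairs make up most of the network.
If you are analyzing mouse data, first convert the gene symbols to their mouse orthologs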
organism = "human" # user adaptation required on own dataset
if(organism == "mouse"){
lr_network = lr_network %>% mutate(ligand = convert_human_to_mouse_symbols(ligand), receptor = convert_human_to_mouse_symbols(receptor)) %>% drop_na()
colnames(ligand_target_matrix) = ligand_target_matrix %>% colnames() %>% convert_human_to_mouse_symbols()
rownames(ligand_target_matrix) = ligand_target_matrix %>% rownames() %>% convert_human_to_mouse_symbols()
ligand_target_matrix = ligand_target_matrix %>% .[!is.na(rownames(ligand_target_matrix)), !is.na(colnames(ligand_target_matrix))]
}
1. Define the niches/microenvironments of interest
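Each niche should contain at least one "sender/niche" cell population and one "receiver/target" cell population.
In this demo dataset, we want to examine how the signals that non-malignant microenvironment cells send to tumor cells differ between pEMT-high and pEMT-low tumors. Therefore "Malignant_High" and "Malignant_Low" are defined as the "receiver/target" cell populations and all other cell types as "sender/niche" cell populations. Note: T cells and myeloid cells are defined as senders only in the pEMT-high niche, because the pEMT-low samples contain too few of these cells (3 T cells and 7 myeloid cells).
⚠️ In other words, a group comparison in NicheNet can include condition-specific cell populations. (Each sender is compared with all senders of the other niche, not only with the same cell type in the other group.)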
! Important: your receiver cell type should consist of 1 cluster!
niches = list(
"pEMT_High_niche" = list(
"sender" = c("myofibroblast_High", "Endothelial_High", "CAF_High", "T.cell_High", "Myeloid_High"),
"receiver" = c("Malignant_High")),
"pEMT_Low_niche" = list(
"sender" = c("myofibroblast_Low", "Endothelial_Low", "CAF_Low"),
"receiver" = c("Malignant_Low"))
) # user adaptation required on own dataset
2. Calculate differential expression between the niches
In this step, we will determine DE between the different niches for both senders and receivers to define the DE of L-R pairs.
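The output of this step is the set of differentially expressed ligand-receptor pairs.
Calculate DE
By default, DE is computed with Seurat's Wilcoxon test (other methods can be used as well).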
assay_oi = "SCT" # other possibilities: RNA,...
DE_sender = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), niches = niches, type = "sender", assay_oi = assay_oi) # only ligands important for sender cell types
## [1] "Calculate Sender DE between: myofibroblast_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: myofibroblast_High and Endothelial_Low"
## [3] "Calculate Sender DE between: myofibroblast_High and CAF_Low"
## [1] "Calculate Sender DE between: Endothelial_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Endothelial_High and Endothelial_Low"
## [3] "Calculate Sender DE between: Endothelial_High and CAF_Low"
## [1] "Calculate Sender DE between: CAF_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: CAF_High and Endothelial_Low"
## [3] "Calculate Sender DE between: CAF_High and CAF_Low"
## [1] "Calculate Sender DE between: T.cell_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: T.cell_High and Endothelial_Low"
## [3] "Calculate Sender DE between: T.cell_High and CAF_Low"
## [1] "Calculate Sender DE between: Myeloid_High and myofibroblast_Low"
## [2] "Calculate Sender DE between: Myeloid_High and Endothelial_Low"
## [3] "Calculate Sender DE between: Myeloid_High and CAF_Low"
## [1] "Calculate Sender DE between: myofibroblast_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: myofibroblast_Low and Endothelial_High"
## [3] "Calculate Sender DE between: myofibroblast_Low and CAF_High"
## [4] "Calculate Sender DE between: myofibroblast_Low and T.cell_High"
## [5] "Calculate Sender DE between: myofibroblast_Low and Myeloid_High"
## [1] "Calculate Sender DE between: Endothelial_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: Endothelial_Low and Endothelial_High"
## [3] "Calculate Sender DE between: Endothelial_Low and CAF_High"
## [4] "Calculate Sender DE between: Endothelial_Low and T.cell_High"
## [5] "Calculate Sender DE between: Endothelial_Low and Myeloid_High"
## [1] "Calculate Sender DE between: CAF_Low and myofibroblast_High"
## [2] "Calculate Sender DE between: CAF_Low and Endothelial_High"
## [3] "Calculate Sender DE between: CAF_Low and CAF_High"
## [4] "Calculate Sender DE between: CAF_Low and T.cell_High"
## [5] "Calculate Sender DE between: CAF_Low and Myeloid_High"
DE_receiver = calculate_niche_de(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), niches = niches, type = "receiver", assay_oi = assay_oi) # only receptors now, later on: DE analysis to find targets
## # A tibble: 1 x 2
## receiver receiver_other_niche
## <chr> <chr>
## 1 Malignant_High Malignant_Low
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
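As the log shows, sender DE is first computed for each of the 5 sender cell types of the high niche against each of the 3 sender cell types of the low niche, and then in reverse: each of the 3 low-niche senders against each of the 5 high-niche senders.
Then receiver DE is computed: the DE genes of the malignant cells high vs low, and low vs high.
Running the tests cell type by cell type, instead of pooling all senders and all receivers, prevents the DE results from being driven mainly by the most abundant cell types.
Running every comparison in both directions: aren't the genes up/down in a vs b simply the genes down/up in b vs a? What is the point? (Presumably the test statistics are simply mirrored; running both directions mainly produces, for every cell type, a set of comparisons from its own point of view, so that per-cell-type summaries such as the minimum LFC can be computed directly.)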
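A toy illustration of the abundance effect, with made-up numbers (not from the demo data): a rare sender cell type with high ligand expression almost disappears when all sender cells are pooled, but stays visible when each cell type is summarized separately. This only shows the averaging effect; it is not how calculate_niche_de runs its tests.

```r
# Hypothetical sender cell types:
# "A" is abundant and expresses the ligand weakly,
# "B" is rare and expresses it strongly.
expr_A <- rep(0.2, 1000)  # 1000 cells of A
expr_B <- rep(3.0, 10)    # 10 cells of B

# Pooled mean over all sender cells: close to A's mean
pooled_mean <- mean(c(expr_A, expr_B))

# Per-cell-type means: B's high expression is preserved
per_type_mean <- c(A = mean(expr_A), B = mean(expr_B))

print(round(pooled_mean, 3))
print(per_type_mean)
```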
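Process the DE results
The DE genes are then pre-filtered by the fraction of cells that express them: only genes expressed in more than the chosen fraction of cells (expression_pct, here 10%) are treated as broadly expressed DE genes.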
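A base-R sketch of the idea behind expression_pct, assuming a Seurat-style DE table with pct.1/pct.2 columns (fraction of expressing cells in the two compared groups). The table values are hypothetical, and checking the fraction in the group where the gene is higher is an assumption of this sketch, not the code inside process_niche_de.

```r
# Hypothetical DE table in the style of Seurat's FindMarkers output
de <- data.frame(
  gene       = c("G1", "G2", "G3", "G4"),
  avg_log2FC = c(1.2, 0.9, -0.8, 1.5),
  pct.1      = c(0.45, 0.05, 0.30, 0.12),
  pct.2      = c(0.10, 0.02, 0.60, 0.01),
  stringsAsFactors = FALSE
)
expression_pct <- 0.10

# Fraction of expressing cells in the group where the gene is higher
pct_up <- ifelse(de$avg_log2FC > 0, de$pct.1, de$pct.2)

# Keep only genes expressed in more than expression_pct of those cells
de$present <- pct_up > expression_pct
kept <- de$gene[de$present]
print(kept)
```

G2 is dropped: its fold change looks large, but only 5% of the cells express it.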
expression_pct = 0.10
DE_sender_processed = process_niche_de(DE_table = DE_sender, niches = niches, expression_pct = expression_pct, type = "sender")
DE_receiver_processed = process_niche_de(DE_table = DE_receiver, niches = niches, expression_pct = expression_pct, type = "receiver")
Combine sender-receiver DE based on L-R pairs:
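As described above, the differential expression of a ligand in one sender cell type is obtained by comparing that cell type in one niche with every sender cell type of the other niche. There are therefore several ways to summarize these comparisons into one DE value per ligand and cell type, e.g. the average LFC or the minimum LFC. The minimum LFC is recommended, because it is the strictest measure of ligand specificity: a high minimum LFC means that the ligand is more strongly expressed in this cell type of niche 1 than in every cell type of niche 2 (with the average LFC, you cannot rule out that one or more cell types in niche 2 also express the ligand strongly).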
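A small numeric illustration of why the minimum LFC is stricter than the average, using hypothetical LFCs of one ligand in CAF_High against each sender of the low niche. Here CAF_Low expresses the ligand almost as strongly as CAF_High, yet the average still looks convincing.

```r
# Hypothetical LFCs of one ligand: CAF_High vs each low-niche sender
lfc_vs_low <- c(myofibroblast_Low = 2.1, Endothelial_Low = 1.8, CAF_Low = 0.1)

avg_lfc <- mean(lfc_vs_low)  # about 1.33: looks specific
min_lfc <- min(lfc_vs_low)   # 0.1: reveals that CAF_Low expresses it too

print(c(avg_lfc = avg_lfc, min_lfc = min_lfc))
```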
specificity_score_LR_pairs = "min_lfc"
DE_sender_receiver = combine_sender_receiver_de(DE_sender_processed, DE_receiver_processed, lr_network, specificity_score = specificity_score_LR_pairs)
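The main output of this step is the DE_sender_receiver object, which contains the differential expression between niches of the ligands (in senders) and receptors (in receivers), combined per ligand-receptor pair.
3. Optional: calculate differential spatial expression
Only applicable when spatial information is available (e.g. from spatial transcriptomics data).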
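To make "combine sender and receiver DE based on L-R pairs" concrete, here is a base-R sketch with toy tables. The scores and column layout are hypothetical and do not reproduce the format of combine_sender_receiver_de; the point is only that ligand DE and receptor DE are attached to each pair of the ligand-receptor network, and pairs without DE information drop out.

```r
# Hypothetical ligand-receptor network and DE summaries
lr <- data.frame(ligand   = c("TNF", "IL1B", "PTPRC"),
                 receptor = c("TNFRSF21", "IL1RAP", "CD44"),
                 stringsAsFactors = FALSE)
ligand_de <- data.frame(sender       = c("T.cell_High", "Myeloid_High"),
                        ligand       = c("TNF", "IL1B"),
                        ligand_score = c(1.0, 0.8),
                        stringsAsFactors = FALSE)
receptor_de <- data.frame(receiver       = "Malignant_High",
                          receptor       = c("TNFRSF21", "IL1RAP", "CD44"),
                          receptor_score = c(0.4, 0.6, 0.2),
                          stringsAsFactors = FALSE)

# Inner joins: L-R pair + ligand DE, then + receptor DE
pairs <- merge(merge(lr, ligand_de, by = "ligand"), receptor_de, by = "receptor")
print(pairs[, c("sender", "ligand", "receptor", "receiver",
                "ligand_score", "receptor_score")])
```

PTPRC--CD44 disappears because the toy ligand table has no DE entry for PTPRC.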
include_spatial_info_sender = TRUE # if not spatial info to include: put this to false # user adaptation required on own dataset
include_spatial_info_receiver = FALSE # if spatial info to include: put this to true # user adaptation required on own dataset
spatial_info = tibble(celltype_region_oi = "CAF_High", celltype_other_region = "myofibroblast_High", niche = "pEMT_High_niche", celltype_type = "sender") # user adaptation required on own dataset
specificity_score_spatial = "lfc"
# this is how this should be defined if you don't have spatial info
# mock spatial info
if(include_spatial_info_sender == FALSE & include_spatial_info_receiver == FALSE){
spatial_info = tibble(celltype_region_oi = NA, celltype_other_region = NA) %>% mutate(niche = niches %>% names() %>% head(1), celltype_type = "sender")
}
if(include_spatial_info_sender == TRUE){
sender_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$ligand %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "sender"))
sender_spatial_DE_processed = process_spatial_de(DE_table = sender_spatial_DE, type = "sender", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)
# add a neutral spatial score for sender celltypes in which the spatial is not known / not of importance
sender_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% bind_rows(sender_spatial_DE_others)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))
} else {
# add a neutral spatial score for all sender celltypes (for none of them, spatial is relevant in this case)
sender_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "sender", lr_network = lr_network)
sender_spatial_DE_processed = sender_spatial_DE_processed %>% mutate(scaled_ligand_score_spatial = scale_quantile_adapted(ligand_score_spatial))
}
## [1] "Calculate Spatial DE between: CAF_High and myofibroblast_High"
if(include_spatial_info_receiver == TRUE){
receiver_spatial_DE = calculate_spatial_DE(seurat_obj = seurat_obj %>% subset(features = lr_network$receptor %>% unique()), spatial_info = spatial_info %>% filter(celltype_type == "receiver"))
receiver_spatial_DE_processed = process_spatial_de(DE_table = receiver_spatial_DE, type = "receiver", lr_network = lr_network, expression_pct = expression_pct, specificity_score = specificity_score_spatial)
# add a neutral spatial score for receiver celltypes in which the spatial is not known / not of importance
receiver_spatial_DE_others = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% bind_rows(receiver_spatial_DE_others)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
} else {
# add a neutral spatial score for all receiver celltypes (for none of them, spatial is relevant in this case)
receiver_spatial_DE_processed = get_non_spatial_de(niches = niches, spatial_info = spatial_info, type = "receiver", lr_network = lr_network)
receiver_spatial_DE_processed = receiver_spatial_DE_processed %>% mutate(scaled_receptor_score_spatial = scale_quantile_adapted(receptor_score_spatial))
}
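4. Calculate ligand activities and infer active ligand-target links
In this step we predict the activity of each ligand in the receiver cell type of each niche (similar to the ligand activity analysis in standard NicheNet).
To calculate ligand activities, we first need to define a gene set of interest for each niche. In this example, the pEMT-high gene set consists of the genes upregulated in pEMT-high tumors compared with pEMT-low tumors; the pEMT-low gene set is defined the other way round.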
lfc_cutoff = 0.15 # recommended for 10x as min_lfc cutoff.
specificity_score_targets = "min_lfc"
DE_receiver_targets = calculate_niche_de_targets(seurat_obj = seurat_obj, niches = niches, lfc_cutoff = lfc_cutoff, expression_pct = expression_pct, assay_oi = assay_oi)
## [1] "Calculate receiver DE between: Malignant_High and Malignant_Low"
## [1] "Calculate receiver DE between: Malignant_Low and Malignant_High"
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
background = DE_receiver_processed_targets %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
## [1] "ANXA8L2" "PRKCDBP" "IL8" "PTRF" "SEPP1" "C1orf186" "CCDC109B"
## [8] "C10orf54" "LEPREL1" "ZNF812" "LOC645638" "LOC401397" "LINC00162" "DFNA5"
## [15] "PLK1S1" "ZMYM6NB" "C19orf10" "CTSL1" "SQRDL" "LOC375295" "WBP5"
## [22] "LOC100505633" "AIM1" "C1orf63" "LOC100507463" "GPR115" "VIMP" "SEP15"
## [29] "C1orf172" "NAPRT1" "LHFP" "KRT16P1" "C7orf10" "PTPLA" "GRAMD3"
## [36] "CPSF3L" "MESDC2" "C10orf10" "KIAA1609" "CCDC53" "TXLNG2P" "NGFRAP1"
## [43] "ERO1L" "FAM134A" "LSMD1" "TCEB2" "B3GALTL" "HN1L" "LOC550643"
## [50] "KIAA0922" "GLT25D1" "FAM127A" "C1orf151-NBL1" "SEPW1" "GPR126" "LOC100505806"
## [57] "LINC00478" "TCEB1" "GRAMD2" "GNB2L1" "KIRREL"
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))
## [1] "LOC344887" "AGPAT9" "C1orf110" "KIAA1467" "LOC100292680" "EPT1" "CT45A4" "LOC654433"
## [9] "UPK3BL" "LINC00340" "LOC100128338" "FAM60A" "CCDC144C" "LOC401109" "LOC286467" "LEPREL4"
## [17] "LOC731275" "LOC642236" "LINC00516" "LOC101101776" "SC5DL" "PVRL4" "LOC100130093" "LINC00338"
## [25] "LOC100132891" "PPAP2C" "C6orf1" "C2orf47" "WHSC1L1" "LOC100289019" "SETD8" "KDM5B-AS1"
## [33] "SPG20" "CXCR7" "LOC100216479" "LOC100505761" "MGC57346" "LPHN3" "CENPC1" "C11orf93"
## [41] "C14orf169" "LOC100506060" "FLJ31485" "LOC440905" "MLF1IP" "TMEM194A" "RRP7B" "REXO1L1"
## [49] "LOC100129269" "KIAA1715" "CTAGE5" "LOC202781" "LOC100506714" "LOC401164" "UTS2D" "LOC146880"
## [57] "KIAA1804" "C5orf55" "C21orf119" "PRUNE" "LRRC16A" "LOC339240" "FLJ35024" "C5orf28"
## [65] "LOC100505876" "MGC21881" "LOC100133985" "PPAPDC2" "FRG1B" "CECR5" "LOC100129361" "CCBL1"
## [73] "PTPLAD1" "MST4" "LOC550112" "LOC389791" "CCDC90A" "KIAA0195" "LOC100506469" "LOC100133161"
## [81] "LOC646719" "LOC728819" "BRE" "LOC284581" "LOC441081" "LOC728377" "LOC100134229" "C3orf65"
## [89] "SMEK2" "KIAA1737" "C17orf70" "PLEKHM1P" "LOC338758" "PCNXL2" "LOC91948" "C17orf89"
## [97] "LOC100505783" "SMCR7L" "C8orf4" "GPR56" "ATHL1" "LOC339535" "PPAPDC1B" "DAK"
## [105] "LOC100507173" "CRHR1-IT1" "PPAP2B" "ADCK4" "KIAA0146" "GYLTL1B" "LOC100272216" "LOC400027"
## [113] "WHSC1" "LOC100130855" "C7orf55" "C19orf40" "ADCK3" "C9orf142" "SGOL1" "LOC90834"
## [121] "PTPLAD2" "KIAA1967" "LOC100132352" "LOC100630918" "ADRBK2" "LINC00263" "FAM64A" "LOC401074"
## [129] "FAM179B" "RP1-177G6.2" "METTL21D" "ERO1LB" "FLJ45445" "NADKD1" "LOC100506233" "LOC100652772"
## [137] "FAM175A" "LINC00630" "C11orf82" "SETD5-AS1" "SGK196" "FLJ14186" "CCDC104" "FAM63A"
## [145] "NARG2" "MTERFD1" "CCDC74B-AS1" "LOC286186" "WDR67" "C12orf52" "FLJ30403" "KIAA2018"
## [153] "GCN1L1" "FLJ43681" "LOC152217" "FONG" "C18orf8" "ALG1L9P" "GTDC2" "LOC100507217"
## [161] "NBPF24" "WBSCR27" "C14orf1" "LOC284889" "KIAA0317" "FAM65A" "PMS2L2" "LUST"
## [169] "C15orf52" "FAM195A" "LOC399744" "PYCRL" "LOC338799" "LOC100506190" "C9orf91" "FLJ45340"
## [177] "LOC349196" "LOC100128881" "TOMM70A" "ALS2CR8" "LDOC1L" "HDGFRP3" "ZNF767" "LOC728558"
## [185] "LOC283693" "LEPREL2" "QTRTD1" "SELM" "C6orf25" "C1orf86" "HNRPLL" "LOC145820"
## [193] "LOC100289341" "C17orf85" "C3orf72" "C14orf64" "C9orf9" "LOC100506394"
length(geneset_niche1)
## [1] 1668
length(geneset_niche2)
## [1] 2889
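Before running the ligand activity analysis, it is a good idea to check how many genes each gene set contains. For ligand activity analysis, a gene set of interest of about 20-1000 genes is generally considered suitable. If you get too many DE genes, use a higher lfc_cutoff. With more than 2 receiver cell types/niches, we recommend a cutoff of 0.15; with only 2 receiver cell types/niches, we recommend a higher threshold (e.g. 0.25). For data with a higher sequencing depth, such as Smart-seq2, a higher threshold is also recommended.
This demo uses Smart-seq2 data and compares only 2 niches, so we use a high LFC threshold to obtain fewer DE genes (a higher threshold yields fewer, but more reliable, DE genes).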
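A sketch of the gene-set size check described above. pick_lfc_cutoff is a hypothetical helper (not part of nichenetr), and the scores are toy values standing in for target_score of significant, present genes. It reports the gene-set size at several candidate cutoffs and returns the smallest cutoff whose gene set falls within 20-1000 genes.

```r
# Hypothetical helper: gene-set sizes per cutoff and the smallest acceptable cutoff
pick_lfc_cutoff <- function(target_score, cutoffs, min_size = 20, max_size = 1000) {
  sizes <- vapply(cutoffs, function(cut) sum(target_score >= cut), integer(1))
  ok <- sizes >= min_size & sizes <= max_size
  list(sizes  = setNames(sizes, cutoffs),
       cutoff = if (any(ok)) min(cutoffs[ok]) else NA_real_)
}

# Toy scores: many genes with small LFCs, few with large LFCs
scores <- rep(c(0.2, 0.4, 0.8, 1.2), times = c(2000, 600, 150, 30))

res <- pick_lfc_cutoff(scores, cutoffs = c(0.15, 0.25, 0.5, 0.75, 1))
print(res$sizes)
print(res$cutoff)
```

In the tutorial itself the cutoff is simply set by hand (lfc_cutoff = 0.75) and the sizes are checked with length(); the helper only automates that check.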
lfc_cutoff = 0.75
specificity_score_targets = "min_lfc"
DE_receiver_processed_targets = process_receiver_target_de(DE_receiver_targets = DE_receiver_targets, niches = niches, expression_pct = expression_pct, specificity_score = specificity_score_targets)
background = DE_receiver_processed_targets %>% pull(target) %>% unique()
geneset_niche1 = DE_receiver_processed_targets %>% filter(receiver == niches[[1]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
geneset_niche2 = DE_receiver_processed_targets %>% filter(receiver == niches[[2]]$receiver & target_score >= lfc_cutoff & target_significant == 1 & target_present == 1) %>% pull(target) %>% unique()
# Good idea to check which genes will be left out of the ligand activity analysis (=when not present in the rownames of the ligand-target matrix).
# If many genes are left out, this might point to some issue in the gene naming (eg gene aliases and old gene symbols, bad human-mouse mapping)
geneset_niche1 %>% setdiff(rownames(ligand_target_matrix))
## [1] "ANXA8L2" "PRKCDBP" "IL8" "PTRF" "SEPP1" "C1orf186"
geneset_niche2 %>% setdiff(rownames(ligand_target_matrix))
## [1] "LOC344887" "AGPAT9" "C1orf110" "KIAA1467" "LOC100292680" "EPT1" "CT45A4"
length(geneset_niche1)
## [1] 169
length(geneset_niche2)
## [1] 136
top_n_target = 250
niche_geneset_list = list(
"pEMT_High_niche" = list(
"receiver" = niches[[1]]$receiver,
"geneset" = geneset_niche1,
"background" = background),
"pEMT_Low_niche" = list(
"receiver" = niches[[2]]$receiver,
"geneset" = geneset_niche2 ,
"background" = background)
)
ligand_activities_targets = get_ligand_activities_targets(niche_geneset_list = niche_geneset_list, ligand_target_matrix = ligand_target_matrix, top_n_target = top_n_target)
## [1] "Calculate Ligand activities for: Malignant_High"
## [1] "Calculate Ligand activities for: Malignant_Low"
5. Calculate (scaled) expression of ligands, receptors and targets across cell types of interest (log expression values and expression fractions)
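In this step we calculate the average (scaled) expression and the expression fraction of ligands, receptors and target genes in each cell population of interest. Here these values are taken from the output of Seurat's DotPlot; any other way of computing them works as well.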
features_oi = union(lr_network$ligand, lr_network$receptor) %>% union(ligand_activities_targets$target) %>% setdiff(NA)
dotplot = suppressWarnings(Seurat::DotPlot(seurat_obj %>% subset(idents = niches %>% unlist() %>% unique()), features = features_oi, assay = assay_oi))
exprs_tbl = dotplot$data %>% as_tibble()
exprs_tbl = exprs_tbl %>% rename(celltype = id, gene = features.plot, expression = avg.exp, expression_scaled = avg.exp.scaled, fraction = pct.exp) %>%
mutate(fraction = fraction/100) %>% as_tibble() %>% select(celltype, gene, expression, expression_scaled, fraction) %>% distinct() %>% arrange(gene) %>% mutate(gene = as.character(gene))
exprs_tbl_ligand = exprs_tbl %>% filter(gene %in% lr_network$ligand) %>% rename(sender = celltype, ligand = gene, ligand_expression = expression, ligand_expression_scaled = expression_scaled, ligand_fraction = fraction)
exprs_tbl_receptor = exprs_tbl %>% filter(gene %in% lr_network$receptor) %>% rename(receiver = celltype, receptor = gene, receptor_expression = expression, receptor_expression_scaled = expression_scaled, receptor_fraction = fraction)
exprs_tbl_target = exprs_tbl %>% filter(gene %in% ligand_activities_targets$target) %>% rename(receiver = celltype, target = gene, target_expression = expression, target_expression_scaled = expression_scaled, target_fraction = fraction)
dotplot
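Why use a DotPlot to display so many genes? A heatmap would be easier to read. (Note that the DotPlot mainly serves to compute the expression table via dotplot$data; with this many features the plot itself is hard to read anyway.)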
exprs_tbl_ligand = exprs_tbl_ligand %>% mutate(scaled_ligand_expression_scaled = scale_quantile_adapted(ligand_expression_scaled)) %>% mutate(ligand_fraction_adapted = ligand_fraction) %>% mutate_cond(ligand_fraction >= expression_pct, ligand_fraction_adapted = expression_pct) %>% mutate(scaled_ligand_fraction_adapted = scale_quantile_adapted(ligand_fraction_adapted))
exprs_tbl_receptor = exprs_tbl_receptor %>% mutate(scaled_receptor_expression_scaled = scale_quantile_adapted(receptor_expression_scaled)) %>% mutate(receptor_fraction_adapted = receptor_fraction) %>% mutate_cond(receptor_fraction >= expression_pct, receptor_fraction_adapted = expression_pct) %>% mutate(scaled_receptor_fraction_adapted = scale_quantile_adapted(receptor_fraction_adapted))
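This step produces expression tables for ligands, receptors and targets. Taking exprs_tbl_ligand as an example, each table lists, per ligand/receptor/target and cell type, the expression level and the fraction of cells expressing the gene.
6. Expression fraction and receptor
In this step we score ligand-receptor interactions based on receptor expression strength: for each ligand and receiver cell type, the receptors of that ligand are ranked, and the most strongly expressed receptor gets the highest score. This does not affect the subsequent ranking of individual ligands, but it helps rank the most important receptors per ligand (next to other factors regarding the receptor, see later).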
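The mutate_cond() calls above cap the expression fraction at expression_pct: once a gene is expressed in at least 10% of the cells, a higher fraction no longer raises its score, while fractions below the threshold are penalized. A base-R sketch of that capping with hypothetical fractions (the subsequent scale_quantile_adapted step from nichenetr is not reproduced here):

```r
expression_pct <- 0.10
ligand_fraction <- c(0.02, 0.08, 0.10, 0.35, 0.90)  # hypothetical fractions

# Same effect as:
# mutate_cond(ligand_fraction >= expression_pct, ligand_fraction_adapted = expression_pct)
ligand_fraction_adapted <- pmin(ligand_fraction, expression_pct)

print(ligand_fraction_adapted)
```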
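The receptor score in the code below is 0.5 x (rank of expression fraction / its maximum rank + rank of expression / its maximum rank), computed per ligand and receiver. A base-R sketch for one ligand with hypothetical receptor values; dense_rank_base is a hypothetical stand-in for dplyr::dense_rank (ties share a rank, no gaps) and does not handle NAs.

```r
# Dense ranking: ties share a rank, ranks have no gaps
dense_rank_base <- function(x) match(x, sort(unique(x)))

# Hypothetical receptors of one ligand in one receiver cell type
receptor_expression <- c(MET = 1.2, EGFR = 3.4, CD44 = 5.0, ERBB2 = 3.4)
receptor_fraction   <- c(MET = 0.30, EGFR = 0.70, CD44 = 0.95, ERBB2 = 0.50)

rank_expr <- dense_rank_base(receptor_expression)
rank_frac <- dense_rank_base(receptor_fraction)

# Average of the two max-normalized ranks: the top receptor scores 1
score <- 0.5 * (rank_frac / max(rank_frac) + rank_expr / max(rank_expr))
names(score) <- names(receptor_expression)
print(round(score, 3))
```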
exprs_sender_receiver = lr_network %>%
inner_join(exprs_tbl_ligand, by = c("ligand")) %>%
inner_join(exprs_tbl_receptor, by = c("receptor")) %>% inner_join(DE_sender_receiver %>% distinct(niche, sender, receiver))
ligand_scaled_receptor_expression_fraction_df = exprs_sender_receiver %>% group_by(ligand, receiver) %>% mutate(rank_receptor_expression = dense_rank(receptor_expression), rank_receptor_fraction = dense_rank(receptor_fraction)) %>% mutate(ligand_scaled_receptor_expression_fraction = 0.5*( (rank_receptor_fraction / max(rank_receptor_fraction)) + ((rank_receptor_expression / max(rank_receptor_expression))) ) ) %>% distinct(ligand, receptor, receiver, ligand_scaled_receptor_expression_fraction, bonafide) %>% distinct() %>% ungroup()
7. Prioritization of ligand-receptor and ligand-target links
In this step we combine all of the results above to prioritize the ligand-receptor-target links. We scale every property of interest between 0 and 1, and the final prioritization score is a weighted sum of the scaled scores of all the properties of interest.
We provide the user the option to consider the following properties for prioritization (their weights are defined in prioritizing_weights):
prioritizing_weights = c("scaled_ligand_score" = 5,
"scaled_ligand_expression_scaled" = 1,
"ligand_fraction" = 1,
"scaled_ligand_score_spatial" = 2,
"scaled_receptor_score" = 0.5,
"scaled_receptor_expression_scaled" = 0.5,
"receptor_fraction" = 1,
"ligand_scaled_receptor_expression_fraction" = 1,
"scaled_receptor_score_spatial" = 0,
"scaled_activity" = 0,
"scaled_activity_normalized" = 1,
"bona_fide" = 1)
Note: these settings will give substantially more weight to DE ligand-receptor pairs compared to activity. Users can change this if wanted, just like other settings can be changed if that would be better to tackle the specific biological question you want to address.
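A minimal sketch of "weighted sum of scaled scores" with three hypothetical L-R pairs and a subset of the weights above. Dividing by the sum of the weights is an assumption of this sketch; it only rescales the score and does not change the ranking. It also shows the effect described in the note: with a weight of 5 on the ligand DE score, the pair with strong DE (A--R1) outranks the pair with the highest activity (B--R2).

```r
# Hypothetical scaled (0-1) properties for three L-R pairs
scores <- data.frame(
  ligand_receptor            = c("A--R1", "B--R2", "C--R3"),
  scaled_ligand_score        = c(0.9, 0.4, 0.7),
  scaled_activity_normalized = c(0.2, 0.9, 0.5),
  bona_fide                  = c(1, 1, 0),
  stringsAsFactors = FALSE
)
weights <- c(scaled_ligand_score = 5, scaled_activity_normalized = 1, bona_fide = 1)

# Weighted sum of the properties, in the same column order as the weights
props <- as.matrix(scores[, names(weights)])
scores$prioritization_score <- as.vector(props %*% weights) / sum(weights)

ranked <- scores[order(-scores$prioritization_score), ]
print(ranked)
```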
output = list(DE_sender_receiver = DE_sender_receiver, ligand_scaled_receptor_expression_fraction_df = ligand_scaled_receptor_expression_fraction_df, sender_spatial_DE_processed = sender_spatial_DE_processed, receiver_spatial_DE_processed = receiver_spatial_DE_processed,
ligand_activities_targets = ligand_activities_targets, DE_receiver_processed_targets = DE_receiver_processed_targets, exprs_tbl_ligand = exprs_tbl_ligand, exprs_tbl_receptor = exprs_tbl_receptor, exprs_tbl_target = exprs_tbl_target)
prioritization_tables = get_prioritization_tables(output, prioritizing_weights)
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[1]]$receiver) %>% head(10)
## # A tibble: 10 x 37
## niche receiver sender ligand_receptor ligand receptor bonafide ligand_score ligand_signific~ ligand_present ligand_expressi~
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 pEMT_H~ Malignan~ T.cel~ PTPRC--MET PTPRC MET FALSE 3.22 1 1 9.32
## 2 pEMT_H~ Malignan~ T.cel~ PTPRC--EGFR PTPRC EGFR FALSE 3.22 1 1 9.32
## 3 pEMT_H~ Malignan~ T.cel~ PTPRC--CD44 PTPRC CD44 FALSE 3.22 1 1 9.32
## 4 pEMT_H~ Malignan~ T.cel~ PTPRC--ERBB2 PTPRC ERBB2 FALSE 3.22 1 1 9.32
## 5 pEMT_H~ Malignan~ T.cel~ PTPRC--IFNAR1 PTPRC IFNAR1 FALSE 3.22 1 1 9.32
## 6 pEMT_H~ Malignan~ T.cel~ TNF--TNFRSF21 TNF TNFRSF21 TRUE 1.74 1 1 2.34
## 7 pEMT_H~ Malignan~ Myelo~ SERPINA1--LRP1 SERPI~ LRP1 TRUE 2.52 1 1 4.83
## 8 pEMT_H~ Malignan~ Myelo~ IL1B--IL1RAP IL1B IL1RAP TRUE 1.50 1 1 1.93
## 9 pEMT_H~ Malignan~ Myelo~ IL1RN--IL1R2 IL1RN IL1R2 TRUE 1.62 1 1 2.07
## 10 pEMT_H~ Malignan~ T.cel~ PTPRC--INSR PTPRC INSR FALSE 3.22 1 1 9.32
## # ... with 26 more variables: ligand_expression_scaled <dbl>, ligand_fraction <dbl>, ligand_score_spatial <dbl>,
## # receptor_score <dbl>, receptor_significant <dbl>, receptor_present <dbl>, receptor_expression <dbl>,
## # receptor_expression_scaled <dbl>, receptor_fraction <dbl>, receptor_score_spatial <dbl>,
## # ligand_scaled_receptor_expression_fraction <dbl>, avg_score_ligand_receptor <dbl>, activity <dbl>, activity_normalized <dbl>,
## # scaled_ligand_score <dbl>, scaled_ligand_expression_scaled <dbl>, scaled_receptor_score <dbl>,
## # scaled_receptor_expression_scaled <dbl>, scaled_avg_score_ligand_receptor <dbl>, scaled_ligand_score_spatial <dbl>,
## # scaled_receptor_score_spatial <dbl>, scaled_ligand_fraction_adapted <dbl>, scaled_receptor_fraction_adapted <dbl>, ...
prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[1]]$receiver) %>% head(10)
## # A tibble: 10 x 20
## niche receiver sender ligand_receptor ligand receptor bonafide target target_score target_signific~ target_present
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <dbl> <dbl> <dbl>
## 1 pEMT_High_niche Malignant~ T.cell~ PTPRC--MET PTPRC MET FALSE EHF 1.04 1 1
## 2 pEMT_High_niche Malignant~ T.cell~ PTPRC--MET PTPRC MET FALSE GADD4~ 0.836 1 1
## 3 pEMT_High_niche Malignant~ T.cell~ PTPRC--MET PTPRC MET FALSE SERPI~ 0.889 1 1
## 4 pEMT_High_niche Malignant~ T.cell~ PTPRC--EGFR PTPRC EGFR FALSE EHF 1.04 1 1
## 5 pEMT_High_niche Malignant~ T.cell~ PTPRC--EGFR PTPRC EGFR FALSE GADD4~ 0.836 1 1
## 6 pEMT_High_niche Malignant~ T.cell~ PTPRC--EGFR PTPRC EGFR FALSE SERPI~ 0.889 1 1
## 7 pEMT_High_niche Malignant~ T.cell~ PTPRC--CD44 PTPRC CD44 FALSE EHF 1.04 1 1
## 8 pEMT_High_niche Malignant~ T.cell~ PTPRC--CD44 PTPRC CD44 FALSE GADD4~ 0.836 1 1
## 9 pEMT_High_niche Malignant~ T.cell~ PTPRC--CD44 PTPRC CD44 FALSE SERPI~ 0.889 1 1
## 10 pEMT_High_niche Malignant~ T.cell~ PTPRC--ERBB2 PTPRC ERBB2 FALSE EHF 1.04 1 1
## # ... with 9 more variables: target_expression <dbl>, target_expression_scaled <dbl>, target_fraction <dbl>,
## # ligand_target_weight <dbl>, activity <dbl>, activity_normalized <dbl>, scaled_activity <dbl>,
## # scaled_activity_normalized <dbl>, prioritization_score <dbl>
prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(receiver == niches[[2]]$receiver) %>% head(10)
## # A tibble: 10 x 37
## niche receiver sender ligand_receptor ligand receptor bonafide ligand_score ligand_signific~ ligand_present ligand_expressi~
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 pEMT_L~ Maligna~ Endoth~ F8--LRP1 F8 LRP1 TRUE 0.952 1 1 2.17
## 2 pEMT_L~ Maligna~ Endoth~ PLAT--LRP1 PLAT LRP1 TRUE 0.913 1 1 2.70
## 3 pEMT_L~ Maligna~ CAF_Low FGF10--FGFR2 FGF10 FGFR2 TRUE 0.385 0.8 1 1.07
## 4 pEMT_L~ Maligna~ CAF_Low NLGN2--NRXN3 NLGN2 NRXN3 TRUE 0.140 0.2 1 0.269
## 5 pEMT_L~ Maligna~ CAF_Low RSPO3--LGR6 RSPO3 LGR6 TRUE 0.557 0.8 1 1.27
## 6 pEMT_L~ Maligna~ CAF_Low COMP--SDC1 COMP SDC1 TRUE 0.290 0.8 1 1.27
## 7 pEMT_L~ Maligna~ CAF_Low SEMA3C--NRP2 SEMA3C NRP2 TRUE 0.652 1 1 1.73
## 8 pEMT_L~ Maligna~ CAF_Low SLIT2--SDC1 SLIT2 SDC1 TRUE 0.494 1 1 0.846
## 9 pEMT_L~ Maligna~ Endoth~ IL33--IL1RAP IL33 IL1RAP FALSE 1.34 1 1 2.75
## 10 pEMT_L~ Maligna~ CAF_Low C3--LRP1 C3 LRP1 TRUE 0.480 1 1 4.79
## # ... with 26 more variables: ligand_expression_scaled <dbl>, ligand_fraction <dbl>, ligand_score_spatial <dbl>,
## # receptor_score <dbl>, receptor_significant <dbl>, receptor_present <dbl>, receptor_expression <dbl>,
## # receptor_expression_scaled <dbl>, receptor_fraction <dbl>, receptor_score_spatial <dbl>,
## # ligand_scaled_receptor_expression_fraction <dbl>, avg_score_ligand_receptor <dbl>, activity <dbl>, activity_normalized <dbl>,
## # scaled_ligand_score <dbl>, scaled_ligand_expression_scaled <dbl>, scaled_receptor_score <dbl>,
## # scaled_receptor_expression_scaled <dbl>, scaled_avg_score_ligand_receptor <dbl>, scaled_ligand_score_spatial <dbl>,
## # scaled_receptor_score_spatial <dbl>, scaled_ligand_fraction_adapted <dbl>, scaled_receptor_fraction_adapted <dbl>, ...
prioritization_tables$prioritization_tbl_ligand_target %>% filter(receiver == niches[[2]]$receiver) %>% head(10)
## # A tibble: 10 x 20
## niche receiver sender ligand_receptor ligand receptor bonafide target target_score target_signific~ target_present
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <dbl> <dbl> <dbl>
## 1 pEMT_Low_niche Malignant~ Endothe~ F8--LRP1 F8 LRP1 TRUE ETV4 0.771 1 1
## 2 pEMT_Low_niche Malignant~ Endothe~ PLAT--LRP1 PLAT LRP1 TRUE CLDN7 0.835 1 1
## 3 pEMT_Low_niche Malignant~ Endothe~ PLAT--LRP1 PLAT LRP1 TRUE ETV4 0.771 1 1
## 4 pEMT_Low_niche Malignant~ CAF_Low FGF10--FGFR2 FGF10 FGFR2 TRUE ETV4 0.771 1 1
## 5 pEMT_Low_niche Malignant~ CAF_Low FGF10--FGFR2 FGF10 FGFR2 TRUE WNT5A 1.40 1 1
## 6 pEMT_Low_niche Malignant~ CAF_Low NLGN2--NRXN3 NLGN2 NRXN3 TRUE CLDN5 0.979 1 1
## 7 pEMT_Low_niche Malignant~ CAF_Low NLGN2--NRXN3 NLGN2 NRXN3 TRUE ETV4 0.771 1 1
## 8 pEMT_Low_niche Malignant~ CAF_Low RSPO3--LGR6 RSPO3 LGR6 TRUE DDC 0.832 1 1
## 9 pEMT_Low_niche Malignant~ CAF_Low RSPO3--LGR6 RSPO3 LGR6 TRUE EGFL7 0.763 1 1
## 10 pEMT_Low_niche Malignant~ CAF_Low COMP--SDC1 COMP SDC1 TRUE CLDN7 0.835 1 1
## # ... with 9 more variables: target_expression <dbl>, target_expression_scaled <dbl>, target_fraction <dbl>,
## # ligand_target_weight <dbl>, activity <dbl>, activity_normalized <dbl>, scaled_activity <dbl>,
## # scaled_activity_normalized <dbl>, prioritization_score <dbl>
8. Visualization of the Differential NicheNet output
Differential expression of ligands and receptors
Before visualizing, we first need to define the most important ligand-receptor pairs per niche. We will do this by first determining for which niche the highest score is found for each ligand/ligand-receptor pair. And then getting the top 50 ligands per niche.
# For each ligand, find the niche in which it has its highest prioritization score
top_ligand_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
# Same, but for each ligand-receptor pair
top_ligand_receptor_niche_df = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, receptor, prioritization_score) %>% group_by(ligand, receptor) %>% top_n(1, prioritization_score) %>% ungroup() %>% select(ligand, receptor, niche) %>% rename(top_niche = niche)
# Keep each ligand only in its top niche, then take the top 50 ligands per niche
ligand_prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% select(niche, sender, receiver, ligand, prioritization_score) %>% group_by(ligand, niche) %>% top_n(1, prioritization_score) %>% ungroup() %>% distinct() %>% inner_join(top_ligand_niche_df) %>% filter(niche == top_niche) %>% group_by(niche) %>% top_n(50, prioritization_score) %>% ungroup()
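The selection logic above (top niche per ligand, then top N ligands per niche) can be reproduced on a toy table, which may help when checking what the dplyr pipeline keeps on your own data. This is a minimal base-R sketch: the niches, ligand names (LIG_A etc.) and scores are made up, and `top_n_per_group()` is a hypothetical helper, not part of nichenetr. Like `top_n()`, it keeps ties.

```r
# Toy stand-in for prioritization_tbl_ligand_receptor (made-up scores)
prio <- data.frame(
  niche  = c("High", "High", "High", "High", "Low", "Low", "Low"),
  ligand = c("LIG_A", "LIG_B", "LIG_C", "LIG_D", "LIG_A", "LIG_C", "LIG_E"),
  prioritization_score = c(0.90, 0.80, 0.30, 0.50, 0.40, 0.85, 0.70)
)

# Hypothetical helper: keep rows whose score is among the n highest
# within each group (ties are kept, as dplyr::top_n() does)
top_n_per_group <- function(df, group_col, n) {
  keep <- ave(df$prioritization_score, df[[group_col]],
              FUN = function(s) s >= sort(s, decreasing = TRUE)[min(n, length(s))])
  df[as.logical(keep), , drop = FALSE]
}

# Step 1: for each ligand, the niche where it scores highest (= top_niche)
top_niche <- top_n_per_group(prio, "ligand", 1)[, c("ligand", "niche")]
names(top_niche)[2] <- "top_niche"

# Step 2: keep each ligand only in its top niche, then the top n per niche
prio_top <- merge(prio, top_niche, by = "ligand")
prio_top <- prio_top[prio_top$niche == prio_top$top_niche, ]
top_per_niche <- top_n_per_group(prio_top, "niche", 2)  # the tutorial uses 50
print(top_per_niche[order(top_per_niche$niche, -top_per_niche$prioritization_score), ])
```

Here LIG_D is dropped because it is only third in its top niche, and the lower-scoring copies of LIG_A and LIG_C disappear because each ligand is kept only in the niche where it scores best.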
Now we first look at the top ligand-receptor pairs for the malignant cells of the pEMT-high niche (receiver Malignant_High); here we take the top 2 scoring receptors per prioritized ligand.
receiver_oi = "Malignant_High"
# Prioritized ligands (top 50 in their top niche) that signal to this receiver
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% pull(ligand) %>% unique()
# For these ligands, keep the top 2 scoring receptors on the receiver of interest
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
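The "top 2 receptors per ligand for one receiver" step can be sketched in base R on a made-up table. The receptor and ligand names and the scores are placeholders; ranking with `ties.method = "min"` is our choice to mimic how `top_n()` keeps ties.

```r
# Toy ligand-receptor table (made-up scores); columns mirror the tutorial's
lr_tbl <- data.frame(
  receiver = c(rep("Malignant_High", 5), "Malignant_Low"),
  ligand   = c("LIG_A", "LIG_A", "LIG_A", "LIG_B", "LIG_B", "LIG_A"),
  receptor = c("REC_1", "REC_2", "REC_3", "REC_4", "REC_5", "REC_1"),
  prioritization_score = c(0.91, 0.88, 0.40, 0.80, 0.75, 0.95)
)
receiver_oi <- "Malignant_High"

# Keep the receiver of interest, then rank receptors within each ligand
# (rank 1 = highest score; ties share the lower rank, like top_n())
sub <- lr_tbl[lr_tbl$receiver == receiver_oi, ]
rank_in_ligand <- ave(-sub$prioritization_score, sub$ligand,
                      FUN = function(s) rank(s, ties.method = "min"))
top2 <- sub[rank_in_ligand <= 2, ]
print(top2)
```

REC_3 is dropped as the third receptor of LIG_A, and the Malignant_Low row is excluded even though it has the highest score, because it belongs to a different receiver.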
Visualization: minimum LFC compared to other niches
lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot
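The idea behind "minimum LFC compared to other niches" can be illustrated with made-up mean expression values: a ligand only looks niche-specific if it is up against every other niche, so its summary is the smallest of its pairwise log fold changes. This is a conceptual sketch only; nichenetr derives the real values from its own DE step earlier in the pipeline, and the pseudocount of 1 and the use of log2 ratios of means here are assumptions.

```r
# Made-up mean expression of two ligands in one sender type across three niches
expr <- matrix(c(4.0, 1.0, 0.5,   # LIG_A: high in niche_1 only
                 4.0, 3.8, 0.5),  # LIG_B: high in niche_1 and niche_2
               nrow = 2, byrow = TRUE,
               dimnames = list(c("LIG_A", "LIG_B"),
                               c("niche_1", "niche_2", "niche_3")))

# log2 fold change of the niche of interest versus each other niche;
# the pseudocount (an assumption) avoids division by zero
min_lfc <- function(expr, niche_oi, pseudocount = 1) {
  others <- setdiff(colnames(expr), niche_oi)
  lfc <- log2((expr[, niche_oi] + pseudocount) /
              (expr[, others, drop = FALSE] + pseudocount))
  apply(lfc, 1, min)  # the weakest comparison decides
}

lfc_min <- min_lfc(expr, "niche_1")
print(round(lfc_min, 2))
```

LIG_B is strongly up against niche_3 but barely against niche_2, so its minimum LFC stays close to 0, while LIG_A keeps a clearly positive minimum.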
Show the spatialDE as additional information
lfc_plot_spatial = make_ligand_receptor_lfc_spatial_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, ligand_spatial = include_spatial_info_sender, receptor_spatial = include_spatial_info_receiver, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot_spatial
Ligand expression, activity and target genes
Active target gene inference - cf Default NicheNet
Now: visualization of ligand activity and ligand-target links
exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand, output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot
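For reference, default NicheNet scores a ligand's activity by how well its regulatory potential scores (one column of ligand_target_matrix, whose rows are target genes) predict a gene set of interest; the Pearson correlation between the two is one of the metrics it reports. The base-R sketch below computes only that correlation on a simulated matrix. The matrix, the gene set and the ligand names are made up, and in a real analysis you would rely on nichenetr's own activity calculation rather than this code.

```r
set.seed(1)
genes   <- paste0("G", 1:200)
ligands <- c("LIG_A", "LIG_B")

# Made-up stand-in for ligand_target_matrix: rows = target genes, columns = ligands
ltm <- matrix(runif(200 * 2, 0, 0.1), nrow = 200,
              dimnames = list(genes, ligands))

# Genes of interest (e.g. genes up in the receiver); LIG_A is given a higher
# regulatory potential for exactly these genes, LIG_B is not
geneset_oi <- genes[1:20]
ltm[geneset_oi, "LIG_A"] <- ltm[geneset_oi, "LIG_A"] + 0.5

# Activity proxy: correlation between regulatory potential and set membership
in_set   <- as.numeric(rownames(ltm) %in% geneset_oi)
activity <- apply(ltm, 2, function(potential) cor(potential, in_set))
print(round(sort(activity, decreasing = TRUE), 3))
```

LIG_A comes out with a correlation close to 1 and LIG_B close to 0, which is the kind of separation the activity column of the plot summarizes.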
Based on this plot we can formulate several hypotheses, for example: "Interestingly, IL1 family ligands seem to have activity in inducing the DE genes between high-pEMT and low-pEMT malignant cells; and they are mainly expressed by myeloid cells, a cell type unique to pEMT-high tumors."
Important: ligand-receptor pairs that show both high differential expression (or condition specificity) and high ligand activity (i.e. target gene enrichment) are the most interesting predictions of key regulators of your intercellular communication process of interest!
# Same plot, restricted to the top 20 ligands so that it stays readable
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(20, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
exprs_activity_target_plot = make_ligand_activity_target_exprs_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, prioritization_tables$prioritization_tbl_ligand_target, output$exprs_tbl_ligand, output$exprs_tbl_target, lfc_cutoff, ligand_target_matrix, plot_legend = FALSE, heights = NULL, widths = NULL)
exprs_activity_target_plot$combined_plot
Circos plot of prioritized ligand-receptor pairs
Because 50 ligands are too many to show in a circos plot, we only visualize the top 15.
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(15, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
senders = prioritized_tbl_oi$sender %>% unique() %>% sort()
# brewer.pal() only returns 3-11 'Spectral' colours; interpolate so that every sender gets exactly one colour
colors_sender = colorRampPalette(brewer.pal(n = 11, name = 'Spectral'))(length(senders)) %>% magrittr::set_names(senders)
colors_receiver = c("lavender") %>% magrittr::set_names(prioritized_tbl_oi$receiver %>% unique() %>% sort())
circos_output = make_circos_lr(prioritized_tbl_oi, colors_sender, colors_receiver)
Interpretation of these results
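Most of the top-ranked differential L-R pairs seem to come from cell types that are present only in pEMT-high tumors. This may partly be biological (a cell type unique to one condition can be very important there), but it may also be a consequence of how the prioritization works: these unique cell types have no "counterpart" in the other niche.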
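Because myeloid cells and T cells are very different from the other cells in the tumor microenvironment, their ligands show strong differential expression. This DE (myeloid/T cells versus the myofibroblasts/CAFs/endothelial cells of pEMT-low tumors) is likely to be much stronger than the DE between cells of the same cell type in a different niche/condition (e.g. CAFs in pEMT-high versus CAFs in pEMT-low tumors).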
Visualization for the other condition: pEMT-low
receiver_oi = "Malignant_Low"
filtered_ligands = ligand_prioritized_tbl_oi %>% filter(receiver == receiver_oi) %>% top_n(50, prioritization_score) %>% pull(ligand) %>% unique()
prioritized_tbl_oi = prioritization_tables$prioritization_tbl_ligand_receptor %>% filter(ligand %in% filtered_ligands) %>% select(niche, sender, receiver, ligand, receptor, ligand_receptor, prioritization_score) %>% distinct() %>% inner_join(top_ligand_receptor_niche_df) %>% group_by(ligand) %>% filter(receiver == receiver_oi) %>% top_n(2, prioritization_score) %>% ungroup()
lfc_plot = make_ligand_receptor_lfc_plot(receiver_oi, prioritized_tbl_oi, prioritization_tables$prioritization_tbl_ligand_receptor, plot_legend = FALSE, heights = NULL, widths = NULL)
lfc_plot
9. Notes, limitations, and comparison to default NicheNet.
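In the default NicheNet pipeline, expressed ligand-receptor pairs are ranked solely by their ligand activity. In this Differential NicheNet pipeline, we additionally use the differential expression of the L-R pair compared with other niches (or, for spatial transcriptomics data, with other spatial locations).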
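Because we focus here on differential expression of ligand-receptor pairs, and by default give DE a higher prioritization weight than activity, we tend to find many hits that differ from those of the default NicheNet pipeline. Differential NicheNet tends to return more high-DE, low-activity hits, whereas default NicheNet returns more low-DE, high-activity hits.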
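As a rough illustration of how the weighting shifts the ranking, the sketch below treats the prioritization score as a weighted mean of two already-scaled criteria. This is a simplification under stated assumptions: the real prioritization_score combines more criteria, the pair names and values are made up, and the weights 4 (DE) and 1 (activity) are hypothetical, not nichenetr's defaults. Setting the DE weight to 0 mimics ranking by activity alone, as in default NicheNet.

```r
# Made-up, already-scaled (0-1) criteria for three ligand-receptor pairs
pairs <- data.frame(
  ligand_receptor = c("LIG_A--REC_1", "LIG_B--REC_2", "LIG_C--REC_3"),
  scaled_de       = c(0.95, 0.30, 0.70),  # differential expression
  scaled_activity = c(0.10, 0.95, 0.70)   # ligand activity
)

# Weighted mean of the two criteria; the weights are hypothetical
prioritize <- function(df, w_de, w_activity) {
  (w_de * df$scaled_de + w_activity * df$scaled_activity) / (w_de + w_activity)
}

pairs$score_de_weighted   <- prioritize(pairs, w_de = 4, w_activity = 1)
pairs$score_activity_only <- prioritize(pairs, w_de = 0, w_activity = 1)
print(pairs[order(-pairs$score_de_weighted), ])
```

With DE weighted more heavily, the high-DE, low-activity pair LIG_A--REC_1 comes first; ranking by activity alone puts the low-DE, high-activity pair LIG_B--REC_2 first instead.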
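Note that some high-DE, low-activity hits can be very important: their low NicheNet activity may be caused by limitations of the activity prediction (e.g. incorrect or incomplete prior knowledge about that ligand in NicheNet). Others, however, may be high in DE but not in activity simply because they have no strong signaling effect (e.g. ligands that are only involved in cell adhesion).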
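Conversely, for low-DE, high-activity ligand-receptor pairs that are not strongly prioritized by Differential NicheNet, consider the following: 1) some ligands are regulated post-transcriptionally, so a high predicted activity may still reflect genuine signaling; 2) the high predicted activity may be an artifact of NicheNet's limitations (inaccurate prior knowledge), and the low-DE ligand may not matter for the biological process of interest (although a high-DE member of the same ligand family might, because family members often signal very similarly); 3) high activity in one condition may be due to downregulation in the other condition, giving high activity but low DE. Currently, ligand activity is calculated per condition from upregulated genes, but downregulated genes can also be a signature of ligand activity.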
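When a ligand-receptor pair has both high DE and high activity, we consider it a very good candidate regulator of the process of interest, and we recommend prioritizing such candidates for experimental validation.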
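The four cases discussed in this section (high or low DE crossed with high or low activity) can be made explicit by thresholding two scaled criteria. A minimal sketch on made-up values: the pair names are placeholders and the 0.5 cut-offs are arbitrary assumptions; in practice the inputs would be the scaled DE and activity values from your own prioritization tables.

```r
# Made-up scaled (0-1) DE and activity values for four ligand-receptor pairs
cands <- data.frame(
  ligand_receptor = c("LIG_A--REC_1", "LIG_B--REC_2", "LIG_C--REC_3", "LIG_D--REC_4"),
  scaled_de       = c(0.90, 0.85, 0.20, 0.15),
  scaled_activity = c(0.80, 0.10, 0.90, 0.05)
)

# Arbitrary cut-offs (assumption): "high" means >= 0.5
classify_pairs <- function(de, activity, de_cut = 0.5, act_cut = 0.5) {
  de_hi  <- de >= de_cut
  act_hi <- activity >= act_cut
  ifelse(de_hi & act_hi, "high DE, high activity: strong candidate",
    ifelse(de_hi, "high DE, low activity: check prior knowledge or adhesion role",
      ifelse(act_hi, "low DE, high activity: post-transcriptional or family effect?",
        "low DE, low activity")))
}

cands$category <- classify_pairs(cands$scaled_de, cands$scaled_activity)
print(cands[, c("ligand_receptor", "category")])
```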
References
Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods (2019) doi:10.1038/s41592-019-0667-5
Guilliams, M. et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell (2022) doi:10.1016/j.cell.2021.12.018
Puram, S. V. et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell 171(7), 1611–1624.e24 (2017) doi:10.1016/j.cell.2017.10.044