10X single-cell spatial communication analysis: the latest CellphoneDB (v4) explained

2023-05-04 单细胞空间交响乐 (Single-Cell Spatial Symphony)

Author: Evil Genius

I recently taught a class on cell-cell communication and noticed that many tools have been updated, so here is a summary of what is new in CellphoneDB v4.0.

Earlier post: communication analysis that accounts for spatial position, CellphoneDB (V3.0)

Installation has changed: CellphoneDB now installs with pip inside a conda environment, and the analysis methods shown below are called from Python.

conda create -n cpdb python=3.8

conda activate cpdb

pip install cellphonedb

Updates to the analysis methods (three methods to choose from)

METHOD 1 simple analysis (>= v1): Here, no statistical analysis is performed. CellphoneDB will output the mean for all the interactions for each cell type pair combination. Note that CellphoneDB will report the means only if all the gene members of the interactions are expressed by at least a fraction of cells in a cell type (threshold). If the condition threshold is not met, the interaction will be ignored in the corresponding cell type pairs.

With method 1, every cell type pair that expresses both the ligand and the receptor is reported directly, with no statistical test.

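from cellphonedb.src.core.methods import cpdb_analysis_method

# The file names are placeholders; out_path is the output directory, defined beforehand.
means, deconvoluted = cpdb_analysis_method.call(
         cpdb_file_path = 'cellphonedb.zip',
         meta_file_path = 'test_meta.txt',
         counts_file_path = 'test_counts.h5ad',
         counts_data = 'hgnc_symbol',
         output_path = out_path)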

The output contains only means.txt and deconvoluted.txt for the ligand-receptor pairs.
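
The threshold rule of method 1 can be sketched in plain Python. This only illustrates the logic described above and is not CellphoneDB's implementation; the function names, the toy expression values and the cell type labels are all made up for the example.

```python
from statistics import mean

def fraction_expressing(values):
    """Fraction of cells with non-zero expression of a gene."""
    return sum(v > 0 for v in values) / len(values)

def interaction_mean(ligand_expr, receptor_expr, threshold=0.1):
    """Mean of the ligand average (first cell type) and receptor average (second cell type).

    Returns None when either gene is expressed in fewer than `threshold`
    of the cells, i.e. the interaction is ignored for this cell type pair.
    """
    if fraction_expressing(ligand_expr) < threshold:
        return None
    if fraction_expressing(receptor_expr) < threshold:
        return None
    return mean([mean(ligand_expr), mean(receptor_expr)])

# Hypothetical per-cell expression values.
ligand_in_A = [2.0, 0.0, 4.0, 2.0]     # 75% of cells express the ligand
receptor_in_B = [1.0, 1.0, 0.0, 2.0]   # 75% of cells express the receptor
receptor_in_C = [0.0] * 19 + [3.0]     # only 5% of cells express the receptor

mean_A_B = interaction_mean(ligand_in_A, receptor_in_B)
mean_A_C = interaction_mean(ligand_in_A, receptor_in_C)
print(mean_A_B, mean_A_C)
```

The A-C pair is dropped because the receptor falls below the 10% threshold, even though the ligand is well expressed in A.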

METHOD 2 statistical_analysis (>= v1): This is a statistical analysis that evaluates for significance all the interactions that can potentially occur in your dataset: i.e. between ALL the potential cell type pairs. Here, CellphoneDB uses empirical shuffling to calculate which ligand–receptor pairs display significant cell-type specificity. Specifically, it estimates a null distribution of the mean of the average ligand and receptor expression in the interacting clusters by randomly permuting the cluster labels of all cells. The P value for the likelihood of cell-type specificity of a given receptor–ligand complex is calculated on the basis of the proportion of the means that are as high as or higher than the actual mean.

With method 2, each ligand-receptor pair undergoes a significance test.

Only receptors and ligands expressed in more than a user-specified threshold percentage of the cells in the specific cluster (threshold default is 0.1) are tested and will get a mean value in the significant_means.txt output.

For multi-subunit heteromeric complexes, we require that:
1、 all subunits of the complex are expressed by a proportion of cells (threshold), and then
2、 the member of the complex with the minimum expression is used to compute the interaction means and perform the random shuffling.
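
The two requirements for complexes can be sketched as follows. This is a minimal illustration, not the library's code: `complex_expression` is a hypothetical helper and the expression values are invented. IL12RB1 and IL12RB2 are used as subunit names because the IL12 receptor is a heteromer.

```python
from statistics import mean

def complex_expression(subunit_expr, threshold=0.1):
    """Summarise a heteromeric complex within one cluster.

    subunit_expr maps a subunit gene name to its per-cell expression values.
    Returns the average expression of the least expressed subunit, or None
    if any subunit is expressed in fewer than `threshold` of the cells.
    """
    averages = []
    for values in subunit_expr.values():
        fraction = sum(v > 0 for v in values) / len(values)
        if fraction < threshold:
            return None  # one missing subunit disables the whole complex
        averages.append(mean(values))
    return min(averages)

# Invented per-cell values for the two receptor subunits in one cluster.
both_present = {"IL12RB1": [3.0, 1.0, 2.0, 2.0], "IL12RB2": [0.0, 1.0, 0.0, 1.0]}
one_missing = {"IL12RB1": [3.0, 1.0, 2.0, 2.0], "IL12RB2": [0.0, 0.0, 0.0, 0.0]}

expressed = complex_expression(both_present)
missing = complex_expression(one_missing)
print(expressed, missing)
```

Taking the minimum means the complex is only as abundant as its scarcest subunit.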

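Pairwise comparisons are then performed between all cell types. First, the cluster labels of all cells are randomly permuted (1,000 times by default), and each time we compute the mean of the average receptor expression level in one cluster and the average ligand expression level in the interacting cluster. For each receptor-ligand pair in each pairwise comparison between two cell types, this produces a null distribution. The p-value for the likelihood of cell-type specificity of a given receptor-ligand complex is the proportion of these means that are equal to or higher than the actual mean. Interactions that are highly enriched between cell types are then prioritised by the number of significant pairs, so that biologically relevant interactions can be selected manually.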
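
The shuffling procedure can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not CellphoneDB's code: it handles a single ligand-receptor pair, skips the expression threshold, and the function names, seed and toy data are invented.

```python
import random
from statistics import mean

def pair_mean(ligand, receptor, labels, sender, receiver):
    """Mean of the average ligand expression in `sender` and receptor expression in `receiver`."""
    lig = [e for e, lab in zip(ligand, labels) if lab == sender]
    rec = [e for e, lab in zip(receptor, labels) if lab == receiver]
    return (mean(lig) + mean(rec)) / 2

def permutation_pvalue(ligand, receptor, labels, sender, receiver,
                       iterations=1000, seed=0):
    """Return (observed mean, proportion of shuffled means >= observed mean)."""
    rng = random.Random(seed)
    observed = pair_mean(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # permute cluster labels; cluster sizes stay fixed
        if pair_mean(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / iterations

# Toy data: 30 cells in clusters A, B and C; ligand only in A, receptor only in B.
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
ligand = [5.0] * 10 + [0.0] * 20
receptor = [0.0] * 10 + [4.0] * 10 + [0.0] * 10

observed, p_specific = permutation_pvalue(ligand, receptor, labels, "A", "B")
_, p_unspecific = permutation_pvalue(ligand, receptor, labels, "C", "C")
print(observed, p_specific, p_unspecific)
```

The A-B pair gets a small p-value because almost no relabelling reproduces such a high mean, while the C-C pair, where neither gene is expressed, gets a p-value of 1.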

from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

# The file names are placeholders; out_path is the output directory, defined beforehand.
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
        cpdb_file_path = 'cellphonedb.zip',
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.h5ad',
        counts_data = 'hgnc_symbol',
        output_path = out_path)

METHOD 3 degs_analysis (>= v3): This method is proposed as an alternative to the statistical inference approach. This approach allows the user to design more complex comparisons to retrieve interactions specific to a cell type of interest. This is particularly relevant when your research question goes beyond comparing “one” cell type vs “the rest”. Examples of alternative contrasts are hierarchical comparisons (e.g. you are interested in a specific lineage, such as epithelial cells, and wish to identify the genes changing their expression within this lineage) or comparing disease vs control (e.g. you wish to identify upregulated genes in disease T cells by comparing them against control T cells). For this CellphoneDB method (cpdb_degs_analysis_method), the user provides an input file (degs_file.txt in the call below) indicating which genes are relevant for a cell type (for example, marker genes or significantly upregulated genes resulting from a differential expression analysis (DEG)). CellphoneDB will select interactions where:
1、 all the genes in the interaction are expressed in the corresponding cell type by more than 10% of cells (threshold = 0.1) and
2、 at least one gene-cell type pair is in the provided DEG.tsv file.

from cellphonedb.src.core.methods import cpdb_degs_analysis_method

# The file names are placeholders; out_path is the output directory, defined beforehand.
deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
         cpdb_file_path = 'cellphonedb.zip',
         meta_file_path = 'test_meta.txt',
         counts_file_path = 'test_counts.h5ad',
         degs_file_path = 'degs_file.txt',
         counts_data = 'hgnc_symbol',
         threshold = 0.1,
         output_path = out_path)
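
The two selection criteria of method 3 can be sketched as a small function. This is an illustration of the rule only, not the library's implementation; `is_relevant`, the gene names LIGAND_X and RECEPTOR_Y, the cell type labels and the fractions are all hypothetical.

```python
def is_relevant(genes_a, genes_b, cell_a, cell_b, pct_expressing, degs, threshold=0.1):
    """Return 1 if the interaction is relevant for the ordered pair (cell_a, cell_b), else 0.

    genes_a / genes_b: genes of partner A / B (several genes for a complex).
    pct_expressing: dict (cell type, gene) -> fraction of cells expressing the gene.
    degs: set of (cell type, gene) pairs supplied by the user.
    """
    members = [(cell_a, g) for g in genes_a] + [(cell_b, g) for g in genes_b]
    # Criterion 1: every gene is expressed by more than `threshold` of its cell type.
    all_expressed = all(pct_expressing.get(m, 0.0) > threshold for m in members)
    # Criterion 2: at least one gene-cell type pair is in the DEG list.
    any_deg = any(m in degs for m in members)
    return int(all_expressed and any_deg)

pct_expressing = {
    ("T_disease", "LIGAND_X"): 0.40,
    ("Epithelial", "RECEPTOR_Y"): 0.30,
    ("Fibroblast", "RECEPTOR_Y"): 0.05,
}
degs = {("T_disease", "LIGAND_X")}

to_epithelial = is_relevant(["LIGAND_X"], ["RECEPTOR_Y"], "T_disease", "Epithelial",
                            pct_expressing, degs)
to_fibroblast = is_relevant(["LIGAND_X"], ["RECEPTOR_Y"], "T_disease", "Fibroblast",
                            pct_expressing, degs)
without_degs = is_relevant(["LIGAND_X"], ["RECEPTOR_Y"], "T_disease", "Epithelial",
                           pct_expressing, set())
print(to_epithelial, to_fibroblast, without_degs)
```

The fibroblast pair fails on expression (5% of cells), and the last call fails because no member of the interaction is a DEG.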

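This method leaves you free to design the gene expression comparison so that it better matches your research question. With method 2, the null hypothesis (and the background distribution) considers all cell types in the dataset and compares “one” cell type against “the rest”. An analysis may, however, call for a different approach that better reflects the study design. Some examples:

The analysis needs to account for technical batches or biological covariates. Here, a better approach is to rely on a differential expression method that models these confounders and to provide its results directly to CellphoneDB.

You are interested in specificity within a particular lineage and want to perform a hierarchical differential expression analysis (for example, you are interested in a specific lineage, such as epithelial cells, and want to identify the genes that change their expression within it; research question: which interactions are upregulated in epithelial cell A compared with epithelial cell B?).

You want to compare a specific population between disease and control (for example, identifying upregulated genes in disease T cells by comparing them against control T cells; research question: which interactions are upregulated in disease T cells?).

To include spatial information, see the earlier post on communication analysis that accounts for spatial position, CellphoneDB (V3.0).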

Interpreting the results

Output files

All files (except “deconvoluted.txt”) follow the same structure: rows depict interacting proteins while columns represent interacting cell type pairs.

The “means.txt” file contains mean values for each ligand-receptor interaction (rows) for each cell-cell interaction pair (columns).
The “pvalues.txt” contains the P values for the likelihood of cell-type specificity of a given receptor–ligand complex (rows) in each cell-cell interaction pair (columns), resulting from the statistical_analysis.
The “significant_means.txt” contains the mean expression (same as “means.txt”) of the significant receptor–ligand complex only. This is the result of crossing “means.txt” and “pvalues.txt”.
The “relevant_interactions.txt” contains a binary matrix indicating if the interaction is relevant (1) or not (0). An interaction is classified as relevant if a gene is a DEG in a cluster/cell type (information provided by the user in the DEG.tsv file) and all the participant genes are expressed. Alternatively, the value is set to 0. This file is specific to degs_analysis. Each row corresponds to a ligand-receptor interaction, while each column corresponds to a cell-cell interacting pair.
The “deconvoluted.txt” file gives additional information for each of the interacting partners. This is important as some of the interacting partners are heteromers. In other words, multiple molecules have to be expressed in the same cluster in order for the interacting partner to be functional.

See below the meaning of each column in the outputs:

P-value (pvalues.txt), Mean (means.txt), Significant mean (significant_means.txt) and Relevant interactions (relevant_interactions.txt)

id_cp_interaction: Unique CellphoneDB identifier for each interaction stored in the database.

interacting_pair: Name of the interacting pairs separated by “|”.

partner A or B: Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:)

gene A or B: Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list.

secreted: True if one of the partners is secreted.

Receptor A or B: True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database.

annotation_strategy: Curated if the interaction was annotated by the CellphoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from.

is_integrin: True if one of the partners is integrin.

rank: Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. (Only in significant_means.txt)

means: Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. (Only in means.txt)

p.values: p-values for all the interacting partners: p.value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. (Only in pvalues.txt)

significant_mean: Significant mean calculation for all the interacting partners. If p.value < 0.05, the value will be the mean. Alternatively, the value is set to 0. (Only in significant_means.txt)

relevant_interactions: Indicates if the interaction is relevant (1) or not (0). If a gene in the interaction is a DEG (i.e. a gene in the DEG.tsv file), and all the participant genes are expressed, the interaction will be classified as relevant. Alternatively, the value is set to 0. (Only in relevant_interactions.txt)

Remember that the interactions are not symmetric. IL12-IL12 receptor for clusterA clusterB (i.e. the receptor is in clusterB) is not the same as IL12-IL12 receptor for clusterB clusterA (i.e. the receptor is in clusterA).
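
How significant_mean and rank follow from the means and p-values can be sketched with nested dictionaries. This is a minimal illustration of the column descriptions above, not the library's code; the interaction label, the column labels and the numbers are invented. Each ordered cell type pair is its own column, which is why clusterA|clusterB and clusterB|clusterA carry different values.

```python
def significant_means(means, pvalues, alpha=0.05):
    """Keep the mean where the p-value is below alpha, otherwise set it to 0."""
    return {
        interaction: {
            pair: (value if pvalues[interaction][pair] < alpha else 0.0)
            for pair, value in by_pair.items()
        }
        for interaction, by_pair in means.items()
    }

def rank(pvalues_row, alpha=0.05):
    """Number of significant p-values divided by the number of cell type pairs compared."""
    return sum(p < alpha for p in pvalues_row.values()) / len(pvalues_row)

# Invented values for one interaction across four ordered cell type pairs.
means = {"IL12_IL12R": {"clusterA|clusterB": 1.2, "clusterB|clusterA": 0.8,
                        "clusterA|clusterA": 0.5, "clusterB|clusterB": 0.0}}
pvalues = {"IL12_IL12R": {"clusterA|clusterB": 0.001, "clusterB|clusterA": 0.4,
                          "clusterA|clusterA": 1.0, "clusterB|clusterB": 1.0}}

sig = significant_means(means, pvalues)
interaction_rank = rank(pvalues["IL12_IL12R"])
print(sig["IL12_IL12R"], interaction_rank)
```

Only the clusterA|clusterB direction keeps its mean here; the reverse direction is set to 0 because its p-value is not significant.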

Deconvoluted (deconvoluted.txt)

gene_name: Gene identifier for one of the subunits that are participating in the interaction defined in the “means.txt” file. The identifier will depend on the input of the user list.

uniprot: UniProt identifier for one of the subunits that are participating in the interaction defined in the “means.txt” file.

is_complex: True if the subunit is part of a complex. Single if it is not, complex if it is.

protein_name: Protein name for one of the subunits that are participating in the interaction defined in the “means.txt” file.

complex_name: Complex name if the subunit is part of a complex. Empty if not.

id_cp_interaction: Unique CellphoneDB identifier for each of the interactions stored in the database.

mean: Mean expression of the corresponding gene in each cluster.

Interpreting the outputs

How to read and interpret the results?

The key files are significant_means.txt (for statistical_analysis) or relevant_interactions.txt (for degs_analysis), see below. When interpreting the results, we recommend you first define your questions of interest. Next, focus on specific cell type pairs and manually review the interactions prioritising those with lower p-value and/or higher mean expression. For graphical representation we recommend @zktuong repository: ktplots in R and ktplotspy in python.
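
Reviewing one cell type pair can be sketched with the csv module. This is a rough example under several assumptions: the file is taken to be tab-separated, the toy table stands in for significant_means.txt with most annotation columns left out, and the interaction names, column labels and the helper `top_interactions` are hypothetical. Empty or NaN entries are treated as non-significant.

```python
import csv
import io
import math

# Tiny stand-in for significant_means.txt (assumed tab-separated).
toy_file = io.StringIO(
    "interacting_pair\trank\tclusterA|clusterB\tclusterB|clusterA\n"
    "L1_R1\t0.25\t1.2\t0\n"
    "L2_R2\t0.50\t0.4\t0.9\n"
    "L3_R3\t0.25\t\t2.1\n"
)

def parse(value):
    """Convert a table cell to float, mapping empty or NaN entries to 0."""
    try:
        number = float(value)
    except ValueError:
        return 0.0
    return 0.0 if math.isnan(number) else number

def top_interactions(handle, pair_column, n=10):
    """Return (interacting_pair, mean) for one cell type pair, highest mean first."""
    rows = csv.DictReader(handle, delimiter="\t")
    hits = [(row["interacting_pair"], parse(row[pair_column])) for row in rows]
    hits = [hit for hit in hits if hit[1] > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]

top = top_interactions(toy_file, "clusterA|clusterB")
print(top)
```

For plotting rather than tabulating, the ktplots and ktplotspy packages mentioned above are the more convenient route.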

CellphoneDB output is high-throughput. CellphoneDB provides all cell-cell interactions that may potentially occur in your dataset, given the expression of the cells. The size of the output may be overwhelming, but if you apply some rationale (which will depend on the design of your experiment and your biological question), you will be able to narrow it down to a few candidate interactions. The new method degs_analysis will allow you to perform a more tailored analysis towards specific cell-types or conditions, while the option microenvs will allow you to restrict the combinations of cell-type pairs to test.

It may be that not all of the cell-types of your input dataset co-appear in time and space. Cell types that do not co-appear in time and space will not interact. For example, you might have cells coming from different in vitro systems, different developmental stages or disease and control conditions. Use this prior information to restrict and ignore infeasible cell-type combinations from the outputs (i.e., columns) as well as their associated interactions (i.e. rows). You can restrict the analysis to feasible cell-type combinations using the option microenvs. Here you can input a two-column file indicating which cell type is in which spatiotemporal microenvironment. CellphoneDB will use this information to define possible pairs of interacting cells (i.e. pairs of clusters co-existing in a microenvironment) ignoring the remaining combinations.
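
How a two-column microenvironment table restricts the cell type pairs can be sketched as follows. This illustrates the idea only and is not how CellphoneDB parses the file; `feasible_pairs`, the cell type names and the microenvironment names are made up.

```python
from itertools import product

def feasible_pairs(microenvs):
    """Return the ordered cell type pairs that share at least one microenvironment.

    microenvs: iterable of (cell_type, microenvironment) rows, i.e. the two columns.
    """
    by_env = {}
    for cell_type, env in microenvs:
        by_env.setdefault(env, set()).add(cell_type)
    pairs = set()
    for cell_types in by_env.values():
        # Ordered pairs, because A|B and B|A are different interactions.
        pairs.update(product(cell_types, repeat=2))
    return pairs

rows = [
    ("T_cell", "Env1"),
    ("Epithelial", "Env1"),
    ("Fibroblast", "Env2"),
    ("T_cell", "Env2"),
]

pairs = feasible_pairs(rows)
print(sorted(pairs))
```

Epithelial and Fibroblast never share a microenvironment in this toy table, so neither direction of that pair would be tested.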