FindClusters {Seurat}

2021-08-05 不学无数YD

References:
Seurat (version 4.0.3)
FindClusters function - RDocumentation

FindClusters: Cluster Determination

Description

Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!

Usage

FindClusters(object, ...)

Default S3 method:

FindClusters(
  object,
  modularity.fxn = 1,
  initial.membership = NULL,
  node.sizes = NULL,
  resolution = 0.8,
  method = "matrix",
  algorithm = 1,
  n.start = 10,
  n.iter = 10,
  random.seed = 0,
  group.singletons = TRUE,
  temp.file.location = NULL,
  edge.file.name = NULL,
  verbose = TRUE,
  ...
)

S3 method for class 'Seurat'

FindClusters(
  object,
  graph.name = NULL,
  modularity.fxn = 1,
  initial.membership = NULL,
  node.sizes = NULL,
  resolution = 0.8,
  method = "matrix",
  algorithm = 1,
  n.start = 10,
  n.iter = 10,
  random.seed = 0,
  group.singletons = TRUE,
  temp.file.location = NULL,
  edge.file.name = NULL,
  verbose = TRUE,
  ...
)

Arguments

object
An object

...
Arguments passed to other methods

modularity.fxn
Modularity function (1 = standard; 2 = alternative).

initial.membership, node.sizes
Parameters to pass to the Python leidenalg function.

resolution
Value of the resolution parameter; use a value above (below) 1.0 if you want to obtain a larger (smaller) number of communities. This sets the granularity of the downstream clustering: higher values yield more clusters. Values of 0.6-1.2 are reported to work best for single-cell datasets of around 3,000 cells, and the optimal resolution tends to increase for larger datasets. The resulting clusters are stored as the object's identities, retrieved with Idents(object) (the object@ident slot in Seurat v2).

method
Method for running leiden (defaults to matrix, which is fast for small datasets). Enable method = "igraph" to avoid casting large data to a dense matrix.

algorithm
Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg Python package.

n.start
Number of random starts (default 10).

n.iter
Maximal number of iterations per random start.

random.seed
Seed of the random number generator (default 0).

group.singletons
Group singletons into the nearest cluster. If FALSE, assign all singletons to a "singleton" group.

temp.file.location
Directory where intermediate files will be written. Specify the ABSOLUTE path.

edge.file.name
Edge file to use as input for the modularity optimizer jar.

verbose
Print output.

graph.name
Name of the graph to use for the clustering algorithm.
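
To get a feel for how the resolution argument changes the number of clusters, a small sketch like the one below may help. It assumes the Seurat package is installed and that `obj` is a Seurat object on which FindNeighbors has already been run; the helper name `count_clusters` and the three resolution values are made up for illustration.

```r
# Hypothetical helper: run FindClusters at several resolutions and
# report how many clusters each one produces.
# Assumes `obj` already holds an SNN graph (i.e. FindNeighbors was run).
count_clusters <- function(obj, resolutions = c(0.4, 0.8, 1.2)) {
  n <- vapply(resolutions, function(r) {
    clustered <- Seurat::FindClusters(obj, resolution = r, verbose = FALSE)
    # 'seurat_clusters' holds the latest clustering result as a factor
    nlevels(clustered$seurat_clusters)
  }, integer(1))
  stats::setNames(n, resolutions)
}

# Usage (with your own preprocessed object):
# count_clusters(obj)
```

The original object is not modified here; each run works on a copy, so 'seurat_clusters' in `obj` is left as it was.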

Details

To run the Leiden algorithm, you must first install the leidenalg Python package (e.g. via pip install leidenalg); see Traag et al. (2018).
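
A possible way to guard a Leiden run is sketched below. It assumes the reticulate package is what exposes Python to R on your system (an assumption, not something the documentation above states), and the helper names `leiden_available` and `run_leiden` are hypothetical.

```r
# Hypothetical check: is the leidenalg Python module visible from R?
# Assumes Python is reached through the reticulate package.
leiden_available <- function() {
  requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("leidenalg")
}

# Hypothetical wrapper: run FindClusters with algorithm = 4 (Leiden).
# Assumes `obj` is a Seurat object with an SNN graph from FindNeighbors.
run_leiden <- function(obj, resolution = 0.8) {
  if (!leiden_available()) {
    stop("leidenalg not found; try 'pip install leidenalg' first")
  }
  # method = "igraph" avoids casting a large graph to a dense matrix
  Seurat::FindClusters(obj, algorithm = 4, method = "igraph",
                       resolution = resolution)
}
```

Failing early with a clear message tends to be easier to debug than an error raised from inside the Python call.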

Value

Returns a Seurat object where the idents have been updated with new cluster info; the latest clustering results will be stored in object metadata under 'seurat_clusters'. Note that 'seurat_clusters' will be overwritten every time FindClusters is run.
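
The returned value can be inspected roughly as follows. This is a sketch that assumes Seurat is installed and that `obj` already holds an SNN graph from FindNeighbors; the helper name `summarise_clusters` is invented for this example.

```r
# Hypothetical helper: cluster, then look at where the result is stored.
summarise_clusters <- function(obj, resolution = 0.8) {
  obj <- Seurat::FindClusters(obj, resolution = resolution, verbose = FALSE)
  list(
    object = obj,
    # cells per cluster, read from the 'seurat_clusters' metadata column
    sizes = table(obj$seurat_clusters),
    # per the documentation above, the active identities should carry
    # the same labels as 'seurat_clusters'
    idents_match = identical(
      unname(as.character(Seurat::Idents(obj))),
      unname(as.character(obj$seurat_clusters))
    )
  )
}

# Usage (with your own preprocessed object):
# res <- summarise_clusters(obj)
# res$sizes
```

Because 'seurat_clusters' is overwritten on every run, copy it to another metadata column first if an earlier result needs to be kept.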

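Why do cluster IDs start at 0? Can they start at 1? Does starting at 0 offer some special algorithmic advantage?
Cluster IDs start at 0 by default; this has no particular advantage. You can rename the cluster IDs with the RenameIdents function.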
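
RenameIdents takes pairs of old and new identity names, so shifting the IDs by one mostly comes down to building that mapping. The sketch below does this in base R on a plain factor; the helper name `one_based_map` is made up, and the Seurat call in the trailing comment is an assumed usage pattern rather than something tested here.

```r
# Hypothetical helper: map 0-based cluster labels to 1-based ones.
# Returns a character vector named by the old labels.
one_based_map <- function(old_levels) {
  new_levels <- as.character(as.integer(old_levels) + 1L)
  stats::setNames(new_levels, old_levels)
}

# Demonstration on a plain factor standing in for cluster assignments
clusters <- factor(c("0", "2", "1", "0"))
id_map <- one_based_map(levels(clusters))
levels(clusters) <- unname(id_map[levels(clusters)])
shifted <- as.character(clusters)

# Assumed usage with a Seurat object `obj` (not run here):
# obj <- Seurat::RenameIdents(obj, one_based_map(levels(Seurat::Idents(obj))))
```

Only the labels change; the assignment of cells to clusters stays the same.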
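After plotting the identified cell clusters with UMAP, the clustering looks poor: many clusters overlap and cells of the same cluster are scattered. Why do the visualization and the cluster assignments not match?
First make sure that clustering and UMAP use the same dims argument. You can also lower the resolution argument of FindClusters to reduce the number of clusters and see whether the overlapping clusters merge into one. It is also worth trying different values of the algorithm argument of FindClusters to see whether the clustering improves.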
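
One way to keep dims consistent is to pass a single value to both steps, as in the sketch below. It assumes Seurat is installed and that `obj` already has a PCA reduction; the wrapper name `cluster_and_embed` and the default of 1:10 are illustrative only.

```r
# Hypothetical wrapper: use the same PCs for the neighbor graph and UMAP.
cluster_and_embed <- function(obj, dims = 1:10, resolution = 0.8) {
  obj <- Seurat::FindNeighbors(obj, dims = dims)   # KNN/SNN graph on these PCs
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj <- Seurat::RunUMAP(obj, dims = dims)         # embedding on the same PCs
  obj
}

# Usage (with your own preprocessed object):
# obj <- cluster_and_embed(obj, dims = 1:15, resolution = 0.5)
```

Defining dims once removes the most common source of mismatch between the clusters and the plot.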
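Can FindClusters only cluster on an SNN graph? Are there other options? / Does Seurat offer clustering approaches other than KNN?
Seurat's clustering is based on an SNN graph and the Louvain or SLM algorithm (or Leiden, via the algorithm argument described above). The SNN graph returned by FindNeighbors is derived from the KNN graph; other approaches are not supported. (This may not be accurate; check the source code of the latest version: R/clustering.R · dany/seurat - 码云 - 开源中国 (gitee.com).)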
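1. Cells are embedded in a graph structure, such as a KNN (k-nearest neighbor) graph, with edges drawn between cells with similar gene expression patterns; the graph is then partitioned into highly interconnected "quasi-cliques" or "communities".

2. As in PhenoGraph, a KNN graph is first built from Euclidean distances in PCA space, and the edge weight between any two cells is then refined based on the overlap of their local neighborhoods (Jaccard similarity).

3. To cluster the cells, modularity optimization techniques such as the Louvain algorithm (default) or SLM are applied to iteratively group cells together.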
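
The three steps above can be sketched on toy data in base R. This is only an illustration of the idea, not Seurat's implementation: the six 2-D points stand in for cells in PCA space, k = 2 is arbitrary, and the optional last step swaps in igraph's Louvain routine in place of Seurat's own optimizer.

```r
# Toy "cells" in a 2-D embedding: two well-separated groups of three
emb <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10))
k <- 2
n <- nrow(emb)

# Steps 1-2a: KNN from Euclidean distances
# (each neighborhood = the cell itself plus its k nearest cells)
d <- as.matrix(dist(emb))
knn <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k + 1)])

# Step 2b: edge weight = Jaccard overlap of the two neighborhoods
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
snn <- outer(seq_len(n), seq_len(n),
             Vectorize(function(i, j) jaccard(knn[[i]], knn[[j]])))

# Step 3 (optional, needs igraph): Louvain on the weighted SNN graph
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(snn, mode = "undirected",
                                           weighted = TRUE, diag = FALSE)
  membership <- igraph::membership(igraph::cluster_louvain(g))
}
```

Cells in the same group share their whole neighborhood, so their Jaccard weight is 1, while cells in different groups share nothing and get weight 0.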