[PED07] Feature Selection for Clustering

2020-06-01


0.1 Introduction

0.1.1 Data Clustering

0.1.2 Feature Selection Models

0.1.3 Feature Selection for Clustering


As with feature selection for supervised learning, feature selection for clustering is categorized into filter [15], wrapper [55], and hybrid [19] models.

Filter Model

Wrapper Model

Hybrid Model

0.2 Feature Selection for Clustering


0.2.1 Algorithms for Generic Data

This subsection covers feature selection algorithms for clustering that can handle generic datasets.

Spectral Feature Selection (SPEC)

SPEC [80] can be used for both supervised and unsupervised learning. Here it is treated as a <font color=red>filter-model, unsupervised feature selection</font> method.

Laplacian Score (LS)

Replacing <img src="https://img-blog.csdnimg.cn/20190809104238212.png" width="15%" align=center> in SPEC with the following gives the Laplacian Score:
<img src="https://img-blog.csdnimg.cn/20190809104931193.png" width="60%" align=center>
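The formulas above are available only as images. The following minimal Python sketch therefore assumes the commonly used definition of the Laplacian Score. Under that definition, S is a heat-kernel k-nearest-neighbour similarity matrix, D is its diagonal degree matrix, and L = D - S. Each feature vector f_r is scored as (f~_rᵀ L f~_r) / (f~_rᵀ D f~_r), where f~_r = f_r - (f_rᵀ D 1 / 1ᵀ D 1) 1. The function name `laplacian_scores` and the parameters `k` (neighbour count) and `t` (kernel width) are hypothetical, chosen only for illustration.

```python
import math


def laplacian_scores(X, k=2, t=1.0):
    """Hypothetical helper: Laplacian Score of each feature (lower = better).

    X: list of n samples, each a list of d feature values.
    k: neighbours per sample in the similarity graph (assumed parameter).
    t: heat-kernel width (assumed parameter).
    """
    n, d = len(X), len(X[0])
    # Pairwise squared Euclidean distances: O(n^2 d), the costly part.
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    # Symmetric k-NN graph with heat-kernel weights S_ij = exp(-dist_ij / t).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)
    D = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(D)               # 1^T D 1
    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Remove the D-weighted mean: f~ = f - (f^T D 1 / 1^T D 1) 1.
        mean = sum(f[i] * D[i] for i in range(n)) / total
        g = [v - mean for v in f]
        # f~^T L f~ equals 1/2 * sum_ij S_ij (f~_i - f~_j)^2.
        num = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(D[i] * g[i] ** 2 for i in range(n))  # f~^T D f~
        scores.append(num / den if den > 0 else float("inf"))
    return scores


# Feature 0 separates two groups of points; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
     [5.0, 0.8], [5.1, 0.2], [5.2, 0.5]]
scores = laplacian_scores(X, k=2, t=1.0)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])  # best first
print(ranking)  # → [0, 1]
```

A lower score means the feature better preserves the local neighbourhood structure, so features are ranked in ascending order of score. The pairwise-distance step grows as O(n²d), which is consistent with the remark that constructing S dominates the running time of LS.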

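LS is efficient with respect to data size. As with SPEC, the most time-consuming step in LS is constructing the similarity matrix S. An advantage of the algorithm is that it can handle both labeled and unlabeled data.

Feature Selection for Sparse Clustering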

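[71] embeds feature selection in the clustering process using a Lasso-type L_1-norm penalty. The number of selected features L is chosen with the gap statistic, similar to how [67] selects the number of clusters.

Localized Feature Selection Based on Scatter Separability (LFSBSS)

Multi-Cluster Feature Selection (MCFS)

Feature Weighting k-means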

0.2.2 Algorithms for Text Data

Term Frequency (TF)

Inverse Document Frequency (IDF)

Term Frequency-Inverse Document Frequency (TF-IDF)

Chi-Square Statistic

Frequent Term-Based Text Clustering

Frequent Term Sequence

0.2.3 Algorithms for Streaming Data

Text Stream Clustering Based on Adaptive Feature Selection (TSC-AFS)

High-dimensional Projected Stream Clustering (HPStream)

0.2.4 Algorithms for Linked Data

Challenges and Opportunities

LUFS: An Unsupervised Feature Selection Framework for Linked Data

Conclusion and Future Work for Linked Data

0.3 Discussions and Challenges

0.3.1 The Chicken or the Egg Dilemma

0.3.2 Model Selection: K and l

0.3.4 Stability


