Introduction to Machine Learning (6): Clustering and Similarity Models
2018-12-08
紫霞等了至尊宝五百年
1 Clustering and Similarity: Document Retrieval
2 Retrieving Documents of Interest
Document retrieval
![](https://img.haomeiwen.com/i4685968/98e7729bddc08533.png)
Challenges
![](https://img.haomeiwen.com/i4685968/c4397f2e4f32c6e6.png)
3 Word Count Representation for Measuring Similarity
![](https://img.haomeiwen.com/i4685968/4092dcb8ed15f252.png)
Measuring similarity
![](https://img.haomeiwen.com/i4685968/f07afad5eb7eccac.png)
![](https://img.haomeiwen.com/i4685968/216bb1453ba991c2.png)
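To make the word count idea concrete, here is a minimal sketch that is not taken from the slides: each document becomes a bag of word counts, and similarity is the sum of count products over the words two documents share. The helper names `word_counts` and `dot_similarity` and the three sample sentences are illustrative assumptions.

```python
from collections import Counter


def word_counts(text):
    """Bag of words: map each lowercase word to its number of occurrences."""
    return Counter(text.lower().split())


def dot_similarity(counts_a, counts_b):
    """Sum of count products over words that appear in both documents."""
    return sum(counts_a[w] * counts_b[w] for w in counts_a if w in counts_b)


# Hypothetical sample documents, chosen only for illustration.
doc_a = "messi scored a goal and messi celebrated"
doc_b = "the goal by messi won the match"
doc_c = "the senate passed the budget bill"

sim_ab = dot_similarity(word_counts(doc_a), word_counts(doc_b))
sim_ac = dot_similarity(word_counts(doc_a), word_counts(doc_c))
print(sim_ab, sim_ac)
```

The two soccer sentences share `messi` and `goal`, so `sim_ab` is 3, while `sim_ac` is 0 because those documents have no word in common.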
A problem with word counts: document length
![](https://img.haomeiwen.com/i4685968/f7532014d829f3eb.png)
Solution: normalization
![](https://img.haomeiwen.com/i4685968/14feb2ef5656c557.png)
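The length problem and its fix can be sketched as follows. This assumes normalization means scaling each count vector to unit Euclidean length, which is one common choice; the slides may use a different norm. The function names and the toy text are hypothetical.

```python
import math
from collections import Counter


def word_counts(text):
    """Bag of words: map each lowercase word to its number of occurrences."""
    return Counter(text.lower().split())


def normalize(counts):
    """Scale a count vector to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(x * v[w] for w, x in u.items() if w in v)


short_doc = word_counts("goal messi goal")
long_doc = word_counts("goal messi goal " * 2)  # the same text repeated twice

raw_short = dot(short_doc, short_doc)  # self-similarity of the short text
raw_long = dot(long_doc, long_doc)     # grows with length, same content
cosine = dot(normalize(short_doc), normalize(long_doc))
print(raw_short, raw_long, cosine)
```

Repeating the text raises the raw self-similarity from 5 to 20 even though the content is unchanged. After normalization the two versions score approximately 1.0 against each other, so length no longer dominates.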
4 Prioritizing Important Words with TF-IDF
4.1 A problem with word counts: rare words
![](https://img.haomeiwen.com/i4685968/ce012b4183ec8d6f.png)
Document frequency
![](https://img.haomeiwen.com/i4685968/00c751ca17df1db3.png)
Important words
![](https://img.haomeiwen.com/i4685968/50f1be096131083e.png)
5 TF-IDF Document Representation
![](https://img.haomeiwen.com/i4685968/9cb1bc9ae1e62904.png)
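As a rough sketch of a TF-IDF representation, the code below weights each term count by `log(N / df)`, where `N` is the number of documents and `df` is the number of documents containing the word. This is only one common variant and an assumption here. Other formulations add smoothing terms, and the exact formula on the slide may differ. The function name `tf_idf` and the four-sentence corpus are illustrative.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {word: tf * idf} dict per document in the corpus."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word at least once.
    doc_freq = Counter(w for words in tokenized for w in set(words))
    vectors = []
    for words in tokenized:
        term_freq = Counter(words)
        vectors.append(
            {w: c * math.log(n_docs / doc_freq[w]) for w, c in term_freq.items()}
        )
    return vectors


# Hypothetical corpus: "the" occurs everywhere, topic words occur once.
corpus = [
    "the striker messi scored the goal",
    "the senate passed the bill",
    "the chef cooked the pasta",
    "the band played the song",
]
vectors = tf_idf(corpus)
print(vectors[0])
```

Because `the` appears in every document, its weight is zero under this variant, while `messi` appears in only one document and gets the largest possible inverse document frequency, `log(4)`.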
6 Retrieving Similar Documents
Nearest neighbor search
![](https://img.haomeiwen.com/i4685968/3b35dc60cec3fb9c.png)
1-nearest neighbor
![](https://img.haomeiwen.com/i4685968/922f10585d03f862.png)
k-nearest neighbors
![](https://img.haomeiwen.com/i4685968/c0eecef97dc233c9.png)
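Nearest neighbor search can be sketched as a brute-force scan over the corpus. The example below assumes Euclidean distance on sparse word-weight vectors. The distance measure, the function names, and the tiny corpus are all assumptions for illustration, and cosine similarity could be swapped in instead.

```python
import heapq
import math


def euclidean(u, v):
    """Distance between two sparse vectors stored as {word: weight} dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def k_nearest(query, corpus, k):
    """Return the ids of the k corpus entries closest to query, nearest first."""
    return heapq.nsmallest(
        k, corpus, key=lambda doc_id: euclidean(query, corpus[doc_id])
    )


# Hypothetical document vectors (for example word counts or TF-IDF weights).
corpus = {
    "soccer_1": {"messi": 2.0, "goal": 1.0},
    "soccer_2": {"goal": 2.0, "match": 1.0},
    "politics": {"senate": 2.0, "bill": 1.0},
}
query = {"messi": 1.0, "goal": 1.0}

nearest_one = k_nearest(query, corpus, k=1)
nearest_two = k_nearest(query, corpus, k=2)
print(nearest_one, nearest_two)
```

With `k=1` the search returns only the single closest document. With a larger `k` it returns a ranked list, here the two soccer documents ahead of the politics one.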
7 Clustering Documents
Grouping documents by topic
![](https://img.haomeiwen.com/i4685968/b8c88f6bf95c452c.png)
What if some of the labels are known?
![](https://img.haomeiwen.com/i4685968/0bf3c3a87a01c9f4.png)
Multiclass classification problem
![](https://img.haomeiwen.com/i4685968/3094c7f5610bda98.png)
8 Introduction to Clustering
Clustering
![](https://img.haomeiwen.com/i4685968/b51f6f12f0599ab8.png)
What defines a cluster?
![](https://img.haomeiwen.com/i4685968/ce910f71c4b2f5c3.png)
9 k-means
![](https://img.haomeiwen.com/i4685968/30d2d003509efa70.png)
Initialization
![](https://img.haomeiwen.com/i4685968/353d2f7003bbb4b7.png)
![](https://img.haomeiwen.com/i4685968/420d13c1f7fc8807.png)
![](https://img.haomeiwen.com/i4685968/2394c05e09ba058b.png)
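The k-means loop can be sketched in a few lines. This is a minimal version of the standard procedure (often called Lloyd's algorithm), assuming the initial centers are passed in explicitly and that `n_iters` is at least 1. The function name `k_means`, the iteration cap, and the four 2-D points are illustrative choices, not taken from the slides. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def k_means(points, centers, n_iters=10):
    """Alternate between assigning points and updating centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iters):
        # Step 1: assign every point to its closest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 2: move each center to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:  # nothing moved, so the loop has converged
            break
        centers = new_centers
    return centers, labels


# Two visually obvious groups, with a deliberately poor initialization:
# both starting centers sit inside the first group.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, labels = k_means(points, centers=[(0, 0), (0, 2)])
print(centers, labels)
```

Even though both initial centers start in the same group, on this toy data the second center is pulled toward the far points and the loop settles on one center per group. In general the result depends on initialization, and k-means may converge to a poor local solution.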
10 Other Examples
Image search
![](https://img.haomeiwen.com/i4685968/4d0ce29c52a11cc9.png)
Grouping patients by medical condition
![](https://img.haomeiwen.com/i4685968/065a6a156b44731c.png)
Epilepsy patients are diverse
![](https://img.haomeiwen.com/i4685968/939f3bd8718bf186.png)
Products on Amazon
![](https://img.haomeiwen.com/i4685968/a04623696621ce57.png)
Structuring web search results
![](https://img.haomeiwen.com/i4685968/fe71eb82bb71f821.png)
Discovering similar neighborhoods
![](https://img.haomeiwen.com/i4685968/e1f42a649261c36d.png)
![](https://img.haomeiwen.com/i4685968/231948bdb1933ff8.png)
11 Summary of Clustering and Similarity
![](https://img.haomeiwen.com/i4685968/148d6780126dfbf8.png)
![](https://img.haomeiwen.com/i4685968/4f2b2990ea8c6753.png)