MongoDB Aggregation

2019-05-15, by 昵称不能全是数字

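Preface

As usual, start with the official documentation.
The figure below illustrates the pipeline: an aggregation pipeline passes documents through a sequence of data-processing stages.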

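(Figure: pipeline)
To me it feels like chaining grep commands in a shell pipe: each stage processes the output of the previous one, layer by layer, until the final result comes out.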
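
To make the layer-by-layer idea concrete, here is a minimal pure-Python sketch that imitates three stages ($match, $group with a count, $sort) on an in-memory list. It only illustrates the data flow and is not how MongoDB implements the stages. The sample documents, the year field, and the helper names are assumptions made up for this example.

```python
from collections import defaultdict

def match(docs, predicate):
    """Rough analogue of $match: keep only documents passing the predicate."""
    return [d for d in docs if predicate(d)]

def group_count(docs, key):
    """Rough analogue of $group with {'cnt': {'$sum': 1}}."""
    counts = defaultdict(int)
    for d in docs:
        counts[d[key]] += 1
    return [{'_id': k, 'cnt': v} for k, v in counts.items()]

def sort_desc(docs, key):
    """Rough analogue of $sort with {key: -1}."""
    return sorted(docs, key=lambda d: d[key], reverse=True)

# Made-up sample documents, for illustration only.
docs = [
    {'shortName': 'CHI', 'year': 2018},
    {'shortName': 'CHI', 'year': 2019},
    {'shortName': 'VRST', 'year': 2019},
    {'shortName': 'CHI', 'year': 2019},
]

# Each stage consumes the output of the previous one, like a shell pipe.
stage1 = match(docs, lambda d: d['year'] == 2019)
stage2 = group_count(stage1, 'shortName')
result = sort_desc(stage2, 'cnt')
print(result)
```

The point is only that every stage takes a list of documents and hands a new list to the next stage.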

Special cases

I only just noticed that count and distinct are special cases of aggregation. The official documentation puts it this way:
MongoDB also provides db.collection.estimatedDocumentCount(), db.collection.count() and db.collection.distinct().
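
As a rough illustration of why these count as special cases, the sketch below builds pipelines that approximate a count and a distinct. The helper names count_pipeline and distinct_pipeline are hypothetical, and the equivalence is only approximate: $count emits no document at all when nothing matches, and distinct flattens array-valued fields while a plain $group does not.

```python
def count_pipeline(query):
    """Pipeline that approximates counting the documents matching `query`."""
    # $count emits a single document {'n': <number>} (nothing if no match).
    return [{'$match': query}, {'$count': 'n'}]

def distinct_pipeline(field, query):
    """Pipeline that approximates distinct values of `field` among matches."""
    # Grouping on the field yields one output document per distinct value.
    return [{'$match': query}, {'$group': {'_id': '$' + field}}]

cnt_pip = count_pipeline({'shortName': 'CHI'})
dis_pip = distinct_pipeline('shortName', {})
print(cnt_pip)
print(dis_pip)

# With a pymongo collection `col` (assumed to exist), usage would look like:
#   list(col.aggregate(cnt_pip))
#   [d['_id'] for d in col.aggregate(dis_pip)]
```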

Map-Reduce (to be continued)

I have not looked into this yet, since I have not needed it so far. The official link is below.
Map-Reduce

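This post does not cover the basic operations; it only shares a few operations I ran into myself.
The rest of the post consists of examples from real use, showing how the stages are applied.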

Keeping some fields after $group

In other words: after grouping, keep fields from the documents that were grouped together.
https://stackoverflow.com/questions/16662405/mongo-group-query-how-to-keep-fields
The answer above explains how to keep the fields of the first document in each group.
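
A minimal sketch of the keep-the-first-document approach, assuming it is done with the $first accumulator. The helper keep_first_pipeline is hypothetical; the field names shortName and title are borrowed from the pipeline later in this post. Without a preceding $sort stage, "first" just means whichever document reaches $group first.

```python
def keep_first_pipeline(group_field, keep_fields):
    """Build a $group stage that keeps each listed field from the first
    document seen in every group, plus a per-group count."""
    group = {'_id': '$' + group_field, 'cnt': {'$sum': 1}}
    for f in keep_fields:
        # $first returns the value from the first document of the group.
        group[f] = {'$first': '$' + f}
    return [{'$group': group}]

pipeline = keep_first_pipeline('shortName', ['title'])
print(pipeline)
```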

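To keep the values from every document in the group rather than only the first, use $addToSet, which collects the distinct values into an array.
The pipeline looks like this: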

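# con (a list of filter conditions), skip and limit are defined elsewhere
pip = [
    {'$match': {'$and': con}},
    # one group per shortName; collect the distinct titles and count the documents
    {'$group': {'_id': '$shortName', 'title': {'$addToSet': '$title'}, 'cnt': {'$sum': 1}}},
    {'$sort': {'cnt': -1}},
    {'$skip': skip},
    {'$limit': limit},
    # return only the collected titles
    {'$project': {'_id': 0, 'title': 1}},
]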

The result with the field kept:

(Figure: kept fields)

Splitting arrays inside documents

$unwind
The original structure is shown below; every document contains a list of authors:

"authors" : [ 
        {
            "link" : "https://dl.acm.org/author_page.cfm?id=81503689356",
            "name" : "Romain Rolland"
        }, 
        {
            "link" : "https://dl.acm.org/author_page.cfm?id=81503694216",
            "name" : "Etienne Yvain"
        }, 
        {
            "link" : "https://dl.acm.org/author_page.cfm?id=81381600792",
            "name" : "Olivier Christmann"
        }, 
        {
            "link" : "https://dl.acm.org/author_page.cfm?id=81503693561",
            "name" : "Emilie Loup-Escande"
        }, 
        {
            "link" : "https://dl.acm.org/author_page.cfm?id=81503692845",
            "name" : "Simon Richir"
        }
    ],

To find the 10 authors who appear most often, I used the following aggregation:

res['authors'] = list(col.aggregate([{'$match': {'$and': con}}, {"$unwind":"$authors"}, {'$group': {'_id': '$authors.name', 'cnt':{'$sum':1}}}, {'$sort':{'cnt':-1}}, {'$limit':10}]))

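$unwind first splits the authors array so that each element becomes its own document, and the $group stage then groups on $authors.name.
Note: list() is needed because aggregate returns a cursor, which has to be consumed to get a list.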

The result:

(Figure: counts after $unwind)
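
For readers without a database at hand, the sketch below reproduces in plain Python what the $unwind, $group, $sort and $limit stages compute. It is an illustration only, not the code used in this post: top_authors is a hypothetical helper, and the sample documents are made up (the author names are borrowed from the structure shown earlier).

```python
from collections import Counter

def top_authors(docs, n=10):
    """Count how often each author name occurs and return the top n."""
    counts = Counter()
    for doc in docs:
        # $unwind: one record per element of the authors array; documents
        # with a missing or empty array produce nothing, as by default.
        for author in doc.get('authors', []):
            # $group on authors.name with {'cnt': {'$sum': 1}}
            counts[author['name']] += 1
    # $sort by cnt descending, then $limit n
    return [{'_id': name, 'cnt': cnt} for name, cnt in counts.most_common(n)]

# Made-up sample documents, for illustration only.
docs = [
    {'authors': [{'name': 'Romain Rolland'}, {'name': 'Simon Richir'}]},
    {'authors': [{'name': 'Simon Richir'}]},
    {'authors': []},
]

top = top_authors(docs)
print(top)
```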
