Learning MongoDB: aggregating with mapReduce
1. Insert data into the database
First, generate some simple data.
from pymongo import MongoClient
from random import randint
import datetime

# Connect to a local MongoDB instance and pick the target collection
client = MongoClient('localhost', 27017)
db = client.get_database('229_taobao')
order = db.order_info

# Candidate values for the generated fields
status = ['A', 'B', 'C']
cust_id = ['A123', 'B123', 'C123']
price = [500, 200, 250, 300]
sku = ['mmm', 'nnn']

for i in range(1, 100):
    # Each order gets 2 to 6 line items
    items = []
    item_count = randint(2, 6)
    for n in range(item_count):
        items.append({"sku": sku[randint(0, 1)], "qty": randint(1, 10), "price": randint(0, 5)})
    new = {
        "status": status[randint(0, 2)],
        "cust_id": cust_id[randint(0, 2)],
        "price": price[randint(0, 3)],
        "ord_date": datetime.datetime.utcnow(),
        "items": items
    }
    print(new)
    order.insert_one(new)
    print(i)
print(order.estimated_document_count())
2. Run mapReduce operations in the database
Check the structure of the data
(1) Compute the total price for each cust_id
map function
var mapFunction1 = function() {
    emit(this.cust_id, this.price);
};
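The emit call outputs one (cust_id, price) pair per document. MongoDB groups the emitted price values by their key, cust_id, and passes each group to the reduce function.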
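The grouping step can be sketched in plain Python, without a database. This is only an illustration of the idea: the sample documents and the helper names map_order and group_emitted are hypothetical, not part of MongoDB.

```python
from collections import defaultdict

# Hypothetical sample documents shaped like those in order_info.
orders = [
    {"cust_id": "A123", "price": 500},
    {"cust_id": "B123", "price": 200},
    {"cust_id": "A123", "price": 250},
]

def map_order(doc):
    """Mimic mapFunction1: emit one (cust_id, price) pair per document."""
    yield doc["cust_id"], doc["price"]

def group_emitted(docs):
    """Collect emitted values by key, roughly what happens before reduce is called."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_order(doc):
            grouped[key].append(value)
    return dict(grouped)

grouped = group_emitted(orders)
print(grouped)  # {'A123': [500, 250], 'B123': [200]}
```

Each key with its list of values corresponds to the (keyCustId, valuesPrices) arguments that the reduce function receives.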
reduce function
var reduceFunction1 = function(keyCustId, valuesPrices) {
    return Array.sum(valuesPrices);
};
Run mapReduce and write the result to the map_reduce_example collection
db.order_info.mapReduce( mapFunction1, reduceFunction1, { out: "map_reduce_example" } )
Query the output
(2) Compute the average stock quantity (qty) of each sku across all items
map function
var mapFunction2 = function() {
    for (var idx = 0; idx < this.items.length; idx++) {
        var key = this.items[idx].sku;
        var value = { count: 1, qty: this.items[idx].qty };
        emit(key, value);
    }
};
reduce function
var reduceFunction2 = function(keySKU, countObjVals) {
    // Declare with var so reducedVal does not leak into the global scope
    var reducedVal = { count: 0, qty: 0 };
    for (var idx = 0; idx < countObjVals.length; idx++) {
        reducedVal.count += countObjVals[idx].count;
        reducedVal.qty += countObjVals[idx].qty;
    }
    return reducedVal;
};
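MongoDB may call the reduce function more than once for the same key and feed an earlier result back in as one of the values, which is why the map step emits { count, qty } objects of the same shape that reduce returns. A minimal Python sketch of that property follows; reduce_sku and the sample values are hypothetical stand-ins for reduceFunction2 and its input.

```python
def reduce_sku(key, values):
    """Mimic reduceFunction2: sum count and qty over a list of value objects."""
    reduced = {"count": 0, "qty": 0}
    for value in values:
        reduced["count"] += value["count"]
        reduced["qty"] += value["qty"]
    return reduced

# Hypothetical values emitted for one sku.
emitted = [{"count": 1, "qty": 4}, {"count": 1, "qty": 6}, {"count": 1, "qty": 2}]

# Reduce everything in one call.
all_at_once = reduce_sku("mmm", emitted)

# Reduce the first two values, then feed that partial result back in.
partial = reduce_sku("mmm", emitted[:2])
in_two_steps = reduce_sku("mmm", [partial, emitted[2]])

print(all_at_once)   # {'count': 3, 'qty': 12}
print(in_two_steps)  # {'count': 3, 'qty': 12}
```

Emitting only qty and counting the values inside reduce would break this: a partial result would then be counted as a single item.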
finalize function
var finalizeFunction2 = function (key, reducedVal) {
    // Average quantity = total qty / number of items for this sku
    reducedVal.avg = reducedVal.qty / reducedVal.count;
    return reducedVal;
};
Run mapReduce and merge the result into the map_reduce_example collection
db.order_info.mapReduce( mapFunction2, reduceFunction2, { out: { merge: "map_reduce_example" }, finalize: finalizeFunction2 } )
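Note the difference between the two out options: out: "map_reduce_example" replaces the contents of the output collection, while out: { merge: "map_reduce_example" } keeps existing documents and overwrites only those with the same key. A rough Python sketch using dicts keyed by _id; write_output and the sample results are hypothetical, for illustration only.

```python
def write_output(collection, results, action="replace"):
    """Simulate the two `out` actions on a dict keyed by _id (hypothetical helper)."""
    if action == "replace":
        # The new results replace whatever the collection held before.
        return dict(results)
    if action == "merge":
        # Existing entries are kept; entries with the same key are overwritten.
        merged = dict(collection)
        merged.update(results)
        return merged
    raise ValueError("unsupported action: " + action)

# Hypothetical results of the two jobs.
totals_by_customer = {"A123": 750, "B123": 200}
stats_by_sku = {"mmm": {"count": 3, "qty": 12, "avg": 4.0}}

out = write_output({}, totals_by_customer)             # out: "map_reduce_example"
out = write_output(out, stats_by_sku, action="merge")  # out: { merge: "map_reduce_example" }
print(sorted(out))  # ['A123', 'B123', 'mmm']
```

So after the second job, map_reduce_example would hold both the per-customer totals from example (1) and the per-sku statistics from example (2).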
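Query the output
As this experiment shows, mapReduce can be more flexible than aggregate, because the map, reduce, and finalize steps are arbitrary JavaScript functions.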
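For comparison, the two jobs above could probably be expressed with the aggregation framework along these lines. This is a sketch, not a drop-in replacement: the pipeline variable names and run_pipeline are hypothetical, and the output documents are flat rather than nested under a value field as in the mapReduce output of example (2).

```python
# Total price per cust_id (compare with mapFunction1 / reduceFunction1).
total_price_pipeline = [
    {"$group": {"_id": "$cust_id", "value": {"$sum": "$price"}}},
]

# Average qty per sku (compare with mapFunction2 / reduceFunction2 / finalizeFunction2).
avg_qty_pipeline = [
    {"$unwind": "$items"},  # one document per element of the items array
    {"$group": {
        "_id": "$items.sku",
        "count": {"$sum": 1},
        "qty": {"$sum": "$items.qty"},
        "avg": {"$avg": "$items.qty"},
    }},
]

def run_pipeline(collection, pipeline):
    """Run a pipeline on a PyMongo collection and return the results as a list."""
    return list(collection.aggregate(pipeline))

# Usage, assuming the `order` collection from section 1:
# for row in run_pipeline(order, avg_qty_pipeline):
#     print(row)
```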