Word-frequency counting over a document corpus with Python multiprocessing under the MapReduce model
2021-12-15
Cache_wood
import time
import jieba
from multiprocessing import Process, Manager

def data(path):  # Extract the document text and collect it into a list
    lis = []
    with open(path, 'r', encoding='gb18030') as file:
        file.readline()  # Skip the first line
        n = 1
        for txt in file:
            if n % 6 == 4:  # Only this line of every 6-line record holds <content>
                content = txt.strip('\n')
                content = content.replace('<content>', '')  # Strip the markup
                content = content.replace('</content>', '')
                if len(content) > 0:
                    lis.append(content)
            n += 1
    return lis  # All documents, collected into one list

def Map(path, lis):  # Map: segment each document and store (word, 1) pairs in the shared list
    for pa in path:
        text_lis = jieba.lcut(pa)
        for i in text_lis:
            lis.append((i, 1))
    #print(len(lis))

def Reduce(lis, start_time):  # Reduce: aggregate the pairs into a dictionary
    print('time2 = %f' % (time.time() - start_time))  # Elapsed time when the Map phase (segmentation) is done
    dic = {}
    for k, v in lis:
        dic[k] = dic.get(k, 0) + v  # Add the emitted count rather than a hard-coded 1
    #print(dic)
    dic_order = sorted(dic.items(), key=lambda x: x[1], reverse=True)  # Sort by count, descending
    with open('data.txt', 'w', encoding='utf-8') as file:
        for k, v in dic_order:
            file.write(k + ':' + str(v) + '\n')  # Write the result to a file

if __name__ == '__main__':
    start_time = time.time()
    path = data('news_sohusite_xml.csv')
    print('time1 = %f' % (time.time() - start_time))  # Time taken to extract the documents
    plist = []
    m = Manager()
    list1 = m.list([])
    for i in range(10):  # Create the processes, 10000 documents each
        p = Process(target=Map, args=(path[10000*i:10000*(i+1)], list1))
        plist.append(p)
    for p in plist:
        p.start()  # Start the processes
    for p in plist:
        p.join()  # Block the main process until every Map process has finished
    Reduce(list1, start_time)  # Once all Map processes are done, Reduce merges the results
    total_time = time.time() - start_time  # Do not rebind the name of the time module
    print('time3 = %f' % (total_time))  # Total elapsed time
    print('main')
time1 = 21.858970
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\lenovo\AppData\Local\Temp\jieba.cache
Loading model cost 1.170 seconds.
Prefix dict has been built successfully.
[the four jieba messages above repeat for every worker process, interleaved, with reported load times between 1.170 and 3.468 seconds]
time2 = 518.363330
time3 = 531.016773
main
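time1 is the time taken to read the documents. time2 is the elapsed time when the Map phase finishes; it is measured from program start, so it includes time1. time3 is the total elapsed time, including the Reduce phase and writing the result file.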
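Since the timings are all taken from the program start, the cost of a single phase has to be worked out by subtraction. A minimal sketch of recording both the per-phase and the cumulative time with `time.perf_counter`; `run_phases` and the placeholder phase functions are hypothetical names for illustration and are not part of the script above:

```python
import time

def run_phases(phases):
    """Run (name, func) pairs in order.

    Returns a dict mapping each name to a tuple
    (seconds spent in this phase, seconds since the first phase started).
    """
    timings = {}
    start = time.perf_counter()
    for name, func in phases:
        phase_start = time.perf_counter()
        func()
        now = time.perf_counter()
        timings[name] = (now - phase_start, now - start)
    return timings

if __name__ == "__main__":
    # Placeholder phases; in the real script these would be the
    # document loading, the Map processes and the Reduce step.
    phases = [
        ("load", lambda: time.sleep(0.01)),
        ("map", lambda: time.sleep(0.02)),
        ("reduce", lambda: time.sleep(0.01)),
    ]
    for name, (own, cumulative) in run_phases(phases).items():
        print("%s: %.3f s (cumulative %.3f s)" % (name, own, cumulative))
```

Reporting both numbers makes it easier to see which phase dominates the total.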
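Comparing the number of processes with the running time shows that the run is shorter when fewer processes are used. A likely reason is that every worker process loads the jieba dictionary itself (roughly 1 to 3.5 seconds each in the log above) and every `lis.append` on the `Manager` list is a round trip to the manager process, so adding processes adds overhead faster than it adds parallel work.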
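One way to reduce the communication between processes is to let each worker count its own chunk locally and return a single result, which the main process then merges. The following is only a sketch under assumptions, not the method used above: `tokenize` is a whitespace stand-in for `jieba.lcut` so that the example runs without jieba, and `map_chunk`, `reduce_counts` and `split_chunks` are hypothetical helper names.

```python
from collections import Counter
from multiprocessing import Pool

def tokenize(text):
    # Stand-in for jieba.lcut (assumption): split on whitespace.
    return text.split()

def map_chunk(docs):
    """Map step: count the words of one chunk inside the worker."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts  # One object is sent back per chunk, not one per word

def reduce_counts(partials):
    """Reduce step: merge the per-chunk Counters into a single Counter."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def split_chunks(items, size):
    """Cut a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

if __name__ == "__main__":
    docs = ["a b a", "b c", "a c c", "d"]
    chunks = split_chunks(docs, 2)
    with Pool(processes=2) as pool:
        partials = pool.map(map_chunk, chunks)
    total = reduce_counts(partials)
    for word, count in total.most_common():
        print(word + ":" + str(count))
```

With this layout the amount of data crossing process boundaries grows with the number of distinct words per chunk rather than with the number of tokens. Whether it is actually faster on the corpus used here would have to be measured.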