Progress reporting with urllib.request.urlretrieve

2016-12-28  梦归游子意

Run WikiExtractor on the dump:

python WikiExtractor.py anwiki-20161220-pages-articles-multistream.xml -o extracted

Run it in the background, logging to nohup.out by default:

nohup python WikiExtractor.py anwiki-20161220-pages-articles-multistream.xml -o extracted &  
# nohup --- "no hang up": the process keeps running after the terminal closes

Run in the background, redirecting standard output to a specified file:

nohup python WikiExtractor.py anwiki-20161220-pages-articles-multistream.xml -o extracted >file &  
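For reference, roughly what `nohup command > file &` does can be sketched from Python itself with `subprocess` — a minimal sketch using a stand-in command (the real call would invoke WikiExtractor.py with the dump file):

```python
import os
import subprocess
import sys
import tempfile

# Approximate `nohup cmd > file &`: start the child in a new session so a
# terminal hangup (SIGHUP) does not reach it, and append stdout/stderr to a
# log file. The command below is a stand-in for the real WikiExtractor run.
log_path = os.path.join(tempfile.gettempdir(), 'nohup_demo.out')
with open(log_path, 'ab') as log:
    proc = subprocess.Popen(
        [sys.executable, '-c', 'print("extractor running")'],
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX: setsid(), like nohup's SIGHUP immunity
    )
proc.wait()
```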
Make the output directory writable by everyone:

chmod -R a+w AA/
# -R: recursive (apply to the directory and everything under it)
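The recursive chmod can also be sketched in Python — a minimal, POSIX-only sketch that walks a temporary tree (the paths here are illustrative, not the AA/ directory above) and adds the write bit for user, group, and other:

```python
import os
import stat
import tempfile

# Build a small tree to operate on (illustrative stand-in for AA/).
root = tempfile.mkdtemp()
sub = os.path.join(root, 'AA')
os.mkdir(sub)
path = os.path.join(sub, 'f.txt')
open(path, 'w').close()
os.chmod(path, 0o444)  # start read-only

# Equivalent of `chmod -R a+w root/`: walk every entry and OR in the
# write bits without touching the other permission bits.
for dirpath, dirnames, filenames in os.walk(root):
    for name in dirnames + filenames:
        p = os.path.join(dirpath, name)
        mode = os.stat(p).st_mode
        os.chmod(p, mode | stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
```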

http://www.jianshu.com/p/79579843e579

from xinyilangs import xinyi_langs
from urllib.request import urlretrieve
import os

url = 'https://dumps.wikimedia.org/backup-index.html'
langs = xinyi_langs
file_list = ['https://dumps.wikimedia.org/{}/20161220/{}-20161220-pages-articles-multistream.xml.bz2'.format(lang, lang) for lang in langs]
def cbk(blocks, block_size, total_size):
    '''Progress callback (reporthook) for urlretrieve.

    @blocks: number of blocks transferred so far
    @block_size: size of each block in bytes
    @total_size: total size of the remote file in bytes
    '''
    per = 100.0 * blocks * block_size / total_size
    if per > 100:
        per = 100
    print('%.1f%% of %.2fM' % (per, total_size / (1024 * 1024)))

out_dir = os.path.join(os.getcwd(), 'xml_bz2')
os.makedirs(out_dir, exist_ok=True)  # don't fail if the directory already exists
for lang, file in zip(langs, file_list):
    file_name = os.path.join(out_dir, '{}.xml.bz2'.format(lang))
    urlretrieve(file, file_name, cbk)  # download with progress reporting
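How the reporthook behaves can be checked offline — a self-contained sketch that retrieves a local file:// URL (temporary paths here are illustrative), so urlretrieve invokes the callback with (blocks, block_size, total_size) without any network access:

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

calls = []

def hook(blocks, block_size, total_size):
    # urlretrieve calls this once before the first block, then after each block.
    calls.append((blocks, block_size, total_size))
    per = min(100.0, 100.0 * blocks * block_size / total_size)
    print('%.1f%% of %.2fM' % (per, total_size / (1024 * 1024)))

# Stand-in source and destination files in the temp directory.
src = Path(tempfile.gettempdir()) / 'demo_src.bin'
src.write_bytes(b'x' * 4096)
dst = Path(tempfile.gettempdir()) / 'demo_dst.bin'

urlretrieve(src.as_uri(), str(dst), hook)  # file:// URL, no network needed
```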