PythonSNs(2)---Crawl Performance Analysis of the Requests Library

2018-05-16 Wayne_Dream

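Pick "any" URL and measure how long it takes to successfully fetch the page 100 times. (Some sites block the IP of any client that crawls them continuously, so avoid those sites.)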

import requests
import time

def getHtmlText(url):
    try:  # handle network and HTTP errors instead of crashing
        r = requests.get(url, timeout=30)  # timeout=30: stop waiting if the server does not respond within 30 s
        r.raise_for_status()               # raise HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding   # use the encoding guessed from the page content
        return r.text
    except requests.RequestException:
        return 'Request failed'

if __name__ == "__main__":  # entry point when the file is run as a script
    url = 'https://www.baidu.com'
    totaltime = 0
    for i in range(100):
        starttime = time.perf_counter()
        getHtmlText(url)
        endtime = time.perf_counter()
        print('Crawl {0}: {1:.4f} s'.format(i+1, endtime-starttime))
        totaltime = totaltime + endtime - starttime
    print('Total time: {:.4f} s'.format(totaltime))
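
The loop above reports only per-request times and a running total. As a rough sketch of how the same measurement could be summarized, the helpers below collect the durations in a list and compute a few statistics. The names `time_calls`, `summarize`, and `fake_fetch` are hypothetical and not part of the original script; `fake_fetch` is a stand-in so the sketch runs without network access.

```python
import statistics
import time


def time_calls(fetch, url, repeats=100):
    """Call fetch(url) `repeats` times and return the per-call durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return total, mean, median, fastest and slowest time for a list of durations."""
    return {
        'total': sum(durations),
        'mean': statistics.mean(durations),
        'median': statistics.median(durations),
        'min': min(durations),
        'max': max(durations),
    }


if __name__ == "__main__":
    # Hypothetical stand-in fetcher so the sketch runs offline.
    def fake_fetch(url):
        time.sleep(0.001)
        return '<html></html>'

    stats = summarize(time_calls(fake_fetch, 'https://www.baidu.com', repeats=10))
    for name, value in stats.items():
        print('{}: {:.4f} s'.format(name, value))
```

To time real requests, pass `getHtmlText` in place of `fake_fetch`. The median is often more telling than the mean here, since a single slow response can skew the average.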

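This test was run against Baidu. If you are curious, try other sites, but be careful not to get your IP banned, especially by certain live-streaming sites: if that happens, your whole dorm may be unable to watch streams for a while!!! Heh.
If you have questions about the line if __name__ == "__main__":, see: http://blog.konghy.cn/2017/04/24/python-entry-program/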
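
One common way to lower the risk of the IP ban mentioned above is to pause between consecutive requests. The sketch below is only an illustration: `crawl_politely` is a hypothetical helper, the one-second default delay is an arbitrary assumption, and no delay guarantees that a site will not block you.

```python
import time


def crawl_politely(fetch, urls, delay=1.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # no pause is needed before the first request
        pages.append(fetch(url))
    return pages
```

For example, `crawl_politely(getHtmlText, ['https://www.baidu.com'] * 100)` would spread the 100 requests over at least 99 seconds. The `sleep` parameter exists only so the pause can be replaced in tests. When benchmarking, time each `fetch` call on its own so the pauses are not counted.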

Web crawling carries risks; scrape data with caution.
