A Small Python Crawler: League Standings

2018-10-24 胡子先生丶

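Preface:

I'm a football fan and have recently wanted to learn Python, so I'm starting with something simple: the basics of fetching data with a crawler. 直播8 (Zhibo8) is a football forum I visit often, so I'll scrape some data from it along the way.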

Goals:

Learn the basic way a crawler fetches data;
Learn how to convert and read JSON strings. JSON conversion comes up all the time in everyday coding;
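
As a minimal sketch of the JSON conversion mentioned in the goals, the snippet below round-trips a small dictionary through json.dumps and json.loads. The sample data and its field names are made up for illustration and are not the site's actual response format.

```python
import json

# Hypothetical sample data; the field names are illustrative only.
standings = {"league": "西甲", "data": [{"team": "Example FC", "points": 20}]}

# Python object -> JSON string.
# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
text = json.dumps(standings, ensure_ascii=False)
print(text)

# JSON string -> Python object (dicts, lists, strings, numbers).
parsed = json.loads(text)
print(parsed["data"][0]["points"])
```

Once parsed, the result is an ordinary dict, so fields are read with normal indexing.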

First, look at the request URL the site uses

[Screenshot: 1540449977(1).jpg]

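The standings for every league come from the same request URL; only the parameters differ. So there is no need to fetch the site's HTML and parse it. Instead, we call the URL that returns the actual data directly;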
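
To illustrate "one endpoint, different parameters", here is a rough sketch that builds the URL for each league with urllib.parse.urlencode. The endpoint and parameter names are copied from the article's code; build_url is a hypothetical helper, and whether the server accepts this exact encoding is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from the article's code; everything else here is illustrative.
BASE_URL = "https://dc.qiumibao.com/shuju/public/index.php"


def build_url(league, tab="积分榜"):
    """Return the data URL for one league (hypothetical helper)."""
    params = {
        "_url": "/data/index",
        "league": league,
        "tab": tab,
        "year": "[year]",  # copied verbatim from the article's URL
    }
    # safe="/[]" leaves slashes and brackets unescaped, as in the original URL;
    # the Chinese values are percent-encoded automatically.
    return BASE_URL + "?" + urlencode(params, safe="/[]")


for league in ["西甲", "英超"]:
    print(build_url(league))
```

Only the league value changes between the printed URLs; the path and the other parameters stay the same.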

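Code snippet:

Note that Chinese characters in the request URL must be percent-encoded, otherwise the request raises an error
The data URL returns a JSON string directly, with no HTML elements, so BeautifulSoup is not used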
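
The encoding step in the first note can be seen in isolation with urllib.parse.quote. This is only a small illustration; the example.com URL is a placeholder, not the site's real address.

```python
from urllib.parse import quote, unquote

league = "西甲"

# quote() encodes the text as UTF-8 and writes each byte as %XX.
encoded = quote(league)
print(encoded)  # %E8%A5%BF%E7%94%B2

# unquote() reverses the encoding.
print(unquote(encoded) == league)

# Placeholder URLs: the raw one contains non-ASCII characters, the encoded one does not.
raw_url = "https://example.com/data?league=" + league
safe_url = "https://example.com/data?league=" + encoded
print(raw_url.isascii(), safe_url.isascii())
```

Only the parameter values are passed through quote, not the whole URL, so characters such as ? and & keep their meaning.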

import urllib.request
import urllib.parse
import json
import time
import threadpool  # third-party package


def getScore(league):
    tab = '积分榜'
    # Percent-encode the Chinese parameter values before putting them in the URL
    word_league = urllib.parse.quote(league)
    word_tab = urllib.parse.quote(tab)
    link_url = ('https://dc.qiumibao.com/shuju/public/index.php'
                '?_url=/data/index&league=%s&tab=%s&year=[year]'
                % (word_league, word_tab))
    response = urllib.request.urlopen(link_url)
    html_data = response.read().decode('utf-8')

    # Convert a Python object to a JSON string
    # data = json.dumps(html_data, ensure_ascii=False)
    # print(data)

    # Parse the JSON string into a Python object
    load = json.loads(html_data)
    print(load["data"])


all_league = ["西甲", '意甲', '英超', '德甲', '法甲', '中超']

# Fetch all leagues concurrently with a pool of 5 threads and time it
start_time = time.time()
task_pool = threadpool.ThreadPool(5)
requests = threadpool.makeRequests(getScore, all_league)
for req in requests:
    task_pool.putRequest(req)
task_pool.wait()
end = time.time()
print(end - start_time)

# Fetch the same leagues one by one for comparison
start1 = time.time()
for league in all_league:
    getScore(league)
print(time.time() - start1)
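
The threadpool package used above is a third-party library. As an alternative sketch, not the author's method, the same comparison can be written with concurrent.futures from the standard library. To keep it runnable offline, fake_fetch is a hypothetical stand-in that sleeps instead of making a request; in practice a function like getScore would take its place.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(league):
    # Hypothetical stand-in for a network call: sleep to simulate latency.
    time.sleep(0.2)
    return {"league": league, "data": []}


def fetch_all(fetch, leagues, max_workers=5):
    """Call fetch(league) for each league in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, leagues))


all_league = ["西甲", "意甲", "英超", "德甲", "法甲", "中超"]

# Concurrent run
start = time.time()
pooled = fetch_all(fake_fetch, all_league)
print("thread pool:", round(time.time() - start, 2), "s")

# Sequential run for comparison
start = time.time()
sequential = [fake_fetch(league) for league in all_league]
print("sequential: ", round(time.time() - start, 2), "s")
```

With six leagues and five workers, the simulated pool finishes in roughly two rounds of waiting, while the loop waits six times. Real network timings will vary.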
