Crawler 1

2020-09-29 Rain师兄

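I'm feeling pretty clever. I've only just started learning Python, and at first I had no idea how to page through a site, but then I stumbled on a handy trick.

While scraping jokes with a crawler, I spent a long time trying to work out how to simulate turning pages. All I could do was swap in a new URL by hand each time, which was tedious.

Then I noticed that the URLs of consecutive pages differ only by a number.

For example: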

http://xiaohua.zol.com.cn/lengxiaohua/2.html

http://xiaohua.zol.com.cn/lengxiaohua/3.html

Only the number is different.

So I can pull the number out and split the URL into three pieces, each one a string:

"http://xiaohua.zol.com.cn/lengxiaohua/" + str(number) + ".html"

Then a for loop repeats the request once for each page.
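Here is a minimal sketch of the three-piece URL idea on its own, with no network access. The names BASE_URL, build_page_url, and page_urls are made up for illustration, and the page range in the demo is arbitrary.

```python
# Fixed prefix shared by every page of the joke listing.
BASE_URL = "http://xiaohua.zol.com.cn/lengxiaohua/"


def build_page_url(number):
    """Join the fixed prefix, the page number, and the ".html" suffix."""
    return BASE_URL + str(number) + ".html"


def page_urls(first, last):
    """Return the URLs for pages first..last, inclusive (hypothetical helper)."""
    return [build_page_url(n) for n in range(first, last + 1)]


if __name__ == "__main__":
    # Print a few URLs to check the pattern before requesting anything.
    for url in page_urls(2, 4):
        print(url)
```

Printing the URLs first is a cheap way to confirm the pattern matches what the browser shows before sending any requests.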

That solved the problem of swapping URLs by hand. There are still some problems, but at least all the jokes now get printed.

Source code

import requests
from bs4 import BeautifulSoup as bf

if __name__ == '__main__':
    # Browser-like User-Agent header, sent with every request
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36'}
    # Pages 2 through 19; only the number in the URL changes
    for t in range(2, 20):
        url = 'http://xiaohua.zol.com.cn/lengxiaohua/' + str(t) + '.html'
        html = requests.get(url, headers=headers)
        html_text = html.text
        soup = bf(html_text, 'lxml')
        # Print the text of every <p> tag on the page
        texts = soup.find_all('p')
        for i in texts:
            print(i.get_text())
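The extraction step can also be tried without touching the site. The sketch below runs the same "find every p tag and print its text" logic on a small made-up HTML string. SAMPLE_HTML and extract_paragraphs are hypothetical names, and it uses Python's built-in "html.parser" instead of lxml only so that it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <p>First joke.</p>
  <p>  Second joke.  </p>
  <p></p>
</body></html>
"""


def extract_paragraphs(html_text):
    """Return the non-empty text of every <p> tag in html_text."""
    soup = BeautifulSoup(html_text, "html.parser")
    paragraphs = []
    for tag in soup.find_all("p"):
        # strip=True trims the whitespace around the tag's text
        text = tag.get_text(strip=True)
        if text:
            paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for line in extract_paragraphs(SAMPLE_HTML):
        print(line)
```

Because find_all('p') returns every paragraph on the page, a real page may also yield text that is not a joke. Narrowing the search to the element that wraps the jokes would need a look at the site's actual markup, which is not shown here.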
