Python Cohort 4 Web Scraping Assignments

[Python Scraper] Qiushibaike: Text Section

2017-09-14  d1b0f55d8efb

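Qiushibaike: text section
https://www.qiushibaike.com/text/
Scrape the author information (avatar / nickname / gender / age),
the post content, the "funny" vote count, and the comment count.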

My own scraping source code

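#__author:'cuiwnehao'__
#coding:utf-8
from bs4 import BeautifulSoup
import requests

url = 'https://www.qiushibaike.com/text/'
req = requests.get(url, timeout=10)
req.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
req.encoding = 'utf-8'
html = req.text
soup = BeautifulSoup(html, 'lxml')
# Each post is a <div> whose class list contains "article"
infos = soup.find_all('div', class_='article')
# print(len(infos))
for info in infos:
    # Author nickname: the first <h2> in the post
    zuozhe = info.h2.text.strip()
    # Post content: the first <span> in the post
    neirong = info.span.text.strip()
    # "Funny" vote count: the first <i> in the post
    haoxiaoshu = info.find('i').text
    # Comment count: the <i> inside <span class="stats-comments">
    pinglunshu = info.find('span', class_='stats-comments').find('i').text

    print(zuozhe)
    print(neirong)
    print(haoxiaoshu)
    print(pinglunshu)
    print("------------------------------------------------------")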
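
The script above prints the nickname, content, and the two counters, but the task also lists avatar, gender, and age. The following is a minimal sketch of how those fields might be extracted, not the author's method. It runs against a made-up HTML fragment (`SAMPLE_HTML`) rather than the live site, and the class names `articleGender`, `manIcon`, `womenIcon`, `content`, and `number` are assumptions about the page markup that should be checked against the real page source. The helpers `text_or_none` and `parse_post` are hypothetical names introduced for illustration. It uses the built-in `html.parser` so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample of one post's markup; the real page may differ.
SAMPLE_HTML = """
<div class="article block untagged mb15" id="qiushi_tag_1">
  <div class="author clearfix">
    <a href="/users/1/"><img src="//pic.example.com/avatar.jpg" alt="demo_user"></a>
    <a href="/users/1/"><h2>demo_user</h2></a>
    <div class="articleGender manIcon">25</div>
  </div>
  <a href="/article/1" class="contentHerf">
    <div class="content"><span>Sample post text.</span></div>
  </a>
  <div class="stats">
    <span class="stats-vote"><i class="number">120</i> funny</span>
    <span class="stats-comments"><a href="/article/1"><i class="number">8</i> comments</a></span>
  </div>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_post(info):
    """Extract author details and counters from one <div class="article"> tag."""
    img = info.find('img')
    # Assumption: gender is encoded in the class list, age in the tag text.
    gender_tag = info.find('div', class_='articleGender')
    gender = None
    if gender_tag is not None:
        classes = gender_tag.get('class', [])
        if 'manIcon' in classes:
            gender = 'male'
        elif 'womenIcon' in classes:
            gender = 'female'
    comments = info.find('span', class_='stats-comments')
    return {
        'avatar': img.get('src') if img is not None else None,
        'nickname': text_or_none(info.find('h2')),
        'gender': gender,
        'age': text_or_none(gender_tag),
        'content': text_or_none(info.find('div', class_='content')),
        'funny': text_or_none(info.find('i', class_='number')),
        'comments': text_or_none(comments.find('i')) if comments is not None else None,
    }


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
posts = [parse_post(info) for info in soup.find_all('div', class_='article')]
for post in posts:
    print(post)
```

Returning None for a missing tag, instead of chaining `.find(...).text` directly, keeps one post without an avatar or comment block from stopping the whole loop with an AttributeError.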