Python Web Crawling and Information Extraction

(2) Scraping Book Titles from Douban (BeautifulSoup Library) | Pyth

2018-01-16  努力奋斗的durian

1. Steps for crawling the page
2. Code 1 and its output
3. Code 2 and its output
4. Code analysis
Last updated: 2018-01-16

1. Steps for crawling the page
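Both programs in sections 2 and 3 start the same way: download the page with requests, then hand the HTML text to BeautifulSoup. A minimal sketch of the download step follows. The helper name get_html_text and the browser-like User-Agent string are assumptions for illustration, not part of the original post. The helper returns an empty string instead of raising when the request fails.

```python
import requests

# Assumed header: Douban tends to reject the default python-requests User-Agent
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_html_text(url, timeout=10.0):
    """Hypothetical helper: download a page and return its HTML, or "" on any request error."""
    try:
        r = requests.get(url, headers=HEADERS, timeout=timeout)
        r.raise_for_status()              # turn 4xx/5xx status codes into an exception
        r.encoding = r.apparent_encoding  # guess the charset from the response body
        return r.text
    except requests.RequestException:
        # covers connection errors, timeouts, bad status codes and malformed URLs
        return ""
```

For example, get_html_text("http://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book") would return the tag page's HTML, or "" if the request is refused.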

2. Code 1 and its output

2.1 The code

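# CrawUnivRankingB.py
import requests
from bs4 import BeautifulSoup

# Douban tag page for "小说" (novels); %E5%B0%8F%E8%AF%B4 is the URL-encoded tag
url = "http://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book"
# Douban tends to reject the default python-requests User-Agent, so send a browser-like one
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()  # stop early on 4xx/5xx responses
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# the book list sits inside the element with id="book"
book_div = soup.find(attrs={"id": "book"})
# every book title is an element with class="title" somewhere inside it
book_a = book_div.find_all(attrs={"class": "title"})
for book in book_a:
    print(book.string)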

2.2 Output


[Screenshot: 书名.png, the book titles printed by Code 1]

3. Code 2 and its output

3.1 The code

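import requests
from bs4 import BeautifulSoup
import bs4

url = "http://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book"
# Douban tends to reject the default python-requests User-Agent
headers = {"User-Agent": "Mozilla/5.0"}
r = requests.get(url, headers=headers, timeout=10)
r.raise_for_status()
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# iterate over the direct children of the id="book" element
for tr in soup.find(attrs={"id": "book"}).children:
    # .children also yields NavigableString objects (e.g. whitespace); keep Tags only
    if isinstance(tr, bs4.element.Tag):
        # calling a Tag is shorthand for tr.find_all(...)
        tds = tr(attrs={"class": "title"})
        for book in tds:
            print(book.string)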

3.2 Output


[Screenshot: 书名.png, the book titles printed by Code 2]

4. Code analysis

4.1 Parsing with BeautifulSoup: the basic workflow
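Both programs follow the same workflow: parse the HTML, locate the container with id="book", collect the elements with class="title", and read their text. The sketch below runs that pipeline on a hypothetical HTML fragment, so it works offline; the markup and the extract_titles name are assumptions, not Douban's real page. It uses the keyword forms id="book" and class_="title", which BeautifulSoup treats the same as the attrs dictionaries in the original code.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the downloaded page; the real Douban markup
# is only assumed to have an id="book" container holding class="title" elements.
HTML = """
<div id="book">
  <dl><dd><a class="title" href="#">Book One</a></dd></dl>
  <dl><dd><a class="title" href="#">Book Two</a></dd></dl>
</div>
<div id="movie"><a class="title" href="#">Not a book</a></div>
"""


def extract_titles(html: str) -> List[str]:
    # 1. parse the HTML text into a tree with Python's built-in parser
    soup = BeautifulSoup(html, "html.parser")
    # 2. narrow the search to the book section (same as attrs={"id": "book"})
    book_div = soup.find(id="book")
    if book_div is None:  # layout changed, or the request was blocked
        return []
    # 3. collect every descendant whose class is "title" (class_ avoids the keyword)
    links = book_div.find_all(class_="title")
    # 4. pull out the visible text of each match
    return [a.get_text(strip=True) for a in links]


print(extract_titles(HTML))  # ['Book One', 'Book Two']
```

Searching inside the #book container first is what keeps class="title" elements from other parts of the page (here the hypothetical #movie block) out of the result.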

4.2 Analyzing the page source


[Screenshot: 页面源代码.png, the page source of the Douban tag page]
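Reading the page source is how the two selectors were chosen: the titles sit inside the element with id="book" and carry class="title". A similar survey can also be done in code. This sketch, on a hypothetical fragment rather than the real page, lists the direct child tags of #book with their class attributes and then prints the chain of parents above the first title element.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment; the real page source may be structured differently.
HTML = """<div id="book" class="mod">
  <h2>Books</h2>
  <dl><dd><a class="title" href="#">Book One</a></dd></dl>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
book_div = soup.find(id="book")

# direct children of #book: tag name plus class list (None when there is no class);
# the whitespace strings between tags are skipped
outline = [(child.name, child.get("class")) for child in book_div.children
           if isinstance(child, Tag)]
print(outline)  # [('h2', None), ('dl', None)]

# where does the first title live? walk upward through its ancestors
link = book_div.find(class_="title")
print([p.name for p in link.parents])
```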

4.3 Notes on Code 1
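Code 1 relies on two behaviors worth knowing. First, find() returns a single Tag or None, so book_div.find_all raises AttributeError if no element has id="book" (for example when Douban serves an error page). Second, .string returns None when a tag has more than one child. The hypothetical markup below demonstrates both points and shows get_text() as one alternative that joins nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second title wraps an extra tag, so .string is None
HTML = """<div id="book">
  <a class="title">Book One</a>
  <a class="title">Book <em>Two</em></a>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
book_div = soup.find(attrs={"id": "book"})            # one Tag, or None if absent
book_a = book_div.find_all(attrs={"class": "title"})  # list-like ResultSet of Tags

strings = [a.string for a in book_a]  # None when a tag has more than one child
texts = [a.get_text() for a in book_a]  # concatenates all nested text

print(strings)  # ['Book One', None]
print(texts)    # ['Book One', 'Book Two']
print(soup.find(attrs={"id": "missing"}))  # None
```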

4.4 Notes on Code 2
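Code 2 reaches the same titles by a different route. It loops over the direct children of #book, uses isinstance to skip the NavigableString objects (the whitespace between tags), and calls each child Tag directly, which BeautifulSoup treats as shorthand for find_all(). The sketch below replays that loop on hypothetical markup and compares it with Code 1's single find_all() call.

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace between tags, as real pages usually have
HTML = """<div id="book">
  <dl><dd><a class="title">Book One</a></dd></dl>
  <dl><dd><a class="title">Book Two</a></dd></dl>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
book_div = soup.find(attrs={"id": "book"})

# .children yields direct children only, including the newline/indent strings
kinds = [type(c).__name__ for c in book_div.children]

titles = []
for tr in book_div.children:
    if isinstance(tr, bs4.element.Tag):     # skip NavigableString whitespace
        tds = tr(attrs={"class": "title"})  # same as tr.find_all(attrs=...)
        titles.extend(td.string for td in tds)

# Code 1 style: one recursive search over the whole container
code1_titles = [a.string for a in book_div.find_all(attrs={"class": "title"})]

print(kinds)
print(titles == code1_titles)  # True
```

The two approaches agree here, but they are not identical in general: calling tr(...) searches only the descendants of tr, so a class="title" element that is itself a direct child of #book would be found by Code 1 and missed by Code 2.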
