1.2 Scraping Product Information: Notes

2016-11-24 蜂DAO

Final result

[Image: 最终效果.png (final result)]

My code:

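from bs4 import BeautifulSoup

# Open index.html and parse its contents with the lxml parser;
# the with block closes the file once parsing is done
with open('index.html', 'r') as f:
    soup = BeautifulSoup(f, 'lxml')

# Pick out the elements that hold each field, using CSS selectors
titles = soup.select('.caption > h4 > a')
images = soup.select('.thumbnail > img')
prices = soup.select('.caption > h4.pull-right')
rates = soup.select('.ratings > p.pull-right')
levels = soup.select('.ratings > p:nth-of-type(2)')

# zip stops at the shortest list, so each selector should match once per product
for title, image, price, rate, level in zip(titles, images, prices, rates, levels):
    data = {
        "title_con": title.get_text(),
        "image_con": image.get('src'),
        "price_con": price.get_text(),
        # Drop the last 8 characters of the rating text, keeping only the count
        "rate_con": rate.get_text()[:-8],
        # The star rating is the number of <span> tags with class glyphicon-star
        "level_con": len(level.find_all("span", "glyphicon-star")),
    }
    print(data)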
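
A side note on the rate_con line: slicing off the last 8 characters only works if the rating text always ends with the same 8-character suffix. Here is a minimal sketch of a more tolerant approach, assuming the text looks like "65 reviews" (the page itself is not shown in these notes, so that format is a guess). The helper name parse_review_count is made up for illustration.

```python
def parse_review_count(text):
    """Return the leading integer from text such as '65 reviews', or None."""
    # Split on whitespace and keep the first token, so the length of the
    # trailing word ("review" vs "reviews") no longer matters.
    parts = text.split()
    if parts and parts[0].isdigit():
        return int(parts[0])
    return None


# The fixed-width slice keeps the count only when the suffix is 8 characters long
sliced_plural = "65 reviews"[:-8]
sliced_singular = "1 review"[:-8]

print(repr(sliced_plural), repr(sliced_singular))
print(parse_review_count("65 reviews"), parse_review_count("1 review"))
```

In the loop above this would be used as parse_review_count(rate.get_text()), which also gives an int instead of a string.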

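Summary:

Learned to parse a web page with BeautifulSoup
Learned to open a file with the open() function
Learned to use BeautifulSoup's select function to pick out elements with CSS selectors
Learned to use BeautifulSoup's find_all to collect all matching tags nested inside a tag
Learned to build a dictionary with data = {}
Learned to use the len() function to count the items in a list (here, the star tags returned by find_all)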
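
To try the points above without index.html at hand, here is a small self-contained sketch. The HTML snippet is hypothetical, written only to match the selectors used in my code, so the real page may be structured differently. It uses the built-in "html.parser" instead of lxml so nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single product, modeled on the selectors above
HTML = """
<div class="thumbnail">
  <img src="img/pic_0000.jpg" alt="">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">EarPhone</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() takes a CSS selector and returns a list of matching tags
title = soup.select(".caption > h4 > a")[0].get_text()
image = soup.select(".thumbnail > img")[0].get("src")
price = soup.select(".caption > h4.pull-right")[0].get_text()

# find_all("span", "glyphicon-star") matches spans that carry exactly that
# class name, so the glyphicon-star-empty spans are not counted
stars_p = soup.select(".ratings > p:nth-of-type(2)")[0]
stars = stars_p.find_all("span", "glyphicon-star")

# len() counts the tags in the returned list, which gives the star rating
level = len(stars)

data = {"title_con": title, "image_con": image, "price_con": price, "level_con": level}
print(data)
```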
