1.2 Scraping Product Information: Notes

2016-11-24  蜂DAO

Final result

[Image: 最终效果.png (screenshot of the final result)]

My code:

from bs4 import BeautifulSoup

# Open the local index.html file and parse its contents with the lxml parser
with open('index.html', 'r') as f:
    soup = BeautifulSoup(f, 'lxml')

# Select the elements that hold each field of a product
titles = soup.select('.caption > h4 > a')
images = soup.select('.thumbnail > img')
prices = soup.select('.caption > h4.pull-right')
rates = soup.select('.ratings > p.pull-right')
levels = soup.select('.ratings > p:nth-of-type(2)')

# Combine the fields of each product into one dict and print it
for title, image, price, rate, level in zip(titles, images, prices, rates, levels):
    data = {
        "title_con": title.get_text(),
        "image_con": image.get('src'),
        "price_con": price.get_text(),
        # Drop the last 8 characters of the review text (presumably a trailing " reviews")
        "rate_con": rate.get_text()[:-8],
        # Star rating = number of <span> elements carrying the class "glyphicon-star"
        "level_con": len(level.find_all("span", "glyphicon-star")),
    }
    print(data)
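
The `[:-8]` slice used for `rate_con` only works if the text ends in exactly eight unwanted characters. As a possible alternative, here is a minimal sketch that pulls the number out with a regular expression. It assumes the review text looks something like "65 reviews", and the helper name `parse_review_count` is made up for illustration; it is not part of the original code.

```python
import re


def parse_review_count(text):
    """Return the first integer found in text, or None if there is none.

    Hypothetical helper: assumes the review text contains the count as
    plain digits, e.g. "65 reviews".
    """
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


print(parse_review_count("65 reviews"))      # 65
print(parse_review_count(" 1 review "))      # 1
print(parse_review_count("no reviews yet"))  # None
```

In the loop above it could be used as `"rate_con": parse_review_count(rate.get_text())`, which stores the count as an integer rather than a string.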

Summary:
