1.2 Scraping Product Information: Notes
2016-11-24
蜂DAO
Final result
[Image: 最终效果.png]
My code:
from bs4 import BeautifulSoup

# Open index.html and parse it; the with-block closes the file afterwards
with open('index.html', 'r') as f:
    soup = BeautifulSoup(f, 'lxml')

# Pick out the elements that hold each field, using CSS selectors
titles = soup.select('.caption > h4 > a')
images = soup.select('.thumbnail > img')
prices = soup.select('.caption > h4.pull-right')
rates = soup.select('.ratings > p.pull-right')
levels = soup.select('.ratings > p:nth-of-type(2)')

# zip pairs the lists item by item, producing one dictionary per product
for title, image, price, rate, level in zip(titles, images, prices, rates, levels):
    data = {
        "title_con": title.get_text(),
        "image_con": image.get('src'),
        "price_con": price.get_text(),
        # Drop the last 8 characters (presumably the trailing " reviews")
        "rate_con": rate.get_text()[:-8],
        # Star rating = number of <span> tags with the class glyphicon-star
        "level_con": len(level.find_all("span", "glyphicon-star")),
    }
    print(data)
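The script above needs a local index.html and relies on zip, which stops at the shortest list and can silently pair fields from different products if one product lacks an element. As a rough sketch of an alternative (not the approach used in these notes), the same selectors can be applied one product card at a time. The HTML fragment, the helper text_or_none, and the function parse_products are all hypothetical, and the fragment only imitates what the real page is assumed to look like. The built-in html.parser is used here so the sketch does not require lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating one product card; not the real index.html.
SAMPLE_HTML = """
<div class="thumbnail">
  <img src="img/pic_0000.jpg" alt="">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">EarPod</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
"""


def text_or_none(tag):
    # Hypothetical helper: the tag's text, or None when the selector matched nothing.
    return tag.get_text(strip=True) if tag is not None else None


def parse_products(html):
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Work inside each card so the fields of one product stay together.
    for card in soup.select(".thumbnail"):
        image = card.find("img")
        rate = text_or_none(card.select_one(".ratings > p.pull-right"))
        stars = card.select_one(".ratings > p:nth-of-type(2)")
        products.append({
            "title_con": text_or_none(card.select_one(".caption > h4 > a")),
            "image_con": image.get("src") if image is not None else None,
            "price_con": text_or_none(card.select_one(".caption > h4.pull-right")),
            # Same slice as in the notes: drop the trailing " reviews".
            "rate_con": rate[:-8] if rate is not None else None,
            # Count only spans whose class list contains glyphicon-star.
            "level_con": len(stars.find_all("span", "glyphicon-star")) if stars is not None else 0,
        })
    return products


for product in parse_products(SAMPLE_HTML):
    print(product)
```

With this layout a missing element shows up as None (or 0 stars) in that product's own dictionary instead of shifting the data of later products.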
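Summary:
- Learned to parse a web page with BeautifulSoup
- Learned to open a file with the open() function
- Learned to use BeautifulSoup's select() to pick out elements with CSS selectors
- Learned to use BeautifulSoup's find_all() to collect every matching tag nested inside a tag
- Learned to build a dictionary with data = {}
- Learned to use len() to count items; in the script it counts the tags returned by find_all(), not the characters of a string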
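One detail about the last point: in the script, len() is applied to the list that find_all returns, so it counts tags, whereas on a string it counts characters. The short sketch below illustrates the difference and also looks at the [:-8] slice used for the review count. The list star_tags is a stand-in for real tags, and both function names are hypothetical. The split-based variant is only a suggestion for the case where the text might not end in exactly " reviews".

```python
# Stand-in for the list returned by find_all (hypothetical values).
star_tags = ["star", "star", "star", "star"]

print(len(star_tags))      # number of items in the list
print(len("65 reviews"))   # number of characters in the string


def review_count_by_slice(text):
    # Same idea as rate.get_text()[:-8]: drop the last 8 characters.
    return text[:-8]


def review_count_by_split(text):
    # Hypothetical alternative: keep the first whitespace-separated token.
    return text.split()[0]


print(review_count_by_slice("65 reviews"))
print(review_count_by_split("65 reviews"))
# If the text were ever "1 review" (8 characters), the slice would leave
# an empty string, while the split would still return the number.
print(repr(review_count_by_slice("1 review")))
print(review_count_by_split("1 review"))
```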