麻瓜编程 · Python in Practice · Homework 1-2: Scraping Product Information

2016-08-09  bbjoe

My results:

[Image: Paste_Image.png]

My code:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# List that collects each product's star rating, merged into the results later
rates = []

# Parse the page, then collect the images (image_urls), prices (product_prices),
# titles (product_titles), review counts (comment_numbs) and ratings (product_rates)

# Local copy of the homework page; the raw string keeps the backslash literal
html_path = r'/Users/Administrator/Desktop/PycharmProjects/OReillyWebScraping/小白\html/1_2answer_of_homework/index.html'

with open(html_path, 'r') as html_data:
    soup = BeautifulSoup(html_data, 'lxml')
    image_urls = soup.select(
        'body > div > div > div.col-md-9 > div:nth-of-type(2) > div > div > img')
    product_prices = soup.select(
        'body > div > div > div.col-md-9 > div:nth-of-type(2) > div > div > div.caption > h4.pull-right')
    product_titles = soup.select(
        'body > div > div > div.col-md-9 > div:nth-of-type(2) > div > div > div.caption > h4:nth-of-type(2) > a')
    comment_numbs = soup.select(
        'body > div > div > div.col-md-9 > div:nth-of-type(2) > div > div > div.ratings > p.pull-right')
    # Each product has several star elements (a one-to-many relationship),
    # so select the parent div.ratings here and count the stars per product later
    product_rates = soup.find_all('div', class_='ratings')

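# Handle the ratings separately and collect the results in the rates list
for ratings_div in product_rates:
    # Every star in the markup yields a 'star' match and each empty star
    # also yields an 'empty' match, so the difference is the filled-star count
    star = str(ratings_div).count('star')
    empty = str(ratings_div).count('empty')
    rates.append(star - empty)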

# Use zip() to combine the parallel lists into one dict per product
for image, price, title, comment, rate in zip(
        image_urls, product_prices, product_titles, comment_numbs, rates):
    data = {
        'image': image.get('src'),
        'price': price.get_text(),
        'title': title.get_text(),
        # Keep only the number, e.g. '65 reviews' becomes '65'
        'comment': comment.get_text().replace(' reviews', ''),
        'rate': rate,
    }
    print(data)
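
The code above depends on five parallel lists staying aligned, and zip() silently stops at the shortest one if a product is missing a field. As a possible alternative, here is a minimal sketch that walks one product card at a time and counts filled stars by CSS class rather than by substring. The sample markup, the `div.thumbnail` and `glyphicon-star` class names, and the `parse_products` helper are assumptions made for illustration and are not taken from the homework page; `html.parser` is used only so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the selectors used above;
# the real index.html may be structured differently.
SAMPLE_HTML = """
<div class="thumbnail">
  <img src="img/a.jpg">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">Product A</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star-empty"></span>
    </p>
  </div>
</div>
<div class="thumbnail">
  <img src="img/b.jpg">
  <div class="caption">
    <h4 class="pull-right">$64.99</h4>
    <h4><a href="#">Product B</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">12 reviews</p>
    <p>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
      <span class="glyphicon glyphicon-star"></span>
    </p>
  </div>
</div>
"""


def parse_products(html):
    """Return one dict per product card (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    # Assumption: each product sits in its own div.thumbnail card
    for card in soup.select('div.thumbnail'):
        reviews = card.select_one('div.ratings p.pull-right').get_text(strip=True)
        products.append({
            'image': card.select_one('img').get('src'),
            'price': card.select_one('h4.pull-right').get_text(strip=True),
            'title': card.select_one('div.caption a').get_text(strip=True),
            'comment': reviews.replace(' reviews', ''),
            # A class selector matches whole class names, so spans whose
            # class is glyphicon-star-empty are not counted here
            'rate': len(card.select('div.ratings span.glyphicon-star')),
        })
    return products


if __name__ == '__main__':
    for product in parse_products(SAMPLE_HTML):
        print(product)
```

Because every field is looked up inside its own card, a card with a missing element fails loudly at that card instead of shifting the ratings of the products that follow it.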

My thoughts:
