1-5 Saving Douban short reviews with pandas

2018-06-13 pnjoe

Common ways to save data

open

pandas (recommended)

csv

numpy
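The list names the standard-library csv module, but the post only demonstrates open and pandas. The sketch below shows csv.writer, which automatically quotes commas and double quotes inside a comment (a plain f.write would not). The helper name comments_to_csv and the sample comments are hypothetical.

```python
import csv
import io


def comments_to_csv(comments):
    """Write one comment per row under a 'comment' header and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['comment'])  # header row
    for text in comments:
        writer.writerow([text])  # each row is a one-element list
    return buffer.getvalue()


# Hypothetical sample comments; the second one contains a comma and quotes
sample = ['很好看', 'A classic, "must read"']
print(comments_to_csv(sample))
```

To write a real file instead of an in-memory buffer, the csv documentation recommends opening it with newline='', for example open('pinglun.csv', 'w', newline='', encoding='utf-8').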

The open function

Recall the code from the previous lesson, which parses the data with XPath:

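import requests
from lxml import etree

url = 'https://book.douban.com/subject/1084336/comments/'
# Douban may reject requests without a browser-like User-Agent (assumption; the exact string is arbitrary)
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers).text

s = etree.HTML(r)
# Text of every <p> inside <div class="comment">, one string per short review
file = s.xpath('//div[@class="comment"]/p/text()')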
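The XPath call can be tried without hitting the network by running the same expression on a small hard-coded page. This is only a sketch. The HTML below is made up to mimic the structure the expression expects and is not Douban's actual markup.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page, not Douban's real HTML
html = '''
<div class="comment"><p>First comment</p></div>
<div class="comment"><p>Second comment</p></div>
<div class="other"><p>Not a comment</p></div>
'''

s = etree.HTML(html)
# Only <p> elements under <div class="comment"> match; the "other" div is skipped
comments = s.xpath('//div[@class="comment"]/p/text()')
print(comments)  # a list with one string per matching <p>
```

An expression ending in text() makes xpath() return a plain Python list whose items are strings (lxml's string results subclass str). That list can be passed straight to open, csv or pandas. If the live page yields an empty list, the site's markup may differ from what the expression assumes.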

Saving the data with open:

with open('pinglun.txt', 'w', encoding='utf-8') as f:
    for i in file:
        print(i)
        # write() adds no line break, so append one to keep each comment on its own line
        f.write(i.strip() + '\n')

[Image: the .txt file saved with open]
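file.write() does not append a line break. Writing each comment with f.write(i) alone therefore puts all the comments back to back on a single line. The minimal sketch below writes one comment per line, then reads the file back to check the result. The helper save_lines and the sample comments are hypothetical, and the file goes into a temporary directory.

```python
import os
import tempfile


def save_lines(comments, path):
    """Write each comment on its own line, trimming stray whitespace first."""
    with open(path, 'w', encoding='utf-8') as f:
        for text in comments:
            f.write(text.strip() + '\n')  # without '\n' the comments run together


# Hypothetical sample comments with leading/trailing whitespace
sample = ['  第一条短评 ', '第二条短评\n']
path = os.path.join(tempfile.mkdtemp(), 'pinglun.txt')
save_lines(sample, path)

# Read the file back: one list item per saved comment
with open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)
```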

pandas

Saving the data with pandas:

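import requests
import pandas as pd
from lxml import etree

url = 'https://book.douban.com/subject/1084336/comments/'
# Browser-like User-Agent; Douban may refuse the default one sent by requests (assumption)
headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url, headers=headers).text

s = etree.HTML(r)
file = s.xpath('//div[@class="comment"]/p/text()')

# One comment per row (column 0); writing .xlsx needs an Excel engine such as openpyxl installed
df = pd.DataFrame(file)
df.to_excel('pinglun.xlsx')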

[Image: the .xlsx file saved with pandas]
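pd.DataFrame(file) puts each comment in its own row under a column named 0. The scraped strings may also carry leading or trailing whitespace and newlines. The sketch below tidies the frame before saving. The helper comments_frame and the sample data are hypothetical.

```python
import pandas as pd


def comments_frame(comments):
    """Build a one-column DataFrame, trimming whitespace and dropping empty strings."""
    df = pd.DataFrame(comments, columns=['comment'])  # named column instead of 0
    df['comment'] = df['comment'].str.strip()
    # Keep non-empty comments and renumber the rows from 0
    df = df[df['comment'] != ''].reset_index(drop=True)
    return df


# Hypothetical scraped strings, including a whitespace-only entry
df = comments_frame(['  好书 ', '\n', '值得一读'])
print(df)
```

Saving with df.to_excel('pinglun.xlsx', index=False) would then leave out the row index (0, 1, 2 and so on) that the default call writes as the first column. Writing .xlsx files requires an Excel engine such as openpyxl or XlsxWriter.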

Homework:

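The short reviews of The Little Prince (小王子) span 5 pages. Extend the code to scrape all 5 pages and save the reviews as a CSV file.
When we open page 2, the address in the browser changes: /hot?p=2 is appended to the end. On page 3 it becomes /hot?p=3. So paging works by changing that trailing number. Let's try it: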
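import requests
import pandas as pd
from lxml import etree

# Browser-like User-Agent; Douban may refuse the default one sent by requests (assumption)
headers = {'User-Agent': 'Mozilla/5.0'}
file = []
# Pages 1 to 5: only the p= value at the end of the URL changes
for page in range(1, 6):
    url = 'https://book.douban.com/subject/1084336/comments/hot?p=' + str(page)
    r = requests.get(url, headers=headers).text
    s = etree.HTML(r)
    file += s.xpath('//div[@class="comment"]/p/text()')

df = pd.DataFrame(file)
# 'utf-8-sig' writes a byte-order mark so Excel recognises the file as UTF-8
df.to_csv('zuoye.csv', encoding='utf-8-sig')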
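The homework saves with encoding='utf-8-sig' rather than plain 'utf-8'. The only difference is a three-byte byte-order mark (BOM) at the start of the file. Excel commonly relies on this mark to recognise a CSV as UTF-8, and without it Chinese text may appear garbled when the file is opened by double-clicking. The sketch below writes a tiny DataFrame to a temporary file and inspects the first bytes. The sample data is made up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the scraped comments
df = pd.DataFrame(['小王子', '经典'], columns=['comment'])
path = os.path.join(tempfile.mkdtemp(), 'zuoye.csv')

# 'utf-8-sig' prepends the UTF-8 byte-order mark (BOM) to the file
df.to_csv(path, index=False, encoding='utf-8-sig')

with open(path, 'rb') as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'

# Reading back with the same encoding strips the BOM again
back = pd.read_csv(path, encoding='utf-8-sig')
print(back['comment'].tolist())
```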
