
Python: Scraping articles from ONE and sending them to your own Kindle

2016-09-23  bluescorpio

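I have always liked reading on my Kindle. Recently, while I was scraping web pages, an idea suddenly occurred to me: why not send the articles I scrape to my own Kindle, so that I can read these good pieces whenever I have time?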

As for how to configure the Kindle email address, there are plenty of articles online; you can search for them yourself.

The main steps of this project are:

  1. Use requests and BeautifulSoup to scrape the articles and save them locally
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup as BS
import os
import codecs
import smtplib
import datetime

from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Articles are saved as <id>.txt in a "one" folder under the current directory.
sub_folder = os.path.join(os.getcwd(), "one")
if not os.path.exists(sub_folder):
    os.mkdir(sub_folder)

one_url = "http://wufazhuce.com/article/"

# Try article IDs 1 to 2000 in order; IDs that do not return 200 are skipped.
for i in range(2000):
    url = one_url + str(i + 1)
    r = requests.get(url)
    if r.status_code == 200:
        print(url)
        soup = BS(r.text, "lxml")
        title = soup.select('div.tab-content > div.one-articulo > h2')
        content = soup.select('div.articulo-contenido')
        if not title or not content:
            # The page does not have the expected layout; skip it.
            continue
        print("Title: " + title[0].get_text().strip())
        # file_name = title[0].get_text().strip() + ".txt"
        file_name = str(i + 1) + ".txt"
        filename = os.path.join(sub_folder, file_name)

        print("start writing content into file " + file_name)
        # "w" rather than "a", so re-running the script does not append a second copy.
        f = codecs.open(filename, "w", "utf-8")
        f.write(content[0].get_text())
        f.close()
        print("finish writing into file \n")
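
A loop over 2000 pages may well be interrupted partway through. The following is a minimal sketch (Python 3, standard library only) of one way to work out which article IDs still need to be fetched, so that a second run can pick up where the first one stopped. The helper names `article_path` and `pending_ids` are hypothetical and are not part of the original script; the sketch only assumes the `<id>.txt` naming used above.

```python
import os


def article_path(folder, article_id):
    """Path of the text file for one article, e.g. <folder>/42.txt."""
    return os.path.join(folder, "%d.txt" % article_id)


def pending_ids(folder, last_id):
    """Return the IDs from 1 to last_id that have no saved file in `folder` yet."""
    return [i for i in range(1, last_id + 1)
            if not os.path.exists(article_path(folder, i))]
```

The scraping loop could then iterate over `pending_ids(sub_folder, 2000)` instead of `range(2000)`. One limitation of this approach: IDs whose pages did not return 200 never get a file, so they would be requested again on every run.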

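  2. Use the smtplib module to send the articles one after another to your own Kindle email address. Note that at most 25 attachments can be sent at a time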
import getpass

sender = "youremailid@163.com"
# getpass reads the password without echoing it to the terminal.
sender_password = getpass.getpass("Please input your password: ")
receiver = 'your_kindle_email_id@kindle.cn'

# "mixed" is the multipart type for a message that carries attachments.
msg = MIMEMultipart('mixed')
msg['Subject'] = "convert" + str(datetime.datetime.now())
msg['From'] = sender
msg['To'] = "Your Kindle" + "<" + receiver + ">"

# file_name still holds the last article saved by the loop above,
# so this sends that one file as a UTF-8 text attachment.
f = codecs.open(os.path.join(sub_folder, file_name), "r", "utf-8")
att = MIMEText(f.read(), 'plain', 'utf-8')
f.close()
att.add_header('Content-Disposition', 'attachment', filename=file_name)
msg.attach(att)
try:
    server = smtplib.SMTP()
    server.connect('smtp.163.com')
    print("start login")
    server.login(sender, sender_password)
    print("start sending email")
    server.sendmail(sender, receiver, msg.as_string())
    server.quit()
    print("success sending email \n")
except Exception as e:
    print(str(e))
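
The code above attaches a single file. Given the limit of 25 attachments mentioned in step 2, the following is a minimal sketch (Python 3, standard library only) of how the saved articles might be grouped into messages of at most 25 attachments each. The names `MAX_ATTACHMENTS`, `chunk`, `build_message` and `build_batches` are hypothetical helpers introduced for illustration, and the sketch assumes the files are UTF-8 text files ending in `.txt`, as written by step 1.

```python
import os
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

MAX_ATTACHMENTS = 25  # per-email limit mentioned in step 2


def chunk(items, size=MAX_ATTACHMENTS):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_message(sender, receiver, paths):
    """Build one multipart email with each file in `paths` attached as UTF-8 text."""
    msg = MIMEMultipart("mixed")
    msg["Subject"] = "convert"
    msg["From"] = sender
    msg["To"] = receiver
    for path in paths:
        with open(path, encoding="utf-8") as f:
            att = MIMEText(f.read(), "plain", "utf-8")
        att.add_header("Content-Disposition", "attachment",
                       filename=os.path.basename(path))
        msg.attach(att)
    return msg


def build_batches(folder, sender, receiver):
    """Return one message per group of up to 25 .txt files found in `folder`."""
    names = sorted(n for n in os.listdir(folder) if n.endswith(".txt"))
    paths = [os.path.join(folder, n) for n in names]
    return [build_message(sender, receiver, group) for group in chunk(paths)]
```

Each message returned by `build_batches` could then be passed to `server.sendmail(sender, receiver, msg.as_string())` inside one logged-in SMTP session, in the same way as the single message above.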
  3. Next step: go on to scrape every chapter of the Four Great Classical Novels