
Scraping the Sketch-A-Day Website with Python

2018-06-28  KURAUDO4869

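What the program does:

It visits pages 1 through 968 of sketch-a-day.com, finds the first image in each page's content area, and saves it to a local SketchADay folder.

Implementation: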

#! python3
# Downloads the first image in the content area of every Sketch-A-Day page.

import os

import bs4
import requests

os.makedirs('SketchADay', exist_ok=True)

# The page count (968) is hard-coded.
for i in range(1, 969):
    url = 'http://www.sketch-a-day.com/page/' + str(i)
    print('Downloading page %s...' % url)
    try:
        res = requests.get(url)
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        imgElem = soup.select('.content img')
        if not imgElem:
            print('Could not find sketch image.')
            continue
        imgUrl = imgElem[0].get('src')
        print('Downloading image %s...' % imgUrl)
        res = requests.get(imgUrl)
        res.raise_for_status()
        imagePath = os.path.join('SketchADay', '%04d_' % i + os.path.basename(imgUrl))
        print(imagePath)
        # Write the image to disk in 100 KB chunks.
        with open(imagePath, 'wb') as imageFile:
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
    except Exception as e:
        # Report the failure and move on to the next page.
        print('Skipping page %d: %s' % (i, e))
print('Done.')
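
The script names each file by joining the zero-padded page number with os.path.basename(imgUrl). If an image URL carries a query string (for example ?w=500), that suffix would end up in the file name. The sketch below shows one possible way to strip it first. build_image_path and the example.com URL are hypothetical names used only for illustration; they are not part of the original script, and whether the site's image URLs actually contain query strings is an assumption.

```python
import os
from urllib.parse import urlsplit


def build_image_path(page_number, img_url, folder='SketchADay'):
    """Return the local path for an image found on the given page.

    build_image_path is a hypothetical helper, not part of the original script.
    """
    # Keep only the path part of the URL so '?w=500' style suffixes are dropped.
    name = os.path.basename(urlsplit(img_url).path)
    # Fall back to a generic name if the URL path ends with a slash.
    if not name:
        name = 'image'
    # Zero-pad the page number so files sort in page order.
    return os.path.join(folder, '%04d_%s' % (page_number, name))


# Example with a made-up URL.
print(build_image_path(7, 'http://example.com/uploads/sketch.jpg?w=500'))
```

In the script, the result of this helper could stand in for the os.path.join(...) expression passed to open().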

Environment: Python 3
