Python crawler: download all photos of a Douban actor
2018-05-01
幻无名
1: Implementation code
```python
import os

import requests
from bs4 import BeautifulSoup

# Root directory for downloaded photos (change as needed)
BASE_DIR = r"D:\mansongyu\Evan Rachel Wood"

# Create the download directory (if needed) and switch into it
def mkdir(path):
    path = path.strip()
    full_path = os.path.join(BASE_DIR, path)
    if not os.path.exists(full_path):
        os.makedirs(full_path)
    os.chdir(full_path)

# Open the photo page of an actor on Douban and sort by time. Inspecting the
# links shows that going to the next page only increases the `start` parameter
# by 30, so we mainly vary `start`. For a different actor, change the URL and
# the maximum page count below; the download directory can also be changed.
url1 = 'https://movie.douban.com/celebrity/1035652/photos/?type=C&start='
url2 = '&sortby=time&size=a&subtype=a'
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1"}

# 30 is the maximum number of album pages; Evan Rachel Wood's album has 30 pages
for page in range(30):
    start = 30 * page
    url = url1 + str(start) + url2
    html = requests.get(url, headers=headers)
    print('starting download Evan Rachel Wood ' + str(start))
    mkdir('Evan Rachel Wood' + str(start))  # one subdirectory per page
    soup = BeautifulSoup(html.text, 'lxml')
    img_list = soup.find('ul', class_='poster-col3 clearfix').find_all('img')
    for img_tag in img_list:
        img_src = img_tag['src']
        name = img_src[-9:-4]
        img = requests.get(img_src)
        with open(name + '.jpg', 'wb') as f:
            f.write(img.content)
```
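The two URL fragments and the fixed-offset slice `img_s[-9:-4]` can be made less fragile. As a minimal sketch, the snippet below rebuilds the album-page URL from its query parameters and derives the image filename from the URL path; the helper names `page_url` and `image_name` and the sample image URL are my own for illustration, not part of the original script.

```python
from os.path import basename, splitext
from urllib.parse import urlencode, urlparse

def page_url(celebrity_id, page, page_size=30):
    # Only `start` changes between album pages, so build it from the page index.
    params = {'type': 'C', 'start': page * page_size,
              'sortby': 'time', 'size': 'a', 'subtype': 'a'}
    return f'https://movie.douban.com/celebrity/{celebrity_id}/photos/?{urlencode(params)}'

def image_name(src):
    # Take the last path component and strip the extension, instead of
    # slicing fixed character offsets from the URL string.
    return splitext(basename(urlparse(src).path))[0]

print(page_url('1035652', 1))   # second page: start=30
print(image_name('https://img3.doubanio.com/view/photo/m/public/p2501234567.jpg'))
```

This way the filename survives images whose basename is not exactly nine characters long.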
2: Download results, using Evan Rachel Wood as an example
![](https://img.haomeiwen.com/i1696415/f6b88a75a1f7c0de.png)
![](https://img.haomeiwen.com/i1696415/edd772b55a39d6e4.png)
![](https://img.haomeiwen.com/i1696415/a98f4e5de5829b6e.png)
![](https://img.haomeiwen.com/i1696415/124dde970c059ef2.png)