Automatically Downloading Images from a Web Page

2017-10-13 hubert1002

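This is a simple web crawler, written in Python, that finds the image links in a web page and downloads the images. The .py file can be run directly, but only from the command line, which is inconvenient, so I used python-script-converter or pyinstaller to turn the .py file into an executable that runs on a double-click.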

Here is the code, img.py (written for Python 2: it uses urllib.urlopen and raw_input):

#encoding:UTF-8
import sys
import urllib
import re
import os
# import urllib2
from bs4 import BeautifulSoup

def getImg(url):
    # Fetch the page at the given URL and parse it.
    html = urllib.urlopen(url)
    page = html.read()
    soup = BeautifulSoup(page, "html.parser")
    imglist = soup.find_all('img')  # every <img ...> tag in the page, collected into a list
    lenth = len(imglist)  # number of <img> tags found
    path = sys.path[0]
    print(path)
    pathArg = sys.argv[0]
    print(pathArg)
    filePath = os.path.dirname(os.path.realpath(pathArg))
    print(filePath)
    for i in range(lenth):
        try:
            imageUrl = getImageUrl(imglist[i])
            index = i+1
            print('[{0}-{1}]{2}'.format(index, lenth,imageUrl))
            if(len(imageUrl)>0):
                urllib.urlretrieve(imageUrl,filePath+'/'+'%s.jpg' % index)

        except Exception as e:
            print(e)

def getImageUrl(item):
    imageUrl = ""

    if(item.has_attr('src')):
        imageUrl = item.attrs['src']
    elif(item.has_attr('data-src')):
        imageUrl = item.attrs['data-src']
    else:
        print(item)
        for i in item.attrs:
            if 'src' in i:  # str.index raises ValueError when 'src' is absent, so test membership
                imageUrl = item.attrs[i]
                break

    #
    # try:
    #     imageUrl = item.attrs['src']
    # except Exception as e:
    #     print(e)
    # try:
    #     imageUrl = item.attrs['data-src']
    # except Exception as e:
    #     print(e)
    return getRealUrl(imageUrl)


def getRealUrl(url):
    # Keep only absolute http(s) URLs. Anything else (relative paths,
    # data: URIs, empty strings) becomes "" and is skipped by the caller.
    if re.match(r'https?://', url):
        return url
    return ""



if __name__ == '__main__':
    # Take the page URL from the command line, or prompt for it.
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("please input url:")
        # url = "https://mp.weixin.qq.com/s/SBM1gq5i7ZfrE4GMBzK6dw"
    print("url = " + url)
    getImg(url)
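
The script above targets Python 2 and skips relative image links such as /static/a.png. As a rough sketch of how the link-extraction step might look on Python 3, the following uses only the standard library (html.parser in place of BeautifulSoup, which is a substitution and not what img.py does) and resolves relative and protocol-relative links against the page URL with urljoin. The names ImgCollector, extract_image_urls, and guess_filename are made up for this illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgCollector(HTMLParser):
    """Collects one candidate URL per <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Priority modeled on img.py: src, then data-src, then any other
        # attribute whose name contains "src". srcset is excluded because
        # its value is a list of candidates, not a single URL.
        url = attrs.get("src") or attrs.get("data-src")
        if not url:
            url = next((v for k, v in attrs.items()
                        if "src" in k and k != "srcset" and v), None)
        if url:
            self.urls.append(url)


def extract_image_urls(html, page_url):
    """Return absolute http(s) image URLs found in html."""
    parser = ImgCollector()
    parser.feed(html)
    absolute = (urljoin(page_url, u) for u in parser.urls)
    # Drop anything that is not http(s), for example inline data: URIs.
    return [u for u in absolute if urlparse(u).scheme in ("http", "https")]


def guess_filename(url, index):
    """Name the file by position, keeping the URL's extension if it has one."""
    ext = os.path.splitext(urlparse(url).path)[1]
    return "{0}{1}".format(index, ext or ".jpg")
```

A download loop could then pass each result to urllib.request.urlretrieve together with os.path.join(folder, guess_filename(url, i)), which also avoids saving every image as .jpg.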

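Usage

1. Create a new folder.
2. Copy img.py into that folder.
3. Run the following command (xxx is the page URL). The images are saved in the folder that contains img.py. Without an argument, the script prompts for the URL.
python img.py xxx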

Converting the .py file to an executable

python-script-converter
https://github.com/ZYunH/Python-script-converter/blob/master/Readme-cn.md

psc img.py 2
chmod +x img.command

pyinstaller

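pyinstaller -F img.py

The -F (--onefile) option bundles the script into a single executable, which pyinstaller writes to the dist folder.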
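
One detail matters once the script is packaged: img.py saves images next to sys.argv[0]. A minimal sketch of a helper that makes that choice explicit follows. It assumes the executable was built with PyInstaller, which sets sys.frozen in bundled applications; the function name app_dir is hypothetical and does not appear in img.py.

```python
import os
import sys


def app_dir():
    """Directory where downloaded files should go (hypothetical helper)."""
    if getattr(sys, "frozen", False):
        # Running as a PyInstaller bundle: sys.executable is the bundled app.
        return os.path.dirname(os.path.realpath(sys.executable))
    # Running as a plain script: the same expression img.py uses.
    return os.path.dirname(os.path.realpath(sys.argv[0]))


print(app_dir())
```

Replacing the filePath computation in getImg with a call like this would keep the download location the same whether the script is run with python or as a double-clicked executable.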
