Batch-downloading pptx resources from a website with Python

2023-10-15  火卫控

Vector graphics and models commonly used in SCI papers, as pptx files.

The downloaded files are listed below:


ai_qiangyun@DESKTOP-727JVLV:/mnt/d/Coding/python_gzlab_docu/pachong_gz_vs/scipptx$ tree -a
.
├── Animals.pptx
├── Arteries_atherothrombosis.pptx
├── Arteries_pathophysiology.pptx
├── Arteries_physiology.pptx
├── Bacteriology_virology.pptx
├── Blood_immunology.pptx
├── Bone_fractures.pptx
├── Bone_structure.pptx
├── Bones.pptx
├── Cell_membrane.pptx
├── Chemistry.pptx
├── Dermatology.pptx
├── Diabetes.pptx
├── Dietetics.pptx
├── Digestive_system.pptx
├── Drugs.pptx
├── ENT.pptx
├── Embryology.pptx
├── Endocrinology.pptx
├── General-items.pptx
├── Genetics.pptx
├── Heart_pathophysiology.pptx
├── Heart_physiology.pptx
├── Intracellular_components.pptx
├── Lab_apparatus.pptx
├── Lipids.pptx
├── Lymphatic_system.pptx
├── Medical_acts.pptx
├── Medical_equipment.pptx
├── Microbiology_cellculture.pptx
├── Muscles.pptx
├── Nervous_system.pptx
├── Neural_cells.pptx
├── Nucleic_acids.pptx
├── Oncology.pptx
├── Ophthalmology.pptx
├── Paraclinical_exams.pptx
├── Parasitology.pptx
├── People.pptx
├── Receptors_channels.pptx
├── Reproduction.pptx
├── Respiratory_system.pptx
├── Risk_Factors.pptx
├── Scientific_graphs.pptx
├── Tissues.pptx
├── Urinary_system.pptx
├── Veins.pptx
├── World_maps.pptx
├── sci2ppt.py
├── scippt.html
└── scipptx.py

0 directories, 51 files

The source code:

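import os
import random  # only used by the optional delay at the end of the loop
import re
import time  # only used by the optional delay at the end of the loop
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Use the directory containing this script as the working directory
homePath = os.path.dirname(os.path.abspath(__file__))
os.chdir(homePath)

# Read the locally saved page scippt.html
with open("./scippt.html") as f:
    html = f.read()
# print(html)

soup = BeautifulSoup(html, 'html.parser')
# Find every tag (the <a> tags here) whose href ends in ".pptx";
# the dot is escaped so that it matches a literal "."
links = soup.find_all(href=re.compile(r"\.pptx$"))
print(links)

# The request headers are identical for every file, so build them once
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}

for link in links:
    href = link.get('href')
    print(href)

    # rsplit splits from the right; with maxsplit=1 the last piece is the
    # part after the final "/", which serves as the file name
    pptname = href.rsplit("/", 1)[1]
    print(pptname)

    # The hrefs are site-relative, so resolve them against the site root
    # "https://smart.servier.com/" before downloading
    url = urljoin("https://smart.servier.com/", href)
    r = requests.get(url, headers=headers, timeout=60)
    r.raise_for_status()  # stop rather than save an error page as a .pptx

    # Save into the current directory
    with open(pptname, mode="wb") as f:
        f.write(r.content)  # write the pptx bytes to disk

    # x = random.randint(1, 4)  # random integer with 1 <= x <= 4
    # time.sleep(x)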
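
The link filtering, file naming and URL building in the script can be tried out without touching the network. Below is a minimal sketch that uses only the standard library. The helper name plan_downloads, the BASE_URL constant and the second sample href are made up for illustration and are not part of the original script.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://smart.servier.com/"  # site root used by the script
PPTX_HREF = re.compile(r"\.pptx$")       # literal dot, anchored at the end


def plan_downloads(hrefs):
    """Return (file_name, absolute_url) pairs for the hrefs ending in .pptx."""
    plan = []
    for href in hrefs:
        if not PPTX_HREF.search(href):
            continue  # not a link to a pptx file
        # [-1] also works for an href that contains no "/" at all
        file_name = href.rsplit("/", 1)[-1]
        plan.append((file_name, urljoin(BASE_URL, href)))
    return plan


if __name__ == "__main__":
    sample = [
        "/wp-content/uploads/2016/10/Cell_membrane.pptx",
        "/about-pptx-files/",  # hypothetical href that is not a pptx file
    ]
    for file_name, url in plan_downloads(sample):
        print(file_name, url)
```

An unescaped pattern such as ".pptx" treats the dot as "any character", so it would also accept the second sample href. urljoin avoids the double slash that plain string concatenation produces when the href already starts with "/".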

The output of the run:

(base) D:\Coding\python_gzlab_docu>D:/ruanjian/labsoft/anaconda3/python.exe d:/Coding/python_gzlab_docu/pachong_gz_vs/scipptx/sci2ppt.py
[<a href="/wp-content/uploads/2016/10/Cell_membrane.pptx" style="font-size: 16px;">Cell membrane</a>, <a href="/wp-content/uploads/2016/10/Receptors_channels.pptx" style="font-size: 16px;">Receptors and Channels</a>, <a href="/wp-content/uploads/2016/10/Intracellular_components.pptx" style="font-size: 16px;">Intracellular components</a>, <a href="/wp-content/uploads/2016/10/Nucleic_acids.pptx" style="font-size: 16px;">Nucleic acids</a>, <a href="/wp-content/uploads/2016/10/Genetics.pptx" style="font-size: 16px;">Genetics</a>, <a href="/wp-content/uploads/2016/10/Tissues.pptx" style="font-size: 16px;">Tissues</a>, <a href="/wp-content/uploads/2016/10/Oncology.pptx" style="font-size: 16px;">Oncology</a>, <a href="/wp-content/uploads/2016/10/Heart_physiology.pptx" style="font-size: 16px;">Heart – Physiology</a>, <a href="/wp-content/uploads/2016/10/Heart_pathophysiology.pptx" style="font-size: 16px;">Heart – Pathophysiology</a>, <a href="/wp-content/uploads/2016/10/Blood_immunology.pptx" style="font-size: 16px;">Blood and Immunology</a>, <a href="/wp-content/uploads/2016/10/Arteries_physiology.pptx" style="font-size: 16px;">Arteries – Physiology</a>, <a href="/wp-content/uploads/2016/10/Arteries_atherothrombosis.pptx" style="font-size: 16px;">Arteries – Atherothrombosis</a>, <a href="/wp-content/uploads/2016/10/Arteries_pathophysiology.pptx" style="font-size: 16px;">Arteries – Pathophysiology</a>, <a href="/wp-content/uploads/2016/10/Veins.pptx" style="font-size: 16px;">Veins</a>, <a href="/wp-content/uploads/2016/10/Lymphatic_system.pptx" style="font-size: 16px;">Lymphatic system</a>, <a href="/wp-content/uploads/2016/10/Urinary_system.pptx" style="font-size: 16px;">Urinary system</a>, <a href="/wp-content/uploads/2016/10/Reproduction.pptx" style="font-size: 16px;">Reproduction</a>, <a href="/wp-content/uploads/2016/10/Embryology.pptx" style="font-size: 16px;">Embryology</a>, <a href="/wp-content/uploads/2016/10/Endocrinology.pptx" style="font-size: 
16px;">Endocrinology</a>, <a href="/wp-content/uploads/2016/10/Diabetes.pptx" style="font-size: 16px;">Diabetes</a>, <a href="/wp-content/uploads/2016/10/Nervous_system.pptx" style="font-size: 16px;">Nervous system</a>, <a href="/wp-content/uploads/2016/10/Neural_cells.pptx" style="font-size: 16px;">Neural cells</a>, <a href="/wp-content/uploads/2016/10/Bones.pptx" style="font-size: 16px;">Skeletons and Bones</a>, 
<a href="/wp-content/uploads/2016/10/Bone_structure.pptx" style="font-size: 16px;">Bone structure</a>, <a href="/wp-content/uploads/2016/10/Bone_fractures.pptx" style="font-size: 16px;">Fractures</a>, <a href="/wp-content/uploads/2016/10/Bacteriology_virology.pptx" style="font-size: 16px;">Bacteriology and virology</a>, <a href="/wp-content/uploads/2016/10/Parasitology.pptx" style="font-size: 16px;">Parasitology</a>, <a href="/wp-content/uploads/2016/10/Digestive_system.pptx" style="font-size: 16px;">Digestive system</a>, <a href="/wp-content/uploads/2016/10/Respiratory_system.pptx" style="font-size: 16px;">Respiratory system</a>, <a href="/wp-content/uploads/2016/10/ENT.pptx" style="font-size: 16px;">ENT</a>, 
<a href="/wp-content/uploads/2016/10/Muscles.pptx" style="font-size: 16px;">Muscles</a>, <a href="/wp-content/uploads/2016/10/Ophthalmology.pptx" style="font-size: 16px;">Ophthalmology</a>, <a href="/wp-content/uploads/2016/10/Dermatology.pptx" style="font-size: 16px;">Dermatology</a>, <a href="/wp-content/uploads/2016/10/Risk_Factors.pptx" style="font-size: 16px;">Risk Factors</a>, <a href="/wp-content/uploads/2016/10/Lipids.pptx" style="font-size: 16px;">Lipids</a>, <a href="/wp-content/uploads/2016/10/Dietetics.pptx" style="font-size: 16px;">Dietetics</a>, <a href="/wp-content/uploads/2016/10/Medical_equipment.pptx" style="font-size: 16px;">Medical equipment</a>, <a href="/wp-content/uploads/2016/10/Medical_acts.pptx" style="font-size: 16px;">Medical acts</a>, <a href="/wp-content/uploads/2016/10/Paraclinical_exams.pptx" style="font-size: 16px;">Paraclinical Exams</a>, <a href="/wp-content/uploads/2016/10/Drugs.pptx" style="font-size: 16px;">Drugs</a>, <a href="/wp-content/uploads/2016/10/Microbiology_cellculture.pptx" style="font-size: 16px;">Cell culture and 
microbiology</a>, <a href="/wp-content/uploads/2016/10/Chemistry.pptx" style="font-size: 16px;">Chemistry</a>, <a href="/wp-content/uploads/2016/10/Lab_apparatus.pptx" style="font-size: 16px;">Lab apparatus</a>, <a href="/wp-content/uploads/2016/10/People.pptx" style="font-size: 16px;">People</a>, <a href="/wp-content/uploads/2016/10/World_maps.pptx" style="font-size: 16px;">World maps</a>, <a href="/wp-content/uploads/2016/10/Animals.pptx" style="font-size: 16px;">Animals</a>, <a href="/wp-content/uploads/2016/10/Scientific_graphs.pptx" style="font-size: 16px;">Scientific graphs</a>, <a href="/wp-content/uploads/2016/10/General-items.pptx" style="font-size: 16px;">General Items</a>]
/wp-content/uploads/2016/10/Cell_membrane.pptx
Cell_membrane.pptx
/wp-content/uploads/2016/10/Receptors_channels.pptx
Receptors_channels.pptx
/wp-content/uploads/2016/10/Intracellular_components.pptx
Intracellular_components.pptx
/wp-content/uploads/2016/10/Nucleic_acids.pptx
Nucleic_acids.pptx
/wp-content/uploads/2016/10/Genetics.pptx
Genetics.pptx
/wp-content/uploads/2016/10/Tissues.pptx
Tissues.pptx
/wp-content/uploads/2016/10/Oncology.pptx
Oncology.pptx
/wp-content/uploads/2016/10/Heart_physiology.pptx
Heart_physiology.pptx
/wp-content/uploads/2016/10/Heart_pathophysiology.pptx
Heart_pathophysiology.pptx
/wp-content/uploads/2016/10/Blood_immunology.pptx
Blood_immunology.pptx
/wp-content/uploads/2016/10/Arteries_physiology.pptx
Arteries_physiology.pptx
/wp-content/uploads/2016/10/Arteries_atherothrombosis.pptx
Arteries_atherothrombosis.pptx
/wp-content/uploads/2016/10/Arteries_pathophysiology.pptx
Arteries_pathophysiology.pptx
/wp-content/uploads/2016/10/Veins.pptx
Veins.pptx
/wp-content/uploads/2016/10/Lymphatic_system.pptx
Lymphatic_system.pptx
/wp-content/uploads/2016/10/Urinary_system.pptx
Urinary_system.pptx
/wp-content/uploads/2016/10/Reproduction.pptx
Reproduction.pptx
/wp-content/uploads/2016/10/Embryology.pptx
Embryology.pptx
/wp-content/uploads/2016/10/Endocrinology.pptx
Endocrinology.pptx
/wp-content/uploads/2016/10/Diabetes.pptx
Diabetes.pptx
/wp-content/uploads/2016/10/Nervous_system.pptx
Nervous_system.pptx
/wp-content/uploads/2016/10/Neural_cells.pptx
Neural_cells.pptx
/wp-content/uploads/2016/10/Bones.pptx
Bones.pptx
/wp-content/uploads/2016/10/Bone_structure.pptx
Bone_structure.pptx
/wp-content/uploads/2016/10/Bone_fractures.pptx
Bone_fractures.pptx
/wp-content/uploads/2016/10/Bacteriology_virology.pptx
Bacteriology_virology.pptx
/wp-content/uploads/2016/10/Parasitology.pptx
Parasitology.pptx
/wp-content/uploads/2016/10/Digestive_system.pptx
Digestive_system.pptx
/wp-content/uploads/2016/10/Respiratory_system.pptx
Respiratory_system.pptx
/wp-content/uploads/2016/10/ENT.pptx
ENT.pptx
/wp-content/uploads/2016/10/Muscles.pptx
Muscles.pptx
/wp-content/uploads/2016/10/Ophthalmology.pptx
Ophthalmology.pptx
/wp-content/uploads/2016/10/Dermatology.pptx
Dermatology.pptx
/wp-content/uploads/2016/10/Risk_Factors.pptx
Risk_Factors.pptx
/wp-content/uploads/2016/10/Lipids.pptx
Lipids.pptx
/wp-content/uploads/2016/10/Dietetics.pptx
Dietetics.pptx
/wp-content/uploads/2016/10/Medical_equipment.pptx
Medical_equipment.pptx
/wp-content/uploads/2016/10/Medical_acts.pptx
Medical_acts.pptx
/wp-content/uploads/2016/10/Paraclinical_exams.pptx
Paraclinical_exams.pptx
/wp-content/uploads/2016/10/Drugs.pptx
Drugs.pptx
/wp-content/uploads/2016/10/Microbiology_cellculture.pptx
Microbiology_cellculture.pptx
/wp-content/uploads/2016/10/Chemistry.pptx
Chemistry.pptx
/wp-content/uploads/2016/10/Lab_apparatus.pptx
Lab_apparatus.pptx
/wp-content/uploads/2016/10/People.pptx
People.pptx
/wp-content/uploads/2016/10/World_maps.pptx
World_maps.pptx
/wp-content/uploads/2016/10/Animals.pptx
Animals.pptx
/wp-content/uploads/2016/10/Scientific_graphs.pptx
Scientific_graphs.pptx
/wp-content/uploads/2016/10/General-items.pptx
General-items.pptx
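
The log above only shows that each file name was written; it does not show that every saved file is a real presentation. A .pptx file is a ZIP container, so a quick sanity check after a batch download could look like the sketch below. The function name find_broken_pptx is hypothetical and this check is not part of the original script.

```python
import zipfile
from pathlib import Path


def find_broken_pptx(folder="."):
    """Return the names of .pptx files in folder that are not ZIP archives."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.pptx")
        if not zipfile.is_zipfile(path)
    )


if __name__ == "__main__":
    # Run from the download directory; an empty list means every file
    # at least has the ZIP structure a pptx file needs.
    print("broken files:", find_broken_pptx("."))
```

A file that fails this check (for example, an HTML error page saved under a .pptx name) is a candidate for downloading again.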

References
Splitting a Python string on the last occurrence only (using rsplit)
Usage of soup.find_all()
Basic usage of find() and find_all() in Beautiful Soup
