Python-145: Reading the table of genera within a family from LPSN 202

2024-02-24 RashidinAbdu

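When writing articles and compiling statistics, I sometimes need to pull species and other taxonomic information from a web page, so I wrote this script to read the tables on a page.
Prerequisites:
1. Install requests, BeautifulSoup4, and pandas (you may also need to update pip). pandas additionally needs openpyxl to write .xlsx files and an HTML parser such as lxml for read_html.
2. Copy and paste the URL of the page you want into the script.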

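from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = 'https://lpsn.dsmz.de/family/clostridiaceae'

# Send the HTTP request (the timeout keeps the script from hanging indefinitely)
response = requests.get(url, timeout=30)

if response.status_code == 200:
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find every <table> element on the page
    tables = soup.find_all('table')

    if tables:
        for i, table in enumerate(tables):
            # Read the table with pandas. Wrapping the HTML in StringIO avoids
            # the deprecation warning newer pandas versions emit for raw strings.
            df = pd.read_html(StringIO(str(table)))[0]

            # Save the table as an Excel file (.xlsx output needs openpyxl)
            file_name = f'table_{i+1}.xlsx'
            df.to_excel(file_name, index=False)
            print(f"Table saved as {file_name}")
    else:
        print("No table elements found")
else:
    print(f"Could not access the page (HTTP status {response.status_code})")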

This saves every table on the page, but only the second one is actually useful here.
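Since only one table is needed, it can be simpler to pull out just that table by position. The following is a minimal sketch, not the script above: the function name table_to_dataframe and the sample markup are made up for illustration, and the cells are read directly with BeautifulSoup instead of pd.read_html. It assumes the first row of the table is the header and that every row has the same number of cells.

```python
from bs4 import BeautifulSoup
import pandas as pd


def table_to_dataframe(html: str, index: int = 1) -> pd.DataFrame:
    """Return the table at position `index` (0-based) as a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if index >= len(tables):
        raise IndexError(f"page has {len(tables)} table(s), no index {index}")

    # Collect the text of every header/data cell, one list per row.
    rows = []
    for tr in tables[index].find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        raise ValueError("the selected table has no rows")

    # Assumption: the first row holds the column names.
    return pd.DataFrame(rows[1:], columns=rows[0])


# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table><tr><th>k</th><th>v</th></tr><tr><td>a</td><td>1</td></tr></table>
<table>
  <tr><th>Name</th><th>Status</th></tr>
  <tr><td>Genus A</td><td>valid</td></tr>
  <tr><td>Genus B</td><td>valid</td></tr>
</table>
"""

# index=1 selects the second table on the page.
df = table_to_dataframe(SAMPLE_HTML, index=1)
print(df)
```

With a real page, response.text from requests would take the place of SAMPLE_HTML, and df.to_excel(...) could then save just that one table.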
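To use it on another page, just change the URL. The script writes each table to its own file, so a page with 10 tables produces table_1.xlsx through table_10.xlsx.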
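One thing to watch when changing the URL: the file names depend only on the table number, so running the script on a second page overwrites the files from the first. A possible workaround, sketched here with a hypothetical helper named output_name, is to put the last segment of the URL path into the file name.

```python
from urllib.parse import urlparse


def output_name(url: str, table_number: int) -> str:
    """Build a file name such as 'clostridiaceae_table_2.xlsx' from a URL."""
    # Take the last path segment of the URL; fall back to 'page' if there is none.
    path = urlparse(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1] or "page"
    return f"{slug}_table_{table_number}.xlsx"


print(output_name("https://lpsn.dsmz.de/family/clostridiaceae", 2))
# prints: clostridiaceae_table_2.xlsx
```

In the script, file_name = output_name(url, i + 1) would then replace the fixed table_{i+1}.xlsx pattern.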
