Scraping a List of Grade-A Tertiary Hospitals with Python
2022-05-09
任源_c4d5
Website
url = https://www.yixue.com/
Scraping process
There is not much to explain about this step, so here is the code.
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Page listing Grade-A tertiary hospitals nationwide.
# The "#..." fragment is handled client-side and is not sent to the server.
url = "https://www.yixue.com/%E5%85%A8%E5%9B%BD%E4%B8%89%E7%94%B2%E5%8C%BB%E9%99%A2%E5%90%8D%E5%8D%95#.E5.8C.97.E4.BA.AC.E5.B8.82.E4.B8.89.E7.94.B2.E5.8C.BB.E9.99.A2.E5.90.8D.E5.8D.95"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error

soup = BeautifulSoup(response.text, 'lxml')
output = soup.find(class_="mw-parser-output")
li = output.select('ul>li')

rows = []
for i in li:
    try:
        a = i.b.a  # hospital entries wrap their link in <b><a ...>
        title = a['title']
        href = a['href']
        link = 'https://www.yixue.com' + href
        info = i.ul  # nested list holding address and phone
        lines = info.text.splitlines()
        address = lines[0]
        phone = lines[1]
    except (AttributeError, KeyError, TypeError, IndexError):
        # Skip list items that do not follow the hospital-entry layout
        continue
    rows.append({'name': title, 'link': link, 'address': address, 'phone': phone})

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
hospital_df = pd.DataFrame(rows)
print(hospital_df)
hospital_df.to_excel('全国三甲医院1.xlsx')
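The parsing step above depends on a particular page layout: each hospital is a list item whose name is a bold link, followed by a nested list whose first two lines are the address and the phone number. The sketch below runs the same selector logic on a small hand-written HTML snippet, so it can be checked without network access. The snippet, the hospital name, and the helper name `parse_hospitals` are made up for illustration, and `html.parser` is used instead of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the layout the scraper expects.
SAMPLE_HTML = (
    '<div class="mw-parser-output"><ul>'
    '<li><b><a href="/Example_Hospital" title="Example Hospital">Example Hospital</a></b>'
    '<ul><li>Address: 1 Example Road</li>\n<li>Phone: 010-00000000</li></ul></li>'
    '<li>A list item without a hospital link</li>'
    '</ul></div>'
)


def parse_hospitals(html, base_url='https://www.yixue.com'):
    """Return one dict per list item that matches the hospital-entry layout."""
    soup = BeautifulSoup(html, 'html.parser')
    output = soup.find(class_='mw-parser-output')
    rows = []
    for item in output.select('ul>li'):
        try:
            a = item.b.a
            lines = item.ul.text.splitlines()
            rows.append({
                'name': a['title'],
                'link': base_url + a['href'],
                'address': lines[0],
                'phone': lines[1],
            })
        except (AttributeError, KeyError, TypeError, IndexError):
            # Nested address/phone items and unrelated items land here.
            continue
    return rows


for row in parse_hospitals(SAMPLE_HTML):
    print(row)
```

Note that `ul>li` also matches the nested address and phone items; they are skipped because they contain no bold link.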
Results
Geocoding
I have covered this before, so there is not much to add; let's go straight to the result.
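One extra note: as you can see, these are Baidu09 (BD-09) coordinates. At this scale there is no need to correct them; chasing an offset of a few hundred meters is not worth the effort.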
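For a finer-scale analysis where the offset would matter, a rough idea of its size can be obtained with the widely circulated approximate BD-09 to GCJ-02 conversion. The sketch below is not part of the original workflow and is not an official Baidu algorithm; the function names and the sample point are chosen for illustration. Measuring a distance between coordinates in two different systems is itself only an approximation, and going from GCJ-02 to WGS-84 would need a further step that is not shown.

```python
import math

X_PI = math.pi * 3000.0 / 180.0


def bd09_to_gcj02(bd_lng, bd_lat):
    """Approximate BD-09 -> GCJ-02 conversion (community formula, not official)."""
    x = bd_lng - 0.0065
    y = bd_lat - 0.006
    z = math.sqrt(x * x + y * y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters on a sphere of radius 6371 km."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Arbitrary sample point, used only to illustrate the size of the shift.
bd_lng, bd_lat = 116.404, 39.915
gcj_lng, gcj_lat = bd09_to_gcj02(bd_lng, bd_lat)
offset_m = haversine_m(bd_lng, bd_lat, gcj_lng, gcj_lat)
print(f"GCJ-02: ({gcj_lng:.6f}, {gcj_lat:.6f}), shift of about {offset_m:.0f} m")
```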
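Visualization and analysis
Maybe I am just not skilled enough, but I find projection in geopandas awkward to use, so I did the projection in ArcGIS instead.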
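For readers who would rather stay in Python, the following is a minimal sketch of what the geopandas route could look like. It is not what this post used. The column names `lng` and `lat`, the helper name `to_projected_points`, the sample points, and the choice of an Albers equal-area projection centered on 105°E are all assumptions. The geocoded coordinates are also treated as if they were WGS-84, which ignores the BD-09 offset.

```python
import pandas as pd
import geopandas as gpd

# Assumed projection: Albers equal-area conic with parameters often used for
# China-wide maps (standard parallels 25N and 47N, central meridian 105E).
CHINA_ALBERS = (
    "+proj=aea +lat_1=25 +lat_2=47 +lat_0=0 +lon_0=105 "
    "+x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
)


def to_projected_points(df, lng_col="lng", lat_col="lat"):
    """Turn a table with longitude/latitude columns into projected points."""
    gdf = gpd.GeoDataFrame(
        df.copy(),
        geometry=gpd.points_from_xy(df[lng_col], df[lat_col]),
        crs="EPSG:4326",  # assumption: coordinates treated as WGS-84
    )
    return gdf.to_crs(CHINA_ALBERS)


# Hypothetical sample rows standing in for the geocoded hospital table.
sample = pd.DataFrame({
    "name": ["Sample point A", "Sample point B"],
    "lng": [105.0, 116.4],
    "lat": [35.0, 39.9],
})
projected = to_projected_points(sample)
print(projected.geometry)
```

A point on the central meridian maps to an easting of zero, which gives a quick sanity check that the projection was applied.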