
Writing a Small Web Scraper in Python

2017-03-27  华子dev

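Here is the story: my company uses ping++, a third-party payment tool, and for work I needed to convert the bank codes and bank names in their documentation into a JSON string to use locally.
The link is here: https://www.pingxx.com/api#银行编号说明
So the question was how to deal with this table of bank codes. Copying the rows one by one is far too inefficient, and it is not how a programmer works. So I asked an iOS colleague for the JSON they had already parsed... haha, simple and crude. I did get it from them, but I still wanted to do it myself: when something in daily life can be handled with code, it is worth solving it with code.
Once the goal was clear, the rest was easy. The first idea that came to mind was to use a Python scraper to download the page, parse it, and process the data.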

Key modules of the scraper

$ easy_install beautifulsoup4

or

pip install beautifulsoup4

Downloading the page

import urllib.request

response = urllib.request.urlopen("https://www.pingxx.com/api#银行编号说明")
print(response.read())

That's right, two lines of code are enough to download a web page, and it could even be done in one.
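One detail worth knowing: the part of the URL after `#` is a fragment, which a browser uses to scroll to a heading. It is not part of the HTTP request, so the download returns the whole /api page. Below is a minimal sketch of a slightly more defensive download. The helper name `fetch_html`, the 10-second timeout, and the assumption that the page is UTF-8 encoded are illustrative choices of mine, not something taken from the ping++ documentation.

```python
import urllib.request
from urllib.parse import urldefrag

URL = "https://www.pingxx.com/api#银行编号说明"


def fetch_html(url, timeout=10):
    """Download a page and return it as text (hypothetical helper).

    Assumes the page is UTF-8 encoded; undecodable bytes are replaced.
    """
    # Drop the "#..." fragment: it only matters to the browser.
    page_url, _fragment = urldefrag(url)
    with urllib.request.urlopen(page_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Splitting the URL needs no network access.
page_url, fragment = urldefrag(URL)
print(page_url)   # https://www.pingxx.com/api
print(fragment)   # 银行编号说明
```

Calling `fetch_html(URL)` would then return the page source as a string instead of raw bytes.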

Parsing the page

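What we downloaded above is the page's HTML source, which now has to be parsed. BeautifulSoup supports several ways of searching and navigating a document to pull out the data you want. To choose one, look at the page source first: Command+Option+J opens the browser's developer tools, where you can inspect the elements around the data and look for something convenient to anchor the scraper on. The screenshot below shows the source we located.

[Screenshot: Paste_Image.png, the page source around the bank code table]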

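Looking at the source, the data we need sits in a table inside a div. We need to get that table and parse it, but the div itself is hard to select directly. There is, however, this element: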

<h2 id="银行编号说明">银行编号说明</h2>

which is easy to select. So we grab it first, move up to its parent node, and then find the table from there.
The code:

import urllib.request
from bs4 import BeautifulSoup

response = urllib.request.urlopen("https://www.pingxx.com/api#银行编号说明")
soup = BeautifulSoup(response, "html.parser")
# Find the heading by id, go up to its parent, then down to the table body.
table = soup.find("h2", id="银行编号说明").parent.find("table").find("tbody")
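To see the heading, parent, table navigation without hitting the network, here is a minimal sketch that runs the same lookups on a small inline document. The HTML in `SAMPLE_HTML` is a made-up stand-in for the real page (only the heading line matches the source shown above), so the actual markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a heading and a table in one div.
SAMPLE_HTML = """
<div>
  <h2 id="银行编号说明">银行编号说明</h2>
  <table>
    <thead><tr><th>code</th><th>name</th></tr></thead>
    <tbody>
      <tr><td>0102</td><td>工商银行</td></tr>
      <tr><td>0103</td><td>农业银行</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

heading = soup.find("h2", id="银行编号说明")  # the easy-to-find anchor
container = heading.parent                    # the div wrapping heading and table
tbody = container.find("table").find("tbody")

# Collect the text of every td cell, row by row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all("td")]
    for tr in tbody.find_all("tr")
]
print(rows)  # [['0102', '工商银行'], ['0103', '农业银行']]
```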

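That gives us the table, and the rest is simple: process it. Get all the tr rows, get the td cells in each row, then take the values from the cells we need and concatenate them into a JSON string.
The full code: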

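import urllib.request
from bs4 import BeautifulSoup

response = urllib.request.urlopen("https://www.pingxx.com/api#银行编号说明")
soup = BeautifulSoup(response, "html.parser")
# Find the heading by id, go up to its parent, then down to the table body.
table = soup.find("h2", id="银行编号说明").parent.find("table").find("tbody")

# Build the JSON array by hand, one {"code": ..., "name": ...} object per row.
bankJson = "["
for row in table.find_all('tr'):
    # len(row) counts the row's child nodes; skip rows too short to hold data.
    if len(row) > 2:
        cells = row.find_all('td')
        bank_code = cells[0].find(text=True)  # first text node in the code cell
        bank_name = cells[1].find(text=True)  # first text node in the name cell
        bankJson = bankJson + "{" + "\"code\":\"" + bank_code + "\"," + "\"name\":\"" + bank_name + "\"},"

# Replace the trailing comma with the closing bracket.
bankJson = bankJson[0:len(bankJson) - 1] + "]"
print(bankJson)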

Running the code prints this result:

[{"code":"0100","name":"中国邮政储蓄银行"},{"code":"0102","name":"工商银行"},{"code":"0103","name":"农业银行"},{"code":"0104","name":"中国银行"},{"code":"0105","name":"建设银行"},{"code":"0301","name":"交通银行"},{"code":"0302","name":"中信银行"},{"code":"0303","name":"光大银行"},{"code":"0304","name":"华夏银行"},{"code":"0305","name":"民生银行"},{"code":"0306","name":"广发银行"},{"code":"0308","name":"招商银行"},{"code":"0309","name":"兴业银行"},{"code":"0310","name":"浦发银行"},{"code":"0311","name":"恒丰银行"},{"code":"0313","name":"临沂市商业银行"},{"code":"0316","name":"浙商银行"},{"code":"0317","name":"渤海银行"},{"code":"0318","name":"平安银行"},{"code":"0328","name":"新韩银行(中国)"},{"code":"0329","name":"韩亚银行(中国)"},{"code":"0336","name":"企业银行"},{"code":"0401","name":"上海银行"},{"code":"0402","name":"厦门银行"},{"code":"0403","name":"北京银行"},{"code":"0404","name":"烟台市商业银行"},{"code":"0405","name":"福建海峡银行"},{"code":"0406","name":"吉林银行"},{"code":"0408","name":"宁波银行"},{"code":"0412","name":"温州银行"},{"code":"0413","name":"广州银行"},{"code":"0414","name":"汉口银行"},{"code":"0418","name":"洛阳银行"},{"code":"0420","name":"大连银行"},{"code":"0422","name":"河北银行"},{"code":"0423","name":"杭州商业银行"},{"code":"0424","name":"南京银行"},{"code":"0427","name":"乌鲁木齐市商业银行"},{"code":"0428","name":"绍兴银行"},{"code":"0433","name":"葫芦岛市商业银行"},{"code":"0434","name":"天津银行"},{"code":"0435","name":"郑州银行"},{"code":"0436","name":"宁夏银行"},{"code":"0438","name":"齐商银行"},{"code":"0439","name":"锦州银行"},{"code":"0440","name":"徽商银行"},{"code":"0441","name":"重庆银行"},{"code":"0442","name":"哈尔滨银行"},{"code":"0443","name":"贵阳银行"},{"code":"0447","name":"兰州银行"},{"code":"0448","name":"南昌银行"},{"code":"0449","name":"晋商银行"},{"code":"0450","name":"青岛银行"},{"code":"0455","name":"日照市商业银行"},{"code":"0456","name":"鞍山银行"},{"code":"0458","name":"青海银行"},{"code":"0459","name":"台州银行"},{"code":"0461","name":"长沙银行"},{"code":"0463","name":"赣州银行"},{"code":"0465","name":"营口银行"},{"code":"0467","name":"阜新银行"},{"code":"0474","name":"内蒙古银行"},{"code":"0475","name":"湖州市商业银行"},{"code":"0476","name":"沧州银行"},{"code":"0479","name":"包商银行"},{"code":"0481","name":"威海商业银行"},{"code":"0483","name":"攀枝花市商业银行"},{"code":"0485","name":"绵阳市商业银行"},{"code":"0490","name":"张家口市商业银行"},{"code":"0492","name":"龙江银行"},{"code":"0495","name":"柳州银行"},{"code":"0497","name":"莱商银行"},{"code":"0498","name":"德阳银行"},{"code":"0503","name":"晋城银行"},{"code":"0505","name":"东莞商行"},{"code":"0508","name":"江苏银行"},{"code":"0513","name":"承德市商业银行"},{"code":"0515","name":"德州银行"},{"code":"0517","name":"邯郸市商业银行"},{"code":"0525","name":"浙江民泰商业银行"},{"code":"0526","name":"上饶市商业银行"},{"code":"0527","name":"东营银行"},{"code":"0528","name":"泰安市商业银行"},{"code":"0530","name":"浙江稠州商业银行"},{"code":"0534","name":"鄂尔多斯银行"},{"code":"0537","name":"济宁银行"},{"code":"0547","name":"昆仑银行"},{"code":"0554","name":"邢台银行"},{"code":"0556","name":"漯河商行"},{"code":"1401","name":"上海农商银行"},{"code":"1402","name":"昆山农信社"},{"code":"1403","name":"常熟市农村商业银行"},{"code":"1404","name":"深圳农村商业银行"},{"code":"1405","name":"广州农村商业银行"},{"code":"1408","name":"佛山顺德农村商业银行"},{"code":"1409","name":"昆明农村信用社联合社"},{"code":"1410","name":"湖北农信社"},{"code":"1415","name":"东莞农村商业银行"},{"code":"1416","name":"张家港农村商业银行"},{"code":"1417","name":"福建省农村信用社联合社"},{"code":"1418","name":"北京农村商业银行"},{"code":"1419","name":"天津农村商业银行"},{"code":"1420","name":"宁波鄞州农村合作银行"},{"code":"1424","name":"江苏省农村信用社联合社"},{"code":"1428","name":"江苏吴江农村商业银行"},{"code":"1430","name":"苏州银行"},{"code":"1443","name":"广西农村信用社联合社"},{"code":"1446","name":"黄河农村商业银行"},{"code":"1447","name":"安徽省农村信用社联合社"},{"code":"1448","name":"海南省农村信用社联合社"},{"code":"1513","name":"重庆农村商业银行"},{"code":"6462","name":"潍坊市商业银行"},{"code":"6466","name":"富滇银行"},{"code":"6473","name":"浙江泰隆商业银行"},{"code":"6478","name":"广西北部湾银行"},{"code":"6567","name":"商丘商行"}]
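Concatenating strings by hand works for this table, but it would produce invalid JSON if a name ever contained a double quote or a backslash. As a possible alternative, here is a minimal sketch of the output step using the standard `json` module. The first two rows are copied from the result above; the third is a made-up entry with a quote in its name, included only to show the escaping.

```python
import json

# (code, name) pairs as they would come out of the table rows.
# The last entry is made up: its name contains a quote, to show escaping.
rows = [
    ("0102", "工商银行"),
    ("0103", "农业银行"),
    ("9999", 'Example "Test" Bank'),
]

banks = [{"code": code, "name": name} for code, name in rows]

# ensure_ascii=False keeps the Chinese characters readable;
# separators=(",", ":") drops the spaces, matching the hand-built string.
bank_json = json.dumps(banks, ensure_ascii=False, separators=(",", ":"))
print(bank_json)

# The string parses back into the same data, quotes included.
assert json.loads(bank_json) == banks
```

With this approach the loop only needs to collect the code and name of each row; the brackets, commas, and escaping are left to `json.dumps`.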

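And that's it, problem solved. To be honest, this was the first time I wrote code to solve a problem I actually ran into, and it was fun. Learning with a concrete problem to solve works better. My takeaway: when you want to learn something or get something done, don't overthink it up front, for example by gathering every resource first, working out whose material is best, how to avoid detours, or what happens if you don't learn it or do it well. By then your enthusiasm is gone and the whole thing fizzles out, which is no good. When you want to do something, start right away: just do it.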
