Python Practice

11_Global Air Monitoring Station List

2017-09-25  过桥

Overview

This section scrapes the list of global air monitoring stations.

Target

World Meteorological Organization

Country Profile Database

Implementation logic

# WMO station information list
station_name    station_id  index_nbr   latitude    longitude   obs_rems
https://www.wmo.int/cpdb/volume_a_observing_stations/list_stations?sEcho=2&iColumns=6&sColumns=station_name%2Cstation_id%2Cindex_nbr%2Clatitude%2Clongitude%2Cobs_rems&iDisplayStart=25&iDisplayLength=25&mDataProp_0=0&sSearch_0=&bRegex_0=false&bSearchable_0=true&bSortable_0=true&mDataProp_1=1&sSearch_1=&bRegex_1=false&bSearchable_1=true&bSortable_1=true&mDataProp_2=2&sSearch_2=&bRegex_2=false&bSearchable_2=true&bSortable_2=true&mDataProp_3=3&sSearch_3=&bRegex_3=false&bSearchable_3=true&bSortable_3=true&mDataProp_4=4&sSearch_4=&bRegex_4=false&bSearchable_4=true&bSortable_4=true&mDataProp_5=5&sSearch_5=&bRegex_5=false&bSearchable_5=true&bSortable_5=true&sSearch=&bRegex=false&iSortCol_0=0&sSortDir_0=asc&iSortingCols=1&_=1506328442807
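# iDisplayStart: index of the first record to return (a row offset, so 25 is the start of the second page of 25 rows)
# iDisplayLength: number of rows per page
# _: timestamp in milliseconds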
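The parameters above can be assembled programmatically instead of editing the long URL by hand. The following is a minimal sketch using only the standard library; the names `BASE_URL`, `COLUMNS`, and `build_list_url` are hypothetical and not taken from the original source, and the parameter set simply mirrors the URL shown above.

```python
import time
from urllib.parse import urlencode

# Endpoint and column names copied from the URL above.
BASE_URL = "https://www.wmo.int/cpdb/volume_a_observing_stations/list_stations"
COLUMNS = ["station_name", "station_id", "index_nbr",
           "latitude", "longitude", "obs_rems"]


def build_list_url(start, length=25, echo=1):
    """Build the list_stations URL for rows [start, start + length).

    Hypothetical helper: it reproduces the query string shown above,
    varying only the row offset, page size, echo counter and timestamp.
    """
    params = [
        ("sEcho", echo),
        ("iColumns", len(COLUMNS)),
        ("sColumns", ",".join(COLUMNS)),
        ("iDisplayStart", start),     # row offset, not a page number
        ("iDisplayLength", length),   # rows per page
    ]
    for i in range(len(COLUMNS)):
        params += [
            ("mDataProp_%d" % i, i),
            ("sSearch_%d" % i, ""),
            ("bRegex_%d" % i, "false"),
            ("bSearchable_%d" % i, "true"),
            ("bSortable_%d" % i, "true"),
        ]
    params += [
        ("sSearch", ""),
        ("bRegex", "false"),
        ("iSortCol_0", 0),
        ("sSortDir_0", "asc"),
        ("iSortingCols", 1),
        ("_", int(time.time() * 1000)),  # millisecond timestamp
    ]
    return BASE_URL + "?" + urlencode(params)
```

Calling `build_list_url(0)`, `build_list_url(25)`, `build_list_url(50)` and so on would walk through the list 25 rows at a time.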

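When the service is requested on its own, or the URL is opened directly in a browser, the site redirects to the home page or returns the home page's data.

Automatic redirect to the home page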

Implementation code

Imports
import requests  # HTTP requests for scraping
import time, os
import datetime
from MSSql_SqlHelp import MSSQL  # local helper module, see MSSql_SqlHelp.py
import json
Investigating the cause of the automatic redirect
Request Headers
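def download_page(url):
    # `cookies` and `main()` are not shown in this excerpt; see the full source
    try:
        return requests.get(url, cookies=cookies, headers={
                # Marks the request as AJAX; without it the site redirects to the home page
                'X-Requested-With': 'XMLHttpRequest',
                'Accept': 'application/json, text/javascript, */*; q=0.01',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36'
                }, timeout=120).json()
    except Exception as e:
        print("download_page failed: " + url + " (" + str(e) + ")")
        time.sleep(30)  # wait N seconds before fetching again
        main()  # restart the crawl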
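Restarting the crawl through `main()` on every failure recurses without limit if the site stays unreachable. One possible alternative is a bounded retry loop, sketched below. This is not the author's method: `fetch_with_retry` and its parameters are hypothetical names, and the actual download function is passed in as `fetch` so the sketch does not depend on `requests`.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=30, sleep=time.sleep):
    """Call fetch(url) up to `retries` times.

    Returns the first successful result, or None if every attempt raised.
    `sleep` is injectable so the wait can be skipped in tests.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as e:
            print("attempt %d failed for %s: %s" % (attempt, url, e))
            if attempt < retries:
                sleep(delay)  # wait before the next attempt
    return None
```

With this shape the caller decides what to do when `None` comes back, for example skipping the page or stopping the crawl, instead of re-entering `main()`.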

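Summary

The request must specify 'X-Requested-With':'XMLHttpRequest' to tell the server that it is an AJAX request; otherwise the site automatically redirects to the home page.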
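Because a request without the header yields the home page rather than JSON, it can help to check the response body before using it. The sketch below is an assumption-laden illustration: the article does not show the response format, so the `aaData` key (the row container used by legacy DataTables server-side responses) and the row-as-list layout are assumptions, and `parse_station_page` is a hypothetical name.

```python
import json

# Column order taken from the sColumns parameter in the request URL.
COLUMNS = ["station_name", "station_id", "index_nbr",
           "latitude", "longitude", "obs_rems"]


def parse_station_page(body):
    """Return a list of station dicts, or None if body is not the expected JSON.

    Assumes a legacy DataTables response whose rows are lists under "aaData".
    """
    try:
        data = json.loads(body)
    except ValueError:
        return None  # not JSON, e.g. the home page HTML after a redirect
    if not isinstance(data, dict) or "aaData" not in data:
        return None
    return [dict(zip(COLUMNS, row)) for row in data["aaData"]]
```

A `None` result is then a hint that the request was treated as a normal page visit and the headers should be checked.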

Source code

spider_www.wmo.int.py
MSSql_SqlHelp.py
