Scraping ITHome News
2016-10-16
Gaolex
Feel free to repost; please credit the source.
- requests library: handles the network connection and the HTTP protocol
- BeautifulSoup library: turns the raw web page into structured, searchable data
```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Pretend to be a regular browser so the site serves the normal page
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36 QIHU 360EE'}

page = requests.get('http://www.ithome.com/', headers=headers)
page.encoding = 'gb2312'  # force the encoding so the Chinese titles decode correctly

soup = BeautifulSoup(page.text, "html.parser")
news = soup.find_all("span", class_="title")  # each headline sits in a <span class="title">

for item in news:
    a = item.a          # the <a> tag inside the span holds the headline text
    print(a.string)
```
[Figure: crawler output, a list of the scraped headline titles]
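If you also want the link that goes with each headline, the `href` attribute of the same `<a>` tag carries it. Below is a minimal sketch of that extension, assuming the page structure from the snippet above (`span.title` with an `<a>` child); the output filename `ithome_news.csv` is made up for illustration, and the layout of ithome.com may have changed since this post was written.

```python
# -*- coding: utf-8 -*-
import csv

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

page = requests.get('http://www.ithome.com/', headers=headers)
page.encoding = page.apparent_encoding  # let requests guess the encoding instead of hard-coding gb2312

soup = BeautifulSoup(page.text, "html.parser")

# Collect (title, url) pairs; the span.title selector is an assumption from the snippet above
rows = []
for span in soup.find_all("span", class_="title"):
    a = span.a
    if a is not None and a.get("href"):
        rows.append((a.get_text(strip=True), a["href"]))

# Save to a CSV file (hypothetical filename) so the headlines can be reused later
with open("ithome_news.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url"])
    writer.writerows(rows)

print("saved %d headlines" % len(rows))
```

Using `page.apparent_encoding` lets requests detect the character set from the response body, which is a bit more robust than hard-coding an encoding if the site ever switches to UTF-8.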