Scraping ITHome News
2016-10-16
Gaolex
Feel free to repost; please credit the source.
- requests library: handles the network connection and the HTTP protocol
- BeautifulSoup library: turns the raw web page into structured, searchable data
```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Pretend to be a regular browser so the site serves the normal page
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36 QIHU 360EE'}

page = requests.get('http://www.ithome.com/', headers=headers)
page.encoding = 'gb2312'  # force the encoding so the Chinese titles decode correctly

soup = BeautifulSoup(page.text, "html.parser")
news = soup.find_all("span", class_="title")  # each headline sits in a <span class="title">

for item in news:
    a = item.a          # the <a> tag inside the span holds the headline text
    print(a.string)
```
[Figure: crawler output, a list of the scraped headline titles]
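If you also want the link that goes with each headline, the `href` attribute of the same `<a>` tag carries it. Below is a minimal sketch of that extension, assuming the page structure from the snippet above (`span.title` with an `<a>` child); the output filename `ithome_news.csv` is made up for illustration, and the layout of ithome.com may have changed since this post was written.

```python
# -*- coding: utf-8 -*-
import csv

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

page = requests.get('http://www.ithome.com/', headers=headers)
page.encoding = page.apparent_encoding  # let requests guess the encoding instead of hard-coding gb2312

soup = BeautifulSoup(page.text, "html.parser")

# Collect (title, url) pairs; the span.title selector is an assumption from the snippet above
rows = []
for span in soup.find_all("span", class_="title"):
    a = span.a
    if a is not None and a.get("href"):
        rows.append((a.get_text(strip=True), a["href"]))

# Save to a CSV file (hypothetical filename) so the headlines can be reused later
with open("ithome_news.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url"])
    writer.writerows(rows)

print("saved %d headlines" % len(rows))
```

Using `page.apparent_encoding` lets requests detect the character set from the response body, which is a bit more robust than hard-coding an encoding if the site ever switches to UTF-8.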