BeautifulSoup4
2020-07-28 本文已影响0人
livein80
1. bs4简介
- BeautifulSoup,一个可以从html或者xml文件中提取数据的网页信息库
- 安装:
pip install lxml pip install bs4
2. bs4使用
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; a
nd their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
</html>
"""
1 # 获取bs对象
2 bs = BeautifulSoup(html_doc,'lxml')
3 # 打印⽂档内容(把我们的标签更加规范的打印)
4 print(bs.prettify())
5 print(bs.title) # 获取title标签内容 <title>The Dormouse's story</title>
6 print(bs.title.name) # 获取title标签名称 title
7 print(bs.title.string) # title标签⾥⾯的⽂本内容 The Dormouse's story
8 print(bs.p) # 获取p段落