BeautifulSoup4

2020-07-28 livein80

1. Introduction to bs4

BeautifulSoup is a Python library for extracting data from HTML and XML files.
Installation:
```
  pip install lxml
  pip install bs4
```
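
As a quick check that the installation worked, the sketch below parses a one-line snippet. The helper `pick_parser` is a hypothetical name introduced here for illustration, not part of bs4: it uses `lxml` when that package is importable and otherwise falls back to Python's built-in `html.parser`.

```python
import importlib.util

from bs4 import BeautifulSoup


def pick_parser():
    """Hypothetical helper: prefer lxml, else fall back to the built-in parser."""
    return "lxml" if importlib.util.find_spec("lxml") else "html.parser"


parser = pick_parser()
soup = BeautifulSoup("<p>hello</p>", parser)

# .string gives the text of a tag that has exactly one text child
greeting = soup.p.string
print(parser, greeting)
```

If this prints `lxml hello`, both packages are installed and working.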

2. Using bs4

```
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
  <body>
      <p class="title"><b>The Dormouse's story</b></p>
      <p class="story">Once upon a time there were three little sisters; and their names were
      <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
      <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
      <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
      <p class="story">...</p>
  </body>
</html>
"""

# Build the BeautifulSoup object using the lxml parser
bs = BeautifulSoup(html_doc, 'lxml')
# Print the document, re-indented into a tidier form
print(bs.prettify())
print(bs.title)         # the whole title tag: <title>The Dormouse's story</title>
print(bs.title.name)    # the tag's name: title
print(bs.title.string)  # the text inside the tag: The Dormouse's story
print(bs.p)             # the first <p> tag in the document
```
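
Attribute access such as `bs.p` returns only the first matching tag. To collect all three links from the sample document, one option is `find_all`. The sketch below is a minimal illustration on a trimmed copy of the same HTML. It assumes the built-in `html.parser` so that it runs without lxml, and the variable names `links`, `hrefs`, `names` and `second` are chosen here for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
  <body>
    <p class="story">
      <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
      <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
      <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    </p>
  </body>
</html>
"""

# Assumption: "html.parser" ships with Python; 'lxml' can be used instead if installed.
bs = BeautifulSoup(html_doc, "html.parser")

# find_all returns every matching tag; class_ filters on the CSS class
links = bs.find_all("a", class_="sister")
hrefs = [a["href"] for a in links]      # attribute values are read like dict keys
names = [a.get_text() for a in links]   # visible text of each link

# find returns the first match only, or None if nothing matches
second = bs.find(id="link2")

print(hrefs)
print(names)
print(second.get_text())
```

`class_` carries a trailing underscore because `class` is a reserved word in Python.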
