Python: The BeautifulSoup Module

2020-04-18  DarknessShadow
The BeautifulSoup module
import requests
import bs4

url = 'https://www.lagou.com/'
res = requests.get(url)
res.raise_for_status()
no = bs4.BeautifulSoup(res.text)
print(type(no))

bs4.BeautifulSoup('string containing the HTML content'): returns a BeautifulSoup object.
Running the code above as-is produces a warning:

D:/JavaSoft/pycharm-professional-2019.3/WorkSpace/python_learning/python_base/webcrawle/webcrawle_demo3.py:11: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 11 of the file D:/JavaSoft/pycharm-professional-2019.3/WorkSpace/python_learning/python_base/webcrawle/webcrawle_demo3.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

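In other words: no parser was named explicitly, so Beautiful Soup picked the best HTML parser available on this system ("lxml"). That is usually harmless, but on another machine or in a different virtual environment the same code may pick a different parser and behave differently.
Summary: the parser is not actually missing here, it just was not named. Make sure the lxml package is installed (for example with pip install lxml) and pass the parser name as the second argument when creating the BeautifulSoup object. That removes the warning.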
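
The warning's remark that another parser may "behave differently" is easiest to see with deliberately malformed markup. The following is a minimal sketch, not part of the original post: the fragment `<a></p>` and the helper name `parse_with` are illustrative assumptions, and parsers that are not installed are skipped rather than assumed present.

```python
import bs4

# Deliberately invalid HTML: a closing </p> that has no opening tag.
BROKEN_HTML = "<a></p>"


def parse_with(parser_name):
    """Return the parsed document as a string, or None if the parser is missing."""
    try:
        return str(bs4.BeautifulSoup(BROKEN_HTML, parser_name))
    except bs4.FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is not installed.
        return None


# Try the built-in parser plus two optional third-party ones.
results = {name: parse_with(name) for name in ("html.parser", "lxml", "html5lib")}

for name, tree in results.items():
    print(name, "->", tree if tree is not None else "(not installed)")
```

The built-in html.parser keeps the fragment as `<a></a>`, while lxml, where installed, wraps it in `<html><body>` elements. Naming the parser in the constructor pins down which of these behaviours the script gets.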

import requests
import bs4

url = 'https://www.lagou.com/'
res = requests.get(url)
res.raise_for_status()
no = bs4.BeautifulSoup(res.text, 'lxml')
print(type(no))
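
bs4.BeautifulSoup accepts an open file handle as well as a string, which is what makes the "download once, parse the local copy" workflow possible. Here is a rough sketch of that round trip under stated assumptions: `page_bytes` is a made-up stand-in for the body of a real response, a temporary directory replaces lagou.txt, and the standard-library html.parser is used so that nothing beyond bs4 is required.

```python
import os
import tempfile

import bs4

# Hypothetical page content standing in for res.content; no network access needed.
page_bytes = (
    '<html><body><input id="search_input" placeholder="搜索职位"/></body></html>'
).encode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "page.html")

    # Store the raw bytes, mirroring the 'wb' mode of a binary download.
    with open(path, "wb") as out:
        out.write(page_bytes)

    # Re-open as UTF-8 text and hand the file object itself to BeautifulSoup.
    # The constructor reads the whole file, so the handle can be closed afterwards.
    with open(path, "r", encoding="utf-8") as handle:
        soup = bs4.BeautifulSoup(handle, "html.parser")

placeholder = soup.select_one("#search_input")["placeholder"]
print(placeholder)
```

Opening the file in a with-block closes the handle as soon as parsing finishes. The soup object stays usable afterwards because the document has already been read into memory.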
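
The next example parses a saved copy of the page and selects an element by its id.

import requests
import bs4

# Download the page from lagou.com and store it in a local file (written as bytes)
# url = 'https://www.lagou.com/'
# res = requests.get(url)
# res.raise_for_status()
# with open('lagou.txt', 'wb') as op:
#     for chunk in res.iter_content(1000):
#         op.write(chunk)

# Parse the saved file; the with-block closes the handle once parsing is done
with open('lagou.txt', 'r', encoding='utf-8') as file:
    soup = bs4.BeautifulSoup(file, 'lxml')
print(type(soup))
elems = soup.select('#search_input')   # CSS selector: the element with id="search_input"
print(elems)
print(type(elems))
print(len(elems))
print(elems[0])
print(type(elems[0]))
print(elems[0].getText())              # text inside the tag; empty for an <input>
print(elems[0].attrs)                  # all attributes as a dict
print(elems[0].get('placeholder'))     # value of a single attribute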

Output of the code:

<class 'bs4.BeautifulSoup'>
[<input autocomplete="off" class="search_input" id="search_input" maxlength="64" placeholder="搜索职位、公司或地点" tabindex="1" type="text" value=""/>]
<class 'bs4.element.ResultSet'>
1
<input autocomplete="off" class="search_input" id="search_input" maxlength="64" placeholder="搜索职位、公司或地点" tabindex="1" type="text" value=""/>
<class 'bs4.element.Tag'>

{'maxlength': '64', 'placeholder': '搜索职位、公司或地点', 'type': 'text', 'id': 'search_input', 'class': ['search_input'], 'autocomplete': 'off', 'tabindex': '1', 'value': ''}
搜索职位、公司或地点
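
The blank line in the output comes from getText(): an `<input>` element has no text content, so the call returns an empty string. The sketch below revisits the same calls on a small inline document so that it runs without lagou.txt. The HTML snippet (including its `<label>`) is a hypothetical one modeled on the element shown above, and html.parser is used instead of lxml only to avoid an extra dependency.

```python
import bs4

# Hypothetical markup modeled on the <input> element from the output above.
HTML = """
<form>
  <input id="search_input" class="search_input" type="text"
         placeholder="搜索职位、公司或地点"/>
  <label for="search_input">Search</label>
</form>
"""

soup = bs4.BeautifulSoup(HTML, "html.parser")

matches = soup.select("#search_input")      # list-like ResultSet of every match
first = soup.select_one("#search_input")    # first match only, or None

print(len(matches))                 # number of matching elements
print(repr(first.get_text()))       # get_text() is the current spelling of getText()
print(first["placeholder"])         # index access raises KeyError if absent
print(first.get("data-missing"))    # .get() returns None if absent
print(first.get("class"))           # class is multi-valued, so it comes back as a list

label_text = soup.select_one("label").get_text()
print(label_text)

missing = soup.select_one("#does_not_exist")
print(missing)
```

select() always returns a list-like result, even for an id selector, so indexing with [0] fails on an empty result. select_one() returns None in that case, which is easier to check before reading attributes.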
