Python爬虫：最牛逼的 selenium爬取方式！

2019-03-20 本文已影响11人 1a076099f916

作为一个男人

在最高光的时刻

这是小编准备的python爬虫学习资料，加群：700341555即可免费获取!

Python爬虫：最牛逼的 selenium爬取方式！

就是说出那句

Python爬虫：最牛逼的 selenium爬取方式！

之后

还不会被人打

...

虽然在现实生活中你无法这样

但是在这里

就让你体验一番

那种呼风唤雨的感觉

Python爬虫：最牛逼的 selenium爬取方式！

我们之前在爬取某些网站的时候

使用到了一些 python 的请求库

模拟浏览器的请求

我们需要抓包啥的

能不能不这样

可不可以就写几行代码

让它自己去打开浏览器

自己去请求我们要爬取的网站

自己去模拟我们的一些搜索

等等

反正就是

老子躺着，让它自己动

Python爬虫：最牛逼的 selenium爬取方式！

躺好

让 selenium 满足你的要求

怎么玩呢？

那么接下里就是

学习 python 的正确姿势

Python爬虫：最牛逼的 selenium爬取方式！

什么是 selenium ？

其实它就是一个自动化测试工具，支持各种主流的浏览器

直到遇到了 python

转身一变

Python爬虫：最牛逼的 selenium爬取方式！

selenium 变成了爬虫利器

我们先来安装一下

Python爬虫：最牛逼的 selenium爬取方式！

接着我们还要下载浏览器驱动

小帅b用的是 Chrome 浏览器

所以下载的是 Chrome 驱动

当然你用别的浏览器也阔以

去相应的地方下载就行了

Python爬虫：最牛逼的 selenium爬取方式！

下载完之后

要配置一下环境变量

Python爬虫：最牛逼的 selenium爬取方式！

接着打开 pycharm

撸点代码

Python爬虫：最牛逼的 selenium爬取方式！

运行一下

Python爬虫：最牛逼的 selenium爬取方式！

可以看到

它自己打开了 Chrome 浏览器

访问了百度

搜索了苍老师的照片

Python爬虫：最牛逼的 selenium爬取方式！

这就是 selenium 的魅力

我们来看下我们刚刚写的代码

我们导入了 web 驱动模块

Python爬虫：最牛逼的 selenium爬取方式！

接着我们创建了一个 Chrome 驱动

Python爬虫：最牛逼的 selenium爬取方式！

有了实例之后

相当于我们有了 Chrome 浏览器了

接着使用 get 方法打开百度

Python爬虫：最牛逼的 selenium爬取方式！

打开百度之后

我们获取到输入框

至于怎么获取

等等会讲

获取到输入框之后我们就往里面写入我们要搜索的内容

Python爬虫：最牛逼的 selenium爬取方式！

输入完了之后呢

我们就获取到搜索这个按钮

然后点击

Python爬虫：最牛逼的 selenium爬取方式！

就这样完成了一次自动的百度搜索

Python爬虫：最牛逼的 selenium爬取方式！

当我们使用驱动打开了一个页面

这时候其实没什么鸟用

因为我们要对那些元素进行操作

就像刚刚我们要获取输入框然后输入一些内容

还有获取按钮点击什么的

selenium 提供了挺多方法给我们获取的

当我们要在页面中获取一个元素的时候

可以使用这些方法

find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

想要在页面获取多个元素呢

就可以这样

find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

比如我们打开了一个页面

是这样的 HTML

Python爬虫：最牛逼的 selenium爬取方式！

可以通过 id 获取 form 表单

Python爬虫：最牛逼的 selenium爬取方式！

通过 name 获取相应的输入框

Python爬虫：最牛逼的 selenium爬取方式！

通过 xpath 获取表单

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">login_form = driver.find_element_by_xpath("/html/body/form[1]")
login_form = driver.find_element_by_xpath("//form[1]")
login_form = driver.find_element_by_xpath("//form[@id='loginForm']")
</pre>

通过标签获取相应的输入框