
Simulating login to Baidu Tieba and Sina Weibo with selenium

2017-09-20  127 views  Evtion

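selenium and phantomjs are a good match for scraping dynamically loaded data and AJAX content. The previous post covered how to install phantomjs and its common usage. This post installs Selenium and uses it together with phantomjs to simulate logging in to Baidu Tieba and then complete the daily sign-in.
Installing selenium was a bit bumpy for me. It really brought home that what you learn on paper always feels shallow; to truly understand something you have to work through it yourself.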

Working through it
pip install selenium
pip list
The installed modules
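Instead of scanning the `pip list` output by eye, the same check can be made from Python. The following is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is in the standard library); the helper name `installed_version` is made up for this illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("selenium")
    if version is None:
        print("selenium is not installed in this environment")
    else:
        print("selenium", version)
```

If the helper returns None, `pip install selenium` probably ran against a different Python environment than the one being used.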
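# Smoke test: start PhantomJS, print the page source, then shut it down.
# No URL is loaded, so only an empty page is printed.
from selenium import webdriver
driver=webdriver.PhantomJS()
print(driver.page_source)
driver.quit()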

Scared me to death
C:\Users\userName>python
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v
900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import selenium
>>> print( selenium.__file__)
D:\anaconda\lib\site-packages\selenium\__init__.py
>>>
Adding it to PATH
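Whether the PATH change took effect can be checked before starting the driver. This is a small sketch using only the standard library; the helper name `find_executable` is an assumption for illustration, and the executable is assumed to be named `phantomjs`.

```python
import shutil


def find_executable(name="phantomjs"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("phantomjs was not found on PATH")
    else:
        print("phantomjs found at", path)
```

A new command prompt usually has to be opened after editing PATH on Windows, otherwise the old value is still in effect.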
<html>
 <head></head>
 <body>
  <p id="TANGRAM__PSP_10__userNameWrapper" class="pass-form-item pass-form-item-userName" style="display:"><label for="TANGRAM__PSP_10__userName" id="TANGRAM__PSP_10__userNameLabel" class="pass-label pass-label-userName">手机/邮箱/用户名</label><input id="TANGRAM__PSP_10__userName" type="text" name="userName" class="pass-text-input pass-text-input-userName open" autocomplete="off" value="" placeholder="手机/邮箱/用户名" /><span id="TANGRAM__PSP_10__userName_clearbtn" class="pass-clearbtn pass-clearbtn-userName" style="display: none; visibility: hidden; opacity: 1;"></span><span id="TANGRAM__PSP_10__userNameTip" class="pass-item-tip pass-item-tip-userName" style="display:none"><span id="TANGRAM__PSP_10__userNameTipText" class="pass-item-tiptext pass-item-tiptext-userName"></span></span></p>
  <ul id="TANGRAM__PSP_10__suggestionWrapper" class="pass-suggestion-list" style="display: none; visibility: hidden; opacity: 1;">
   <li class="pass-item-suggsetion" data-select="13071673760" data-type="history">13071673760<a data-delete="13071673760" title="删除该记录"></a></li>
   <li class="pass-item-suggsetion" data-select="13415334317" data-type="history">13415334317<a data-delete="13415334317" title="删除该记录"></a></li>
   <li class="pass-item-suggsetion" data-select="huahaoworkspace@163.com" data-type="history">huahaoworkspace@163.com<a data-delete="huahaoworkspace@163.com" title="删除该记录"></a></li>
  </ul>
  <span class="pass-item-selectbtn pass-item-selectbtn-userName" style="display: none; visibility: hidden; opacity: 1;"></span>
  <p></p>
 </body>
</html>
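The element ids used in the login script come from markup like the fragment above. As a rough sketch of how such ids could be pulled out of saved page source, the following uses the standard library's `html.parser`; the class and function names, and the shortened `SAMPLE` string, are made up for this example.

```python
from html.parser import HTMLParser


class InputIdCollector(HTMLParser):
    """Collect the id attribute of every <input> element that has one."""

    def __init__(self):
        super().__init__()
        self.input_ids = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are routed here as well.
        if tag == "input":
            attr_map = dict(attrs)
            if "id" in attr_map:
                self.input_ids.append(attr_map["id"])


def collect_input_ids(html_text):
    collector = InputIdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.input_ids


# Shortened version of the username field shown above.
SAMPLE = (
    '<p><label for="TANGRAM__PSP_10__userName">x</label>'
    '<input id="TANGRAM__PSP_10__userName" type="text" name="userName" /></p>'
)
print(collect_input_ids(SAMPLE))
```

Feeding it `driver.page_source` after the login modal has opened should list the ids that can then be passed to the element lookups.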
from selenium import webdriver
from time import sleep
driver=webdriver.PhantomJS()
driver.get("https://tieba.baidu.com/index.html#")
sleep(3)
# Click the login button, which opens the modal login window
driver.find_element_by_xpath("//li[@class='u_login']/div/a").click()
sleep(2)
# Clear any previously remembered account from the username field
driver.find_element_by_id("TANGRAM__PSP_10__userName").clear()
# Enter the username
driver.find_element_by_id("TANGRAM__PSP_10__userName").send_keys("username")
# Clear the password field
driver.find_element_by_id("TANGRAM__PSP_10__password").clear()
# Enter the password
driver.find_element_by_id("TANGRAM__PSP_10__password").send_keys("password")
# Click the submit button of the login form
driver.find_element_by_id("TANGRAM__PSP_10__submit").click()
sleep(5)
# div#onekey_sign>a is the CSS selector of the one-click sign-in link;
# .j_sign_btn is the CSS selector of the button in the modal that pops up after clicking it
driver.find_element_by_css_selector("div#onekey_sign>a").click()
driver.find_element_by_css_selector(".j_sign_btn").click()
sleep(2)
# Save a screenshot to confirm the result
driver.get_screenshot_as_file("C:\\Users\\username\\Desktop\\1.png")
driver.quit()
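The script above relies on fixed `sleep()` calls, which either waste time or turn out too short on a slow connection. One possible alternative is to poll until the element is actually there. The helper below is only a sketch in plain Python (the name `wait_until` and its parameters are assumptions, not part of selenium); selenium also ships its own `WebDriverWait` class for this purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. Exceptions raised by condition() count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            # For example, the element does not exist yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical usage with the driver from the script above:
# box = wait_until(lambda: driver.find_element_by_id("TANGRAM__PSP_10__userName"))
```

With this, `sleep(2)` after opening the login modal could be replaced by a wait on the username field, so the script continues as soon as the field appears.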

Actually, this was a suggestion from another blogger
from selenium import webdriver
from time import sleep
driver=webdriver.PhantomJS()
driver.maximize_window()
# Load the Weibo login page
driver.get("https://www.weibo.com/login.php")
sleep(3)
# Fill in the username field
driver.find_element_by_id("loginname").clear()
driver.find_element_by_id("loginname").send_keys("xxxx")
# Fill in the password field (located by XPath)
driver.find_element_by_xpath("//*[@id='pl_login_form']/div/div[3]/div[2]/div/input").clear()
driver.find_element_by_xpath("//*[@id='pl_login_form']/div/div[3]/div[2]/div/input").send_keys("xxxx")
# Click the login link
driver.find_element_by_xpath("//*[@id='pl_login_form']/div/div[3]/div[6]/a").click()
# Wait for the page after login to load, then save a screenshot
sleep(10)
driver.get_screenshot_as_file("C:\\Users\\Username\\Desktop\\1.png")
driver.quit()
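Both login scripts look up each field twice, once to clear it and once to type into it. A small helper can do the lookup once. This is a sketch only: `fill_field` is a made-up name, and it assumes the driver offers selenium's generic `find_element(by, value)` method, where the strings `"id"` and `"xpath"` are the values behind `By.ID` and `By.XPATH`.

```python
def fill_field(driver, by, locator, text):
    """Locate one element, clear it, type `text` into it, and return it.

    `driver` is expected to behave like a selenium WebDriver, that is,
    to offer find_element(by, locator) returning an object that has
    clear() and send_keys().
    """
    element = driver.find_element(by, locator)
    element.clear()
    element.send_keys(text)
    return element


# Hypothetical usage with the locators from the script above:
# fill_field(driver, "id", "loginname", "xxxx")
# fill_field(driver, "xpath",
#            "//*[@id='pl_login_form']/div/div[3]/div[2]/div/input", "xxxx")
```

Keeping each locator in a single place also makes the script easier to repair when the page layout changes and the long XPath stops matching.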
The page after logging in

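An introduction to Selenium itself will follow in the next post. It feels like I have dug myself another hole. I should start preparing an improved version soon.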

Straight face