大数据 爬虫Python AI SqlPython小哥哥

利用 Python 全自动下载抖音视频!躲在被子里偷偷的笑!

2019-05-05  本文已影响1人  14e61d025165

为什么写这篇文章,主要也是因为看了网易云课堂的一篇软广。

「用Python在抖音扒了这些高颜值女神后,突然成了人生赢家」,文中简述了一名工程师利用Python+ADB+鹅厂的AI,一晚上关注了一千多个漂亮小姐姐。

充分体现了厂子里的大学生和工人们的不同,这里我不得不说一声×××牛皮...

曾经的我也独自一人在那个诺大的工厂思考人生,思考着我该何去何从。

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034090 ql-align-center" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; text-align: left; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

记得那时我也靠着刷抖音,度过那段煎熬的日子。

只不过没有上面那位大佬那么厉害而已,我是人工识别漂亮的小姐姐...

即使现在的我也注册了鹅厂的AI账号,可我还是不会用。

那么就先来点简单的,提前关注好,然后利用Python实现自动化下载街拍视频!!!

/ 01 / Charles

Python学习交流群:1004391443,这里是python学习者聚集地,有大牛答疑,有资源共享!小编也准备了一份python学习资料,有想学习python编程的,或是转行,或是大学生,还有工作中想提升自己能力的,正在学习的小伙伴欢迎加入学习。

用Charles来找视频的API接口,具体操作和之前当当网那个案例一样,不细说。

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034102 ql-align-center" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; text-align: left; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

这里通过滑动抖音App,可以获取到视频的请求信息。

通过多次实验,发现链接的后面是会不停的改变,只有链接的前头始终不变,即「http://v1-dy」「http://v6-dy」「http://v9-dy」不变。

所以在写脚本的时候,可以以这些信息做为链接开头。

/ 02 / mitmproxy

利用mitmproxy中的mitmdump组件,对接Python脚本,用Python实现监听后的处理。

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034109 ql-align-center" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; text-align: left; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

这里我只是利用脚本获取链接,并没有直接利用脚本下载视频。

因为我是在mitmdump.exe文件所在的文件夹运行脚本,脚本里导入不了requests模块。

不想搞那些烦人的环境变量,所以只获取链接。

然后再去下载视频,视频链接需要去重,可能会有重复的。

Python脚本如下。

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def response(flow):
urls = ['http://v1-dy', 'http://v3-dy', 'http://v6-dy', 'http://v9-dy']
# 对url进行筛选,只选取视频的url
for url in urls:
if url in flow.request.url:
print('\n\n抖音视频\n\n')
with open('douyin.csv', 'a+', encoding='utf-8-sig') as f:
f.write(flow.request.url + '\n')
</pre>

/ 03 / Appium

配置抖音的Appium参数。

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034126 ql-align-center" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; text-align: left; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

点击蓝色按钮,手机自动启动抖音App!

接下来操作手机,然后点击Appium的刷新键,获取元素定位代码。

通过本次的实践发现Appium有时并不能很好的获取元素的定位,这可能就跟Web端的iframe页面一样。

所以针对找不到的元素,我直接对手机屏幕位置进行点击。

由于大家手机屏幕大小不同,这个参数肯定是会变化的,所以存在弊端,无法通用。

{ 左右滑动切换图片 }

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034130" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034134" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image <input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image> <tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034136" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

大致操作如上图。UP主的主页图漏了,请自行脑补,Python代码如下。

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time
import random
from appium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from appium.webdriver.common.touch_action import TouchAction
from selenium.webdriver.support import expected_conditions as EC
def main():
# 设置驱动配置
server = 'http://localhost:4723/wd/hub'
desired_caps = {
'platformName': 'Android',
'deviceName': 'STF_AL00',
'appPackage': 'com.ss.android.ugc.aweme',
'appActivity': '.main.MainActivity',
# 关闭手机软键盘
'unicodeKeyboard': True,
'resetKeyboard': True
}
driver = webdriver.Remote(server, desired_caps)
wait = WebDriverWait(driver, 60)
# 同意用户隐私协议,点击
button_1 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/q6')))
button_1.click()
# 禁止电话权限,点击
button_2 = wait.until(EC.presence_of_element_located((By.ID, 'com.android.packageinstaller:id/permission_deny_button')))
button_2.click()
# 禁止位置权限,点击
button_3 = wait.until(EC.presence_of_element_located((By.ID, 'com.android.packageinstaller:id/permission_deny_button')))
button_3.click()
time.sleep(2)
# 向上滑动,进入抖音视频播放页面
TouchAction(driver).press(x=515, y=1200).move_to(x=515, y=1000).release().perform()
# 这里需要设置一个较长时间的延迟,因为抖音有引导操作和提示,需等待片刻
time.sleep(20)
# 点击抖音"喜欢"处,以此进入登录界面
TouchAction(driver).press(x=950, y=800).release().perform()
# 点击密码登录
button_4 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/afg')))
button_4.click()
# 输入账号
button_5 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/ab_')))
button_5.send_keys('你的账号')
# 输入密码
button_6 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/aes')))
button_6.send_keys('你的密码')
time.sleep(2)
# 因为会跳出软键盘,会遮挡登录按钮,需点击软键盘取消
TouchAction(driver).press(x=980, y=1850).release().perform()
time.sleep(2)
# 点击登录按钮
button_7 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/abb')))
button_7.click()
time.sleep(2)
# 登录成功,进入抖音视频界面,点击下方标题栏 "我"
TouchAction(driver).press(x=990, y=1850).release().perform()
# 进入个人主页,点击关注处
button_8 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/a_7')))
button_8.click()
# 进入关注栏,点击第二个关注
button_9 = wait.until(EC.presence_of_element_located((By.XPATH, ' /hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.view.ViewGroup/android.widget.LinearLayout/android.support.v7.widget.RecyclerView/android.widget.RelativeLayout[2]/android.widget.RelativeLayout[1]')))
button_9.click()
# 进入UP主主页,点击第一个视频
button_10 = wait.until(EC.presence_of_element_located((By.ID, 'com.ss.android.ugc.aweme:id/aqm')))
button_10.click()
# 不断下滑页面,直到底部
while True:
TouchAction(driver).press(x=515, y=1247).move_to(x=515, y=1026).release().perform()
time.sleep(float(random.randint(5, 10)))
if name == 'main':
main()
</pre>

下载视频代码,需要对视频链接去重。

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import pandas as pd
import requests
import os
num = 0
dom = []
folder_path = "F:/video/"
os.makedirs(folder_path)
df = pd.read_csv('douyin.csv', header=None, names=["url"])

对链接去重及去除刚进入抖音获取的视频链接

for i in df['url'][2:]:
if i not in dom:
dom.append(i)

下载视频

for j in dom:
url = j
num += 1
response = requests.get(url, stream=True)
filename = str(num) + '.mp4'
with open('F:\video\' + filename, 'ab+') as f:
f.write(response.content)
f.flush()
print(filename + '下载完成')
</pre>

最后成功获取小姐姐们的全部视频...

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1557038034207 ql-align-center" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; text-align: left; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

要是曾经在厂子里的我,那时会这骚操作该多好,哈哈。

其实我想的更多,多数妹子都挺喜欢拍抖音,不过她们应该不会下载这种操作滴。

那么小老弟们的机会就来了,下载下来喜欢的妹子的抖音视频。

然后剪辑出一个「最美瞬间」系列的视频,机会不就来了嘛...

/ 04 / 总结

前些天有小伙伴叫我给公众号改名,取个跟Pyhton有关的名字,更吸引人。

说实话,我不太愿意,目前的我更趋向于做个人的自媒体,写点属于自己的文字。

多点与学习,与生活有关的文字,有趣有料,开心就好!!!

阅读原文,代码都放「GitHub」上头了。

上一篇下一篇

猜你喜欢

热点阅读