Resolving and Downloading the Real Video URL on 华数TV (Wasu TV)
2016-11-01
Maslino
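Taking the 华数TV play page http://www.wasu.cn/Play/show/id/7882670 as an example, this post shows how to obtain the real address of a video.
Open the browser's developer tools and watch the network requests made while the play page loads. Between page load and the start of playback, the following relevant requests appear, in this order: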
http://www.wasu.cn/Api/getPlayInfoById/id/7882670/datatype/xml
http://apiontime.wasu.cn/Auth/getVideoUrl?id=7882670&key=11ac882a1f434800cf661ae5dbd81ca4&url=aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=
http://vodpc-al.wasu.cn/pcsan12/mams/vod/201610/27/17/2016102717191609497456d16.mp4?auth_key=97f712597251633ab91f611e75b058ff-1477935556-f8dc297b46735af55223e73d3e3af535-&vid=7882670&cid=4&start=3165&end=3170&version=P2PPlayer_V.4.1.0
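Judging by their names, the first request fetches the playback information, the second fetches the video address, and the third is the real video address itself.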
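Getting the playback information
In the playback information endpoint http://www.wasu.cn/Api/getPlayInfoById/id/7882670/datatype/xml, the number 7882670 is the video ID, which can be extracted from the play page URL.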
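A minimal sketch of this step, assuming the play page URL always carries the ID in an /id/<digits> segment as in the example. The helper name info_api_url is made up for illustration.

```python
import re


def info_api_url(play_url):
    """Build the getPlayInfoById URL for a wasu.cn play page URL."""
    # Assumption: the video ID is the run of digits after "/id/".
    match = re.search(r'/id/(\d+)', play_url)
    if match is None:
        raise ValueError('no video ID found in %r' % play_url)
    vid = match.group(1)
    return 'http://www.wasu.cn/Api/getPlayInfoById/id/%s/datatype/xml' % vid


print(info_api_url('http://www.wasu.cn/Play/show/id/7882670'))
```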
Requesting this endpoint returns the following useful information:
<mp4>
<hd1>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=</hd1>
<hd4>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MDkvMTgvMDcvMjAxNjA5MTgwNzA0MjQ0NDJjMzU2OGJmMV9mN2Q2YjNhOC5tcDQ=</hd4>
<hd3>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzIzMTY1MTEzODRlNzliNy5tcDQ=</hd3>
<hd2>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzIwMDUzNDFjNDI4ZWRhOS5tcDQ=</hd2>
<hd0>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzI1NTk5ODhlOGZlMmFlZi5tcDQ=</hd0>
</mp4>
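hd0 through hd4 are the video quality levels. Each value looks like an encrypted string, but it is plain Base64: decoding hd1 gives http://vodpc-al.wasu.cn/pcsan12/mams/vod/201610/27/17/2016102717191609497456d16.mp4, the same path as the third request above, only without the auth_key and the other query parameters.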
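As a quick check on what these values contain, the sketch below parses an <mp4> fragment and Base64-decodes each child element. It assumes the fragment can be parsed on its own (in the real response it sits inside a larger XML document), and decode_qualities is a hypothetical helper name. Only the hd1 entry from the response above is included.

```python
import base64
import xml.etree.ElementTree as ET

# Fragment copied from the response above, reduced to the hd1 entry.
SAMPLE = (
    '<mp4><hd1>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2'
    'MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=</hd1></mp4>'
)


def decode_qualities(xml_text):
    """Map each quality tag (hd0..hd4) to its Base64-decoded value."""
    root = ET.fromstring(xml_text)
    return {
        child.tag: base64.b64decode(child.text).decode('utf-8')
        for child in root
    }


urls = decode_qualities(SAMPLE)
print(urls['hd1'])
```

The decoded value is the bare .mp4 path; the request that actually plays the video additionally carries auth_key and other query parameters.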
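Getting the video address
Next, let's look at how the video address request is constructed.
- The id parameter is the video ID
- It is not yet clear where the key parameter comes from
- The url parameter is the encoded string for one quality level, taken from the first endpoint's response
The open question is where key comes from. As it turns out, the key can be found in the source of the play page: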
_playKey = '11ac882a1f434800cf661ae5dbd81ca4'
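Putting the three parameters together might look like the sketch below. It assumes the key always appears in the page source as a single-quoted _playKey assignment, and it mirrors the observed request by inserting the values without percent-encoding. The helper names and the surrounding <script> markup in the sample are illustrative only.

```python
import re


def extract_play_key(page_html):
    """Return the _playKey value from the play page source, or None."""
    match = re.search(r"_playKey\s*=\s*'(\w+)'", page_html)
    return match.group(1) if match else None


def video_url_api(vid, key, encoded_url):
    """Build the getVideoUrl request in the same shape as the observed one."""
    return ('http://apiontime.wasu.cn/Auth/getVideoUrl?id=%s&key=%s&url=%s'
            % (vid, key, encoded_url))


# Hypothetical page snippet; only the _playKey assignment is from the article.
page = "<script>var _playKey = '11ac882a1f434800cf661ae5dbd81ca4';</script>"
hd1 = ('aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcv'
       'MTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=')
key = extract_play_key(page)
print(video_url_api('7882670', key, hd1))
```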
That settles the parameters of the second endpoint. Here is what it returns:
<?xml version="1.0" encoding="utf-8"?>
<root>
<id></id>
<title></title>
<video>
<![CDATA[8ec7ZEZEWDowIRsyTmMjGXQKUiQYai5SPn8VCUZ+ciYNEDdrejlBEQMud3EBJG8Caz5mOylbWFcTXhQOZCUJbi5Ybj9kSy1BHkgdHCkzRRMYMwomAVQAW0UyXFlzdFF0UxBPB0gbCRZTUhYrfi9fNgYgTypyCGhvFn0VXB40IWdla0dPdwEreFUlL2J+CUwhVRR9GQhQbT1cWX0lN0p4CDdoODEwO2sDTFx2AT1qJzJkHl0POlsjfRkSEGsQMDE0D3smYU1fVw==]]>
</video>
<page></page>
</root>
The response is XML, and the video element presumably holds the video address, only in encrypted form. What remains is to work out how to decrypt it.
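Pulling the encrypted payload out of such a response could be done with the standard library, as sketched below. The sample mirrors the structure of the response above, but its CDATA payload is cut short for brevity, and extract_encrypted_url is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same structure as the response above; the CDATA payload is cut short here.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<root>
<id></id>
<title></title>
<video>
<![CDATA[8ec7ZEZEWDowIRsyTmMjGXQKUiQYai5SPn8VCUZ+]]>
</video>
<page></page>
</root>"""


def extract_encrypted_url(xml_bytes):
    """Return the text of the <video> element without surrounding whitespace."""
    root = ET.fromstring(xml_bytes)
    text = root.findtext('video')
    if text is None:
        raise ValueError('response has no <video> element')
    return text.strip()


print(extract_encrypted_url(SAMPLE))
```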
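The developer tools show that the second request is issued by the Flash player, so the player very likely decrypts the address itself. Decompiling 华数TV's Flash player file WsPlayer.swf turned up the decryption routine, which translates to Python as follows: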
def url_decode(param1):
    # Python port of the decryption routine found in WsPlayer.swf.
    # md5_hex returns the hex MD5 digest of its argument.
    param2 = md5_hex('wasu!@#48217#$@#1')
    loc7 = md5_hex(param2[0:16])
    # The first 4 characters of the input are mixed into the key.
    loc11 = loc7 + md5_hex(loc7 + param1[0:4])
    loc12 = len(loc11)
    # bytearray yields integers when indexed, on both Python 2 and 3.
    data = bytearray(base64.b64decode(param1[4:]))
    # Key scheduling over a 128-entry table (RC4-style).
    loc14 = list(range(128))
    loc15 = [ord(loc11[i % loc12]) & 255 for i in range(128)]
    loc17 = 0
    for loc16 in range(128):
        loc17 = (loc17 + loc14[loc16] + loc15[loc16]) % 128
        loc14[loc16], loc14[loc17] = loc14[loc17], loc14[loc16]
    # Generate the keystream and XOR it with the decoded bytes.
    loc17 = 0
    loc18 = 0
    loc20 = []
    for byte in data:
        loc18 = (loc18 + 1) % 128
        loc17 = (loc17 + loc14[loc18]) % 128
        loc14[loc18], loc14[loc17] = loc14[loc17], loc14[loc18]
        loc20.append(chr(byte ^ loc14[(loc14[loc18] + loc14[loc17]) % 128]))
    # As in the original routine, skip the first 26 characters of the result.
    return ''.join(loc20)[26:]
And with that, the job is done.
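One way to sanity-check a port of the decryption routine without contacting the server is a round trip. Because the routine only XORs the data with a keystream, applying the same keystream to known plaintext should produce something url_decode can reverse. The sketch below is a self-contained Python 3 restatement of the routine plus a hypothetical inverse, url_encode_for_test. The 26 filler characters, the salt 'abcd', and the example.com URL are assumptions for the test; nothing here shows how the real server builds its payloads.

```python
import base64
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def derive_key(salt):
    """Key derivation as in the ported routine; salt is the 4-char prefix."""
    base = md5_hex(md5_hex('wasu!@#48217#$@#1')[0:16])
    return base + md5_hex(base + salt)


def keystream_xor(data, key):
    """XOR data with the 128-entry RC4-style keystream derived from key."""
    box = list(range(128))
    j = 0
    for i in range(128):
        j = (j + box[i] + ord(key[i % len(key)])) % 128
        box[i], box[j] = box[j], box[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 128
        j = (j + box[i]) % 128
        box[i], box[j] = box[j], box[i]
        out.append(byte ^ box[(box[i] + box[j]) % 128])
    return bytes(out)


def url_decode(encoded):
    data = base64.b64decode(encoded[4:])
    plain = keystream_xor(data, derive_key(encoded[0:4]))
    return plain.decode('latin-1')[26:]


def url_encode_for_test(url, salt='abcd'):
    # Hypothetical inverse: 26 filler characters stand in for the skipped prefix.
    plain = ('0' * 26 + url).encode('latin-1')
    cipher = keystream_xor(plain, derive_key(salt))
    return salt + base64.b64encode(cipher).decode('ascii')


sample = 'http://example.com/video.mp4'
assert url_decode(url_encode_for_test(sample)) == sample
```

If the assertion passes, the key scheduling and keystream loops are at least self-consistent. It does not prove that the port matches the Flash player byte for byte.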
Python code example
import re
import requests
import base64
import hashlib
from pyquery import PyQuery as pq


def md5_hex(data):
    # Return the hex MD5 digest; text is encoded first so this also runs on Python 3.
    if not isinstance(data, bytes):
        data = data.encode('utf-8')
    m = hashlib.md5()
    m.update(data)
    return m.hexdigest()


# url_decode from the previous section must be defined here as well.

url = 'http://www.wasu.cn/Play/show/id/7882670'
# Extract the video ID from the play page URL.
vid = re.search(r'/id/(\d+)', url).group(1)
# Extract the key from the play page source.
r = requests.get(url)
key = re.search(r"_playKey\s*=\s*'(\w+)'", r.text).group(1)
# Fetch the playback information, then resolve each quality level.
r = requests.get('http://www.wasu.cn/Api/getPlayInfoById/id/%s/datatype/xml' % vid)
d = pq(r.content)
for definition in ('hd3', 'hd2', 'hd1', 'hd0'):
    element = d('mp4')(definition)
    r = requests.get('http://apiontime.wasu.cn/Auth/getVideoUrl?id=%s&key=%s&url=%s' % (vid, key, element.text()))
    tmp_d = pq(r.content)
    encoded_url = tmp_d('video').text()
    print('%s %s' % (definition, url_decode(encoded_url)))