Resolving and Downloading Video Site Addresses

Resolving and Downloading Video Addresses from Wasu TV

2016-11-01  Maslino

Taking the Wasu TV playback page http://www.wasu.cn/Play/show/id/7882670 as an example, this article shows how to obtain the real address of the video.

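Open the browser's developer tools and watch the network requests made while the playback page loads. Analysis shows that, between page load and the start of playback, the following relevant requests appear in order: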

http://www.wasu.cn/Api/getPlayInfoById/id/7882670/datatype/xml
http://apiontime.wasu.cn/Auth/getVideoUrl?id=7882670&key=11ac882a1f434800cf661ae5dbd81ca4&url=aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=
http://vodpc-al.wasu.cn/pcsan12/mams/vod/201610/27/17/2016102717191609497456d16.mp4?auth_key=97f712597251633ab91f611e75b058ff-1477935556-f8dc297b46735af55223e73d3e3af535-&vid=7882670&cid=4&start=3165&end=3170&version=P2PPlayer_V.4.1.0

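Judging by their names, the first request fetches the playback information, the second fetches the video address, and the third is the real address of the video.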

Getting the playback information

In the playback-information interface http://www.wasu.cn/Api/getPlayInfoById/id/7882670/datatype/xml, 7882670 is the video ID, which can be extracted from the playback page URL.
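As a small illustration of that step, the ID can be pulled out of the playback page URL with a regular expression and dropped into the interface address. The helper names `extract_video_id` and `play_info_url` are made up for this sketch; only the two URL patterns come from the requests observed above.

```python
import re

def extract_video_id(play_url):
    """Return the numeric ID that follows '/id/' in a playback page URL."""
    match = re.search(r'/id/(\d+)', play_url)
    if match is None:
        raise ValueError('no video ID found in %r' % play_url)
    return match.group(1)

def play_info_url(video_id):
    """Build the playback-information interface URL for a video ID."""
    return 'http://www.wasu.cn/Api/getPlayInfoById/id/%s/datatype/xml' % video_id

video_id = extract_video_id('http://www.wasu.cn/Play/show/id/7882670')
print(video_id)
print(play_info_url(video_id))
```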

Requesting this interface returns the following useful information:

<mp4>
       <hd1>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=</hd1>
       <hd4>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MDkvMTgvMDcvMjAxNjA5MTgwNzA0MjQ0NDJjMzU2OGJmMV9mN2Q2YjNhOC5tcDQ=</hd4>
       <hd3>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzIzMTY1MTEzODRlNzliNy5tcDQ=</hd3>
       <hd2>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzIwMDUzNDFjNDI4ZWRhOS5tcDQ=</hd2>
       <hd0>aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzI1NTk5ODhlOGZlMmFlZi5tcDQ=</hd0>
</mp4>

hd0 through hd4 are the video quality levels. What each encoded string means cannot be determined yet at this point.
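One thing worth checking is whether these strings are plain Base64, since the alphabet and the trailing `=` padding suggest it. The sketch below decodes the hd1 value with the standard library. The result is a plain .mp4 URL on vodpc-al.wasu.cn, the same path that appears in the third request listed earlier, though without the auth_key and other query parameters, so the second interface is still needed.

```python
import base64

# The <hd1> value copied from the response above
hd1 = ('aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAv'
       'MjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=')

# b64decode returns bytes; the payload is ASCII text
decoded = base64.b64decode(hd1).decode('ascii')
print(decoded)
```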

Getting the video address

Next, let's look at how the interface for getting the video address is constructed.

Looking at the interface address http://apiontime.wasu.cn/Auth/getVideoUrl?id=7882670&key=11ac882a1f434800cf661ae5dbd81ca4&url=aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAvMjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ= we find:

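  1. The id parameter is the video ID
  2. Where the key parameter comes from is not yet known
  3. The url parameter is the encoded string for one of the quality levels returned by the first interface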
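Once all three values are in hand, the request can be assembled as in the rough Python 3 sketch below. The function name `build_video_url_request` and the constant `API_ENDPOINT` are invented for illustration. Note one difference from the captured request: `urlencode` escapes the Base64 padding `=` as `%3D`. That is standard query-string escaping, but whether the server treats both forms identically is an assumption here.

```python
from urllib.parse import urlencode, urlparse, parse_qs

API_ENDPOINT = 'http://apiontime.wasu.cn/Auth/getVideoUrl'

def build_video_url_request(video_id, play_key, encoded_url):
    """Assemble the getVideoUrl request from its three query parameters."""
    # A list of pairs keeps the parameters in the same order as the
    # captured request: id, key, url.
    query = urlencode([('id', video_id), ('key', play_key), ('url', encoded_url)])
    return API_ENDPOINT + '?' + query

request_url = build_video_url_request(
    '7882670',
    '11ac882a1f434800cf661ae5dbd81ca4',
    'aHR0cDovL3ZvZHBjLWFsLndhc3UuY24vcGNzYW4xMi9tYW1zL3ZvZC8yMDE2MTAv'
    'MjcvMTcvMjAxNjEwMjcxNzE5MTYwOTQ5NzQ1NmQxNi5tcDQ=',
)
print(request_url)
# Parsing the query string back recovers the original values
print(parse_qs(urlparse(request_url).query))
```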

The question now is where the key parameter comes from. In fact, the key can be found in the source of the playback page:

_playKey = '11ac882a1f434800cf661ae5dbd81ca4'
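A regular expression is enough to pick this value out of the page source. In the sketch below, the helper name `extract_play_key` and the surrounding `<script>` markup in `sample_page` are made up for illustration; only the `_playKey = '...'` assignment itself is taken from the real page.

```python
import re

def extract_play_key(page_source):
    """Return the value assigned to _playKey, or None if it is absent."""
    match = re.search(r"_playKey\s*=\s*'(\w+)'", page_source)
    return match.group(1) if match else None

# Hypothetical page fragment; the real page contains much more markup
sample_page = "<script>var _playKey = '11ac882a1f434800cf661ae5dbd81ca4';</script>"
print(extract_play_key(sample_page))
```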

OK, that settles the parameters of the second interface. Let's see what the data it returns looks like:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <id></id>
    <title></title>
    <video>
        <![CDATA[8ec7ZEZEWDowIRsyTmMjGXQKUiQYai5SPn8VCUZ+ciYNEDdrejlBEQMud3EBJG8Caz5mOylbWFcTXhQOZCUJbi5Ybj9kSy1BHkgdHCkzRRMYMwomAVQAW0UyXFlzdFF0UxBPB0gbCRZTUhYrfi9fNgYgTypyCGhvFn0VXB40IWdla0dPdwEreFUlL2J+CUwhVRR9GQhQbT1cWX0lN0p4CDdoODEwO2sDTFx2AT1qJzJkHl0POlsjfRkSEGsQMDE0D3smYU1fVw==]]>
    </video>
    <page></page>
</root>

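The content is XML, and the video tag presumably holds the video address, except that the address is encrypted, so what remains is working out how to decrypt it.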
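Before any decryption, the payload has to be lifted out of the video tag. One way to do that with only the standard library is sketched below. The function name `extract_encrypted_url` is invented here, and `SAMPLE_RESPONSE` mirrors the structure of the response above with a placeholder standing in for the real CDATA content.

```python
import xml.etree.ElementTree as ET

def extract_encrypted_url(xml_bytes):
    """Return the text inside <video>, with surrounding whitespace removed."""
    root = ET.fromstring(xml_bytes)
    video = root.find('video')
    if video is None or video.text is None:
        return None
    # The CDATA section is surrounded by indentation and newlines
    return video.text.strip()

# Same shape as the real response; the payload is a placeholder
SAMPLE_RESPONSE = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<root>\n'
    b'    <id></id>\n'
    b'    <title></title>\n'
    b'    <video>\n'
    b'        <![CDATA[PLACEHOLDER_PAYLOAD]]>\n'
    b'    </video>\n'
    b'    <page></page>\n'
    b'</root>\n'
)
print(extract_encrypted_url(SAMPLE_RESPONSE))
```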

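In the browser's developer tools, the second request can be seen to be issued by the Flash player, so it is quite likely that the Flash player is what decrypts the encrypted video address. Decompiling Wasu TV's Flash player file, WsPlayer.swf, turns up the relevant decryption routine, which translates into Python as follows: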

def url_decode(encoded):
    # md5_hex returns the hex MD5 digest of a string (see the sample code below)
    seed = md5_hex('wasu!@#48217#$@#1')
    prefix = md5_hex(seed[0:16])
    # Cipher key: prefix + MD5(prefix + first 4 characters of the input)
    key = prefix + md5_hex(prefix + encoded[0:4])
    # The remaining characters are Base64; bytearray yields integer bytes
    # on both Python 2 and Python 3
    data = bytearray(base64.b64decode(encoded[4:]))

    # Key scheduling: resembles RC4, but with a 128-entry state table
    state = list(range(128))
    key_bytes = [ord(key[i % len(key)]) & 255 for i in range(128)]
    j = 0
    for i in range(128):
        j = (j + state[i] + key_bytes[i]) % 128
        state[i], state[j] = state[j], state[i]

    # Generate the keystream and XOR it with the data, byte by byte
    i = j = 0
    out = []
    for byte in data:
        i = (i + 1) % 128
        j = (j + state[i]) % 128
        state[i], state[j] = state[j], state[i]
        out.append(chr(byte ^ state[(state[i] + state[j]) % 128]))

    # Skip the 26-character prefix of the decrypted text; the rest is the URL
    return ''.join(out)[26:]

With that, the job is done.

Python sample code

import re
import requests
import base64
import hashlib
from pyquery import PyQuery as pq

def md5_hex(data):
    # hashlib needs bytes, so encode text input first (required on Python 3)
    if not isinstance(data, bytes):
        data = data.encode('utf-8')
    return hashlib.md5(data).hexdigest()

# url_decode, shown in the previous section, must be defined before this point

url = 'http://www.wasu.cn/Play/show/id/7882670'
# get the video ID from the playback page URL
vid = re.search(r'/id/(\d+)', url).group(1)
# get the key from the playback page source
r = requests.get(url)
key = re.search(r"_playKey\s*=\s*'(\w+)'", r.text).group(1)

# fetch the playback information, then resolve the address for each quality level
r = requests.get('http://www.wasu.cn/Api/getPlayInfoById/id/%s/datatype/xml' % vid)
d = pq(r.content)
for definition in ('hd3', 'hd2', 'hd1', 'hd0'):
    element = d('mp4')(definition)
    r = requests.get('http://apiontime.wasu.cn/Auth/getVideoUrl?id=%s&key=%s&url=%s' % (vid, key, element.text()))
    tmp_d = pq(r.content)
    encoded_url = tmp_d('video').text()
    print('%s %s' % (definition, url_decode(encoded_url)))