
Cracking and Downloading Paid Videos from 人人讲 (Renrenjiang)

2020-04-15  杨赟快跑

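人人讲 (Renrenjiang) is an education app with a large catalog of instructional videos covering music, calligraphy, fashion, yoga, and more. Some videos are free, but most are paid. In this post we capture and analyze the app's API traffic, then use what we learn to unlock and download the videos.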

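Disclaimer: this tutorial is for learning purposes only. The downloaded videos belong to 人人讲, and using them commercially is strictly prohibited.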

1. Analyzing the 人人讲 API

First, open the 人人讲 app, pick a video you are interested in, copy its link, and open it on a computer (the link below is used as the example).

http://ke.renrenjiang.cn/#/video?activityId=1147066&su=0

The page looks like this once it opens:


charles-ssl-proxying-certificate.png

We use the Charles proxy to see which requests are sent when the page loads.

image.png

There are two requests, shown below.

# Get the details of the video
https://api.renrenjiang.cn/api/v3/activities/1147066/show?include=creator,columns,service
# Get the details of every video in the column that contains this video
https://api.renrenjiang.cn/api/v2/columns/20890/activities

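Only the second endpoint, which lists the column's videos, is needed here: its response includes the password required to watch each video.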

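Brief description: returns the details of every video in the column that contains the given video.

Request URL: https://api.renrenjiang.cn/api/v2/columns/20890/activities

Request headers: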

head = {
    "Referer": "http://ke.renrenjiang.cn/",
    "Authorization": "see the screenshot below; copy the value from your own capture"
}
image.png
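Parameters:
Name | Required | Type | Description | Example
u | | int | User ID | 1022949
activity_sort | | string | Sort order of the videos | ASC or DESC
page | | int | Page number, for columns with many videos | 1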
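As a small illustration of how the three query parameters above fit onto the request URL, here is a minimal sketch. The helper name `build_column_url` is hypothetical and not part of the original code; the base URL and the example values are the ones shown in the capture above.

```python
from urllib.parse import urlencode

# Base URL taken from the captured request above; the column ID is part of the path.
BASE = "https://api.renrenjiang.cn/api/v2/columns/{column_id}/activities"


def build_column_url(column_id, user_id, sort="ASC", page=1):
    """Return the paged listing URL for one column (hypothetical helper)."""
    if sort not in ("ASC", "DESC"):
        raise ValueError("activity_sort must be ASC or DESC")
    # urlencode handles escaping and joins the pairs with '&'.
    query = urlencode({"u": user_id, "activity_sort": sort, "page": page})
    return BASE.format(column_id=column_id) + "?" + query


print(build_column_url(20890, 1022949, page=1))
# https://api.renrenjiang.cn/api/v2/columns/20890/activities?u=1022949&activity_sort=ASC&page=1
```

Building the query string with `urlencode` rather than string concatenation keeps the URL valid if a value ever contains characters that need escaping.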

Sample response:

{
    "activities": [{
        "id": 1147066,
        "title": "国画技法课——撞水撞粉(第四讲)",
        "status": "结束",
        "video_status": 2,
        "background": "http://image.renrenjiang.cn/uploads/activity/background/1147066/2020_af9598e950780754cdee6956684f9524.jpeg@640w",
        "password": "7939",
        "started_at": 1550883600,
        "charge": true,
        "price": 29.90,
        "reservation_count": 6,
        "reservation": null,
        "user_id": 5011557,
        "creator": {
            "user_id": 5011557,
            "uid": "29269207",
            "nickname": "麦芽老师的艺术课堂",
            "displayname": null,
            "description": "       麦芽老师有着近十年的一线教学经验,所开设课程秉着“艺术美化生活,生活滋养艺术”的课程理念。直播间主要开设课程有儿童趣味水墨画、初级国画、线描、色彩等课程,在这里有专业老师的讲解,课题解答,课后作业辅导。\n      麦芽直播课堂诚邀每一位喜欢画画的朋友一起分享,这里没有年龄界限,只有您对生活、对艺术满满的热爱和期待。老师喜欢与学员交流互动,在轻松愉悦的课堂中,\n感受传统绘画艺术的魅力。\n咨询课程,请扫文末二维码,加微信,老师会耐心解答。麦芽老师的艺术课堂诚邀您随时加入我们!",
            "avatar": "https://image.renrenjiang.cn/uploads/user/avatar_url/5011557/2019_db0d6a4906c039fdc9d9b4b5aea3c880.jpg",
            "background": "https://image.renrenjiang.cn/uploads/user/background/5011557/2019_4f69866d6d9825fc127827cdcfe28098.jpg",
            "channel_name": "无",
            "user_level": 2,
            "proposal_status": 2,
            "fans_count": 26
        },
        "column_id": 20890,
        "column": {
            "column_id": 20890,
            "title": "试听课系列(不定时更新)",
            "price": 20.00,
            "background": "https://image.renrenjiang.cn/uploads/column/background/20890/2019_117d5b509ad52b726bf58089f002dbc4.jpg@640w",
            "activities_count": 5,
            "ctype": 1,
            "max_subscription": 0,
            "subscriptions": 0,
            "activity_allow_buy": true,
            "activity_sort": "DESC"
        },
        "isinvited": false,
        "locked": true,
        "share_url": "https://h5.renrenjiang.cn/#/activity?aid=1147066&su=14134251",
        "description": "课程简介\n本节课衔接上节课程,首先,将花头部分处理完整,莲蓬可以和叶子一起处理。其次,本节课将学习撞水撞粉系列课程荷花叶子的画法,调色调墨技巧,其中将色、墨、水的用法在画面中展现出来。<img src=\"http://image.renrenjiang.cn/uploads/files/2019_0a131051936bd7227843b52a5e8707ab.jpg\"/>本节课适合人群:\n1、零基础国画爱好者;2、少儿美术培训机构教师;3、有绘画基础且能独立上课的小朋友;\n\n如需咨询课程请扫码入群\n<img src=\"http://image.renrenjiang.cn/uploads/files/2019_37d369479071f67b1bd1f2d617431c57.jpg\"/>",
        "popularity": 22,
        "replay": null,
        "reprinted_switch": null,
        "reprint_user_id": null,
        "media_type": null,
        "detail_name": null,
        "detail_nickname": null,
        "rtype": null,
        "wxtype": null,
        "group": null,
        "share_scale": 0.0000,
        "share_amount": 0.00,
        "visible": false,
        "acm_id": null,
        "position": null,
        "task": null,
        "pt_id": null
    },
  ...
  ],
    "total": null
}

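From this response we get each video's id, title, description, and password (if there is no password, we have to fall back to brute force, which is discussed later).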

Next, we enter the password 7939 and start watching the video.

image.png

Since the video now plays, the front end must have obtained the video's address, so we capture the traffic with Charles again.


image.png image.png image.png image.png

From entering the password to receiving the video, four requests are involved in total, shown below.

# Check whether the password is correct
https://api.renrenjiang.cn/api/v3/activities/1147066/reservation
# Get the video's m3u8 address
https://api.renrenjiang.cn/api/v3/activities/1147066/stream_url?user_id=14264889&timestamp=1586920041105
# Fetch the m3u8 file
http://video.renrenjiang.cn/record/alilive/2726981393-1550845168.m3u8
# Fetch the short video segments listed in the m3u8 file, one by one
http://video.renrenjiang.cn/record/alilive/2726981393/1550841839_1.ts
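To make the relationship between the m3u8 file and the .ts segments concrete, here is a minimal sketch of reading an HLS media playlist. The playlist text in `sample` is invented for illustration (the real file's contents are not shown in this post), and `parse_media_playlist` is a hypothetical helper. It relies only on two general properties of the format: lines starting with `#` are tags or comments, and every other non-blank line is a segment URI that may be relative to the playlist URL.

```python
from urllib.parse import urljoin


def parse_media_playlist(playlist_url, playlist_text):
    """Return absolute segment URLs listed in an HLS media playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' (tags/comments) are not segments.
        if not line or line.startswith("#"):
            continue
        # Relative URIs are resolved against the playlist's own URL.
        segments.append(urljoin(playlist_url, line))
    return segments


# Invented playlist content, shaped to match the URLs captured above.
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
2726981393/1550841839_1.ts
#EXTINF:10.0,
2726981393/1550841839_2.ts
#EXT-X-ENDLIST
"""
url = "http://video.renrenjiang.cn/record/alilive/2726981393-1550845168.m3u8"
for seg in parse_media_playlist(url, sample):
    print(seg)
```

Using `urljoin` instead of slicing the URL at the last `/` also copes with playlists that list absolute segment URLs.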

I won't list the request parameters and responses for each endpoint here; let's go straight to the code.

2. Writing the Code

config.py

import platform
import requests
import time

def is_window():
    system = platform.system()
    if system == "Windows":
        return True
    else:
        return False

user_id = "fill in your own value"
authorization = "fill in your own value"

root_path = "F:\\人人讲视频" if is_window() else "/Users/yy/Documents/照片/renrenjiang"

head = {
    "Referer": "http://ke.renrenjiang.cn/",
    "Authorization": authorization
}
session = requests.session()
current_milli_time = lambda: int(round(time.time() * 1000))

util.py

import os
import platform
import sys
from config import head


def is_window():
    system = platform.system()
    if system == "Windows":
        return True
    else:
        return False


def download_by_key():
    url = "https://api.renrenjiang.cn/api/v3/activities/{0}/stream_url?user_id={1}&timestamp={2}"
    res = head
    res = res
    os.rmdir("../renrenjiang")
    exit(1)
    if "status" in res.keys() and res["status"] == 2:
        hls_url = res["hls_url"]
        return hls_url
    return None


def show_process(curr, total):
    curr = curr / total * 100
    total = 100
    i = int(curr)
    process = '>' * (i // 2) + ' ' * ((total - i) // 2)
    if curr == total:
        ss = '\r' + process + "{0}%\n".format(i)
    else:
        ss = '\r' + process + "{0}%".format(i)
    sys.stdout.write(ss)
    sys.stdout.flush()


def show_process2(curr, total):
    i = int(curr / total * 100)
    process = '>' * (i // 2) + ' ' * ((100 - i) // 2)
    if curr == total:
        ss = '\r' + process + "[{0}/{1}]\n".format(curr, total)
    else:
        ss = '\r' + process + "[{0}/{1}]".format(curr, total)
    sys.stdout.write(ss)
    sys.stdout.flush()
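One detail of the two progress functions above: the bar is built from `i // 2` arrows plus `(100 - i) // 2` spaces, so its width alternates between 49 and 50 characters as `i` changes parity, and `show_process` ends the line by comparing a float percentage with 100. The following is a sketch of an alternative that uses integer arithmetic only; `render_bar` and `show` are hypothetical names, not part of the original util.py.

```python
import sys


def render_bar(done, total, width=50):
    """Build a one-line text progress bar such as '>>>>>     [5/10]'."""
    if total <= 0:
        raise ValueError("total must be positive")
    done = max(0, min(done, total))
    # Integer arithmetic keeps the bar exactly `width` characters wide.
    filled = width * done // total
    return ">" * filled + " " * (width - filled) + "[{0}/{1}]".format(done, total)


def show(done, total):
    # '\r' returns the cursor to the start of the line so the bar redraws in place.
    end = "\n" if done >= total else ""
    sys.stdout.write("\r" + render_bar(done, total) + end)
    sys.stdout.flush()


for i in range(1, 4):
    show(i, 3)
```

Separating the string building (`render_bar`) from the terminal output (`show`) also makes the bar easy to check without a terminal.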


download.py

import json
import os
from m3u8 import m3u8
import util
from config import *


class download:
    def __init__(self, cid):
        self.cid = cid
        self.free_m3u8_url_list = []
        self.is_can_pojie = True
        self.free_videos = []
        self.vip_videos = []

    def _list_video(self):
        """
        List all course videos in the column identified by self.cid.
        :return: list of video dicts
        """
        video_list = []
        page = 0
        url_format = "https://h5.renrenjiang.cn/api/v2/columns/{0}/activities?u=1052944&activity_sort=ASC&page={1}"
        while True:
            page += 1
            url = url_format.format(self.cid, page)
            res = session.get(url, headers=head)
            res = json.loads(res.content)
            if "activities" in res.keys() and len(res["activities"]) > 0:
                activities = res["activities"]
                for activity in activities:
                    activity_id = activity["id"]
                    title = activity["title"]
                    password = activity["password"]
                    start_at = activity["started_at"]
                    description = activity["creator"]["description"]
                    video_list.append({
                        "id": activity_id,
                        "title": title,
                        "password": password,
                        "start_at": start_at,
                        "description": description
                    })
            else:
                break
        return video_list

    def _get_ts_list(self, index, video):
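        """
        Fetch the m3u8 file and extract the ts paths listed in it.
        :param index: position of the video in the column (used in log messages)
        :param video: video info
        :return: (m3u8 URL, list of ts paths), or (None, None) on failure
        """
        obj = m3u8(video, index, self.cid)
        hls_url = obj.get_m3u8()
        if hls_url is None:
            return None, None
        res = session.get(hls_url)
        # Read the playlist as text; skip blank lines and '#' tag lines
        ts_list = []
        for line in res.text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            ts_list.append(line)
        return hls_url, ts_list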

    def _download_by_ts_list(self, video, ts_list, m3u8):
        """
        Download the segments in ts_list and merge them into a single file.
        :param video: video info
        :param ts_list: list of ts files
        :param m3u8: URL of the m3u8 playlist; segment URLs are built from it
        :return: path of the merged video file
        """
        # Create the column folder
        path = root_path + os.sep + str(self.cid)
        is_exists = os.path.exists(path)
        if not is_exists:
            os.makedirs(path)

        # Create the video folder inside the column folder
        path = path + os.sep + str(video["id"])
        is_exists = os.path.exists(path)
        if not is_exists:
            os.makedirs(path)

        # Download each ts file in the list
        url_format = m3u8[0: m3u8.rfind("/") + 1] + "{0}"
        curr = 0
        for ts in ts_list:
            curr += 1
            filename = path + os.sep + str(curr).zfill(6) + ".ts"
            is_exists = os.path.exists(filename)
            if is_exists:
                continue
            url = url_format.format(ts)
            res = requests.get(url, headers=head)
            if res.status_code != 200:
                print("Failed to download ts file: {0}".format(url))
                continue
            # The with block closes the file automatically
            with open(filename, "wb") as file:
                file.write(res.content)
            util.show_process(curr, len(ts_list))

        # Merge the ts files into one mp4 file, then delete the ts files
        # On Windows
        if util.is_window():
            # Note the closing quote after the output path
            exec_str = r'copy /b  "' + path + os.sep + r'*.ts" "' + path + os.sep + '{0}.mp4"'.format(video["title"])
            os.system(exec_str)  # merge the segments with a cmd command
            exec_str = r'del  "' + path + os.sep + r'*.ts"'
            os.system(exec_str)  # delete the original segment files
        # On Linux or macOS
        else:
            # Paths are quoted so that titles containing spaces do not break the command
            exec_str = 'cat "{0}"*.ts > "{1}{2}.mp4"'.format(path + os.sep, path + os.sep, video["title"])
            os.system(exec_str)  # merge the segments with cat
            exec_str = 'rm -rf "{0}"*.ts'.format(path + os.sep)
            os.system(exec_str)  # delete the original segment files
        return path + os.sep + '{0}.mp4'.format(video["title"])

    def _is_downloaded(self, column_id, video):
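        """
        Check whether the video has already been downloaded, to avoid downloading it twice.
        :param column_id: column ID
        :param video: video info
        :return: True if the merged file already exists
        """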
        path = root_path + os.sep + str(column_id)
        is_exists = os.path.exists(path)
        if not is_exists:
            return False
        path = path + os.sep + str(video["id"])
        is_exists = os.path.exists(path)
        if not is_exists:
            return False
        path = path + os.sep + '{0}.mp4'.format(video["title"])
        is_exists = os.path.exists(path)
        if not is_exists:
            return False
        return True

    def download(self):
        """
        Download every video in the column identified by self.cid.
        cid takes values in the range [20002, 49999].
        :return: None
        """
        if not self.before_download():
            return
        count = 0
        for video in self.free_videos:
            count += 1
            if self._is_downloaded(self.cid, video):
                print("Video {0} already downloaded: {1}, skipping".format(count, str(video["title"])))
                continue
            m3u8_url, ts_list = self._get_ts_list(count, video)
            while ts_list is None:
                m3u8_url, ts_list = self._get_ts_list(count, video)
            print("Downloading video {0}: {1}".format(count, str(video["title"])))
            self._download_by_ts_list(video, ts_list, m3u8_url)
        for video in self.vip_videos:
            count += 1
            if self._is_downloaded(self.cid, video):
                print("Video {0} already downloaded: {1}, skipping".format(count, str(video["title"])))
                continue
            if self.is_can_pojie:
                m3u8_url, ts_list = self._get_ts_list(count, video)
                if ts_list is None:
                    print("Failed to get the ts list for video {0}".format(video["title"]))
                    continue
                print("Downloading video {0}: {1}".format(count, str(video["title"])))
                self._download_by_ts_list(video, ts_list, m3u8_url)
            else:
                print("Video {0} is paid and cannot be cracked: {1}, skipping".format(count, str(video["title"])))

    def before_download(self):
        print("Checking whether the videos can be downloaded or cracked")
        # List all videos and split them into free and paid
        res = self._list_video()
        if type(res) == dict:
            print("Failed to download column {0}, reason: {1}".format(self.cid, res))
            exit(1)
        self._divide_videos(res)
        self._get_is_can_pojie()
        if self.is_can_pojie:
            print("Column {0} has {1} videos: {2} can be downloaded directly, {3} need to be cracked".
                  format(self.cid, len(res), len(self.free_videos), len(self.vip_videos)))
            return True
        else:
            if len(self.free_videos) == 0:
                print("Column {0} has {1} videos; none can be downloaded or cracked".format(self.cid, len(res)))
                return False
            else:
                print("Column {0} has {1} videos: {2} can be downloaded, the rest cannot be downloaded or cracked".
                      format(self.cid, len(res), len(self.free_videos)))
                yes_no = input('Download only the available videos? (y|n): ')
                if yes_no == "y" or yes_no == "Y":
                    return True
                else:
                    return False

    def _divide_videos(self, videos):
        count = 0
        for video in videos:
            count += 1
            obj = m3u8(video, count, self.cid)
            obj.pay_for_video()
            m3u8_url = obj.get_m3u8_by_pay()
            if m3u8_url is not None:
                self.free_videos.append(video)
                self.free_m3u8_url_list.append(m3u8_url)
            else:
                self.vip_videos.append(video)

    def _get_is_can_pojie(self):
        if len(self.free_m3u8_url_list) == 0:
            self.is_can_pojie = False
        for u in self.free_m3u8_url_list:
            if u.find("videocdn.renrenjiang.cn") < 0:
                self.is_can_pojie = False
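The merge step in `_download_by_ts_list` shells out to `copy /b` or `cat` and puts the raw video title into the command line and the file name. As a sketch of a portable alternative, the segments can be concatenated in Python and the title cleaned first. Both `safe_filename` and `merge_segments` are hypothetical helpers, not part of the original code, and the demonstration uses dummy files. As with `copy /b` and `cat`, the result is the segments' bytes joined end to end, only saved under an .mp4 name.

```python
import os
import re
import shutil
import tempfile


def safe_filename(title):
    """Replace characters that are invalid in Windows/Unix file names."""
    return re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_") or "untitled"


def merge_segments(segment_dir, output_path):
    """Concatenate all .ts files in segment_dir, in name order, into output_path."""
    # Zero-padded names such as 000001.ts sort correctly as plain strings.
    names = sorted(n for n in os.listdir(segment_dir) if n.endswith(".ts"))
    with open(output_path, "wb") as out:
        for name in names:
            with open(os.path.join(segment_dir, name), "rb") as seg:
                shutil.copyfileobj(seg, out)
    return len(names)


# Small demonstration with dummy segment files.
with tempfile.TemporaryDirectory() as tmp:
    for i, payload in enumerate([b"aa", b"bb", b"cc"], start=1):
        with open(os.path.join(tmp, str(i).zfill(6) + ".ts"), "wb") as f:
            f.write(payload)
    target = os.path.join(tmp, safe_filename("lesson 4: demo") + ".mp4")
    count = merge_segments(tmp, target)
    with open(target, "rb") as f:
        merged = f.read()
    print(count, merged)
```

Because no shell is involved, the same code path runs on Windows, Linux, and macOS, and titles containing spaces or quotes cannot break the command.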

m3u8.py

import json
import os
import threading
import math
from time import sleep
import util
from config import *


class m3u8:
    def __init__(self, video, index, cid):
        self.index = index
        self.video = video
        self.cid = cid
        self.vid = video["id"]
        self.start_at = int(str(video["start_at"])[0: 6])
        self.min = 0
        self.max = 10000000
        self.thread_num = 400
        self.step = math.floor((self.max - self.min) / self.thread_num)
        self.threads = []
        self.success = False
        self.result = None
        self.lock = threading.Lock()
        self.try_count = 0
        self.total_count = self.max - self.min

    def _func(self, a, b):
        for pos in range(a, b):
            if self.success:
                return None
            self.try_count += 1
            stk_code = str(pos).zfill(7)
            ss = "{0}_{1}{2}".format(self.vid, self.start_at, stk_code)
            url_ff = "http://videocdn.renrenjiang.cn/Act-ss-m3u8-sd/{0}/{1}.m3u8".format(ss, ss)
            try:
                res = session.get(url_ff, headers=head)
                if res.status_code == 200:
                    self.lock.acquire()
                    self.success = True
                    self.lock.release()
                    self.write_m3u8_to_file(url_ff)
                    return url_ff
            except requests.exceptions.ReadTimeout:
                pos -= 1
            except requests.exceptions.ConnectionError:
                pos -= 1
            except ConnectionResetError:
                pos -= 1

    def get_m3u8_by_force(self):
        start = time.time()
        for i in range(self.thread_num):
            t = threading.Thread(target=self._func, args=(self.min + self.step * i, self.min + self.step * (i + 1)))
            self.threads.append(t)
            t.start()
        while True:
            sleep(1)
            util.show_process2(self.try_count, self.total_count)
            for t in self.threads:
                if not t.is_alive():
                    self.threads.remove(t)
            if len(self.threads) == 0:
                break
        end = time.time()
        print("Result: {0}  Total time: {1}s".format(self.result, end - start))
        return self.result

    def pay_for_video(self):
        """
        Purchase the video (submits its password to the reservation endpoint).
        :return: True on success, False otherwise
        """
        url = "https://api.renrenjiang.cn/api/v3/activities/{0}/reservation".format(self.vid)
        res = session.post(url, headers=head, data={
            "type": "password",
            "password": self.video["password"],
            "shareId": 0
        })
        res = json.loads(res.content)
        if "result" in res and res["result"] == "ok":
            return True
        else:
            return False

    def get_m3u8_by_pay(self):
        url = "https://api.renrenjiang.cn/api/v3/activities/{0}/stream_url?user_id={1}&timestamp={2}"
        url = url.format(self.vid, user_id, current_milli_time())
        res = session.get(url, headers=head)
        res = json.loads(res.content)
        if "status" in res.keys() and res["status"] == 2:
            hls_url = res["hls_url"]
            return hls_url
        return None

    def is_m3u8_exist(self):
        # Create the column folder
        path = root_path + os.sep + str(self.cid)
        is_exists = os.path.exists(path)
        if not is_exists:
            os.makedirs(path)
        # Create the video folder inside the column folder
        path = path + os.sep + str(self.vid)
        is_exists = os.path.exists(path)
        if not is_exists:
            os.makedirs(path)

        path = path + os.sep + "m3u8.txt"
        is_exists = os.path.exists(path)
        if is_exists:
            return True
        return False

    def read_m3u8_from_file(self):
        path = root_path + os.sep + str(self.cid)
        path = path + os.sep + str(self.vid)
        path = path + os.sep + "m3u8.txt"
        with open(path, "r") as file:
            res = file.readline().replace("\n", "").replace("\r\n", "")
            file.close()
            return res

    def write_m3u8_to_file(self, m3u8_value):
        path = root_path + os.sep + str(self.cid)
        path = path + os.sep + str(self.vid)
        path = path + os.sep + "m3u8.txt"
        with open(path, "w") as file:
            file.write(m3u8_value)
            file.close()

    def get_m3u8(self):
        if self.is_m3u8_exist():
            print("m3u8 for video {0} already exists, downloading directly".format(self.index))
            return self.read_m3u8_from_file()
        if self.pay_for_video():
            print("Video {0} purchased successfully, downloading directly".format(self.index))
            hls_url = self.get_m3u8_by_pay()
            self.write_m3u8_to_file(hls_url)
        else:
            print("Purchase of video {0} failed, trying brute force...".format(self.index))
            return self.get_m3u8_by_force()

main.py

import download

if __name__ == '__main__':
    cid = input('Enter the 人人讲 video column ID (cid): ')
    print("Column ID entered: {0}".format(cid))
    obj = download.download(int(cid))
    obj.download()

With this code, a column ID is all we need to download every video in that column.

3. Getting the Code

The complete code is available on my GitHub:

https://github.com/15207135348/renrenjiang

Finally, I hope you will follow my WeChat official account, where I regularly share learning material on big data, Java, and related topics.

大数据学堂