[Economist] Stream slip (Part 1)
Stream slip
A different kind of film review
REMEMBER that racy film you probably should not have enjoyed on Netflix last weekend? Eran Tromer’s algorithms can tell what it was. Although videos streamed from services such as Netflix, Amazon and YouTube are encrypted in various ways to ensure privacy, all have one thing in common: they leak information. Dr Tromer, of Tel Aviv University, his colleague Roei Schuster and Vitaly Shmatikov of Cornell have worked out how those leaks can identify the film you are watching, even if a snooper cannot directly observe the stream of bits delivering it, or obtain access to the device on which you are watching it.
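Remember that racy film you watched on Netflix last weekend? Eran Tromer’s algorithms can work out which one it was. Video providers such as Netflix, Amazon and YouTube encrypt their streams in various ways to protect privacy, but they have one thing in common: they leak information. Dr Tromer of Tel Aviv University, his colleague Roei Schuster, and Vitaly Shmatikov of Cornell have worked out how that leaked information can identify the film you are watching, even when an observer can neither see the stream’s data directly nor break into the device you are watching on.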
Videos streamed over the internet are usually transmitted using a standard called MPEG-DASH. This chops a data stream up into segments that are then encrypted and fetched one at a time by the machine playing the video. The result is an on-off, “bursty” pattern of data arrival. But not all segments are equal. One depicting the mating habits of sloths will contain less information than another showing a car chase. Streaming services use something called variable bit-rate (VBR) compression to take advantage of this. Amorous-sloth segments are compressed to a greater degree than those involving car chases, reducing the overall amount of data that must be transmitted. That means segments of the same duration (in seconds) have different sizes (in bytes). The resulting pattern forms a video fingerprint.
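Video delivered over the internet usually uses a standard called MPEG-DASH. It cuts the data stream into short segments, which are encrypted; the playing device then fetches them one at a time, in order. The result is an on-off, “bursty” pattern of data arrival. But not all segments are alike. A scene of sloths mating contains less information than a car chase. Streaming services exploit this with variable bit-rate compression: the amorous-sloth segments are compressed more heavily than the car chase, which reduces the total amount of data to be sent. Segments of the same length (in seconds) therefore differ in size (in bytes), and that pattern is the video’s fingerprint.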
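To make the idea of a fingerprint concrete, here is a minimal Python sketch of how an observer might turn a trace of encrypted packets into a sequence of burst sizes. It is an illustration only, not the researchers' code: the function name `burst_sizes`, the `(timestamp, size)` packet format and the 0.5-second idle gap are all assumptions chosen for the example.

```python
from typing import List, Tuple


def burst_sizes(packets: List[Tuple[float, int]], gap: float = 0.5) -> List[int]:
    """Group (timestamp_seconds, size_bytes) packets into bursts.

    A new burst starts whenever the link has been idle for more than
    `gap` seconds (an assumed threshold). The returned list holds the
    total bytes in each burst, which approximates the segment sizes.
    """
    bursts: List[int] = []
    current = 0
    last_time = None
    for timestamp, size in sorted(packets):
        if last_time is not None and timestamp - last_time > gap:
            bursts.append(current)  # idle gap: close the previous burst
            current = 0
        current += size
        last_time = timestamp
    if last_time is not None:
        bursts.append(current)  # close the final burst
    return bursts


# Hypothetical trace: three bursts separated by roughly two seconds of silence.
trace = [(0.0, 1000), (0.1, 500), (2.0, 3000), (2.05, 1000), (4.0, 200)]
fingerprint = burst_sizes(trace)
```

Only packet timing and sizes are used here, which is why encrypting the payload does not hide the pattern.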
Dr Tromer’s method recognises this fingerprint by comparing it with a preassembled library of such prints that a snooper has made from videos the viewership of which he might want to follow. The detection algorithm involved is a version of a program called a neural network, a type of software adept at signal-recognition tasks. Once trained, Dr Tromer’s neural network can identify films with up to 99% accuracy, based on a fingerprint between one and five minutes long.
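Dr Tromer’s method compares this fingerprint with a library of fingerprints that a snooper has assembled in advance from the videos whose viewers he wants to track. The detection algorithm uses a program called a neural network, a kind of software that is good at recognising signals. Once trained, Dr Tromer’s neural network can identify a film with up to 99% accuracy from a fingerprint one to five minutes long.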
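The comparison against a library can be sketched in a few lines of Python. Note the substitution: the article says the researchers use a neural network, whereas this example swaps in a much simpler technique, sliding-window Pearson correlation, purely to show what "matching a fingerprint" means. The names `pearson` and `best_match`, the library titles and all the byte counts are made up for illustration.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_match(
    observed: List[float], library: Dict[str, List[float]]
) -> Tuple[Optional[str], float]:
    """Slide the observed burst sizes along each stored fingerprint and
    return the title with the highest correlation, plus that score."""
    best_title, best_score = None, -1.0
    for title, sizes in library.items():
        for start in range(len(sizes) - len(observed) + 1):
            window = sizes[start:start + len(observed)]
            score = pearson(observed, window)
            if score > best_score:
                best_title, best_score = title, score
    return best_title, best_score


# Hypothetical library of per-segment sizes (bytes) for two videos.
library = {
    "sloths": [100, 120, 110, 90, 105, 115],
    "car_chase": [400, 900, 300, 1200, 500, 800],
}
# A noisy observation of three consecutive car-chase segments.
title, score = best_match([310, 1190, 520], library)
```

A correlation score ignores absolute scale, so a short observation can match the wrong title by chance; that is one reason a trained classifier and a longer fingerprint, as the article describes, would be expected to do better.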