General data compression limits(

2023-08-19  本文已影响0人  刘东利2020

什么是typical sequence?

A typical sequence is an outcome which has information close to the entropy of random event to which it belongs.

因为是'close to',所以typical sequence不可能只有一个,他们的集合就称之为typical set:

The typical set is the set of all such typical sequences, which is usually a small subset of all possible sequences.

对于独立同分布(independent and indentically distributed random events)变量,根据AEP(asymptotic equipartition property):

为什么会有这个结果呢?作者这里试图用语言和图表,但是我没看懂(O(∩_∩)O哈哈~),因此我打开了另外一本神作,Thomas Cover的The Elements of Information Theory:

证明过程如下:

而在原作中的图表,我也先留在这里:

一个有趣的性质是,typical set并不是那么的typical:

Interestingly, the typical set does not include the most probable individual sequences. For example, in Figure 3b, the sequence for each N that consists of N blues in a row is the most probably sequence, would be omitted from the typical set for a small value of ǫ. For any fifinite N you could improve the compression algorithm described above by taking the binary string used for the least probable typical sequence and instead using it for the sequence of all blue. The demonstrates an important point about the typical set. It is useful for theoretical work, because the number of sequences it contains can be counted. 

在数据压缩中有一个现实考虑:

However, in practical situations (i.e. N < ∞), better compression can be achieved by designing for the set of most probable sequences. As N → ∞, this becomes less and less true, because the most probable sequences contain vanishingly small total probability mass.

这就如同你的世界不断扩张,你的视角就需要越来越有包容性。

上一篇 下一篇

猜你喜欢

热点阅读