Entropy and data compression

2023-08-07  刘东利2020

Information entropy is also connected to (lossless) data compression:

For example, it quantifies the limit of how much (lossless) compression of data can be achieved, a problem in information theory known as source coding.

Specifically:

Entropy can be interpreted as the average length of the shortest encoding of a sequence of random events, which here are repeated draws (with replacement) of colored marbles from an urn. (Top) With equal probability of each color, the shortest binary recording assigns a two-digit binary string code to each event. The entropy is the average number of bits per event of a typical sequence: 2 bits. (Bottom) When some colors are more likely than others, the more probable ones can be recorded as shorter binary strings to save space. This gives a lower entropy: 1.75 bits.

Computing the entropy of the two distributions confirms this: the first case gives exactly 2 bits, the second 1.75 bits.
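To spell that out: the quoted caption does not list the second set of probabilities, but with four colors the standard choice that yields 1.75 bits is p = (1/2, 1/4, 1/8, 1/8), so I assume it here:

$$H_1 = -\sum_{i=1}^{4}\frac{1}{4}\log_2\frac{1}{4} = 2\ \text{bits}$$

$$H_2 = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + 2\cdot\tfrac{1}{8}\log_2\tfrac{1}{8}\right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4} = 1.75\ \text{bits}$$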

In particular, for the second case:

This is an example of a prefix code, which can be uniquely decoded even if the codewords are concatenated one after another, since no codeword is a prefix of another.
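Here is a minimal sketch of how such decoding works. The codewords and color names below are my own assumption, not taken from the source: {0, 10, 110, 111} is the classic prefix code whose average length under p = (1/2, 1/4, 1/8, 1/8) matches the 1.75-bit entropy.

```python
# Hypothetical prefix code for four marble colors; codewords are an
# assumption consistent with the 1.75-bit example, not from the source.
CODE = {"red": "0", "blue": "10", "green": "110", "yellow": "111"}
DECODE = {bits: color for color, bits in CODE.items()}

def decode(stream: str) -> list[str]:
    """Decode a concatenated bit string with no separators.

    This works only because no codeword is a prefix of another:
    the first time the buffer matches a codeword, that match is final.
    """
    events, buffer = [], ""
    for bit in stream:
        buffer += bit
        if buffer in DECODE:       # a complete codeword has been read
            events.append(DECODE[buffer])
            buffer = ""
    assert buffer == "", "stream ended in the middle of a codeword"
    return events

# "0" + "111" + "10" + "0" concatenated is still unambiguous:
print(decode("0111100"))  # ['red', 'yellow', 'blue', 'red']
```

Note that with a non-prefix code (say, if both 0 and 01 were codewords), the greedy matching above could commit to the wrong symbol; the prefix property is what makes separator-free decoding safe.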

If the second case instead used the same fixed-length encoding as the first, the entropy would be unchanged, but the average code length would grow to 2 bits:

In other words, this code would introduce redundancy.
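Concretely (again assuming p = (1/2, 1/4, 1/8, 1/8)), the fixed two-bit code spends

$$\bar{\ell} - H = 2 - 1.75 = 0.25\ \text{bits per event}$$

above the entropy bound, and that 0.25 bits per event is exactly the redundancy introduced.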
