Entropy and data compression

2023-08-07  刘东利2020

Information entropy is also connected to (lossless) data compression:

For example, it quantifies the limit of how much (lossless) compression of data can be achieved, a problem in information theory known as source coding.

Specifically:

Entropy can be interpreted as the average length of the shortest encoding of a sequence of random events, which here are repeated draws (with replacement) of colored marbles from an urn. (Top) With equal probability of each color, the shortest binary recording assigns a two-digit binary string code to each event. The entropy is the average number of bits per event of a typical sequence: 2 bits. (Bottom) When some colors are more likely than others, the more probable ones can be recorded as shorter binary strings to save space. This gives a lower entropy: 1.75 bits.

Computing the entropy of the two distributions confirms this: the first case gives exactly 2 bits, the second 1.75 bits.
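To spell that out: the quoted caption does not list the second set of probabilities, but with four colors the standard choice that yields 1.75 bits is p = (1/2, 1/4, 1/8, 1/8), so I assume it here:

$$H_1 = -\sum_{i=1}^{4}\frac{1}{4}\log_2\frac{1}{4} = 2\ \text{bits}$$

$$H_2 = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + 2\cdot\tfrac{1}{8}\log_2\tfrac{1}{8}\right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4} = 1.75\ \text{bits}$$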

In particular, for the second case:

This is an example of a prefix code, which can be uniquely decoded even if the codewords are concatenated one after another, since no codeword is a prefix of another.
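Here is a minimal sketch of how such decoding works. The codewords and color names below are my own assumption, not taken from the source: {0, 10, 110, 111} is the classic prefix code whose average length under p = (1/2, 1/4, 1/8, 1/8) matches the 1.75-bit entropy.

```python
# Hypothetical prefix code for four marble colors; codewords are an
# assumption consistent with the 1.75-bit example, not from the source.
CODE = {"red": "0", "blue": "10", "green": "110", "yellow": "111"}
DECODE = {bits: color for color, bits in CODE.items()}

def decode(stream: str) -> list[str]:
    """Decode a concatenated bit string with no separators.

    This works only because no codeword is a prefix of another:
    the first time the buffer matches a codeword, that match is final.
    """
    events, buffer = [], ""
    for bit in stream:
        buffer += bit
        if buffer in DECODE:       # a complete codeword has been read
            events.append(DECODE[buffer])
            buffer = ""
    assert buffer == "", "stream ended in the middle of a codeword"
    return events

# "0" + "111" + "10" + "0" concatenated is still unambiguous:
print(decode("0111100"))  # ['red', 'yellow', 'blue', 'red']
```

Note that with a non-prefix code (say, if both 0 and 01 were codewords), the greedy matching above could commit to the wrong symbol; the prefix property is what makes separator-free decoding safe.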

If the second case instead used the same fixed-length encoding as the first, the entropy would be unchanged, but the average code length would grow to 2 bits:

In other words, this code would introduce redundancy.
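Concretely (again assuming p = (1/2, 1/4, 1/8, 1/8)), the fixed two-bit code spends

$$\bar{\ell} - H = 2 - 1.75 = 0.25\ \text{bits per event}$$

above the entropy bound, and that 0.25 bits per event is exactly the redundancy introduced.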
