Trie 和哈夫曼树

2022-04-14 本文已影响0人 freemanIT

Trie

也叫作字典树, 前缀树(Prefix Tree), 单词查找树

Trie 搜索字符串的效率跟字符串的长度有关

trie树

public class Trie<V> {
    
    private int size;
    private Node<V> root;
    
    public int size() {
        return size;
    }
    
    public boolean isEmpty() {
        return size == 0;
    }
    
    public void clear() {
        size = 0;
        root = null;
    }
    
    public V get(String key) {
        Node<V> node = node(key);
        return node != null && node.word ? node.value : null;
    }
    
    public boolean contains(String key) {
        Node<V> node = node(key);
        return node != null && node.word;
    }
    
    public V add(String key, V value) {
        keyCheck(key);
        
        if (root == null) {
            root = new Node<>(null);
        }
        
        Node<V> node = root;
        int len = key.length();
        for (int i = 0; i < len; i++) {
            char c = key.charAt(i);
            boolean emptyChildren = node.children == null;
            Node<V> childNode = emptyChildren ? null : node.children.get(c);
            if (childNode == null) {
                childNode = new Node<>(node);
                childNode.character = c;
                node.children = emptyChildren ? new HashMap<>() : node.children;
                node.children.put(c, childNode);
            }
            node = childNode;
        }
        
        if (node.word) {// 已经存在这个单词
            V oldValue = node.value;
            node.value = value;
            return oldValue;
        }
        
        // 新增一个单词
        node.word = true;
        node.value = value;
        size++;
        return null;
    }
    
    
    public V remove(String key) {
        // 找到最后一个节点
        Node<V> node  = node(key);
        // 如果不是单词结尾, 不做任务处理
        if (node == null || !node.word) return null;
        size--;
        V oldValue = node.value;
        
        // 如果还有子节点
        if (node.children != null && !(node.children.isEmpty())) {
            node.word = false;
            node.value = null;
            return oldValue;
        }
        
        // 没有子节点
        Node<V> parent = null;
        while ((parent = node.parent) != null) {
            parent.children.remove(node.character);
            if (parent.word || !parent.children.isEmpty()) break;
            node = parent;
        }
        return oldValue;
    }
    
    public boolean startWith(String prefix) {
        return node(prefix) != null;
    }
    
    private Node<V> node(String key) {
        if (root == null) return null;
        keyCheck(key);
        Node<V> node = root;
        int length = key.length();
        for (int i = 0; i < length; i++) {
            if (node == null || node.children == null || node.children.isEmpty()) return null;
            char c = key.charAt(i);
            node = node.children.get(c);
            
        }
        return node;
    }
    
    private void keyCheck(String key) {
        if (key == null || key.length() == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
    }
    
    private static class Node<V> {
        Node<V> parent;
        HashMap<Character, Node<V>> children;
        Character character;
        V value;
        boolean word;
        public Node(Node<V> parent) {
            this.parent = parent;
        }
        
//      public HashMap<Character, Node<V>> getChildren() {
//          return children == null ? (children = new HashMap<>()) : children;
//      }
        
    }
    
}

Trie 的有点: 搜索前缀的效率主要跟前缀长度有关

Trie 的缺点: 需要消耗大量的内存

哈夫曼树

哈夫曼编码

又称为霍夫曼编码, 是现代压缩算法的基础

假设要把字符串[ABBBCCCCCCCCDDDDDDEE] 转出二进制编码传说

可以转出ASCII 编码, 但是比较长

如果给5 个字母对应的二进制为

A	B	C	D	E
000	001	010	011	100

对应的二进制编码为:

[000001001001010010010010010010010010011011011011011011100100]
一共20 个字母, 转成60 个二进制位

霍夫曼树

先计算每个字母的出现频率

A	B	C	D	E
1	3	8	6	2

利用权值, 构建一颗哈夫曼树

以权值作为根节点构建n 棵二叉树, 组成森林
以森林中选出2 个跟接地那最小的树合, 作为一棵新树的左右子树, 且新树的根节点作为其左右子树根节点之和
从森林中删除刚才选取的2 棵树, 并将新树加入到森林
重复2, 3 步骤, 直到森林只剩下一棵树, 该树为哈夫曼树

哈夫曼树

left 为0, right 为1, 可以得出5 个字母对应的哈夫曼编码

A B C D E

1110 110 0 10 1111
字符串[ABBBCCCCCCCCDDDDDDEE] 的哈夫曼编码为[1110110110110000000001010101010101111]
总结
- n 个权值构建出来的哈夫曼树拥有n 个叶子节点
- 每个哈夫曼编码都不是另一个哈夫曼编码的前缀
- 哈夫曼树时带权路径长度最短的树, 权值较大的节点离根节点较近
- 带权路径长度, 树中所有的叶子节点的权值乘上其到跟节点路径长度, 与最终的哈夫曼编码总长度成正比关系

Trie 和哈夫曼树

Trie

哈夫曼树

哈夫曼编码

霍夫曼树

猜你喜欢

热点阅读