Guava Cache

2018-02-24 jiangmo

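Use Cases

Caches are useful in many situations. For example, you should consider a cache when a value is expensive to compute or retrieve and you will need it more than once for the same input.

A Guava Cache is similar to a ConcurrentMap, but not quite the same. The most fundamental difference is that a ConcurrentMap keeps every element added to it until it is explicitly removed. A Guava Cache, by contrast, is usually configured to evict entries automatically in order to limit its memory footprint. In some cases a LoadingCache is useful even if it never evicts entries, because it loads values automatically.

Generally, Guava Cache is applicable when:

You are willing to spend some memory to improve speed.
You expect that some keys will be queried more than once.
The total amount of cached data will not exceed what fits in memory. (A Guava Cache is local to a single run of your application. It does not store data in files or on outside servers. If this does not fit your needs, consider a tool like Memcached.)

If your use case meets every one of these conditions, Guava Cache is a good fit.

Note: if you do not need the features of a Cache, ConcurrentHashMap is more memory-efficient. However, most Cache features are very difficult, or even impossible, to replicate on top of a plain ConcurrentMap.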

How to Use It

A simple example: a cache that maps each string key to its upper-case form.

public void whenCacheMiss_thenValueIsComputed() throws InterruptedException {
       CacheLoader<String, String> loader;
       loader = new CacheLoader<String, String>() {
           // load() is invoked when the key is not present in the cache
           @Override
           public String load(String key) {
               return key.toUpperCase();
           }
       };
       LoadingCache<String, String> cache;
       cache = CacheBuilder
               .newBuilder()
               // entries become eligible for refresh 1 second after they are written
               .refreshAfterWrite(1L, TimeUnit.SECONDS)
               .build(loader);
       assertEquals(0, cache.size());
       cache.put("test", "test");
       assertEquals("test", cache.getUnchecked("test"));
       assertEquals("HELLO", cache.getUnchecked("hello"));
       assertEquals(2, cache.size());
       TimeUnit.SECONDS.sleep(2);
       assertEquals("TEST", cache.getUnchecked("test"));
   }

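Eviction Policies

Eviction by Size

Use maximumSize() to limit the number of entries in the cache. Once the cache reaches that limit, it evicts entries that have not been used recently or very often, which are not necessarily the oldest ones.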

public void whenCacheReachMaxSize_thenEviction() {
       CacheLoader<String, String> loader;
       loader = new CacheLoader<String, String>() {
           @Override
           public String load(String key) {
               return key.toUpperCase();
           }
       };
       LoadingCache<String, String> cache;
       cache = CacheBuilder.newBuilder().maximumSize(3).build(loader);
       cache.getUnchecked("first");
       cache.getUnchecked("second");
       cache.getUnchecked("third");
       cache.getUnchecked("fourth");
       assertEquals(3, cache.size());
       assertNull(cache.getIfPresent("first"));
       assertEquals("FOURTH", cache.getIfPresent("fourth"));
   }

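Eviction by Time

CacheBuilder provides two approaches to timed eviction:

expireAfterAccess(long, TimeUnit): evicts an entry once the given duration has passed since it was last read or written. Note that entries are evicted in an order similar to that of size-based eviction.
expireAfterWrite(long, TimeUnit): evicts an entry once the given duration has passed since it was created or its value was last replaced. This is the right choice when cached data becomes stale after a fixed amount of time.
The following code demonstrates expireAfterAccess: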

public void whenEntryIdle_thenEviction() throws InterruptedException {
       CacheLoader<String, String> loader;
       loader = new CacheLoader<String, String>() {
           @Override
           public String load(String key) {
               return key.toUpperCase();
           }
       };
       LoadingCache<String, String> cache;
       cache = CacheBuilder.newBuilder()
               .expireAfterAccess(2, TimeUnit.MILLISECONDS)
               .build(loader);
       cache.getUnchecked("hello");
       assertEquals(1, cache.size());
       cache.getUnchecked("hello");
       Thread.sleep(300);
       cache.getUnchecked("test");
       assertEquals(1, cache.size());
       assertNull(cache.getIfPresent("hello"));
   }

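Reference-based Eviction

By using weak references for keys, or weak or soft references for values, you can let the garbage collector reclaim cache entries:

CacheBuilder.weakKeys(): stores keys using weak references. An entry can be garbage-collected once there are no other (strong or soft) references to its key. Because garbage collection depends only on identity, a cache with weak keys compares keys with == instead of equals().
CacheBuilder.softValues(): stores values using soft references. Softly referenced objects are collected in a globally least-recently-used order, and only in response to memory demand. Because of the performance implications of soft references, the more predictable maximum cache size is generally recommended instead (see Eviction by Size above). A cache with soft values likewise compares values with == instead of equals().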
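
The identity comparison used by weakKeys() is easy to overlook, so here is a minimal sketch of its effect. The class name WeakKeysIdentityDemo and the key and value strings are made up for illustration; only CacheBuilder.weakKeys(), put and getIfPresent come from Guava.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

final class WeakKeysIdentityDemo {
    /** Returns {lookup with the same key instance, lookup with an equal copy}. */
    static String[] lookups() {
        Cache<String, String> cache = CacheBuilder.newBuilder().weakKeys().build();

        // Hold a strong reference to the key so the entry cannot be
        // garbage-collected while it is still needed below.
        String key = new String("config");
        cache.put(key, "value");

        String sameInstance = cache.getIfPresent(key);
        // An equal but distinct String: equals() is true, == is false.
        String equalCopy = cache.getIfPresent(new String("config"));
        return new String[] {sameInstance, equalCopy};
    }

    public static void main(String[] args) {
        String[] result = lookups();
        System.out.println("same instance: " + result[0]);
        System.out.println("equal copy:    " + result[1]);
    }
}
```

Only the lookup with the original key instance finds the entry. In practice this means weak keys suit caches keyed by long-lived objects whose identity matters, not keys that are rebuilt on every lookup.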

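Explicit Removal

At any time, you can explicitly invalidate cache entries instead of waiting for them to be evicted:

A single entry: Cache.invalidate(key)
Several entries: Cache.invalidateAll(keys)
All entries: Cache.invalidateAll()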
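
The three calls can be tried side by side. The following is a small sketch on an unbounded cache; the class name InvalidationDemo and the sample keys are illustrative assumptions.

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.Arrays;

final class InvalidationDemo {
    /** Returns the cache size after each of the three invalidation calls. */
    static long[] sizesAfterEachStep() {
        Cache<String, Integer> cache = CacheBuilder.newBuilder().build();
        for (String key : Arrays.asList("a", "b", "c", "d", "e")) {
            cache.put(key, key.length());
        }

        cache.invalidate("a");                          // a single entry
        long afterSingle = cache.size();

        cache.invalidateAll(Arrays.asList("b", "c"));   // several entries
        long afterBulk = cache.size();

        cache.invalidateAll();                          // every entry
        long afterAll = cache.size();

        return new long[] {afterSingle, afterBulk, afterAll};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizesAfterEachStep()));
    }
}
```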

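Removal Listeners (RemovalNotification)

With CacheBuilder.removalListener(RemovalListener) you can register a listener that performs some extra work whenever an entry is removed. The RemovalListener receives a RemovalNotification, which carries the RemovalCause, the key, and the value.

Note that any exception thrown by the RemovalListener is logged and then swallowed.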

public void whenEntryRemovedFromCache_thenNotify() {
       CacheLoader<String, String> loader;
       loader = new CacheLoader<String, String>() {
           @Override
           public String load(final String key) {
               return key.toUpperCase();
           }
       };
       RemovalListener<String, String> listener;
       listener = new RemovalListener<String, String>() {
           @Override
           public void onRemoval(RemovalNotification<String, String> n) {
               if (n.wasEvicted()) {
                   String cause = n.getCause().name();
                   assertEquals(RemovalCause.SIZE.toString(), cause);
               }
           }
       };
       LoadingCache<String, String> cache;
       cache = CacheBuilder.newBuilder()
               .maximumSize(3)
               .removalListener(listener)
               .build(loader);
       cache.getUnchecked("first");
       cache.getUnchecked("second");
       cache.getUnchecked("third");
       cache.getUnchecked("last");
       assertEquals(3, cache.size());
   }

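Refreshing the Cache

Refreshing is not quite the same as eviction. As specified by LoadingCache.refresh(K), refreshing a key loads a new value for it, possibly asynchronously. While a refresh is in progress the cache keeps returning the old value to other threads, whereas after an eviction a reading thread has to wait until the new value has been loaded.

If an exception is thrown while refreshing, the old value is kept, and the exception is logged and swallowed.

Overriding CacheLoader.reload(K, V) lets you customize what happens on refresh; the method lets you use the old value when computing the new one.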

public void cache_reLoad() {
    CacheLoader<String, String> loader;
    loader = new CacheLoader<String, String>() {
        @Override
        public String load(String key) {
            return key.toUpperCase();
        }
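         /**
          * Override reload to customize the refresh strategy.
          * @param key the key whose value is being refreshed
          * @param oldValue the value currently cached for the key
          * @return a future holding the new value
          * @throws Exception if the value cannot be reloaded
          */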
        @Override
        public ListenableFuture<String> reload(String key, String oldValue) throws Exception {
            return super.reload(key, oldValue);
        }
    };
    LoadingCache<String, String> cache;
    cache = CacheBuilder.newBuilder()
            .build(loader);
}
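
For comparison, here is a sketch of a reload that really is asynchronous, built on Guava's ListenableFutureTask. The class name AsyncReloadDemo, the version counter, and the CountDownLatch that holds the reload open are assumptions made only so the effect can be observed deterministically; a real loader would simply submit the work to its executor.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListenableFutureTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncReloadDemo {
    /** Returns {value before refresh, value read during reload, value after reload}. */
    static String[] run() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger version = new AtomicInteger();
        CountDownLatch releaseReload = new CountDownLatch(1); // demo-only gate

        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return key + "@" + version.incrementAndGet();
                    }

                    @Override
                    public ListenableFuture<String> reload(String key, String oldValue) {
                        // Compute the new value on another thread so readers are not blocked.
                        ListenableFutureTask<String> task = ListenableFutureTask.create(() -> {
                            releaseReload.await(); // keep the reload pending for the demo
                            return load(key);
                        });
                        executor.execute(task);
                        return task;
                    }
                });

        try {
            String before = cache.getUnchecked("k"); // first load runs in the calling thread
            cache.refresh("k");                      // returns without waiting for the reload
            String during = cache.getUnchecked("k"); // the old value is still served
            releaseReload.countDown();
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS); // reload has completed after this
            String after = cache.getUnchecked("k");
            return new String[] {before, during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", run()));
    }
}
```

The read issued while the reload is pending returns the old value immediately, which is the behavior that distinguishes a refresh from an eviction.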

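CacheBuilder.refreshAfterWrite(long, TimeUnit) adds automatic timed refreshing to a cache. In contrast to expireAfterWrite, refreshAfterWrite keeps entries available by refreshing them periodically, but note that an entry is only actually refreshed when it is queried (and if CacheLoader.reload is implemented asynchronously, the query is not slowed down by the refresh). So if you specify both expireAfterWrite and refreshAfterWrite on the same cache, the expiration timer is not blindly reset every time an entry becomes eligible for refresh: if an entry is never queried after that point, no refresh takes place, and the entry still becomes eligible for eviction once it expires.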

public void whenLiveTimeEnd_thenRefresh() {
       CacheLoader<String, String> loader;
       loader = new CacheLoader<String, String>() {
           @Override
           public String load(String key) {
               return key.toUpperCase();
           }
       };
       LoadingCache<String, String> cache;
       cache = CacheBuilder.newBuilder()
               .refreshAfterWrite(1, TimeUnit.MINUTES)
               .build(loader);
   }
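
The interplay of refreshAfterWrite and expireAfterWrite can be observed without sleeping by driving the cache clock by hand. This is only a sketch: the class name RefreshVersusExpireDemo, the one-minute and five-minute durations, and the load counter are assumptions, and the loader keeps the default synchronous reload.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class RefreshVersusExpireDemo {
    /** Returns "loads after 1st read,loads before 2nd read,loads after 2nd read,still present". */
    static String run() {
        AtomicLong nanos = new AtomicLong(); // hand-driven clock
        AtomicInteger loads = new AtomicInteger();

        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .ticker(new Ticker() {
                    @Override
                    public long read() {
                        return nanos.get();
                    }
                })
                .refreshAfterWrite(1, TimeUnit.MINUTES)
                .expireAfterWrite(5, TimeUnit.MINUTES)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        loads.incrementAndGet();
                        return key.toUpperCase();
                    }
                });

        cache.getUnchecked("hello");                    // initial load
        int afterFirstRead = loads.get();

        nanos.addAndGet(TimeUnit.MINUTES.toNanos(2));   // past the refresh interval
        int beforeSecondRead = loads.get();             // nothing was reloaded on its own
        cache.getUnchecked("hello");                    // this read triggers the refresh
        int afterSecondRead = loads.get();

        nanos.addAndGet(TimeUnit.MINUTES.toNanos(10));  // past the expiry, with no reads
        boolean stillPresent = cache.getIfPresent("hello") != null;

        return afterFirstRead + "," + beforeSecondRead + "," + afterSecondRead + "," + stillPresent;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The load counter only moves when the entry is read, and an entry that nobody reads is simply allowed to expire.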

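Handling null Values

Guava as a whole is designed to reject null; many of its methods check their arguments with com.google.common.base.Preconditions.checkNotNull.

By default, Guava Cache throws an exception (CacheLoader.InvalidCacheLoadException) if the loader returns a null value, because caching null makes no sense.
But if null has a special meaning in your code, you can express it with Optional, as in the following example: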

public void whenNullValue_thenOptional() {
        CacheLoader<String, Optional<String>> loader;
        loader = new CacheLoader<String, Optional<String>>() {
            @Override
            public Optional<String> load(String key) {
                return Optional.fromNullable(getSuffix(key));
            }
        };
        LoadingCache<String, Optional<String>> cache;
        cache = CacheBuilder.newBuilder().build(loader);
        assertEquals("txt", cache.getUnchecked("text.txt").get());
        assertFalse(cache.getUnchecked("hello").isPresent());
    }
    private String getSuffix(final String str) {
        int lastIndex = str.lastIndexOf('.');
        if (lastIndex == -1) {
            return null;
        }
        return str.substring(lastIndex + 1);
    }

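Statistics

CacheBuilder.recordStats() turns on statistics collection for a Guava Cache. Once enabled, Cache.stats() returns a CacheStats object that provides statistics such as:

hitRate(): the ratio of hits to requests;
averageLoadPenalty(): the average time spent loading new values, in nanoseconds;
evictionCount(): the total number of evicted entries, not counting explicit removals.

Many other statistics are available as well. They are critical for tuning a cache, and it is advisable to keep a close eye on them in performance-critical applications.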
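
A minimal sketch of reading these numbers follows. The class name CacheStatsDemo and the sequence of lookups are assumptions chosen so the counters are easy to follow.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.CacheStats;
import com.google.common.cache.LoadingCache;

final class CacheStatsDemo {
    /** Performs two misses and two hits, then returns a snapshot of the statistics. */
    static CacheStats run() {
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .recordStats() // statistics are off unless this is called
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        return key.toUpperCase();
                    }
                });

        cache.getUnchecked("a"); // miss, triggers a load
        cache.getUnchecked("a"); // hit
        cache.getUnchecked("b"); // miss, triggers a load
        cache.getUnchecked("a"); // hit
        return cache.stats();
    }

    public static void main(String[] args) {
        CacheStats stats = run();
        System.out.println("hit rate:   " + stats.hitRate());
        System.out.println("miss count: " + stats.missCount());
        System.out.println("load count: " + stats.loadCount());
        System.out.println("evictions:  " + stats.evictionCount());
    }
}
```

Note that stats() returns an immutable snapshot, so call it again to see later activity.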

Notes

When to use get and when to use getUnchecked
The official documentation says:

If you have defined a CacheLoader that does not declare any checked exceptions then you can perform cache lookups using getUnchecked(K);
however care must be taken not to call getUnchecked on caches whose CacheLoaders declare checked exceptions.

In other words, if your CacheLoader does not declare any checked exceptions, you can use getUnchecked.
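
To see why the distinction matters, the sketch below uses a loader that always fails with a checked exception and reports what each lookup method throws. The class name GetVersusGetUncheckedDemo, the helper names, and the use of IOException are illustrative assumptions.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.UncheckedExecutionException;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

final class GetVersusGetUncheckedDemo {
    /** Builds a cache whose loader declares, and always throws, a checked exception. */
    static LoadingCache<String, String> failingCache() {
        return CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws IOException {
                throw new IOException("cannot load " + key);
            }
        });
    }

    /** Describes what get(K) throws when the loader fails. */
    static String describeGet() {
        try {
            failingCache().get("k");
            return "no exception";
        } catch (ExecutionException e) {
            // The compiler forces the caller to handle the wrapped loader failure.
            return describe(e);
        }
    }

    /** Describes what getUnchecked(K) throws for the same failure. */
    static String describeGetUnchecked() {
        try {
            failingCache().getUnchecked("k");
            return "no exception";
        } catch (UncheckedExecutionException e) {
            // Same failure, but nothing obliged the caller to prepare for it.
            return describe(e);
        }
    }

    private static String describe(Exception e) {
        return e.getClass().getSimpleName() + " <- " + e.getCause().getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describeGet());
        System.out.println(describeGetUnchecked());
    }
}
```

With getUnchecked the checked IOException arrives wrapped in an unchecked exception, so a caller that forgets the catch block only finds out at run time.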

Use Case: Local Cache

Generally, the Guava caching utilities are applicable whenever:

You are willing to spend some memory to improve speed.
You expect that keys will sometimes get queried more than once.
Your cache will not need to store more data than what would fit in RAM. (Guava caches are local to a single run of your application. They do not store data in files, or on outside servers. If this does not fit your needs, consider a tool like Memcached.)

Loading Data

From a CacheLoader

A LoadingCache is a Cache built with an attached CacheLoader. Creating a CacheLoader is typically as easy as implementing the method V load(K key) throws Exception. So, for example, you could create a LoadingCache with the following code:

LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
       .maximumSize(1000)
       .build(
           new CacheLoader<Key, Graph>() {
             public Graph load(Key key) throws AnyException {
               return createExpensiveGraph(key);
             }
           });

...
try {
  return graphs.get(key);
} catch (ExecutionException e) {
  throw new OtherException(e.getCause());
}

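The canonical way to query a LoadingCache is with the method get(K). Because the CacheLoader may throw an exception, get(K) throws ExecutionException. If your CacheLoader declares no checked exceptions, you can look values up with getUnchecked(K) instead; however, care must be taken not to call getUnchecked on caches whose CacheLoaders declare checked exceptions.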

LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
       .expireAfterAccess(10, TimeUnit.MINUTES)
       .build(
           new CacheLoader<Key, Graph>() {
             public Graph load(Key key) { // no checked exception
               return createExpensiveGraph(key);
             }
           });

...
return graphs.getUnchecked(key);

You can override loadAll to load values in bulk.
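
As a rough sketch of bulk loading, the loader below overrides CacheLoader.loadAll and the caller uses LoadingCache.getAll. The class name BulkLoadDemo, the counters, and the use of key length as the value are assumptions for illustration; a real loadAll would typically issue one batched query.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

final class BulkLoadDemo {
    /** Returns {load() calls, loadAll() calls, keys passed to loadAll(), entries returned}. */
    static int[] run() {
        AtomicInteger singleLoads = new AtomicInteger();
        AtomicInteger bulkCalls = new AtomicInteger();
        AtomicInteger bulkKeys = new AtomicInteger();

        LoadingCache<String, Integer> cache = CacheBuilder.newBuilder()
                .build(new CacheLoader<String, Integer>() {
                    @Override
                    public Integer load(String key) {
                        singleLoads.incrementAndGet();
                        return key.length();
                    }

                    @Override
                    public Map<String, Integer> loadAll(Iterable<? extends String> keys) {
                        // One call for all missing keys (a batched query in real code).
                        bulkCalls.incrementAndGet();
                        Map<String, Integer> loaded = new HashMap<>();
                        for (String key : keys) {
                            bulkKeys.incrementAndGet();
                            loaded.put(key, key.length());
                        }
                        return loaded;
                    }
                });

        cache.getUnchecked("a"); // a single-key lookup goes through load()
        try {
            // "a" is already cached, so only "bb" and "ccc" reach loadAll().
            Map<String, Integer> all = cache.getAll(Arrays.asList("a", "bb", "ccc"));
            return new int[] {singleLoads.get(), bulkCalls.get(), bulkKeys.get(), all.size()};
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```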

From a Callable

All Guava caches, loading or not, support the method get(K, Callable<V>). This method returns the value associated with the key in the cache, or computes it from the specified Callable and adds it to the cache. No observable state associated with this cache is modified until loading completes. This method provides a simple substitute for the conventional "if cached, return; otherwise create, cache and return" pattern.

Cache<Key, Value> cache = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .build(); // look Ma, no CacheLoader
...
try {
  // If the key wasn't in the "easy to compute" group, we need to
  // do things the hard way.
  cache.get(key, new Callable<Value>() {
    @Override
    public Value call() throws AnyException {
      return doThingsTheHardWay(key);
    }
  });
} catch (ExecutionException e) {
  throw new OtherException(e.getCause());
}

Inserted Directly

Values can also be inserted into the cache directly with put.

Eviction Policies

The cold hard reality is that we almost certainly don't have enough memory to cache everything we could cache. You must decide: when is it not worth keeping a cache entry? Guava provides three basic types of eviction:

size-based eviction
time-based eviction
reference-based eviction

Size-based Eviction

If your cache should not grow beyond a certain size, just use CacheBuilder.maximumSize(long). The cache will try to evict entries that haven't been used recently or very often. Warning: the cache may evict entries before this limit is exceeded -- typically when the cache size is approaching the limit.

Alternately, if different cache entries have different "weights" -- for example, if your cache values have radically different memory footprints -- you may specify a weight function with CacheBuilder.weigher(Weigher) and a maximum cache weight with CacheBuilder.maximumWeight(long). In addition to the same caveats as maximumSize requires, be aware that weights are computed at entry creation time, and are static thereafter.

LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
       .maximumWeight(100000)
       .weigher(new Weigher<Key, Graph>() {
          public int weigh(Key k, Graph g) {
            return g.vertices().size();
          }
        })
       .build(
           new CacheLoader<Key, Graph>() {
             public Graph load(Key key) { // no checked exception
               return createExpensiveGraph(key);
             }
           });

Timed Eviction

CacheBuilder provides two approaches to timed eviction:

expireAfterAccess(long, TimeUnit) Only expire entries after the specified duration has passed since the entry was last accessed by a read or a write. Note that the order in which entries are evicted will be similar to that of size-based eviction.
expireAfterWrite(long, TimeUnit) Expire entries after the specified duration has passed since the entry was created, or the most recent replacement of the value. This could be desirable if cached data grows stale after a certain amount of time.

Timed expiration is performed with periodic maintenance during writes and occasionally during reads, as discussed below.

Testing Timed Eviction

Use the Ticker interface and the CacheBuilder.ticker(Ticker) method to specify a time source in your cache builder, rather than having to wait for the system clock.
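
A sketch of what such a test might look like: a hand-driven Ticker replaces the system clock, so expiry can be checked without sleeping. FakeTicker and FakeTickerDemo are hypothetical names, not Guava classes; Ticker itself is com.google.common.base.Ticker.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class FakeTickerDemo {
    /** A time source that only moves when the test advances it. */
    static final class FakeTicker extends Ticker {
        private final AtomicLong nanos = new AtomicLong();

        void advance(long duration, TimeUnit unit) {
            nanos.addAndGet(unit.toNanos(duration));
        }

        @Override
        public long read() {
            return nanos.get();
        }
    }

    /** Returns {present after 9 minutes, present after 11 minutes}. */
    static boolean[] run() {
        FakeTicker ticker = new FakeTicker();
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterWrite(10, TimeUnit.MINUTES)
                .ticker(ticker)
                .build();

        cache.put("k", "v");
        ticker.advance(9, TimeUnit.MINUTES);
        boolean presentBeforeExpiry = cache.getIfPresent("k") != null;

        ticker.advance(2, TimeUnit.MINUTES);
        boolean presentAfterExpiry = cache.getIfPresent("k") != null;

        return new boolean[] {presentBeforeExpiry, presentAfterExpiry};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("before expiry: " + result[0]);
        System.out.println("after expiry:  " + result[1]);
    }
}
```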

Reference-based Eviction

Guava allows you to set up your cache to allow the garbage collection of entries, by using weak references for keys or values, and by using soft references for values.

CacheBuilder.weakKeys() stores keys using weak references. This allows entries to be garbage-collected if there are no other (strong or soft) references to the keys. Since garbage collection depends only on identity equality, this causes the whole cache to use identity (==) equality to compare keys, instead of equals().
CacheBuilder.weakValues() stores values using weak references. This allows entries to be garbage-collected if there are no other (strong or soft) references to the values. Since garbage collection depends only on identity equality, this causes the whole cache to use identity (==) equality to compare values, instead of equals().
CacheBuilder.softValues() wraps values in soft references. Softly referenced objects are garbage-collected in a globally least-recently-used manner, in response to memory demand. Because of the performance implications of using soft references, we generally recommend using the more predictable maximum cache size instead. Use of softValues() will cause values to be compared using identity (==) equality instead of equals().

Explicit Removals

At any time, you may explicitly invalidate cache entries rather than waiting for entries to be evicted. This can be done:

individually, using Cache.invalidate(key)
in bulk, using Cache.invalidateAll(keys)
to all entries, using Cache.invalidateAll()

Removal Listeners

You may specify a removal listener for your cache to perform some operation when an entry is removed, via CacheBuilder.removalListener(RemovalListener). The RemovalListener gets passed a RemovalNotification, which specifies the RemovalCause, key, and value.

Note that any exceptions thrown by the RemovalListener are logged (using Logger) and swallowed.

When Does Cleanup Happen?

Caches built with CacheBuilder do not perform cleanup and evict values "automatically," or instantly after a value expires, or anything of the sort. Instead, they perform small amounts of maintenance during write operations, or during occasional read operations if writes are rare.

The reason for this is as follows: if we wanted to perform Cache maintenance continuously, we would need to create a thread, and its operations would be competing with user operations for shared locks. Additionally, some environments restrict the creation of threads, which would make CacheBuilder unusable in that environment.

If you want to schedule regular cache maintenance for a cache which only rarely has writes, just schedule the maintenance using ScheduledExecutorService.
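
One possible way to do this is to call Cache.cleanUp() from a scheduled task, as sketched below. The class and method names (ScheduledCleanupDemo, scheduleCleanup), the 10 ms period, and the hand-driven clock are assumptions chosen so the demo finishes quickly; a production period would be much longer.

```java
import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class ScheduledCleanupDemo {
    /** Runs cache.cleanUp() at a fixed rate; the caller owns the scheduler. */
    static ScheduledFuture<?> scheduleCleanup(
            Cache<?, ?> cache, ScheduledExecutorService scheduler, long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(cache::cleanUp, period, period, unit);
    }

    /** Returns {listener notified before cleanup, notified after cleanup, cache empty}. */
    static boolean[] run() {
        AtomicLong nanos = new AtomicLong(); // hand-driven clock for the demo
        CountDownLatch removed = new CountDownLatch(1);
        RemovalListener<String, String> listener = notification -> removed.countDown();

        Cache<String, String> cache = CacheBuilder.newBuilder()
                .expireAfterWrite(1, TimeUnit.MINUTES)
                .ticker(new Ticker() {
                    @Override
                    public long read() {
                        return nanos.get();
                    }
                })
                .removalListener(listener)
                .build();

        cache.put("k", "v");
        nanos.addAndGet(TimeUnit.MINUTES.toNanos(2)); // the entry is now expired
        // No cache operation has happened since, so nothing has been cleaned up yet.
        boolean notifiedBeforeCleanup = removed.getCount() == 0;

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> handle = scheduleCleanup(cache, scheduler, 10, TimeUnit.MILLISECONDS);
            boolean notifiedAfterCleanup = removed.await(5, TimeUnit.SECONDS);
            handle.cancel(false);
            return new boolean[] {notifiedBeforeCleanup, notifiedAfterCleanup, cache.size() == 0};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("notified before cleanup: " + r[0]);
        System.out.println("notified after cleanup:  " + r[1]);
        System.out.println("cache empty:             " + r[2]);
    }
}
```

The removal listener only fires once the scheduled cleanUp() runs, which illustrates that expired entries linger until some maintenance takes place.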

Features

By using CacheBuilder.recordStats(), you can turn on statistics collection for Guava caches. The Cache.stats() method returns a CacheStats object, which provides statistics such as

hitRate(), which returns the ratio of hits to requests
averageLoadPenalty(), the average time spent loading new values, in nanoseconds
evictionCount(), the number of cache evictions

asMap

You can view any Cache as a ConcurrentMap using its asMap view, but how the asMap view interacts with the Cache requires some explanation.

cache.asMap() contains all entries that are currently loaded in the cache. So, for example, cache.asMap().keySet() contains all the currently loaded keys.
asMap().get(key) is essentially equivalent to cache.getIfPresent(key), and never causes values to be loaded. This is consistent with the Map contract.
Access time is reset by all cache read and write operations (including Cache.asMap().get(Object) and Cache.asMap().put(K, V)), but not by containsKey(Object), nor by operations on the collection-views of Cache.asMap(). So, for example, iterating through cache.asMap().entrySet() does not reset access time for the entries you retrieve.
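
The first two points can be checked with a short sketch. The class name AsMapViewDemo and the load counter are assumptions for illustration; the access-time behavior is not exercised here.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class AsMapViewDemo {
    /** Returns {map get before load, loads so far, map get after load, value put via view, removed}. */
    static List<Object> run() {
        AtomicInteger loads = new AtomicInteger();
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) {
                        loads.incrementAndGet();
                        return key.toUpperCase();
                    }
                });
        ConcurrentMap<String, String> view = cache.asMap();

        String beforeLoad = view.get("a");          // a map lookup never invokes the loader
        int loadsAfterMapGet = loads.get();

        cache.getUnchecked("a");                    // now the loader runs
        String afterLoad = view.get("a");

        view.put("b", "custom");                    // writes go through to the cache
        String writtenViaView = cache.getIfPresent("b");

        view.remove("a");                           // so do removals
        boolean removed = cache.getIfPresent("a") == null;

        return Arrays.<Object>asList(beforeLoad, loadsAfterMapGet, afterLoad, writtenViaView, removed);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```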

Ref:
https://github.com/google/guava/wiki/CachesExplained
http://ifeve.com/google-guava-cachesexplained/