Several Ways to Implement High-Performance Local Caching in Java

Posted 2023-04-10 吳名氏


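Editor's note: This article was compiled by the editors of 小常识网 (cha138.com). It introduces several ways to implement high-performance local caching in Java, and we hope you find it useful.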

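Java caching falls into two categories: remote caching and local caching. Common remote caching solutions include the well-known Redis and Memcached. Representative local caching technologies include HashMap, Guava Cache, Caffeine, and Ehcache. This post covers only local caching, with a focus on high-performance local caches.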

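The post proceeds in three parts:

1. It introduces the common local caching technologies to give a general picture.
2. It looks at the local cache that claims the best performance, to see how good it really is and how it gets there.
3. It walks through a few hands-on examples of applying a high-performance local cache in everyday work.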

I. Introduction to Java Local Caching Technologies

1.1 HashMap

A Map stores the objects to be cached directly in memory.

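Pros: simple and direct, needs no third-party libraries, and suits fairly simple scenarios.
Cons: there is no built-in eviction policy. Any eviction logic, such as the LRU cache built on LinkedHashMap below, must be written by hand, which makes customization costly.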


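import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {

    /**
     * In access-order mode, get() also moves the entry to the tail of the linked list,
     * so even reads mutate the map. A read/write lock's shared read lock is therefore
     * not safe here; one exclusive lock guards both get and put.
     * (Only get/put are guarded; other inherited methods are not.)
     */
    private final Lock lock = new ReentrantLock();

    /**
     * Maximum number of entries
     */
    private final int maxSize;

    public LRUCache(int maxSize) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(maxSize + 1, 1.0f, true);
        this.maxSize = maxSize;
    }

    @Override
    public V get(Object key) {
        lock.lock();
        try {
            return super.get(key);
        } finally {
            lock.unlock();
        }
    }

    @Override
    public V put(K key, V value) {
        lock.lock();
        try {
            return super.put(key, value);
        } finally {
            lock.unlock();
        }
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the least recently used entry
        return this.size() > maxSize;
    }
}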
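The following minimal single-threaded sketch makes the eviction order concrete. It uses the same `LinkedHashMap` access-order constructor and `removeEldestEntry` hook as the class above. The class name `LruDemo` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDemo {

    // Single-threaded LRU map: accessOrder = true moves an entry to the tail on every get/put,
    // and removeEldestEntry drops the head (least recently used) once the size exceeds maxSize.
    static <K, V> Map<K, V> lruMap(int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = lruMap(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" becomes most recently used; order is now b, a
        cache.put("c", 3);   // size 3 > 2, so the eldest entry "b" is evicted
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

The `get("a")` call changes the order of the map. This is why the thread-safe version above cannot treat `get` as a read-only operation.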

1.2 Guava Cache

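Guava Cache is an open-source caching library from Google based on the LRU replacement algorithm. Caffeine, introduced next, has comprehensively surpassed and replaced it, so this post does not include example code for it. Interested readers can visit the Guava Cache home page.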

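Pros: supports a maximum-size limit and two expiration policies (time since write and time since last access), and offers simple statistics.
Cons: Spring Boot 2 and Spring 5 have both dropped support for Guava Cache.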

1.3 Caffeine

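Caffeine is an open-source caching library built on the W-TinyLFU algorithm, which combines the strengths of LRU and LFU. Its hit rate is close to the theoretical optimum, and it is essentially an enhanced version of Guava Cache.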


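import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.util.concurrent.TimeUnit;

public class CaffeineCacheTest {

    public static void main(String[] args) {
        // Create a Caffeine cache
        Cache<String, String> cache = Caffeine.newBuilder()
                // Initial capacity of the cache
                .initialCapacity(5)
                // Maximum number of entries
                .maximumSize(10)
                // Expire an entry n seconds after it is written
                .expireAfterWrite(17, TimeUnit.SECONDS)
                // Expire an entry n seconds after its last read or write; an alternative to
                // expireAfterWrite, rarely needed in practice
                //.expireAfterAccess(17, TimeUnit.SECONDS)
                .build();
        String key = "key";
        // Write to the cache
        cache.put(key, "v");

        // Get the value; if the key is absent, compute it with getValueFromDB, cache it, and return it
        String value = cache.get(key, CaffeineCacheTest::getValueFromDB);

        // Remove the key
        cache.invalidate(key);
    }

    private static String getValueFromDB(String key) {
        return "v";
    }
}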
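The post does not show how W-TinyLFU decides what to keep. The sketch below gives a rough idea of its core "TinyLFU" admission step. A small count-min sketch estimates how often each key has been seen. When the cache is full, a new candidate is admitted only if it looks more popular than the entry that would be evicted.

This is a simplified illustration, not Caffeine's implementation. Caffeine additionally uses 4-bit counters with periodic halving (aging), a small LRU admission window, and a segmented LRU main region. The class name, seeds, and width here are hypothetical.

```java
public class TinyLfuSketch {

    // A tiny count-min sketch: DEPTH rows of counters, each row indexed by a differently seeded hash.
    static final class FrequencySketch {
        private static final int DEPTH = 4;
        private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};
        private final int[][] table;
        private final int width;

        FrequencySketch(int width) {
            this.width = width;
            this.table = new int[DEPTH][width];
        }

        private int index(Object key, int row) {
            int h = key.hashCode() * SEEDS[row];
            h ^= h >>> 16;
            return Math.floorMod(h, width);
        }

        void increment(Object key) {
            for (int row = 0; row < DEPTH; row++) {
                table[row][index(key, row)]++;
            }
        }

        // Minimum across rows: hash collisions can only over-count, never under-count.
        int frequency(Object key) {
            int min = Integer.MAX_VALUE;
            for (int row = 0; row < DEPTH; row++) {
                min = Math.min(min, table[row][index(key, row)]);
            }
            return min;
        }
    }

    // TinyLFU-style admission: a new candidate replaces the eviction victim
    // only if the sketch has seen it more often.
    static boolean admit(FrequencySketch sketch, Object candidate, Object victim) {
        return sketch.frequency(candidate) > sketch.frequency(victim);
    }

    public static void main(String[] args) {
        FrequencySketch sketch = new FrequencySketch(64);
        Integer hot = 1, cold = 2;   // hypothetical item ids
        for (int i = 0; i < 5; i++) {
            sketch.increment(hot);
        }
        sketch.increment(cold);
        System.out.println(admit(sketch, hot, cold)); // true
        System.out.println(admit(sketch, cold, hot)); // false
    }
}
```

A plain LRU would admit every new key unconditionally. This admission filter keeps a burst of one-off keys from flushing out frequently used entries.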

1.4 Ehcache

Ehcache is a fast, lean, pure-Java in-process caching framework. It served as Hibernate's default cache provider.

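Pros:
- Supports several eviction algorithms, including LFU, LRU, and FIFO.
- Supports on-heap, off-heap, and disk storage.
- Supports several clustering options for sharing data.

Cons: slower than Caffeine.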


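import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;

public class EhcacheTest {

    public static void main(String[] args) {
        // Build a CacheManager with one cache registered under the alias "ehcacheInstance"
        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("ehcacheInstance", CacheConfigurationBuilder
                        // An on-heap cache holding at most 20 entries
                        .newCacheConfigurationBuilder(String.class, String.class, ResourcePoolsBuilder.heap(20)))
                // true: initialize the CacheManager immediately
                .build(true);
        // Get the Cache instance
        Cache<String, String> myCache = cacheManager.getCache("ehcacheInstance", String.class, String.class);
        // Write to the cache
        myCache.put("key", "v");
        // Read from the cache
        String value = myCache.get("key");
        // Remove the entry
        myCache.remove("key");
        // Remove the whole cache (using the alias it was registered under), then shut down
        cacheManager.removeCache("ehcacheInstance");
        cacheManager.close();
    }
}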

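According to the Caffeine project's own documentation, Caffeine beats the other options in both performance and features. The rest of this post therefore focuses on Caffeine's performance and how it works.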

II. The High-Performance Cache Caffeine

2.1 Cache Types

2.1.1 Cache


Cache<Key, Graph> cache = Caffeine.newBuilder()
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .maximumSize(10_000)
    .build();

// Look up an entry; returns null if it is not found
Graph graph = cache.getIfPresent(key);
// Look up an entry, computing and caching it if absent; returns null if it cannot be computed
graph = cache.get(key, k -> createExpensiveGraph(k));
// Insert or update an entry
cache.put(key, graph);
// Remove an entry
cache.invalidate(key);

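The Cache interface lets you explicitly look up, update, and remove entries. cache.get(key, mappingFunction) computes and stores the value atomically when the key is absent. If the mapping function returns null, cache.get returns null and stores nothing. If the function throws, the exception propagates to the caller and no entry is created.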
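Caffeine's `Cache.asMap()` is a `ConcurrentMap` view, and `get(key, mappingFunction)` behaves much like `ConcurrentHashMap.computeIfAbsent`. The JDK-only sketch below models that contract by analogy; it is not Caffeine's code, and the class name `ComputeIfAbsentDemo` is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ComputeIfAbsentDemo {

    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    // Analogue of cache.getIfPresent(key): returns null on a miss and never computes.
    public String getIfPresent(String key) {
        return store.get(key);
    }

    // Analogue of cache.get(key, mappingFunction): on a miss the function runs atomically,
    // at most once per key; a null result is returned but not stored.
    public String get(String key, Function<String, String> mappingFunction) {
        return store.computeIfAbsent(key, mappingFunction);
    }

    public static void main(String[] args) {
        ComputeIfAbsentDemo cache = new ComputeIfAbsentDemo();
        AtomicInteger loads = new AtomicInteger();
        Function<String, String> loader = k -> {
            loads.incrementAndGet();
            return k.startsWith("missing") ? null : "value-of-" + k;
        };
        System.out.println(cache.getIfPresent("a"));        // null
        System.out.println(cache.get("a", loader));         // value-of-a
        System.out.println(cache.get("a", loader));         // value-of-a (served from the map)
        System.out.println(cache.get("missing-b", loader)); // null (nothing stored)
        System.out.println(loads.get());                     // 2
    }
}
```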

2.1.2 Loading Cache


LoadingCache<Key, Graph> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build(key -> createExpensiveGraph(key));

// Look up an entry, loading it if absent; returns null if it cannot be loaded
Graph graph = cache.get(key);
// Look up entries in bulk, loading any that are absent
Map<Key, Graph> graphs = cache.getAll(keys);

A LoadingCache is a Cache with an attached CacheLoader. When an entry is absent, CacheLoader.load is called to generate it.
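The difference from a plain `Cache` is that the loader is fixed when the cache is built, so call sites only pass the key. Here is a minimal JDK-only sketch of that idea. `MiniLoadingCache` is a hypothetical name, and the sketch loads keys one by one for `getAll` rather than modeling a bulk load.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MiniLoadingCache<K, V> {

    private final Map<K, V> store = new ConcurrentHashMap<>();
    // Plays the role of CacheLoader.load: called on a miss to produce the value.
    private final Function<K, V> loader;

    public MiniLoadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Like LoadingCache.get(key): loads on a miss, with no mapping function at the call site.
    public V get(K key) {
        return store.computeIfAbsent(key, loader);
    }

    // Like LoadingCache.getAll(keys): cached entries plus newly loaded ones, skipping nulls.
    public Map<K, V> getAll(Iterable<? extends K> keys) {
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys) {
            V value = get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MiniLoadingCache<Integer, String> cache = new MiniLoadingCache<>(k -> "graph-" + k);
        System.out.println(cache.get(1));                         // graph-1
        System.out.println(cache.getAll(Arrays.asList(1, 2, 3))); // {1=graph-1, 2=graph-2, 3=graph-3}
    }
}
```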

2.1.3 Async Cache


AsyncCache<Key, Graph> cache = Caffeine.newBuilder()
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .maximumSize(10_000)
    .buildAsync();

// Look up an entry; returns null if it is not found
CompletableFuture<Graph> graph = cache.getIfPresent(key);
// Look up an entry; if it is absent, compute it asynchronously
graph = cache.get(key, k -> createExpensiveGraph(k));
// Insert or update an entry
cache.put(key, graph);
// Remove an entry
cache.synchronous().invalidate(key);

AsyncCache is the asynchronous form of Cache: it computes entries on an Executor and returns a CompletableFuture. The default executor is ForkJoinPool.commonPool(). To use your own thread pool, call Caffeine.executor(Executor) on the builder.
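The key idea of an async cache is to store the future rather than the value. Concurrent callers for the same key then share one in-flight computation. The JDK-only sketch below illustrates this; `MiniAsyncCache` is a hypothetical name.

The sketch drops futures that fail or complete with null, so a later call can retry. Caffeine's documentation describes similar automatic removal, but treat the details here as an approximation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;

public class MiniAsyncCache<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> store = new ConcurrentHashMap<>();
    private final Executor executor;

    public MiniAsyncCache(Executor executor) {
        this.executor = executor;
    }

    // Same default as Caffeine: ForkJoinPool.commonPool()
    public MiniAsyncCache() {
        this(ForkJoinPool.commonPool());
    }

    public CompletableFuture<V> getIfPresent(K key) {
        return store.get(key);
    }

    // On a miss the future is stored immediately, so concurrent callers share one computation.
    public CompletableFuture<V> get(K key, Function<K, V> mappingFunction) {
        CompletableFuture<V> future = store.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> mappingFunction.apply(k), executor));
        // Drop futures that fail or produce null, so a later call can retry.
        future.whenComplete((value, error) -> {
            if (error != null || value == null) {
                store.remove(key, future);
            }
        });
        return future;
    }

    public void put(K key, CompletableFuture<V> value) {
        store.put(key, value);
    }

    public void invalidate(K key) {
        store.remove(key);
    }

    public static void main(String[] args) {
        MiniAsyncCache<String, String> cache = new MiniAsyncCache<>();
        CompletableFuture<String> graph = cache.get("key", k -> k.toUpperCase());
        System.out.println(graph.join()); // KEY
    }
}
```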

2.1.4 Async Loading Cache


AsyncLoadingCache<Key, Graph> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    // Option 1: wrap a synchronous operation that builds the entry so it runs asynchronously
    .buildAsync(key -> createExpensiveGraph(key));
    // Option 2 (use instead of option 1): an asynchronous operation that returns a future
    // .buildAsync((key, executor) -> createExpensiveGraphAsync(key, executor));

// Look up an entry; if it is absent, it is generated asynchronously
CompletableFuture<Graph> graph = cache.get(key);
// Look up entries in bulk; absent entries are generated asynchronously
CompletableFuture<Map<Key, Graph>> graphs = cache.getAll(keys);

AsyncLoadingCache is the asynchronous form of LoadingCache: it loads entries asynchronously.
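The sketch below shows how a bulk `getAll` can combine per-key futures into a single future of a map. It uses a loader shaped like the `(key, executor) -> future` form above, and it skips null values. It uses only the JDK; `MiniAsyncLoadingCache` is a hypothetical name, not Caffeine's internals.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.function.BiFunction;

public class MiniAsyncLoadingCache<K, V> {

    private final ConcurrentMap<K, CompletableFuture<V>> store = new ConcurrentHashMap<>();
    // Shaped like buildAsync((key, executor) -> future): the loader decides how to use the executor.
    private final BiFunction<K, Executor, CompletableFuture<V>> asyncLoader;
    private final Executor executor;

    public MiniAsyncLoadingCache(BiFunction<K, Executor, CompletableFuture<V>> asyncLoader, Executor executor) {
        this.asyncLoader = asyncLoader;
        this.executor = executor;
    }

    public CompletableFuture<V> get(K key) {
        return store.computeIfAbsent(key, k -> asyncLoader.apply(k, executor));
    }

    // Combines the per-key futures into a single future of a map, skipping null values.
    public CompletableFuture<Map<K, V>> getAll(Iterable<K> keys) {
        Map<K, CompletableFuture<V>> futures = new LinkedHashMap<>();
        for (K key : keys) {
            futures.put(key, get(key));
        }
        return CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    Map<K, V> result = new LinkedHashMap<>();
                    futures.forEach((k, f) -> {
                        V value = f.join();
                        if (value != null) {
                            result.put(k, value);
                        }
                    });
                    return result;
                });
    }

    public static void main(String[] args) {
        MiniAsyncLoadingCache<Integer, String> cache = new MiniAsyncLoadingCache<Integer, String>(
                (key, executor) -> CompletableFuture.supplyAsync(() -> "graph-" + key, executor),
                ForkJoinPool.commonPool());
        System.out.println(cache.getAll(Arrays.asList(1, 2)).join()); // {1=graph-1, 2=graph-2}
    }
}
```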

2.2 Eviction Policies

Size-based


// Evict based on the number of entries in the cache
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumSize(10_000)
    .build(key -> createExpensiveGraph(key));

// Evict based on the total weight of the entries in the cache
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumWeight(10_000)
    .weigher((Key key, Graph graph) -> graph.vertices().size())
    .build(key -> createExpensiveGraph(key));
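With `maximumWeight`, the budget applies to the sum of the weigher's results rather than to the entry count. The JDK-only sketch below tracks a running total and evicts entries until the total fits. For simplicity it evicts in LRU order, whereas Caffeine chooses victims with W-TinyLFU. `WeightedLruCache` is a hypothetical name.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntBiFunction;

public class WeightedLruCache<K, V> {

    // Access-ordered, so iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private final long maximumWeight;
    private final ToIntBiFunction<K, V> weigher;
    private long totalWeight;

    public WeightedLruCache(long maximumWeight, ToIntBiFunction<K, V> weigher) {
        this.maximumWeight = maximumWeight;
        this.weigher = weigher;
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null) {
            totalWeight -= weigher.applyAsInt(key, old);
        }
        // The weight is computed once, when the entry is written.
        totalWeight += weigher.applyAsInt(key, value);
        // Evict from the LRU end until the total weight fits the budget.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (totalWeight > maximumWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            totalWeight -= weigher.applyAsInt(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    public synchronized boolean containsKey(K key) {
        return map.containsKey(key);
    }

    public synchronized long weight() {
        return totalWeight;
    }

    public static void main(String[] args) {
        // Weigh each value by its length, similar in spirit to graph.vertices().size()
        WeightedLruCache<String, String> cache = new WeightedLruCache<>(10, (k, v) -> v.length());
        cache.put("a", "12345");
        cache.put("b", "12345");
        cache.put("c", "123");   // total would be 13 > 10, so "a" (least recently used) is evicted
        System.out.println(cache.containsKey("a") + " " + cache.weight()); // false 8
    }
}
```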

Time-based


// Eviction with a fixed expiration time
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .expireAfterAccess(5, TimeUnit.MINUTES)
    .build(key -> createExpensiveGraph(key));
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build(key -> createExpensiveGraph(key));

// Eviction with a variable, per-entry expiration time
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .expireAfter(new Expiry<Key, Graph>() {
      public long expireAfterCreate(Key key, Graph graph, long currentTime) {
        // Use wall clock time, rather than nanotime, if from an external resource
        long seconds = graph.creationDate().plusHours(5)
            .minus(System.currentTimeMillis(), MILLIS)
            .toEpochSecond();
        return TimeUnit.SECONDS.toNanos(seconds);
      }
      public long expireAfterUpdate(Key key, Graph graph,
          long currentTime, long currentDuration) {
        return currentDuration;
      }
      public long expireAfterRead(Key key, Graph graph,
          long currentTime, long currentDuration) {
        return currentDuration;
      }
    })
    .build(key -> createExpensiveGraph(key));
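Both time-based policies boil down to storing a deadline with each entry and comparing it against a clock. The JDK-only sketch below takes a per-entry lifetime function in the spirit of `Expiry.expireAfterCreate`, and `expireAfterWrite` becomes the special case of a constant lifetime. The clock is an injectable nanosecond `LongSupplier`, much as Caffeine lets you pass a `Ticker` for tests, so no sleeping is needed.

The sketch only checks expiry on read. Caffeine also cleans up during its maintenance work and uses a timer wheel for variable expiration. `ExpiringCache` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.ToLongBiFunction;

public class ExpiringCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long deadlineNanos;

        Entry(V value, long deadlineNanos) {
            this.value = value;
            this.deadlineNanos = deadlineNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    // Per-entry lifetime in nanoseconds, in the spirit of Expiry.expireAfterCreate
    private final ToLongBiFunction<K, V> expireAfterCreate;
    // Injectable time source in nanoseconds
    private final LongSupplier ticker;

    public ExpiringCache(ToLongBiFunction<K, V> expireAfterCreate, LongSupplier ticker) {
        this.expireAfterCreate = expireAfterCreate;
        this.ticker = ticker;
    }

    // Fixed time-to-live, like expireAfterWrite(duration, unit)
    public static <K, V> ExpiringCache<K, V> expireAfterWrite(long duration, TimeUnit unit, LongSupplier ticker) {
        long nanos = unit.toNanos(duration);
        return new ExpiringCache<K, V>((k, v) -> nanos, ticker);
    }

    public void put(K key, V value) {
        long deadline = ticker.getAsLong() + expireAfterCreate.applyAsLong(key, value);
        map.put(key, new Entry<>(value, deadline));
    }

    // Expired entries are removed lazily, when they are next read.
    public V getIfPresent(K key) {
        Entry<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (ticker.getAsLong() - entry.deadlineNanos >= 0) {
            map.remove(key, entry);
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong();
        ExpiringCache<String, String> cache = expireAfterWrite(10, TimeUnit.MINUTES, now::get);
        cache.put("key", "graph");
        now.addAndGet(TimeUnit.MINUTES.toNanos(9));
        System.out.println(cache.getIfPresent("key")); // graph
        now.addAndGet(TimeUnit.MINUTES.toNanos(1));
        System.out.println(cache.getIfPresent("key")); // null
    }
}
```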

Reference-based


// Evict when neither the key nor the value is strongly reachable
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .weakKeys()
    .weakValues()
    .build(key -> createExpensiveGraph(key));

// Evict when the garbage collector needs to free memory
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .softValues()
    .build(key -> createExpensiveGraph(key));

2.3 Refresh


LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumSize(10_000)
    .refreshAfterWrite(1, TimeUnit.MINUTES)
    .build(key -> createExpensiveGraph(key));

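Refresh is available only with a LoadingCache (or AsyncLoadingCache). Unlike eviction, refresh keeps the old value available: if an entry is read while it is being refreshed, the old value is still returned. The new value is returned only after the refresh completes.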
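The refresh behavior described above can be sketched with the JDK alone. The first read after the refresh interval triggers a background reload, and that read (and any others before the reload finishes) still gets the old value. `RefreshingCache` is a hypothetical name, and the loader is assumed never to return null. The demo passes `Runnable::run` as the executor so the reload runs on the calling thread and the output is deterministic.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class RefreshingCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long writtenAtNanos;

        Entry(V value, long writtenAtNanos) {
            this.value = value;
            this.writtenAtNanos = writtenAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    // Keys with a reload in flight, so each stale entry triggers only one refresh
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<K, V> loader;   // assumed never to return null
    private final long refreshAfterNanos;
    private final LongSupplier ticker;
    private final Executor executor;

    public RefreshingCache(Function<K, V> loader, long refreshAfterNanos, LongSupplier ticker, Executor executor) {
        this.loader = loader;
        this.refreshAfterNanos = refreshAfterNanos;
        this.ticker = ticker;
        this.executor = executor;
    }

    public V get(K key) {
        Entry<V> entry = map.computeIfAbsent(key, k -> new Entry<>(loader.apply(k), ticker.getAsLong()));
        boolean stale = ticker.getAsLong() - entry.writtenAtNanos >= refreshAfterNanos;
        if (stale && refreshing.add(key)) {
            // Reload in the background; on failure the old value is simply kept.
            CompletableFuture.supplyAsync(() -> loader.apply(key), executor)
                    .whenComplete((newValue, error) -> {
                        if (error == null && newValue != null) {
                            map.put(key, new Entry<>(newValue, ticker.getAsLong()));
                        }
                        refreshing.remove(key);
                    });
        }
        // The caller gets the value that was present when it asked (the old one if stale).
        return entry.value;
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong();
        AtomicInteger version = new AtomicInteger();
        RefreshingCache<String, String> cache = new RefreshingCache<String, String>(
                k -> k + "-v" + version.incrementAndGet(), 60, now::get, Runnable::run);
        System.out.println(cache.get("key")); // key-v1
        now.set(60);
        System.out.println(cache.get("key")); // key-v1 (stale value returned, reload triggered)
        System.out.println(cache.get("key")); // key-v2
    }
}
```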

2.4 Statistics


Cache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumSize(10_000)
    .recordStats()
    .build();

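Calling Caffeine.recordStats() turns on statistics collection. Cache.stats() then returns a CacheStats object containing metrics such as:

hitRate(): the ratio of lookups that were hits
evictionCount(): the number of entries evicted
averageLoadPenalty(): the average time spent loading new values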
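As I understand Caffeine's `CacheStats` Javadoc, these metrics are defined as follows:

- `hitRate()` is hitCount / requestCount, and is 1.0 when there have been no requests.
- `averageLoadPenalty()` is totalLoadTime / (loadSuccessCount + loadFailureCount), and is 0.0 when nothing has been loaded.

The single-threaded sketch below reproduces that arithmetic. Its method names loosely follow Caffeine's `StatsCounter`, but the signatures are simplified, and `StatsSketch` is a hypothetical name.

```java
public class StatsSketch {

    private long hitCount;
    private long missCount;
    private long loadSuccessCount;
    private long loadFailureCount;
    private long totalLoadTimeNanos;
    private long evictionCount;

    public void recordHits(int count) { hitCount += count; }
    public void recordMisses(int count) { missCount += count; }
    public void recordLoadSuccess(long loadTimeNanos) { loadSuccessCount++; totalLoadTimeNanos += loadTimeNanos; }
    public void recordLoadFailure(long loadTimeNanos) { loadFailureCount++; totalLoadTimeNanos += loadTimeNanos; }
    public void recordEviction() { evictionCount++; }

    // hitCount / requestCount, defined as 1.0 when there have been no requests
    public double hitRate() {
        long requestCount = hitCount + missCount;
        return requestCount == 0 ? 1.0 : (double) hitCount / requestCount;
    }

    public long evictionCount() {
        return evictionCount;
    }

    // Total load time divided by the number of loads (successful or failed), or 0.0 if none
    public double averageLoadPenalty() {
        long loadCount = loadSuccessCount + loadFailureCount;
        return loadCount == 0 ? 0.0 : (double) totalLoadTimeNanos / loadCount;
    }

    public static void main(String[] args) {
        StatsSketch stats = new StatsSketch();
        stats.recordHits(3);
        stats.recordMisses(1);
        stats.recordLoadSuccess(100);
        stats.recordLoadFailure(300);
        stats.recordEviction();
        System.out.println(stats.hitRate());            // 0.75
        System.out.println(stats.averageLoadPenalty()); // 200.0
        System.out.println(stats.evictionCount());      // 1
    }
}
```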

Exposing these statistics through a Spring Boot RESTful controller makes it easy to check how the cache is being used.

III. Using Caffeine in Spring Boot

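According to the Caffeine GitHub documentation, Caffeine is a high-performance caching library for Java 8. This applies to the 2.x line; Caffeine 3.x requires Java 11 or later. Spring 5 (Spring Boot 2.x) officially dropped its Guava cache support in favor of the better-performing Caffeine.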

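There are two ways to use Caffeine in Spring Boot:

Approach 1: add the Caffeine dependency and call Caffeine's API directly.
Approach 2: add the Caffeine and Spring Cache dependencies and use Spring Cache annotations.
Both approaches are described below.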

Approach 1: Using the Caffeine dependency directly

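First, add the Maven dependency. If the project inherits from spring-boot-starter-parent, Spring Boot's dependency management supplies the version.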


<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>

Next, configure the cache:


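import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.concurrent.TimeUnit;

@Configuration
public class CacheConfig {

    @Bean
    public Cache<String, Object> caffeineCache() {
        return Caffeine.newBuilder()
                // Expire entries a fixed time after they were last written
                .expireAfterWrite(60, TimeUnit.SECONDS)
                // Initial capacity
                .initialCapacity(100)
                // Maximum number of entries
                .maximumSize(1000)
                .build();
    }
}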

Finally, add caching to the service:


import com.github.benmanes.caffeine.cache.Cache;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Slf4j
@Service
public class UserInfoServiceImpl {

    /**
     * Simulates data stored in a database (thread-safe, since the service is a shared singleton)
     */
    private final Map<Integer, UserInfo> userInfoMap = new ConcurrentHashMap<>();

    @Autowired
    Cache<String, Object> caffeineCache;

    public void addUserInfo(UserInfo userInfo) {
        userInfoMap.put(userInfo.getId(), userInfo);
        // Add to the cache
        caffeineCache.put(String.valueOf(userInfo.getId()), userInfo);
    }

    public UserInfo getByName(Integer id) {
        // Read from the cache first (cache keys are Strings)
        UserInfo userInfo = (UserInfo) caffeineCache.getIfPresent(String.valueOf(id));
        if (userInfo != null) {
            return userInfo;
        }
        // On a cache miss, look it up in the "database"
        userInfo = userInfoMap.get(id);
        // If the user exists, add it to the cache
        if (userInfo != null) {
            caffeineCache.put(String.valueOf(userInfo.getId()), userInfo);
        }
        return userInfo;
    }

    public UserInfo updateUserInfo(UserInfo userInfo) {
        if (!userInfoMap.containsKey(userInfo.getId())) {
            return null;
        }
        // Fetch the old value
        UserInfo oldUserInfo = userInfoMap.get(userInfo.getId());
        // Overwrite only the fields that are set on the incoming object
        if (!StringUtils.isEmpty(userInfo.getAge())) {
            oldUserInfo.setAge(userInfo.getAge());
        }
        if (!StringUtils.isEmpty(userInfo.getName())) {
            oldUserInfo.setName(userInfo.getName());
        }
        if (!StringUtils.isEmpty(userInfo.getSex())) {
            oldUserInfo.setSex(userInfo.getSex());
        }
        // Store the updated object
        userInfoMap.put(oldUserInfo.getId(), oldUserInfo);
        // Replace the value in the cache
        caffeineCache.put(String.valueOf(oldUserInfo.getId()), oldUserInfo);
        return oldUserInfo;
    }

    public void deleteById(Integer id) {
        userInfoMap.remove(id);
        // Remove from the cache
        caffeineCache.invalidate(String.valueOf(id));
    }
}

Approach 2: Using Spring Cache annotations

First, add the Maven dependencies:


<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>

Next, configure the cache manager:


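import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.concurrent.TimeUnit;

@Configuration
// Required: without @EnableCaching, @Cacheable/@CachePut/@CacheEvict are ignored
@EnableCaching
public class CacheConfig {

    /**
     * Configure the cache manager
     *
     * @return the cache manager
     */
    @Bean("caffeineCacheManager")
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager();
        cacheManager.setCaffeine(Caffeine.newBuilder()
                // Expire entries a fixed time after their last read or write
                .expireAfterAccess(60, TimeUnit.SECONDS)
                // Initial capacity
                .initialCapacity(100)
                // Maximum number of entries
                .maximumSize(1000));
        return cacheManager;
    }
}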

Finally, add caching to the service:

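import lombok.extern.slf4j.Slf4j;
import org.springframework.cache.annotation.CacheConfig;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;
import org.springframework.util.StringUtils;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Slf4j
@Service
// cacheNames names the cache itself; cacheManager selects the CacheManager bean defined above
@CacheConfig(cacheNames = "userInfo", cacheManager = "caffeineCacheManager")
public class UserInfoServiceImpl {

    /**
     * Simulates data stored in a database
     */
    private final Map<Integer, UserInfo> userInfoMap = new ConcurrentHashMap<>();

    // @CachePut caches the method's return value, so the method must return the object
    @CachePut(key = "#userInfo.id")
    public UserInfo addUserInfo(UserInfo userInfo) {
        userInfoMap.put(userInfo.getId(), userInfo);
        return userInfo;
    }

    @Cacheable(key = "#id", unless = "#result == null")
    public UserInfo getByName(Integer id) {
        return userInfoMap.get(id);
    }

    // unless: do not cache the null returned for an unknown user
    @CachePut(key = "#userInfo.id", unless = "#result == null")
    public UserInfo updateUserInfo(UserInfo userInfo) {
        if (!userInfoMap.containsKey(userInfo.getId())) {
            return null;
        }
        // Fetch the old value
        UserInfo oldUserInfo = userInfoMap.get(userInfo.getId());
        // Overwrite only the fields that are set on the incoming object
        if (!StringUtils.isEmpty(userInfo.getAge())) {
            oldUserInfo.setAge(userInfo.getAge());
        }
        if (!StringUtils.isEmpty(userInfo.getName())) {
            oldUserInfo.setName(userInfo.getName());
        }
        if (!StringUtils.isEmpty(userInfo.getSex())) {
            oldUserInfo.setSex(userInfo.getSex());
        }
        // Store the updated object
        userInfoMap.put(oldUserInfo.getId(), oldUserInfo);
        // Return the updated object; @CachePut writes it to the cache
        return oldUserInfo;
    }

    @CacheEvict(key = "#id")
    public void deleteById(Integer id) {
        userInfoMap.remove(id);
    }
}

Notes on “HybridDB · Performance Optimization · Several Ways to Implement Count Distinct”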

Original article: HybridDB · Performance Optimization · Several Ways to Implement Count Distinct

HybridDB is an MPP analytical database that Alibaba built on Greenplum, which is itself based on PostgreSQL.

As a result, HybridDB's optimization strategies and techniques are inevitably shaped and constrained by PostgreSQL.

Optimizing the statement in the article produces several different plans. Simplified, the statement looks like this:

select count(distinct c1) from t group by c2;

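In HybridDB this statement is executed as follows:

Each server groups its own data and computes a partial count(distinct);
The results of the previous step are redistributed by the grouping column;
Each server performs a second grouping pass on the data it receives;
The results are gathered on a single server to produce the final result.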
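The four steps above can be simulated in memory to show why the first phase must keep deduplicated (c2, c1) pairs rather than local counts. The same c1 may appear on several servers, so local counts cannot simply be added together. Deduplicating pairs locally matches plan b)'s first Group by(Hash(category, actionId)).

This is a single-process sketch under that assumption, not HybridDB's implementation. Servers are modeled as lists of rows {c1, c2}, and `TwoPhaseCountDistinct` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TwoPhaseCountDistinct {

    // Step 1 (on each server): group locally on (c2, c1) and keep the deduplicated pairs.
    static Set<List<String>> localDistinctPairs(List<String[]> rows) {
        Set<List<String>> pairs = new HashSet<>();
        for (String[] row : rows) {          // row = {c1, c2}
            pairs.add(Arrays.asList(row[1], row[0]));
        }
        return pairs;
    }

    static Map<String, Integer> countDistinct(List<List<String[]>> servers, int serverCount) {
        // Step 2: redistribute the pairs by a hash of the grouping column c2,
        // so every pair for a given c2 lands on the same target server.
        List<Set<List<String>>> received = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            received.add(new HashSet<>());
        }
        for (List<String[]> server : servers) {
            for (List<String> pair : localDistinctPairs(server)) {
                received.get(Math.floorMod(pair.get(0).hashCode(), serverCount)).add(pair);
            }
        }
        // Step 3: each target server deduplicates again (the HashSet does this) and counts per c2.
        // Step 4: gather the per-server results on one node.
        Map<String, Integer> result = new TreeMap<>();
        for (Set<List<String>> pairs : received) {
            for (List<String> pair : pairs) {
                result.merge(pair.get(0), 1, Integer::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> server0 = Arrays.asList(new String[]{"x", "A"}, new String[]{"y", "A"}, new String[]{"x", "B"});
        List<String[]> server1 = Arrays.asList(new String[]{"x", "A"}, new String[]{"w", "A"}, new String[]{"z", "B"});
        // select count(distinct c1) from t group by c2
        System.out.println(countDistinct(Arrays.asList(server0, server1), 2)); // {A=3, B=2}
    }
}
```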

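The article's optimizations focus on how grouping is implemented:

a) sort + group;

b) hash grouping;

c) the ORCA optimizer's approach: the same as a), except that the first sort uses different keys.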

Summary of plans a), b), and c) from the original article:

a)

Scan (Columnar Scan + Append) -> Sort(category) -> Group by(category) -> Redistribute -> Sort(category) -> Group by(category) -> Sort -> Gather

b)

Scan (Columnar Scan + Append) -> Group by(Hash(category, actionId)) -> Redistribute(category) -> Group by(Hash(category, actionId)) -> Group by(Hash(category)) -> Sort -> Gather

c)

Scan (Dynamic Scan) -> Sort(category, actionId) -> Group by(category) -> Redistribute -> Sort(category) -> Group by(category) -> Sort -> Gather

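In my view, the optimizations in this article have little to do with MPP; similar alternative plans could probably be obtained on a single machine as well.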

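Perhaps because of the data distribution, the data volume, or similar factors, the article does not show any MPP-specific optimization of the MOTION operators that redistribute and gather data.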
