11: Centralized Cache Management in HDFS

Posted by 月饼馅饺子


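    Centralized cache management in HDFS is a mechanism that lets users specify HDFS paths to be cached. The blocks under those paths are cached in off-heap memory, and the NameNode directs the DataNodes to do this work.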

Centralized cache management in HDFS has many significant advantages.

  1. Explicit pinning prevents frequently used data from being evicted from memory. This is particularly important when the size of the working set exceeds the size of main memory, which is common for many HDFS workloads.
  2. Because DataNode caches are managed by the NameNode, applications can query the set of cached block locations when making task placement decisions. Co-locating a task with a cached block replica improves read performance.
  3. When a block has been cached by a DataNode, clients can use a new, more efficient, zero-copy read API. Since checksum verification of cached data is done once by the DataNode, clients incur essentially zero overhead when using this new API.
  4. Centralized caching can improve overall cluster memory utilization. When relying on the OS buffer cache at each DataNode, repeated reads of a block will result in all n replicas of the block being pulled into the buffer cache. With centralized cache management, a user can explicitly pin only m of the n replicas, saving the memory of n-m replicas.
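The saving in point 4 can be made concrete with a little arithmetic. The sketch below is illustrative only: `CacheSavings` is a hypothetical helper, not part of Hadoop, and the 128 MiB block size and 8-block file are assumed values. It also assumes the worst case for the OS buffer cache, where repeated reads end up pulling all n replicas into memory.

```java
// Hypothetical helper for illustration; not part of the Hadoop API.
final class CacheSavings {

    /** Memory used when `replicasInMemory` copies of every block are held in memory. */
    static long cachedBytes(long blockCount, long blockSizeBytes, int replicasInMemory) {
        return blockCount * blockSizeBytes * replicasInMemory;
    }

    /**
     * Memory saved by pinning only m replicas instead of letting all n
     * replicas end up in the OS buffer cache (assumed worst case).
     */
    static long savedBytes(long blockCount, long blockSizeBytes, int n, int m) {
        if (m < 0 || m > n) {
            throw new IllegalArgumentException("require 0 <= m <= n");
        }
        return cachedBytes(blockCount, blockSizeBytes, n)
             - cachedBytes(blockCount, blockSizeBytes, m);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size: 128 MiB
        long blocks = 8;                     // assumed file size: 8 blocks = 1 GiB
        // Replication factor n = 3, pinned replicas m = 1.
        System.out.println(savedBytes(blocks, blockSize, 3, 1)); // prints 2147483648
    }
}
```

With these assumed numbers, pinning one of three replicas of a 1 GiB file keeps 1 GiB in memory cluster-wide instead of up to 3 GiB, a saving of 2 GiB.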

Use cases:
    Files that are read frequently, for example a small file.

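Architecture:

    Each DataNode periodically reports its cached blocks to the NameNode through heartbeats, and the NameNode sends newly added cache paths back to the DataNodes so that they cache the corresponding blocks.
The NameNode periodically rescans the namespace and the list of cache directives to decide which blocks should be cached and which should be uncached. Cache information is persisted in the fsimage and the edit log.
Note: blocks in an inconsistent state (for example under construction or corrupt) are not cached, and neither is the target of a symlink.
Note: caching is currently supported only at the file and directory level, not at the block level. For a directory, only the files in its first level are cached; caching is not recursive.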
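The non-recursive directory rule is easy to misread, so here is a minimal sketch of it. This is a toy model, not NameNode code: `DirectiveResolver`, its methods, and the sample paths are all hypothetical, and the namespace is reduced to a flat list of absolute file paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model for illustration; the real NameNode logic is more involved.
final class DirectiveResolver {

    /** Parent directory of an absolute path, e.g. "/a/b" -> "/a" and "/a" -> "/". */
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /**
     * Files selected by a cache path: the file itself if the path names a
     * file, or only the direct children if the path names a directory.
     * Files in nested subdirectories are deliberately skipped.
     */
    static List<String> resolve(List<String> files, String cachePath) {
        List<String> selected = new ArrayList<>();
        for (String file : files) {
            if (file.equals(cachePath) || parentOf(file).equals(cachePath)) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "/warehouse/facts/part-0",
            "/warehouse/facts/part-1",
            "/warehouse/facts/2024/part-0", // nested one level deeper
            "/warehouse/dim.csv");
        // Prints [/warehouse/facts/part-0, /warehouse/facts/part-1]
        System.out.println(resolve(files, "/warehouse/facts"));
    }
}
```

In this model, caching `/warehouse/facts` leaves `/warehouse/facts/2024/part-0` uncached; covering it would take a separate cache path for `/warehouse/facts/2024`.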


Commands and configuration:
