HBase Tuning Parameters

Posted by yangxusun9

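When using HBase, besides pre-splitting regions and designing row keys well, it is also common to tune the system's built-in configuration parameters.

1. Heap memory tuning

hbase-site.xml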

 <!-- Upper bound on the combined size of all memstores on a RegionServer. Exceeding it forces a flush to disk.
        The default is 40% of the heap. A RegionServer-level flush blocks client updates. -->
    <property>
        <name>hbase.regionserver.global.memstore.size</name>
        <value></value>
        <description>Maximum size of all memstores in a region server before new
            updates are blocked and flushes are forced. Defaults to 40% of heap (0.4).
            Updates are blocked and flushes are forced until size of all memstores
            in a region server hits hbase.regionserver.global.memstore.size.lower.limit.
            The default value in this configuration has been intentionally left
            empty in order to honor the old hbase.regionserver.global.memstore.upperLimit
            property if present.
        </description>
    </property>

    <!-- This can be thought of as a safety setting. When the cluster's write load is very high and writes keep
        outpacing flushes, we want the memstore to stay under a safe bound. Writes are then blocked until the
        memstore shrinks back to a manageable size, which by default is heap size * 0.4 * 0.95. In other words,
        once a RegionServer-level flush is triggered, client writes stay blocked until the total memstore size
        on that RegionServer drops to heap size * 0.4 * 0.95. -->
    <property>
        <name>hbase.regionserver.global.memstore.size.lower.limit</name>
        <value></value>
        <description>Maximum size of all memstores in a region server before
            flushes are forced.
            Defaults to 95% of hbase.regionserver.global.memstore.size (0.95).
            A 100% value for this value causes the minimum possible flushing to
            occur when updates are blocked due to memstore limiting.
            The default value in this configuration has been intentionally left
            empty in order to honor the old hbase.regionserver.global.memstore.lowerLimit
            property if present.
        </description>
    </property>

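Bigger is not always better for this parameter. With a very large memstore limit, once the backlog reaches the blocking threshold, the amount of data that must be flushed to get back to the normal level (heap size * 0.4 * 0.95) grows as well, so writes stay blocked for longer.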
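To make the trade-off concrete, here is a minimal Python sketch of the arithmetic described above. The function name `memstore_thresholds` and the sample heap sizes are illustrative assumptions; only the 0.4 and 0.95 defaults come from the configuration descriptions.

```python
GIB = 1024 ** 3


def memstore_thresholds(heap_bytes, global_size=0.4, lower_limit=0.95):
    """Return (blocking threshold, resume threshold) in bytes.

    global_size and lower_limit mirror the documented defaults of
    hbase.regionserver.global.memstore.size and its .lower.limit.
    """
    upper = heap_bytes * global_size   # updates are blocked above this
    lower = upper * lower_limit        # writes resume once usage drops to this
    return upper, lower


if __name__ == "__main__":
    # Sample heap sizes are assumptions chosen only for illustration.
    for heap_gib in (8, 32):
        upper, lower = memstore_thresholds(heap_gib * GIB)
        print(f"{heap_gib} GiB heap: block at {upper / GIB:.2f} GiB, "
              f"resume at {lower / GIB:.2f} GiB, "
              f"must flush at least {(upper - lower) / GIB:.3f} GiB")
```

The gap between the two thresholds grows linearly with both the heap size and the configured fraction, which is why a larger setting tends to mean a longer blocked period.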

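2. Tune the maximum number of files a DataNode can serve at once

hdfs-site.xml

<!-- HBase typically operates on a large number of files at the same time. Depending on the size of the cluster and the workload, set this to 4096 or higher. -->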
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
  <description>
        Specifies the maximum number of threads to use for transferring data
        in and out of the DN.
  </description>
</property>

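3. Tune the wait time for high-latency data operations

hdfs-site.xml

<!-- If a given data operation has very high latency, the socket has to wait longer. Consider raising this value (default 60000 ms) so the socket does not time out. Note that, per its own description, this property applies specifically to image transfers. -->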
<property>
  <name>dfs.image.transfer.timeout</name>
  <value>60000</value>
  <description>
        Socket timeout for image transfer in milliseconds. This timeout and the related
        dfs.image.transfer.bandwidthPerSec parameter should be configured such
        that normal image transfer can complete successfully.
        This timeout prevents client hangs when the sender fails during
        image transfer. This is socket timeout during image transfer.
  </description>
</property>

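4. Improve write efficiency by enabling compression

mapred-site.xml

<!-- The values shown below are the defaults. Set mapreduce.map.output.compress to true to turn compression on. -->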
<property>
  <name>mapreduce.map.output.compress</name>
  <value>false</value>
  <description>Should the outputs of the maps be compressed before being
               sent across the network. Uses SequenceFile compression.
  </description>
</property>

<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <description>If the map outputs are compressed, how should they be 
               compressed?
  </description>
</property>

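5. Set the number of RPC handlers

hbase-site.xml

<!-- Number of RPC listener instances started on each RegionServer, i.e. the number of threads available to handle I/O requests.
        Increase this value when there are many clients or a heavy read/write request load. -->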
    <property>  
        <name>hbase.regionserver.handler.count</name>  
        <value>30</value>  
        <description>Count of RPC Listener instances spun up on RegionServers.  
            Same property is used by the Master for count of master handlers.  
        </description>  
    </property> 

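6. Tune the HStore file size

hbase-site.xml

<!-- Maximum HStoreFile size. When any column family of a region grows past this size, the region is split.
        If you run MapReduce jobs over HBase, consider lowering this value: each region maps to one map task, so an oversized region makes its map task run too long. -->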
    <property>  
        <name>hbase.hregion.max.filesize</name>  
        <value>10737418240</value>  
        <description>  
            Maximum HStoreFile size. If any one of a column families' HStoreFiles has
            grown to exceed this value, the hosting HRegion is split in two.  
        </description>  
    </property> 
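Since each region maps to one map task, a rough feel for how this setting affects job parallelism can be had from a small Python sketch. It assumes a single column family and ignores compression, pre-splitting and the split policy, so it is only a crude lower bound on the region count. The function name `min_regions` and the 1 TiB table size are hypothetical.

```python
import math

GIB = 1024 ** 3
TIB = 1024 ** 4


def min_regions(store_bytes, max_filesize=10737418240):
    """Crude lower bound on the number of regions (and thus map tasks).

    Assumes every region is filled right up to hbase.hregion.max.filesize,
    which real tables rarely are; actual counts are usually higher.
    """
    return max(1, math.ceil(store_bytes / max_filesize))


if __name__ == "__main__":
    table_bytes = 1 * TIB  # hypothetical table size
    for limit_gib in (10, 2):
        regions = min_regions(table_bytes, limit_gib * GIB)
        print(f"max.filesize = {limit_gib} GiB: at least {regions} regions, "
              f"each map task reads at most about {limit_gib} GiB")
```

Lowering the limit trades fewer, longer map tasks for more, shorter ones, at the cost of having more regions to manage.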

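7. Increase the read and write caches

hbase-site.xml

 <!-- Size of the HBase client write buffer, i.e. how much the client batches before submitting to the server. This memory is used on both the client and the server. A larger buffer reduces the number of RPCs but uses more memory. -->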
    <property>  
        <name>hbase.client.write.buffer</name>  
        <value>2097152</value>  
        <description>Default size of the HTable client write buffer in bytes.  
            A bigger buffer takes more memory -- on both the client and server  
            side since server instantiates the passed write buffer to process  
            it -- but a larger buffer size reduces the number of RPCs made.  
            For an estimate of server-side memory-used, evaluate  
            hbase.client.write.buffer * hbase.regionserver.handler.count  
        </description>  
    </property>
<!-- Number of rows the client caches during an HBase scan. A small value means more RPCs; a large value uses more memory. -->
    <property>  
        <name>hbase.client.scanner.caching</name>  
        <value>2147483647</value>  
        <description>Number of rows that we try to fetch when calling next
            on a scanner if it is not served from (local, client) memory. This
            configuration works together with hbase.client.scanner.max.result.size
            to try and use the network efficiently. The default value is
            Integer.MAX_VALUE so that the network will fill the chunk size defined
            by hbase.client.scanner.max.result.size rather than be limited by a
            particular number of rows since the size of rows varies table to table.
            If you know ahead of time that you will not require more than a certain
            number of rows from a scan, this configuration should be set to that row
            limit via Scan#setCaching. Higher caching values will enable faster
            scanners but will eat up more memory and some calls of next may take
            longer and longer times when the cache is empty.
            Do not set this value such that the time between invocations is greater
            than the scanner timeout; i.e. hbase.client.scanner.timeout.period
        </description>
    </property> 
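The hbase.client.write.buffer description above suggests estimating server-side memory use as hbase.client.write.buffer * hbase.regionserver.handler.count. A minimal Python sketch of that estimate follows. The function name `server_write_buffer_bytes` and the larger buffer sizes are illustrative assumptions; the defaults of 2097152 bytes and 30 handlers are the values shown in this post.

```python
MIB = 1024 ** 2


def server_write_buffer_bytes(write_buffer=2097152, handler_count=30):
    """Estimate server-side memory used by client write buffers, in bytes.

    Follows the rule of thumb from the property description:
    hbase.client.write.buffer * hbase.regionserver.handler.count.
    """
    return write_buffer * handler_count


if __name__ == "__main__":
    # Buffer sizes beyond the 2 MiB default are hypothetical examples.
    for buffer_mib in (2, 8, 32):
        total = server_write_buffer_bytes(buffer_mib * MIB)
        print(f"write buffer {buffer_mib} MiB x 30 handlers "
              f"= {total / MIB:.0f} MiB on the RegionServer")
```

Because the two settings multiply, raising the handler count from section 5 and the write buffer together increases the RegionServer's memory footprint faster than either change alone.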

 
