HBase region server memory sizing

Author: 邱文馨4966 | Source: Internet | 2018-06-11 03:04


Sizing a machine for HBase is somewhat of a black art.

Unlike a pure storage machine that would just be optimized for disk size and throughput, an HBase RegionServer is also a compute node.

Every byte of disk space needs to be matched with a fraction of a byte in the RegionServer's Java heap.

You can estimate the ratio of raw disk space to required Java heap as follows:

RegionSize / MemstoreSize *
ReplicationFactor * HeapFractionForMemstores

Or in terms of HBase/HDFS configuration parameters:

hbase.hregion.max.filesize /
hbase.hregion.memstore.flush.size *
dfs.replication *
hbase.regionserver.global.memstore.lowerLimit

Say you have the following parameters (these are the defaults in 0.94):

10GB regions
128MB memstores
HDFS replication factor of 3
40% of the heap used for the memstores

Then: 10GB/128MB*3*0.4 = 96.
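The arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above; the function and argument names are made up for this example and are not part of HBase.

```python
GB = 1024 ** 3
MB = 1024 ** 2

def disk_to_heap_ratio(region_size, memstore_size, replication, memstore_heap_fraction):
    """Bytes of raw disk that one byte of RegionServer heap can serve.

    Arguments correspond to hbase.hregion.max.filesize,
    hbase.hregion.memstore.flush.size, dfs.replication and
    hbase.regionserver.global.memstore.lowerLimit.
    """
    return region_size / memstore_size * replication * memstore_heap_fraction

# The 0.94 defaults quoted above: 10GB regions, 128MB memstores, 3 replicas, 40% heap.
ratio = disk_to_heap_ratio(10 * GB, 128 * MB, 3, 0.4)
print(f"disk-to-heap ratio: {ratio:.0f}")
```

With the defaults the ratio comes out at about 96, matching the hand calculation.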

Now think about this. With the default settings this means that if you wanted to serve 10T worth of disk space per region server you would need a 107GB Java heap!
Or, if you give a region server a 10G heap, you can only utilize about 1T of disk space per region server machine.

Most people are surprised by this. I know I was.

Let's double check:
In order to serve 10T worth of raw disk space - 3.3T of effective space after 3-way replication - with 10GB regions, you'd need ~338 regions. @128MB that's about 43GB. But only 40% is by default used for the memstores so what you actually need is 43GB/0.4 ~ 107GB. Yep it's right.
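The same double check can be written out step by step. The sketch below uses hypothetical names and binary units (1T = 1024G), so the intermediate numbers differ slightly from the rounded figures in the text, but the result lands at roughly the same 107GB.

```python
TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2

def required_heap_bytes(raw_disk, region_size, memstore_size, replication, memstore_heap_fraction):
    """Heap needed so that every region on the disk can hold a full memstore."""
    effective_disk = raw_disk / replication          # space left after HDFS replication
    region_count = effective_disk / region_size      # regions needed to fill that space
    memstore_bytes = region_count * memstore_size    # one full memstore per region
    return memstore_bytes / memstore_heap_fraction   # memstores only get a fraction of the heap

heap = required_heap_bytes(10 * TB, 10 * GB, 128 * MB, 3, 0.4)
print(f"required heap: {heap / GB:.1f} GB")
```

Note that this assumes one memstore per region, as the text does; a table with several column families has one memstore per family, which would make the estimate larger.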

Maybe we can get away with a bit less by assuming that not all memstores are 100% full at all times. That is offset by the fact that not all regions will be exactly the same size or 100% filled.

Now. What can you do?
There are several options:

Increase the region size. 20GB is about the maximum, although some people claim they have 200GB regions. (hbase.hregion.max.filesize)
Decrease the memstore size. Depending on your write load you can go smaller, 64MB or even less. (hbase.hregion.memstore.flush.size).
You can allow a memstore to grow beyond this size temporarily. (hbase.hregion.memstore.block.multiplier)
Increase the HDFS replication factor. That does not really help per se, but if you have more disk space than you can utilize, increasing the replication factor would at least put your disks to good use.
Fiddle with the heap fractions used for the memstores. If your load is write-heavy, maybe raise that to 50% of the heap. (hbase.regionserver.global.memstore.upperLimit, hbase.regionserver.global.memstore.lowerLimit)
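To get a feel for how much each knob buys you, here is a rough Python comparison that changes one setting at a time and leaves the rest at the 0.94 defaults. The scenario names and helper are made up for illustration, and the block multiplier is left out because the formula does not model it.

```python
GB = 1024 ** 3
MB = 1024 ** 2

def ratio(region=10 * GB, memstore=128 * MB, replication=3, fraction=0.4):
    """Raw disk bytes served per heap byte; keyword defaults are the 0.94 defaults."""
    return region / memstore * replication * fraction

# Each scenario overrides exactly one parameter.
scenarios = {
    "0.94 defaults": {},
    "20GB regions": {"region": 20 * GB},
    "64MB memstores": {"memstore": 64 * MB},
    "replication factor 4": {"replication": 4},
    "50% heap for memstores": {"fraction": 0.5},
}

results = {name: ratio(**overrides) for name, overrides in scenarios.items()}
for name, r in results.items():
    heap_gb = 10 * 1024 / r  # heap in GB needed to serve 10T of raw disk
    print(f"{name:<24} ratio {r:5.0f}   heap for 10T: {heap_gb:6.1f} GB")
```

Doubling the region size or halving the memstore size each double the ratio, while the replication factor and heap fraction only move it proportionally.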

These parameters (except the replication factor, which is an HDFS setting) are described in hbase-default.xml that ships with HBase.

Personally I would place the maximum disk space per machine that can be served exclusively with HBase around 6T, unless you have a very read-heavy workload.
In that case the Java heap should be 32GB (20G regions, 128M memstores, the rest defaults). With MSLAB in 0.94 that works.
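Running the formula in the other direction shows where the 6T figure comes from. This is a sketch with a hypothetical helper name, using the settings from the paragraph above (32GB heap, 20G regions, 128M memstores, replication 3, 40% of the heap for memstores).

```python
TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2

def max_disk_bytes(heap, region_size, memstore_size, replication, memstore_heap_fraction):
    """Raw disk space a RegionServer with the given heap can serve."""
    ratio = region_size / memstore_size * replication * memstore_heap_fraction
    return heap * ratio

disk = max_disk_bytes(32 * GB, 20 * GB, 128 * MB, 3, 0.4)
print(f"servable raw disk: {disk / TB:.1f} TB")
```

With these settings the ratio is about 192, so a 32GB heap covers about 6T of raw disk.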

Of course your needs may vary. You may have mostly readonly load, in which case you can shrink the memstores. Or the disk space might be shared with other applications.
Maybe you need smaller regions or larger memstores. In that case the maximum disk space you can serve per machine would be less.

Future JVMs might support bigger heaps effectively (JDK7's G1 comes to mind).

In any case, the formula above provides a reasonable starting point.

Original article: HBase region server memory sizing. Thanks to the original author for sharing.
