热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

HBaseregionservermemorysizing

SizingamachineforHBaseissomewhatofablackart.Unlikeapurestoragemachinethatwouldjustbeoptimizedfordisksizeandthroughput,anHBaseRegionServerisalsoacomputenode.Everybyteofdiskspaceneedstobematchedwith

Sizing a machine for HBase is somewhat of a black art. Unlike a pure storage machine that would just be optimized for disk size and throughput, an HBase RegionServer is also a compute node. Every byte of disk space needs to be matched with

Sizing a machine for HBase is somewhat of a black art.

Unlike a pure storage machine that would just be optimized for disk size and throughput, an HBase RegionServer is also a compute node.

Every byte of disk space needs to be matched with a fraction of a byte in the RegionServer's Java heap.

You can estimate the ratio of raw disk space to required Java heap as follows:

RegionSize / MemstoreSize *
ReplicationFactor * HeapFractionForMemstores

Or in terms of HBase/HDFS configuration parameters:

regions.hbase.hregion.max.filesize /
hbase.hregion.memstore.flush.size *
dfs.replication *
hbase.regionserver.global.memstore.lowerLimit

Say you have the following parameters (these are the defaults in 0.94):
  • 10GB regions
  • 128MB memstores
  • HDFS replication factor of 3
  • 40% of the heap use for the memstores

Then: 10GB/128MB*3*0.4 = 96.

Now think about this. With the default setting this means that if you wanted to serve 10T worth of disks space per region server you would need a 107GB Java heap!
Or if you give a region server a 10G heap you can only utilize about 1T of disk space per region server machine.

Most people are surprised by this. I know I was.

Let's double check:
In order to serve 10T worth of raw disk space - 3.3T of effective space after 3-way replication - with 10GB regions, you'd need ~338 regions. @128MB that's about 43GB. But only 40% is by default used for the memstores so what you actually need is 43GB/0.4 ~ 107GB. Yep it's right.

Maybe we can get away with a bit less by assuming that not all memstores are 100% full at all times. That is offset by the fact that not all region will be exactly the same size or 100% filled.

Now. What can you do?
There are several options:
  1. Increase the region size. 20GB is about the maximum. Although some people claim they have 200GB regions. (hbase.hregion.max.filesize)
  2. Decrease the memstore size. Depending on your write load you can go smaller, 64MB or even less. (hbase.hregion.memstore.flush.size).
    You can allow a memstore to grow beyond this size temporarily.
    (hbase.hregion.memstore.block.multiplier)
  3. Increase the HDFS replication factor. That does not really help per se, but if you have more disk space than you can utilize, increasing the replication factor would at least put your disks to good use.
  4. Fiddle with the heap fractions used for the memstores. If you load is write-heave maybe up that 50% of the heap (hbase.regionserver.global.memstore.upperLimit, hbase.regionserver.global.memstore.lowerLimit)
These parameters (except the replication factor, which is an HDFS setting) are described in hbase-defaults.xml that ships with HBase.

Personally I would place the maximum disk space per machine that can be served exclusively with HBase around 6T, unless you have a very read-heavy workload.
In that case the Java heap should be 32GB (20G regions, 128M memstores, the rest defaults). With MSLAB in 0.94 that works.

Of course your needs may vary. You may have mostly readonly load, in which case you can shrink the memstores. Or the disk space might be shared with other applications.
Maybe you need smaller regions or larger memstores. In that case he maximum disk space you can serve per machine would be less.

Future JVMs might support bigger heap effectively (JDK7's G1 comes to mind).

In any case. The formula above provides a reasonable starting point.

推荐阅读
  • 一、Hadoop来历Hadoop的思想来源于Google在做搜索引擎的时候出现一个很大的问题就是这么多网页我如何才能以最快的速度来搜索到,由于这个问题Google发明 ... [详细]
  • 本文介绍了Hyperledger Fabric外部链码构建与运行的相关知识,包括在Hyperledger Fabric 2.0版本之前链码构建和运行的困难性,外部构建模式的实现原理以及外部构建和运行API的使用方法。通过本文的介绍,读者可以了解到如何利用外部构建和运行的方式来实现链码的构建和运行,并且不再受限于特定的语言和部署环境。 ... [详细]
  • ZSI.generate.Wsdl2PythonError: unsupported local simpleType restriction ... [详细]
  • 本文介绍了Perl的测试框架Test::Base,它是一个数据驱动的测试框架,可以自动进行单元测试,省去手工编写测试程序的麻烦。与Test::More完全兼容,使用方法简单。以plural函数为例,展示了Test::Base的使用方法。 ... [详细]
  • HDFS2.x新特性
    一、集群间数据拷贝scp实现两个远程主机之间的文件复制scp-rhello.txtroothadoop103:useratguiguhello.txt推pushscp-rr ... [详细]
  • Android开发实现的计时器功能示例
    本文分享了Android开发实现的计时器功能示例,包括效果图、布局和按钮的使用。通过使用Chronometer控件,可以实现计时器功能。该示例适用于Android平台,供开发者参考。 ... [详细]
  • 欢乐的票圈重构之旅——RecyclerView的头尾布局增加
    项目重构的Git地址:https:github.comrazerdpFriendCircletreemain-dev项目同步更新的文集:http:www.jianshu.comno ... [详细]
  •     这里使用自己编译的hadoop-2.7.0版本部署在windows上,记得几年前,部署hadoop需要借助于cygwin,还需要开启ssh服务,最近发现,原来不需要借助cy ... [详细]
  • Introduction(简介)Forbeingapowerfulobject-orientedprogramminglanguage,Cisuseda ... [详细]
  • Hadoop2.6.0 + 云centos +伪分布式只谈部署
    3.0.3玩不好,现将2.6.0tar.gz上传到usr,chmod-Rhadoop:hadophadoop-2.6.0,rm掉3.0.32.在etcp ... [详细]
  • Hadoop源码解析1Hadoop工程包架构解析
    1 Hadoop中各工程包依赖简述   Google的核心竞争技术是它的计算平台。Google的大牛们用了下面5篇文章,介绍了它们的计算设施。   GoogleCluster:ht ... [详细]
  • 1、概述首先和大家一起回顾一下Java消息服务,在我之前的博客《Java消息队列-JMS概述》中,我为大家分析了:然后在另一篇博客《Java消息队列-ActiveMq实战》中 ... [详细]
  • 环境配置tips
    一、MySQL在Linux下数据库名、表名、列名、别名大小写规则是这样的:  1、数据库名与表名是严格区分大小写的;  2、表的别名是严格区分大小写的& ... [详细]
  • MR程序的几种提交运行模式本地模型运行1在windows的eclipse里面直接运行main方法,就会将job提交给本地执行器localjobrunner执行-- ... [详细]
  • Ihavethisfollowinginputfile:我有以下输入文件:test.csvdone_cfg,,,,port<0>,clk_in,subcktA,ins ... [详细]
author-avatar
邱文馨4966
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有