SomefunwithRedisClustertesting

作者：捕风的水中龙_106 | 来源：互联网 | 2018-06-11 21:41

OneofthestepstoreachthegoalofprovidingatestableRedisClusterexperiencetouserswithinafewweeks,issomeserioustestingthatgoesovertheusualImrunning3nodesinmymacbook,itworks.Finallythisispossible,si

One of the steps to reach the goal of providing a "testable" Redis Cluster experience to users within a few weeks, is some serious testing that goes over the usual "I'm running 3 nodes in my macbook, it works". Finally this is possible, since Redis Cluster entered into the "refinements" stage, and most of the system design and implementation is in its final form already.

In order to perform some testing I assembled an environment like that:

* Hardware: 6 real computers: 2 macbook pro, 2 macbook air, 1 Linux desktop, 1 Linux tiny laptop called EEEpc running with a single core at 800Mhz.

* Network: the six nodes were wired to the same network in different ways. Two nodes connected via ethernet, and four over wifi, with different access points. Basically there were three groups. The computers connected with the ethernet had 0.3 milliseconds RTT, other two computers connected with a near access point were at around 5 milliseconds, and another group of two with another access point were not very reliable, sometime some packet went lost, latency spikes at 300-1000 milliseconds.

During the simulation every computer ran Partitions.tcl (http://github.com/antirez/partitions) in order to simulate network partitions three times per minute, lasting an average of 10 seconds. Redis Cluster was configured to detect a failure after 500 milliseconds, so these settings are able to trigger a number of failover procedures.

Every computer ran the following:

Computer 1 and 2: Redis cluster node + Partitions.tcl + 1 client
Computer 3 to 6: Redis cluster node + Partitions.tcl

The cluster was configured to have three masters and three slaves in total.

As client software I ran the cluster consistency test that is shipped with redis-rb-cluster (http://github.com/antirez/redis-rb-cluster), that performs atomic counters increments remembering the value client side, to detect both lost writes and non acknowledged writes that were actually accepted by the cluster.

I left the simulation running for about 24 hours, however there were moments where the cluster was completely down due to too many nodes being down.

The bugs
===

The first thing that happened in the simulation was, a big number of crashes of nodes… the simulation was able to trigger bugs that I did not noticed in the past. Also there were obvious mis-behavior due to the fact that one node, the eeepc one, was running a Redis server compiled with a 32 bit target. So in the first part of the simulation I just fixed bugs:

7a666ac Cluster: set n->slaves to NULL in clusterNodeResetSlaves().
fda91db Cluster: check link is valid before sending UPDATE.
f57bb36 Cluster: initialize todo_before_sleep flags to 0.
c70c0c6 Cluster: use proper type mstime_t for ping delay var.
7c1cbdc Cluster: use an hardcoded 60 sec timeout in redis-trib connections.
47815d3 Fixed clearNodeFailureIfNeeded() time type to mstime_t.
e88e6a6 Cluster: use long long for timestamps in clusterGenNodesDescription().

The above commits made yesterday are a mix of bugs reported by valgrind (for some part of the simulation there were nodes running over valgrind), crashes, and misbehavior of the 32 bit instance.

After all the above fixes I left the simulation running for many hours without being able to trigger any crash. Basically the simulation “payed itself” just for this bug fixing activity… more minor bugs were found during the simulation that I’ve yet to fix.

Behavior under partition
===

One of the important goals of this simulation was to test how Redis Cluster performed under partitions. While Redis Cluster does not feature strong consistency, it is designed in order to minimize write loss under some very common failure modes, and to contain data loss within a given max window under other failure modes.

To understand how it works and the failure modes is simple because the way Redis Cluster itself works is simple to understand and predict. The system is just composed of different master instances handling a subset of keys each. Every master has from 0 to N replicas. In the specific case of the simulation every master had just one replica.

The most important aspect regarding safety and consistency of data is the failover procedure, that is executed as follows:

* A master must be detected as failing, according to the configured “node timeout”. I used 500 milliseconds as node timeout. However a single node cannot start a failover if it just detects a master is down. It must receive acknowledgements from the majority of the master nodes in order to flag the node as failing.
* Every slave that flagged a node as failing will try to be elected to perform the failover. Here we use the Raft protocol election step, so that only a single slave will be able to get elected for a given epoch. The epoch will be used in order to version the new configuration of the cluster for the set of slots served by the old master.

Once a slave performs the failover it reclaims the slots served by its master, and propagates the information ASAP. Other nodes that have an old configuration are updated by the cluster at a latter time if they were not reachable when the failover happened.

Since the replication is asynchronous, and when a master fails we pick a slave that may not have all the master data, there are obvious failure modes where writes are lost, however Redis Cluster try to do things in order to avoid situations where, for example, a client is writing forever to a master that is partitioned away and was already failed over in the majority side.

So this are the main precautions used by Redis Cluster to limit lost writes:

1) Not every slave is a candidate for election. If a slaves detects its data is too old, it will not try to get elected. This in practical terms means that the cluster does not recover in the case where none of the slaves of a master are able to talk with the master for a long time, the master fails, the slave are available but have very stale data.
2) If a master is isolated in the minority side of the cluster, that means, it senses the majority of the other masters are not reachable, it stops accepting writes.

There are still things to improve in the heuristics Redis Cluster uses to limit data loss, for example currently it does not use the replication offset in order to give an advantage to the slave with the most fresh version of data, but only the “disconnection time” from the master. This will be implemented in the next days.

However the point was to test how these mechanisms worked in the practice, and also to have a way to measure if further improvements will lead to less data loss.

So this is the results obtained in this first test:

* 1000 failovers
* 8.5 million writes performed by each client
* The system lost 2000 writes.
* The system retained 800 not acknowledged writes.

The amount of lost writes could appears to be extremely low considered the number of failovers performed. However note that the test program ran by the client was conceived to write to different keys so it was very easy when partitioned into a minority of masters for the client to hit an hash slot not served by the reachable masters. This resulted into waiting for the timeout to occur before of the next write. However writing to multiple keys is actually the most common case of real clients.

Impressions
===

Running this test was pretty interesting from the point of view of the paradigm shift from Redis to Redis Cluster.
When I started the test there were the bugs mentioned above still to fix, so instances crashed from time to time, still the client was almost always able to write (unless there was a partition that resulted into the cluster not being available). This is an obvious result of running a cluster, but as I’m used to see a different availability patter with what is currently the norm with Redis, this was pretty interesting to touch first hand.

Another positive result was that the system worked as expected in many ways, for example the nodes always agreed about the configuration when the partitions healed, there was never a time in which I saw no partitions and the client not able to reach all the hash slots and reporting errors.

The failure detection appeared to be robust enough, especially the computer connected with a very high latency network, from time to time appeared to be down to a single node, but that was not enough to get the agreement from the other nodes, avoiding a number of useless failover procedures.

At the same time I noticed a number of little issues that must be fixed. For example at some point there was a power outage and the router rebooted, causing many nodes to change address. There is a built-in feature in Redis Cluster so that the cluster reconfigures itself automatically with the new addresses as long as there is a node that did not changed address, assuming every node can reach it.
This system worked only half-way, and I noticed that indeed the implementation was not yet complete.

Future work
===

This is just an initial test, and this and other tests will require to be the norm in the process of testing Redis Cluster.
The first step will be to create a Redis Cluster testing environment that will be shipped with the system and that the user can run, so it is possible that the cluster will be able to support a feature to simulate partitions easily.

Another thing that is worthwhile with the current test setup using partitions.tcl is the ability of the test client, and of Partitions.tcl itself, to log events. For example with a log of partitions and data loss events that has accurate enough timestamps it is possible to correlate data loss events with partitions setups.

If you are interested in playing with Redis Cluster you can follow the Redis Cluster tutorial here: http://redis.io/topics/cluster-tutorial Comments

推荐阅读

git
Linux 正则表达式基础及使用注意事项

本文介绍了Linux系统中正则表达式的基础知识，包括正则表达式的简介、字符分类、普通字符和元字符的区别，以及在学习过程中需要注意的事项。同时提醒读者要注意正则表达式与通配符的区别，并给出了使用正则表达式时的一些建议。本文适合初学者了解Linux系统中的正则表达式，并提供了学习的参考资料。 ... [详细]

蜡笔小新 2023-12-13 14:24:45
python
学习SLAM的女生，很酷

本文介绍了学习SLAM的女生的故事，她们选择SLAM作为研究方向，面临各种学习挑战，但坚持不懈，最终获得成功。文章鼓励未来想走科研道路的女生勇敢追求自己的梦想，同时提到了一位正在英国攻读硕士学位的女生与SLAM结缘的经历。 ... [详细]

蜡笔小新 2023-12-14 17:55:18
python
云原生边缘计算之KubeEdge简介及功能特点

本文介绍了云原生边缘计算中的KubeEdge系统，该系统是一个开源系统，用于将容器化应用程序编排功能扩展到Edge的主机。它基于Kubernetes构建，并为网络应用程序提供基础架构支持。同时，KubeEdge具有离线模式、基于Kubernetes的节点、群集、应用程序和设备管理、资源优化等特点。此外，KubeEdge还支持跨平台工作，在私有、公共和混合云中都可以运行。同时，KubeEdge还提供数据管理和数据分析管道引擎的支持。最后，本文还介绍了KubeEdge系统生成证书的方法。 ... [详细]

蜡笔小新 2023-12-14 16:49:01
python
伊振华作品 | 沈阳市智慧城市运行管理中心的设计与建设

本文介绍了设计师伊振华受邀参与沈阳市智慧城市运行管理中心项目的整体设计，并以数字赋能和创新驱动高质量发展的理念，建设了集成、智慧、高效的一体化城市综合管理平台，促进了城市的数字化转型。该中心被称为当代城市的智能心脏，为沈阳市的智慧城市建设做出了重要贡献。 ... [详细]

蜡笔小新 2023-12-14 16:35:39
python
Centos7.6安装Gitlab教程及注意事项

本文介绍了在Centos7.6系统下安装Gitlab的详细教程，并提供了一些注意事项。教程包括查看系统版本、安装必要的软件包、配置防火墙等步骤。同时，还强调了使用阿里云服务器时的特殊配置需求，以及建议至少4GB的可用RAM来运行GitLab。 ... [详细]

蜡笔小新 2023-12-14 14:01:06
python
[译]技术公司十年经验的职场生涯回顾

本文是一位在技术公司工作十年的职场人士对自己职业生涯的总结回顾。她的职业规划与众不同，令人深思又有趣。其中涉及到的内容有机器学习、创新创业以及引用了女性主义者在TED演讲中的部分讲义。文章表达了对职业生涯的愿望和希望，认为人类有能力不断改善自己。 ... [详细]

蜡笔小新 2023-12-14 11:31:05
select
图解redis的持久化存储机制RDB和AOF的原理和优缺点

本文通过图解的方式介绍了redis的持久化存储机制RDB和AOF的原理和优缺点。RDB是将redis内存中的数据保存为快照文件，恢复速度较快但不支持拉链式快照。AOF是将操作日志保存到磁盘，实时存储数据但恢复速度较慢。文章详细分析了两种机制的优缺点，帮助读者更好地理解redis的持久化存储策略。 ... [详细]

蜡笔小新 2023-12-13 20:24:11
select
推荐系统遇上深度学习(十七）详解推荐系统中的常用评测指标

原创：石晓文小小挖掘机2018-06-18笔者是一个痴迷于挖掘数据中的价值的学习人，希望在平日的工作学习中，挖掘数据的价值， ... [详细]

蜡笔小新 2023-12-13 19:35:25
select
解决Cydia数据库错误：could not open file /var/lib/dpkg/status 的方法

本文介绍了解决iOS系统中Cydia数据库错误的方法。通过使用苹果电脑上的Impactor工具和NewTerm软件，以及ifunbox工具和终端命令，可以解决该问题。具体步骤包括下载所需工具、连接手机到电脑、安装NewTerm、下载ifunbox并注册Dropbox账号、下载并解压lib.zip文件、将lib文件夹拖入Books文件夹中，并将lib文件夹拷贝到/var/目录下。以上方法适用于已经越狱且出现Cydia数据库错误的iPhone手机。 ... [详细]

蜡笔小新 2023-12-13 19:02:44
select
sklearn数据集库中的常用数据集类型介绍

本文介绍了sklearn数据集库中常用的数据集类型，包括玩具数据集和样本生成器。其中详细介绍了波士顿房价数据集，包含了波士顿506处房屋的13种不同特征以及房屋价格，适用于回归任务。 ... [详细]

蜡笔小新 2023-12-13 17:45:15
select
如何通过全新应用内评价获取更多优质用户反馈？

Google Play推出全新的应用内评价API，帮助开发者获取更多优质用户反馈。用户每天在Google Play上发表数百万条评论，这有助于开发者了解用户喜好和改进需求。开发者可以选择在适当的时间请求用户撰写评论，以获得全面而有用的反馈。全新应用内评价功能让用户无需返回应用详情页面即可发表评论，提升用户体验。 ... [详细]

蜡笔小新 2023-12-13 17:23:03
select
Ubuntu 9.04中安装谷歌Chromium浏览器及使用体验[图文]

nsitionalENhttp:www.w3.orgTRxhtml1DTDxhtml1-transitional.dtd ... [详细]

蜡笔小新 2023-12-13 13:30:30
select
imx6ull开发板驱动MT7601U无线网卡的方法和步骤详解

本文详细介绍了在imx6ull开发板上驱动MT7601U无线网卡的方法和步骤。首先介绍了开发环境和硬件平台，然后说明了MT7601U驱动已经集成在linux内核的linux-4.x.x/drivers/net/wireless/mediatek/mt7601u文件中。接着介绍了移植mt7601u驱动的过程，包括编译内核和配置设备驱动。最后，列举了关键词和相关信息供读者参考。 ... [详细]

蜡笔小新 2023-12-13 12:34:44
select
ch3中可视化软件pangolin的安装步骤及注意事项

本文介绍了在ch3中安装可视化软件pangolin的步骤及注意事项。首先提供了pangolin的下载地址，并说明了下载后需要放到与虚拟机交互的文件夹地址。然后详细介绍了安装pangolin所需的依赖项，并提供了在终端进行安装的命令。最后给出了解压pangolin的步骤。 ... [详细]

蜡笔小新 2023-12-13 12:23:46
text
Support Paged.JS for automatic hugo resume> PDF conversion.

FeatureRequestIsyourfeaturerequestrelatedtoaproblem?Please ... [详细]

蜡笔小新 2023-12-13 11:52:05

捕风的水中龙_106

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章