
pthread mutex vs atomic ops in Solaris

I was doing some tests with a simple program measuring the performance of a simple atomic increment on a 64 bit value using an atomic_add_64 vs a mutex lock approach. What is puzzling me is the atomic_add is slower than the mutex lock by a factor of 2.

EDIT!!! I've done some more testing. It looks like atomics are faster than the mutex and scale up to 8 concurrent threads. After that the performance of atomics degrades significantly.

The platform I've tested is:

SunOS 5.10 Generic_141444-09 sun4u sparc SUNW,Sun-Fire-V490

CC: Sun C++ 5.9 SunOS_sparc Patch 124863-03 2008/03/12

The program is quite simple:

#include <pthread.h>
#include <atomic.h>
#include <stdint.h>
#include <stdio.h>

uint64_t        g_Loops = 1000000;
volatile uint64_t       g_Counter = 0;
volatile uint32_t       g_Threads = 20;

pthread_mutex_t g_Mutex;
pthread_mutex_t g_CondMutex;
pthread_cond_t  g_Condition;

void LockMutex() 
{ 
  pthread_mutex_lock(&g_Mutex); 
}

void UnlockMutex() 
{ 
   pthread_mutex_unlock(&g_Mutex); 
}

void InitCond()
{
   pthread_mutex_init(&g_CondMutex, 0);
   pthread_cond_init(&g_Condition, 0);
}

void SignalThreadEnded()
{
   pthread_mutex_lock(&g_CondMutex);
   --g_Threads;
   pthread_cond_signal(&g_Condition);
   pthread_mutex_unlock(&g_CondMutex);
}

void* ThreadFuncMutex(void* arg)
{
   uint64_t counter = g_Loops;
   while(counter--)
   {
      LockMutex();
      ++g_Counter;
      UnlockMutex();
   }
   SignalThreadEnded();
   return 0;
}

void* ThreadFuncAtomic(void* arg)
{
   uint64_t counter = g_Loops;
   while(counter--)
   {
      atomic_add_64(&g_Counter, 1);
   }
   SignalThreadEnded();
   return 0;
}


int main(int argc, char** argv)
{
   pthread_mutex_init(&g_Mutex, 0);
   InitCond();
   bool bMutexRun = true;
   if(argc > 1)
   {
      bMutexRun = false;
      printf("Atomic run!\n");
   }
   else
        printf("Mutex run!\n");

   // start threads
   uint32_t threads = g_Threads;
   while(threads--)
   {
      pthread_t thr;
      if(bMutexRun)
         pthread_create(&thr, 0,ThreadFuncMutex, 0);
      else
         pthread_create(&thr, 0,ThreadFuncAtomic, 0);
   }
   pthread_mutex_lock(&g_CondMutex);
   while(g_Threads)
   {
      pthread_cond_wait(&g_Condition, &g_CondMutex);
      printf("Threads to go %d\n", g_Threads);
   }
   printf("DONE! g_Counter=%ld\n", (long)g_Counter);
}

The results of a test run on our box are:

$ CC -o atomictest atomictest.C
$ time ./atomictest
Mutex run!
Threads to go 19
...
Threads to go 0
DONE! g_Counter=20000000

real    0m15.684s
user    0m52.748s
sys     0m0.396s

$ time ./atomictest 1
Atomic run!
Threads to go 19
...
Threads to go 0
DONE! g_Counter=20000000

real    0m24.442s
user    3m14.496s
sys     0m0.068s

Did you run into this type of performance difference on Solaris? Any ideas why this happens?

On Linux the same code (using the gcc __sync_fetch_and_add) yields a 5-fold performance improvement over the mutex version.

Thanks, Octav

1 Answer

#1

You have to be careful what is happening here.

1) It takes significant time to create a thread. Thus, it's likely that not all the threads are executing simultaneously. As evidence, I took your code, removed the mutex lock, and got the correct answer every time I ran it. This means that none of the threads were executing at the same time! You should not count thread creation/destruction time in your test. You should wait until all threads are created and running before you start the test.

2) Your test isn't fair. It creates artificially high lock contention, and for whatever reason the atomic add-and-fetch suffers in that situation. In real life, you would do some work in the thread; once you add even a little bit of work, the atomic ops perform a lot better, because the chance of two threads hitting the counter at the same moment drops significantly. When there is no contention, the atomic op has lower overhead than the mutex.

3) Number of threads. The fewer threads running, the lower the contention; this is why fewer threads do better for the atomic in this test. Your 8-thread number might be the number of simultaneous threads your system supports, or it might not, because your test was so skewed towards contention. It would seem to me that your test should scale up to the number of simultaneous threads allowed and then plateau. One thing I cannot figure out is why, when the number of threads gets higher than the number the system can run simultaneously, we don't see evidence of the situation where the mutex is left locked while its holder sleeps. Maybe we do; I just can't see it happening.

Bottom line, the atomics are a lot faster in most real life situations. They are not very good when you have to hold a lock for a long time...something you should avoid anyway (well in my opinion at least!)

I changed your code so you can test with no work, barely any work, and a little more work as well as change the # of threads.

6sm = 6 threads, barely any work, mutex
6s  = 6 threads, barely any work, atomic

Use a capital S to get more work, and no letter to get no work.

These results show that with 10 threads, the amount of work affects how much faster atomics are. In the first case, there is no work, and the atomics are barely faster. Add a little work and the gap doubles to 6 sec, and a lot of work and it almost gets to 10 sec.

(2) /dev_tools/Users/c698174/temp/atomic 
[c698174@shldvgfas007] $ t=10; a.out $t ; a.out "$t"m
ATOMIC FAST g_Counter=10000000 13.6520 s
MUTEX  FAST g_Counter=10000000 15.2760 s

(2) /dev_tools/Users/c698174/temp/atomic 
[c698174@shldvgfas007] $ t=10s; a.out $t ; a.out "$t"m
ATOMIC slow g_Counter=10000000 11.4957 s
MUTEX  slow g_Counter=10000000 17.9419 s

(2) /dev_tools/Users/c698174/temp/atomic 
[c698174@shldvgfas007] $ t=10S; a.out $t ; a.out "$t"m
ATOMIC SLOW g_Counter=10000000 14.7108 s
MUTEX  SLOW g_Counter=10000000 23.8762 s

20 threads, atomics still better, but by a smaller margin. No work, they are almost the same speed. With a lot of work, atomics take the lead again.

(2) /dev_tools/Users/c698174/temp/atomic 
[c698174@shldvgfas007] $ t=20; a.out $t ; a.out "$t"m
ATOMIC FAST g_Counter=20000000 27.6267 s
MUTEX  FAST g_Counter=20000000 30.5569 s

(2) /dev_tools/Users/c698174/temp/atomic 
[c698174@shldvgfas007] $ t=20S; a.out $t ; a.out "$t"m
ATOMIC SLOW g_Counter=20000000 35.3514 s
MUTEX  SLOW g_Counter=20000000 48.7594 s

2 threads. Atomics dominate.

(2) /dev_tools/Users/c698174/temp/atomic 
[c698174@shldvgfas007] $ t=2S; a.out $t ; a.out "$t"m
ATOMIC SLOW g_Counter=2000000 0.6007 s
MUTEX  SLOW g_Counter=2000000 1.4966 s

Here is the code (redhat linux, using gcc atomics):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

volatile uint64_t __attribute__((aligned (64))) g_Loops = 1000000 ;
volatile uint64_t __attribute__((aligned (64))) g_Counter = 0;
volatile uint32_t __attribute__((aligned (64))) g_Threads = 7; 
volatile uint32_t __attribute__((aligned (64))) g_Active = 0;
volatile uint32_t __attribute__((aligned (64))) g_fGo = 0;
int g_fSlow = 0;

#define true 1
#define false 0
#define NANOSEC(t) (1000000000ULL * (t).tv_sec + (t).tv_nsec)

pthread_mutex_t g_Mutex;
pthread_mutex_t g_CondMutex;
pthread_cond_t  g_Condition;

void LockMutex() 
{ 
  pthread_mutex_lock(&g_Mutex); 
}

void UnlockMutex() 
{ 
   pthread_mutex_unlock(&g_Mutex); 
}

void Start(struct timespec *pT)
{
   int cActive = __sync_add_and_fetch(&g_Active, 1);
   while(!g_fGo) {} 
   clock_gettime(CLOCK_THREAD_CPUTIME_ID, pT);
}

uint64_t End(struct timespec *pT)
{
   struct timespec T;
   int cActive = __sync_sub_and_fetch(&g_Active, 1);
   clock_gettime(CLOCK_THREAD_CPUTIME_ID, &T);
   return NANOSEC(T) - NANOSEC(*pT);
}
void Work(double *x, double z)
{
      *x += z;
      *x /= 27.6;
      if ((uint64_t)(*x + .5) - (uint64_t)*x != 0)
        *x += .7;
}
void* ThreadFuncMutex(void* arg)
{
   struct timespec T;
   uint64_t counter = g_Loops;
   double x = 0, z = 0;
   int fSlow = g_fSlow;

   Start(&T);
   if (!fSlow) {
     while(counter--) {
        LockMutex();
        ++g_Counter;
        UnlockMutex();
     }
   } else {
     while(counter--) {
        if (fSlow==2) Work(&x, z);
        LockMutex();
        ++g_Counter;
        z = g_Counter;
        UnlockMutex();
     }
   }
   *(uint64_t*)arg = End(&T);
   return (void*)(int)x;
}

void* ThreadFuncAtomic(void* arg)
{
   struct timespec T;
   uint64_t counter = g_Loops;
   double x = 0, z = 0;
   int fSlow = g_fSlow;

   Start(&T);
   if (!fSlow) {
     while(counter--) {
        __sync_add_and_fetch(&g_Counter, 1);
     }
   } else {
     while(counter--) {
        if (fSlow==2) Work(&x, z);
        z = __sync_add_and_fetch(&g_Counter, 1);
     }
   }
   *(uint64_t*)arg = End(&T);
   return (void*)(int)x;
}


int main(int argc, char** argv)
{
   int i;
   int bMutexRun = strchr(argv[1], 'm') != NULL;
   pthread_t thr[1000];
   uint64_t aT[1000];
   g_Threads = atoi(argv[1]);
   g_fSlow = (strchr(argv[1], 's') != NULL) ? 1 : ((strchr(argv[1], 'S') != NULL) ? 2 : 0);

   // start threads
   pthread_mutex_init(&g_Mutex, 0);
   for (i = 0; i < g_Threads; ++i)
      pthread_create(&thr[i], 0, bMutexRun ? ThreadFuncMutex : ThreadFuncAtomic, &aT[i]);

   // wait until every thread is spinning in Start(), then release them together
   while (g_Active != g_Threads) {}
   g_fGo = 1;

   // join the threads and average their per-thread CPU time
   uint64_t nsTotal = 0;
   for (i = 0; i < g_Threads; ++i) {
      pthread_join(thr[i], 0);
      nsTotal += aT[i];
   }
   printf("%s %s g_Counter=%llu %.4f s\n",
          bMutexRun ? "MUTEX " : "ATOMIC",
          g_fSlow == 0 ? "FAST" : (g_fSlow == 1 ? "slow" : "SLOW"),
          (unsigned long long)g_Counter,
          nsTotal / 1e9 / g_Threads);
   return 0;
}
