我正在调试客户在其生产系统上遇到的系统负载问题,并且他们已经创建了一个模拟负载的测试应用程序来重现问题:
在这个特定的工作量中,编码器所做的一件事就是:
while(1) initialize inotify watch a directory for events receive event process event remove watch close inotify fd
奇怪的是,高系统负载来自close()
inotify fd:
inotify_init() = 4 <0.000020> inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.059537> write(1, "Child [032] sleeping\n", 21) = 21 <0.000012> read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.b8tlfT\0\0", 512) = 32 <0.231012> inotify_rm_watch(4, 1) = 0 <0.000044> close(4) = 0 <0.702530> open("/mnt/tmp/msys_sim/QUEUES/Child_032", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000031> lseek(4, 0, SEEK_SET) = 0 <0.000010> getdents(4, /* 3 entries */, 32768) = 88 <0.000048> getdents(4, /* 0 entries */, 32768) = 0 <0.000009> write(1, "Child [032] dequeue [SrcFile.b8t"..., 37) = 37 <0.000011> unlink("/mnt/tmp/msys_sim/QUEUES/Child_032/SrcFile.b8tlfT") = 0 <0.059298> lseek(4, 0, SEEK_SET) = 0 <0.000011> getdents(4, /* 2 entries */, 32768) = 48 <0.000038> getdents(4, /* 0 entries */, 32768) = 0 <0.000009> close(4) = 0 <0.000012> inotify_init() = 4 <0.000020> inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385> write(1, "Child [032] sleeping\n", 21) = 21 <0.000903> read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423> inotify_rm_watch(4, 1) = 0 <0.000012> close(4) = 0 <0.528736>
什么可能导致close()调用花费如此大量的时间?我可以确定两件事:
关闭和重新初始化inotify的每一次
有256K文件(平面),/mnt/tmp/msys_sim/SOURCES
并且特定文件/mnt/tmp/msys_sim/QUEUES/Child_032
硬链接到该目录中的文件.但是SOURCES永远不会被上述过程打开
是使用inotify错误的神器吗?我能指出什么,说"你在做什么是错的!"?
输出perf top
(我一直在找这个!)
Events: 109K cycles 70.01% [kernel] [k] _spin_lock 24.30% [kernel] [k] __fsnotify_update_child_dentry_flags 2.24% [kernel] [k] _spin_unlock_irqrestore 0.64% [kernel] [k] __do_softirq 0.60% [kernel] [k] __rcu_process_callbacks 0.46% [kernel] [k] run_timer_softirq 0.40% [kernel] [k] rcu_process_gp_end
甜!我怀疑某个地方有一个螺旋锁,当发生这种情况时整个系统会非常潜伏.