Author: Random | Source: Internet | 2022-12-06 13:13
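The goal here is to keep the executable code of every running process in memory during memory pressure.
On Linux, I can cause high memory pressure almost instantly (within 1 second) and trigger the OOM killer inside a Qubes OS R4.0 Fedora 28 AppVM with 24000MB of maximum RAM, using stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 + 4000;}' (code from here).
EDIT4: Perhaps relevant, though I forgot to mention it: I have no swap enabled (that is, CONFIG_SWAP is not set).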
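The awk fragment in the command above selects the MemAvailable field and adds 4000 to it. The sketch below isolates that one step. It assumes the fragment reads /proc/meminfo (where MemAvailable is reported in kB); the helper name mem_available_plus and its optional file argument are made up for illustration and are not part of stress or of the original command.

```shell
#!/bin/sh
# mem_available_plus EXTRA_KB [MEMINFO_FILE]
# Print MemAvailable (in kB) plus EXTRA_KB.
# MEMINFO_FILE defaults to /proc/meminfo (an assumption about the original
# command, whose tail is not shown in full above).
mem_available_plus() {
    extra=$1
    file=${2:-/proc/meminfo}
    awk -v extra="$extra" '/^MemAvailable:/ { printf "%d\n", $2 + extra }' "$file"
}

# Example: how many kB a stress run would have to allocate to exceed
# what is currently available by 4000 kB.
if [ -r /proc/meminfo ]; then
    mem_available_plus 4000
fi
```

Asking for slightly more than MemAvailable is what pushes the system over the edge: with no swap, the allocation cannot be satisfied and the OOM killer has to step in.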
dmesg reports:
[ 867.746593] Mem-Info:
[ 867.746607] active_anon:1390927 inactive_anon:4670 isolated_anon:0
active_file:94 inactive_file:72 isolated_file:0
unevictable:13868 dirty:0 writeback:0 unstable:0
slab_reclaimable:5906 slab_unreclaimable:12919
mapped:1335 shmem:4805 pagetables:5126 bounce:0
free:40680 free_pcp:978 free_cma:0
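The interesting part is active_file:94 inactive_file:72
These values are very low. Note that the Mem-Info dump counts pages rather than kilobytes, so with 4 KiB pages this is only a few hundred KiB of file cache.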
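The counters in a Mem-Info dump are page counts, so reading them in familiar units needs a multiplication by the page size. Here is a minimal sketch, assuming 4 KiB pages for the example calls; pages_to_kib is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# pages_to_kib PAGES [PAGE_SIZE_BYTES]
# Convert a page count, as printed in the kernel's Mem-Info dump, to KiB.
# PAGE_SIZE_BYTES defaults to the page size of the running system.
pages_to_kib() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size / 1024 ))
}

# The two file LRU counters from the dump above, assuming 4 KiB pages.
pages_to_kib 94 4096    # active_file   -> 376
pages_to_kib 72 4096    # inactive_file -> 288
```

Either way the conclusion is the same: almost no file-backed pages, and therefore almost no executable code, remain cached at the moment the OOM killer fires.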
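The problem here is that, during that period of memory pressure, executable code is being re-read from disk, which causes disk thrashing and in turn a frozen OS. (In the case above, though, it lasts less than 1 second.)
I see an interesting piece of code in the kernel, in mm/vmscan.c: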
if (page_referenced(page, 0, sc->target_mem_cgroup,
                    &vm_flags)) {
        nr_rotated += hpage_nr_pages(page);
        /*
         * Identify referenced, file-backed active pages and
         * give them one more trip around the active list. So
         * that executable code get better chances to stay in
         * memory under moderate memory pressure. Anon pages
         * are not likely to be evicted by use-once streaming
         * IO, plus JVM can create lots of anon VM_EXEC pages,
         * so we ignore them here.
         */
        if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
                list_add(&page->lru, &l_active);
                continue;
        }
}
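I think that if someone could point out how to change this so that, instead of "give them one more trip around the active list", we "give them infinite trips around the active list", the job would be done. Or maybe there is some other way?
I can patch and test a custom kernel. I just lack the know-how about what to change in the code so that active executable code always stays in memory (which, I believe, would in effect avoid disk thrashing).
EDIT: Here is what I have working so far (applied on top of kernel 4.18.5):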
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..7636498 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -208,7 +208,7 @@ enum lru_list {
 #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
[...]
lru_lock);
@@ -2345,7 +2345,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
                          sc->priority == DEF_PRIORITY);
         blk_start_plug(&plug);
-        while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
+        while (nr[LRU_INACTIVE_ANON] || //nr[LRU_ACTIVE_FILE] ||
                                         nr[LRU_INACTIVE_FILE]) {
                 unsigned long nr_anon, nr_file, percentage;
                 unsigned long nr_scanned;
@@ -2372,7 +2372,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
                  * stop reclaiming one LRU and reduce the amount scanning
                  * proportional to the original scan target.
                  */
-                nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+                nr_file = nr[LRU_INACTIVE_FILE] //+ nr[LRU_ACTIVE_FILE]
+                        ;
                 nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
                 /*
@@ -2391,7 +2392,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
                         percentage = nr_anon * 100 / scan_target;
                 } else {
                         unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
-                                                targets[LRU_ACTIVE_FILE] + 1;
+                                                //targets[LRU_ACTIVE_FILE] +
+                                                1;
                         lru = LRU_FILE;
                         percentage = nr_file * 100 / scan_target;
                 }
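The patch can also be seen here on GitHub, because in the code above the tabs got converted to spaces! (mirror1, mirror2)
I have tested the above patch (now with 4000MB of maximum RAM, that is 20G less than before!), even with a Firefox compilation that was known to disk-thrash the OS into a permanent freeze, and the freeze no longer happens (the oom-killer kills the offending processes almost instantly). The same holds for the stress command above, which now yields: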
[ 745.830511] Mem-Info:
[ 745.830521] active_anon:855546 inactive_anon:20453 isolated_anon:0
active_file:26925 inactive_file:76 isolated_file:0
unevictable:10652 dirty:0 writeback:0 unstable:0
slab_reclaimable:26975 slab_unreclaimable:13525
mapped:24238 shmem:20456 pagetables:4028 bounce:0
free:14935 free_pcp:177 free_cma:0
That is active_file:26925 inactive_file:76. Since Mem-Info counts pages, this is roughly 105 MiB of active file pages, assuming 4 KiB pages...
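So, I don't know how good this is. Am I keeping all active files in memory instead of just executable files? During the Firefox compilation I had about 500 MB of Active(file) (EDIT2: but that is according to cat /proc/meminfo|grep -F -- 'Active(file)', which shows a different value than the active_file: field from dmesg above, most likely because /proc/meminfo reports kB while the dmesg dump reports pages!), which makes me doubt it was only exes/libs...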
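The two sources use different units: /proc/meminfo reports Active(file) in kB, while the Mem-Info dump in dmesg reports active_file in pages. To compare them directly, one option is to convert the /proc/meminfo value to pages. A rough sketch follows; active_file_pages is a hypothetical helper, and the page size falls back to whatever getconf PAGESIZE reports.

```shell
#!/bin/sh
# active_file_pages [MEMINFO_FILE] [PAGE_SIZE_BYTES]
# Print the Active(file) value of a meminfo-style file as a page count,
# which is the unit used by active_file: in the dmesg Mem-Info dump.
active_file_pages() {
    file=${1:-/proc/meminfo}
    page_size=${2:-$(getconf PAGESIZE)}
    awk -v ps="$page_size" \
        '/^Active\(file\):/ { printf "%d\n", $2 * 1024 / ps }' "$file"
}

# Example: print the current value in pages, for side-by-side comparison
# with the active_file: field of a dump taken at about the same time.
if [ -r /proc/meminfo ]; then
    active_file_pages
fi
```

The two numbers will still not match exactly unless they are sampled at the same moment, because the file LRU lists change quickly under memory pressure.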
Maybe someone can suggest how to keep ONLY executable code? (If that is not what is already happening.)
Thoughts?
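EDIT3: With the above patch, it seems it may be necessary to (periodically?) run sudo sysctl vm.drop_caches=1 to free some stale memory(?). If I run stress after a Firefox compilation, I get active_file:142281 inactive_file:0 isolated_file:0 (142281 pages, about 556 MiB assuming 4 KiB pages). If I then drop the file cache (another way: echo 1|sudo tee /proc/sys/vm/drop_caches) and run stress again, I get active_file:22233 inactive_file:160 isolated_file:0 (about 87 MiB). I'm unsure...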
Results without the above patch: here
Results with the above patch: here