Problem Description

While using haproxy to drive the Pod proxy, in a few k8s clusters the haproxy container (haproxytech/haproxy-debian:2.7) was OOM-killed within seconds of starting, and kept getting killed even with the memory limit raised as high as 15GB. The same image and configuration run normally in other Rocky Linux 9 clusters.

Key Symptoms

Abnormal memory usage

Memory limit    Symptom
2GB             OOM
8GB             OOM
15GB            OOM

No matter how large the limit was configured, all of it was swallowed. The kernel log at the time:

Feb 13 11:57:07 unit-z7p4-worker-02 kernel: memory: usage 8388608kB, limit 8388608kB, failcnt 110
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: swap: usage 0kB, limit 0kB, failcnt 0
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod70d1da33_ec83_422a_8490_81a4cd60785e.slice/cri-containerd-8c13b4b91f76026cc10b09f8d15cf0ba7cf6cfe1d5a3e4833e7e660c3264c991.scope:
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: anon 8572551168
                                              file 4096
                                              kernel 17379328
                                              kernel_stack 16384
                                              pagetables 16850944
                                              sec_pagetables 0
                                              percpu 160
                                              sock 0
                                              vmalloc 0
                                              shmem 4096
                                              zswap 0
                                              zswapped 0
                                              file_mapped 4096
                                              file_dirty 0
                                              file_writeback 0
                                              swapcached 0
                                              anon_thp 8562671616
                                              file_thp 0
                                              shmem_thp 0
                                              inactive_anon 4096
                                              active_anon 8572551168
                                              inactive_file 0
                                              active_file 0
                                              unevictable 0
                                              slab_reclaimable 409080
                                              slab_unreclaimable 74336
                                              slab 483416
                                              workingset_refault_anon 0
                                              workingset_refault_file 0
                                              workingset_activate_anon 0
                                              workingset_activate_file 0
                                              workingset_restore_anon 0
                                              workingset_restore_file 0
                                              workingset_nodereclaim 0
                                              pgscan 0
                                              pgsteal 0
                                              pgscan_kswapd 0
                                              pgscan_direct 0
                                              pgscan_khugepaged 0
                                              pgsteal_kswapd 0
                                              pgsteal_direct 0
                                              pgsteal_khugepaged 0
                                              pgfault 6082
                                              pgmajfault 0
                                              pgrefill 0
                                              pgactivate 0
                                              pgdeactivate 0
                                              pglazyfree 0
                                              pglazyfreed 0
                                              zswpin 0
                                              zswpout 0
                                              thp_fault_alloc 4088
                                              thp_collapse_alloc 0
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: Tasks state (memory values in pages):
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: [3503610]     0 3503610 92289723  2095555 16863232        0           992 haproxy
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-8c13b4b91f76026cc10b09f8d15cf0ba7cf6cfe1d5a3e4833e7e660c3264c991.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod70d1da33_ec83_422a_8490_81a4cd60785e.slice/cri-containerd-8c13b4b91f76026cc10b09f8d15cf0ba7cf6cfe1d5a3e4833e7e660c3264c991.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod70d1da33_ec83_422a_8490_81a4cd60785e.slice/cri-containerd-8c13b4b91f76026cc10b09f8d15cf0ba7cf6cfe1d5a3e4833e7e660c3264c991.scope,task=haproxy,pid=3503610,uid=0
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: Memory cgroup out of memory: Killed process 3503610 (haproxy) total-vm:369158892kB, anon-rss:8371596kB, file-rss:10624kB, shmem-rss:0kB, UID:0 pgtables:16468kB oom_score_adj:992
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod70d1da33_ec83_422a_8490_81a4cd60785e.slice/cri-containerd-8c13b4b91f76026cc10b09f8d15cf0ba7cf6cfe1d5a3e4833e7e660c3264c991.scope are going to be killed due to memory.oom.group set
Feb 13 11:57:07 unit-z7p4-worker-02 kernel: Memory cgroup out of memory: Killed process 3503610 (haproxy) total-vm:369158892kB, anon-rss:8371596kB, file-rss:10624kB, shmem-rss:0kB, UID:0 pgtables:16468kB oom_score_adj:992
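The values in the "Tasks state" line are in 4KiB pages, and converting them makes the anomaly obvious: the process holds roughly 352GiB of virtual address space while resident memory has simply filled the 8GB cgroup limit. A quick sanity check of that arithmetic, with the numbers copied from the log above:

```shell
# "Tasks state" values are 4KiB pages (see the kernel log above).
# total_vm 92289723 pages -> virtual address space in GiB:
echo $((92289723 * 4 / 1024 / 1024))   # -> 352
# rss 2095555 pages -> resident memory in GiB at the 8GB limit:
echo $((2095555 * 4 / 1024 / 1024))    # -> 7
```

352GiB of virtual memory against an 8GB limit is the first hint that the allocator, not the workload, is deciding how much address space to reserve.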

Troubleshooting Process

In the original design, the controller fetched k8s Pod information and built the full mapping at startup, so we suspected the burst of writes at startup was causing a transient memory spike. After refactoring the code to process these asynchronously [1], the problem remained.

Mitigations attempted:

  1. Added a startup delay for haproxy
  2. Raised the memory limit, repeatedly
  3. Checked OS version / kernel version / k8s version / kubelet slice configuration: all found to be fine, and identical to the other k8s clusters where the same setup runs normally.
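The per-node parity checks for item 3 were along these lines (a sketch; the exact commands were not recorded). Given how much of the cgroup's memory shows up as anon_thp in the log above, the node's transparent-huge-page mode is also worth capturing:

```shell
# Node parity checks (sketch): these matched between failing and healthy clusters
uname -r                    # kernel version
head -n 2 /etc/os-release   # OS name and version
# THP mode is worth recording too, since anon_thp dominates the cgroup stats
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true
```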

The Pod spec:

spec:
  containers:
  - image: hub.rancher8888.com/base/haproxy-debian:2.7
    imagePullPolicy: IfNotPresent
    name: haproxy-proxier-ints
    ports:
    - containerPort: 8404
      hostPort: 8404
      protocol: TCP
    - containerPort: 5555
      hostPort: 5555
      protocol: TCP
    resources:
      limits:
        cpu: 200m
        memory: 1Gi
      requests:
        cpu: 200m
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-gsgfp
      readOnly: true
  - args:
    - |
      echo "⏳ waiting for haproxy to start..."; sleep 30;
      echo "✅ starting pod-proxier";
      exec /apps/pod-proxier-gateway \
        --v=5 \
        --enable-v2=true \
        --port-name=debug \
        --port-range-start=13000 \
        --port-range-end=13500 \
        --check-timeout=1800 \
        --check-interval=1800 \
        --resync-time=30 \
        --allowed-namespaces=scanner-test
    command:
    - /bin/sh
    - -c
    image: cylonchau/proxier:v0.1.2-1.30
    imagePullPolicy: IfNotPresent
    name: pod-proxier
    ports:
    - containerPort: 8848
      hostPort: 8848
      protocol: TCP
    - containerPort: 3343
      hostPort: 3343
      protocol: TCP
  hostNetwork: true

Raising the limit from 1GB to 2GB: still OOM-killed within seconds.

Feb 13 23:31:47 unit-z7p4-worker-02 kernel: memory: usage 2097152kB, limit 2097152kB, failcnt 115
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: swap: usage 0kB, limit 0kB, failcnt 0
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod277b4f89_39d1_453d_ad03_96797120509d.slice/cri-containerd-1bcdf7318e6c0ed6d20e95c5dcda608a541eec5b4265be15b27616340897628e.scope:
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: anon 2142683136
                                              file 4096
                                              kernel 4796416
                                              kernel_stack 16384
                                              pagetables 4268032
                                              sec_pagetables 0
                                              percpu 160
                                              sock 0
                                              vmalloc 0
                                              shmem 4096
                                              zswap 0
                                              zswapped 0
                                              file_mapped 4096
                                              file_dirty 0
                                              file_writeback 0
                                              swapcached 0
                                              anon_thp 2132803584
                                              file_thp 0
                                              shmem_thp 0
                                              inactive_anon 4096
                                              active_anon 2142683136
                                              inactive_file 0
                                              active_file 0
                                              unevictable 0
                                              slab_reclaimable 406888
                                              slab_unreclaimable 74336
                                              slab 481224
                                              workingset_refault_anon 0
                                              workingset_refault_file 0
                                              workingset_activate_anon 0
                                              workingset_activate_file 0
                                              workingset_restore_anon 0
                                              workingset_restore_file 0
                                              workingset_nodereclaim 0
                                              pgscan 0
                                              pgsteal 0
                                              pgscan_kswapd 0
                                              pgscan_direct 0
                                              pgscan_khugepaged 0
                                              pgsteal_kswapd 0
                                              pgsteal_direct 0
                                              pgsteal_khugepaged 0
                                              pgfault 3015
                                              pgmajfault 0
                                              pgrefill 0
                                              pgactivate 0
                                              pgdeactivate 0
                                              pglazyfree 0
                                              pglazyfreed 0
                                              zswpin 0
                                              zswpout 0
                                              thp_fault_alloc 1021
                                              thp_collapse_alloc 0
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: Tasks state (memory values in pages):
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: [ 199483]     0 199483 92289723   525761  4280320        0           968 haproxy
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-1bcdf7318e6c0ed6d20e95c5dcda608a541eec5b4265be15b27616340897628e.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod277b4f89_39d1_453d_ad03_96797120509d.slice/cri-containerd-1bcdf7318e6c0ed6d20e95c5dcda608a541eec5b4265be15b27616340897628e.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod277b4f89_39d1_453d_ad03_96797120509d.slice/cri-containerd-1bcdf7318e6c0ed6d20e95c5dcda608a541eec5b4265be15b27616340897628e.scope,task=haproxy,pid=199483,uid=0
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: Memory cgroup out of memory: Killed process 199483 (haproxy) total-vm:369158892kB, anon-rss:2092292kB, file-rss:10752kB, shmem-rss:0kB, UID:0 pgtables:4180kB oom_score_adj:968
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod277b4f89_39d1_453d_ad03_96797120509d.slice/cri-containerd-1bcdf7318e6c0ed6d20e95c5dcda608a541eec5b4265be15b27616340897628e.scope are going to be killed due to memory.oom.group set
Feb 13 23:31:47 unit-z7p4-worker-02 kernel: Memory cgroup out of memory: Killed process 199483 (haproxy) total-vm:369158892kB, anon-rss:2092292kB, file-rss:10752kB, shmem-rss:0kB, UID:0 pgtables:4180kB oom_score_adj:968

Switching to another node and raising the memory to 15GB, the OOM still occurred, as the following log shows:

Feb 14 00:02:18 unit-z7p4-worker-03 kernel: memory: usage 15728640kB, limit 15728640kB, failcnt 123
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: swap: usage 0kB, limit 0kB, failcnt 0
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podae2e77b9_d51b_4ab6_b949_70a218e0e398.slice/cri-containerd-734f4fd2b1f5db19eea6ae0db91d4f8013fe9b61cb8ead273c5cf3b0b3d7ce01.scope:
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: anon 16074063872
                                              file 4096
                                              kernel 32059392
                                              kernel_stack 16384
                                              pagetables 31531008
                                              sec_pagetables 0
                                              percpu 160
                                              sock 0
                                              vmalloc 0
                                              shmem 4096
                                              zswap 0
                                              zswapped 0
                                              file_mapped 4096
                                              file_dirty 0
                                              file_writeback 0
                                              swapcached 0
                                              anon_thp 16064184320
                                              file_thp 0
                                              shmem_thp 0
                                              inactive_anon 4096
                                              active_anon 16074063872
                                              inactive_file 0
                                              active_file 0
                                              unevictable 0
                                              slab_reclaimable 407984
                                              slab_unreclaimable 74336
                                              slab 482320
                                              workingset_refault_anon 0
                                              workingset_refault_file 0
                                              workingset_activate_anon 0
                                              workingset_activate_file 0
                                              workingset_restore_anon 0
                                              workingset_restore_file 0
                                              workingset_nodereclaim 0
                                              pgscan 0
                                              pgsteal 0
                                              pgscan_kswapd 0
                                              pgscan_direct 0
                                              pgscan_khugepaged 0
                                              pgsteal_kswapd 0
                                              pgsteal_direct 0
                                              pgsteal_khugepaged 0
                                              pgfault 9763
                                              pgmajfault 0
                                              pgrefill 0
                                              pgactivate 0
                                              pgdeactivate 0
                                              pglazyfree 0
                                              pglazyfreed 0
                                              zswpin 0
                                              zswpout 0
                                              thp_fault_alloc 7665
                                              thp_collapse_alloc 0
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: Tasks state (memory values in pages):
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: [ 238960]     0 238960 92289723  3926999 31547392        0           760 haproxy
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-734f4fd2b1f5db19eea6ae0db91d4f8013fe9b61cb8ead273c5cf3b0b3d7ce01.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podae2e77b9_d51b_4ab6_b949_70a218e0e398.slice/cri-containerd-734f4fd2b1f5db19eea6ae0db91d4f8013fe9b61cb8ead273c5cf3b0b3d7ce01.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podae2e77b9_d51b_4ab6_b949_70a218e0e398.slice/cri-containerd-734f4fd2b1f5db19eea6ae0db91d4f8013fe9b61cb8ead273c5cf3b0b3d7ce01.scope,task=haproxy,pid=238960,uid=0
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: Memory cgroup out of memory: Killed process 238960 (haproxy) total-vm:369158892kB, anon-rss:15697244kB, file-rss:10752kB, shmem-rss:0kB, UID:0 pgtables:30808kB oom_score_adj:760
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podae2e77b9_d51b_4ab6_b949_70a218e0e398.slice/cri-containerd-734f4fd2b1f5db19eea6ae0db91d4f8013fe9b61cb8ead273c5cf3b0b3d7ce01.scope are going to be killed due to memory.oom.group set
Feb 14 00:02:18 unit-z7p4-worker-03 kernel: Memory cgroup out of memory: Killed process 238960 (haproxy) total-vm:369158892kB, anon-rss:15697244kB, file-rss:10752kB, shmem-rss:0kB, UID:0 pgtables:30808kB oom_score_adj:760
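One pattern stands out across the three kills: thp_fault_alloc, the count of 2MiB transparent huge pages faulted in, tracks the limit almost exactly (1021, 4088, and 7665 faults for the 2GB, 8GB, and 15GB limits). In other words, the process touches anonymous pages until it hits whatever limit is in place. A quick check of that arithmetic:

```shell
# Each thp_fault_alloc event is a 2MiB huge page; multiply out the three logs:
echo "$((1021 * 2)) MiB"   # 2GB limit  -> 2042 MiB
echo "$((4088 * 2)) MiB"   # 8GB limit  -> 8176 MiB
echo "$((7665 * 2)) MiB"   # 15GB limit -> 15330 MiB
```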

In the end, an AI hint suggested a "jemalloc memory allocator issue". I have not found an identical reported case myself, but the haproxy binary in the image does link jemalloc:

$ kexec pod-proxier-fdgdfq31-p7jfv -- bash
Defaulted container "haproxy-proxier-ints" out of: haproxy-proxier-ints, pod-proxier
root@pod-proxier-fdgdfq31-p7jfv:/usr/local/etc/haproxy# ldd /usr/local/sbin/haproxy
        linux-vdso.so.1 (0x00007ffd5d370000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007fb4fdc46000)
        libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007fb4fdb9c000)
        libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007fb4fd71a000)
        liblua5.4.so.0 => /lib/x86_64-linux-gnu/liblua5.4.so.0 (0x00007fb4fd6d8000)
        libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fb4fd63e000)
        libjemalloc.so.2 => /lib/x86_64-linux-gnu/libjemalloc.so.2 (0x00007fb4fd34f000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb4fd16e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb4fd08f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb4fd08a000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb4fce70000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb4fce4e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb4fe20d000)
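If swapping images had not been an option, jemalloc's behavior can in principle be tuned through the MALLOC_CONF environment variable, a standard jemalloc mechanism. An untested sketch (we did not try this; the option values are illustrative):

```shell
# Untested sketch: tune jemalloc instead of replacing it.
# narenas:1 limits the number of allocation arenas; dirty_decay_ms:0
# returns dirty pages to the OS immediately instead of retaining them.
export MALLOC_CONF="narenas:1,dirty_decay_ms:0"
```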

Solution

Switched the image to haproxy:2.7-alpine (Alpine uses musl libc and does not pull in jemalloc), and the OOM kills stopped. Upstream haproxy has since moved to version 3 and enabled the Data Plane API, but I never migrated the SDK, so an outdated version may also be part of the problem. Recording this here so the issue can be revisited once my skills have grown; the investigation took several weeks.
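To confirm the replacement image actually drops jemalloc, the same ldd check can be wrapped in a small helper (the helper name is mine; run it inside the new container):

```shell
# Hypothetical helper: does a binary dynamically link jemalloc?
links_jemalloc() {
  ldd "$1" 2>/dev/null | grep -q 'libjemalloc'
}

# Inside haproxy:2.7-alpine this should print "clean":
if links_jemalloc /usr/local/sbin/haproxy; then
  echo "still linked against jemalloc"
else
  echo "clean"
fi
```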

Reference

[1] fix: oomkill due to ServiceController batch operation