Troubleshooting a program killed by the OOM killer
I. Background
An alert came in from the application service. When I logged in to the server to find out why, the process was gone.
II. Problem analysis
1. Possible reasons the process disappeared:
(1) The machine rebooted
uptime showed that the machine had not rebooted.
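The uptime check can also be scripted. A minimal sketch, assuming a Linux host: /proc/uptime holds the seconds since boot, so the boot time is simply "now minus uptime", and if that is later than the alert, a reboot could explain the missing process. The kernel's own log timestamps (for example [511250.458988] in the OOM report further down) are also seconds since boot, which puts the OOM at roughly 5.9 days of uptime. The function names and the incident timestamp are hypothetical.

```python
def seconds_since_boot(uptime_text: str) -> float:
    """Parse /proc/uptime: the first field is seconds since boot,
    the second is cumulative idle time summed over all CPUs."""
    return float(uptime_text.split()[0])


def booted_after(uptime_text: str, incident_ts: float, now: float) -> bool:
    """True if the machine (re)booted after the incident.

    Boot time is now - uptime; both timestamps are Unix epoch seconds.
    """
    return now - seconds_since_boot(uptime_text) > incident_ts


def read_uptime(path: str = "/proc/uptime") -> str:
    """Read the raw contents of /proc/uptime (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the server this would be called as `booted_after(read_uptime(), incident_ts, time.time())`, with `incident_ts` taken from the alert.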
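(2) The program hit a bug and exited on its own
The program's error log showed no exceptions.
(3) Something else killed it
Since the program uses a lot of memory, I suspected it had run out of memory and been killed by the kernel. Checking the messages log confirmed that it was indeed an OOM kill: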
Jul 27 13:29:54 kernel: Out of memory: Kill process 17982 (java) score 77 or sacrifice child
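Searching the log for OOM kills is easy to automate. A minimal sketch, assuming the log lives at /var/log/messages (the usual location on RHEL/CentOS 7, which the el7 kernel in this report suggests) and that the kernel uses the 3.x wording shown above; newer kernels phrase this line differently, so the pattern may need adjusting.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the 3.x-kernel wording: "Out of memory: Kill process <pid> (<name>) score <n>"
OOM_KILL_RE = re.compile(r"Out of memory: Kill process (\d+) \((.+?)\) score (\d+)")


def find_oom_kills(lines: Iterable[str]) -> Iterator[Tuple[int, str, int]]:
    """Yield (pid, process name, oom score) for each OOM-killer decision."""
    for line in lines:
        m = OOM_KILL_RE.search(line)
        if m:
            yield int(m.group(1)), m.group(2), int(m.group(3))
```

For example, `with open("/var/log/messages") as f: print(list(find_oom_kills(f)))` lists every victim; on the line above it reports PID 17982, `java`, score 77.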
2. Using the detailed OOM output to find out exactly why it was killed
[511250.458988] mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[511250.458993] mysqld cpuset=/ mems_allowed=0
[511250.458996] CPU: 7 PID: 30063 Comm: mysqld Not tainted 3.10.0-514.21.2.el7.x86_64 #1
[511250.458997] Hardware name: Alibaba Cloud Alibaba Cloud ECS, Bios rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[511250.458999] ffff88056236bec0 0000000040f4df68 ffff88044b76b910 ffffffff81687073
[511250.459002] ffff88044b76b9a0 ffffffff8168201e ffffffff810eb0dc ffff88081ae80c20
[511250.459004] ffff88081ae80c38 ffff88044b76b9f8 ffff88056236bec0 0000000000000000
[511250.459007] Call Trace:
[511250.459015] [<ffffffff81687073>] dump_stack+0x19/0x1b
[511250.459020] [<ffffffff8168201e>] dump_header+0x8e/0x225
[511250.459026] [<ffffffff810eb0dc>] ? ktime_get_ts64+0x4c/0xf0
[511250.459033] [<ffffffff81184cfe>] oom_kill_process+0x24e/0x3c0
[511250.459035] [<ffffffff8118479d>] ? oom_unkillable_task+0xcd/0x120
[511250.459038] [<ffffffff81184846>] ? find_lock_task_mm+0x56/0xc0
[511250.459042] [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
[511250.459045] [<ffffffff81185536>] out_of_memory+0x4b6/0x4f0
[511250.459047] [<ffffffff81682b27>] __alloc_pages_slowpath+0x5d7/0x725
[511250.459051] [<ffffffff8118b645>] __alloc_pages_nodemask+0x405/0x420
[511250.459055] [<ffffffff811cf94a>] alloc_pages_current+0xaa/0x170
[511250.459058] [<ffffffff81180bd7>] __page_cache_alloc+0x97/0xb0
[511250.459060] [<ffffffff81183750>] filemap_fault+0x170/0x410
[511250.459078] [<ffffffffa01b5016>] ext4_filemap_fault+0x36/0x50 [ext4]
[511250.459082] [<ffffffff811ac84c>] __do_fault+0x4c/0xc0
[511250.459084] [<ffffffff811acce3>] do_read_fault.isra.42+0x43/0x130
[511250.459087] [<ffffffff811b1471>] handle_mm_fault+0x6b1/0x1040
[511250.459091] [<ffffffff810f55c0>] ? futex_wake+0x80/0x160
[511250.459096] [<ffffffff81692c04>] __do_page_fault+0x154/0x450
[511250.459098] [<ffffffff81692fe6>] trace_do_page_fault+0x56/0x150
[511250.459101] [<ffffffff8169268b>] do_async_page_fault+0x1b/0xd0
[511250.459103] [<ffffffff8168f178>] async_page_fault+0x28/0x30
[511250.459104] Mem-Info:
[511250.459109] active_anon:7922627 inactive_anon:1653 isolated_anon:0 active_file:1675 inactive_file:2820 isolated_file:0 unevictable:0 dirty:11 writeback:2 unstable:0 slab_reclaimable:61817 slab_unreclaimable:25990 mapped:3607 shmem:4602 pagetables:42625 bounce:0 free:50021 free_pcp:149 free_cma:0
[511250.459112] Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[511250.459117] lowmem_reserve[]: 0 2814 31994 31994
[511250.459120] Node 0 DMA32 free:119704kB min:5940kB low:7424kB high:8908kB active_anon:2678512kB inactive_anon:276kB active_file:124kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883436kB mlocked:0kB dirty:0kB writeback:0kB mapped:1100kB shmem:1632kB slab_reclaimable:48796kB slab_unreclaimable:9340kB kernel_stack:5248kB pagetables:11424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32902 all_unreclaimable? yes
[511250.459124] lowmem_reserve[]: 0 0 29180 29180
[511250.459127] Node 0 Normal free:63896kB min:61608kB low:77008kB high:92412kB active_anon:29011996kB inactive_anon:6336kB active_file:6576kB inactive_file:11148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29881068kB mlocked:0kB dirty:44kB writeback:8kB mapped:13328kB shmem:16776kB slab_reclaimable:198472kB slab_unreclaimable:94604kB kernel_stack:53472kB pagetables:159076kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? no
[511250.459131] lowmem_reserve[]: 0 0 0 0
[511250.459134] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[511250.459144] Node 0 DMA32: 9372*4kB (UEM) 2427*8kB (UEM) 1179*16kB (UEM) 369*32kB (UEM) 104*64kB (EM) 31*128kB (EM) 14*256kB (UEM) 9*512kB (UEM) 7*1024kB (UEM) 3*2048kB (M) 0*4096kB = 119704kB
[511250.459154] Node 0 Normal: 1540*4kB (UE) 6148*8kB (UE) 503*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63392kB
[511250.459162] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[511250.459163] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[511250.459164] 9275 total pagecache pages
[511250.459166] 0 pages in swap cache
[511250.459167] Swap cache stats: add 0, delete 0, find 0/0
[511250.459168] Free swap = 0kB
[511250.459168] Total swap = 0kB
[511250.459169] 8388478 pages RAM
[511250.459170] 0 pages HighMem/MovableOnly
[511250.459171] 193375 pages reserved
[511250.459172] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[511250.459178] [ 444] 0 444 30482 118 63 0 0 systemd-journal
[511250.459180] [ 476] 0 476 14365 114 28 0 -1000 auditd
[511250.459182] [ 508] 0 508 5315 75 14 0 0 irqbalance
[511250.459184] [ 509] 998 509 132421 1908 50 0 0 polkitd
[511250.459186] [ 510] 0 510 6686 196 17 0 0 systemd-logind
[511250.459188] [ 514] 81 514 6672 148 16 0 -900 dbus-daemon
[511250.459189] [ 592] 0 592 6972 52 18 0 0 atd
[511250.459191] [ 595] 0 595 31969 188 17 0 0 crond
[511250.459193] [ 607] 0 607 28020 44 11 0 0 agetty
[511250.459195] [ 1036] 0 1036 138798 3179 89 0 0 tuned
[511250.459197] [ 1037] 0 1037 174118 357 182 0 0 rsyslogd
[511250.459198] [ 1089] 38 1089 7865 174 19 0 0 ntpd
[511250.459200] [ 4714] 0 4714 26866 243 54 0 -1000 sshd
[511250.459202] [ 6624] 0 6624 920 100 4 0 0 aliyun-service
[511250.459204] [19284] 0 19284 8386 171 21 0 0 AliYunDunUpdate
[511250.459206] [19335] 0 19335 34887 1367 64 0 0 AliYunDun
[511250.459208] [21657] 26 21657 59097 1539 52 0 -1000 postgres
[511250.459210] [21658] 26 21658 48503 264 43 0 0 postgres
[511250.459212] [21660] 26 21660 59124 2338 52 0 0 postgres
[511250.459213] [21661] 26 21661 59097 332 48 0 0 postgres
[511250.459215] [21662] 26 21662 59097 537 47 0 0 postgres
[511250.459217] [21663] 26 21663 59328 513 50 0 0 postgres
[511250.459218] [21664] 26 21664 49067 317 44 0 0 postgres
[511250.459220] [ 7276] 0 7276 32471 164 16 0 0 screen
[511250.459222] [ 7277] 0 7277 29357 123 13 0 0 bash
[511250.459223] [ 7388] 0 7388 4303 1880 12 0 0 sagent
[511250.459225] [ 7747] 0 7747 32504 200 16 0 0 screen
[511250.459226] [ 7748] 0 7748 29357 122 14 0 0 bash
[511250.459228] [ 7781] 0 7781 8051 4108 20 0 0 tagent
[511250.459230] [ 9897] 0 9897 3062553 270245 774 0 0 java
[511250.459231] [ 9937] 26 9937 59406 657 53 0 0 postgres
[511250.459233] [ 9940] 26 9940 60212 2570 57 0 0 postgres
[511250.459235] [ 9997] 26 9997 60098 2346 56 0 0 postgres
[511250.459236] [10076] 26 10076 59574 964 54 0 0 postgres
[511250.459238] [10077] 26 10077 59618 1006 54 0 0 postgres
[511250.459239] [10078] 26 10078 59617 1005 54 0 0 postgres
[511250.459241] [11611] 0 11611 60826 4190 73 0 0 python
[511250.459243] [11619] 0 11619 348938 6222 118 0 0 python
[511250.459245] [12396] 26 12396 60086 2078 56 0 0 postgres
[511250.459246] [12499] 1001 12499 1448783 99046 328 0 0 java
[511250.459248] [12600] 1003 12600 2226317 312995 847 0 0 java
[511250.459249] [29241] 0 29241 78180 1320 101 0 0 php-fpm
[511250.459251] [29242] 1004 29242 135239 2687 108 0 0 php-fpm
[511250.459253] [29243] 1004 29243 134924 2408 108 0 0 php-fpm
[511250.459255] [29244] 1004 29244 135371 2707 108 0 0 php-fpm
[511250.459256] [29245] 1004 29245 143755 11294 125 0 0 php-fpm
[511250.459258] [29246] 1004 29246 135367 2706 108 0 0 php-fpm
[511250.459260] [29826] 27 29826 28792 86 13 0 0 mysqld_safe
[511250.459261] [30051] 27 30051 322930 39761 133 0 0 mysqld
[511250.459263] [30234] 0 30234 11365 125 22 0 -1000 systemd-udevd
[511250.459264] [11182] 0 11182 82780 5702 114 0 0 salt-minion
[511250.459266] [11193] 0 11193 171406 8289 144 0 0 salt-minion
[511250.459268] [11195] 0 11195 101432 5712 110 0 0 salt-minion
[511250.459269] [29678] 1004 29678 140301 7833 118 0 0 php-fpm
[511250.459271] [29998] 1004 29998 134983 2404 108 0 0 php-fpm
[511250.459273] [11833] 0 11833 69721 2098 58 0 0 python2.7
[511250.459275] [32113] 26 32113 60131 2012 56 0 0 postgres
[511250.459276] [ 1017] 1004 1017 135410 2748 108 0 0 php-fpm
[511250.459278] [11915] 1004 11915 144263 11778 126 0 0 php-fpm
[511250.459280] [ 5999] 0 5999 8115 3139 20 0 0 tagent
[511250.459281] [21572] 1004 21572 134919 2379 108 0 0 php-fpm
[511250.459283] [21752] 1004 21752 143751 11276 125 0 0 php-fpm
[511250.459285] [ 2977] 1004 2977 134920 2406 107 0 0 php-fpm
[511250.459286] [ 9217] 0 9217 330989 183882 550 0 0 python2.7
[511250.459288] [ 2008] 1004 2008 135816 3328 109 0 0 php-fpm
[511250.459290] [25089] 1000 25089 2800777 187701 710 0 0 java
[511250.459291] [25405] 1000 25405 1335611 105668 366 0 0 java
[511250.459293] [26033] 1000 26033 1680746 96082 367 0 0 java
[511250.459295] [26112] 1000 26112 1148121 61227 230 0 0 java
[511250.459296] [14446] 0 14446 31082 540 56 0 0 nginx
[511250.459298] [14447] 1004 14447 31278 739 58 0 0 nginx
[511250.459299] [14448] 1004 14448 31278 725 58 0 0 nginx
[511250.459301] [14449] 1004 14449 31278 714 58 0 0 nginx
[511250.459303] [14450] 1004 14450 31278 715 58 0 0 nginx
[511250.459304] [14451] 1004 14451 31245 705 58 0 0 nginx
[511250.459306] [14452] 1004 14452 31245 696 58 0 0 nginx
[511250.459307] [14453] 1004 14453 31278 712 58 0 0 nginx
[511250.459309] [14454] 1004 14454 31245 728 58 0 0 nginx
[511250.459310] [14455] 1004 14455 31278 730 58 0 0 nginx
[511250.459312] [14456] 1004 14456 31278 718 58 0 0 nginx
[511250.459314] [14457] 1004 14457 31245 707 58 0 0 nginx
[511250.459315] [14458] 1004 14458 31278 722 58 0 0 nginx
[511250.459317] [14459] 1004 14459 31278 717 58 0 0 nginx
[511250.459318] [14460] 1004 14460 31245 688 58 0 0 nginx
[511250.459320] [14462] 1004 14462 31278 712 58 0 0 nginx
[511250.459321] [14463] 1004 14463 31278 736 58 0 0 nginx
[511250.459323] [14571] 0 14571 3222105 119555 906 0 0 python
[511250.459325] [13969] 0 13969 134928 8719 143 0 0 salt-master
[511250.459326] [13982] 0 13982 78554 5647 100 0 0 salt-master
[511250.459328] [13985] 0 13985 116150 8034 134 0 0 salt-master
[511250.459330] [13989] 0 13989 151040 38826 238 0 0 salt-master
[511250.459331] [13990] 0 13990 103527 12904 148 0 0 salt-master
[511250.459333] [14067] 0 14067 280592 9651 151 0 0 salt-master
[511250.459334] [14072] 0 14072 135099 9889 141 0 0 salt-master
[511250.459336] [14220] 0 14220 134928 8828 135 0 0 salt-master
[511250.459338] [14221] 0 14221 1941362 9675 332 0 0 salt-master
[511250.459339] [14228] 0 14228 175360 9657 148 0 0 salt-master
[511250.459341] [14268] 0 14268 175362 9655 148 0 0 salt-master
[511250.459343] [14314] 0 14314 175361 9662 148 0 0 salt-master
[511250.459344] [14327] 0 14327 175363 9663 148 0 0 salt-master
[511250.459346] [14329] 0 14329 175363 9666 148 0 0 salt-master
[511250.459347] [14330] 0 14330 175364 9666 148 0 0 salt-master
[511250.459349] [14331] 0 14331 175365 9666 148 0 0 salt-master
[511250.459350] [14334] 0 14334 175366 9670 148 0 0 salt-master
[511250.459352] [14338] 0 14338 175366 9669 148 0 0 salt-master
[511250.459354] [14340] 0 14340 175366 9674 148 0 0 salt-master
[511250.459355] [14345] 0 14345 175367 9679 148 0 0 salt-master
[511250.459357] [14349] 0 14349 175367 9675 148 0 0 salt-master
[511250.459358] [14350] 0 14350 175367 9671 148 0 0 salt-master
[511250.459360] [14354] 0 14354 175368 9672 148 0 0 salt-master
[511250.459362] [14357] 0 14357 175369 9678 148 0 0 salt-master
[511250.459363] [14358] 0 14358 175369 9673 148 0 0 salt-master
[511250.459365] [14362] 0 14362 175369 9677 148 0 0 salt-master
[511250.459366] [14364] 0 14364 175370 9680 148 0 0 salt-master
[511250.459368] [14365] 0 14365 175371 9681 148 0 0 salt-master
[511250.459369] [14368] 0 14368 175371 9676 148 0 0 salt-master
[511250.459371] [14370] 0 14370 175371 9674 148 0 0 salt-master
[511250.459372] [14372] 0 14372 175372 9682 148 0 0 salt-master
[511250.459374] [14376] 0 14376 175373 9682 148 0 0 salt-master
[511250.459375] [14377] 0 14377 175374 9676 148 0 0 salt-master
[511250.459377] [14378] 0 14378 175374 9689 148 0 0 salt-master
[511250.459379] [14380] 0 14380 175650 9716 149 0 0 salt-master
[511250.459381] [14384] 0 14384 175375 9690 148 0 0 salt-master
[511250.459382] [14385] 0 14385 175375 9685 148 0 0 salt-master
[511250.459384] [14401] 0 14401 175376 9687 148 0 0 salt-master
[511250.459385] [14404] 0 14404 175377 9685 148 0 0 salt-master
[511250.459387] [14413] 0 14413 175377 9685 148 0 0 salt-master
[511250.459388] [14420] 0 14420 175377 9687 148 0 0 salt-master
[511250.459390] [14421] 0 14421 175378 9686 148 0 0 salt-master
[511250.459392] [14424] 0 14424 175380 9693 148 0 0 salt-master
[511250.459393] [14428] 0 14428 175380 9689 148 0 0 salt-master
[511250.459395] [14435] 0 14435 175382 9698 148 0 0 salt-master
[511250.459396] [14437] 0 14437 175382 9694 148 0 0 salt-master
[511250.459398] [14439] 0 14439 175383 9692 148 0 0 salt-master
[511250.459399] [14442] 0 14442 175384 9694 148 0 0 salt-master
[511250.459401] [14445] 0 14445 175385 9692 148 0 0 salt-master
[511250.459403] [14465] 0 14465 175385 9695 148 0 0 salt-master
[511250.459404] [14473] 0 14473 175385 9695 148 0 0 salt-master
[511250.459406] [14486] 0 14486 175386 9697 148 0 0 salt-master
[511250.459407] [14489] 0 14489 175386 9699 148 0 0 salt-master
[511250.459409] [14503] 0 14503 175386 9699 148 0 0 salt-master
[511250.459410] [14513] 0 14513 175387 9700 148 0 0 salt-master
[511250.459412] [14520] 0 14520 175388 9704 148 0 0 salt-master
[511250.459414] [14523] 0 14523 175389 9700 148 0 0 salt-master
[511250.459415] [14525] 0 14525 175389 9703 148 0 0 salt-master
[511250.459417] [14527] 0 14527 175390 9710 148 0 0 salt-master
[511250.459419] [14533] 0 14533 175390 9705 148 0 0 salt-master
[511250.459420] [14539] 0 14539 175390 9709 148 0 0 salt-master
[511250.459422] [14590] 0 14590 175391 9713 148 0 0 salt-master
[511250.459423] [14598] 0 14598 175390 9705 148 0 0 salt-master
[511250.459425] [14613] 0 14613 175391 9705 148 0 0 salt-master
[511250.459426] [14624] 0 14624 175392 9713 148 0 0 salt-master
[511250.459428] [14630] 0 14630 175392 9707 148 0 0 salt-master
[511250.459429] [14634] 0 14634 175393 9707 148 0 0 salt-master
[511250.459431] [14652] 0 14652 175393 9709 148 0 0 salt-master
[511250.459433] [14677] 0 14677 175394 9708 148 0 0 salt-master
[511250.459434] [14679] 0 14679 175394 9711 148 0 0 salt-master
[511250.459436] [14709] 0 14709 175395 9713 148 0 0 salt-master
[511250.459438] [14718] 0 14718 175396 9710 148 0 0 salt-master
[511250.459439] [14723] 0 14723 175396 9710 148 0 0 salt-master
[511250.459441] [14746] 0 14746 175396 9716 148 0 0 salt-master
[511250.459443] [14752] 0 14752 175461 9717 148 0 0 salt-master
[511250.459444] [14791] 0 14791 175398 9715 148 0 0 salt-master
[511250.459446] [14799] 0 14799 175397 9720 148 0 0 salt-master
[511250.459447] [14804] 0 14804 175472 9721 148 0 0 salt-master
[511250.459449] [14835] 0 14835 175462 9729 148 0 0 salt-master
[511250.459450] [14840] 0 14840 175463 9735 148 0 0 salt-master
[511250.459452] [14864] 0 14864 175463 9727 148 0 0 salt-master
[511250.459453] [14882] 0 14882 175464 9731 148 0 0 salt-master
[511250.459455] [14893] 0 14893 175465 9731 148 0 0 salt-master
[511250.459456] [14899] 0 14899 175465 9720 148 0 0 salt-master
[511250.459458] [14906] 0 14906 175466 9721 148 0 0 salt-master
[511250.459460] [14910] 0 14910 175402 9723 148 0 0 salt-master
[511250.459461] [14984] 0 14984 175466 9725 148 0 0 salt-master
[511250.459463] [14988] 0 14988 175467 9735 148 0 0 salt-master
[511250.459464] [14992] 0 14992 175468 9734 148 0 0 salt-master
[511250.459466] [15072] 0 15072 175468 9735 148 0 0 salt-master
[511250.459467] [15101] 0 15101 175468 9731 148 0 0 salt-master
[511250.459469] [15129] 0 15129 175469 9733 148 0 0 salt-master
[511250.459470] [15143] 0 15143 175469 9737 148 0 0 salt-master
[511250.459472] [15168] 0 15168 175470 9740 148 0 0 salt-master
[511250.459474] [15181] 0 15181 175474 9744 148 0 0 salt-master
[511250.459475] [15219] 0 15219 175474 9734 148 0 0 salt-master
[511250.459477] [15223] 0 15223 175477 9753 148 0 0 salt-master
[511250.459479] [15259] 0 15259 175475 9734 148 0 0 salt-master
[511250.459481] [15266] 0 15266 175476 9735 148 0 0 salt-master
[511250.459482] [15322] 0 15322 175476 9736 148 0 0 salt-master
[511250.459493] [15350] 0 15350 175476 9745 148 0 0 salt-master
[511250.459495] [15366] 0 15366 175477 9743 148 0 0 salt-master
[511250.459497] [15380] 0 15380 175506 9745 148 0 0 salt-master
[511250.459498] [15399] 0 15399 175754 9769 149 0 0 salt-master
[511250.459500] [15407] 0 15407 175479 9747 148 0 0 salt-master
[511250.459501] [15447] 0 15447 175479 9742 148 0 0 salt-master
[511250.459503] [15450] 0 15450 175479 9751 148 0 0 salt-master
[511250.459504] [15454] 0 15454 175481 9747 148 0 0 salt-master
[511250.459506] [15462] 0 15462 175480 9748 148 0 0 salt-master
[511250.459508] [23316] 1000 23316 3085650 27853 144 0 0 java
[511250.459509] [23319] 1000 23319 3085650 27289 144 0 0 java
[511250.459511] [23348] 1000 23348 3085650 27778 142 0 0 java
[511250.459512] [23351] 1000 23351 3085650 26840 141 0 0 java
[511250.459514] [23373] 1000 23373 3085650 27380 143 0 0 java
[511250.459515] [23406] 1000 23406 3085650 26933 143 0 0 java
[511250.459517] [23425] 1000 23425 3085650 27371 142 0 0 java
[511250.459518] [23445] 1000 23445 3085650 27861 141 0 0 java
[511250.459520] [23476] 1000 23476 3085650 27716 143 0 0 java
[511250.459522] [23497] 1000 23497 3085650 27902 144 0 0 java
[511250.459523] [23690] 1000 23690 2049475 328916 865 0 0 java
[511250.459525] [23691] 1000 23691 2082756 356868 894 0 0 java
[511250.459527] [23693] 1000 23693 2027460 612751 1357 0 0 java
[511250.459528] [23712] 1000 23712 2027460 610571 1348 0 0 java
[511250.459529] [23754] 1000 23754 2049474 337457 886 0 0 java
[511250.459531] [23785] 1000 23785 2049474 330831 864 0 0 java
[511250.459533] [23805] 1000 23805 2027460 615907 1366 0 0 java
[511250.459534] [23828] 1000 23828 2027460 610191 1346 0 0 java
[511250.459536] [23855] 1000 23855 2629446 589971 1351 0 0 java
[511250.459537] [23860] 1000 23860 2328022 144465 519 0 0 java
[511250.459539] [13536] 1004 13536 134981 2523 108 0 0 php-fpm
[511250.459540] [ 1813] 0 1813 1481817 46140 246 0 0 java
[511250.459542] [ 3187] 0 3187 1481817 53461 253 0 0 java
[511250.459544] [ 2993] 26 2993 59779 1712 55 0 0 postgres
[511250.459546] [ 3059] 1000 3059 3085528 16411 141 0 0 java
[511250.459547] [ 3146] 1000 3146 2027460 211779 628 0 0 java
[511250.459549] [17982] 996 17982 4950828 635077 1629 0 0 java
[511250.459551] [16433] 0 16433 37607 360 74 0 0 sshd
[511250.459553] [16436] 0 16436 29390 141 13 0 0 bash
[511250.459554] [16466] 0 16466 29390 136 14 0 0 bash
[511250.459556] [22511] 0 22511 36968 433 72 0 0 sshd
[511250.459558] [22515] 0 22515 19016 257 40 0 0 ssh
[511250.459560] [22519] 0 22519 19107 350 39 0 0 ssh
[511250.459562] [22522] 0 22522 19016 259 38 0 0 ssh
[511250.459563] [24770] 0 24770 38342 657 30 0 0 vim
[511250.459565] [24781] 0 24781 45009 303 41 0 0 crond
[511250.459566] [24784] 0 24784 91360 8641 134 0 0 python
[511250.459568] [24932] 0 24932 28791 45 13 0 0 sh
[511250.459570] [24933] 0 24933 93538 7284 104 0 0 ansible-playboo
[511250.459571] [24942] 0 24942 94424 7584 103 0 0 ansible-playboo
[511250.459573] [24943] 0 24943 96455 9707 107 0 0 ansible-playboo
[511250.459574] [24944] 0 24944 94436 7599 103 0 0 ansible-playboo
[511250.459576] [24945] 0 24945 16336 70 33 0 0 ssh
[511250.459578] [24946] 0 24946 16336 71 33 0 0 ssh
[511250.459579] [24947] 0 24947 16336 69 30 0 0 ssh
[511250.459581] Out of memory: Kill process 17982 (java) score 77 or sacrifice child
[511250.459642] Killed process 17982 (java) total-vm:19803312kB, anon-rss:2540308kB, file-rss:0kB, shmem-rss:0kB
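(1) mysqld invoked the OOM killer: an allocation made on behalf of mysqld (gfp_mask=0x201da, order=0, i.e. a single page; the call trace shows it was a page-cache fault) could not be satisfied from the available physical memory. Note that mysqld only triggered the OOM killer; the victim is chosen separately by score, and here it was a java process. Memory reclaim is governed by /proc/sys/vm/min_free_kbytes, which sets each zone's min watermark, and the low and high watermarks are derived from it. When a zone's free memory (not counting buffers and page cache) drops below the low watermark, the kernel wakes the kswapd kernel thread to reclaim memory in the background; below min, the allocating task has to reclaim memory itself. Since the OOM killer was invoked anyway, either memory really was exhausted, or reclaim could not free enough memory before or during the allocation.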
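The watermark numbers in the report can be checked by hand. A minimal sketch, assuming 4 kB pages and the rule used by 3.10-era kernels such as this one (low = min + min/4 and high = min + min/2, in whole pages; newer kernels also apply watermark_scale_factor, so the rule differs there). Applied to the zone lines above it reproduces the logged low/high values exactly, and it shows the Normal zone sitting between min and low, i.e. in the range where kswapd had already been woken.

```python
from typing import Tuple

PAGE_KB = 4  # kB per page on x86_64


def derive_watermarks(min_kb: int) -> Tuple[int, int, int]:
    """Return (min, low, high) in kB derived from a zone's min watermark,
    using the 3.10-era rule low = min + min/4, high = min + min/2 (in pages)."""
    min_pages = min_kb // PAGE_KB
    low_pages = min_pages + min_pages // 4
    high_pages = min_pages + min_pages // 2
    return min_pages * PAGE_KB, low_pages * PAGE_KB, high_pages * PAGE_KB


def reclaim_state(free_kb: int, min_kb: int, low_kb: int) -> str:
    """Classify a zone's free memory against its min and low watermarks."""
    if free_kb < min_kb:
        return "below min: allocating tasks must reclaim directly"
    if free_kb < low_kb:
        return "below low: kswapd is woken to reclaim in the background"
    return "at or above low: no new kswapd wake-up"


# Normal zone from the report: free 63896kB, min 61608kB.
_, low, high = derive_watermarks(61608)
print(low, high, reclaim_state(63896, 61608, low))
```

The per-zone min values themselves come from splitting min_free_kbytes across the zones in proportion to their size; the sketch takes the logged min as given rather than recomputing that split.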
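(2) The per-zone section of the report, excerpted below, shows the state of node 0's three memory zones (DMA, DMA32 and Normal) when the allocation failed. DMA and DMA32 are flagged "all_unreclaimable? yes", and Normal had only 63896kB free against a min watermark of 61608kB and a low watermark of 77008kB, so reclaim had almost nothing left to work with and the request could not be satisfied: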
[511250.459112] Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[511250.459117] lowmem_reserve[]: 0 2814 31994 31994
[511250.459120] Node 0 DMA32 free:119704kB min:5940kB low:7424kB high:8908kB active_anon:2678512kB inactive_anon:276kB active_file:124kB inactive_file:132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2883436kB mlocked:0kB dirty:0kB writeback:0kB mapped:1100kB shmem:1632kB slab_reclaimable:48796kB slab_unreclaimable:9340kB kernel_stack:5248kB pagetables:11424kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:32902 all_unreclaimable? yes
[511250.459124] lowmem_reserve[]: 0 0 29180 29180
[511250.459127] Node 0 Normal free:63896kB min:61608kB low:77008kB high:92412kB active_anon:29011996kB inactive_anon:6336kB active_file:6576kB inactive_file:11148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29881068kB mlocked:0kB dirty:44kB writeback:8kB mapped:13328kB shmem:16776kB slab_reclaimable:198472kB slab_unreclaimable:94604kB kernel_stack:53472kB pagetables:159076kB unstable:0kB bounce:0kB free_pcp:656kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:924 all_unreclaimable? no
[511250.459131] lowmem_reserve[]: 0 0 0 0
[511250.459134] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[511250.459144] Node 0 DMA32: 9372*4kB (UEM) 2427*8kB (UEM) 1179*16kB (UEM) 369*32kB (UEM) 104*64kB (EM) 31*128kB (EM) 14*256kB (UEM) 9*512kB (UEM) 7*1024kB (UEM) 3*2048kB (M) 0*4096kB = 119704kB
[511250.459154] Node 0 Normal: 1540*4kB (UE) 6148*8kB (UE) 503*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63392kB
[511250.459162] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[511250.459163] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[511250.459164] 9275 total pagecache pages
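One way to see why the lower zones could not help is to compare each zone's free memory with its min watermark plus the lowmem_reserve entry for allocations that may use the Normal zone (index 2 of each lowmem_reserve[] line, counted in pages). The kernel holds that reserve back in DMA and DMA32 so that Normal-capable allocations cannot drain them. This is a simplified sketch assuming 4 kB pages; the kernel's real check (__zone_watermark_ok) also adjusts for allocation order and flags, and the report is a snapshot taken on the way into the OOM killer after reclaim had already failed, so Normal's small positive headroom does not contradict the failure.

```python
PAGE_KB = 4  # kB per page on x86_64

# Copied from the OOM report: free and min in kB, reserve_pages = lowmem_reserve[2]
# (the entry that applies to allocations allowed to use the Normal zone).
ZONES = {
    "DMA": {"free_kb": 15892, "min_kb": 32, "reserve_pages": 31994},
    "DMA32": {"free_kb": 119704, "min_kb": 5940, "reserve_pages": 29180},
    "Normal": {"free_kb": 63896, "min_kb": 61608, "reserve_pages": 0},
}


def headroom_kb(free_kb: int, min_kb: int, reserve_pages: int) -> int:
    """Free memory left above min + lowmem_reserve, in kB.

    A negative result means the zone cannot serve this class of
    allocation even at the min watermark.
    """
    return free_kb - (min_kb + reserve_pages * PAGE_KB)


for name, zone in ZONES.items():
    print(f"{name:6} headroom {headroom_kb(**zone):>8} kB")
```

With these numbers DMA and DMA32 come out negative, while Normal is only about 2 MB above its min watermark.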
(3) The killed process
The following output confirms that the killed process was PID 17982:
[511250.459581] Out of memory: Kill process 17982 (java) score 77 or sacrifice child
[511250.459642] Killed process 17982 (java) total-vm:19803312kB, anon-rss:2540308kB, file-rss:0kB, shmem-rss:0kB
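The process table shows that process 17982 had 635077 resident pages, i.e. 635077 × 4096 bytes ≈ 2.4 GiB, which matches anon-rss:2540308kB in the kill message: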
[511250.459172] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[511250.459549] [17982] 996 17982 4950828 635077 1629 0 0 java
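The columns are:
pid: process ID.
uid: user ID.
tgid: thread group ID.
total_vm: virtual memory size (in 4 kB pages).
rss: resident memory (in 4 kB pages).
nr_ptes: number of page-table pages.
swapents: number of swap entries.
oom_score_adj: usually 0; the lower the value, the less likely the process is to be killed when the OOM killer runs (-1000 exempts it entirely).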
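The column list maps directly onto a small parser. A minimal sketch, assuming the 3.10-style task row layout shown above (newer kernels add or rename columns) and 4 kB pages; `TaskRow` and `parse_row` are hypothetical names. It turns one row into named fields, from which the rss conversion above follows.

```python
import re
from typing import NamedTuple, Optional

PAGE_SIZE = 4096  # bytes per page on x86_64

# "[  pid]  uid  tgid  total_vm  rss  nr_ptes  swapents  oom_score_adj  name"
ROW_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


class TaskRow(NamedTuple):
    pid: int
    uid: int
    tgid: int
    total_vm: int       # virtual memory, in pages
    rss: int            # resident memory, in pages
    nr_ptes: int        # page-table pages
    swapents: int       # swap entries
    oom_score_adj: int  # -1000 exempts the task from the OOM killer
    name: str


def parse_row(line: str) -> Optional[TaskRow]:
    """Parse one task row of the OOM report; return None for any other line."""
    m = ROW_RE.search(line)
    if not m:
        return None
    *nums, name = m.groups()
    return TaskRow(*(int(n) for n in nums), name)


row = parse_row("[511250.459549] [17982] 996 17982 4950828 635077 1629 0 0 java")
print(row.name, row.rss * PAGE_SIZE / 2**30)  # resident size in GiB
```

The timestamp prefix is skipped naturally, because `[511250.459549]` contains a dot and does not match the `[pid]` pattern.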
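(4) Summing the rss of all processes (rss is the physical memory a process actually uses, in 4 kB pages)
Adding up the rss of every process in the OOM report gives about 32 GB, which is essentially all of the machine's RAM (8388478 pages RAM ≈ 32 GiB, with no swap: Total swap = 0kB). So the OOM killer fired because memory was genuinely exhausted. Breaking the total down shows that the java processes used about 26 GB between them, by far the largest share.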
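The per-process totals can be computed rather than added up by hand. A minimal sketch, assuming the task rows are saved one per line (as dmesg prints them) in the same 3.10-style layout as the report above; the function names and any input file are hypothetical. Run over the table above, the java rows come to about 26 GiB. Note that rss counts shared pages once per process that maps them, so the sum can slightly overstate real usage; here file-backed and shared memory are small, so the error is minor.

```python
import re
from collections import Counter
from typing import Iterable

PAGE_KB = 4  # kB per page on x86_64

# Task-row layout from the OOM report: rss is group 5, name is group 9.
ROW_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def rss_pages_by_name(lines: Iterable[str]) -> Counter:
    """Sum the rss column (in pages) of every task row, grouped by process name."""
    totals: Counter = Counter()
    for line in lines:
        m = ROW_RE.search(line)
        if m:
            totals[m.group(9)] += int(m.group(5))
    return totals


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB, assuming 4 kB pages."""
    return pages * PAGE_KB / (1024 * 1024)
```

For example, `totals = rss_pages_by_name(open("oom_report.txt"))` followed by `pages_to_gib(sum(totals.values()))` and `totals.most_common(5)` gives the overall total and the biggest consumers.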
III. Solution
Cap the maximum heap size of the java processes (-Xmx) and reduce the number of java workers, to bring overall memory usage down.
This article originally appeared on the “佳” blog; please keep this attribution: http://leejia.blog.51cto.com/4356849/1952482