Ping An Technology PostgreSQL Case Study | Exploring Process-Private Memory
Posted by PostgreSQL中文社区 (PostgreSQL Chinese Community)
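About the author
Shi Yonghu (石勇虎) is a senior database engineer at Ping An Technology with many years of PostgreSQL development and operations experience. He joined Ping An in 2016 and has focused on PostgreSQL operations and administration ever since. He also has some experience with other database products.
Background
We recently hit a case in which the database behind an analytical application ran out of memory (OOM) one morning. The OS killed a user process, the database went into recovery, and it was back up a few dozen seconds later. The OS log showed: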
# cat /var/log/messages | grep -i "out of memory"
07:08:48 cnlf081173 kernel: Out of memory: Kill process 15095 (postgres) score 155 or sacrifice chil
The database log showed:
07:08:49.663 CST,,,21174,,5c29bb9e.52b6,23,,2018-12-31 14:47:58 CST,,0,LOG,00000,"server process (PID 15095) was terminated by signal 9: Killed","Failed process was running: SELECT appid, device_type, c_type, c_name, nc_name, status FROM tbl_app_os_ver_channel WHERE appid in …
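It is worth noting that the affected database is currently one of Ping An's larger single-instance PG databases: close to 30 TB of data, growing by about 100 GB per day, under fairly high load, and making heavy use of pg_pathman partitions.
On Linux, the OOM killer is generally triggered only once both physical memory and swap space on the host are exhausted. So which operation was using enough memory to exhaust them? We reviewed the top SQL for that time window, in particular the statement named in the error above, and found no large grouping or sorting operations. We also checked the database memory parameters (shared_buffers, work_mem, max_connections and so on) and OS kernel settings such as whether NUMA was disabled. All were within reasonable ranges.
Root cause analysis
Before analyzing the cause, here is a quick look at PG's memory architecture:
PG's memory management revolves around shared memory and memory contexts (MemoryContext). Shared memory has a fixed size, computed at database startup from the relevant parameters, whereas memory contexts are allocated according to what each process actually uses. A process's private memory is also managed through memory contexts.
Investigation showed that the database has the host to itself. Memory usage had grown along with the business and was holding above 70%, and the high usage was not caused by a few individual processes consuming large amounts of memory: any simple query (with no sorting involved) used more than 500 MB of private memory. When connecting directly rather than through pgbouncer, such a query also took about 8 s to run. We suspected that the oversized data dictionary (system catalogs) of this database was the cause.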
psql (9.5.5)
# select pid from pg_stat_activity where query=current_query();
pid
------
2033
(1 row)
Time: 8246.251 ms
pmap shows that the private memory of this process has reached more than 580 MB.
$ pmap -d 2033
2033: postgres: eits: postgres eits [local] idle
Address Kbytes Mode Offset Device Mapping
mapped: 43811596K writeable/private: 585464K shared: 43100280K
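The summary line printed by pmap -d can be picked apart programmatically when many backends need to be checked. The following is a minimal Python sketch, not part of the original investigation; the helper name parse_pmap_summary is hypothetical, and it assumes the summary line has the same "name: <number>K" layout as the output above.

```python
import re


def parse_pmap_summary(line: str) -> dict:
    """Parse a 'mapped: ...K writeable/private: ...K shared: ...K' line
    into a dict of field name -> size in KB."""
    fields = re.findall(r"([\w/]+):\s+(\d+)K", line)
    return {name: int(kb) for name, kb in fields}


# Summary line copied from the pmap output of backend 2033.
summary = parse_pmap_summary(
    "mapped: 43811596K writeable/private: 585464K shared: 43100280K"
)
private_mb = summary["writeable/private"] / 1024
print(f"private: {private_mb:.1f} MB")  # prints "private: 571.7 MB"
```

Dividing by 1024 gives about 572 MB; the "more than 580 MB" figure in the text corresponds to dividing by 1000.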
pg_class contains more than 1.7 million objects.
# select count(1) from pg_class;
count
---------
1713920
(1 row)
We simulated this in a test environment by using pg_pathman to create 100,000 partitions on a single table, and the problem reproduced:
create table tt_pathmantest2(like tt);
SELECT create_range_partitions('tt_pathmantest2', 'id', '0'::bigint, '100'::bigint,100000);
Check the private memory used by a single process:
# select * from pg_stat_activity where query=current_query();
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start
---------+---------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------
3175452 | pgbench | 20213 | 10 | postgres | psql | | | -1 | 2019-02-18 13:46:03.654307+08 | 2019-02-18 13:46:22.9179
(1 row)

Time: 3844.895 ms
$ pmap -d 20213
20213: postgres: pg955: postgres pgbench [local] idle
Address Kbytes Mode Offset Device Mapping
mapped: 54589524K writeable/private: 332792K shared: 54123512K
Dumping the process's private memory with gdb showed that it consisted almost entirely of pathman configuration data:
$ cat /proc/20213/maps
00400000-009aa000 r-xp 00000000 fd:06 43141376 /paic/postgres/base/9.5.5/bin/postgres
00ba9000-00baa000 r--p 005a9000 fd:06 43141376 /paic/postgres/base/9.5.5/bin/postgres
00baa000-00bb6000 rw-p 005aa000 fd:06 43141376 /paic/postgres/base/9.5.5/bin/postgres
00bb6000-00c00000 rw-p 00000000 00:00 0
02a8c000-02ad1000 rw-p 00000000 00:00 0 [heap]
02ad1000-03d69000 rw-p 00000000 00:00 0 [heap]
2ae1bfd62000-2ae1bfd83000 r-xp 00000000 fd:00 67334359 /usr/lib64/ld-2.17.so
2ae1c77ac000-2ae1c77d6000 r-xp 00000000 fd:06 67150233 /paic/postgres/base/9.5.5/lib/pg_pathman.so
2ae1c77d6000-2ae1c79d5000 ---p 0002a000 fd:06 67150233 /paic/postgres/base/9.5.5/lib/pg_pathman.so
2ae1c79d5000-2ae1c79d7000 rw-p 00029000 fd:06 67150233 /paic/postgres/base/9.5.5/lib/pg_pathman.so
2ae1c79d7000-2aeeaf0cf000 rw-s 00000000 00:04 55482843 /dev/zero (deleted)
2aeeaf0cf000-2aeeaf0db000 r-xp 00000000 fd:00 69271978 /usr/lib64/libnss_files-2.17.so
2aeeaf0db000-2aeeaf2da000 ---p 0000c000 fd:00 69271978 /usr/lib64/libnss_files-2.17.so
2aeeaf2da000-2aeeaf2db000 r--p 0000b000 fd:00 69271978 /usr/lib64/libnss_files-2.17.so
2aeeaf2db000-2aeeaf2dc000 rw-p 0000c000 fd:00 69271978 /usr/lib64/libnss_files-2.17.so
… …
2aeeaf2dc000-2aeeaf2e2000 rw-p 00000000 00:00 0
2aeeaf344000-2aeeafc8b000 rw-p 00000000 00:00 0 -- region size: c8b000-344000=947000(H)=9728000(D)=9MB
2aeeafccc000-2aeeb06ce000 rw-p 00000000 00:00 0
2aeeb074f000-2aeeb3827000 rw-p 00000000 00:00 0
2aeeb472b000-2aeeb4f2c000 rw-p 00000000 00:00 0
2aeeb572d000-2aeeb672f000 rw-p 00000000 00:00 0
2aeeb6f30000-2aeeb7731000 rw-p 00000000 00:00 0
2aeeb7f32000-2aeeb8733000 rw-p 00000000 00:00 0
2aeeb8f34000-2aeeb9f36000 rw-p 00000000 00:00 0
2aeeba737000-2aeebaf38000 rw-p 00000000 00:00 0
2aeebb739000-2aeebbf3a000 rw-p 00000000 00:00 0
2aeebc73b000-2aeebcf3c000 rw-p 00000000 00:00 0
2aeebd73d000-2aeebe73f000 rw-p 00000000 00:00 0
2aeebef40000-2aeebf741000 rw-p 00000000 00:00 0
2aeebff42000-2aeec0743000 rw-p 00000000 00:00 0
2aeec0f44000-2aeec1f46000 rw-p 00000000 00:00 0
2aeec2747000-2aeec2f48000 rw-p 00000000 00:00 0
2aeec3749000-2aeec3f4a000 rw-p 00000000 00:00 0
2aeec474b000-2aeec584e000 rw-p 00000000 00:00 0
2aeec604f000-2aeec6850000 rw-p 00000000 00:00 0
2aeec7051000-2aeec7852000 rw-p 00000000 00:00 0
2aeec8053000-2aeec9055000 rw-p 00000000 00:00 0
2aeec9856000-2aeeca057000 rw-p 00000000 00:00 0
2aeeca858000-2aeecb059000 rw-p 00000000 00:00 0
2aeecb85a000-2aeecc85c000 rw-p 00000000 00:00 0
2aeecd05d000-2aeecd85e000 rw-p 00000000 00:00 0
2aeece05f000-2aeece361000 rw-p 00000000 00:00 0
7ffc9fac3000-7ffc9fae4000 rw-p 00000000 00:00 0 [stack]
7ffc9fb41000-7ffc9fb43000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
(gdb) dump memory /paic/postgres/home/postgres/memory.dump 0x2aeeaf344000 0x2aeeafc8b000
$ strings -n 10 memory.dump
tt_pathmantest2_19127
tt_pathmantest2_19128
tt_pathmantest2_19129
tt_pathmantest2_19130
tt_pathmantest2_19131
tt_pathmantest2_19132 ……
(gdb) dump memory /paic/postgres/home/postgres/memory.dump 0x2aeeb074f000 0x2aeeb3827000
$ strings -n 10 memory.dump
pathman_tt_pathmantest2_3219_1_check
{BOOLEXPR :boolop and :args ({OPEXPR :opno 415 :opfuncid 472 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location -1} {CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull false :location -1 :constvalue 8 [ 8 -23 4 0 0 0 0 0 ]}) :location -1} {OPEXPR :opno 412 :opfuncid 469 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location -1} {CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull false :location -1 :constvalue 8 [ 108 -23 4 0 0 0 0 0 ]}) :location -1}) :location -1}o((id >= '321800'::bigint) AND (id < '321900'::bigint))
pathman_tt_pathmantest2_3220_1_check
{BOOLEXPR :boolop and :args ({OPEXPR :opno 415 :opfuncid 472 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location -1} {CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull false :location -1 :constvalue 8 [ 108 -23 4 0 0 0 0 0 ]}) :location -1} {OPEXPR :opno 412 :opfuncid 469 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 20 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location -1} {CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull false :location -1 :constvalue 8 [ -48 -23 4 0 0 0 0 0 ]}) :location -1}) :location -1}o((id >= '321900'::bigint) AND (id < '322000'::bigint))
pathman_tt_pathmantest2_3221_1_check ……
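The :constvalue byte lists in the dumped node trees can be cross-checked against the human-readable constraint text that follows them. The sketch below is an illustration only; decode_int8_const is a hypothetical helper, and it assumes the bytes are printed as signed values in little-endian order, which matches an x86-64 host.

```python
import struct


def decode_int8_const(signed_bytes):
    """Turn the 8 signed byte values of a bigint CONST node into an integer,
    assuming little-endian byte order."""
    raw = bytes(b & 0xFF for b in signed_bytes)  # map -23 to 0xE9, etc.
    return struct.unpack("<q", raw)[0]


# Byte lists copied from the first dumped check constraint.
lower = decode_int8_const([8, -23, 4, 0, 0, 0, 0, 0])
upper = decode_int8_const([108, -23, 4, 0, 0, 0, 0, 0])
print(lower, upper)  # prints "321800 321900"
```

The decoded values agree with the trailing text (id >= '321800'::bigint) AND (id < '321900'::bigint), so each partition's check constraint is held in the backend both as a serialized expression tree and as source text.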
The other memory regions show the same result.
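The region-size arithmetic annotated in the maps listing (end address minus start address, converted from hexadecimal) can be automated so that the largest private anonymous regions are easy to pick out for dumping. This is a rough Python sketch under the assumption that each line follows the usual /proc/<pid>/maps layout. The helper names are hypothetical.

```python
def region_size(maps_line: str) -> int:
    """Return the size in bytes of one /proc/<pid>/maps region."""
    start, end = (int(x, 16) for x in maps_line.split()[0].split("-"))
    return end - start


def is_private_anonymous(maps_line: str) -> bool:
    """True for writable private mappings with no backing file or label
    (so [heap] and [stack] lines are excluded as well)."""
    parts = maps_line.split()
    return len(parts) == 5 and parts[1] == "rw-p" and parts[4] == "0"


# Lines copied from the maps listing of backend 20213.
sample = [
    "2ae1c79d7000-2aeeaf0cf000 rw-s 00000000 00:04 55482843 /dev/zero (deleted)",
    "2aeeaf344000-2aeeafc8b000 rw-p 00000000 00:00 0",
    "2aeeb074f000-2aeeb3827000 rw-p 00000000 00:00 0",
]
private_regions = [line for line in sample if is_private_anonymous(line)]
sizes = [region_size(line) for line in private_regions]
print(sizes[0])  # prints 9728000, the figure worked out by hand in the listing
```

The rw-s mapping of /dev/zero is shared memory and is filtered out, which leaves only the regions that count toward the backend's private footprint.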
After dropping the pathman extension (while keeping the partition tables created earlier), the problem disappeared:
# drop extension pg_pathman;
DROP EXTENSION
Time: 233.341 ms
# \q

$ psql
# select * from pg_stat_activity where query =current_query();
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start
-------+---------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+---------------------------
17151 | pgbench | 38928 | 10 | postgres | psql | | | -1 | 2019-02-19 16:02:43.443064+08 | 2019-02-19 16:02:45.953323
(1 row)

Time: 2.680 ms

$ pmap -d 38928
mapped: 54259544K writeable/private: 2812K shared: 54123512K
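Analysis of the pathman source code showed that, during process initialization, the configuration of all pathman-partitioned tables is loaded into that process's private memory. The source is shown below (in the original post, the parts added in pathman 1.4 and later were highlighted in yellow):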
/*
 * Initialize per-process resources.
 */
static void
init_local_cache(void)
{
    HASHCTL ctl;

    /* Destroy caches, just in case */
    hash_destroy(partitioned_rels);
    hash_destroy(parent_cache);
    hash_destroy(bound_cache);

    /* Reset pg_pathman's memory contexts */
    if (TopPathmanContext)
    {
        /* Check that child contexts exist */
        Assert(MemoryContextIsValid(PathmanInvalJobsContext));
        Assert(MemoryContextIsValid(PathmanRelationCacheContext));
        Assert(MemoryContextIsValid(PathmanParentCacheContext));
        Assert(MemoryContextIsValid(PathmanBoundCacheContext));

        /* Clear children */
        MemoryContextResetChildren(TopPathmanContext);
    }
    /* Initialize pg_pathman's memory contexts */
    else
    {
        Assert(PathmanInvalJobsContext == NULL);
        Assert(PathmanRelationCacheContext == NULL);
        Assert(PathmanParentCacheContext == NULL);
        Assert(PathmanBoundCacheContext == NULL);

        TopPathmanContext =
            AllocSetContextCreate(TopMemoryContext,
                                  CppAsString(TopPathmanContext),
                                  ALLOCSET_DEFAULT_SIZES);

        PathmanInvalJobsContext =
            AllocSetContextCreate(TopMemoryContext,
                                  CppAsString(PathmanInvalJobsContext),
                                  ALLOCSET_SMALL_SIZES);

        /* For PartRelationInfo */
        PathmanRelationCacheContext =
            AllocSetContextCreate(TopPathmanContext,
                                  CppAsString(PathmanRelationCacheContext),
                                  ALLOCSET_DEFAULT_SIZES);

        /* For PartParentInfo */
        PathmanParentCacheContext =
            AllocSetContextCreate(TopPathmanContext,
                                  CppAsString(PathmanParentCacheContext),
                                  ALLOCSET_DEFAULT_SIZES);

        /* For PartBoundInfo */
        PathmanBoundCacheContext =
            AllocSetContextCreate(TopPathmanContext,
                                  CppAsString(PathmanBoundCacheContext),
                                  ALLOCSET_DEFAULT_SIZES);
    }

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(PartRelationInfo);
    ctl.hcxt = PathmanRelationCacheContext;

    partitioned_rels = hash_create("pg_pathman's partition dispatch cache",
                                   PART_RELS_SIZE, &ctl,
                                   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(PartParentInfo);
    ctl.hcxt = PathmanParentCacheContext;

    parent_cache = hash_create("pg_pathman's partition parents cache",
                               PART_RELS_SIZE * CHILD_FACTOR, &ctl,
                               HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(PartBoundInfo);
    ctl.hcxt = PathmanBoundCacheContext;

    bound_cache = hash_create("pg_pathman's partition bounds cache",
                              PART_RELS_SIZE * CHILD_FACTOR, &ctl,
                              HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}
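Moreover, pathman versions after 1.4 only fix the memory problem for queries against tables that are not pathman-partitioned (built-in partitioning in PG 11 behaves the same way). We are also in contact with the pathman author to see whether this data can be placed in shared memory rather than in each process's private memory, which would resolve the OOM problem under high concurrency.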
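To see why per-backend caching becomes an OOM risk under high concurrency, it helps to multiply the measured private memory of one backend by the number of backends. The following back-of-the-envelope Python sketch uses the 585464 KB figure from the pmap output of the production backend; the backend counts are hypothetical and are not taken from the incident. pmap's writeable/private figure counts mapped address space, so the result is an approximate upper bound rather than resident memory.

```python
def projected_private_gib(per_backend_kib: int, backends: int) -> float:
    """Total private memory in GiB if every backend maps per_backend_kib KiB."""
    return per_backend_kib * backends / (1024 ** 2)


PER_BACKEND_KIB = 585464  # writeable/private reported by pmap for PID 2033

# Hypothetical numbers of concurrent backends, for illustration only.
for backends in (100, 200, 400):
    total = projected_private_gib(PER_BACKEND_KIB, backends)
    print(f"{backends} backends -> about {total:.0f} GiB of private memory")
```

Because this memory sits on top of shared_buffers and is duplicated in every backend, moving the cache into shared memory would remove the multiplication by the number of connections.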
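Solutions
1. Read/write splitting
Read/write splitting can effectively relieve the memory shortage in the short term, and it is relatively easy to implement with limited impact.
2. Merging historical partitions
Use merge_range_partitions to merge historical partitions older than two months from daily partitions into weekly partitions.
3. Table splitting: separating out historical tables
Split the historical tables off into non-partitioned tables or archive tables. This requires substantial application changes.
4. Proper data lifecycle management
5. Migrating to MPP
A purely analytical application can be migrated to an MPP database, which resolves the performance and resource problems at the root.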
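For solution 2, the daily partitions first have to be grouped into weeks before they can be merged. The Python sketch below only generates the SQL text and does not connect to a database. It assumes a hypothetical partition naming scheme of <parent>_YYYYMMDD and the variadic form of merge_range_partitions found in newer pg_pathman releases; older releases accept only two partitions per call, so check the installed version before using anything like this.

```python
from datetime import date, timedelta
from itertools import groupby


def group_daily_partitions_by_week(parent: str, first_day: date, last_day: date):
    """Group hypothetical daily partition names (<parent>_YYYYMMDD) by ISO week."""
    days = []
    day = first_day
    while day <= last_day:
        days.append(day)
        day += timedelta(days=1)
    groups = []
    # Consecutive days sharing the same (ISO year, ISO week) form one group.
    for _, week_days in groupby(days, key=lambda d: d.isocalendar()[:2]):
        groups.append([f"{parent}_{d:%Y%m%d}" for d in week_days])
    return groups


def merge_statement(partitions):
    """Build one merge call; assumes the variadic merge_range_partitions form."""
    args = ", ".join(f"'{p}'" for p in partitions)
    return f"SELECT merge_range_partitions({args});"


# Hypothetical parent table name and date range, for illustration only.
weeks = group_daily_partitions_by_week("tt_daily", date(2018, 12, 31), date(2019, 1, 13))
for week in weeks:
    if len(week) > 1:  # a single partition needs no merge
        print(merge_statement(week))
```

Merging seven daily partitions into one cuts the number of partitions, and therefore the amount of per-backend pathman configuration data, for that period by roughly a factor of seven.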
The PostgreSQL Chinese Community welcomes article submissions from technical practitioners.
Submission email: press@postgres.cn