OOM error when setting up cache dictionaries in clickhouse
【Title】: OOM error when setting up cache dictionaries in clickhouse 【Posted】: 2021-02-08 12:58:02 【Question】: Running a CentOS 7.6 VM with 16 GB of RAM, I get the following error
Code: 32. DB::Exception: Attempt to read after eof: while receiving packet from localhost:9000
when querying a cache dictionary with 75 columns and 100000 rows (126 MB). This is unexpected, since the dictionary seems small. I launch the query with clickhouse-client --query:
SELECT dictGet('CacheDictionary', 'date', toUInt64(number)) AS date, SUM(dictGet('CacheDictionary', 'filterColumn', toUInt64(number))) AS val, AVG(dictGet('CacheDictionary', 'filterColumn', toUInt64(number))) AS avg FROM system.numbers(1, 100000) GROUP BY date
Since I am querying only 3 columns, shouldn't only 3*numberOfRows cells be cached? Checking dmesg, I find the corresponding OOM error.
[613700.447158] CPU: 1 PID: 5859 Comm: node Not tainted 3.10.0-957.5.1.el7.x86_64 #1
[613700.449266] Hardware name: Scaleway SCW-GP1-S, Bios 0.0.0 02/06/2015
[613700.451020] Call Trace:
[613700.451708] [<ffffffff9cf61e41>] dump_stack+0x19/0x1b
[613700.452792] [<ffffffff9cf5c86a>] dump_header+0x90/0x229
[613700.453875] [<ffffffff9cb00f3b>] ? cred_has_capability+0x6b/0x120
[613700.455220] [<ffffffff9c9ba524>] oom_kill_process+0x254/0x3d0
[613700.456475] [<ffffffff9cb0101e>] ? selinux_capable+0x2e/0x40
[613700.457641] [<ffffffff9c9bad66>] out_of_memory+0x4b6/0x4f0
[613700.458844] [<ffffffff9cf5d36e>] __alloc_pages_slowpath+0x5d6/0x724
[613700.459965] [<ffffffff9c9c1145>] __alloc_pages_nodemask+0x405/0x420
[613700.460773] [<ffffffff9ca0e0a8>] alloc_pages_current+0x98/0x110
[613700.461521] [<ffffffff9c9b6387>] __page_cache_alloc+0x97/0xb0
[613700.462156] [<ffffffff9c9b8fe8>] filemap_fault+0x298/0x490
[613700.462849] [<ffffffffc0461186>] ext4_filemap_fault+0x36/0x50 [ext4]
[613700.463623] [<ffffffff9c9e458a>] __do_fault.isra.59+0x8a/0x100
[613700.464350] [<ffffffff9c9e4b3c>] do_read_fault.isra.61+0x4c/0x1b0
[613700.465079] [<ffffffff9c9e94e4>] handle_pte_fault+0x2f4/0xd10
[613700.465756] [<ffffffff9c9ec01d>] handle_mm_fault+0x39d/0x9b0
[613700.466454] [<ffffffff9cf6f5e3>] __do_page_fault+0x203/0x500
[613700.467136] [<ffffffff9c9f1a37>] ? do_munmap+0x327/0x480
[613700.467749] [<ffffffff9cf6f9c6>] trace_do_page_fault+0x56/0x150
[613700.468410] [<ffffffff9cf6ef42>] do_async_page_fault+0x22/0xf0
[613700.469126] [<ffffffff9cf6b788>] async_page_fault+0x28/0x30
[613700.469766] Mem-Info:
[613700.470040] active_anon:5266300 inactive_anon:2788185 isolated_anon:0
active_file:184 inactive_file:53 isolated_file:0
unevictable:0 dirty:11 writeback:223 unstable:0
slab_reclaimable:9023 slab_unreclaimable:9736
mapped:1319 shmem:1310 pagetables:18040 bounce:0
free:49178 free_pcp:61 free_cma:0
[613700.473741] Node 0 DMA free:14912kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15004kB managed:14912kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[613700.478645] lowmem_reserve[]: 0 2827 31990 31990
[613700.479345] Node 0 DMA32 free:122332kB min:5788kB low:7232kB high:8680kB active_anon:1511568kB inactive_anon:1243404kB active_file:84kB inactive_file:140kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3126684kB managed:2895228kB mlocked:0kB dirty:24kB writeback:272kB mapped:148kB shmem:216kB slab_reclaimable:3228kB slab_unreclaimable:3372kB kernel_stack:144kB pagetables:7784kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:47688 all_unreclaimable? yes
[613700.484885] lowmem_reserve[]: 0 0 29163 29163
[613700.485457] Node 0 Normal free:59468kB min:59716kB low:74644kB high:89572kB active_anon:19553632kB inactive_anon:9909336kB active_file:652kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29866176kB mlocked:0kB dirty:20kB writeback:620kB mapped:5128kB shmem:5024kB slab_reclaimable:32864kB slab_unreclaimable:35572kB kernel_stack:3728kB pagetables:64376kB unstable:0kB bounce:0kB free_pcp:244kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7605 all_unreclaimable? yes
[613700.490897] lowmem_reserve[]: 0 0 0 0
[613700.491483] Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 3*32kB (UM) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 2*1024kB (UM) 2*2048kB (M) 2*4096kB (M) = 14912kB
[613700.493471] Node 0 DMA32: 134*4kB (UEM) 83*8kB (UEM) 149*16kB (UEM) 74*32kB (UE) 41*64kB (UE) 27*128kB (UEM) 13*256kB (UE) 8*512kB (UEM) 101*1024kB (UM) 0*2048kB 0*4096kB = 122880kB
[613700.495884] Node 0 Normal: 408*4kB (UE) 320*8kB (UE) 301*16kB (UE) 252*32kB (UE) 184*64kB (UEM) 130*128kB (UE) 34*256kB (UE) 2*512kB (UE) 5*1024kB (M) 0*2048kB 0*4096kB = 60336kB
[613700.498284] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[613700.499258] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[613700.500155] 152719 total pagecache pages
[613700.500582] 151262 pages in swap cache
[613700.500989] Swap cache stats: add 25089927, delete 24939368, find 1630185/1802893
[613700.501787] Free swap = 0kB
[613700.502119] Total swap = 1048572kB
[613700.502538] 8387598 pages RAM
[613700.502891] 0 pages HighMem/MovableOnly
[613700.503352] 193519 pages reserved
[613700.503741] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[613700.504600] [ 1773] 0 1773 63151 1390 100 153 0 systemd-journal
[613700.505791] [ 1812] 0 1812 9945 2 25 538 -1000 systemd-udevd
[613700.506886] [ 2179] 0 2179 6594 41 19 35 0 systemd-logind
[613700.507899] [ 2185] 81 2185 16581 57 32 80 -900 dbus-daemon
[613700.508887] [ 3808] 0 3808 26865 2 50 498 0 dhclient
[613700.509845] [ 3965] 0 3965 22931 12 45 251 0 master
[613700.510888] [ 3967] 89 3967 22974 12 45 246 0 qmgr
[613700.511796] [ 4041] 0 4041 90366 307 113 577 0 rsyslogd
[613700.512756] [ 4060] 0 4060 2476 1 10 32 0 agetty
[613700.513715] [ 4061] 0 4061 2476 1 10 31 0 agetty
[613700.514736] [ 4064] 0 4064 6476 1 18 52 0 atd
[613700.515708] [ 4065] 0 4065 6533 105 18 65 0 crond
[613700.516641] [ 4110] 0 4110 28215 13 57 243 -1000 sshd
[613700.517563] [ 5851] 0 5851 3247 0 11 47 0 sh
[613700.518476] [ 5859] 0 5859 380762 1460 310 10577 0 node
[613700.519397] [ 6163] 0 6163 3841 52 12 65 0 bash
[613700.520320] [24945] 0 24945 3813 1 12 103 0 bash
[613700.521241] [26835] 0 26835 38743 56 77 850 0 sshd
[613700.522116] [26838] 0 26838 3779 2 12 65 0 bash
[613700.523063] [26848] 0 26848 3314 40 13 98 0 bash
[613700.523971] [26898] 0 26898 208799 8732 320 5172 0 node
[613700.524890] [26914] 0 26914 218819 1071 149 2602 0 node
[613700.525827] [30635] 89 30635 22438 10 46 240 0 pickup
[613700.526717] [31250] 0 31250 5768 1 16 56 0 anacron
[613700.527670] [32757] 0 32757 413109 921 220 12462 0 python3
[613700.528622] [ 1083] 999 1014 11454714 7881670 16173 207603 0 TCPHandler
[613700.529600] [ 1174] 0 1174 1941 18 9 0 0 sleep
[613700.530531] [ 1344] 0 1344 123706 6927 125 0 0 clickhouse-clie
[613700.531556] Out of memory: Kill process 1083 (TCPHandler) score 958 or sacrifice child
[613700.532483] Killed process 1083 (TCPHandler) total-vm:45818856kB, anon-rss:31526680kB, file-rss:0kB, shmem-rss:0kB
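The figures in the OOM report can be cross-checked with a little arithmetic. The sketch below takes the TCPHandler row of the task table and the "Killed process" line from the dmesg output above and converts the page counts to kB. The 4 KiB page size is an assumption (it is the usual x86_64 default, and it is not stated in the log); the variable and function names are only illustrative.

```python
# Two lines copied verbatim from the dmesg output above.
task_line = "[ 1083] 999 1014 11454714 7881670 16173 207603 0 TCPHandler"
killed_line = ("Killed process 1083 (TCPHandler) total-vm:45818856kB, "
               "anon-rss:31526680kB, file-rss:0kB, shmem-rss:0kB")

PAGE_KB = 4  # assumption: 4 KiB pages

# Column order taken from the table header in the log:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
COLUMNS = ["pid", "uid", "tgid", "total_vm", "rss",
           "nr_ptes", "swapents", "oom_score_adj", "name"]


def parse_task_line(line):
    """Split one row of the OOM task table into a dict of named fields."""
    fields = line.replace("[", "").replace("]", "").split()
    return dict(zip(COLUMNS, fields))


task = parse_task_line(task_line)
total_vm_kb = int(task["total_vm"]) * PAGE_KB  # pages -> kB
rss_kb = int(task["rss"]) * PAGE_KB            # pages -> kB
rss_gib = rss_kb / 1024 / 1024

print(task["name"], "total_vm:", total_vm_kb, "kB, rss:", rss_kb, "kB")
print("resident memory is roughly %.1f GiB" % rss_gib)

# Both converted values should appear in the "Killed process" line.
print("matches kill line:",
      f"total-vm:{total_vm_kb}kB" in killed_line
      and f"anon-rss:{rss_kb}kB" in killed_line)
```

Under that page-size assumption, the table row and the kill line describe the same process: the ClickHouse TCPHandler thread was holding roughly 30 GiB of anonymous memory when it was killed.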
And the corresponding stderr.log (nothing is written to clickhouse-server.err.log):
Processing configuration file '/etc/clickhouse-server/users.xml'.
Include not found: networks
Saved preprocessed configuration to '/var/lib/clickhouse//preprocessed_configs/users.xml'.
Processing configuration file '/etc/clickhouse-server/dicts/benchmark_dictionary.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse//preprocessed_configs/dicts_benchmark_dictionary.xml'.
Processing configuration file '/etc/clickhouse-server/config.xml'.
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Saved preprocessed configuration to '/var/lib/clickhouse//preprocessed_configs/config.xml'.
Processing configuration file '/etc/clickhouse-server/dicts/benchmark_dictionary.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse//preprocessed_configs/dicts_benchmark_dictionary.xml'.
Processing configuration file '/etc/clickhouse-server/dicts/benchmark_dictionary.xml'.
Saved preprocessed configuration to '/var/lib/clickhouse//preprocessed_configs/dicts_benchmark_dictionary.xml'.
Status file /var/run/clickhouse-server/clickhouse-server.pid already exists - unclean restart. Contents:
26053
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Status file /var/lib/clickhouse/status already exists - unclean restart. Contents:
PID: 26053
Started at: 2021-02-08 03:26:05
Revision: 54438
Here is the dictionary XML file:
<yandex>
<dictionary>
<name>CacheDictionary</name>
<source>
<executable>
<command>
awk 'BEGIN { FS=","; OFS="," } { $1=$1; print }' /var/lib/clickhouse/user_files/testCache.csv
</command>
<format>CSV</format>
</executable>
</source>
<layout>
<cache>
<size_in_cells>7500000</size_in_cells>
</cache>
</layout>
<lifetime>0</lifetime>
<structure>
<id>
<name>referentialKey</name>
</id>
<attribute>
<name>date</name>
<type>Date</type>
<null_value></null_value>
</attribute>
<attribute>
<name>integers</name>
<type>UInt64</type>
<null_value></null_value>
</attribute>
<attribute>
<name>filterColumn</name>
<type>Float64</type>
<null_value></null_value>
</attribute>
<attribute>
<name>random0</name>
<type>String</type>
<null_value></null_value>
</attribute>
...
<attribute>
<name>random74</name>
<type>String</type>
<null_value></null_value>
</attribute>
</structure>
</dictionary>
</yandex>
【Comments】:
【Answer 1】: dictionary with 75 columns
<size_in_cells>7500000</size_in_cells>
Such a dictionary can eat up 100 GB of RAM.
Check system.dictionaries.bytes_allocated, and try <size_in_cells>10000</size_in_cells>.
https://github.com/ClickHouse/ClickHouse/issues/2738
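To see why the configured size is so large, here is a rough back-of-the-envelope sketch. It assumes that size_in_cells counts cached keys and that every cell reserves one slot per attribute; that is an interpretation of the answer, not something the question or the answer states outright. The per-slot byte cost is a purely hypothetical placeholder, so the byte figure is illustrative only and is not a measurement of ClickHouse.

```python
def attribute_slots(size_in_cells: int, num_attributes: int) -> int:
    """Slots reserved if each cell holds one value per attribute (assumption)."""
    return size_in_cells * num_attributes


ROWS = 100_000          # rows in the source CSV, from the question
ATTRIBUTES = 75         # column count stated in the question
CONFIGURED = 7_500_000  # <size_in_cells> in the dictionary XML
SUGGESTED = 10_000      # <size_in_cells> suggested in the answer

configured_slots = attribute_slots(CONFIGURED, ATTRIBUTES)
needed_slots = attribute_slots(ROWS, ATTRIBUTES)
suggested_slots = attribute_slots(SUGGESTED, ATTRIBUTES)

# Hypothetical fixed cost per slot, in bytes. Real costs depend on the
# attribute type and on string lengths, so treat this as a placeholder.
ASSUMED_BYTES_PER_SLOT = 16


def fixed_bytes(slots: int) -> int:
    """Fixed-size footprint under the assumed per-slot cost."""
    return slots * ASSUMED_BYTES_PER_SLOT


for label, slots in [("configured", configured_slots),
                     ("one cell per row", needed_slots),
                     ("suggested", suggested_slots)]:
    gib = fixed_bytes(slots) / 2**30
    print(f"{label:>16}: {slots:>12,} slots, about {gib:.2f} GiB fixed")

print("configured / one-cell-per-row ratio:", configured_slots // needed_slots)
```

Note that 7,500,000 is exactly 75 × 100,000, which suggests the configured value was computed as columns × rows. Under the assumption above, that reserves 75 times more slots than one cell per row would, which is consistent with the answer's advice to use a much smaller size_in_cells and then check system.dictionaries.bytes_allocated.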
【Comments】: