Impala memory limit exceeded with simple count query
Posted: 2017-04-06 05:54:08

EDIT:
There were some corrupted AVRO files in the table. After deleting some of them, everything works fine. I had already extracted these files to JSON with avro-tools, and even decompressed they are not very large; note also that the failed allocation of 109051904000 bytes is roughly 101.5 GB, far beyond any plausible decompressed size, which suggests a garbage size field read from a corrupt file. So this looks like a bug in how Impala handles corrupted AVRO files.
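For anyone hitting the same symptom, here is a minimal sketch of how such corrupt files can be hunted down: try to decode every file in the partition with avro-tools and flag the ones that fail. The warehouse path and the avro-tools jar location are assumptions, not from this post; adjust them to your layout.

#!/usr/bin/env bash
# Try to decode each AVRO file in the partition; a non-zero exit from
# avro-tools usually means the file is truncated or corrupt.
# PART_DIR and the avro-tools jar path are assumptions -- adjust them.
PART_DIR=/user/hive/warehouse/adhoc_data_fast.db/log/day=2017-04-05

hdfs dfs -ls "$PART_DIR" | awk '/\.avro$/ {print $NF}' | while read -r f; do
  hdfs dfs -get "$f" /tmp/check.avro
  if ! java -jar avro-tools.jar tojson /tmp/check.avro > /dev/null 2>&1; then
    echo "possibly corrupt: $f"
  fi
  rm -f /tmp/check.avro
done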
I have an Impala table stored as gzip-compressed AVRO, partitioned by "day". When I run this query:
select count(0) from adhoc_data_fast.log where day='2017-04-05';
it reports:
Query: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Query submitted at: 2017-04-06 13:35:56 (Coordinator: http://szq7.appadhoc.com:25000)
Query progress can be monitored at: http://szq7.appadhoc.com:25000/query_plan?query_id=ef4698db870efd4d:739c89ef00000000
WARNINGS:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.
Each node has 96 GB of memory, and the memory limit for the single resource pool is set to 300 GB.
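As a side note, the standard first response to "Memory limit exceeded" is to cap the per-query memory and retry. A sketch using impala-shell (the host and port are taken from the shell prompt in the profile below); it would not have helped here, since the failure is a single huge allocation inside GzipDecompressor rather than genuine memory pressure, but it quickly distinguishes the two cases:

# Cap per-node memory for this query, then retry it.
impala-shell -i szq7.appadhoc.com:21000 -q "
SET MEM_LIMIT=2g;
SELECT count(*) FROM adhoc_data_fast.log WHERE day='2017-04-05';
"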
None of the compressed files exceeds 250 MB:
62M log.2017-04-05.1491321605834.avro
79M log.2017-04-05.1491323647211.avro
62M log.2017-04-05.1491327241311.avro
60M log.2017-04-05.1491330839609.avro
52M log.2017-04-05.1491334439092.avro
59M log.2017-04-05.1491338038503.avro
93M log.2017-04-05.1491341639694.avro
130M log.2017-04-05.1491345239969.avro
147M log.2017-04-05.1491348843931.avro
183M log.2017-04-05.1491352442955.avro
218M log.2017-04-05.1491359648079.avro
181M log.2017-04-05.1491363247597.avro
212M log.2017-04-05.1491366845827.avro
207M log.2017-04-05.1491370445873.avro
197M log.2017-04-05.1491374045830.avro
164M log.2017-04-05.1491377650935.avro
155M log.2017-04-05.1491381249597.avro
203M log.2017-04-05.1491384846366.avro
185M log.2017-04-05.1491388450262.avro
198M log.2017-04-05.1491392047694.avro
206M log.2017-04-05.1491395648818.avro
214M log.2017-04-05.1491399246407.avro
167M log.2017-04-05.1491402846469.avro
77M log.2017-04-05.1491406180615.avro
3.2M log.2017-04-05.1491409790105.avro
1.3M log.2017-04-05.1491413385884.avro
928K log.2017-04-05.1491416981829.avro
832K log.2017-04-05.1491420581588.avro
1.1M log.2017-04-05.1491424180191.avro
2.6M log.2017-04-05.1491427781339.avro
3.8M log.2017-04-05.1491431382552.avro
3.3M log.2017-04-05.1491434984679.avro
5.2M log.2017-04-05.1491438586674.avro
5.1M log.2017-04-05.1491442192541.avro
2.3M log.2017-04-05.1491445789230.avro
884K log.2017-04-05.1491449386630.avro
I fetched them from HDFS and converted them to JSON with avro-tools to decompress them. None of the decompressed files exceeds 1 GB:
16M log.2017-04-05.1491321605834.avro.json
308M log.2017-04-05.1491323647211.avro.json
103M log.2017-04-05.1491327241311.avro.json
150M log.2017-04-05.1491330839609.avro.json
397M log.2017-04-05.1491334439092.avro.json
297M log.2017-04-05.1491338038503.avro.json
160M log.2017-04-05.1491341639694.avro.json
95M log.2017-04-05.1491345239969.avro.json
360M log.2017-04-05.1491348843931.avro.json
338M log.2017-04-05.1491352442955.avro.json
71M log.2017-04-05.1491359648079.avro.json
161M log.2017-04-05.1491363247597.avro.json
628M log.2017-04-05.1491366845827.avro.json
288M log.2017-04-05.1491370445873.avro.json
162M log.2017-04-05.1491374045830.avro.json
90M log.2017-04-05.1491377650935.avro.json
269M log.2017-04-05.1491381249597.avro.json
620M log.2017-04-05.1491384846366.avro.json
70M log.2017-04-05.1491388450262.avro.json
30M log.2017-04-05.1491392047694.avro.json
114M log.2017-04-05.1491395648818.avro.json
370M log.2017-04-05.1491399246407.avro.json
359M log.2017-04-05.1491402846469.avro.json
218M log.2017-04-05.1491406180615.avro.json
29M log.2017-04-05.1491409790105.avro.json
3.9M log.2017-04-05.1491413385884.avro.json
9.3M log.2017-04-05.1491416981829.avro.json
8.3M log.2017-04-05.1491420581588.avro.json
2.3M log.2017-04-05.1491424180191.avro.json
25M log.2017-04-05.1491427781339.avro.json
24M log.2017-04-05.1491431382552.avro.json
5.7M log.2017-04-05.1491434984679.avro.json
35M log.2017-04-05.1491438586674.avro.json
5.8M log.2017-04-05.1491442192541.avro.json
23M log.2017-04-05.1491445789230.avro.json
4.3M log.2017-04-05.1491449386630.avro.json
Here is the Impala query profile:
[szq7.appadhoc.com:21000] > profile;
Query Runtime Profile:
Query (id=ef4698db870efd4d:739c89ef00000000):
Summary:
Session ID: f54bb090170bcdb6:621ac5796ef2668c
Session Type: BEESWAX
Start Time: 2017-04-06 13:35:56.454441000
End Time: 2017-04-06 13:35:57.326967000
Query Type: QUERY
Query State: EXCEPTION
Query Status:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.
Impala Version: impalad version 2.7.0-cdh5.9.1 RELEASE (build 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
User: ubuntu
Connected User: ubuntu
Delegated User:
Network Address: ::ffff:192.168.1.7:29026
Default Db: default
Sql Statement: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Coordinator: szq7.appadhoc.com:22000
Query Options (non default):
Plan:
----------------
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log
03:AGGREGATE [FINALIZE]
| output: count:merge(0)
| hosts=13 per-host-mem=unavailable
| tuple-ids=1 row-size=8B cardinality=1
|
02:EXCHANGE [UNPARTITIONED]
| hosts=13 per-host-mem=unavailable
| tuple-ids=1 row-size=8B cardinality=1
|
01:AGGREGATE
| output: count(0)
| hosts=13 per-host-mem=10.00MB
| tuple-ids=1 row-size=8B cardinality=1
|
00:SCAN HDFS [adhoc_data_fast.log, RANDOM]
partitions=1/7594 files=38 size=3.45GB
table stats: unavailable
column stats: all
hosts=13 per-host-mem=400.00MB
tuple-ids=0 row-size=0B cardinality=unavailable
----------------
Estimated Per-Host Mem: 429916160
Estimated Per-Host VCores: 1
Tables Missing Stats: adhoc_data_fast.log
Request Pool: default-pool
Admission result: Admitted immediately
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE 1 52.298ms 52.298ms 0 1 4.00 KB -1.00 B FINALIZE
02:EXCHANGE 1 676.993ms 676.993ms 0 1 0 -1.00 B UNPARTITIONED
01:AGGREGATE 0 0.000ns 0.000ns 0 1 0 10.00 MB
00:SCAN HDFS 0 0.000ns 0.000ns 0 -1 0 400.00 MB adhoc_data_fast.log
Planner Timeline: 69.589ms
- Analysis finished: 6.642ms (6.642ms)
- Equivalence classes computed: 6.980ms (337.753us)
- Single node plan created: 13.302ms (6.322ms)
- Runtime filters computed: 13.368ms (65.984us)
- Distributed plan created: 15.131ms (1.763ms)
- Lineage info computed: 16.488ms (1.356ms)
- Planning finished: 69.589ms (53.101ms)
Query Timeline: 874.026ms
- Start execution: 63.320us (63.320us)
- Planning finished: 72.764ms (72.701ms)
- Submit for admission: 73.592ms (827.496us)
- Completed admission: 73.775ms (183.088us)
- Ready to start 13 remote fragments: 126.950ms (53.175ms)
- All 13 remote fragments started: 161.919ms (34.968ms)
- Rows available: 856.761ms (694.842ms)
- Unregister query: 872.527ms (15.765ms)
- ComputeScanRangeAssignmentTimer: 356.136us
ImpalaServer:
- ClientFetchWaitTimer: 0.000ns
- RowMaterializationTimer: 0.000ns
Execution Profile ef4698db870efd4d:739c89ef00000000:(Total: 782.712ms, non-child: 0.000ns, % non-child: 0.00%)
Number of filters: 0
Filter routing table:
ID Src. Node Tgt. Node(s) Targets Target type Partition filter Pending (Expected) First arrived Completed Enabled
----------------------------------------------------------------------------------------------------------------------------
Fragment start latencies: Count: 13, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 2ms, 95th %-ile: 2ms, 99.9th %-ile: 35ms
Per Node Peak Memory Usage: szq15.appadhoc.com:22000(0) szq1.appadhoc.com:22000(0) szq13.appadhoc.com:22000(0) szq12.appadhoc.com:22000(0) szq11.appadhoc.com:22000(0) szq20.appadhoc.com:22000(0) szq14.appadhoc.com:22000(0) szq8.appadhoc.com:22000(0) szq5.appadhoc.com:22000(0) szq9.appadhoc.com:22000(0) szq4.appadhoc.com:22000(0) szq6.appadhoc.com:22000(0) szq7.appadhoc.com:22000(0)
- FiltersReceived: 0 (0)
- FinalizationTimer: 0.000ns
Coordinator Fragment F01:(Total: 729.811ms, non-child: 0.000ns, % non-child: 0.00%)
MemoryUsage(500.000ms): 12.00 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 12.00 KB (12288)
- PerHostPeakMemUsage: 0
- PrepareTime: 52.291ms
- RowsProduced: 0 (0)
- TotalCpuTime: 0.000ns
- TotalNetworkReceiveTime: 676.991ms
- TotalNetworkSendTime: 0.000ns
- TotalStorageWaitTime: 0.000ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 0 (0)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 102.40 GB (109951164416)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0.000ns
- TotalEncryptionTime: 0.000ns
- TotalIntegrityCheckTime: 0.000ns
- TotalReadBlockTime: 0.000ns
CodeGen:(Total: 63.837ms, non-child: 63.837ms, % non-child: 100.00%)
- CodegenTime: 828.728us
- CompileTime: 2.957ms
- LoadTime: 0.000ns
- ModuleBitcodeSize: 1.89 MB (1984232)
- NumFunctions: 7 (7)
- NumInstructions: 96 (96)
- OptimizationTime: 8.070ms
- PrepareTime: 51.769ms
AGGREGATION_NODE (id=3):(Total: 729.291ms, non-child: 52.298ms, % non-child: 7.17%)
ExecOption: Codegen Enabled
- BuildTime: 0.000ns
- GetResultsTime: 0.000ns
- HTResizeTime: 0.000ns
- HashBuckets: 0 (0)
- LargestPartitionPercent: 0 (0)
- MaxPartitionLevel: 0 (0)
- NumRepartitions: 0 (0)
- PartitionsCreated: 0 (0)
- PeakMemoryUsage: 4.00 KB (4096)
- RowsRepartitioned: 0 (0)
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SpilledPartitions: 0 (0)
EXCHANGE_NODE (id=2):(Total: 676.993ms, non-child: 676.993ms, % non-child: 100.00%)
BytesReceived(500.000ms): 0
- BytesReceived: 0
- ConvertRowBatchTime: 0.000ns
- DeserializeRowBatchTimer: 0.000ns
- FirstBatchArrivalWaitTime: 0.000ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0.000ns
- SendersBlockedTotalTimer(*): 0.000ns
Averaged Fragment F00:
split sizes: min: 114.60 MB, max: 451.79 MB, avg: 271.65 MB, stddev: 104.16 MB
completion times: min:694.632ms max:728.356ms mean: 725.379ms stddev:8.878ms
execution rates: min:157.45 MB/sec max:620.68 MB/sec mean:374.89 MB/sec stddev:144.30 MB/sec
num instances: 13
Fragment F00:
Instance ef4698db870efd4d:739c89ef00000001 (host=szq5.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000002 (host=szq8.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000003 (host=szq14.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000004 (host=szq20.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000005 (host=szq11.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000006 (host=szq12.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000007 (host=szq13.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000008 (host=szq1.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef00000009 (host=szq15.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000a (host=szq6.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000b (host=szq4.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000c (host=szq9.appadhoc.com:22000):
Instance ef4698db870efd4d:739c89ef0000000d (host=szq7.appadhoc.com:22000):
So why does Impala need this much memory?
Answer 1:

This may be because Impala is missing statistics for that table and its partitions. The explain plan highlights the following:
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log
Try running COMPUTE STATS on the table, or COMPUTE INCREMENTAL STATS on just that partition.
For example:
COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05');
That will help Impala plan its resource usage. I would be surprised if it solves the problem, but it is worth trying first.
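To check whether the stats took effect, SHOW TABLE STATS lists per-partition row counts; -1 in the #Rows column means stats are still missing. A sketch, with the host taken from the profile above:

# Compute stats for the partition, then confirm the #Rows column
# for day=2017-04-05 is no longer -1.
impala-shell -i szq7.appadhoc.com:21000 -q "
COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05');
SHOW TABLE STATS adhoc_data_fast.log;
"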
Comments:
Thanks! But COMPUTE STATS also failed to allocate memory. It turned out to be what looks like a bug in Impala's handling of corrupted AVRO files.
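For anyone resolving it the same way (by deleting the corrupt files from HDFS): Impala caches HDFS file metadata, so refresh the table afterwards or scans may still reference the deleted files. A sketch; the warehouse path and file name are placeholders, not from this post:

# Remove a corrupt file, then refresh Impala's cached file metadata.
# <bad-file> and the path are placeholders -- adjust to your layout.
hdfs dfs -rm /user/hive/warehouse/adhoc_data_fast.db/log/day=2017-04-05/<bad-file>.avro
impala-shell -i szq7.appadhoc.com:21000 -q "REFRESH adhoc_data_fast.log;"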