Impala memory limit exceeded with simple count query

Posted: 2017-04-06 05:54:08

Question:

Edit:

There are some corrupted AVRO files in the table. After deleting some of them, everything works fine. I had already decompressed these files to JSON with avro-tools, and the decompressed output is not very large either, so this looks like a bug in how Impala handles corrupted AVRO files: the failed 109051904000-byte (roughly 100 GB) allocation is presumably a garbage block size read from a damaged file.
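For anyone hitting the same problem, below is a minimal sketch of one way to locate such files. It assumes avro-tools is on the PATH and that the partition's files have already been copied out of HDFS (as in the listing further down); avro-tools should exit non-zero when a decode hits a damaged block.

# Attempt a full decode of every file; any file that fails is a
# candidate corrupted file.
for f in *.avro; do
    avro-tools tojson "$f" > /dev/null 2>&1 || echo "corrupt: $f"
done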


I have an Impala table stored as gzip-compressed AVRO, partitioned by 'day'. When I run the query:

select count(0) from adhoc_data_fast.log where day='2017-04-05';

it fails with:

Query: select count(0) from adhoc_data_fast.log where day='2017-04-05'
Query submitted at: 2017-04-06 13:35:56 (Coordinator: http://szq7.appadhoc.com:25000)
Query progress can be monitored at: http://szq7.appadhoc.com:25000/query_plan?query_id=ef4698db870efd4d:739c89ef00000000
WARNINGS:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.

Each node has 96 GB of memory, and the single-pool memory limit is set to 300 GB.
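(Side note: a per-query memory cap can also be set from impala-shell so that one query cannot claim the whole pool. A sketch, assuming an impala-shell version that supports --query_option; the 10g value is an arbitrary illustration:)

# Cap this session's query well below the 300 GB pool limit so a
# runaway allocation fails fast (hypothetical value for illustration).
impala-shell -i szq7.appadhoc.com:21000 \
    --query_option=MEM_LIMIT=10g \
    -q "select count(0) from adhoc_data_fast.log where day='2017-04-05'"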

None of the compressed files is larger than 250 MB:

62M     log.2017-04-05.1491321605834.avro
79M     log.2017-04-05.1491323647211.avro
62M     log.2017-04-05.1491327241311.avro
60M     log.2017-04-05.1491330839609.avro
52M     log.2017-04-05.1491334439092.avro
59M     log.2017-04-05.1491338038503.avro
93M     log.2017-04-05.1491341639694.avro
130M    log.2017-04-05.1491345239969.avro
147M    log.2017-04-05.1491348843931.avro
183M    log.2017-04-05.1491352442955.avro
218M    log.2017-04-05.1491359648079.avro
181M    log.2017-04-05.1491363247597.avro
212M    log.2017-04-05.1491366845827.avro
207M    log.2017-04-05.1491370445873.avro
197M    log.2017-04-05.1491374045830.avro
164M    log.2017-04-05.1491377650935.avro
155M    log.2017-04-05.1491381249597.avro
203M    log.2017-04-05.1491384846366.avro
185M    log.2017-04-05.1491388450262.avro
198M    log.2017-04-05.1491392047694.avro
206M    log.2017-04-05.1491395648818.avro
214M    log.2017-04-05.1491399246407.avro
167M    log.2017-04-05.1491402846469.avro
77M     log.2017-04-05.1491406180615.avro
3.2M    log.2017-04-05.1491409790105.avro
1.3M    log.2017-04-05.1491413385884.avro
928K    log.2017-04-05.1491416981829.avro
832K    log.2017-04-05.1491420581588.avro
1.1M    log.2017-04-05.1491424180191.avro
2.6M    log.2017-04-05.1491427781339.avro
3.8M    log.2017-04-05.1491431382552.avro
3.3M    log.2017-04-05.1491434984679.avro
5.2M    log.2017-04-05.1491438586674.avro
5.1M    log.2017-04-05.1491442192541.avro
2.3M    log.2017-04-05.1491445789230.avro
884K    log.2017-04-05.1491449386630.avro

I fetched them from HDFS and converted them to JSON with avro-tools to decompress them (roughly the commands sketched after the listing). None of the decompressed files is larger than 1 GB:

16M     log.2017-04-05.1491321605834.avro.json
308M    log.2017-04-05.1491323647211.avro.json
103M    log.2017-04-05.1491327241311.avro.json
150M    log.2017-04-05.1491330839609.avro.json
397M    log.2017-04-05.1491334439092.avro.json
297M    log.2017-04-05.1491338038503.avro.json
160M    log.2017-04-05.1491341639694.avro.json
95M     log.2017-04-05.1491345239969.avro.json
360M    log.2017-04-05.1491348843931.avro.json
338M    log.2017-04-05.1491352442955.avro.json
71M     log.2017-04-05.1491359648079.avro.json
161M    log.2017-04-05.1491363247597.avro.json
628M    log.2017-04-05.1491366845827.avro.json
288M    log.2017-04-05.1491370445873.avro.json
162M    log.2017-04-05.1491374045830.avro.json
90M     log.2017-04-05.1491377650935.avro.json
269M    log.2017-04-05.1491381249597.avro.json
620M    log.2017-04-05.1491384846366.avro.json
70M     log.2017-04-05.1491388450262.avro.json
30M     log.2017-04-05.1491392047694.avro.json
114M    log.2017-04-05.1491395648818.avro.json
370M    log.2017-04-05.1491399246407.avro.json
359M    log.2017-04-05.1491402846469.avro.json
218M    log.2017-04-05.1491406180615.avro.json
29M     log.2017-04-05.1491409790105.avro.json
3.9M    log.2017-04-05.1491413385884.avro.json
9.3M    log.2017-04-05.1491416981829.avro.json
8.3M    log.2017-04-05.1491420581588.avro.json
2.3M    log.2017-04-05.1491424180191.avro.json
25M     log.2017-04-05.1491427781339.avro.json
24M     log.2017-04-05.1491431382552.avro.json
5.7M    log.2017-04-05.1491434984679.avro.json
35M     log.2017-04-05.1491438586674.avro.json
5.8M    log.2017-04-05.1491442192541.avro.json
23M     log.2017-04-05.1491445789230.avro.json
4.3M    log.2017-04-05.1491449386630.avro.json
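The conversion above was roughly the following (a sketch; the HDFS warehouse path is an assumption and must be adjusted to the table's actual location):

# Copy the partition's files out of HDFS, then decode each AVRO
# file to JSON so its uncompressed size can be checked.
hdfs dfs -get /user/hive/warehouse/adhoc_data_fast.db/log/day=2017-04-05 .

for f in day=2017-04-05/*.avro; do
    avro-tools tojson "$f" > "$f.json"
done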

Here is the Impala query profile:

[szq7.appadhoc.com:21000] > profile;
Query Runtime Profile:
Query (id=ef4698db870efd4d:739c89ef00000000):
  Summary:
    Session ID: f54bb090170bcdb6:621ac5796ef2668c
    Session Type: BEESWAX
    Start Time: 2017-04-06 13:35:56.454441000
    End Time: 2017-04-06 13:35:57.326967000
    Query Type: QUERY
    Query State: EXCEPTION
    Query Status:
Memory limit exceeded
GzipDecompressor failed to allocate 109051904000 bytes.

    Impala Version: impalad version 2.7.0-cdh5.9.1 RELEASE (build 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
    User: ubuntu
    Connected User: ubuntu
    Delegated User:
    Network Address: ::ffff:192.168.1.7:29026
    Default Db: default
    Sql Statement: select count(0) from adhoc_data_fast.log where day='2017-04-05'
    Coordinator: szq7.appadhoc.com:22000
    Query Options (non default):
    Plan:
----------------
Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log

03:AGGREGATE [FINALIZE]
|  output: count:merge(0)
|  hosts=13 per-host-mem=unavailable
|  tuple-ids=1 row-size=8B cardinality=1
|
02:EXCHANGE [UNPARTITIONED]
|  hosts=13 per-host-mem=unavailable
|  tuple-ids=1 row-size=8B cardinality=1
|
01:AGGREGATE
|  output: count(0)
|  hosts=13 per-host-mem=10.00MB
|  tuple-ids=1 row-size=8B cardinality=1
|
00:SCAN HDFS [adhoc_data_fast.log, RANDOM]
   partitions=1/7594 files=38 size=3.45GB
   table stats: unavailable
   column stats: all
   hosts=13 per-host-mem=400.00MB
   tuple-ids=0 row-size=0B cardinality=unavailable
----------------
    Estimated Per-Host Mem: 429916160
    Estimated Per-Host VCores: 1
    Tables Missing Stats: adhoc_data_fast.log
    Request Pool: default-pool
    Admission result: Admitted immediately
    ExecSummary:
Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE        1   52.298ms   52.298ms      0           1   4.00 KB        -1.00 B  FINALIZE
02:EXCHANGE         1  676.993ms  676.993ms      0           1         0        -1.00 B  UNPARTITIONED
01:AGGREGATE        0    0.000ns    0.000ns      0           1         0       10.00 MB
00:SCAN HDFS        0    0.000ns    0.000ns      0          -1         0      400.00 MB  adhoc_data_fast.log
    Planner Timeline: 69.589ms
       - Analysis finished: 6.642ms (6.642ms)
       - Equivalence classes computed: 6.980ms (337.753us)
       - Single node plan created: 13.302ms (6.322ms)
       - Runtime filters computed: 13.368ms (65.984us)
       - Distributed plan created: 15.131ms (1.763ms)
       - Lineage info computed: 16.488ms (1.356ms)
       - Planning finished: 69.589ms (53.101ms)
    Query Timeline: 874.026ms
       - Start execution: 63.320us (63.320us)
       - Planning finished: 72.764ms (72.701ms)
       - Submit for admission: 73.592ms (827.496us)
       - Completed admission: 73.775ms (183.088us)
       - Ready to start 13 remote fragments: 126.950ms (53.175ms)
       - All 13 remote fragments started: 161.919ms (34.968ms)
       - Rows available: 856.761ms (694.842ms)
       - Unregister query: 872.527ms (15.765ms)
     - ComputeScanRangeAssignmentTimer: 356.136us
  ImpalaServer:
     - ClientFetchWaitTimer: 0.000ns
     - RowMaterializationTimer: 0.000ns
  Execution Profile ef4698db870efd4d:739c89ef00000000:(Total: 782.712ms, non-child: 0.000ns, % non-child: 0.00%)
    Number of filters: 0
    Filter routing table:
 ID  Src. Node  Tgt. Node(s)  Targets  Target type  Partition filter  Pending (Expected)  First arrived  Completed   Enabled
----------------------------------------------------------------------------------------------------------------------------

    Fragment start latencies: Count: 13, 25th %-ile: 1ms, 50th %-ile: 1ms, 75th %-ile: 1ms, 90th %-ile: 2ms, 95th %-ile: 2ms, 99.9th %-ile: 35ms
    Per Node Peak Memory Usage: szq15.appadhoc.com:22000(0) szq1.appadhoc.com:22000(0) szq13.appadhoc.com:22000(0) szq12.appadhoc.com:22000(0) szq11.appadhoc.com:22000(0) szq20.appadhoc.com:22000(0) szq14.appadhoc.com:22000(0) szq8.appadhoc.com:22000(0) szq5.appadhoc.com:22000(0) szq9.appadhoc.com:22000(0) szq4.appadhoc.com:22000(0) szq6.appadhoc.com:22000(0) szq7.appadhoc.com:22000(0)
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
    Coordinator Fragment F01:(Total: 729.811ms, non-child: 0.000ns, % non-child: 0.00%)
      MemoryUsage(500.000ms): 12.00 KB
       - AverageThreadTokens: 0.00
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 12.00 KB (12288)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 52.291ms
       - RowsProduced: 0 (0)
       - TotalCpuTime: 0.000ns
       - TotalNetworkReceiveTime: 676.991ms
       - TotalNetworkSendTime: 0.000ns
       - TotalStorageWaitTime: 0.000ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 0 (0)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 102.40 GB (109951164416)
         - PeakMemoryUsage: 0
         - TotalBufferWaitTime: 0.000ns
         - TotalEncryptionTime: 0.000ns
         - TotalIntegrityCheckTime: 0.000ns
         - TotalReadBlockTime: 0.000ns
      CodeGen:(Total: 63.837ms, non-child: 63.837ms, % non-child: 100.00%)
         - CodegenTime: 828.728us
         - CompileTime: 2.957ms
         - LoadTime: 0.000ns
         - ModuleBitcodeSize: 1.89 MB (1984232)
         - NumFunctions: 7 (7)
         - NumInstructions: 96 (96)
         - OptimizationTime: 8.070ms
         - PrepareTime: 51.769ms
      AGGREGATION_NODE (id=3):(Total: 729.291ms, non-child: 52.298ms, % non-child: 7.17%)
        ExecOption: Codegen Enabled
         - BuildTime: 0.000ns
         - GetResultsTime: 0.000ns
         - HTResizeTime: 0.000ns
         - HashBuckets: 0 (0)
         - LargestPartitionPercent: 0 (0)
         - MaxPartitionLevel: 0 (0)
         - NumRepartitions: 0 (0)
         - PartitionsCreated: 0 (0)
         - PeakMemoryUsage: 4.00 KB (4096)
         - RowsRepartitioned: 0 (0)
         - RowsReturned: 0 (0)
         - RowsReturnedRate: 0
         - SpilledPartitions: 0 (0)
      EXCHANGE_NODE (id=2):(Total: 676.993ms, non-child: 676.993ms, % non-child: 100.00%)
        BytesReceived(500.000ms): 0
         - BytesReceived: 0
         - ConvertRowBatchTime: 0.000ns
         - DeserializeRowBatchTimer: 0.000ns
         - FirstBatchArrivalWaitTime: 0.000ns
         - PeakMemoryUsage: 0
         - RowsReturned: 0 (0)
         - RowsReturnedRate: 0
         - SendersBlockedTimer: 0.000ns
         - SendersBlockedTotalTimer(*): 0.000ns
    Averaged Fragment F00:
      split sizes:  min: 114.60 MB, max: 451.79 MB, avg: 271.65 MB, stddev: 104.16 MB
      completion times: min:694.632ms  max:728.356ms  mean: 725.379ms  stddev:8.878ms
      execution rates: min:157.45 MB/sec  max:620.68 MB/sec  mean:374.89 MB/sec  stddev:144.30 MB/sec
      num instances: 13
    Fragment F00:
      Instance ef4698db870efd4d:739c89ef00000001 (host=szq5.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000002 (host=szq8.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000003 (host=szq14.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000004 (host=szq20.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000005 (host=szq11.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000006 (host=szq12.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000007 (host=szq13.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000008 (host=szq1.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef00000009 (host=szq15.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000a (host=szq6.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000b (host=szq4.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000c (host=szq9.appadhoc.com:22000):
      Instance ef4698db870efd4d:739c89ef0000000d (host=szq7.appadhoc.com:22000):

So why does Impala need so much memory?


Answer 1:

This may be because Impala is missing statistics for that table and partition. The explain plan calls it out:

Estimated Per-Host Requirements: Memory=410.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
adhoc_data_fast.log

Try running COMPUTE STATS on the table, or COMPUTE INCREMENTAL STATS on the partition.

For example:

COMPUTE INCREMENTAL STATS adhoc_data_fast.log PARTITION (day='2017-04-05');

This helps Impala plan resources. I'd be surprised if it solves the problem, but it's worth trying first.
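To confirm afterwards that the stats were picked up, SHOW TABLE STATS can be run against the table (a sketch; -i points impala-shell at the coordinator shown in the profile above, and the day='2017-04-05' row should then report a row count instead of -1):

impala-shell -i szq7.appadhoc.com:21000 \
    -q "SHOW TABLE STATS adhoc_data_fast.log"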

Comments:

Thanks! But COMPUTE STATS also failed to allocate memory. It turns out the cause is likely a bug in how Impala handles corrupted AVRO files (see the edit above).
