RedShift copy command return

【Title】: RedShift copy command return 【Posted】: 2016-08-05 06:43:45 【Question】:

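Can we get the number of rows inserted by a COPY command? Some records may fail, so how many records were inserted successfully?

I have a file of JSON objects in Amazon S3 and am trying to load the data into Redshift with the COPY command. How can I tell how many records were inserted successfully and how many failed?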

【Comments】:

【Answer 1】:

Load some sample data:

db=# copy test from 's3://bucket/data' credentials '' maxerror 5;
INFO:  Load into table 'test' completed, 4 record(s) loaded successfully.
COPY

db=# copy test from 's3://bucket/err_data' credentials '' maxerror 5;
INFO:  Load into table 'test' completed, 1 record(s) loaded successfully.
INFO:  Load into table 'test' completed, 2 record(s) could not be loaded.  Check 'stl_load_errors' system table for details.
COPY

Then run the following query:

with _successful_loads as (
    select
        stl_load_commits.query
      , listagg(trim(filename), ', ') within group(order by trim(filename)) as filenames
    from stl_load_commits
    left join stl_query using(query)
    left join stl_utilitytext using(xid)
    where rtrim("text") = 'COMMIT'
    group by query
),
_unsuccessful_loads as (
    select
        query
      , count(1) as errors
    from stl_load_errors
    group by query
)
select
    query
  , filenames
  , sum(stl_insert.rows)            as rows_loaded
  , max(_unsuccessful_loads.errors) as rows_not_loaded
from stl_insert
inner join _successful_loads using(query)
left join _unsuccessful_loads using(query)
group by query, filenames
order by query, filenames
;

This gives:

 query |                   filenames                    | rows_loaded | rows_not_loaded
-------+------------------------------------------------+-------------+-----------------
 45597 | s3://bucket/err_data.json                      |           1 |               2
 45611 | s3://bucket/data1.json, s3://bucket/data2.json |           4 |
(2 rows)
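
The counts are also present in the INFO lines that COPY printed in the session above, so a client can read them without querying the system tables. The following is a minimal Python sketch of that idea, not part of the original answer. It assumes your driver hands you the server's INFO messages as plain strings (psycopg2, for instance, collects server messages in `connection.notices`; whether Redshift's COPY messages arrive there in your setup is something to verify). The function name `parse_copy_notices` is hypothetical.

```python
import re

# Patterns for the two INFO messages shown in the COPY output above.
_LOADED = re.compile(r"(\d+) record\(s\) loaded successfully")
_FAILED = re.compile(r"(\d+) record\(s\) could not be loaded")


def parse_copy_notices(notices):
    """Return (rows_loaded, rows_failed) for the messages of ONE COPY.

    `notices` is an iterable of message strings; how you obtain them
    from your driver is an assumption outside this sketch.
    """
    loaded = failed = 0
    for line in notices:
        match = _LOADED.search(line)
        if match:
            loaded += int(match.group(1))
        match = _FAILED.search(line)
        if match:
            failed += int(match.group(1))
    return loaded, failed


if __name__ == "__main__":
    # Messages copied from the second COPY in the session above.
    sample = [
        "INFO:  Load into table 'test' completed, 1 record(s) loaded successfully.",
        "INFO:  Load into table 'test' completed, 2 record(s) could not be loaded.  "
        "Check 'stl_load_errors' system table for details.",
    ]
    print(parse_copy_notices(sample))
```

Pass only the messages that belong to a single COPY, since the counts are summed. The details of the rejected rows still have to come from `stl_load_errors`, as in the query above.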

【Comments】:
