Improving PostgreSQL Function Performance
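Posted: 2019-08-07 16:21:14
Question:
I have a query that effectively renumbers the IDs of all rows in a table, so that the table can be merged into a table in another database without ID conflicts. The problem is that it can take several minutes per table. Since several tables go through this function, a full run often takes 20-30 minutes.
This query has gone through a few iterations now, and this is basically the best I could manage; admittedly, my SQL skills are fairly limited. The function also removes any "gaps" in the ids, although that is not strictly required.
The code looks like this: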
CREATE OR REPLACE FUNCTION prep_key_ids(_table text, _offset bigint) RETURNS void AS
$BODY$
DECLARE
    old_id bigint;
    table_exists boolean;
    new_id bigint;
    min_id bigint;
    max_id bigint;
    index bigint;
    low_id bigint;
    high_id bigint;
    row_count bigint;
BEGIN
    SELECT EXISTS(SELECT 1 FROM information_schema.table_constraints WHERE table_name=_table) INTO table_exists;
    IF table_exists THEN
        EXECUTE 'SELECT MIN(id), MAX(id), COUNT(*) FROM ' || _table || ';' INTO min_id, max_id, row_count;
        IF row_count <= 0 THEN
            RETURN;
        END IF;
        IF min_id > _offset THEN
            -- minimum id greater than the start of our desired offset, we can move each id without there being a conflict
            new_id = _offset + 1;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' ORDER BY id ASC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id + 1;
            END LOOP;
        ELSIF max_id <= _offset + row_count THEN
            -- maximum id is less than the end point of our desired offset, we can move the ends without there being a conflict
            new_id = _offset + row_count;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' ORDER BY id DESC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id - 1;
            END LOOP;
        ELSE
            -- there exist ids before our desired start and after our desired end
            -- find the pivot point where we can set ids without there being a conflict
            EXECUTE 'WITH tb AS ( SELECT row_number() OVER (ORDER BY id ASC) - 1 AS index, id, lead(id) over(ORDER BY id ASC) AS lead_id FROM ' || _table || ' ORDER BY id ASC ) '
                'SELECT index, id, lead_id FROM tb WHERE tb.id <= ' || _offset + 1 || ' + tb.index AND tb.lead_id >= ' || _offset + 1 || ' + tb.index + 1 LIMIT 1;'
                INTO index, low_id, high_id;
            -- NOTE: 'index' is index for low_id, index + 1 gives index for high_id
            -- update ids from pivot point down to start of offset
            new_id = _offset + 1 + index;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' WHERE id <= ' || low_id || ' ORDER BY id DESC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id - 1;
            END LOOP;
            -- update ids from pivot point up to the end of the offset
            new_id = _offset + 1 + index + 1;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' WHERE id >= ' || high_id || ' ORDER BY id ASC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id + 1;
            END LOOP;
        END IF;
    END IF;
END;
$BODY$ LANGUAGE plpgsql;
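For comparison with the row-by-row loops above, here is a minimal set-based sketch of the same renumbering for the call prep_key_ids('imported_fields', 1). It is not the asker's code and has not been run against their schema. It assumes every current id is positive and that the foreign keys referencing id are declared ON UPDATE CASCADE. It touches every row twice, but issues two statements in total instead of one dynamic statement per row.

```sql
-- Hypothetical two-pass renumbering of imported_fields into 2 .. N+1 (offset 1).
-- Assumptions: all existing ids are > 0; referencing FKs are ON UPDATE CASCADE.
BEGIN;

-- Pass 1: move every id to a negative value that encodes its final rank.
-- Negative values cannot collide with the ids that are still positive.
UPDATE imported_fields AS t
SET    id = -s.rn
FROM  (SELECT id, row_number() OVER (ORDER BY id ASC) AS rn
       FROM   imported_fields) AS s
WHERE  t.id = s.id;

-- Pass 2: flip each id into its final slot, offset + rank.
-- With offset = 1 and id = -rn this is 1 - id = 1 + rn.
UPDATE imported_fields
SET    id = 1 - id
WHERE  id < 0;

COMMIT;
```

Whether this ends up faster depends mostly on the cascaded updates in the referencing tables, which run once per pass.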
The output of running EXPLAIN (analyze, buffers, verbose) EXECUTE prep_key_ids( 'imported_fields', 1 ) is:
"Result (cost=0.00..0.26 rows=1 width=4) (actual time=337592.604..337592.605 rows=1 loops=1)"
" Output: prep_key_ids('imported_fields'::text, '1'::bigint)"
" Buffers: shared hit=131862084 read=4409621 dirtied=3013612 written=2828226"
"Planning time: 0.013 ms"
"Execution time: 337592.620 ms"
The output of EXPLAIN (analyze, buffers, verbose) UPDATE imported_fields SET id=595 WHERE id=594 is:
"Update on public.imported_fields (cost=0.28..8.29 rows=1 width=52) (actual time=0.115..0.115 rows=0 loops=1)"
" Buffers: shared hit=8 read=3 dirtied=4"
" -> Index Scan using imported_fields_id_idx on public.imported_fields (cost=0.28..8.29 rows=1 width=52) (actual time=0.008..0.009 rows=1 loops=1)"
" Output: '595'::bigint, exf_import, name, import_field_type, valid_text_timestamp, ctid"
" Index Cond: (imported_fields.id = 594)"
" Buffers: shared hit=4"
"Planning time: 0.272 ms"
"Trigger RI_ConstraintTrigger_a_2290766 for constraint production_field_values_imported_field_fkey on imported_fields: time=0.152 calls=1"
"Trigger RI_ConstraintTrigger_a_2290771 for constraint text_field_values_imported_field_fkey on imported_fields: time=1564.663 calls=1"
"Trigger RI_ConstraintTrigger_a_2290776 for constraint field_definitions_imported_field_fkey on imported_fields: time=0.082 calls=1"
"Trigger RI_ConstraintTrigger_a_2290781 for constraint added_dependencies_domain_fkey on imported_fields: time=0.021 calls=1"
"Trigger RI_ConstraintTrigger_a_2290786 for constraint added_dependencies_criterion_fkey on imported_fields: time=0.013 calls=1"
"Trigger RI_ConstraintTrigger_a_2290791 for constraint guidance_formula_set_entries_rank_field_fkey on imported_fields: time=0.049 calls=1"
"Trigger RI_ConstraintTrigger_a_2290796 for constraint guidance_formula_set_entries_mine_area_field_fkey on imported_fields: time=0.019 calls=1"
"Trigger RI_ConstraintTrigger_a_2290806 for constraint attain_run_settings_start_date_field_fkey on imported_fields: time=0.033 calls=1"
"Trigger RI_ConstraintTrigger_a_2290811 for constraint rm_o_attain_config_datefield_fkey on imported_fields: time=0.029 calls=1"
"Trigger RI_ConstraintTrigger_a_2291411 for constraint activity_filter_operation_field_lookups_field_fkey on imported_fields: time=0.498 calls=1"
"Trigger RI_ConstraintTrigger_a_2292706 for constraint grade_distributions_confidence_field_fkey on imported_fields: time=0.020 calls=1"
"Trigger RI_ConstraintTrigger_a_2292995 for constraint saved_realization_sets_product_field_fkey on imported_fields: time=0.017 calls=1"
"Trigger RI_ConstraintTrigger_a_2293204 for constraint saved_grade_realization_sets_product_field_fkey on imported_fields: time=0.017 calls=1"
"Trigger RI_ConstraintTrigger_a_2293575 for constraint ventilation_advanced_scenarios_text_field_id_fkey on imported_fields: time=0.016 calls=1"
"Trigger RI_ConstraintTrigger_a_2294065 for constraint geosequencing_stability_settings_text_field_definition_fkey on imported_fields: time=0.015 calls=1"
"Trigger RI_ConstraintTrigger_a_2294090 for constraint geosequencing_scenario_subtask_configu_subtask_group_field_fkey on imported_fields: time=0.016 calls=1"
"Trigger RI_ConstraintTrigger_a_2294095 for constraint geosequencing_scenario_subtask_configur_subtask_type_field_fkey on imported_fields: time=0.011 calls=1"
"Trigger RI_ConstraintTrigger_a_2294120 for constraint geosequencing_scenario_subtask_filter_operati_filter_field_fkey on imported_fields: time=0.015 calls=1"
"Trigger RI_ConstraintTrigger_a_2294727 for constraint run_settings_pin_marker_field_fkey on imported_fields: time=0.053 calls=1"
"Trigger RI_ConstraintTrigger_a_2294944 for constraint formula_used_fields_field_id_fkey on imported_fields: time=0.030 calls=1"
"Trigger RI_ConstraintTrigger_a_2295066 for constraint cumulative_production_expenditures_production_field_fkey on imported_fields: time=0.028 calls=1"
"Trigger RI_ConstraintTrigger_a_2295078 for constraint run_settings_target_field_fkey on imported_fields: time=0.024 calls=1"
"Trigger RI_ConstraintTrigger_c_2290773 for constraint text_field_values_imported_field_fkey on text_field_values: time=222.517 calls=38655"
"Execution time: 1790.278 ms"
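For an update to this table, the biggest time cost is the linked text_field_values table, which already has an index on its imported_field column. I'm not sure what else can be done, since the index is already there. The text_field_values table currently has 4 million-odd rows (but it could hold more than that).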
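One way to check what each foreign key does on update, and which indexes actually exist on the slow referencing table, is to query the system catalogs. This is a sketch that uses the table names from the EXPLAIN output above; it has not been run against the asker's database.

```sql
-- List every foreign key that references imported_fields, with its ON UPDATE action.
-- confupdtype codes: a = NO ACTION, r = RESTRICT, c = CASCADE, n = SET NULL, d = SET DEFAULT
SELECT conname,
       conrelid::regclass AS referencing_table,
       confupdtype        AS on_update
FROM   pg_constraint
WHERE  contype   = 'f'
AND    confrelid = 'imported_fields'::regclass
ORDER  BY conname;

-- Show the indexes defined on the referencing table that dominates the runtime.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'text_field_values';
```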
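Comments:
Do you realize that you are executing one dynamic query for every affected row?
Yes, I do. I'm not sure how to get around it, since I can't just update the IDs to a specific offset because that would cause conflicts. Not without first making them all negative and then updating, which means twice as many updates, which is worse.
You could update in chunks while the constraints are set to deferred. But updating (primary) keys is still a very bad idea. You might consider a data-warehouse-like data model.
Primary keys are not deferrable, and if I try to drop the constraint in order to recreate it (just to test this), I get a long list of errors because of the dependent tables. Redoing the database structure at this point isn't feasible for me: there are 191 tables, all linked, and some have multiple foreign keys.
I realize that updating the PK is a bad idea, but I don't know what else to do when combining two databases...
Answer 1:
This is too long for a comment.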
Changing the ids in the tables seems rather drastic. If the ids need to be distinguished, why not add a fixed number or a table-name prefix:
select 100000000 + id, . . .
from table1;
select 200000000 + id, . . .
from table2;
Or:
select 'table1' || id, . . .
from table1;
select 'table2' || id, . . .
from table2;
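If gap removal is dropped (the question says it is not strictly required), the fixed-number idea could also be applied as a single UPDATE rather than in a SELECT. The sketch below uses the constant 100000000 from the example above and the table name from the question. It assumes the constant is larger than the span of existing ids, so the new range cannot overlap the old one mid-statement, that it is also above the highest id in the target database's table, and that the referencing foreign keys are ON UPDATE CASCADE.

```sql
-- Check the assumption first: the span of existing ids must be smaller than the shift.
SELECT max(id) - min(id) AS id_span
FROM   imported_fields;

-- Hypothetical constant shift: every new id lands above every old id,
-- so no row can collide with one that has not been updated yet.
UPDATE imported_fields
SET    id = id + 100000000;
```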
Comments:
Because the two tables are to be merged into one. In effect, there are two databases. One database is "imported" into the other, merging the two.
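The chunked update with deferred constraints suggested in the comments on the question might be sketched as follows. The constraint name is the one shown in the EXPLAIN output. The chunk boundaries and the shift constant are hypothetical, and PostgreSQL 9.4 or later is assumed for ALTER CONSTRAINT. Deferring only postpones the foreign-key checks to COMMIT; cascaded updates in the referencing table still run with each statement.

```sql
-- One-time change: allow the foreign key check to be deferred.
ALTER TABLE text_field_values
    ALTER CONSTRAINT text_field_values_imported_field_fkey
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
-- Postpone this constraint's checks until COMMIT.
SET CONSTRAINTS text_field_values_imported_field_fkey DEFERRED;

-- Hypothetical chunks; the shift is assumed to exceed the span of existing ids,
-- so already-shifted rows cannot collide with rows in later chunks.
UPDATE imported_fields SET id = id + 100000000 WHERE id BETWEEN 1 AND 10000;
UPDATE imported_fields SET id = id + 100000000 WHERE id BETWEEN 10001 AND 20000;

COMMIT;
```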