Rails ordering query optimization
【Posted】: 2021-11-09 03:54:52
【Question】:
I have a model Activity that has many ActivitySecondaryUser. I am trying to optimize this query:
2.6.3 :015 > Activity.left_joins(:activity_secondary_users).where("activity_secondary_users.user_id = :id OR (primary_user_id = :id AND activity_type != '#{Activity::MENTION}')", id: 10000).order(created_at: :desc).limit(10).explain
Activity Load (812.7ms) SELECT "activities".* FROM "activities" LEFT OUTER JOIN "activity_secondary_users" ON "activity_secondary_users"."activity_id" = "activities"."id" WHERE (activity_secondary_users.user_id = 10000 OR (primary_user_id = 10000 AND activity_type != 'mention')) ORDER BY "activities"."created_at" DESC LIMIT $1 [["LIMIT", 10]]
=> EXPLAIN for: SELECT "activities".* FROM "activities" LEFT OUTER JOIN "activity_secondary_users" ON "activity_secondary_users"."activity_id" = "activities"."id" WHERE (activity_secondary_users.user_id = 10000 OR (primary_user_id = 10000 AND activity_type != 'mention')) ORDER BY "activities"."created_at" DESC LIMIT $1 [["LIMIT", 10]]
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1000.87..19659.54 rows=10 width=138) (actual time=79.769..737.253 rows=10 loops=1)
Buffers: shared hit=2013672
-> Gather Merge (cost=1000.87..202514.52 rows=108 width=138) (actual time=79.768..737.245 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2013672
-> Nested Loop Left Join (cost=0.84..201502.03 rows=45 width=138) (actual time=36.208..351.256 rows=5 loops=3)
Filter: ((activity_secondary_users.user_id = 10000) OR ((activities.primary_user_id = 10000) AND ((activities.activity_type)::text <> 'mention'::text)))
Rows Removed by Filter: 181610
Buffers: shared hit=2013672
-> Parallel Index Scan using index_activities_on_created_at on activities (cost=0.42..28991.70 rows=370715 width=138) (actual time=0.027..52.295 rows=181615 loops=3)
Buffers: shared hit=137766
-> Index Scan using index_activity_secondary_users_on_activity_id on activity_secondary_users (cost=0.42..0.45 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=544845)
Index Cond: (activity_id = activities.id)
Buffers: shared hit=1875906
Planning Time: 0.216 ms
Execution Time: 737.288 ms
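Indexes:
Activity: created_at, primary_user_id
ActivitySecondaryUser: activity_id
I have tried adding other indexes and changing the order attribute, but nothing seems to make it faster. The table has fewer than 1 million records, and the query takes over 500 ms on average. Any suggestions on how to optimize it?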
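【Comments】:
When you run each query multiple times, do you see the same difference? I believe there may be some overhead the first time a query runs, because the query plan is built and then cached.
@LesNightingill The asc query does perform better overall. I did find that for higher ids, both the asc and desc queries are very slow (sometimes over 400 ms).
Please show EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN.
@jjanes Added to the question.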
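【Answer 1】:
I would try adding a second index in descending order. By default an index is built in ascending order; if you have a lot of data and you usually want to see it in descending order, a dedicated index may be worth having.
The migration would look something like this: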
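def change
  # `order` takes a hash of column => direction.
  # The index name is a hypothetical one, chosen so it does not clash with
  # the existing index_activities_on_created_at shown in the query plan.
  add_index(:activities, :created_at, order: { created_at: :desc }, name: "index_activities_on_created_at_desc")
end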
The Rails documentation for this is here: https://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/add_index
It includes a note to watch out for if you are using an older version of MySQL:
Note: MySQL only supports index order from 8.0.1 onwards (earlier versions accepted the syntax but ignored it).
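【Comments】:
I have tried changing the sort order of some of the indexes, but performance did not improve.
【Answer 2】:
User 10000, the one you are looking for, does not seem to be active any more. The query has to walk through 544845 activity rows, starting from the newest, before it finds 10 references to that user.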
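The 544845 figure can be read off the plan in the question: the Parallel Index Scan reports rows=181615 with loops=3 (the leader plus two workers), and the inner index scan reports loops=544845. The sketch below just redoes that arithmetic. The numbers are copied from the plan; the constant and method names are made up for illustration, and because EXPLAIN prints per-loop averages the totals are approximate.

```ruby
# Figures copied from the EXPLAIN (ANALYZE, BUFFERS) output in the question.
# EXPLAIN reports per-loop averages, so the derived totals are approximate.
PLAN = {
  rows_per_loop: 181_615,     # Parallel Index Scan on activities, rows per loop
  loops: 3,                   # leader process + 2 parallel workers
  removed_per_loop: 181_610,  # "Rows Removed by Filter", per loop
  shared_hits: 2_013_672,     # "Buffers: shared hit" at the top of the plan
  rows_returned: 10           # rows that survive the LIMIT
}.freeze

# Derives a few totals from the per-loop figures (hypothetical helper).
def plan_summary(plan = PLAN)
  scanned  = plan[:rows_per_loop] * plan[:loops]
  rejected = plan[:removed_per_loop] * plan[:loops]
  {
    rows_scanned: scanned,
    rows_rejected: rejected,
    rejected_fraction: rejected.fdiv(scanned),
    buffer_hits_per_returned_row: plan[:shared_hits] / plan[:rows_returned]
  }
end

summary = plan_summary
puts "rows scanned:                 #{summary[:rows_scanned]}"
puts "rows rejected by the filter:  #{summary[:rows_rejected]}"
puts "buffer hits per returned row: #{summary[:buffer_hits_per_returned_row]}"
```

Almost every row read from index_activities_on_created_at is discarded by the filter, which is why the order of that index has little effect on the runtime.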
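This is likely to be a hard query to optimize, because the ORed branches of the WHERE clause are on different tables (one on activity_secondary_users, the other on activities) while the ORDER BY is on activities, so no single index can serve both the filter and the sort.
Could you just detect inactive users and refuse to run this kind of query for them?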
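One common way around an OR that spans two tables is to run each branch as its own query and combine the results with UNION. The sketch below is not from the answer above; it is a hedged illustration that only builds the SQL string. The method name is hypothetical, and it assumes indexes that the question does not list: a composite index on activities (primary_user_id, created_at) and an index on activity_secondary_users (user_id). It also assumes a user appears at most once per activity in activity_secondary_users and that every column of activities can be compared for equality, which UNION requires.

```ruby
# Builds a UNION-based rewrite of the OR query as a SQL string.
# Hypothetical helper; table and column names are taken from the question.
def activities_for_user_sql(user_id, limit: 10, excluded_type: "mention")
  id   = Integer(user_id)               # raises unless it is a plain integer
  lim  = Integer(limit)
  type = excluded_type.gsub("'", "''")  # escape single quotes for the SQL literal

  <<~SQL
    SELECT * FROM (
      (SELECT a.* FROM activities a
        WHERE a.primary_user_id = #{id} AND a.activity_type != '#{type}'
        ORDER BY a.created_at DESC LIMIT #{lim})
      UNION
      (SELECT a.* FROM activities a
        JOIN activity_secondary_users s ON s.activity_id = a.id
        WHERE s.user_id = #{id}
        ORDER BY a.created_at DESC LIMIT #{lim})
    ) AS candidates
    ORDER BY created_at DESC
    LIMIT #{lim}
  SQL
end

puts activities_for_user_sql(10000)
```

In a Rails app the string could be passed to Activity.find_by_sql. Each branch only needs its own newest rows, and the outer query picks the overall newest from that small candidate set. Unlike the LEFT JOIN version, UNION returns each activity once, so results can differ where the original query produced duplicate rows. Whether this is faster on the real data would need to be checked with EXPLAIN (ANALYZE, BUFFERS).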
【Comments】:
For active users the query still takes over 200 ms. Is there a way to re-architect the database to get a similar result? The goal is to find both outgoing and incoming activities for a user.