Partition on SQL with random sampling

Posted 2023-02-16


【Title】Partition on SQL with random sampling 【Posted】2020-12-27 04:51:23 【Question】:

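I have a table like this:

How can I get a random sample of, for example, 2 rows for New York City and 3 rows for London? Does anyone know a short, simple query for this?

I was thinking of using row_number() over (partition by City order by City), but how do I go on from there?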

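【Comments】:

"Random sample" does not match what you describe. What is your actual goal? Do you want to randomly pick 50% of the rows from each group (i.e. each city)?

I just want 2 rows for New York City and 3 rows for London, but those rows must be chosen at random.

【Answer 1】:

One option uses row_number() and newid().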

select t.*
from (
    select t.*,
        row_number() over(partition by city order by newid()) rn
    from mytable t
    where city in ('New York City', 'London')
) t
where rn <= case city
    when 'New York City' then 2
    when 'London' then 3
end

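row_number() numbers the records that share the same city in random order, because the window is ordered by newid(). Then, in the outer query, a conditional expression keeps the required number of records for each city.

This gives you a random selection. If any rows will do and they need not be random, you don't need newid(): just use order by (select null), which is cheaper.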
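When more cities are involved, the case expression can be swapped for a small list of quotas joined to the table. The following is only a sketch of that variation, not part of the original answer: the names quota, ranked and n are made up for illustration, while mytable and city are taken from the query above. It also uses common table expressions instead of a nested derived table, although the work the server does is the same.

```sql
-- Hypothetical quota list: one row per city with the sample size wanted.
with quota as (
    select *
    from (values ('New York City', 2), ('London', 3)) as q(city, n)
),
ranked as (
    -- Number the rows of each city in random order.
    -- Replace newid() with (select null) if any rows will do.
    select t.*, q.n,
        row_number() over(partition by t.city order by newid()) as rn
    from mytable t
    inner join quota q on q.city = t.city
)
-- Keep at most n rows per city.
select *
from ranked
where rn <= n;
```

The inner join also takes over the role of the city filter, so cities without a quota row are dropped. A city with fewer rows than its quota simply returns all of its rows.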

【Comments】:

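Thanks, it works. Are there other ways without a subquery? I'm not a big fan of subqueries. How about the TABLESAMPLE method?

@Freddy: the alternatives that use union need to scan the table multiple times (once per city), so they are less efficient.

【Answer 2】:

You can use tablesample.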

(select somefields from yourtable tablesample(2 rows) where city = 'New York City')
 union 
(select somefields from yourtable tablesample(3 rows) where city = 'London')

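【Comments】:

Yes, that's true. But the number of rows returned is different every time; I don't always get the number of rows I asked for.

【Answer 3】:

You can also use this code, without partitioning: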

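-- 2 random rows for the first city
select * from (
    select top 2 City, Unit_price, newid() as t
    from Sales
    where City = 'Naypyitaw'
    order by newid()
) as tt

-- every row carries a distinct newid() value in t, so union has nothing to
-- deduplicate; union all returns the same rows without the extra step
union all

-- 3 random rows for the second city
select * from (
    select top 3 City, Unit_price, newid() as t
    from Sales
    where City = 'Yangon'
    order by newid()
) as tt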
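The same top-per-city idea can be written once instead of once per city by driving it from a list of quotas with cross apply. This is a sketch under assumptions rather than the answerer's method: the derived table q and its column n are hypothetical names, and Sales, City and Unit_price are reused from the query above.

```sql
-- Hypothetical quota list: city name and number of random rows wanted.
select s.City, s.Unit_price
from (values ('Naypyitaw', 2), ('Yangon', 3)) as q(City, n)
cross apply (
    -- For each quota row, take up to n rows of that city in random order.
    select top (q.n) City, Unit_price
    from Sales
    where City = q.City
    order by newid()
) as s;
```

The table is still read once per city, as with union, so on a large table without an index on City the row_number() approach is likely to remain the cheaper one.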

【Comments】:
