Calculate ratios of counts of rows with subqueries
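[Title]: Calculate ratios of counts of rows with subqueries
[Posted]: 2021-04-06 11:08:00
[Question]: (I couldn't think of a better title for this question. Suggestions are welcome.)
(In case the versions matter, I'm using SQLAlchemy 1.4.4 and PostgreSQL 13.1.)
I have a table ('test') that holds multiple boolean values per person, each representing a test result (pass or fail). I'd like to write a query that returns a result set with the pass/fail ratio for each person.
That is, for this table: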
 id | person | passed
----+--------+--------
  1 | p1     | t
  2 | p1     | f
  3 | p1     | f
  4 | p2     | t
  5 | p2     | t
  6 | p2     | t
  7 | p2     | t
  8 | p2     | t
  9 | p2     | f
 10 | p2     | f
 11 | p2     | f
the query should return:
 person |  pass_fail_ratio
--------+--------------------
 p1     | 0.5
 p2     | 1.6666666666666667
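As a sanity check on the expected output, the arithmetic can be reproduced in plain Python without a database. This is only a sketch: the `rows` list is a hand-copied version of the sample table, and `pass_fail_ratios` is a hypothetical helper name that does not appear in the question.

```python
from collections import Counter

# Sample data copied by hand from the table above: (person, passed).
rows = [
    ("p1", True), ("p1", False), ("p1", False),
    ("p2", True), ("p2", True), ("p2", True), ("p2", True), ("p2", True),
    ("p2", False), ("p2", False), ("p2", False),
]


def pass_fail_ratios(rows):
    """Return {person: passes / fails}; hypothetical helper for illustration."""
    counts = Counter(rows)  # keys are (person, passed) pairs
    people = sorted({person for person, _ in rows})
    return {
        person: counts[(person, True)] / counts[(person, False)]
        for person in people
    }


ratios = pass_fail_ratios(rows)
print(ratios)
```

p1 has 1 pass and 2 fails, p2 has 5 passes and 3 fails, which gives the two ratios listed. Note that this helper raises `ZeroDivisionError` for a person with no failures, an edge case the sample data does not cover.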
Here is the solution I have come up with so far. (A complete MWE is appended at the end.)
results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()
pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()
fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()
pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)
To me this looks overly complicated for something that seems conceptually quite simple. Is there a better solution?
MWE:
# To change database name, modify 'dbname'.
# Expected output:
# ('p1', 0.5)
# ('p2', 1.6666666666666667)
# Lots of constraints and checks omitted for brevity.
# To view generated SQL, uncomment the line containing "echo" below.
import sqlalchemy as sa
import sqlalchemy.orm as orm
import sqlalchemy.types as types
dbname = 'test'
base = orm.declarative_base()
class test(base):
    __tablename__ = 'test'
    id = sa.Column(sa.Integer, primary_key=True)
    person = sa.Column(sa.String)
    passed = sa.Column(types.Boolean)
engine = sa.create_engine(
    f"postgresql://localhost:5432/{dbname}", future=True
)
base.metadata.drop_all(engine)
base.metadata.create_all(engine)
session = orm.Session(engine)
# Add some data.
session.add(test(person='p1', passed=True))
session.add(test(person='p1', passed=False))
session.add(test(person='p1', passed=False))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.commit()
results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()
pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()
fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()
pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)
# engine.echo = True
with orm.Session(engine) as session:
    res = session.execute(pass_fail_ratio)
    for row in res:
        print(row)
[Answer 1]: That is far too complicated. I wouldn't use subqueries. One approach is:
select person,
count(*) filter (where passed) * 1.0 / count(*) filter (where not passed)
from test t
group by person;
Without filter, you might find it more convenient to express this "the old-fashioned way":
select person,
sum( passed::int ) * 1.0 / sum( (not passed)::int )
from test t
group by person;
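The conditional-aggregation form can be tried without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database, which is an assumption made purely for convenience and not the setup from the question. Because SQLite stores booleans as integers, the PostgreSQL-specific `passed::int` cast is dropped and `sum(passed)` is used directly.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (id integer primary key, person text, passed integer)"
)

# Same sample data as in the question, with 1 = passed and 0 = failed.
sample = [("p1", 1), ("p1", 0), ("p1", 0)] + [("p2", 1)] * 5 + [("p2", 0)] * 3
conn.executemany("insert into test (person, passed) values (?, ?)", sample)

# One pass over the table: sum the passes and the fails per person.
# Multiplying by 1.0 forces floating-point instead of integer division.
query = """
    select person,
           sum(passed) * 1.0 / sum(not passed) as pass_fail_ratio
    from test
    group by person
    order by person
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

One difference to keep in mind: for a person with no failed tests the divisor is zero, which PostgreSQL reports as a division-by-zero error, whereas SQLite returns NULL.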
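Note that the pass rate (passes divided by all attempts) is more commonly used than the ratio of passes to fails. That is simply: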
select person,
avg( passed::int ) as pass_ratio
from test t
group by person;
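The two measures carry the same information: for a pass rate r, the pass/fail ratio is r / (1 - r). A small sketch of that conversion, using exact fractions to avoid floating-point noise; `ratio_from_rate` is a hypothetical helper name and the rates are worked out by hand from the sample data.

```python
from fractions import Fraction


def ratio_from_rate(pass_rate):
    """Convert a pass rate r into the pass/fail ratio r / (1 - r)."""
    return pass_rate / (1 - pass_rate)


# Pass rates from the sample data: p1 passed 1 of 3 tests, p2 passed 5 of 8.
rates = {"p1": Fraction(1, 3), "p2": Fraction(5, 8)}
converted = {person: ratio_from_rate(rate) for person, rate in rates.items()}
print(converted)
```

The conversion is undefined for a pass rate of exactly 1 (no failures), which matches the division-by-zero case of the ratio query.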
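[Comments]:
Wow, that's a killer! As someone whose knowledge of SQL is, to put it mildly, less than perfect, I didn't know about FILTER. As for pass/fail versus pass/all ratios, I have my reasons :-)
@toomas . . . FILTER is standard SQL, but Postgres is one of the few databases that implements it.
[Answer 2]: I got Gordon Linoff's answer working in SQLAlchemy. Here is my final solution: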
import sqlalchemy as sa
pass_fail_ratio_query = sa.select(
    test.person,
    (
        sa.cast(
            sa.funcfilter(sa.func.count(), test.passed == True),  # noqa
            sa.Float
        )
        / sa.cast(
            sa.funcfilter(sa.func.count(), test.passed == False),  # noqa
            sa.Float
        )
    )
).group_by(test.person)