Calculate ratios of counts of rows with subqueries


【Title】: Calculate ratios of counts of rows with subqueries 【Posted】: 2021-04-06 11:08:00 【Question】:

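(I couldn't come up with a better title for this question. Suggestions are welcome.)

(In case versions matter, I am using SQLAlchemy 1.4.4 and PostgreSQL 13.1.)

I have a table ('test') that holds several boolean values per person, each representing a test result (pass or fail). I would like to write a query that returns a result set with the pass/fail ratio for each person.

That is, for this table: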

 id | person | passed
----+--------+--------
  1 | p1     | t
  2 | p1     | f
  3 | p1     | f
  4 | p2     | t
  5 | p2     | t
  6 | p2     | t
  7 | p2     | t
  8 | p2     | t
  9 | p2     | f
 10 | p2     | f
 11 | p2     | f

the query should return:

person | pass_fail_ratio
-------+-------------------
p1     | 0.5
p2     | 1.6666666666666667
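
As a sanity check on the expected values, the arithmetic can be reproduced in plain Python without a database. This is only an illustrative sketch: the `rows` list mirrors the sample table, and the helper name `pass_fail_ratios` is made up for this example.

```python
from collections import Counter

# Sample rows from the table above: (person, passed)
rows = [
    ("p1", True), ("p1", False), ("p1", False),
    ("p2", True), ("p2", True), ("p2", True), ("p2", True), ("p2", True),
    ("p2", False), ("p2", False), ("p2", False),
]


def pass_fail_ratios(rows):
    """Return {person: passes / fails}; None when a person has no failures."""
    counts = Counter(rows)  # counts each distinct (person, passed) pair
    people = sorted({person for person, _ in rows})
    ratios = {}
    for person in people:
        passes = counts[(person, True)]
        fails = counts[(person, False)]
        # Guard against dividing by zero when there are no failed tests.
        ratios[person] = passes / fails if fails else None
    return ratios


ratios = pass_fail_ratios(rows)
print(ratios)
```

p1 has 1 pass and 2 fails (1/2), and p2 has 5 passes and 3 fails (5/3), which matches the table of expected results.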

This is the solution I have been able to come up with so far. (A complete MWE is appended at the end.)

results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()

pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()

fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()

pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)

To me this looks overly complicated for something that seems conceptually quite simple. Is there a better solution?


MWE:

# To change database name, modify 'dbname'.

# Expected output:
# ('p1', 0.5)
# ('p2', 1.6666666666666667)

# Lots of constraints and checks omitted for brevity.

# To view generated SQL, uncomment the line containing "echo" below.

import sqlalchemy as sa
import sqlalchemy.orm as orm
import sqlalchemy.types as types

dbname = 'test'


base = orm.declarative_base()


class test(base):
    __tablename__ = 'test'
    id = sa.Column(sa.Integer, primary_key=True)
    person = sa.Column(sa.String)
    passed = sa.Column(types.Boolean)
    pass


engine = sa.create_engine(
    # Interpolate the database name; the original f-string had no braces.
    f"postgresql://localhost:5432/{dbname}", future=True
)
base.metadata.drop_all(engine)
base.metadata.create_all(engine)
session = orm.Session(engine)

# Add some data.
session.add(test(person='p1', passed=True))
session.add(test(person='p1', passed=False))
session.add(test(person='p1', passed=False))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=True))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.add(test(person='p2', passed=False))
session.commit()

results_count = (
    sa.select(
        test.person,
        test.passed,
        sa.func.count(test.passed).label('count')
    ).group_by(test.person).group_by(test.passed)
).subquery()

pass_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == True)  # noqa
).subquery()

fail_count = (
    sa.select(results_count.c.person, results_count.c.count)
    .filter(results_count.c.passed == False)  # noqa
).subquery()

pass_fail_ratio = (
    sa.select(
        pass_count.c.person,
        (
            sa.cast(pass_count.c.count, sa.Float)
            / sa.cast(fail_count.c.count, sa.Float)
        ).label('success_failure_ratio')
    )
).filter(fail_count.c.person == pass_count.c.person)

# engine.echo = True
with orm.Session(engine) as session:
    res = session.execute(pass_fail_ratio)
    for row in res:
        print(row)

【Answer 1】:

This is too complicated. I would not use subqueries. One method is:

select person,
       count(*) filter (where passed) * 1.0 / count(*) filter (where not passed)
from test t
group by person;

Alternatively, you might find it more convenient to express this the "old-fashioned" way, without filter:

select person,
       sum( passed::int ) * 1.0 / sum( (not passed)::int )
from test t
group by person;
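
The same conditional aggregation can be tried out without a Postgres server. The sketch below makes several assumptions that are not part of the answer: it uses Python's built-in `sqlite3` module as a stand-in for Postgres, replaces the Postgres-specific `::int` cast with a portable `CASE` expression, and adds a hypothetical person `p3` who has no failed tests. `NULLIF` turns a zero denominator into NULL; in PostgreSQL, dividing by zero raises an error instead, so some guard of this kind is likely needed there.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, person TEXT, passed INTEGER)"
)

# Sample data from the question, plus a hypothetical 'p3' with no failures.
data = (
    [("p1", 1), ("p1", 0), ("p1", 0)]
    + [("p2", 1)] * 5
    + [("p2", 0)] * 3
    + [("p3", 1)]
)
conn.executemany("INSERT INTO test (person, passed) VALUES (?, ?)", data)

# Conditional aggregation with CASE; NULLIF avoids dividing by zero.
query = """
SELECT person,
       SUM(CASE WHEN passed THEN 1 ELSE 0 END) * 1.0
         / NULLIF(SUM(CASE WHEN passed THEN 0 ELSE 1 END), 0) AS pass_fail_ratio
FROM test
GROUP BY person
ORDER BY person
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

For `p3` the ratio comes back as NULL (`None` in Python) rather than an error, which leaves it to the caller to decide how to report a person with no failures.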

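Note that the pass rate (passes divided by all tests) is more commonly used than the ratio of passes to failures. That is simply: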

select person,
       avg( passed::int ) as pass_ratio
from test t
group by person;
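
The two measures carry the same information. With p passes and f failures, the pass rate is r = p / (p + f), so the pass/fail ratio is p / f = r / (1 - r). A small sketch of that conversion, using exact fractions to avoid rounding noise (the function name `ratio_from_rate` is made up for this illustration):

```python
from fractions import Fraction


def ratio_from_rate(pass_rate):
    """Convert a pass rate r = p / (p + f) into the pass/fail ratio p / f."""
    if pass_rate == 1:
        return None  # no failures, so the pass/fail ratio is undefined
    return pass_rate / (1 - pass_rate)


# p1: 1 pass out of 3 tests; p2: 5 passes out of 8 tests.
p1_ratio = ratio_from_rate(Fraction(1, 3))
p2_ratio = ratio_from_rate(Fraction(5, 8))
print(p1_ratio, float(p2_ratio))
```

Unlike the pass/fail ratio, the pass rate is defined for a person with no failures (it is simply 1), which is one practical reason it tends to be preferred.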

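【Comments】:

Wow, that's a killer! As someone whose knowledge of SQL is, to put it mildly, far from perfect, I didn't know about FILTER. As for the pass/fail vs. pass/all ratio part, I have my reasons :-)

@toomas . . . FILTER is standard SQL, but Postgres is one of the few databases that implement it.

【Answer 2】:

Translating Gordon Linoff's answer into SQLAlchemy, this is my final solution: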

import sqlalchemy as sa

pass_fail_ratio_query = sa.select(
    test.person,
    (
        sa.cast(
            sa.funcfilter(sa.func.count(), test.passed == True),  # noqa
            sa.Float
        )
        / sa.cast(
            sa.funcfilter(sa.func.count(), test.passed == False),  # noqa
            sa.Float
        )
    )
).group_by(test.person)
