How to get back aggregate values across 2 dimensions using Python Cubes?

Posted 2023-03-25

Asked: 2016-07-15 07:00:59

Question:

Situation

I am using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5. These are my data tables, in text format:

Store table

------------------------------
| id  | code | address       |
|-----|------|---------------|
| 1   | S1   | Kings Row     |
| 2   | S2   | Queens Street |
| 3   | S3   | Jacks Place   |
| 4   | S4   | Diamonds Alley|
| 5   | S5   | Hearts Road   |
------------------------------

Product table

------------------------------
| id  | code | name          |
|-----|------|---------------|
| 1   | P1   | Saucer 12     |
| 2   | P2   | Plate 15      |
| 3   | P3   | Saucer 13     |
| 4   | P4   | Saucer 14     |
| 5   | P5   | Plate 16      |
|  and many more ....        |
|1000 |P1000 | Bowl 25       |
|----------------------------|

Sales table

----------------------------------------
| id  | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1   | 1          | 1        |7.05    |
| 2   | 1          | 2        |9.00    |
| 3   | 2          | 3        |1.00    |
| 4   | 2          | 3        |1.00    |
| 5   | 2          | 5        |1.00    |
|  and many more ....                  |
| 1000| 20         | 4        |1.00    |
|--------------------------------------|

The relationships are:

1. Sales belongs to Store
2. Sales belongs to Product
3. Store has many Sales
4. Product has many Sales

What I want to achieve

I want to use Cubes to display the following, with pagination:

Given stores S1-S3:

-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9   | 0  |
|Plate 15 |0   |0   | 2  |
|  and many more ....    |
|------------------------|

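Note the following:

1. Even though there are no sales records for Saucer 12 under store S3, I display 0 instead of null or none.
2. I want to be able to sort by store, for example in descending order for S3.
3. Each cell is the SUM total spent on that particular product in that particular store.
4. I also want to do pagination.

What I tried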

This is the configuration I used:

"cubes": [
    {
        "name": "sales",
        "dimensions": ["product", "store"],
        "joins": [
            {"master": "product_id", "detail": "product.id"},
            {"master": "store_id", "detail": "store.id"}
        ]
    }
],
"dimensions": [
    {"name": "product", "attributes": ["code", "name"]},
    {"name": "store", "attributes": ["code", "address"]}
]

This is the code I used:

result = browser.aggregate(drilldown=['Store','Product'],
                           order=[("Product.name","asc"), ("Store.name","desc"), ("total_products_sale", "desc")])

I did not get what I wanted. This is what I got:

----------------------------------------------
| product_id | store_id | total_products_sale |
|------------|----------|---------------------|
| 1          | 1        |       7.05          |
| 1          | 2        |       9             |
| 2          | 3        |       2.00          |
|  and many more ....                         |
|---------------------------------------------|

This is the entire table without pagination, and if a product has no sales in a store, it does not show up as zero.
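For illustration only, long-format rows like the ones above could be reshaped into the desired grid on the client. The following is a minimal Python sketch and not Cubes functionality: the function name pivot_sales and its parameters are hypothetical, and it assumes the rows are already aggregated per product and store and that the product label is unique.

```python
from collections import defaultdict


def pivot_sales(rows, store_codes, sort_by=None, page=1, page_size=10):
    """Pivot (product, store, total) rows into one dict per product.

    rows: iterable of (product, store_code, total) tuples.
    store_codes: stores to show as columns; missing cells are filled with 0.
    sort_by: optional store code to sort by, in descending order.
    """
    # Every product starts with a 0 in each requested store column.
    table = defaultdict(lambda: dict.fromkeys(store_codes, 0))
    for product, store, total in rows:
        if store in store_codes:  # ignore stores that are not displayed
            table[product][store] += total

    result = [{"product": product, **cells} for product, cells in table.items()]
    if sort_by is not None:
        result.sort(key=lambda row: row[sort_by], reverse=True)

    # Simple page slicing; pages are 1-based.
    start = (page - 1) * page_size
    return result[start:start + page_size]


rows = [
    ("Saucer 12", "S1", 7.05),
    ("Saucer 12", "S2", 9.00),
    ("Plate 15", "S3", 2.00),
    ("Plate 15", "S5", 1.00),
]
page = pivot_sales(rows, ["S1", "S2", "S3"], sort_by="S3")
```

This reshapes everything in memory, so it only suits result sets that are small enough to load in full.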

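My question

How do I get what I want?

Do I need to create another data table that aggregates everything by store and product before I run the query with Cubes?

Update

I have read more. I realized that what I want is called dicing, because I need to go across 2 dimensions. See: https://en.wikipedia.org/wiki/OLAP_cube#Operations

Cross-posted at Cubes GitHub issues to get more attention.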

Comments:

By the way, I think these are backwards: 3. Sales has many Store 4. Sales has many Product. They should be: 3. Store has many Sales 4. Product has many Sales

Oops, you are right. I will change it.

Answer 1:

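This is a pure SQL solution that uses crosstab() from the additional module tablefunc to pivot the aggregated data. It typically performs better than any client-side alternative. If you are not familiar with crosstab(), read this first:

PostgreSQL Crosstab Query

And this about the "extra" column in the crosstab() output:

Pivot on Multiple Columns using Tablefunc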

SELECT product_id, product
     , COALESCE(s1, 0) AS s1               --  1. ... displayed 0 instead of null
     , COALESCE(s2, 0) AS s2
     , COALESCE(s3, 0) AS s3
     , COALESCE(s4, 0) AS s4
     , COALESCE(s5, 0) AS s5
FROM   crosstab(
     'SELECT s.product_id, p.name, s.store_id, s.sum_amount
      FROM   product p
      JOIN  (
         SELECT product_id, store_id
              , sum(amount) AS sum_amount  -- 3. SUM total of product spent in store
         FROM   sales
         GROUP  BY product_id, store_id
         ) s ON p.id = s.product_id
      ORDER  BY s.product_id, s.store_id;'
   , 'VALUES (1),(2),(3),(4),(5)'          -- desired store_id's
   ) AS ct (product_id int, product text   -- "extra" column
          , s1 numeric, s2 numeric, s3 numeric, s4 numeric, s5 numeric)
ORDER  BY s3 DESC;                         -- 2. ... descending order for S3

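Produces exactly your desired result (plus product_id).

To include products that have never been sold, replace [INNER] JOIN with LEFT [OUTER] JOIN (and take the product ID from p.id, since s.product_id is NULL for unsold products).

SQL Fiddle with the basic query. (The tablefunc module is not installed on sqlfiddle.)

Major points

Read the basic explanation in the reference answer for crosstab().

I included product_id because product.name is hardly unique. Otherwise, this might lead to sneaky errors that conflate two different products.

You don't need the store table in the query if referential integrity is guaranteed.

ORDER BY s3 DESC works because s3 references the output column, where NULL values have been replaced with COALESCE. Otherwise, we would need DESC NULLS LAST to sort NULL values last:

PostgreSQL sort by datetime asc, null first?

For building crosstab() queries dynamically, consider:

Dynamic alternative to pivot with CASE and GROUP BY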
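Since the question uses Python, one way to adapt the query when the set of stores changes is to generate the SQL text from a list of store IDs. This is only a sketch under assumptions: build_crosstab_sql is a hypothetical helper, the column names s1, s2, ... follow the naming used in the query above, and the IDs are forced through int() because column names cannot be passed as bind parameters.

```python
def build_crosstab_sql(store_ids, order_by_store=None):
    """Return crosstab() query text for the given positive integer store IDs."""
    ids = [int(s) for s in store_ids]  # int() rejects anything that is not a number
    if not ids or any(i <= 0 for i in ids):
        raise ValueError("store_ids must be a non-empty list of positive integers")
    if order_by_store is not None and int(order_by_store) not in ids:
        raise ValueError("order_by_store must be one of store_ids")

    coalesced = "\n     , ".join(f"COALESCE(s{i}, 0) AS s{i}" for i in ids)
    values = ",".join(f"({i})" for i in ids)
    columns = ", ".join(f"s{i} numeric" for i in ids)

    sql = f"""SELECT product_id, product
     , {coalesced}
FROM   crosstab(
     'SELECT s.product_id, p.name, s.store_id, s.sum_amount
      FROM   product p
      JOIN  (
         SELECT product_id, store_id, sum(amount) AS sum_amount
         FROM   sales
         GROUP  BY product_id, store_id
         ) s ON p.id = s.product_id
      ORDER  BY s.product_id, s.store_id'
   , 'VALUES {values}'
   ) AS ct (product_id int, product text, {columns})"""
    if order_by_store is not None:
        sql += f"\nORDER  BY s{int(order_by_store)} DESC"
    return sql


sql = build_crosstab_sql([1, 2, 3], order_by_store=3)
```

The generated text can then be sent through any database cursor. Only the store IDs are interpolated, and only after conversion to integers.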

"I also want to do pagination."

That last item is fuzzy. Simple pagination can be done with LIMIT and OFFSET:

Displaying data in grid view page by page
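As a small illustration of the LIMIT / OFFSET approach from the Python side, a page number can be translated into the two values like this. The helper page_window and the relation name sales_pivot are hypothetical; the %s placeholders assume a psycopg2 or Django style cursor.

```python
def page_window(page, page_size):
    """Translate a 1-based page number into (LIMIT, OFFSET) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


# Hypothetical query over a stored pivot result named "sales_pivot".
PAGE_SQL = """
SELECT *
FROM   sales_pivot
ORDER  BY s3 DESC, product_id   -- a unique tiebreaker keeps pages stable
LIMIT  %s OFFSET %s
"""

limit, offset = page_window(page=3, page_size=25)
# With a real DB-API cursor: cursor.execute(PAGE_SQL, [limit, offset])
```

Note that OFFSET still makes the database step over all skipped rows, so deep pages get slower.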

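I would consider using a MATERIALIZED VIEW to materialize results before pagination. If you have a stable page size, I would add page numbers to the MV for easy and fast results.

To optimize performance for big result sets, consider:

SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'

Optimize query with OFFSET on large table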
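The row-value comparison named in the first link is the basis of keyset pagination: instead of skipping rows with OFFSET, each page starts after the last row of the previous one. The sketch below runs against an in-memory SQLite database purely as a stand-in so that it is self-contained (SQLite 3.15 or later also accepts row values). The table pivoted, the function next_page, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pivoted (product_id INTEGER PRIMARY KEY, product TEXT, s3 NUMERIC)"
)
# Made-up sample rows standing in for a stored pivot result.
conn.executemany(
    "INSERT INTO pivoted VALUES (?, ?, ?)",
    [(1, "Saucer 12", 0), (2, "Plate 15", 2), (3, "Saucer 13", 5),
     (4, "Saucer 14", 2), (5, "Plate 16", 0)],
)


def next_page(conn, page_size, after=None):
    """Return up to page_size rows ordered by (s3, product_id), descending.

    after: the (s3, product_id) pair of the last row on the previous page,
    or None for the first page.
    """
    if after is None:
        sql = ("SELECT product_id, product, s3 FROM pivoted "
               "ORDER BY s3 DESC, product_id DESC LIMIT ?")
        params = (page_size,)
    else:
        # The row-value comparison resumes right after the previous page.
        sql = ("SELECT product_id, product, s3 FROM pivoted "
               "WHERE (s3, product_id) < (?, ?) "
               "ORDER BY s3 DESC, product_id DESC LIMIT ?")
        params = (*after, page_size)
    return conn.execute(sql, params).fetchall()


first = next_page(conn, 2)
last = first[-1]
second = next_page(conn, 2, after=(last[2], last[0]))
```

Including the unique product_id in the sort key matters here: without it, rows with equal s3 values could be skipped or repeated between pages.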

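Comments:

If I use a Postgres query, I need to write a sanitized query every time I need something like this. I chose Cubes to make reuse easier. I would prefer an answer that uses the Cubes library.

@KimStacks: I am not familiar with the Cubes library. But I am very familiar with this kind of problem, and this is the most efficient solution by far. But yes, the SQL query needs to be adapted if the stores in the result change.

The answer is detailed, though. I may have to write my own function to build the query using your template.

@KimStacks: I added another link that might help with dynamic crosstabs.

Suppose I use your query in my Django app, which means I need to execute the query raw. Can I make it work with Django model classes, especially when the fields will be dynamic? docs.djangoproject.com/en/1.9/topics/db/sql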
