TSQL Pivot or Estimate Model Replication

Posted: 2021-10-05 13:34:09

Question:

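I am trying to recreate in SQL a model that was developed in Excel, but I am having difficulty because the model depends on the position of certain records.

It is used to estimate competitors' sales (units and dollars).

The only data available is our clients' data, so that is what the estimate is built from. In other words, where a record has no data, the model takes the data from the next closest records that do have data and fills everything in between with those values.

I am trying to reproduce this in SQL because I have developed SSIS packages that pump the data into SQL Server, and from there I create views to automate reporting and push to Tableau or Power BI. Right now it is a very manual process with a lot of room for error.

As a short-term fix, I believe a specific pivot would help me recreate the model's logic. This is what I want to transform: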

Parent ASIN Rank   Bounds   Groups
1                  1        1
5                  0        1
7                  1        2
10                 0        2
12                 1        3
14                 0        3

However, I need a table that looks like this:

Groups   Lower_Bound   Upper_Bound
1        1             5
2        5             7
3        7             10
4        10            12
5        12            14
6        14            18
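Going from the first layout to the second may not need a pivot at all: each row of the desired table pairs a rank with the rank that follows it. The following is only a sketch under assumptions. The table variable @Ranks is hypothetical, and the value 18 is not in the sample input; it is inferred from the last Upper_Bound in the desired output.

```sql
-- Hypothetical sample data: the six ranks from the first table, plus 18,
-- which is assumed from the last Upper_Bound of the desired output.
DECLARE @Ranks TABLE ([Parent ASIN Rank] int PRIMARY KEY);

INSERT INTO @Ranks ([Parent ASIN Rank])
VALUES (1), (5), (7), (10), (12), (14), (18);

WITH Paired AS
(
    SELECT
        -- Sequential group number in rank order
        ROW_NUMBER() OVER (ORDER BY [Parent ASIN Rank]) AS Groups,
        [Parent ASIN Rank] AS Lower_Bound,
        -- The next rank in order becomes the upper bound of this group
        LEAD([Parent ASIN Rank]) OVER (ORDER BY [Parent ASIN Rank]) AS Upper_Bound
    FROM @Ranks
)
SELECT Groups, Lower_Bound, Upper_Bound
FROM Paired
WHERE Upper_Bound IS NOT NULL   -- the highest rank has no successor, so drop it
ORDER BY Groups;
```

With this sample data the query returns the six Lower_Bound/Upper_Bound pairs listed in the desired table.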

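I have tried using LAG and LEAD (e.g. select *, LAG([RB Units],1,0) over (order by [Parent ASIN Rank]) as test1, LEAD([RB Units],1,0) over (order by [Parent ASIN Rank]) as test2,), but the problem is that large groups of items with no data are clustered together, so only the first and last items in each no-data group can be assigned sales (or units) data from the previous and/or next record.

For example, if row 10 has no sales data but rows 11 and 12 do, LAG and LEAD will work. On the other hand, if rows 18, 19, 20, 21 and 22 have no data, the LAG and LEAD approach only works for rows 18 and 22. I am not sure whether I can partition LAG and LEAD in some way so that the data from the next closest record is copied all the way through those large no-data groups.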
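For a run of empty rows like 18 through 22, one commonly used pattern is to build a running count of non-NULL values. That count stays constant across a run of NULLs, so it can serve as a grouping key, and each group then holds exactly one non-NULL value to pick up. The sketch below is not from the original post: the table variable @Sales and the unit values 400 and 250 are made up for illustration, and it assumes SQL Server 2012 or later for the ordered window aggregates.

```sql
-- Hypothetical sample: rank 17 and 23 have data, ranks 18-22 do not.
DECLARE @Sales TABLE ([Parent ASIN Rank] int PRIMARY KEY, [RB Units] int NULL);

INSERT INTO @Sales ([Parent ASIN Rank], [RB Units])
VALUES (17, 400), (18, NULL), (19, NULL), (20, NULL),
       (21, NULL), (22, NULL), (23, 250);

WITH Grouped AS
(
    SELECT
        [Parent ASIN Rank],
        [RB Units],
        -- COUNT(column) skips NULLs, so the running count only increases on rows with data.
        -- Counting upwards groups each NULL row with the closest data row above it...
        COUNT([RB Units]) OVER (ORDER BY [Parent ASIN Rank]
                                ROWS UNBOUNDED PRECEDING) AS PrevGrp,
        -- ...and counting downwards groups it with the closest data row below it.
        COUNT([RB Units]) OVER (ORDER BY [Parent ASIN Rank] DESC
                                ROWS UNBOUNDED PRECEDING) AS NextGrp
    FROM @Sales
)
SELECT
    [Parent ASIN Rank],
    [RB Units],
    -- Each group contains exactly one non-NULL value, so MAX simply returns it
    MAX([RB Units]) OVER (PARTITION BY PrevGrp) AS PrevUnits,
    MAX([RB Units]) OVER (PARTITION BY NextGrp) AS NextUnits
FROM Grouped
ORDER BY [Parent ASIN Rank];
```

In this sample, every row from 18 to 22 gets PrevUnits = 400 and NextUnits = 250. Rows before the first data row (or after the last) would get NULL on the corresponding side, which is the case the Excel model reportedly cannot handle either.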

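Here is an example of the model I am trying to replicate:

Previous Units Value column formula: =IF(E10="",G9,E10)

Next Units Value column formula: =IF(E10="",H11,E10)

Prev. Sales Rank Value column formula: =IF(E10="",I9,C10)

Next Sales Rank Value column formula: =IF(E10="",J11,C10)

Est. Total Units column formula: =IF(E10="",(G10-((G10-H10)/(J10-I10))*(C10-I10)),E10)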
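Read as a formula, the last column is a straight-line interpolation between the nearest rows that have data. A rough T-SQL equivalent might look like the sketch below. It assumes the previous and next values have already been carried onto each row in the columns PrevUnits, NextUnits, PrevRank and NextRank; these names are hypothetical and do not appear in the original script, and the sample numbers are invented. The CAST avoids integer division, and NULLIF guards against a zero denominator.

```sql
-- Hypothetical single row: rank 20 has no units; its nearest neighbours with data
-- are rank 17 (400 units) and rank 23 (250 units).
DECLARE @Model TABLE
(
    [Parent ASIN Rank] int PRIMARY KEY,
    [RB Units]         int NULL,
    PrevUnits          int NULL,
    NextUnits          int NULL,
    PrevRank           int NULL,
    NextRank           int NULL
);

INSERT INTO @Model ([Parent ASIN Rank], [RB Units], PrevUnits, NextUnits, PrevRank, NextRank)
VALUES (20, NULL, 400, 250, 17, 23);

SELECT
    [Parent ASIN Rank],
    -- Mirrors =IF(E10="", G10-((G10-H10)/(J10-I10))*(C10-I10), E10):
    -- keep the real value when present, otherwise interpolate by rank.
    COALESCE(
        CAST([RB Units] AS decimal(18, 4)),
        PrevUnits
          - (CAST(PrevUnits - NextUnits AS decimal(18, 4)) / NULLIF(NextRank - PrevRank, 0))
          * ([Parent ASIN Rank] - PrevRank)
    ) AS EstTotalUnits
FROM @Model;
```

For the invented row this works out to 400 - (150 / 6) * 3 = 325 units.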

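The pivot table is sorted by Total. The Parent ASIN Rank is essentially the sales rank derived from Total, and both are based on the average of the "Sales Rank: 30 days avg." figure provided to us.

This needs to be dynamic, because the ranks (i.e. the Total column and the Parent ASIN Rank) change whenever the data is refreshed. As of now, the model breaks if the first row is not a record that contains data.

I have been working with self-referencing CTEs, but I am starting to question whether replicating a model with this structure in SQL is possible at all, or just extremely difficult. It may not be the best way to estimate, but it is the method my company has been using (I am relatively new and want to help them automate the process).

I need to be able to say "this record is X rows from the record above (the closest record above that is not null) and Y rows from the record below (the closest record below that is not null)."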
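One way to express "closest non-null record above and below, and how far away it is" is a pair of OUTER APPLY lookups, each taking the TOP (1) row with data in the relevant direction. This is a sketch, not the approach the post ended up with; the table variable @Sales and its values are hypothetical, and the row distance relies on the rank being a gap-free ROW_NUMBER, as it is in the script.

```sql
-- Hypothetical sample: ranks 1 and 5 have data, ranks 2-4 do not.
DECLARE @Sales TABLE ([Parent ASIN Rank] int PRIMARY KEY, [RB Units] int NULL);

INSERT INTO @Sales ([Parent ASIN Rank], [RB Units])
VALUES (1, 500), (2, NULL), (3, NULL), (4, NULL), (5, 300);

SELECT
    t.[Parent ASIN Rank],
    t.[RB Units],
    p.[Parent ASIN Rank]                        AS PrevRank,
    p.[RB Units]                                AS PrevUnits,
    t.[Parent ASIN Rank] - p.[Parent ASIN Rank] AS RowsFromPrev,  -- "X rows from above"
    n.[Parent ASIN Rank]                        AS NextRank,
    n.[RB Units]                                AS NextUnits,
    n.[Parent ASIN Rank] - t.[Parent ASIN Rank] AS RowsToNext     -- "Y rows from below"
FROM @Sales AS t
OUTER APPLY
(
    -- Closest row above that has data
    SELECT TOP (1) x.[Parent ASIN Rank], x.[RB Units]
    FROM @Sales AS x
    WHERE x.[RB Units] IS NOT NULL
      AND x.[Parent ASIN Rank] < t.[Parent ASIN Rank]
    ORDER BY x.[Parent ASIN Rank] DESC
) AS p
OUTER APPLY
(
    -- Closest row below that has data
    SELECT TOP (1) x.[Parent ASIN Rank], x.[RB Units]
    FROM @Sales AS x
    WHERE x.[RB Units] IS NOT NULL
      AND x.[Parent ASIN Rank] > t.[Parent ASIN Rank]
    ORDER BY x.[Parent ASIN Rank]
) AS n
ORDER BY t.[Parent ASIN Rank];
```

Because OUTER APPLY keeps rows with no match, a record with nothing above or below it comes back with NULLs on that side instead of disappearing from the result.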

I have tried assigning IDs in a number of ways using ROW_NUMBER() OVER().

Here is one of my scripts:

--create view vw_PT_Parent_ASIN_Units
--as
with a1 as (
    select
        a.[Adjusted Parent ASIN],
        a.[Fixed Brand],
        AVG([Sales Rank: 30 days avg#]) as Total
    from
        vw_Keepa_IFCN_Phase1 as a
    where
        [Include ASIN in Analysis?] = 1
    group by
        a.[Adjusted Parent ASIN],a.[Fixed Brand]
),
a2 as
(
    select 
        [Parent ASIN], 
        sum([Ordered Units]) as [Ordered Units],
        sum([Ordered Revenue]) as [Ordered Revenue]
    from 
        vw_RB_Sales
    group by 
        [Parent ASIN]
),
a3 as
(
    select
        a1.*,
        a2.[Parent ASIN] as [RB Parent ASIN],
        [RB Units] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Units]
            end,
        [RB Sales] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Revenue]
            end
    from
        a1
    left join
        a2
    on
        a1.[Adjusted Parent ASIN] = a2.[Parent ASIN]
),
Est_Model as
(
--I need to create a list of ASINs that are sorted by the "Total" 
    select 
        [Adjusted Parent ASIN], 
        ROW_NUMBER() over (order by Total, [Adjusted Parent ASIN]) as [Parent ASIN Rank]
    from 
        a3
    where 
        Total is not null
),
a4 as
(
    select 
        a3.*,
        b.[Parent ASIN Rank]
    from 
        a3
    left join 
        Est_Model as b
    on 
        a3.[Adjusted Parent ASIN] = b.[Adjusted Parent ASIN]
),
a5 as
(
    select distinct 
        a4a.[Adjusted Parent ASIN],
        a4a.[RB Parent ASIN],
        a4a.[Parent ASIN Rank], 
        Bounds=(ROW_NUMBER() over (order by [Parent ASIN Rank]))%2
    from
        a4 as a4a
    where
        a4a.total is not null
    and 
        a4a.[RB Units] is null
)
, test as (
select *,Groups=row_number() over (partition by Bounds order by [parent asin rank]) from a5 --order by [Parent ASIN Rank]
)
select * from test order by [Parent ASIN Rank]


select distinct t1.Groups, t1.[Parent ASIN Rank] as l, t2.[Parent ASIN Rank] as h
from test as t1
inner join test as t2
on t1.[Adjusted Parent ASIN] = t2.[Adjusted Parent ASIN]
where t1.Bounds=1
and t2.bounds =0


select Groups, Bounds, [1] as Lower_Bound, [0] as Upper_Bound
from test
pivot
    (
    sum(Bounds) for [Parent ASIN Rank]
    IN ([1],[0])
    )
as pvt
order by [Parent ASIN Rank]
--select distinct parent asin rank (low),parent asin rank (high),sales,units,total
--but i need to be able to separate the current parent ASIN ranks that you see in the results right now and I should do this by grabbing even and odd numbers
    
    
    /*
    select 
        *,
        ROW_NUMBER() over (order by [Parent ASIN Rank]),
        --DENSE_RANK() over
        (ROW_NUMBER() over (order by [Parent ASIN Rank]))/2,
        (ROW_NUMBER() over (order by [Parent ASIN Rank]))%2
    from
        a4
    where
        total is not null
    and 
        [RB Units] is null
    order by 
        [Parent ASIN Rank]
        */

--pivot so that the ranges are in separate columns and then use that as CTE and say if rank is in between a range then use this units number
--Low | High | Units | Sales
-- 1  |   5  | 2892  |  90186
-- 5  |   7  | 5076  |  121394


--  a4a.*, Test1= case when a4a.[Parent ASIN Rank] <> 1 then row_number() over (partition by a4a.[rb units] order by a4a.[Parent ASIN Rank]) else 0 end
--  , Test2=case when a4a.[rb units] is null then 1 else 0 end
--  , Test3=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]+1 else 0 end
--  , Test4=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]-1 else 0 end


--need to get the range. For example, the blank record falls in-between these two records
--I need to have one column with the low units and one column with the high units, but all in the same row....so there needs to be 2 joins...one to a low table and one to a high table
--This means that for low and high they need to match on the parent ASIN rank and an adjusted parent ASIN rank that matches the ASIN rank of the NULL record
--For example, B01M0ZV2CU has a rank of 5 which means that ranks 4 and 6 need to match that 5


--Need to fix duplicates which probably goes back to Keepa view definitions and joins



--Currently, my method works well for when there is a single row with NULL values, but it does not work for when there are multiple rows with NULL values

Below is another version of the script. This one uses LAG and LEAD, which only works when there is a single NULL record:

--create view vw_PT_Parent_ASIN_Units
--as
with a1 as (
    select
        a.[Adjusted Parent ASIN],
        a.[Fixed Brand],
        AVG([Sales Rank: 30 days avg#]) as Total
    from
        vw_Keepa_IFCN_Phase1 as a
    where
        [Include ASIN in Analysis?] = 1
    group by
        a.[Adjusted Parent ASIN],a.[Fixed Brand]
),
a2 as
(
    select 
        [Parent ASIN], 
        sum([Ordered Units]) as [Ordered Units],
        sum([Ordered Revenue]) as [Ordered Revenue]
    from 
        vw_RB_Sales
    group by 
        [Parent ASIN]
),
a3 as
(
    select
        a1.*,
        a2.[Parent ASIN] as [RB Parent ASIN],
        [RB Units] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Units]
            end,
        [RB Sales] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Revenue]
            end
    from
        a1
    left join
        a2
    on
        a1.[Adjusted Parent ASIN] = a2.[Parent ASIN]
),
Est_Model as
(
--I need to create a list of ASINs that are sorted by the "Total" 
    select 
        [Adjusted Parent ASIN], 
        ROW_NUMBER() over (order by Total, [Adjusted Parent ASIN]) as [Parent ASIN Rank]
    from 
        a3
    where 
        Total is not null
),
a4 as
(
    select 
        a3.*,
        b.[Parent ASIN Rank]
    from 
        a3
    left join 
        Est_Model as b
    on 
        a3.[Adjusted Parent ASIN] = b.[Adjusted Parent ASIN]
)
    select 
        *, 
        LAG([RB Units],1,0) over (order by [Parent ASIN Rank]) as test1,
        LEAD([RB Units],1,0) over (order by [Parent ASIN Rank]) as test2,
        --LAG([RB Units],1,0) over (partition by [RB Units] order by [Parent ASIN Rank]) as test11,
        --LEAD([RB Units],1,0) over (partition by [RB Units] order by [Parent ASIN Rank]) as test22,
        test3=case when [rb units] is null then 1 else 0 end
    from
        a4
    where
        total is not null


--  a4a.*, Test1= case when a4a.[Parent ASIN Rank] <> 1 then row_number() over (partition by a4a.[rb units] order by a4a.[Parent ASIN Rank]) else 0 end
--  , Test2=case when a4a.[rb units] is null then 1 else 0 end
--  , Test3=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]+1 else 0 end
--  , Test4=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]-1 else 0 end


--need to get the range. For example, the blank record falls in-between these two records
--I need to have one column with the low units and one column with the high units, but all in the same row....so there needs to be 2 joins...one to a low table and one to a high table
--This means that for low and high they need to match on the parent ASIN rank and an adjusted parent ASIN rank that matches the ASIN rank of the NULL record
--For example, B01M0ZV2CU has a rank of 5 which means that ranks 4 and 6 need to match that 5


--Need to fix duplicates which probably goes back to Keepa view definitions and joins



--Currently, my method works well for when there is a single row with NULL values, but it does not work for when there are multiple rows with NULL values

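Finally, here is a screenshot of the output from the second script I posted. Hopefully this will help clarify things.

Any and all help is welcome! Thanks in advance!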

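Comments:

Not sure why you are re-joining if you are using LEAD/LAG, but you may want to look at this ***.com/questions/44893970/…, which explains how to get the previous non-null row. Your question as it stands is not really answerable, because it is unclear and rambling about the exact results you want and how Parent ASIN Rank comes about, and images cannot be copied as text. A minimal reproducible example with CREATE TABLE and INSERT statements, sample data and expected results would go a long way.

Hello @Charlieface! Thank you for your input. I had been looking at this for so long that I thought I had provided enough information. Anyway, it turns out COALESCE is the answer, because it returns the first non-null value in a list. It works like a charm. So, thank you for giving me that link! I would have been lost for a while without it. I had a feeling SQL had a built-in function for something like this. Thanks again! If you like, you can post COALESCE as an answer, or I can provide the part of my script that ended up working. Up to you :)

Answer 1:

COALESCE turned out to be the answer, because it returns the first non-null value in a list.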
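The working script was never posted, so the following only illustrates the COALESCE behaviour the answer relies on. The table variable @Filled and the columns PrevUnits and NextUnits are hypothetical stand-ins for whatever previous/next values the final query produced.

```sql
-- Hypothetical rows: some have their own units, some only have neighbouring values.
DECLARE @Filled TABLE
(
    Id         int PRIMARY KEY,
    [RB Units] int NULL,
    PrevUnits  int NULL,
    NextUnits  int NULL
);

INSERT INTO @Filled (Id, [RB Units], PrevUnits, NextUnits)
VALUES (1, 120,  90,   150),   -- own value present: it wins
       (2, NULL, 120,  150),   -- falls back to the previous value
       (3, NULL, NULL, 150);   -- falls back to the next value

SELECT
    Id,
    -- COALESCE evaluates its arguments left to right and returns the first non-NULL one
    COALESCE([RB Units], PrevUnits, NextUnits) AS FilledUnits
FROM @Filled
ORDER BY Id;
```

The three rows come back with FilledUnits of 120, 120 and 150 respectively.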
