SQL join and conditional sum
I have two tables, set up as follows:
PMmx - a tabular version of an origin-destination matrix
Origin Destination Trips
1 1 0.2
2 1 0.3
3 1 0.4
. . .
. . .
1 1101 0.6
2 1101 0.7
3 1101 0.8
. . .
. . .
1101 1 0.2
1101 2 0.3
1101 3 0.4
ZE - a zone equivalence table
Precinct Zone
1 1101
2 1102
3 1111
I want to select the rows of the PMmx table whose Origin or Destination matches the Zone column of the ZE table. For example:
Origin Destination Trips
1 1101 0.6
2 1101 0.7
3 1101 0.8
. . .
. . .
1101 1 0.2
1101 2 0.3
1101 3 0.4
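I also want to create a new column called Distribution that computes Trips/(Total Trips), where the total is summed over a particular zone number (by Origin or by Destination, depending on which column matches the Zone number in the equivalence table).
For example, for Origin 1, Destination 1101, I want the new Distribution value for that row to be 0.6/(0.6+0.7+0.8).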
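The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the intended calculation, not part of the original query; the name trips_to_1101 is made up here and holds the three sample rows whose Destination is 1101.

```python
# Sample rows from PMmx with Destination = 1101, keyed by Origin.
# (Hypothetical in-memory copy of the sample data, for illustration only.)
trips_to_1101 = {1: 0.6, 2: 0.7, 3: 0.8}

# Total trips into zone 1101: 0.6 + 0.7 + 0.8
total_trips = sum(trips_to_1101.values())

# Distribution = Trips / (Total Trips) for each origin
distribution = {origin: trips / total_trips
                for origin, trips in trips_to_1101.items()}

for origin, share in distribution.items():
    print(origin, round(share, 4))
```

The three shares add up to 1, which is a quick sanity check for any query that is meant to compute this column.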
I tried the following code:
SELECT
PMmx.Origin as Origin
,PMmx.Destination as Destination
,PMmx.Trips/sum(PMmx.Trips) as 'Distribution'
FROM PMmx
inner join ZE on Origin=ZE.Zone or Destination=ZE.Zone
Group by Origin, Destination, Trips
I'm not sure this would produce the correct result. Without the GROUP BY clause I get: Column '2DVISUM_2031PMmx_unpiv.Origin' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
With the GROUP BY clause I get: Divide by zero error encountered.
After the inner join there should not be any sums equal to zero, so I'm not sure why I get this error.
Please help!
Edit: I now get duplicate rows with this query:
with cte as (
select
origin, destination, trips
, SUM(Trips) over(partition by Pmx.Origin) sum_trips
, trips / SUM(Trips) over(partition by Pmx.Origin) trips_div
from Pmx
inner join ZE on Pmx.Origin = ZE.Zone
)
select
origin, destination, trips, sum_trips, trips_div
from cte
union all
select
destination, origin, trips, sum_trips, trips_div
from cte
Updated tables to show the error:
ZE:
Precinct Zone
1 1101
2 1102
3 1111
4 1211
PMX:
Origin Destination Trips
1 1 0.20
2 1 0.30
3 1 0.40
1 1101 0.60
2 1101 0.70
3 1101 0.80
1101 1 0.20
1101 2 0.30
1101 3 0.40
1101 1211 0.60
1211 1101 0.50
The output contains duplicates with different trip values:
origin destination trips sum_trips trips_div
1101 1 0.20 1.50 0.13333333333333333333333333
1101 2 0.30 1.50 0.20000000000000000000000000
1101 3 0.40 1.50 0.26666666666666666666666666
1101 1211 0.60 1.50 0.40000000000000000000000000
1211 1101 0.50 0.50 1.00000000000000000000000000
1 1101 0.20 1.50 0.13333333333333333333333333
2 1101 0.30 1.50 0.20000000000000000000000000
3 1101 0.40 1.50 0.26666666666666666666666666
1211 1101 0.60 1.50 0.40000000000000000000000000
1101 1211 0.50 0.50 1.00000000000000000000000000
Edit 2: I would like to create an 'if statement' so that if Pmx.origin = ZE.Zone, then trips_div is trips/SUM(Trips) over(partition by Pmx.Origin), as above. If both Pmx.origin = ZE.Zone and Pmx.destination = ZE.Zone, I still want trips_div to be trips/SUM(Trips) over(partition by Pmx.Origin). When Pmx.origin does not equal ZE.Zone and Pmx.destination = ZE.Zone, it should be trips/SUM(Trips) over(partition by Pmx.Destination). I have tried various CASE WHEN statements but can't seem to get it to work.
I would like the output to be:
origin destination trips sum_trips trips_div
1 1101 0.20 2.10 0.0952380952380952
2 1101 0.30 2.10 0.1428571428571429
3 1101 0.40 2.10 0.1904761904761905
1101 1 0.20 1.50 0.1333333333333333
1101 2 0.30 1.50 0.2000000000000000
1101 3 0.40 1.50 0.2666666666666666
1101 1211 0.60 1.50 0.4000000000000000
1211 1101 0.50 0.50 1.0000000000000000
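If I understand your requirement, I think you can take a slightly different approach to the sum: a window function makes the total available on every row of the source table, so you don't need the GROUP BY clause.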
SELECT
PMmx.Origin as Origin
, PMmx.Destination as Destination
, (PMmx.Trips/sum(PMmx.Trips) over(partition by Destination)) as 'Distribution'
FROM PMmx
inner join ZE on Origin=ZE.Zone or Destination=ZE.Zone
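One plausible explanation for the divide-by-zero error, offered as an assumption rather than something confirmed in the question: because Trips is itself in the GROUP BY list, every group holds rows with a single Trips value, so sum(Trips) is just a multiple of that value and any row with Trips = 0 produces 0/0. The sketch below reproduces the effect with Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions) on a tiny made-up table that includes one zero-trip row. SQLite returns NULL for a division by zero where SQL Server raises an error, and the NULLIF guard is an addition that is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PMmx (Origin INTEGER, Destination INTEGER, Trips REAL);
    CREATE TABLE ZE (Precinct INTEGER, Zone INTEGER);
    -- Hypothetical rows; the zero-trip row is invented to show the failure.
    INSERT INTO PMmx VALUES (1, 1101, 0.6), (2, 1101, 0.7), (3, 1101, 0.0);
    INSERT INTO ZE VALUES (1, 1101);
""")

JOIN = "FROM PMmx INNER JOIN ZE ON Origin = ZE.Zone OR Destination = ZE.Zone"

# Grouping by Trips makes every group's sum a multiple of its own Trips value,
# so the ratio is 1 for non-zero rows and 0/0 for the zero-trip row.
grouped = conn.execute(f"""
    SELECT Origin, Destination, Trips / SUM(Trips) AS Distribution
    {JOIN}
    GROUP BY Origin, Destination, Trips
    ORDER BY Origin
""").fetchall()

# Window version: the sum spans the whole Destination partition, and
# NULLIF turns a zero total into NULL instead of dividing by it.
windowed = conn.execute(f"""
    SELECT Origin, Destination,
           Trips / NULLIF(SUM(Trips) OVER (PARTITION BY Destination), 0)
               AS Distribution
    {JOIN}
    ORDER BY Origin
""").fetchall()
conn.close()

print(grouped)   # ratio is 1.0 or None in every row
print(windowed)  # shares of the total for Destination 1101
```

In the grouped result every non-zero row comes out as exactly 1, which shows that the GROUP BY version would not give the intended distribution even if the error were suppressed.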
MS SQL Server 2014 Schema Setup:
CREATE TABLE Pmx
([Origin] int, [Destination] int, [Trips] decimal(12,2))
;
INSERT INTO Pmx
([Origin], [Destination], [Trips])
VALUES
(1, 1, 0.2),
(2, 1, 0.3),
(3, 1, 0.4),
(1, 1101, 0.6),
(2, 1101, 0.7),
(3, 1101, 0.8),
(1101, 1, 0.2),
(1101, 2, 0.3),
(1101, 3, 0.4)
;
CREATE TABLE ZE
([Precinct] int, [Zone] int)
;
INSERT INTO ZE
([Precinct], [Zone])
VALUES
(1, 1101),
(2, 1102),
(3, 1111)
;
Query 1:
with cte as (
select
origin, destination, trips
, SUM(Trips) over(partition by Pmx.Origin) sum_trips
, trips / SUM(Trips) over(partition by Pmx.Origin) trips_div
from Pmx
inner join ZE on Pmx.Origin = ZE.Zone
)
select
origin, destination, trips, sum_trips, trips_div
from cte
union -- changed to union so duplication is avoided
select
destination, origin, trips, sum_trips, trips_div
from cte
| origin | destination | trips | sum_trips | trips_div |
|--------|-------------|-------|-----------|--------------------|
| 1101 | 1 | 0.2 | 0.9 | 0.2222222222222222 |
| 1101 | 2 | 0.3 | 0.9 | 0.3333333333333333 |
| 1101 | 3 | 0.4 | 0.9 | 0.4444444444444444 |
| 1 | 1101 | 0.2 | 0.9 | 0.2222222222222222 |
| 2 | 1101 | 0.3 | 0.9 | 0.3333333333333333 |
| 3 | 1101 | 0.4 | 0.9 | 0.4444444444444444 |
Part 2
MS SQL Server 2014 Schema Setup:
CREATE TABLE Pmx
([Origin] int, [Destination] int, [Trips] decimal(12,2))
;
INSERT INTO Pmx
([Origin], [Destination], [Trips])
VALUES
(1, 1, 0.20),
(2, 1, 0.30),
(3, 1, 0.40),
(1, 1101, 0.60),
(2, 1101, 0.70),
(3, 1101, 0.80),
(1101, 1, 0.20),
(1101, 2, 0.30),
(1101, 3, 0.40),
(1101, 1211, 0.60),
(1211, 1101, 0.50)
;
CREATE TABLE ZE
([Precinct] int, [Zone] int)
;
INSERT INTO ZE
([Precinct], [Zone])
VALUES
(1, 1101),
(2, 1102),
(3, 1111),
(4, 1211)
;
Query 1:
with cte as (
select
origin, destination, trips
, SUM(Trips) over(partition by Pmx.Origin) sum_trips
, trips / SUM(Trips) over(partition by Pmx.Origin) trips_div
from Pmx
inner join ZE on Pmx.Origin = ZE.Zone
)
select
origin, destination, trips, sum_trips, trips_div
from cte
union
select
destination, origin, trips, sum_trips, trips_div
from cte
order by 1,2,3,4
| origin | destination | trips | sum_trips | trips_div |
|--------|-------------|-------|-----------|---------------------|
| 1 | 1101 | 0.2 | 1.5 | 0.13333333333333333 |
| 2 | 1101 | 0.3 | 1.5 | 0.2 |
| 3 | 1101 | 0.4 | 1.5 | 0.26666666666666666 |
| 1101 | 1 | 0.2 | 1.5 | 0.13333333333333333 |
| 1101 | 2 | 0.3 | 1.5 | 0.2 |
| 1101 | 3 | 0.4 | 1.5 | 0.26666666666666666 |
| 1101 | 1211 | 0.5 | 0.5 | 1 |
| 1101 | 1211 | 0.6 | 1.5 | 0.4 |
| 1211 | 1101 | 0.5 | 0.5 | 1 |
| 1211 | 1101 | 0.6 | 1.5 | 0.4 |
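The 1101/1211 rows still appear twice in this result because the second branch of the UNION only swaps the origin and destination labels and reuses the origin-based totals. One possible way to get the behaviour described in Edit 2 is to tag each row once and partition on that tag, as sketched below. This is a sketch under stated assumptions, not the answerer's query: it runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later), the names tagged, origin_is_zone and zone_key are invented for illustration, and ZE.Zone values are assumed to be unique. The SELECT uses only standard CASE and window syntax, so it should carry over to SQL Server with little change, though that has not been checked here.

```python
import sqlite3

SETUP = """
CREATE TABLE Pmx (Origin INTEGER, Destination INTEGER, Trips REAL);
CREATE TABLE ZE (Precinct INTEGER, Zone INTEGER);
INSERT INTO Pmx VALUES
    (1, 1, 0.20), (2, 1, 0.30), (3, 1, 0.40),
    (1, 1101, 0.60), (2, 1101, 0.70), (3, 1101, 0.80),
    (1101, 1, 0.20), (1101, 2, 0.30), (1101, 3, 0.40),
    (1101, 1211, 0.60), (1211, 1101, 0.50);
INSERT INTO ZE VALUES (1, 1101), (2, 1102), (3, 1111), (4, 1211);
"""

QUERY = """
WITH tagged AS (
    SELECT p.Origin, p.Destination, p.Trips,
           -- 1 when the origin is a zone (this also covers zone-to-zone rows)
           CASE WHEN zo.Zone IS NOT NULL THEN 1 ELSE 0 END AS origin_is_zone,
           -- the zone number the row is summed under
           CASE WHEN zo.Zone IS NOT NULL THEN p.Origin
                ELSE p.Destination END AS zone_key
    FROM Pmx AS p
    LEFT JOIN ZE AS zo ON zo.Zone = p.Origin
    LEFT JOIN ZE AS zd ON zd.Zone = p.Destination
    -- keep only rows that touch a zone on at least one side
    WHERE zo.Zone IS NOT NULL OR zd.Zone IS NOT NULL
)
SELECT Origin, Destination, Trips,
       SUM(Trips) OVER (PARTITION BY origin_is_zone, zone_key) AS sum_trips,
       Trips / NULLIF(SUM(Trips) OVER (PARTITION BY origin_is_zone, zone_key), 0)
           AS trips_div
FROM tagged
ORDER BY Origin, Destination
"""


def zone_distribution():
    """Return (Origin, Destination, Trips, sum_trips, trips_div) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in zone_distribution():
        print(row)
```

Partitioning on origin_is_zone as well as zone_key keeps the destination-based total for 1101 (0.60 + 0.70 + 0.80 = 2.10) separate from rows such as 1211 to 1101, which are summed under their origin instead. With the Part 2 sample data the row from 1 to 1101 has Trips 0.60, so this sketch returns 0.60/2.10 for it.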