需要帮助使用在两个单独表的列之间进行算术运算的函数将列添加到一个表

Posted

技术标签:

【中文标题】需要帮助使用在两个单独表的列之间进行算术运算的函数将列添加到一个表【英文标题】:need help adding column to one table using function that does arithmetic operations between columns from two separate tables 【发布时间】:2015-12-30 23:10:36 【问题描述】:

我正在尝试使用 sequel-pro 将列“wOBA”添加到 mysql 中的表“starting_pitcher_stats”。下面是对“starting_pitcher_stats”表中的九个变量执行算术运算的函数的代码。特别是,该函数收集多个变量的值,对其中一些变量(下面的分子)应用不同的权重(系数),并将总和除以更多变量的加法和减法。所有这些变量都驻留在“starting_pitcher_stats”表中。算术运算用下面的公式表示(系数是乘以下面分子中的每个变量的值):

wOBA=(.69*walks_a + .72*HBP + .89*singles_a + 1.27*doubles_a + 1.62*triples_a+ 2.10*HR_a)/(at_bats+walks_a+SF+HBP)

每个重量因年份而异。每年不同的权重来自“GUTS”表。

我遇到的第一个难题是获取函数的正确代码。第二个是正确的代码语法,用于实际调用此函数并使用每个“Starting_Pitcher”的每年(赛季)的每场比赛的正确加权 wOBA 值填充新列。

该函数是使用下面的代码创建的,并在我的函数和过程列表中列为函数“wOBA”。然而,sequel pro 中功能名称旁边的小***/旋钮由于某种原因是灰色的。直到我找到正确的代码来调用它,我才会知道是否有任何错误。

请询问我可以提供的更多信息以澄清任何事情。

提前谢谢你。

DELIMITER $$
    CREATE FUNCTION wOBA(wOBA DECIMAL(10,3))
    RETURNS DECIMAL(10,3)
    BEGIN
        DECLARE wOBA decimal(10,3);
        SET wOBA = 0;
        SELECT cast((SELECT SUM(weighted_BB) as wBB_sum 
            FROM (
                SELECT g.wBB*SUM(if(e.event_CD=14,1,0)) as weighted_BB 
                FROM events e 
                INNER JOIN GUTS g 
                    ON substring(e.game_ID,4,4)=g.season 
                WHERE PIT_ID=Starting_Pitcher 
                GROUP BY g.season) 
            as walks_a)  
            + (SELECT SUM(weighted_HBP) as wHBP_sum 
            FROM (
                SELECT g.wHBP*SUM(if(e.event_CD=16,1,0)) as weighted_HBP 
                FROM events e 
                INNER JOIN GUTS g 
                    ON substring(e.game_ID,4,4)=g.season 
                WHERE PIT_ID=Starting_Pitcher 
                GROUP BY g.season) 
            as HBP)     
            + (SELECT SUM(weighted_1B) as w1B_sum 
            FROM (
                SELECT g.w1B*SUM(if(e.event_CD=20,1,0)) as weighted_1B 
                FROM events e 
                INNER JOIN GUTS g 
                    ON substring(e.game_ID,4,4)=g.season 
                WHERE PIT_ID=Starting_Pitcher 
                GROUP BY g.season) 
            as singles_a)       
            + (SELECT SUM(weighted_2B) as w2B_sum 
            FROM ( 
                SELECT g.w2B*SUM(if(e.event_CD=21,1,0)) as weighted_2B 
                FROM events e 
                INNER JOIN GUTS g 
                    ON substring(e.game_ID,4,4)=g.season 
                WHERE PIT_ID=Starting_Pitcher 
                GROUP BY g.season) 
            as doubles_a)       
            + (SELECT SUM(weighted_3B) as w3B_sum 
            FROM (
                SELECT g.w3B*SUM(if(e.event_CD=22,1,0)) as weighted_3B 
                FROM events e 
                INNER JOIN GUTS g 
                    ON substring(e.game_ID,4,4)=g.season 
                WHERE PIT_ID=Starting_Pitcher 
                GROUP BY g.season) 
            as triples_a)       
            + (SELECT SUM(weighted_HR) as wHR_sum 
            FROM (
                SELECT g.wHR*SUM(if(e.event_CD=23,1,0)) as weighted_HR 
                FROM events e 
                INNER JOIN GUTS g 
                    ON substring(e.game_ID,4,4)=g.season 
                WHERE PIT_ID=Starting_Pitcher 
                GROUP BY g.season) 
            as HR_a) as decimal(10,3))
            /
            cast(SUM(if(e.ab_fl="T",1,0)) 
                + SUM(if(e.event_cd=14,1,0)) 
                + SUM(if(e.SF_fl="T",1,0)) 
                + SUM(if(e.event_cd=16,1,0)) as unsigned) INTO wOBA 
            FROM events e
            WHERE e.PIT_ID=Starting_Pitcher AND PIT_START_FL = "T"
            LIMIT 1;
        RETURN wOBA;
    END
    $$
    DELIMITER ;

Darwin,这是events 表的两个屏幕截图。第一个是结构,第二个是一些内容(因为并非所有内容都适合镜头):

[

这里是GUTS表的结构和内容的截图。

以下是事件表结构的屏幕截图,显示了函数中的字段(及其定义):

更新:

UPDATE retrosheet.starting_pitcher_stats 
SET starting_pitcher_stats.wOBA =(SELECT
(
   (g.wBB * SUM(IF(e.event_cd = 14, 1, 0)))
   + (g.wHBP * SUM(IF(e.event_cd = 16, 1, 0)))
   + (g.w1B  * SUM(IF(e.event_cd = 20, 1, 0)))
   + (g.w2B  * SUM(IF(e.event_cd = 21, 1, 0)))
   + (g.w3B  * SUM(IF(e.event_cd = 22, 1, 0)))
   + (g.wHR  * SUM(IF(e.event_cd = 23, 1, 0)))
   )
   /
   (
     SUM(IF(e.ab_fl = 'T',   1, 0))
   + SUM(IF(e.event_cd = 14, 1, 0))
   + SUM(IF(e.sf_fl = 'T',   1, 0))
   + SUM(IF(e.event_cd = 16, 1, 0))
  ) AS wOBA
  FROM events AS e, GUTS AS g
  WHERE e.YEAR_ID = g.SEASON_ID
    AND e.PIT_START_FL= 'T'
    AND e.PIT_ID = Starting_Pitcher)

查询只是继续运行。我会继续调整的。

更新: start_pitcher_stats 表的屏幕截图

更新:

好的,我正在尝试创建一个 wOBA 列作为新表的一部分,其中包含 wOBA 其他组件的列。

但是,查询会一直持续下去。如何缩短运行时间?

DROP TABLE IF EXISTS starting_pitcher_wOBA;
CREATE TABLE starting_pitcher_wOBA 
SELECT
a.YEAR_ID
,
a.GAME_ID
,
a.PIT_ID
,
a.wBB
,
a.wHBP
,
a.w1B
,
a.w2B
,
a.w3B
,
a.wHR
,
a.u_walks_a
,
a.HBP
,
a.singles_a
,
a.doubles_a
,
a.triples_a
,
a.HR_a
,
a.at_bats
,
a.sacrifice_flies_a
,
a.wOBA
FROM
(
SELECT 
g.YEAR_ID
,
h.GAME_ID
,
e.PIT_ID
,
g.wBB
,
g.wHBP
,
g.w1B
,
g.w2B
,
g.w3B
,
g.wHR
,
SUM(IF(e.event_cd = 14, 1, 0)) AS u_walks_a
,
SUM(IF(e.event_cd = 16, 1, 0)) AS HBP
,
SUM(IF(e.event_cd = 20, 1, 0)) AS singles_a
,
SUM(IF(e.event_cd = 21, 1, 0)) AS doubles_a
,
SUM(IF(e.event_cd = 22, 1, 0)) AS triples_a
,
SUM(IF(e.event_cd = 23, 1, 0)) AS HR_a
,
SUM(IF(e.ab_fl = 'T',   1, 0)) AS at_bats
,
SUM(IF(e.sf_fl = 'T',   1, 0)) AS sacrifice_flies_a
,
(
(
   (g.wBB * SUM(IF(e.event_cd = 14, 1, 0))) 
   + (g.wHBP * SUM(IF(e.event_cd = 16, 1, 0))) 
   + (g.w1B  * SUM(IF(e.event_cd = 20, 1, 0))) 
   + (g.w2B  * SUM(IF(e.event_cd = 21, 1, 0))) 
   + (g.w3B  * SUM(IF(e.event_cd = 22, 1, 0))) 
   + (g.wHR  * SUM(IF(e.event_cd = 23, 1, 0))) 
   )
   /
   (
     SUM(IF(e.ab_fl = 'T',   1, 0)) 
   + SUM(IF(e.event_cd = 14, 1, 0)) 
   + SUM(IF(e.sf_fl = 'T',   1, 0)) 
   + SUM(IF(e.event_cd = 16, 1, 0)) 
  ) 
 )  AS wOBA
FROM events AS e, GUTS AS g, game AS h
WHERE e.PIT_START_FL= 'T' 
GROUP BY g.YEAR_ID, h.GAME_ID,e.PIT_ID) AS a
INNER JOIN GUTS AS g
ON 
a.YEAR_ID=g.YEAR_ID
INNER JOIN game AS h
ON
a.GAME_ID = h.GAME_ID
INNER JOIN events AS e
ON
a.PIT_ID = e.PIT_ID

【问题讨论】:

GUTS 是否每个赛季只有一个记录,我们是否跨多个赛季运行此计算? 子句WHERE PIT_ID=Starting_Pitcher中,这两个值是从哪里来的? 嗨,达尔文,是的,GUTS 每个赛季都有一个记录。 'PIT_ID' 来自“events”表,“Starting_Pitcher”来自“starting_pitcher_stats”表。 events是什么样的? 【参考方案1】:

我们将从清理查询开始。您应该尽可能尝试在每一行上执行计算,而不是执行多个垂直子查询,因为这样可以避免 DBMS 对同一个表进行多次遍历。

SELECT
  (
   ( (g.wbb  * SUM(IF(e.event_cd = 14, 1, 0)))
   + (g.whbp * SUM(IF(e.event_cd = 16, 1, 0)))
   + (g.w1b  * SUM(IF(e.event_cd = 20, 1, 0)))
   + (g.w2b  * SUM(IF(e.event_cd = 21, 1, 0)))
   + (g.w3b  * SUM(IF(e.event_cd = 22, 1, 0)))
   + (g.whr  * SUM(IF(e.event_cd = 23, 1, 0)))
   )
   /
   (
     SUM(IF(e.ab_fl = 'T',   1, 0))
   + SUM(IF(e.event_cd = 14, 1, 0))
   + SUM(IF(e.sf_fl = 'T',   1, 0))
   + SUM(IF(e.event_cd = 16, 1, 0))
   )
  ) AS woba
  FROM events e, guts g
  WHERE e.year_id = g.season_id
    AND e.pit_start_fl = 'T'
    AND e.pit_id = starting_pitcher
  GROUP BY g.season;

假设我没有在某处丢掉逗号,这将为指定的首发投手每年返回一列 woba

请注意,我加入了e.year_id 而非SUBSTRING(e.game_ID,4,4) 上的表格;这避免了在每条记录上调用 SUBSTRING() 的开销。这种事情看起来很小,但它可以在一张大桌子上迅速加起来。

这应该足以让你开始。

【讨论】:

谢谢达尔文!当我运行它时,我收到错误“where 子句中的未知列 Starting_Pitcher”。然后我意识到我需要定义要填充 wOBA 的表。请参阅我在其中添加“更新”代码以填充列的 OP。查询继续运行和运行。我会继续尝试玩它。当我在最后尝试使用“GROUP by g.SEASON_ID”时,它给了我错误。 我在上面的原始帖子中添加了starting_pitcher_stats表结构的屏幕截图。 Darwin,您如何看待我上面创建 wOBA 列的尝试,作为创建包含 wOBA 列以及构成 wOBA 的变量的新表的一部分?问题是查询需要永远。关于如何提高流程效率的任何想法?我在starting_pitcher_stats 表中确实有u_walks_a、singles_a、doubles_a、triples_a、HR_a、HBP、at_bats 和牺牲_flies_a 的列。如果我从这些列中提取而不是从事件表中获取它们会更快吗?谢谢。

以上是关于需要帮助使用在两个单独表的列之间进行算术运算的函数将列添加到一个表的主要内容,如果未能解决你的问题,请参考以下文章

Linux Shell编程之算术运算

算术和关系运算符

组合来自 2 个单独的 SQL 表的列数据

如何用matlab编写矩阵运算程序?

算术运算符++和--的用法

C++算术运算符