Testing Trade Entries in BigQuery/SQL

【Title】: Testing Trade Entries in BigQuery/SQL 【Posted】: 2020-02-18 15:12:46 【Question】:

I have a dataset in BigQuery containing various stock price pairs together with their relative z-scores (ordered by date, millions of rows):

+-----+--------+--------+------------+---------------+------------------+---------------+
| Row | stock1 | stock2 |    date    |  spreadclose  | logspreadreturn  |   zscore20    |
+-----+--------+--------+------------+---------------+------------------+---------------+
| 1   | AJRD   | MAS    | 27/12/2010 |  0.537230704  |  0.017358199     | -0.251654379  |
| 2   | ABEV   | EFOI   | 27/12/2010 | 41.00585106   |  0.014929275     |  1.950810153  |
| 3   | AIRI   | REFR   | 27/12/2010 |  0.537688889  |  0.003638617     |  0.707555834  |
| 4   | AEO    | RJF    | 27/12/2010 |  0.35009565   | -0.004321474     |  0.265411543  |
| 5   | AFL    | TSU    | 27/12/2010 |  0.771122788  | -0.028202112     |  0.247268645  |
+-----+--------+--------+------------+---------------+------------------+---------------+

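I then sample the stock prices as shown below. I am trying to calculate trade entries in BigQuery as per the "What I want" column, but I can only seem to produce the numbers in the "(Current wrong)" column.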

   Stock1 Stock2     date       zscore20     (Current wrong)   What I want
1    FITB     MS  4/01/2010     -1.5               1           0
2    FITB     MS  5/01/2010     -1.9               1           0
3    FITB     MS  6/01/2010     -2.3               1           1
4    FITB     MS  7/01/2010     -2.0               1           1
5    FITB     MS  8/01/2010     -1.0               1           1
6    FITB     MS 11/01/2010      0.5              -1           0
7    FITB     MS 12/01/2010      1.5              -1           0
8    FITB     MS 13/01/2010      1.8              -1           0
9    FITB     MS 14/01/2010      2.1              -1          -1
10   FITB     MS 15/01/2010      1.5              -1          -1
11   FITB     MS 19/01/2010      1.3              -1          -1
12   FITB     MS 20/01/2010      0.4              -1          -1
13   FITB     MS 21/01/2010     -0.1               1           0

The logic behind the "What I want" column is as follows:

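In the time series, when zscore > 2.0, flag -1; then exit (0) when the zscore next crosses zero and becomes negative.
In the time series, when zscore < -2.0, flag 1; then exit (0) when the zscore next crosses zero and becomes positive.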
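As a cross-check on these rules, here is a minimal Python sketch of the entry/exit logic written as a small state machine. It is meant as a reference to compare query output against, not as something to run inside BigQuery. The function name `trade_flags` and the `entry` parameter are made up for illustration; the 2.0 threshold and the sample z-scores come from the FITB/MS table above.

```python
from typing import List


def trade_flags(zscores: List[float], entry: float = 2.0) -> List[int]:
    """Return the position flag (-1, 0 or 1) for each z-score, in date order."""
    flags = []
    position = 0  # 0 = flat, -1 = short the spread, 1 = long the spread
    for z in zscores:
        if z > entry:
            position = -1  # enter (or stay) short above +entry
        elif z < -entry:
            position = 1   # enter (or stay) long below -entry
        elif position == -1 and z < 0:
            position = 0   # a short exits once z crosses below zero
        elif position == 1 and z > 0:
            position = 0   # a long exits once z crosses above zero
        flags.append(position)
    return flags


# z-scores for FITB/MS from the sample table, in date order
sample = [-1.5, -1.9, -2.3, -2.0, -1.0, 0.5, 1.5, 1.8, 2.1, 1.5, 1.3, 0.4, -0.1]
print(trade_flags(sample))
# [0, 0, 1, 1, 1, 0, 0, 0, -1, -1, -1, -1, 0]
```

The printed list matches the "What I want" column row by row, so the same thirteen values can serve as a small regression test for any SQL version of the logic.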

My current query, which I run on the larger dataset (with the processing year set to 2010):

DECLARE processyear date;

SET processyear = '2012-01-01'; 

INSERT INTO `dataset.main.tradetime`(

-- Calculate tradetimeparameters
WITH tradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    LN(spreadclose) - LN(LAG(spreadclose) OVER (PARTITION BY stock1, stock2 ORDER BY date ASC)) as logspreadreturn,
    SAFE_DIVIDE((spreadclose - sma20), stdev20) as zscore20
FROM `rw-algotrader-264713.mlbprices.dailyspreads`
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),

secondtradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    CASE WHEN zscore20 > 0 THEN -1 WHEN zscore20 < 0 THEN 1 END as tradesignal20
FROM tradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),


thirdtradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    LEAD(tradesignal20) OVER (PARTITION BY stock1, stock2 ORDER BY date ASC) as tradesignal20
FROM secondtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),

fourthtradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    tradesignal20,
    CASE WHEN tradesignal20 < 0 THEN 0 ELSE tradesignal20 END as positivesignals,
    CASE WHEN tradesignal20 > 0 THEN 0 ELSE tradesignal20 END as negativesignals
FROM thirdtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),

fifthtradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    tradesignal20,
    positivesignals,
    negativesignals,
       (case when positivesignals = 0 then 0
             else row_number() over (partition by stock1,stock2, positivesignals, grp order by date ASC)
        end) as positivedaysintrade
from (select stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    tradesignal20,
    positivesignals,
    negativesignals, countif(positivesignals = 0) over (partition by stock1,stock2 order by date ASC) as grp
      from fourthtradetimeparameters
     ) fourthtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),

sixthtradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    tradesignal20,
    positivesignals,
    negativesignals,
    positivedaysintrade,
       (case when negativesignals = 0 then 0
             else row_number() over (partition by stock1,stock2, negativesignals, grp order by date ASC)
        end) as negativedaysintrade
from (select stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    tradesignal20,
    positivesignals,
    negativesignals, 
    positivedaysintrade,countif(negativesignals = 0) over (partition by stock1,stock2 order by date ASC) as grp
      from fifthtradetimeparameters
     ) fifthtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),

seventhtradetimeparameters AS (
SELECT 
    stock1,
    stock2,
    date,
    spreadclose,
    logspreadreturn,
    zscore20,
    tradesignal20,
    positivesignals,
    negativesignals,
    positivedaysintrade,
    negativedaysintrade,
    (positivedaysintrade + negativedaysintrade) as daysintrade,
    (tradesignal20 * logspreadreturn) as sdcreturn20

FROM sixthtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
)

SELECT 
  stock1,
  stock2,
  date,
  spreadclose,
  logspreadreturn,
  zscore20,
  tradesignal20,
  positivesignals,
  negativesignals,
  positivedaysintrade,
  negativedaysintrade,
  daysintrade,
  sdcreturn20
FROM seventhtradetimeparameters)

The second, shorter table above is produced by:

SELECT * FROM `dataset.tradetime` WHERE stock1 = 'FITB' and stock2 = 'MS' and date >= '2010-01-01'
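For comparison, the `tradesignal20` CASE expression in `secondtradetimeparameters` looks only at the sign of the current z-score. A rough Python mirror of that expression appears to reproduce the "(Current wrong)" column of the sample, which suggests that what is missing is the 2.0 entry threshold and any memory of the open position. The helper name `sign_signal` is hypothetical, and the one-row `LEAD` shift applied in the following CTE is deliberately ignored here.

```python
from typing import List, Optional


def sign_signal(z: float) -> Optional[int]:
    """Mirror of: CASE WHEN z > 0 THEN -1 WHEN z < 0 THEN 1 END."""
    if z > 0:
        return -1
    if z < 0:
        return 1
    return None  # the CASE has no ELSE branch, so a z-score of exactly 0 gives NULL


# z-scores for FITB/MS from the sample table, in date order
sample: List[float] = [-1.5, -1.9, -2.3, -2.0, -1.0, 0.5, 1.5, 1.8, 2.1, 1.5, 1.3, 0.4, -0.1]
current = [sign_signal(z) for z in sample]
print(current)
# [1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, 1]
```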

I would be grateful if someone could point me to BigQuery/SQL code that achieves the above. Many thanks in advance.

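【Comments】:

The query that generates your current results would really help. The relationship between your table and the expected results is unclear.

Hi Gordon, does the code segment above help? I'm still learning SQL, thanks for your help!

【Answer 1】: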

Assume you start with a table that has the first four columns.

Then you can use window functions. If I understand the conditions, they boil down to:

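flag = -1 when zscore20 > 2.0
flag = -1 when zscore20 > 0 and the most recent row with zscore20 > 2.0 is later than the most recent row with zscore20 < 0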

Then similar logic applies for flag = 1.

If I understand correctly, you can use window functions and a case expression:

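select t.*,
       (case when zscore20 > 2.0
             then -1  -- short entry: z-score above +2.0
             when zscore20 > 0 and
                  -- still short: the latest z > 2.0 is more recent than the latest z < 0
                  (max(case when zscore20 > 2.0 then date end) over (partition by stock1, stock2 order by date) >
                   -- coalesce keeps the comparison from being NULL when no negative z-score has occurred yet
                   coalesce(max(case when zscore20 < 0 then date end) over (partition by stock1, stock2 order by date),
                            date '0001-01-01')
                  )
             then -1
             when zscore20 < -2.0
             then 1   -- long entry: z-score below -2.0
             when zscore20 < 0 and
                  -- still long: the latest z < -2.0 is more recent than the latest z > 0
                  (max(case when zscore20 < -2.0 then date end) over (partition by stock1, stock2 order by date) >
                   coalesce(max(case when zscore20 > 0 then date end) over (partition by stock1, stock2 order by date),
                            date '0001-01-01')
                  )
             then 1
             else 0   -- flat
       end) as flag
from t;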
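The idea in this answer, comparing the date of the most recent entry trigger with the date of the most recent z-score on the other side of zero, can be sketched in plain Python to check it against the sample. This is an illustrative sketch, not the answer's code: `flags_by_latest_event` and its variable names are made up, the -1/1 sign convention is taken from the "What I want" column in the question, and "no such row yet" is handled explicitly with `None` (in SQL, a comparison against NULL is not true, so the same case needs care there too).

```python
from datetime import date
from typing import List, Optional, Tuple


def flags_by_latest_event(rows: List[Tuple[date, float]], entry: float = 2.0) -> List[int]:
    """rows must be sorted by date; returns one flag (-1, 0, 1) per row."""
    last_above: Optional[date] = None     # latest date with z > entry
    last_below: Optional[date] = None     # latest date with z < -entry
    last_positive: Optional[date] = None  # latest date with z > 0
    last_negative: Optional[date] = None  # latest date with z < 0
    flags = []
    for day, z in rows:
        # Running maxima include the current row, like a window ordered by date.
        if z > entry:
            last_above = day
        if z < -entry:
            last_below = day
        if z > 0:
            last_positive = day
        if z < 0:
            last_negative = day

        if z > entry:
            flag = -1
        elif z > 0 and last_above is not None and (last_negative is None or last_above > last_negative):
            flag = -1  # short trigger is more recent than any negative z-score
        elif z < -entry:
            flag = 1
        elif z < 0 and last_below is not None and (last_positive is None or last_below > last_positive):
            flag = 1   # long trigger is more recent than any positive z-score
        else:
            flag = 0
        flags.append(flag)
    return flags


days = [4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 19, 20, 21]  # January 2010 trading days
zs = [-1.5, -1.9, -2.3, -2.0, -1.0, 0.5, 1.5, 1.8, 2.1, 1.5, 1.3, 0.4, -0.1]
sample_rows = [(date(2010, 1, d), z) for d, z in zip(days, zs)]
print(flags_by_latest_event(sample_rows))
# [0, 0, 1, 1, 1, 0, 0, 0, -1, -1, -1, -1, 0]
```

The row for 7 January (z-score exactly -2.0, no positive z-score seen yet) is the one that depends on the explicit `None` handling.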

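【Comments】:

Hi Gordon, I'll give this a try. Does this segment replace the existing trade signal flags? I have also included a chart in my original question to make things easier.

Also, what is the time field in `then time end`? Does it need to be declared?

@RashKel . . . That should be date. This replaces all of the logic, assuming you have the four columns specified.

Fantastic, thanks a lot Gordon. I replaced it with date and it works well. Thanks for your help.

【Answer 2】: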

Below is for BigQuery Standard SQL

#standardSQL
SELECT Stock1, Stock2, dt_date, zscore20,
  IFNULL(
    LAST_VALUE(flag IGNORE NULLS) 
    OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  , 0) What_I_want
FROM (
  SELECT *,
    CASE
      WHEN zscore20 > 2.0 THEN -1
      WHEN zscore20 < -2.0 THEN 1
      WHEN zscore20 >= 0 AND prev_zscore20 < 0 THEN 0
      WHEN zscore20 < 0 AND prev_zscore20 >= 0 THEN 0
      ELSE NULL
    END flag
  FROM (
    SELECT Stock1, Stock2, dt_date, zscore20,
      LAG(zscore20) OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date) prev_zscore20
    FROM (
      SELECT *, PARSE_DATE('%d/%m/%Y', dt) dt_date
      FROM `project.dataset.table`
    )
  )
)

You can test this using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'FITB' Stock1, 'MS' Stock2, '4/01/2010' dt, -1.5 zscore20 UNION ALL
  SELECT 'FITB', 'MS', '5/01/2010', -1.9 UNION ALL
  SELECT 'FITB', 'MS', '6/01/2010', -2.3 UNION ALL
  SELECT 'FITB', 'MS', '7/01/2010', -2.0 UNION ALL
  SELECT 'FITB', 'MS', '8/01/2010', -1.0 UNION ALL
  SELECT 'FITB', 'MS', '11/01/2010', 0.5 UNION ALL
  SELECT 'FITB', 'MS', '12/01/2010', 1.5 UNION ALL
  SELECT 'FITB', 'MS', '13/01/2010', 1.8 UNION ALL
  SELECT 'FITB', 'MS', '14/01/2010', 2.1 UNION ALL
  SELECT 'FITB', 'MS', '15/01/2010', 1.5 UNION ALL
  SELECT 'FITB', 'MS', '19/01/2010', 1.3 UNION ALL
  SELECT 'FITB', 'MS', '20/01/2010', 0.4 UNION ALL
  SELECT 'FITB', 'MS', '21/01/2010', -0.1 
)
SELECT Stock1, Stock2, dt_date, zscore20,
  IFNULL(
    LAST_VALUE(flag IGNORE NULLS) 
    OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  , 0) What_I_want
FROM (
  SELECT *,
    CASE
      WHEN zscore20 > 2.0 THEN -1
      WHEN zscore20 < -2.0 THEN 1
      WHEN zscore20 >= 0 AND prev_zscore20 < 0 THEN 0
      WHEN zscore20 < 0 AND prev_zscore20 >= 0 THEN 0
      ELSE NULL
    END flag
  FROM (
    SELECT Stock1, Stock2, dt_date, zscore20,
      LAG(zscore20) OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date) prev_zscore20
    FROM (
      SELECT *, PARSE_DATE('%d/%m/%Y', dt) dt_date
      FROM `project.dataset.table`
    )
  )
)
-- ORDER BY dt_date  

with result

Row Stock1  Stock2  dt_date     zscore20    What_I_want  
1   FITB    MS      2010-01-04  -1.5        0    
2   FITB    MS      2010-01-05  -1.9        0    
3   FITB    MS      2010-01-06  -2.3        1    
4   FITB    MS      2010-01-07  -2.0        1    
5   FITB    MS      2010-01-08  -1.0        1    
6   FITB    MS      2010-01-11  0.5         0    
7   FITB    MS      2010-01-12  1.5         0    
8   FITB    MS      2010-01-13  1.8         0    
9   FITB    MS      2010-01-14  2.1         -1   
10  FITB    MS      2010-01-15  1.5         -1   
11  FITB    MS      2010-01-19  1.3         -1   
12  FITB    MS      2010-01-20  0.4         -1   
13  FITB    MS      2010-01-21  -0.1        0    
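To see why the two steps in this query are enough, here is a small Python sketch of the same pattern: first mark only the rows where something happens (an entry or a zero crossing), then carry the last marked value forward, which is the role `LAST_VALUE(flag IGNORE NULLS)` plays in the SQL. The function names `event_flags` and `forward_fill` are made up for illustration.

```python
from typing import List, Optional


def event_flags(zscores: List[float]) -> List[Optional[int]]:
    """Mirror of the inner CASE: a value on event rows, None everywhere else."""
    events: List[Optional[int]] = []
    prev: Optional[float] = None  # plays the role of LAG(zscore20)
    for z in zscores:
        if z > 2.0:
            events.append(-1)    # short entry
        elif z < -2.0:
            events.append(1)     # long entry
        elif prev is not None and z >= 0 and prev < 0:
            events.append(0)     # crossed zero upwards: exit
        elif prev is not None and z < 0 and prev >= 0:
            events.append(0)     # crossed zero downwards: exit
        else:
            events.append(None)  # nothing happened on this row
        prev = z
    return events


def forward_fill(events: List[Optional[int]], default: int = 0) -> List[int]:
    """Mirror of IFNULL(LAST_VALUE(flag IGNORE NULLS) OVER (...), 0)."""
    filled = []
    last: Optional[int] = None
    for e in events:
        if e is not None:
            last = e
        filled.append(default if last is None else last)
    return filled


sample = [-1.5, -1.9, -2.3, -2.0, -1.0, 0.5, 1.5, 1.8, 2.1, 1.5, 1.3, 0.4, -0.1]
print(forward_fill(event_flags(sample)))
# [0, 0, 1, 1, 1, 0, 0, 0, -1, -1, -1, -1, 0]
```

Keeping the event detection separate from the fill step makes each part easy to test on its own, and the fill step is the only part that needs to look back over the whole partition.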

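【Comments】:

Thanks very much Mikhail. I accepted Gordon's answer because he replied first and it works well, but thank you for your reply too.

@RashKel - with all respect to Gordon, as he is a super expert in SQL - I just re-checked Gordon's answer and it in no way produces the expected output! That is actually why I jumped in and provided a "working" answer.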
