Testing Trade Entries in BigQuery/SQL
[Posted] 2020-02-18 15:12:46
[Question]
I have a dataset in BigQuery containing various stock price pairs along with their relative z-scores (ordered by date, millions of rows):
+-----+--------+--------+------------+-------------+-----------------+--------------+
| Row | stock1 | stock2 | date       | spreadclose | logspreadreturn | zscore20     |
+-----+--------+--------+------------+-------------+-----------------+--------------+
| 1   | AJRD   | MAS    | 27/12/2010 | 0.537230704 | 0.017358199     | -0.251654379 |
| 2   | ABEV   | EFOI   | 27/12/2010 | 41.00585106 | 0.014929275     | 1.950810153  |
| 3   | AIRI   | REFR   | 27/12/2010 | 0.537688889 | 0.003638617     | 0.707555834  |
| 4   | AEO    | RJF    | 27/12/2010 | 0.35009565  | -0.004321474    | 0.265411543  |
| 5   | AFL    | TSU    | 27/12/2010 | 0.771122788 | -0.028202112    | 0.247268645  |
+-----+--------+--------+------------+-------------+-----------------+--------------+
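I then sampled a single stock pair, shown below. I am trying to compute trade entries in BigQuery that match the "What I want" column, but so far I can only produce the numbers in the "(Current wrong)" column.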
Row  Stock1  Stock2  date        zscore20  (Current wrong)  What I want
1    FITB    MS      4/01/2010   -1.5      1                0
2    FITB    MS      5/01/2010   -1.9      1                0
3    FITB    MS      6/01/2010   -2.3      1                1
4    FITB    MS      7/01/2010   -2.0      1                1
5    FITB    MS      8/01/2010   -1.0      1                1
6    FITB    MS      11/01/2010  0.5       -1               0
7    FITB    MS      12/01/2010  1.5       -1               0
8    FITB    MS      13/01/2010  1.8       -1               0
9    FITB    MS      14/01/2010  2.1       -1               -1
10   FITB    MS      15/01/2010  1.5       -1               -1
11   FITB    MS      19/01/2010  1.3       -1               -1
12   FITB    MS      20/01/2010  0.4       -1               -1
13   FITB    MS      21/01/2010  -0.1      1                0
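The logic behind the "What I want" column is as follows:
In the time series, when zscore > 2.0, flag -1; then exit (0) when zscore next crosses zero and turns negative.
In the time series, when zscore < -2.0, flag 1; then exit (0) when zscore next crosses zero and turns positive.
The current query I run on the larger dataset (with the processing year set to 2010):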
DECLARE processyear date;
SET processyear = '2012-01-01';
INSERT INTO `dataset.main.tradetime`(
-- Calculate tradetimeparameters
WITH tradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
LN(spreadclose) - LN(LAG(spreadclose) OVER (PARTITION BY stock1, stock2 ORDER BY date ASC)) as logspreadreturn,
SAFE_DIVIDE((spreadclose - sma20), stdev20) as zscore20
FROM `rw-algotrader-264713.mlbprices.dailyspreads`
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),
secondtradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
CASE WHEN zscore20 > 0 THEN -1 WHEN zscore20 < 0 THEN 1 END as tradesignal20
FROM tradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),
thirdtradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
LEAD(tradesignal20) OVER (PARTITION BY stock1, stock2 ORDER BY date ASC) as tradesignal20
FROM secondtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),
fourthtradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
CASE WHEN tradesignal20 < 0 THEN 0 ELSE tradesignal20 END as positivesignals,
CASE WHEN tradesignal20 > 0 THEN 0 ELSE tradesignal20 END as negativesignals
FROM thirdtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),
fifthtradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
positivesignals,
negativesignals,
(case when positivesignals = 0 then 0
else row_number() over (partition by stock1,stock2, positivesignals, grp order by date ASC)
end) as positivedaysintrade
from (select stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
positivesignals,
negativesignals, countif(positivesignals = 0) over (partition by stock1,stock2 order by date ASC) as grp
from fourthtradetimeparameters
) fourthtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),
sixthtradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
positivesignals,
negativesignals,
positivedaysintrade,
(case when negativesignals = 0 then 0
else row_number() over (partition by stock1,stock2, negativesignals, grp order by date ASC)
end) as negativedaysintrade
from (select stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
positivesignals,
negativesignals,
positivedaysintrade,countif(negativesignals = 0) over (partition by stock1,stock2 order by date ASC) as grp
from fifthtradetimeparameters
) fifthtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
),
seventhtradetimeparameters AS (
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
positivesignals,
negativesignals,
positivedaysintrade,
negativedaysintrade,
(positivedaysintrade + negativedaysintrade) as daysintrade,
(tradesignal20 * logspreadreturn) as sdcreturn20
FROM sixthtradetimeparameters
WHERE date >= DATE_SUB(processyear, INTERVAL 1 WEEK) and date < DATE_ADD(processyear, INTERVAL 1 YEAR)
)
SELECT
stock1,
stock2,
date,
spreadclose,
logspreadreturn,
zscore20,
tradesignal20,
positivesignals,
negativesignals,
positivedaysintrade,
negativedaysintrade,
daysintrade,
sdcreturn20
FROM seventhtradetimeparameters)
The second, shorter table above comes from:
SELECT * FROM `dataset.tradetime` WHERE stock1 = 'FITB' and stock2 = 'MS' and date >= '2010-01-01'
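I would be grateful if anyone could point me to BigQuery/SQL code that achieves the above. Many thanks in advance.
[Comments]
A query that generates your current results would really help. The relationship between your table and the expected results is unclear.
Hi Gordon, does the code above help? Still learning SQL, thanks for your help!
[Answer 1]
Assume you start with a table that has the first four columns.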
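Then you can use window functions. If I understand the conditions, they boil down to:
flag = -1 when zscore20 > 2.0
flag = -1 when zscore20 > 0 and the most recent zscore20 > 2.0 is more recent than the most recent zscore20 < 0
Then similar logic applies for flag = 1.
If I understand correctly, you can use window functions and a case expression: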
select t.*,
       (case when zscore20 > 2.0              -- entry above the upper threshold
             then -1
             when zscore20 > 0 and            -- still above zero since the last value above 2.0
                  (max(case when zscore20 > 2.0 then date end) over (partition by stock1, stock2 order by date) >
                   max(case when zscore20 < 0 then date end) over (partition by stock1, stock2 order by date)
                  )
             then -1
             when zscore20 < -2.0             -- entry below the lower threshold
             then 1
             when zscore20 < 0 and            -- still below zero since the last value below -2.0
                  (max(case when zscore20 < -2.0 then date end) over (partition by stock1, stock2 order by date) >
                   max(case when zscore20 > 0 then date end) over (partition by stock1, stock2 order by date)
                  )
             then 1
             else 0
        end) as flag
from t;
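One thing worth checking with this approach: until a pair has seen a value on the other side of zero, the second conditional max() is NULL, and a comparison against NULL is never true, so the carry-forward branches can be skipped early in a series. Below is a minimal sketch of one way around that. It assumes a DATE-typed date column and the flag convention from the question (-1 after a z-score above 2.0, 1 after a z-score below -2.0). The inline sample CTE, the marks CTE, and the 0001-01-01 sentinel are illustrative choices, not part of the original answer.

```sql
#standardSQL
-- Sketch only: `sample` is a hypothetical inline stand-in for the real table.
WITH sample AS (
  SELECT 'FITB' AS stock1, 'MS' AS stock2, DATE '2010-01-04' AS date, -1.5 AS zscore20 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-05', -1.9 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-06', -2.3 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-07', -2.0 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-08', -1.0 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-11', 0.5 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-12', 1.5 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-13', 1.8 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-14', 2.1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-15', 1.5 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-19', 1.3 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-20', 0.4 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-21', -0.1
),
marks AS (
  -- Latest date (up to the current row) on which each condition held;
  -- the sentinel replaces NULL so the comparisons below are always defined.
  SELECT *,
    COALESCE(MAX(IF(zscore20 > 2.0, date, NULL)) OVER w, DATE '0001-01-01') AS last_high,
    COALESCE(MAX(IF(zscore20 < -2.0, date, NULL)) OVER w, DATE '0001-01-01') AS last_low,
    COALESCE(MAX(IF(zscore20 > 0, date, NULL)) OVER w, DATE '0001-01-01') AS last_pos,
    COALESCE(MAX(IF(zscore20 < 0, date, NULL)) OVER w, DATE '0001-01-01') AS last_neg
  FROM sample
  WINDOW w AS (PARTITION BY stock1, stock2 ORDER BY date)
)
SELECT stock1, stock2, date, zscore20,
  CASE
    WHEN zscore20 > 2.0 THEN -1
    WHEN zscore20 > 0 AND last_high > last_neg THEN -1  -- no negative value since the last > 2.0
    WHEN zscore20 < -2.0 THEN 1
    WHEN zscore20 < 0 AND last_low > last_pos THEN 1    -- no positive value since the last < -2.0
    ELSE 0
  END AS flag
FROM marks
ORDER BY stock1, stock2, date
```

On the thirteen sample rows from the question this should reproduce the "What I want" column. A z-score of exactly 0 yields 0 here without being recorded as a crossing, so adjust the comparisons if that case matters in your data.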
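[Comments]
Hi Gordon, I will give this a try. Does this snippet replace the existing trade signal flag? I have also added a chart to my original question to make things easier. Also, what is the time field in "then time end"? Does it need to be declared?
@RashKel . . . That should be date. Assuming you have the four columns specified, this replaces all of the logic.
Great, thanks a lot Gordon. I replaced it with date and it works great. Thanks for your help.
[Answer 2]
Below is for BigQuery Standard SQL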
#standardSQL
SELECT Stock1, Stock2, dt_date, zscore20,
  IFNULL(
    LAST_VALUE(flag IGNORE NULLS)
      OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  , 0) What_I_want
FROM (
  SELECT *,
    CASE
      WHEN zscore20 > 2.0 THEN -1
      WHEN zscore20 < -2.0 THEN 1
      WHEN zscore20 >= 0 AND prev_zscore20 < 0 THEN 0
      WHEN zscore20 < 0 AND prev_zscore20 >= 0 THEN 0
      ELSE NULL
    END flag
  FROM (
    SELECT Stock1, Stock2, dt_date, zscore20,
      LAG(zscore20) OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date) prev_zscore20
    FROM (
      SELECT *, PARSE_DATE('%d/%m/%Y', dt) dt_date
      FROM `project.dataset.table`
    )
  )
)
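The query relies on a carry-forward step: the CASE emits a value only on rows where something happens (an entry or a zero crossing) and NULL elsewhere, and LAST_VALUE(... IGNORE NULLS) then fills each NULL with the latest earlier event. A toy sketch of just that step, where the events CTE and its pos and flag columns are made up for illustration:

```sql
#standardSQL
-- Toy input: `events`, `pos` and `flag` are hypothetical names, not from the question.
WITH events AS (
  SELECT 1 AS pos, CAST(NULL AS INT64) AS flag UNION ALL  -- nothing has happened yet
  SELECT 2, 1 UNION ALL                                   -- entry
  SELECT 3, NULL UNION ALL                                -- no event: keep previous state
  SELECT 4, 0 UNION ALL                                   -- exit
  SELECT 5, NULL UNION ALL
  SELECT 6, -1 UNION ALL                                  -- entry on the other side
  SELECT 7, NULL
)
SELECT pos, flag,
  -- Most recent non-NULL flag up to and including the current row.
  LAST_VALUE(flag IGNORE NULLS)
    OVER (ORDER BY pos ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS carried,
  -- Rows before the first event have nothing to carry, so default them to 0.
  IFNULL(
    LAST_VALUE(flag IGNORE NULLS)
      OVER (ORDER BY pos ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  , 0) AS filled
FROM events
ORDER BY pos
```

Here carried comes out as NULL, 1, 1, 0, 0, -1, -1 and filled as 0, 1, 1, 0, 0, -1, -1, which is the same mechanism that turns the sparse flag column into What_I_want.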
You can test the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'FITB' Stock1, 'MS' Stock2, '4/01/2010' dt, -1.5 zscore20 UNION ALL
  SELECT 'FITB', 'MS', '5/01/2010', -1.9 UNION ALL
  SELECT 'FITB', 'MS', '6/01/2010', -2.3 UNION ALL
  SELECT 'FITB', 'MS', '7/01/2010', -2.0 UNION ALL
  SELECT 'FITB', 'MS', '8/01/2010', -1.0 UNION ALL
  SELECT 'FITB', 'MS', '11/01/2010', 0.5 UNION ALL
  SELECT 'FITB', 'MS', '12/01/2010', 1.5 UNION ALL
  SELECT 'FITB', 'MS', '13/01/2010', 1.8 UNION ALL
  SELECT 'FITB', 'MS', '14/01/2010', 2.1 UNION ALL
  SELECT 'FITB', 'MS', '15/01/2010', 1.5 UNION ALL
  SELECT 'FITB', 'MS', '19/01/2010', 1.3 UNION ALL
  SELECT 'FITB', 'MS', '20/01/2010', 0.4 UNION ALL
  SELECT 'FITB', 'MS', '21/01/2010', -0.1
)
SELECT Stock1, Stock2, dt_date, zscore20,
  IFNULL(
    LAST_VALUE(flag IGNORE NULLS)
      OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  , 0) What_I_want
FROM (
  SELECT *,
    CASE
      WHEN zscore20 > 2.0 THEN -1
      WHEN zscore20 < -2.0 THEN 1
      WHEN zscore20 >= 0 AND prev_zscore20 < 0 THEN 0
      WHEN zscore20 < 0 AND prev_zscore20 >= 0 THEN 0
      ELSE NULL
    END flag
  FROM (
    SELECT Stock1, Stock2, dt_date, zscore20,
      LAG(zscore20) OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date) prev_zscore20
    FROM (
      SELECT *, PARSE_DATE('%d/%m/%Y', dt) dt_date
      FROM `project.dataset.table`
    )
  )
)
-- ORDER BY dt_date
Result
Row Stock1 Stock2 dt_date zscore20 What_I_want
1 FITB MS 2010-01-04 -1.5 0
2 FITB MS 2010-01-05 -1.9 0
3 FITB MS 2010-01-06 -2.3 1
4 FITB MS 2010-01-07 -2.0 1
5 FITB MS 2010-01-08 -1.0 1
6 FITB MS 2010-01-11 0.5 0
7 FITB MS 2010-01-12 1.5 0
8 FITB MS 2010-01-13 1.8 0
9 FITB MS 2010-01-14 2.1 -1
10 FITB MS 2010-01-15 1.5 -1
11 FITB MS 2010-01-19 1.3 -1
12 FITB MS 2010-01-20 0.4 -1
13 FITB MS 2010-01-21 -0.1 0
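Rather than comparing the two columns by eye, the same logic can be turned into a small regression check that returns only the rows where the computed flag differs from the expected one. This is a sketch under assumptions: the expected CTE is a hypothetical inline copy of the question's sample with its "What I want" values in a want column, dates are written as DATE literals instead of being parsed, and got is an illustrative name for the computed flag.

```sql
#standardSQL
-- Sketch: `expected`, `want` and `got` are illustrative names, not from the original answer.
WITH expected AS (
  SELECT 'FITB' Stock1, 'MS' Stock2, DATE '2010-01-04' dt_date, -1.5 zscore20, 0 want UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-05', -1.9, 0 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-06', -2.3, 1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-07', -2.0, 1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-08', -1.0, 1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-11', 0.5, 0 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-12', 1.5, 0 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-13', 1.8, 0 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-14', 2.1, -1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-15', 1.5, -1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-19', 1.3, -1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-20', 0.4, -1 UNION ALL
  SELECT 'FITB', 'MS', DATE '2010-01-21', -0.1, 0
),
flagged AS (
  -- Event rows only: entries at the thresholds, exits at zero crossings, NULL otherwise.
  SELECT *,
    CASE
      WHEN zscore20 > 2.0 THEN -1
      WHEN zscore20 < -2.0 THEN 1
      WHEN zscore20 >= 0 AND prev_zscore20 < 0 THEN 0
      WHEN zscore20 < 0 AND prev_zscore20 >= 0 THEN 0
    END flag
  FROM (
    SELECT *,
      LAG(zscore20) OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date) prev_zscore20
    FROM expected
  )
),
actual AS (
  -- Carry the latest event forward; rows before any event default to 0.
  SELECT *,
    IFNULL(
      LAST_VALUE(flag IGNORE NULLS)
        OVER(PARTITION BY Stock1, Stock2 ORDER BY dt_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    , 0) got
  FROM flagged
)
-- Any row returned here is a mismatch between computed and expected flags.
SELECT Stock1, Stock2, dt_date, zscore20, want, got
FROM actual
WHERE got != want
ORDER BY Stock1, Stock2, dt_date
```

For the sample above the check returns no rows, consistent with the result table. Adding more pairs and edge cases to the expected CTE (a series that starts above 2.0, a z-score of exactly 0) is a cheap way to probe the logic before running it over millions of rows.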
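[Comments]
Thanks a lot Mikhail. I accepted Gordon's answer because he replied first and it works well, but thank you for your reply too.
@RashKel - with all due respect to Gordon, as he is a super expert in SQL - I just checked Gordon's answer again and it by no means produces the expected output! That is actually why I jumped in and provided a "working" answer.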