Getting data for the past 20 Wednesdays: AWS Redshift

【Title】: Getting data for the past 20 Wednesdays: AWS Redshift 【Posted】: 2019-04-22 13:53:20 【Question】:

I need to write this query for AWS Redshift so that it returns data for the 20 most recent Wednesdays. Help!

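SELECT
    COUNT(user_leads.id) AS lead_count,
    CAST(user_leads.created AS DATE) AS lead_date
FROM
    user_leads
    JOIN courses ON user_leads.course_id = courses.id
    LEFT JOIN users ON user_leads.user_id = users.id
WHERE
    user_leads.created >= '2020-01-31'
        AND user_leads.created < '2020-03-03'
        AND courses.course_type != 4
        -- String literals take single quotes; double quotes denote identifiers
        AND users.email NOT LIKE '%edureka%'
        AND users.first_name NOT LIKE '%test%'
        -- Redshift has no WEEKDAY(); DATE_PART(dow, ...) counts from Sunday = 0, so Wednesday = 3
        AND DATE_PART(dow, user_leads.created) = 3
GROUP BY CAST(user_leads.created AS DATE)
ORDER BY lead_date DESC;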
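
The date range in the query is hard-coded, so it does not track "the last 20 Wednesdays" as time passes. One option is to compute those dates on the client and pass the oldest one in as the lower bound. The following is a minimal Python sketch under that assumption; the function name `last_n_wednesdays` is hypothetical and does not come from the question.

```python
from datetime import date, timedelta

def last_n_wednesdays(today, n=20):
    """Return the n most recent Wednesdays on or before `today`, newest first."""
    # date.weekday() counts from Monday = 0, so Wednesday is 2
    offset = (today.weekday() - 2) % 7
    latest = today - timedelta(days=offset)
    return [latest - timedelta(weeks=i) for i in range(n)]

# Example: the day the question was posted
wednesdays = last_n_wednesdays(date(2019, 4, 22))
print(wednesdays[0], wednesdays[-1])
```

The last element of the list could serve as the lower bound of the `created` filter, with the day-of-week condition left in place to drop the other days.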

【Comments】:

Convert it with str.replace('][', '],[')?

【Answer 1】:

Use str.replace()

someFile.json:

[
    "Date",
    "17/04/2019",
    "Skill",
    "Travis",
    "Repository",
    "27,699 repository results"
][
    "Date",
    "17/04/2019",
    "Skill",
    "Kotlin",
    "Repository",
    "55,752 repository results"
]

So:

with open('someFile.json', 'r') as fp:
    # Read all lines, dropping blank ones and surrounding whitespace
    content = [l.strip() for l in fp if l.strip()]

# Put a comma between adjacent arrays: '][' becomes '],['
for line in content:
    print(line.replace('][', '],['))

Output:

[
"Date",
"17/04/2019",
"Skill",
"Travis",
"Repository",
"27,699 repository results"
],[
"Date",
"17/04/2019",
"Skill",
"Kotlin",
"Repository",
"55,752 repository results"
]
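
Note that the text printed above is a comma-separated run of arrays, which is still not a single JSON document. The following is a minimal sketch, assuming the goal is to load the data with `json.loads`: wrap the result in one outer pair of brackets. The inline `raw` string is a shortened stand-in for the file's contents.

```python
import json

# Shortened stand-in for the contents of someFile.json (two arrays back to back)
raw = '''[
    "Date", "17/04/2019", "Skill", "Travis"
][
    "Date", "17/04/2019", "Skill", "Kotlin"
]'''

# Separate the arrays with a comma, then wrap them in one outer array
fixed = '[' + raw.replace('][', '],[') + ']'
rows = json.loads(fixed)

print(len(rows))
print(rows[1][3])
```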

Edit

A file that looks like JSON should be:

someFile.json:

[
    {
        "date": "Date",
        "dt": "17/04/2019",
        "skill":  "Skill",
        "travel": "Travis",
        "repo": "Repository",
        "dat": "27,699 repository results"
    }
][
    {
        "date": "Date",
        "dt": "17/04/2019",
        "skill":  "Skill",
        "travel": "Kotlin",
        "repo": "Repository",
        "dat": "2327,699 repository results"
    }
]

So:

import json

with open('someFile.json', 'r') as file:
    content = file.read()
    clean = content.replace('][', ',')  # cleanup here
    json_data = json.loads(clean)

print(json_data)

Output:

[
  {'date': 'Date', 'dt': '17/04/2019', 'skill': 'Skill', 'travel': 'Travis', 'repo': 'Repository', 'dat': '27,699 repository results'}, 
  {'date': 'Date', 'dt': '17/04/2019', 'skill': 'Skill', 'travel': 'Kotlin', 'repo': 'Repository', 'dat': '2327,699 repository results'}
]
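
Replacing `][` as plain text would also alter any string value that happens to contain those two characters. An alternative that avoids string replacement is to decode the values one after another with the standard library's `json.JSONDecoder.raw_decode`. This is a sketch rather than part of the original answer; the helper name `parse_concatenated` and the sample string are hypothetical.

```python
import json

def parse_concatenated(text):
    """Parse JSON values written back to back with no separator."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the decoded value and the index where it ended
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values

sample = '[{"skill": "Travis"}]\n[{"skill": "Kotlin"}]'
parsed = parse_concatenated(sample)
print(parsed)
```

Unlike the replacement approach, this keeps each original array as its own element instead of merging them into one.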

【Comments】:

@ashishmishra That is not valid JSON to begin with.
