[3] ClickHouse, the Rising Star of Columnar Storage: How to Use ClickHouse to Find the Most-Starred Projects on GitHub

Posted on 朱清云的技术博客 (Zhu Qingyun's tech blog)


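The projects with the most stars on GitHub are presumably the most popular ones, but how are such rankings computed? ClickHouse provides a demo project (https://ghe.clickhouse.tech/) built on GitHub event data that you can query directly.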
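Ranking by stars comes down to counting star events per repository. On GitHub, starring a repository is reported as a `WatchEvent`. Suppose the `github_events` table has an `event_type` column alongside `repo_name`. `repo_name` appears in the query further down; `event_type` is an assumption about the demo's schema. Under that assumption, the ranking is a GROUP BY / ORDER BY / LIMIT query like the one quoted in the comment below. It is not necessarily the demo's verbatim query. The Python sketch mirrors the same logic on in-memory rows, and the function name `top_starred` is hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Plain-Python equivalent of a ClickHouse query along the lines of
# (column names assumed, see the note above):
#   SELECT repo_name, count() AS stars
#   FROM github_events
#   WHERE event_type = 'WatchEvent'
#   GROUP BY repo_name
#   ORDER BY stars DESC
#   LIMIT 50


def top_starred(rows: Iterable[Tuple[str, str]], limit: int = 50) -> List[Tuple[str, int]]:
    """Return the `limit` repositories with the most WatchEvents (stars).

    Each row is an (event_type, repo_name) pair.
    """
    # WHERE event_type = 'WatchEvent' ... GROUP BY repo_name, count()
    stars = Counter(repo for event_type, repo in rows if event_type == "WatchEvent")
    # ORDER BY stars DESC LIMIT n
    return stars.most_common(limit)


rows = [
    ("WatchEvent", "vuejs/vue"),
    ("WatchEvent", "facebook/react"),
    ("PushEvent", "facebook/react"),  # not a star, ignored
    ("WatchEvent", "vuejs/vue"),
]
print(top_starred(rows, limit=2))  # [('vuejs/vue', 2), ('facebook/react', 1)]
```

`Counter.most_common(n)` already returns pairs sorted by count in descending order, so no separate sort step is needed.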

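After running the query, the status line in the top-right corner shows: Elapsed: 1.099 sec, read 283.59 million rows, 2.34 GB. In other words, it took only 1.099 seconds to read about 283.6 million GitHub event records, 2.34 GB of data in total. That works out to roughly 258 million rows, or about 2.13 GB, per second. That is impressive query and analysis performance!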
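The per-second figures come from dividing the reported totals by the elapsed time. Here is a quick check in Python. It takes GB as 10^9 bytes, although the ratio is the same in whatever unit the client uses:

```python
from typing import Tuple


def throughput(rows: float, bytes_read: float, seconds: float) -> Tuple[float, float]:
    """Return (rows per second, bytes per second) for a query."""
    return rows / seconds, bytes_read / seconds


# Figures from the ClickHouse status line: 283.59 million rows, 2.34 GB, 1.099 s.
rows_per_s, bytes_per_s = throughput(283.59e6, 2.34e9, 1.099)
print(f"{rows_per_s / 1e6:.2f} million rows/s, {bytes_per_s / 1e9:.2f} GB/s")
# prints: 258.04 million rows/s, 2.13 GB/s
```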

So where does this test data come from? The website GH Archive (https://www.gharchive.org/) specializes in exactly this kind of data and publishes it as gzip-compressed JSON files.
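Each GH Archive file covers one hour of activity and holds one JSON-encoded GitHub event per line. The sketch below flattens such a file into (event_type, repo_name) pairs, two of the kinds of values a table like `github_events` stores. It uses hypothetical helper names, `iter_rows` and `make_sample`. It reads only each event's `type` and `repo.name` fields. Instead of downloading a real archive, it builds a tiny in-memory sample in the same layout:

```python
import gzip
import io
import json
from typing import IO, Iterator, List, Tuple


def iter_rows(raw: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, repo_name) pairs from a gzip-compressed JSON Lines stream.

    Each line is one GitHub event with a "type" field (e.g. "WatchEvent")
    and a "repo" object holding the repository "name".
    """
    with gzip.open(raw, mode="rt", encoding="utf-8") as text:
        for line in text:
            if line.strip():  # ignore blank lines
                event = json.loads(line)
                yield event["type"], event["repo"]["name"]


def make_sample(events: List[dict]) -> io.BytesIO:
    """Build an in-memory .json.gz stream with one JSON event per line."""
    buf = io.BytesIO()
    with gzip.open(buf, mode="wt", encoding="utf-8") as gz:
        for event in events:
            gz.write(json.dumps(event) + "\n")
    buf.seek(0)  # rewind so the stream can be read back
    return buf


sample = make_sample([
    {"type": "WatchEvent", "repo": {"name": "vuejs/vue"}},
    {"type": "PushEvent", "repo": {"name": "torvalds/linux"}},
])
print(list(iter_rows(sample)))
# [('WatchEvent', 'vuejs/vue'), ('PushEvent', 'torvalds/linux')]
```

With a downloaded hourly file such as `2015-01-01-15.json.gz`, open it in binary mode and pass the file object instead: `with open("2015-01-01-15.json.gz", "rb") as f: rows = list(iter_rows(f))`. Real events carry many more fields, which this sketch simply ignores.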

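Here is another query, this time counting how many distinct projects (repositories) the dataset contains. Note that uniq() returns an approximate count; uniqExact() gives an exact count but uses more memory: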

SELECT uniq(repo_name) FROM github_events
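Why is uniq() approximate? ClickHouse's documentation describes it as hashing each value and applying an adaptive sampling algorithm to a bounded sample of those hashes. The sketch below is a simplified, hypothetical illustration of that idea and not ClickHouse's implementation; the class name, hash function and sample size are my own choices. It keeps only hashes whose lowest `depth` bits are zero. Whenever the sample grows too large, it raises `depth`, which halves the sampling rate. The estimate is then the sample size times 2^depth:

```python
import hashlib
from typing import Set


def hash64(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class AdaptiveSampleCounter:
    """Approximate distinct count via adaptive sampling of value hashes."""

    def __init__(self, max_size: int = 1024) -> None:
        self.max_size = max_size
        self.depth = 0  # keep hashes whose lowest `depth` bits are all zero
        self.sample: Set[int] = set()

    def _keep(self, h: int) -> bool:
        return (h & ((1 << self.depth) - 1)) == 0

    def add(self, value: str) -> None:
        h = hash64(value)
        if not self._keep(h):
            return
        self.sample.add(h)
        # Sample too big: halve the sampling rate and drop hashes that no longer qualify.
        while len(self.sample) > self.max_size:
            self.depth += 1
            self.sample = {x for x in self.sample if self._keep(x)}

    def estimate(self) -> int:
        # Each kept hash stands for about 2**depth distinct values.
        return len(self.sample) << self.depth


counter = AdaptiveSampleCounter(max_size=1024)
exact = set()
for i in range(100_000):
    name = f"owner{i % 50_000}/repo{i % 50_000}"  # every name appears twice
    counter.add(name)
    exact.add(name)
print(len(exact), counter.estimate())  # exact count vs. estimate
```

Up to `max_size` distinct values the count is exact (barring 64-bit hash collisions). Beyond that, memory stays bounded while the estimate's relative error is on the order of 1/sqrt(sample size), a few percent here.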


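Finally, here is the result we were after: the most-starred repositories. Some projects appear more than once, for example FreeCodeCamp/freeCodeCamp and vhf/EbookFoundation free-programming-books. This happens because stars are grouped by repo_name as recorded in each event, so a rename or ownership transfer splits a project's stars across names: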

| GitHub URL | Stars |
|---|---:|
| https://github.com/996icu/996.ICU | 353981 |
| https://github.com/FreeCodeCamp/FreeCodeCamp | 225340 |
| https://github.com/vuejs/vue | 218489 |
| https://github.com/facebook/react | 212230 |
| https://github.com/sindresorhus/awesome | 195499 |
| https://github.com/kamranahmedse/developer-roadmap | 194566 |
| https://github.com/tensorflow/tensorflow | 187179 |
| https://github.com/jwasham/coding-interview-university | 183350 |
| https://github.com/freeCodeCamp/freeCodeCamp | 164742 |
| https://github.com/getify/You-Dont-Know-JS | 162725 |
| https://github.com/donnemartin/system-design-primer | 160653 |
| https://github.com/ant-design/ant-design | 151675 |
| https://github.com/flutter/flutter | 147847 |
| https://github.com/torvalds/linux | 144876 |
| https://github.com/EbookFoundation/free-programming-books | 143801 |
| https://github.com/twbs/bootstrap | 138052 |
| https://github.com/github/gitignore | 136974 |
| https://github.com/TheAlgorithms/Python | 136093 |
| https://github.com/trekhleb/javascript-algorithms | 133330 |
| https://github.com/airbnb/javascript | 132359 |
| https://github.com/danistefanovic/build-your-own-x | 127875 |
| https://github.com/jackfrued/Python-100-Days | 127581 |
| https://github.com/CyC2018/CS-Notes | 123552 |
| https://github.com/Snailclimb/JavaGuide | 122087 |
| https://github.com/public-apis/public-apis | 120438 |
| https://github.com/vinta/awesome-python | 120057 |
| https://github.com/facebook/react-native | 115198 |
| https://github.com/golang/go | 108127 |
| https://github.com/jlevy/the-art-of-command-line | 107810 |
| https://github.com/robbyrussell/oh-my-zsh | 107137 |
| https://github.com/labuladong/fucking-algorithm | 105729 |
| https://github.com/vhf/free-programming-books | 94114 |
| https://github.com/justjavac/free-programming-books-zh_CN | 92908 |
| https://github.com/angular/angular | 92617 |
| https://github.com/nodejs/node | 85995 |
| https://github.com/electron/electron | 85728 |
| https://github.com/ossu/computer-science | 85694 |
| https://github.com/mrdoob/three.js | 85477 |
| https://github.com/Microsoft/vscode | 81998 |
| https://github.com/tensorflow/models | 81221 |
| https://github.com/kubernetes/kubernetes | 81128 |
| https://github.com/laravel/laravel | 80327 |
| https://github.com/PanJiaChen/vue-element-admin | 79011 |
| https://github.com/FortAwesome/Font-Awesome | 78969 |
| https://github.com/iluwatar/java-design-patterns | 78325 |
| https://github.com/avelino/awesome-go | 78127 |
| https://github.com/angular/angular.js | 77047 |
| https://github.com/django/django | 75288 |
| https://github.com/daneden/animate.css | 74046 |
| https://github.com/resume/resume.github.com | 73923 |

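My personal recommendation is https://github.com/sindresorhus/awesome, which collects a wealth of material on big data, artificial intelligence, Java, Python and more. Its content is remarkably rich!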

References

https://clickhouse.com/learn/
https://gitstar-ranking.com/ClickHouse
https://ghe.clickhouse.tech/
