[3] ClickHouse, the Rising Star of Columnar Storage: How to Use ClickHouse to Find the Most-Starred Repositories on GitHub
Posted by 朱清云的技术博客 (Zhu Qingyun's Tech Blog)
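The projects with the most stars on GitHub are presumably the most popular ones. So how are these rankings computed? ClickHouse provides a demo project that answers exactly this question.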
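After running a query in the demo, the top-right corner of the page shows "Elapsed: 1.099 sec, read 283.59 million rows, 2.34 GB." In other words, ClickHouse needed only 1.099 seconds to read about 284 million rows of GitHub event data, 2.34 GB in total, which works out to roughly 2.1 GB of data processed per second. That is impressive query and analysis performance!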
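As a quick sanity check, the throughput can be derived from the three numbers in that status line. The sketch below uses only the quoted figures; the variable names are illustrative and not part of any ClickHouse API.

```python
# Figures copied from the status line:
# "Elapsed: 1.099 sec, read 283.59 million rows, 2.34 GB."
elapsed_sec = 1.099
rows_read = 283.59e6
gb_read = 2.34

# Throughput is simply the amount read divided by the elapsed time.
rows_per_sec = rows_read / elapsed_sec
gb_per_sec = gb_read / elapsed_sec

print(f"{rows_per_sec / 1e6:.1f} million rows/s")
print(f"{gb_per_sec:.2f} GB/s")
```

This gives roughly 258 million rows and 2.13 GB per second, so 2.34 GB is the total volume read rather than the per-second rate.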
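So where does this test data come from? A site called GH Archive (https://www.gharchive.org/) specializes in this kind of data and publishes it as gzip-compressed archives of JSON files.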
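To make the data format more concrete, here is a minimal sketch of reading such an archive with the Python standard library. It assumes each archive holds one JSON event per line and that events carry `type` and `repo.name` fields, as in the GitHub events format. Instead of downloading a real file, it builds a tiny sample archive in memory; the sample events and the `iter_events` helper are hypothetical.

```python
import gzip
import io
import json
from typing import Iterator


def iter_events(raw: bytes) -> Iterator[dict]:
    """Yield one event dict per non-empty line of a gzip-compressed JSON-lines blob."""
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical sample standing in for a downloaded archive file.
sample_events = [
    {"type": "WatchEvent", "repo": {"name": "vuejs/vue"}},
    {"type": "PushEvent", "repo": {"name": "vuejs/vue"}},
    {"type": "WatchEvent", "repo": {"name": "facebook/react"}},
]
sample_archive = gzip.compress(
    "\n".join(json.dumps(event) for event in sample_events).encode("utf-8")
)

events = list(iter_events(sample_archive))
event_types = [event["type"] for event in events]
print(event_types)
```

Reading line by line keeps memory use flat, which matters because real archives are far larger than this sample.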
Here is another query, which shows how many distinct repositories appear in the GitHub data in total:
SELECT uniq(repo_name) FROM github_events
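In ClickHouse, `uniq` returns an approximate count of distinct values (`uniqExact` is the exact variant), which is usually good enough for a figure of this size. As a rough in-memory model of what the query computes, the sketch below counts distinct repository names exactly with a set. The `rows` list and the `count_distinct_repos` helper are hypothetical stand-ins for the `github_events` table, and the `event_type` column name is an assumption.

```python
from typing import Iterable


def count_distinct_repos(rows: Iterable[dict]) -> int:
    """Exact distinct count of repo_name, modeling SELECT uniq(repo_name)."""
    return len({row["repo_name"] for row in rows})


# Hypothetical rows; several events can belong to the same repository.
rows = [
    {"repo_name": "vuejs/vue", "event_type": "WatchEvent"},
    {"repo_name": "vuejs/vue", "event_type": "ForkEvent"},
    {"repo_name": "facebook/react", "event_type": "WatchEvent"},
    {"repo_name": "golang/go", "event_type": "IssuesEvent"},
]

distinct_repos = count_distinct_repos(rows)
print(distinct_repos)
```

The set-based version holds every distinct name in memory, which is one reason an approximate aggregate is attractive at the scale of hundreds of millions of rows.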
What we are ultimately after is the ranking itself. The query result is as follows:
GitHub URL | Stars |
---|---|
https://github.com/996icu/996.ICU | 353981 |
https://github.com/FreeCodeCamp/FreeCodeCamp | 225340 |
https://github.com/vuejs/vue | 218489 |
https://github.com/facebook/react | 212230 |
https://github.com/sindresorhus/awesome | 195499 |
https://github.com/kamranahmedse/developer-roadmap | 194566 |
https://github.com/tensorflow/tensorflow | 187179 |
https://github.com/jwasham/coding-interview-university | 183350 |
https://github.com/freeCodeCamp/freeCodeCamp | 164742 |
https://github.com/getify/You-Dont-Know-JS | 162725 |
https://github.com/donnemartin/system-design-primer | 160653 |
https://github.com/ant-design/ant-design | 151675 |
https://github.com/flutter/flutter | 147847 |
https://github.com/torvalds/linux | 144876 |
https://github.com/EbookFoundation/free-programming-books | 143801 |
https://github.com/twbs/bootstrap | 138052 |
https://github.com/github/gitignore | 136974 |
https://github.com/TheAlgorithms/Python | 136093 |
https://github.com/trekhleb/javascript-algorithms | 133330 |
https://github.com/airbnb/javascript | 132359 |
https://github.com/danistefanovic/build-your-own-x | 127875 |
https://github.com/jackfrued/Python-100-Days | 127581 |
https://github.com/CyC2018/CS-Notes | 123552 |
https://github.com/Snailclimb/JavaGuide | 122087 |
https://github.com/public-apis/public-apis | 120438 |
https://github.com/vinta/awesome-python | 120057 |
https://github.com/facebook/react-native | 115198 |
https://github.com/golang/go | 108127 |
https://github.com/jlevy/the-art-of-command-line | 107810 |
https://github.com/robbyrussell/oh-my-zsh | 107137 |
https://github.com/labuladong/fucking-algorithm | 105729 |
https://github.com/vhf/free-programming-books | 94114 |
https://github.com/justjavac/free-programming-books-zh_CN | 92908 |
https://github.com/angular/angular | 92617 |
https://github.com/nodejs/node | 85995 |
https://github.com/electron/electron | 85728 |
https://github.com/ossu/computer-science | 85694 |
https://github.com/mrdoob/three.js | 85477 |
https://github.com/Microsoft/vscode | 81998 |
https://github.com/tensorflow/models | 81221 |
https://github.com/kubernetes/kubernetes | 81128 |
https://github.com/laravel/laravel | 80327 |
https://github.com/PanJiaChen/vue-element-admin | 79011 |
https://github.com/FortAwesome/Font-Awesome | 78969 |
https://github.com/iluwatar/java-design-patterns | 78325 |
https://github.com/avelino/awesome-go | 78127 |
https://github.com/angular/angular.js | 77047 |
https://github.com/django/django | 75288 |
https://github.com/daneden/animate.css | 74046 |
https://github.com/resume/resume.github.com | 73923 |
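The post does not show the query that produced this table. One plausible reading, assumed here, is that each star is recorded as a `WatchEvent` and the ranking counts those events per repository. The sketch below models that with hypothetical rows and helpers (`top_starred`, `merge_case_variants`). The second helper illustrates why `FreeCodeCamp/FreeCodeCamp` and `freeCodeCamp/freeCodeCamp` show up as separate rows: grouping by name is case-sensitive. Treating the two spellings as one repository is likewise an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_starred(rows: Iterable[dict], limit: int) -> List[Tuple[str, int]]:
    """Count WatchEvent rows per repo_name and return the `limit` largest."""
    stars = Counter(
        row["repo_name"] for row in rows if row["event_type"] == "WatchEvent"
    )
    return stars.most_common(limit)


def merge_case_variants(counts: Dict[str, int]) -> Counter:
    """Sum counts whose repository names differ only by letter case."""
    merged: Counter = Counter()
    for name, count in counts.items():
        merged[name.lower()] += count
    return merged


# Hypothetical events: only WatchEvent rows are counted as stars.
rows = (
    [{"repo_name": "vuejs/vue", "event_type": "WatchEvent"}] * 3
    + [{"repo_name": "facebook/react", "event_type": "WatchEvent"}] * 2
    + [{"repo_name": "golang/go", "event_type": "WatchEvent"}]
    + [{"repo_name": "golang/go", "event_type": "ForkEvent"}] * 5
)
ranking = top_starred(rows, limit=2)
print(ranking)

# Three rows taken from the table above.
table_counts = {
    "996icu/996.ICU": 353981,
    "FreeCodeCamp/FreeCodeCamp": 225340,
    "freeCodeCamp/freeCodeCamp": 164742,
}
merged_top = merge_case_variants(table_counts).most_common(1)
print(merged_top)
```

Under that merging assumption the two freeCodeCamp rows add up to 390082, which would put the project ahead of 996icu/996.ICU at 353981.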
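My personal recommendation is https://github.com/sindresorhus/awesome, which collects a great many resources on big data, artificial intelligence, Java, Python, and more. Its content is very rich!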
References
https://clickhouse.com/learn/
https://gitstar-ranking.com/ClickHouse
https://ghe.clickhouse.tech/