clickhouse安装数据导入及查询测试
Posted perfei
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了clickhouse安装数据导入及查询测试相关的知识,希望对你有一定的参考价值。
官网
quick start
进入下面目录,手工下载软件然后安装
https://repo.clickhouse.tech/rpm/lts/x86_64/
或者直接下载这四个文件
wget https://repo.clickhouse.tech/rpm/lts/x86_64/clickhouse-client-20.3.9.70-2.noarch.rpm
wget https://repo.clickhouse.tech/rpm/lts/x86_64/clickhouse-common-static-20.3.9.70-2.x86_64.rpm
wget https://repo.clickhouse.tech/rpm/lts/x86_64/clickhouse-common-static-dbg-20.3.9.70-2.x86_64.rpm
wget https://repo.clickhouse.tech/rpm/lts/x86_64/clickhouse-server-20.3.9.70-2.noarch.rpm
安装
rpm -ivh *.rpm
yum localinstall -y *.rpm
卸载
rpm -qa |grep clickhouse
rpm -e rpm包名
systemctl start clickhouse-server.service
systemctl status clickhouse-server.service
systemctl stop clickhouse-server.service
systemctl restart clickhouse-server.service
客户端使用
clickhouse-client
测试数据
https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz
https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz
xz -d hits_v1.tsv.xz
xz -d visits_v1.tsv.xz
yum install xz
创建数据库
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
建表
-m 可以执行一个多行语句
clickhouse-client -m
CREATE TABLE tutorial.hits_v1 ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `javascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams` Nested( Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192;
存储引擎 MergeTree
按EventData分区
按CounterID, EventDate, intHash32(UserID)三列的顺序存储数据,如果表中不显式指定主键,则以此为默认主键
按intHash32(UserID)采样
index_granularity设置索引粒度
CREATE TABLE tutorial.visits_v1 ( `CounterID` UInt32, `StartDate` Date, `Sign` Int8, `IsNew` UInt8, `VisitID` UInt64, `UserID` UInt64, `StartTime` DateTime, `Duration` UInt32, `UTCStartTime` DateTime, `PageViews` Int32, `Hits` Int32, `IsBounce` UInt8, `Referer` String, `StartURL` String, `RefererDomain` String, `StartURLDomain` String, `EndURL` String, `LinkURL` String, `IsDownload` UInt8, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `PlaceID` Int32, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `IsYandex` UInt8, `GoalReachesDepth` Int32, `GoalReachesURL` Int32, `GoalReachesAny` Int32, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `MobilePhoneModel` String, `ClientEventTime` DateTime, `RegionID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `IPNetworkID` UInt32, `SilverlightVersion3` UInt32, `CodeVersion` UInt32, `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `UserAgentMajor` UInt16, `UserAgentMinor` UInt16, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `SilverlightVersion2` UInt8, `SilverlightVersion4` UInt16, `FlashVersion3` UInt16, `FlashVersion4` UInt16, `ClientTimeZone` Int16, `OS` UInt8, `UserAgent` UInt8, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `NetMajor` UInt8, `NetMinor` UInt8, `MobilePhone` UInt8, `SilverlightVersion1` UInt8, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `JavaEnable` UInt8, `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `BrowserLanguage` UInt16, `BrowserCountry` UInt16, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `Params` Array(String), `Goals` Nested( ID UInt32, Serial UInt32, EventTime DateTime, Price Int64, OrderID String, CurrencyID UInt32), `WatchIDs` Array(UInt64), `ParamSumPrice` Int64, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `ClickLogID` UInt64, `ClickEventID` Int32, `ClickGoodEvent` Int32, `ClickEventTime` DateTime, `ClickPriorityID` Int32, `ClickPhraseID` Int32, `ClickPageID` Int32, `ClickPlaceID` Int32, `ClickTypeID` Int32, `ClickResourceID` Int32, `ClickCost` UInt32, `ClickClientIP` UInt32, `ClickDomainID` UInt32, `ClickURL` String, `ClickAttempt` UInt8, `ClickOrderID` UInt32, `ClickBannerID` UInt32, `ClickMarketCategoryID` UInt32, `ClickMarketPP` UInt32, `ClickMarketCategoryName` String, `ClickMarketPPName` String, `ClickAWAPSCampaignName` String, `ClickPageName` String, `ClickTargetType` UInt16, `ClickTargetPhraseID` UInt64, `ClickContextType` UInt8, `ClickSelectType` Int8, `ClickOptions` String, `ClickGroupBannerID` Int32, `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `FirstVisit` DateTime, `PredLastVisit` Date, `LastVisit` Date, `TotalVisits` UInt32, `TraficSource` Nested( ID Int8, SearchEngineID UInt16, AdvEngineID UInt8, PlaceID UInt16, SocialSourceNetworkID UInt8, Domain String, SearchPhrase String, SocialSourcePage String), `Attendance` FixedString(16), `CLID` UInt32, `YCLID` UInt64, `NormalizedRefererHash` UInt64, `SearchPhraseHash` UInt64, `RefererDomainHash` UInt64, `NormalizedStartURLHash` UInt64, `StartURLDomainHash` UInt64, `NormalizedEndURLHash` UInt64, `TopLevelDomain` UInt64, `URLScheme` UInt64, `OpenstatServiceNameHash` UInt64, `OpenstatCampaignIDHash` UInt64, `OpenstatAdIDHash` UInt64, `OpenstatSourceIDHash` UInt64, `UTMSourceHash` UInt64, `UTMMediumHash` UInt64, `UTMCampaignHash` UInt64, `UTMContentHash` UInt64, `UTMTermHash` UInt64, `FromHash` UInt64, `WebVisorEnabled` UInt8, `WebVisorActivity` UInt32, `ParsedParams` Nested( Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), `Market` Nested( Type UInt8, GoalID UInt32, OrderID String, OrderPrice Int64, PP UInt32, DirectPlaceID UInt32, DirectOrderID UInt32, DirectBannerID UInt32, GoodID String, GoodName String, GoodQuantity Int32, GoodPrice Int64), `IslandID` FixedString(16) ) ENGINE = CollapsingMergeTree(Sign) PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192;
ch2 :) use tutorial; USE tutorial Ok. 0 rows in set. Elapsed: 0.002 sec. ch2 :) show tables; SHOW TABLES ┌─name──────┐ │ hits_v1 │ │ visits_v1 │ └───────────┘ 2 rows in set. Elapsed: 0.004 sec.
数据导入
date clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv date
FORMAT TSV 导入的数据格式为TSV
max_insert_block_size 一次导入10万条记录
2.5G数据,74秒,平均 34M/S
[root@ch2 data]# date Fri Jul 31 22:09:16 EDT 2020 TSV" --max_insert_block_size=100000 < hits_v1.tsvT INTO tutorial.hits_v1 FORMAT [root@ch2 data]# date Fri Jul 31 22:10:28 EDT 2020 [root@ch2 data]# [root@ch2 data]# ls -ltrh total 9.8G -rw-r--r-- 1 root root 7.3G Dec 27 2018 hits_v1.tsv -rw-r--r-- 1 root root 2.5G Dec 27 2018 visits_v1.tsv
7.3G数据,26秒,285M/S
[root@ch2 data]# date Fri Jul 31 22:14:04 EDT 2020 [root@ch2 data]# date Fri Jul 31 22:14:30 EDT 2020
ch2 :) use tutorial; USE tutorial Ok. 0 rows in set. Elapsed: 0.002 sec. ch2 :) show tables; SHOW TABLES ┌─name──────┐ │ hits_v1 │ │ visits_v1 │ └───────────┘ 2 rows in set. Elapsed: 0.003 sec. ch2 :) select count() from hits_v1; SELECT count() FROM hits_v1 ┌─count()─┐ │ 8873898 │ └─────────┘ 1 rows in set. Elapsed: 0.005 sec. ch2 :) select count() from visits_v1; SELECT count() FROM visits_v1 ┌─count()─┐ │ 1681989 │ └─────────┘ 1 rows in set. Elapsed: 0.004 sec.
查询
SELECT StartURL AS URL, AVG(Duration) AS AvgDuration FROM tutorial.visits_v1 WHERE StartDate BETWEEN ‘2014-03-23‘ AND ‘2014-03-30‘ GROUP BY URL ORDER BY AvgDuration DESC LIMIT 10;
SELECT StartURL AS URL, AVG(Duration) AS AvgDuration FROM tutorial.visits_v1 WHERE (StartDate >= ‘2014-03-23‘) AND (StartDate <= ‘2014-03-30‘) GROUP BY URL ORDER BY AvgDuration DESC LIMIT 10 ┌─URL─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────AvgDuration─┐ │ http://itpalanija-pri-patrivative=0&ads_app_user │ 60127 │ │ http://renaul-myd-ukraine │ 58938 │ │ http://e.mail=on&default?abid=2061&scd=yes&option?r=city_inter.com/menu&site-zaferio.ru/c/m.ensor.net/ru/login=false&orderStage.php?Brandidatamalystyle/20Mar2014%2F007%2F94dc8d2e06e56ed56bbdd │ 51378 │ │ https://moda/vyikrorable.com/notification │ 48828.6 │ │ http://karta/Futbol/dynas.com/haberler.ru/messages.yandsearchives/494503_lte_13800200319 │ 48484.666666666664 │ │ http://karta/Futbol/dynamo.kiev.ua/kawaica.su/648 │ 46272.2 │ │ https://moda/vyikroforum1/top.ru/moscow/delo-product/trend_sms/multitryaset/news/2014/03/201000 │ 41531.666666666664 │ │ http:%2F%2Fallback/angleNews │ 38878.29268292683 │ │ http://xmusic/vstreatings of speeds │ 36925 │ │ http://bashmelnykh-metode.net/video/#!/video/emberkas.ru/detskij-yazi.com/iframe/default.aspx?id=760928&noreask=1&source │ 34323 │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────┘ 10 rows in set. Elapsed: 0.147 sec. Processed 1.48 million rows, 116.66 MB (10.06 million rows/s., 793.94 MB/s.)
SELECT sum(Sign) AS visits, sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits, (100. * goal_visits) / visits AS goal_percent FROM tutorial.visits_v1 WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = ‘yandex.ru‘) ┌─visits─┬─goal_visits─┬──────goal_percent─┐ │ 10543 │ 8553 │ 81.12491700654462 │ └────────┴─────────────┴───────────────────┘ 1 rows in set. Elapsed: 0.032 sec. Processed 47.60 thousand rows, 5.39 MB (1.49 million rows/s., 169.11 MB/s.)