ClickHouse two-shard, two-replica cluster deployment
Posted wshenjin
Node IPs
- 192.168.31.101
- 192.168.31.102
With only two machines available, each machine runs two ClickHouse instances, and the four instances together form a two-shard, two-replica cluster.
Basic layout

|          | Replica 01          | Replica 02          |
|----------|---------------------|---------------------|
| Shard 01 | 192.168.31.101:9100 | 192.168.31.102:9200 |
| Shard 02 | 192.168.31.102:9100 | 192.168.31.101:9200 |
ZooKeeper cluster deployment
Omitted here.
Configuration
Per-instance settings in each config0x.xml (the two instances on one host must each use their own ports, log files, and data directories):
<log>/var/log/clickhouse-server/clickhouse-server0x.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server0x.err.log</errorlog>
<http_port>8123</http_port>
<tcp_port>9100</tcp_port>
<mysql_port>9104</mysql_port>
<interserver_http_port>9109</interserver_http_port>
<path>/data/database/clickhouse0x/</path>
<tmp_path>/data/database/clickhouse0x/tmp/</tmp_path>
<user_files_path>/data/database/clickhouse0x/user_files/</user_files_path>
<format_schema_path>/data/database/clickhouse0x/format_schemas/</format_schema_path>
<include_from>/etc/clickhouse-server/metrika0x.xml</include_from>
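As a sketch of how the second instance on the same host might differ, the snippet below fills in config02.xml; the 9200 TCP port comes from the layout table above, while the other port numbers and paths are assumptions for illustration, not values from the original configs:
<!-- config02.xml: hypothetical values for the second instance on the same host -->
<log>/var/log/clickhouse-server/clickhouse-server02.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server02.err.log</errorlog>
<http_port>8124</http_port>                  <!-- must not collide with instance 01's 8123 -->
<tcp_port>9200</tcp_port>                    <!-- the 9200 replica port from the layout table -->
<mysql_port>9204</mysql_port>
<interserver_http_port>9209</interserver_http_port>
<path>/data/database/clickhouse02/</path>
<tmp_path>/data/database/clickhouse02/tmp/</tmp_path>
<user_files_path>/data/database/clickhouse02/user_files/</user_files_path>
<format_schema_path>/data/database/clickhouse02/format_schemas/</format_schema_path>
<include_from>/etc/clickhouse-server/metrika02.xml</include_from>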
Settings that are identical in every instance's metrika0x.xml:
<!-- Cluster definition -->
<clickhouse_remote_servers>
<!-- Custom cluster name: ckcluster_2shards_2replicas -->
<ckcluster_2shards_2replicas>
<!-- Shard 1 -->
<shard>
<internal_replication>true</internal_replication>
<!-- Replica 1 -->
<replica>
<host>192.168.31.101</host>
<port>9100</port>
</replica>
<!-- Replica 2 -->
<replica>
<host>192.168.31.102</host>
<port>9200</port>
</replica>
</shard>
<!-- Shard 2 -->
<shard>
<internal_replication>true</internal_replication>
<!-- Replica 1 -->
<replica>
<host>192.168.31.102</host>
<port>9100</port>
</replica>
<!-- Replica 2 -->
<replica>
<host>192.168.31.101</host>
<port>9200</port>
</replica>
</shard>
</ckcluster_2shards_2replicas>
</clickhouse_remote_servers>
<!-- ZooKeeper configuration -->
<zookeeper-servers>
<node index="1">
<host>192.168.31.101</host>
<port>2181</port>
</node>
<node index="2">
<host>192.168.31.102</host>
<port>2181</port>
</node>
</zookeeper-servers>
<!-- Compression settings -->
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method>
</case>
</clickhouse_compression>
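For these substitution blocks to take effect, the main config0x.xml typically references them by name through incl attributes. The wiring below is an assumption about what the (not shown) config files contain, following the conventional metrika.xml setup:
<remote_servers incl="clickhouse_remote_servers" />
<zookeeper incl="zookeeper-servers" optional="true" />
<macros incl="macros" optional="true" />
<compression incl="clickhouse_compression" />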
The replica-identifier (macros) settings in each instance's metrika0x.xml:
# 192.168.31.101 9100 metrika01.xml
<macros>
<shard>01</shard>
<replica>ckcluster-01-01</replica>
</macros>
# 192.168.31.101 9200 metrika02.xml
<macros>
<shard>02</shard>
<replica>ckcluster-02-02</replica>
</macros>
# 192.168.31.102 9100 metrika01.xml
<macros>
<shard>02</shard>
<replica>ckcluster-02-01</replica>
</macros>
# 192.168.31.102 9200 metrika02.xml
<macros>
<shard>01</shard>
<replica>ckcluster-01-02</replica>
</macros>
The replica identifier, also known as the macros configuration, uniquely names a replica. Every instance must define it, and every value must be unique.
- shard is the shard number
- replica is the replica identifier
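With these macros in place, the ZooKeeper path and replica name can also be substituted automatically instead of being spelled out per instance. A minimal sketch, mirroring the table definition used later in this post (the macro-substitution form itself is not what this post uses):
:) create table person_local(ID Int8, Name String, BirthDate Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/person_local', '{replica}', BirthDate, (Name, BirthDate), 8192);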
Starting the instances
192.168.31.101:
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid --config-file=/etc/clickhouse-server/config01.xml"
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server02.pid --config-file=/etc/clickhouse-server/config02.xml"
192.168.31.102:
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server01.pid --config-file=/etc/clickhouse-server/config01.xml"
[root@ ~]# su - clickhouse -s /bin/bash -c "/usr/bin/clickhouse-server --daemon --pid-file=/var/run/clickhouse-server/clickhouse-server02.pid --config-file=/etc/clickhouse-server/config02.xml"
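Once both instances are up on a node, a quick connectivity check against each TCP port (assuming clickhouse-client is installed locally) might look like:
clickhouse-client --host 127.0.0.1 --port 9100 --query "SELECT version()"
clickhouse-client --host 127.0.0.1 --port 9200 --query "SELECT version()"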
Check the cluster status on each node:
:) SELECT * FROM system.clusters;
┌─cluster─────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────┬─host_address───┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ ckcluster_2shards_2replicas │ 1 │ 1 │ 1 │ 192.168.31.101 │ 192.168.31.101 │ 9100 │ 0 │ default │ │ 0 │ 0 │
│ ckcluster_2shards_2replicas │ 1 │ 1 │ 2 │ 192.168.31.102 │ 192.168.31.102 │ 9200 │ 0 │ default │ │ 0 │ 0 │
│ ckcluster_2shards_2replicas │ 2 │ 1 │ 1 │ 192.168.31.102 │ 192.168.31.102 │ 9100 │ 0 │ default │ │ 0 │ 0 │
│ ckcluster_2shards_2replicas │ 2 │ 1 │ 2 │ 192.168.31.101 │ 192.168.31.101 │ 9200 │ 1 │ default │ │ 0 │ 0 │
└─────────────────────────────┴───────────┴──────────────┴─────────────┴────────────────┴────────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
Creating the database and tables
Create the database on every instance:
:) create database testdb;
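If distributed DDL is configured (a distributed_ddl section in config.xml, which this post does not cover), the per-instance step could in principle be replaced by a single ON CLUSTER statement; this is an assumption about the setup, not what is done here:
:) create database testdb ON CLUSTER ckcluster_2shards_2replicas;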
On 192.168.31.101:9100, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/person_local','ckcluster-01-01',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
On 192.168.31.101:9200, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/person_local','ckcluster-02-02',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
On 192.168.31.102:9100, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/02/person_local','ckcluster-02-01',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
On 192.168.31.102:9200, create the local table and the distributed table:
:) create table person_local(ID Int8, Name String, BirthDate Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01/person_local','ckcluster-01-02',BirthDate, (Name, BirthDate), 8192);
:) create table person_all as person_local ENGINE = Distributed(ckcluster_2shards_2replicas, testdb, person_local, rand());
Local table creation syntax:
create table person_local(ID Int8, Name String, BirthDate Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/${shard}/person_local','${replica}',BirthDate, (Name, BirthDate), 8192);
- /clickhouse/tables/${shard}/person_local is the table's path in ZooKeeper. Replicas of the same shard must use the same path, and different shards must use different paths; shard corresponds to the macros in that instance's metrika.xml.
- ${replica} is the replica name. It must be unique on every instance and corresponds to the macros in that instance's metrika.xml.
Distributed table syntax:
:) create table person_all as person_local ENGINE = Distributed(${cluster_name}, ${db_name}, ${local_table_name}, rand());
- ${cluster_name} is the cluster name
- ${db_name} is the database name
- ${local_table_name} is the local table name
- rand() is the sharding key, distributing rows randomly across shards
A distributed table is only a query engine and stores no data itself. A query against it is forwarded to every shard in the cluster, processed there, and the aggregated result is returned to the client. ClickHouse therefore requires the aggregated result to fit in the memory of the node hosting the distributed table, which under normal conditions is not a problem.
The distributed table can be created on every instance or only on some of them, matching the instances your application actually queries. Creating it on several instances is recommended, so that when one node goes down the table can still be queried on another.
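As a quick end-to-end check (the sample rows below are made up for illustration), you can insert through the distributed table and then compare counts against the local tables on each instance:
:) insert into testdb.person_all (ID, Name, BirthDate) values (1, 'alice', '1990-01-01'), (2, 'bob', '1992-05-20'), (3, 'carol', '1995-09-30');
:) select count() from testdb.person_all;    -- total rows, via the distributed table
:) select count() from testdb.person_local;  -- rows on this shard only, run on each instance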