Cloud-Native High Availability in Practice: Enterprise-Internal DNS + etcd + NTP + Quay (Final Part)
Posted by 张先生的深夜课堂
This is the final installment of the series.
5. NTP Service
With the DNS service deployed and verified, containerizing the NTP service is comparatively simple.
The main work lies in building a custom image and redirecting traffic.
5.1 Custom Image
The approach is a startup script that, when the Pod starts, first writes the environment variables into chronyd's configuration file
and then launches the chronyd daemon.
5.1.1 Startup Script
#!/bin/sh
DEFAULT_NTP="time.cloudflare.com"
CHRONY_CONF_FILE="/etc/chrony/chrony.conf"
# update permissions on chrony directories
chown -R chrony:chrony /run/chrony /var/lib/chrony
chmod o-rx /run/chrony
# remove previous pid file if it exists
rm -f /var/run/chrony/chronyd.pid
## dynamically populate chrony config file.
{
echo "# https://github.com/cturra/docker-ntp"
echo
echo "# chrony.conf file generated by startup script"
echo "# located at /opt/startup.sh"
echo
echo "# time servers provided by NTP_SERVER environment variables."
} > ${CHRONY_CONF_FILE}
# if the NTP_SERVERS environment variable is not set, fall back to the default server
if [ -z "${NTP_SERVERS}" ]; then
NTP_SERVERS="${DEFAULT_NTP}"
fi
IFS=","
for N in $NTP_SERVERS; do
# strip any quotes found before or after ntp server
echo "server ${N//\"} iburst" >> ${CHRONY_CONF_FILE}
done
# final bits for the config file
{
echo "driftfile /var/lib/chrony/chrony.drift"
echo "makestep 0.1 3"
#echo "rtcsync"
echo
echo "allow all"
} >> ${CHRONY_CONF_FILE}
## startup chronyd in the foreground
exec /usr/sbin/chronyd -d -s
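The config-generation loop can be exercised locally before baking it into an image. The sketch below mimics it against a temp file, using `tr` in place of the `${N//\"}` expansion (which is a bash/ash-ism rather than strict POSIX); the server list is an illustrative value.

```shell
# Standalone sketch of startup.sh's server-list generation, writing to a
# temp file instead of /etc/chrony/chrony.conf.
CHRONY_CONF_FILE="$(mktemp)"
NTP_SERVERS='"ntp.aliyun.com","time.cloudflare.com"'

# fall back to a default server when NTP_SERVERS is unset
if [ -z "${NTP_SERVERS}" ]; then
    NTP_SERVERS="time.cloudflare.com"
fi

IFS=","
for N in $NTP_SERVERS; do
    # strip quotes portably (equivalent to the bash-ism ${N//\"})
    S=$(echo "$N" | tr -d '"')
    echo "server ${S} iburst" >> "${CHRONY_CONF_FILE}"
done
unset IFS

cat "${CHRONY_CONF_FILE}"
```

Running this prints one `server … iburst` line per entry, quotes stripped, exactly as chronyd expects.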
5.1.2 Writing the Dockerfile
Based on https://github.com/cturra/docker-ntp
Note the final ENTRYPOINT [ "/bin/sh", "/opt/startup.sh" ], which runs the startup script copied into the image.
cat chronyDockerfile
FROM alpine:latest
# author ChengZhang
MAINTAINER ChengZhang
# install chrony
RUN apk add --no-cache chrony
# script to configure/startup chrony (ntp)
COPY startup.sh /opt/startup.sh
RUN mkdir -p /etc/chrony/
RUN mkdir -p /var/lib/dpkg/info
# ntp port
EXPOSE 123/udp
# let docker know how to test container health
# HEALTHCHECK CMD chronyc tracking || exit 1
# start chronyd in the foreground
ENTRYPOINT [ "/bin/sh", "/opt/startup.sh" ]
5.1.3 Building the Image
podman build -f chronyDockerfile -t registry.cj.io:5000/chrony:0.0.1 .
# image size: 6.27 MB
podman image ls | grep chrony
registry.cj.io:5000/chrony 0.0.1 a9d4c925474e 2 days ago 6.27 MB
Push to the internal enterprise registry:
podman push registry.cj.io:5000/chrony:0.0.1
5.2 Node Deployment
Write the chrony yaml file, setting the replica count to 3.
Note the NTP_SERVERS environment variable,
whose value is ntp.aliyun.com.
cat > chrony.yaml <<EOF
apiVersion: apps/v1
#kind: Deployment
kind: StatefulSet
metadata:
  name: chrony
  namespace: common
spec:
  serviceName: chronysvc
  replicas: 3
  selector:
    matchLabels:
      app: chrony
  template:
    metadata:
      labels:
        app: chrony
    spec:
      serviceAccount: ethan
      serviceAccountName: ethan
      containers:
      - name: chrony
        image: 'registry.cj.io:5000/chrony:0.0.1'
        imagePullPolicy: Always
        env:
        - name: NTP_SERVERS
          value: ntp.aliyun.com
        securityContext:
          capabilities:
            add:
            - SYS_TIME
EOF
Start it:
oc create -f chrony.yaml
5.3 Key Point: Permissions
There is one very important caveat: a Pod that changes the system time needs special privileges,
and OCP's permission model is more fine-grained than stock Kubernetes.
Without the right permissions, the Pod fails with:
2020-12-19T14:18:26Z CAP_SYS_TIME not present
2020-12-19T14:18:26Z Fatal error : adjtimex(0x8001) failed : Operation not permitted
Solution:
Use an OCP service account: create a privileged account named ethan and grant it the required SCCs.
oc create sa ethan
serviceaccount/ethan created
oc adm policy add-scc-to-user anyuid -z ethan
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:anyuid added: "ethan"
Grant the previously created ethan account the privileged SCC as well:
oc adm policy add-scc-to-user privileged -z ethan
The workload keeps running under the ethan account, and the SYS_TIME capability is granted via the securityContext in the Pod yaml above:
securityContext:
  capabilities:
    add:
    - SYS_TIME
5.4 Testing
Manually adjust the time.
# check the current time
date
Sun Dec 20 11:51:42 CST 2020
# set the time manually
date -s "2020-12-20 11:50:00"
Sun Dec 20 11:50:00 CST 2020
Check the local chrony status:
Dec 20 11:44:54 localhost.localdomain chronyd[7267]: Selected source 172.18.1.2
Dec 20 11:44:54 localhost.localdomain chronyd[7267]: System clock wrong by -3.561087 seconds, adjustment started
Dec 20 11:44:50 localhost.localdomain chronyd[7267]: System clock was stepped by -3.561087 seconds
Dec 20 11:50:08 localhost.localdomain chronyd[7267]: Backward time jump detected!
Dec 20 11:50:08 localhost.localdomain chronyd[7267]: Can't synchronise: no selectable sources
Dec 20 11:53:03 localhost.localdomain chronyd[7267]: Selected source 172.18.1.2
Dec 20 11:53:03 localhost.localdomain chronyd[7267]: System clock wrong by 155.561438 seconds, adjustment started
[root@localhost yumcache]# chronyc sources -v
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* 172.18.1.2 3 6 377 0 -6171ns[-2296us] +/- 29ms
Chrony normally corrects the clock gradually by slewing.
Here we step the clock immediately; after the step, the time is correct:
chronyc -a makestep
200 OK
date
Sun Dec 20 12:08:24 CST 2020
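To watch chrony converge after a manual jump, the reported offsets can be pulled out of the journal. The one-liner below is an illustrative sketch run against the log lines captured above; in practice you would pipe in `journalctl -u chronyd` or /var/log/messages instead of the embedded sample.

```shell
# Extract the reported clock offsets (in seconds) from chronyd log lines.
sample='Dec 20 11:44:54 localhost.localdomain chronyd[7267]: System clock wrong by -3.561087 seconds, adjustment started
Dec 20 11:53:03 localhost.localdomain chronyd[7267]: System clock wrong by 155.561438 seconds, adjustment started'

# "System clock wrong by X seconds, adjustment started" -> X is the
# fourth field from the end of the line
echo "$sample" | awk '/System clock wrong by/ { print $(NF-3) }'
```

For the sample above this prints `-3.561087` and `155.561438`, one offset per line; offsets shrinking toward zero indicate the clock is converging.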
5.5 Exposing the Service Externally
Testing confirms that the NTP service now works for clients inside the cluster.
To expose it outside the cluster as the enterprise-wide highly available NTP service,
we use a NodePort Service.
Write the yaml file:
cat > chronysvc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: chronysvc
spec:
  selector:
    app: chrony
  ports:
  - port: 123
    name: ntp-port
    protocol: UDP
    targetPort: 123
    nodePort: 30123
  type: NodePort
EOF
Start the svc:
oc create -f chronysvc.yaml
The default NodePort range is 30000-32767; 30123 was chosen as the NodePort because it is easy to remember.
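A tiny guard like the following can validate a candidate port against the default kube-apiserver `--service-node-port-range` (30000-32767) before it goes into the yaml. Note that clusters may override this range, so the bounds here are an assumption.

```shell
# Check that a candidate nodePort lies in the default NodePort range.
valid_nodeport() {
    [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

valid_nodeport 30123 && echo "30123: ok"
valid_nodeport 123 || echo "123: outside the NodePort range"
```

This makes the point behind section 6 concrete: the well-known ports 123 and 53 cannot be NodePorts directly, so an external load balancer has to bridge the gap.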
6. External Load Balancer Nodes
The containerization of the NTP and DNS services is now complete.
Both serve clients inside the cluster normally.
But there is a last-mile problem: how to expose these services outside the cluster for enterprise-wide use.
NTP and DNS listen on UDP 123 and UDP 53 respectively.
Neither the OCP router nor Ingress can provide layer-4 proxying out of the cluster.
NodePort alone suffers from the port-range mismatch and from being a single point of failure.
The ideal architecture places an enterprise-grade hardware load balancer outside the cluster, forwarding downstream UDP 123 and UDP 53 directly to the NodePort Service ports on all OCP compute nodes.
In sections 4.6 and 5.5 we exposed ports 30123 and 30053 on the OCP compute nodes via NodePort Services; per the svc yaml definitions, each node load-balances that external traffic onto the Pods' service ports.
UDP load balancing is handled by Nginx as the proxy.
TCP load balancing is handled by HAProxy as the proxy. Service mesh and Envoy tutorials will follow later.
Keepalived provides active-standby failover to eliminate the single point of failure, which suits stateless services such as Nginx and HAProxy.
6.1 keepalived
Reference: https://www.keepalived.org/index.html
# download
wget https://www.keepalived.org/software/keepalived-2.1.5.tar.gz
# extract
tar xf keepalived-2.1.5.tar.gz
# build and install
cd keepalived-2.1.5
Install gcc, openssl-devel and pcre-devel before building, then compile:
./configure --prefix=/usr/local/keepalived
make && make install
After installation:
# create the configuration directory
mkdir -p /etc/keepalived/
cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
ln -s /usr/local/keepalived/sbin/keepalived /usr/sbin/
# copy these from the keepalived source tree; they are not in the install prefix
cp /root/keepalived-2.1.5/keepalived/etc/init.d/keepalived /etc/init.d/
cp /root/keepalived-2.1.5/keepalived/keepalived.service /etc/systemd/system/
chmod 755 /etc/init.d/keepalived
Configure keepalived:
cat > /etc/keepalived/keepalived.conf <<EOF
! Configuration File for keepalived
global_defs {
    router_id master01      # may differ between master and backup
}
vrrp_instance VI_1 {
    state MASTER            # MASTER on the primary, BACKUP on the standby
    interface ens192        # NIC to bind to
    virtual_router_id 50
    priority 100            # 100 on the master, lower on the backup
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.18.1.2          # the VRRP virtual IP (VIP)
    }
}
EOF
Key point: configure the firewall to allow VRRP traffic:
firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface ens192 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
Keepalived itself is fairly simple, so it is not covered in further detail here.
6.2 Configuring UDP Forwarding with Nginx
6.2.1 Building Nginx
Nginx does not support UDP forwarding out of the box, so it needs to be rebuilt by hand.
Reference: http://nginx.org/
wget http://nginx.org/download/nginx-1.18.0.tar.gz
tar xvf nginx-1.18.0.tar.gz
cd nginx-1.18.0
Start the build. Before compiling, be sure to install the prerequisites:
yum -y install proc* openssl* pcre*
Then run configure:
./configure --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-threads --with-stream --with-stream_ssl_module --with-mail --with-mail_ssl_module --with-file-aio --with-ipv6 --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'
The configure output shows the paths of the key files,
for example /etc/nginx/nginx.conf:
checking for ioctl(FIONBIO) ... found
checking for ioctl(FIONREAD) ... found
checking for struct tm.tm_gmtoff ... found
checking for struct dirent.d_namlen ... not found
checking for struct dirent.d_type ... found
checking for sysconf(_SC_NPROCESSORS_ONLN) ... found
checking for sysconf(_SC_LEVEL1_DCACHE_LINESIZE) ... found
checking for openat(), fstatat() ... found
checking for getaddrinfo() ... found
checking for PCRE library ... found
checking for PCRE JIT support ... found
checking for OpenSSL library ... found
checking for zlib library ... found
creating objs/Makefile
Configuration summary
+ using threads
+ using system PCRE library
+ using system OpenSSL library
+ using system zlib library
nginx path prefix: "/etc/nginx"
nginx binary file: "/usr/sbin/nginx"
nginx modules path: "/etc/nginx/modules"
nginx configuration prefix: "/etc/nginx"
nginx configuration file: "/etc/nginx/nginx.conf"
nginx pid file: "/var/run/nginx.pid"
nginx error log file: "/var/log/nginx/error.log"
nginx http access log file: "/var/log/nginx/access.log"
nginx http client request body temporary files: "/var/cache/nginx/client_temp"
nginx http proxy temporary files: "/var/cache/nginx/proxy_temp"
nginx http fastcgi temporary files: "/var/cache/nginx/fastcgi_temp"
nginx http uwsgi temporary files: "/var/cache/nginx/uwsgi_temp"
nginx http scgi temporary files: "/var/cache/nginx/scgi_temp"
./configure: warning: the "--with-ipv6" option is deprecated
Install:
make
make install
# relevant output
make -f objs/Makefile install
make[1]: Entering directory `/root/nginx-1.18.0'
test -d '/etc/nginx' || mkdir -p '/etc/nginx'
test -d '/usr/sbin' \
|| mkdir -p '/usr/sbin'
test ! -f '/usr/sbin/nginx' \
|| mv '/usr/sbin/nginx' \
'/usr/sbin/nginx.old'
cp objs/nginx '/usr/sbin/nginx'
test -d '/etc/nginx' \
|| mkdir -p '/etc/nginx'
cp conf/koi-win '/etc/nginx'
cp conf/koi-utf '/etc/nginx'
cp conf/win-utf '/etc/nginx'
test -f '/etc/nginx/mime.types' \
|| cp conf/mime.types '/etc/nginx'
cp conf/mime.types '/etc/nginx/mime.types.default'
test -f '/etc/nginx/fastcgi_params' \
|| cp conf/fastcgi_params '/etc/nginx'
cp conf/fastcgi_params \
'/etc/nginx/fastcgi_params.default'
test -f '/etc/nginx/fastcgi.conf' \
|| cp conf/fastcgi.conf '/etc/nginx'
cp conf/fastcgi.conf '/etc/nginx/fastcgi.conf.default'
test -f '/etc/nginx/uwsgi_params' \
|| cp conf/uwsgi_params '/etc/nginx'
cp conf/uwsgi_params \
'/etc/nginx/uwsgi_params.default'
test -f '/etc/nginx/scgi_params' \
|| cp conf/scgi_params '/etc/nginx'
cp conf/scgi_params \
'/etc/nginx/scgi_params.default'
test -f '/etc/nginx/nginx.conf' \
|| cp conf/nginx.conf '/etc/nginx/nginx.conf'
cp conf/nginx.conf '/etc/nginx/nginx.conf.default'
test -d '/var/run' \
|| mkdir -p '/var/run'
test -d '/var/log/nginx' \
|| mkdir -p '/var/log/nginx'
test -d '/etc/nginx/html' \
|| cp -R html '/etc/nginx'
test -d '/var/log/nginx' \
|| mkdir -p '/var/log/nginx'
make[1]: Leaving directory `/root/nginx-1.18.0'
UDP load balancing is handled through the stream module.
It addresses two key needs: high availability and horizontal scaling. UDP by design does not guarantee end-to-end delivery, so client software must handle network-level errors and retransmission itself.
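That client-side responsibility can be wrapped generically. The helper below retries a query command a fixed number of times; the `dig` invocation in the comment is illustrative, using the VIP from the keepalived configuration above.

```shell
# Minimal client-side retry wrapper: UDP and the nginx stream proxy do
# not retransmit, so the querying side has to.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
    done
    return 1
}

# Illustrative usage against the load-balanced DNS VIP:
#   retry 3 dig +time=2 +tries=1 @172.18.1.2 registry.cj.io +short
retry 3 true && echo "query succeeded"
```

The wrapper returns success as soon as one attempt succeeds and gives up after the attempt budget is exhausted, which is usually enough for lossy UDP paths.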
6.2.2 Configuring Nginx
Verify that the Nginx configuration file is valid;
this also confirms the hand-compiled binary and its modules work:
$ nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] getpwnam("nginx") failed
# this fails because the workers default to the nginx user, which was never created
nginx: configuration file /etc/nginx/nginx.conf test failed
Make sure the files referenced by directives such as error_log and pid exist; the relevant paths are listed in the make install output above.
cat <<EOF > /etc/nginx/nginx.conf
user root;
worker_processes 1;
error_log /var/log/nginx/error.log warn;
#error_log logs/error.log notice;
#error_log logs/error.log info;
pid /var/run/nginx.pid;
events {
    worker_connections 20480;
}
stream {
    upstream ntp {
        server 172.18.1.47:30123;
        server 172.18.1.48:30123;
        server 172.18.1.49:30123;
        # check interval=10000 rise=2 fall=3 timeout=2000 default_down=false type=udp;
    }
    upstream dns2 {
        server 172.18.1.1:53;
    }
    upstream dns {
        server 172.18.1.47:30053;
        server 172.18.1.48:30053;
        server 172.18.1.49:30053;
    }
    server {
        listen 123 udp;
        proxy_pass ntp;
        proxy_responses 1;
        #proxy_bind \$remote_addr transparent;
        proxy_timeout 20s;
    }
    server {
        listen 53 udp;
        proxy_pass dns;
        proxy_responses 1;
        #proxy_bind \$remote_addr transparent;
        proxy_timeout 20s;
    }
    server {
        listen 153 udp;
        proxy_pass dns;
        proxy_responses 1;
        #proxy_bind \$remote_addr transparent;
        proxy_timeout 20s;
    }
}
EOF
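Before running `nginx -t`, a quick grep can confirm the stream listeners actually made it into the file. This checker is a convenience sketch; pass it whatever config path you generated.

```shell
# Sanity-check that a config file defines the expected UDP listeners.
check_stream_listeners() {
    conf=$1
    rc=0
    for want in "listen 123 udp" "listen 53 udp"; do
        if grep -q "$want" "$conf"; then
            echo "found: $want"
        else
            echo "MISSING: $want"
            rc=1
        fi
    done
    return $rc
}

# e.g.: check_stream_listeners /etc/nginx/nginx.conf && nginx -t
```

It exits nonzero when either listener is missing, so it chains cleanly in front of `nginx -t` or a deploy script.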
Note: recent Nginx releases have no built-in health check for stream upstreams; the commented check directive requires a manually added third-party module.
Verify:
Looking at the port bindings, UDP 123 is now held by nginx:
$ ss -ulnp | grep 123
UNCONN 0 0 *:111 *:* users:(("rpcbind",pid=23123,fd=5),("systemd",pid=1,fd=100))
UNCONN 0 0 *:123 *:* users:(("nginx",pid=9032,fd=5),("nginx",pid=9031,fd=5))
UNCONN 0 0 *:826 *:* users:(("rpcbind",pid=23123,fd=10))
UNCONN 0 0 :::111 :::* users:(("rpcbind",pid=23123,fd=7),("systemd",pid=1,fd=102))
UNCONN 0 0 :::826 :::* users:(("rpcbind",pid=23123,fd=11))
6.3 HAProxy
Reference: https://www.haproxy.org/
Download:
wget https://www.haproxy.org/download/2.4/src/devel/haproxy-2.4-dev3.tar.gz
Extract:
tar xvf haproxy-2.4-dev3.tar.gz
cd haproxy-2.4-dev3
Build:
# make sure openssl is installed
yum install -y openssl
make TARGET=linux-glibc PREFIX=/usr/local/haproxy
make install PREFIX=/usr/local/haproxy
Note:
# building with the following target fails:
make TARGET=linux26 PREFIX=/usr/local/haproxy
# Target 'linux26' was removed from HAProxy 2.0 due to being irrelevant and
often wrong. Please use 'linux-glibc' instead or define your custom target
by checking available options using 'make help TARGET=<your-target>'.
After installation:
useradd haproxy
mkdir -p /etc/haproxy/
vi /etc/haproxy/haproxy.cfg
cp examples/haproxy.init /etc/init.d/haproxy
/usr/local/haproxy/sbin/haproxy -f /etc/haproxy/haproxy.cfg
Check the configuration file and the haproxy system user:
# verify the config file
$ll /etc/haproxy/haproxy.cfg
-rw-r--r--. 1 root root 3142 Mar 28 2019 /etc/haproxy/haproxy.cfg
# verify the user
$cat /etc/passwd | grep haproxy
haproxy:x:188:188:haproxy:/var/lib/haproxy:/sbin/nologin
-------- HAProxy installation is complete --------
A quick overview of the HAProxy configuration file: it consists of five sections: global, defaults, frontend, backend, and listen. 1) global sets global parameters. 2) defaults holds default parameter values. 3) frontend defines the virtual nodes that accept client requests; a frontend can select a backend directly based on ACL rules. 4) backend defines a group of real servers that handle the requests forwarded by the frontend.
5) listen combines frontend and backend into a single section; when listen is used, separate frontend and backend sections are not required.
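As a concrete illustration of these sections, a minimal haproxy.cfg covering the 6443 API frontend and the 9000 stats port seen in the ss output below might look like this. The server names and addresses are placeholders, not the article's actual topology.

```text
global
    daemon
    maxconn 4000

defaults
    mode    tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

# frontend: accepts client connections on 6443
frontend k8s-api
    bind *:6443
    default_backend k8s-api-servers

# backend: the pool of real servers behind the frontend
backend k8s-api-servers
    balance roundrobin
    server master01 192.0.2.10:6443 check
    server master02 192.0.2.11:6443 check

# listen: frontend and backend combined, here serving the stats page
listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /stats
```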
Before configuring, be sure to put SELinux in permissive mode:
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
reboot
After configuring, restart and check the service:
$systemctl restart haproxy
$systemctl status haproxy
Verify:
ss -lutnp | grep haproxy
tcp LISTEN 0 128 *:9000 *:* users:(("haproxy",pid=23559,fd=5))
tcp LISTEN 0 128 *:6443 *:* users:(("haproxy",pid=23559,fd=7))
tcp LISTEN 0 128 *:80 *:* users:(("haproxy",pid=23559,fd=9))
tcp LISTEN 0 128 *:443 *:* users:(("haproxy",pid=23559,fd=10))
tcp LISTEN 0 128 *:22623 *:* users:(("haproxy",pid=23559,fd=8))
Firewall configuration:
firewall-cmd --zone=public --add-port=6443/tcp --permanent
firewall-cmd --zone=public --add-port=443/tcp --permanent
firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --zone=public --add-port=22623/tcp --permanent
firewall-cmd --zone=public --add-port=53/udp --permanent
firewall-cmd --zone=public --add-port=123/udp --permanent
firewall-cmd --reload
This completes the DNS + NTP deployment.
Follow-up posts will cover the Quay distributed image registry and Ceph on OCP.
By 张诚, 2021.1.1