CentOS7.9上部署OpenShift3.11集群
Posted 代码狂魔v
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了CentOS7.9上部署OpenShift3.11集群相关的知识,希望对你有一定的参考价值。
CentOS7.9上部署OpenShift3.11集群
OCP官网文档:https://docs.openshift.com/container-platform/3.11/welcome/index.html
版本:
- openshift 3.11.0
- centos7.9
- ansible 2.6.5
环境信息
主机名 | IP | 系统版本 | 配置 |
---|---|---|---|
master.ocp.cn | 192.168.108.110 | CentOS 7.9 | 4C2G |
node1.ocp.cn | 192.168.108.111 | CentOS 7.9 | 4C2G |
node2.ocp.cn | 192.168.108.112 | CentOS 7.9 | 4C2G |
infra.ocp.cn | 192.168.108.113 | CentOS 7.9 | 4C2G |
基本配置
开启Selinux
以下操作每个节点都需要做
# selinux 改成 enforcing 或 Permissive
grep "^\\s*[^#\\t].*$" /etc/selinux/config
SELINUX=enforcing
SELINUXTYPE=targeted
如果没有开启,通过下面的方式开启
# 每台机器都开启SELINUX
sed -i 's/SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
sed -i 's/SELINUXTYPE=.*/SELINUXTYPE=targeted/' /etc/selinux/config
设置hosts与主机名
以下操作每个节点都需要做
# 每台机器设置Hosts
cat >> /etc/hosts <<EOF
192.168.108.110 master.ocp.cn
192.168.108.111 node1.ocp.cn
192.168.108.112 node2.ocp.cn
192.168.108.113 infra.ocp.cn
EOF
# 每台机器分别设置主机名
# 各自设置各自的
hostnamectl set-hostname master.ocp.cn
hostnamectl set-hostname node1.ocp.cn
hostnamectl set-hostname node2.ocp.cn
hostnamectl set-hostname infra.ocp.cn
安装docker(可省略)
可以先自行安装docker(在每个节点上),也可以让ansible执行检查的时候安装(prerequisites.yml
)
# 每台机器都安装docker
yum install -y docker-1.13.1
注意:如果需要单独设置docker的存储有下面几种方式
- 直接添加一块空闲的硬盘作为后端存储,这种方式内部做的操作
- 将空余块设备(可以是分区)创建成physical volume(pv)
- 再由这些PV组成volume group(vg)
- 从vg中建立两个logical volume(lv),data和matedata
- 将data和matedata映射成thin-pool
cat > /etc/sysconfig/docker-storage-setup<<EOF
DEVS=/dev/vdc # 添加的硬盘
VG=docker-vg
EOF
docker-storage-setup
- 使用一个已经存在的 volume group
cat > /etc/sysconfig/docker-storage-setup<<EOF
VG=docker-vg # 已存在的并且有空闲空间的存储 vgs查看VFree字段就是空闲值
EOF
docker-storage-setup
- 直接使用系统所在的volume group的剩余空间
docker-storage-setup
- 直接使用系统空间,什么也不做即可,使用命令
df -h /var/lib/docker
可以查看使用的是
最后启动docker
# 每台机器都启用docker
systemctl enable docker
systemctl start docker
systemctl is-active docker
安装OCP
配置ssh免密登录
在Master上操作即可
# 生成ssh公钥私钥
ssh-keygen
# ssh-copy-id
for host in \\
master.ocp.cn node1.ocp.cn node2.ocp.cn infra.ocp.cn;\\
do \\
ssh-copy-id $host; \\
done
拉取3.11版本的OCP playbook
在Master上操作
yum install -y git
# Master上下载openshift-3.11 play-book
cd ~
git clone -b release-3.11 https://github.com/openshift/openshift-ansible.git
git配置代理(下载不动可配置代理)
# 配置http/https代理
git config --global http.proxy http://192.168.108.1:1080
git config --global https.proxy https://192.168.108.1:1080
# 取消http/https代理
git config --global --unset http.proxy
git config --global --unset https.proxy
安装ansible
在Master上安装,ansible版本为2.6.5,有两种安装方式
- 下载rpm包安装
# 安装wget
yum install -y wget
# 下载ansible26
wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/centos-release-ansible26-1-3.el7.centos.noarch.rpm
# 安装依赖
yum install -y centos-release-configmanagement
# 安装
rpm -ivh centos-release-ansible26-1-3.el7.centos.noarch.rpm
# 下载ansible-2.6.5
wget https://releases.ansible.com/ansible/rpm/release/epel-7-x86_64/ansible-2.6.5-1.el7.ans.noarch.rpm
# 先yum安装ansible-2.6.5所需依赖
yum install -y python-jinja2 python-paramiko python-setuptools python-six python2-cryptography sshpass PyYAML
# 安装
rpm -ivh ansible-2.6.5-1.el7.ans.noarch.rpm
- 源码安装,需要python3
源码安装python3,安装rpm包请参考:https://centos.pkgs.org/7/epel-x86_64/python36-rpm-4.11.3-9.el7.x86_64.rpm.html
yum install -y wget
# 编译安装python3
yum -y groupinstall "Development tools"
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
yum install libffi-devel -y
wget https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz
tar -xvJf Python-3.7.0.tar.xz
mkdir /usr/local/python3 #创建编译安装目录
cd Python-3.7.0
./configure --prefix=/usr/local/python3
make && make install
ln -s /usr/local/python3/bin/python3 /usr/local/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/local/bin/pip3
# 测试是否安装成功
python3 -V
pip3 -V
- 安装ansible,装完ansible默认位置在
/usr/local/python3/bin
中
pip3 install --upgrade setuptools
wget --no-check-certificate https://releases.ansible.com/ansible/ansible-2.6.5.tar.gz
tar fxz ansible-2.6.5.tar.gz && cd ansible-2.6.5
python3 setup.py install
配置ansible
# 配置ansible配置文件(Master上)
cp /etc/ansible/hosts /etc/ansible/hosts.bak
cat > /etc/ansible/hosts<<EOF
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
#openshift_release="3.11"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_public_hostname=openshift.ocp.cn
openshift_master_default_subdomain=ocp.cn
openshift_ca_cert_expire_days=3650
openshift_node_cert_expire_days=3650
openshift_master_cert_expire_days=3650
etcd_ca_default_days=3650
#openshift_hosted_manage_registry=false
openshift_disable_check=memory_availability,disk_availability,docker_image_availability,docker_storage
#openshift_enable_service_catalog=false
#template_service_broker_install=false
#ansible_service_broker_install=false
#osn_storage_plugin_deps=[]
#openshift_enable_service_catalog=false
#openshift_cluster_monitoring_operator_install=false
[masters]
master.ocp.cn
[etcd]
master.ocp.cn
[nodes]
master.ocp.cn openshift_node_group_name='node-config-master'
node1.ocp.cn openshift_node_group_name='node-config-compute'
node2.ocp.cn openshift_node_group_name='node-config-compute'
infra.ocp.cn openshift_node_group_name='node-config-infra'
EOF
执行laybook安装
- 安装前检查
# 集群安装前检查
ansible-playbook ~/openshift-ansible/playbooks/prerequisites.yml
最后若全部显示failed=0
说明成功
- 安装
# 集群安装
ansible-playbook ~/openshift-ansible/playbooks/deploy_cluster.yml
-
检查安装是否成功:playbook跑完全部显示
failed=0
,oc get no
查看节点状态全都是Ready
说明成功 -
卸载
# 卸载
ansible-playbook ~/openshift-ansible/playbooks/adhoc/uninstall.yml
访问
- 在Master节点上添加用户
htpasswd -c -b /etc/origin/master/htpasswd <user> <password>
- Windows上添加hosts:
C:\\Windows\\System32\\drivers\\etc
192.168.108.110 master.ocp.cn
192.168.108.110 openshift.ocp.cn
在浏览器访问https://openshift.ocp.cn:8443
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-p4HXbo1r-1627217803986)(files/image-20210724224322193.png)]
安装注意事项
- 操作系统语言不能是中文
- infra节点会自动部署router,lb不要放在infra节点上,所以80端口不能冲突
- 如果web console访问端口改成443,lb不能放一起,端口冲突
- 硬盘格式XFS才支持overlay2
- 开启Selinux
- 保证能联网
- 如果lb和master在一个节点上,会有8443端口已被占用的问题,建议安装时lb不要放在master节点上
- 如果etcd放在master节点上,会以静态pod形式启动。如果放在node节点上,会以系统服务的形式启动。我在安装过程中,一个etcd放在了master上,另一个放在了node上,导致etcd启动失败。建议安装时etcd要么全放在master节点上,要么全放在node节点上。
- 安装过程中,直接安装了带有nfs持久存储的监控,需要提前安装java-1.8.0-openjdk-headless python-passlib,这一点官网没有提及,不提前装安装会报错。
- docker 启用配置参数
–selinux-enabled=false
,但是操作系统Selinux
必须开启,否则安装报错
基本使用
这里以部署一个nginx为例,在default这个namespace下
- 创建yaml文件
oc create deployment web --image=nginx:1.14 --dry-run -o yaml > nginx.yaml
# 内容如下
apiVersion: apps/v1
kind: Deployment
metadata:
creationTimestamp: null
labels:
app: web
name: web
spec:
replicas: 1
selector:
matchLabels:
app: web
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
app: web
spec:
containers:
- image: nginx:1.14
name: nginx
resources: {}
status: {}
- 部署,此时会发现容器起不来,可用
docker logs
查看对应节点上的容器日志会发现一堆Permission Denied
,这是因为OpenShift默认不给root权限,而nginx是需要root权限的
oc apply -f nginx.yaml
- 将anyuid这个scc( Security Context Constraints)赋给default,再重新部署下就可以了
oc adm policy add-scc-to-user anyuid -z default
scc "anyuid" added to: ["system:serviceaccount:default:default"]
参考:https://stackoverflow.com/questions/42363105/permission-denied-mkdir-in-container-on-openshift
OpenShift will by default run containers as a non root user. As a result, your application can fail if it requires it runs as root. Whether you can configure your container to run as root will depend on permissions you have in the cluster.
It is better to design your container and application so that it doesn’t have to run as root.
A few suggestions.
- Create a special UNIX user to run the application as and set that user (using its uid), in the USER statement of the
Dockerfile
. Make the group for the user be the root group.- Fixup permissions on the
/src
directory and everything under it so owned by the special user. Ensure that everything is group root. Ensure that anything that needs to be writable is writable to group root.- Ensure you set
HOME
to/src
inDockerfile
.With that done, when OpenShift runs your container as an assigned uid, where group is root, then by virtue of everything being group writable, application can still update files under
/src
. TheHOME
variable being set ensures that anything written to home directory by code goes into writable/src
area.
OC常用命令合集
oc get nodes # 获取集群所有节点
oc describe node node-name # 查看对应节点详细信息,可以看到运行在该节点下的pod
oc get pods -n namespace-name # 查看对应namespace下pod
oc get all # 查看当前project下的所有资源
oc status # 查看登录在哪个project下
oc get pods -o wide -n namespace-name # 查看对应namespace下pod详情
oc describe pod pod-name -n namespace-name # 查看pod详细信息
oc get limitrange -n namespace-name # 获取对应namespace的limitrange配置文件
oc describe limitrange limitrange.config -n namespace-name # 查看配置文件详情
oc edit limitrange limitrange.config -n namespace-name # 修改limitrange配置
oc project project-name # 切换到project
oc adm policy add-scc-to-user anyuid -z default # 为该project下default账户开启anyuid,可以使用root权限,一般是安装/运行某些软件时需要
oc adm policy remove-scc-from-user anyuid -z default # 删除default的anyuid权限
oc get pod # 查看该project下的pod
oc get service # 查看该project下的service
oc get endpoints # 查看该project下的Endpoints
oc delete pod pod-name -n namespace-name # 重启pod
oc rollout history DeploymentConfig/dc-name # 查看dc历史版本
oc rollout history DeploymentConfig/dc-name --revision=5 # 回滚到5版本
oc scale dc pod-name --replicas=3 -n namespace # 设置副本数为3
oc autoscale dc dc-name --min=2 --max=10 --cpu-percent=80 # 设置自动伸缩最小2,最大10
oc get scc # 查看scc
oc describe scc anyuid # 查看anyuid详细信息,user即包含已经开启anyuid的project
oc describe clusterrole.rbac # 查看集群管理员角色及权限
oc describe clusterrolebinding.rbac # 查看用户组及绑定的角色
oc adm policy add-cluster-role-to-user cluster-admin username # 添加username为cluster-admin
oc get routes --all-namespaces # 查看所有namespace的route
oc logs -f pod-name # 查看pod log
docker ps -a|grep pod-name # 查看pod对应containerID
docker exec -it containerID /bin/sh # 登录到container
oc new-project my-project # 创建project
oc status # 查看当前项目状态
oc api-resources # 查看server支持的api资源
oc adm must-gather # 收集当前集群的状态信息
oc adm top pods # 查看pod资源状态
oc adm top node # 查看节点资源状态
oc adm top images # 查看images使用情况
oc adm cordon node1 # 标记node1为SchedulingDisabled
oc adm manage-node <node1> --schedulable = false # 标记node1为unschedulable
oc adm manage-node node1 --schedulable # 标记node1为schedulable
oc adm drain node1 # 将node1进入维护模式
oc delete node # 删除node
oc get csr # 查询CSR(certificate signing requests)
oc adm certificate approve csr-name # approve CSR
oc adm certificate deny csr_name # 拒掉csr
oc get csr|xargs oc adm certificate approve csr # approve所有csr
echo 'openshift_master_bootstrap_auto_approve=true' >> /etc/ansible/hosts # 设置自动approve csr
oc get project projectname # get project
oc describe project projectname # 查看project信息
oc get pod pod-name -o yaml # 查看pod-name yaml文件
oc get nodes --show-labels # 查看节点label
oc label nodes node-name label-key=label-value # 给node添加标签,pod也要有对应label(在第二层spec下添加nodeSelector),pod就会运行在指定的node上
oc label nodes node-name key=new-value --overwrite # 更新指定key值的label
oc label nodes node-name key- # 删除指定key值的label
oc adm manage-node node-name --list-pods # 查看运行在某node上的所有pod
oc adm manage-node node-name --schedulable=false # mark node as unschedulable
oc login --token=iz56jscHZp9mSN3kHzjayaEnNo0DMI_nRlaiJyFmN74 --server=https://console.qa.c.sm.net:8443 # 使用命令行登录openshift,token是用你自己的账户在登录网址时生成的token
TOKEN=$(oc get secret $(oc get serviceaccount default -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode ) # get token
APISERVER=$(oc config view --minify -o jsonpath='{.clusters[0].cluster.server}') # get apiserver
curl $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure # curl API
oc rsh pod-name # 使用命令行登录pod
oc whoami --show-server # 查看当前登录的服务地址
oc whoami # 查看当前登录账户
oc get dc -n namespace # 查看deployment config
oc get deploy -n namespace # 查看deploy
oc edit deploy/deployname -o yaml -n namespace # 编辑deploy yaml文件
oc get cronjob # 查看cronjob
oc edit cronjob/cronjob-name -n namespace-name # 编辑cronjob
oc describe cm/configmap-name -n namespace-name # 查看cronjob
oc get configmap -n namespace-name # 查询configmap
oc get cm -n namespace-name # 查询configmap
cat /etc/origin/master/master-config.yaml|grep cidr # 查看集群pod网段规划
oc edit vm appmngr54321-poc-msltoibh -n appmngr54321-poc -o yaml # 编辑VM yaml文件
oc serviceaccounts get-token sa-name # 获取sa-name的token
oc login url --token=token # 使用token登录
oc scale deployment deployment-name --replicas 5 # 扩展pod副本数量
oc config view # 查看config
oc api-versions # get api versions
oc api-resources # get api-resources
oc get hpa --all-namespaces# 查询HPA
oc describe hpa/hpaname -n namespace # 查看HPA,可以看到Metrics,Events
oc create serviceaccount caller # 创建sa
oc adm policy add-cluster-role-to-user cluster-admin -z caller # 赋予cluster-admin权限
oc serviceaccounts get-token caller # get sa token
echo $KUBECONFIG # 获取kubeconfig文件位置
oc get cm --all-namespaces -l app=deviation # 根据labelselector筛选cm
oc describe PodMetrics podname # 查询pod CPU/mem usage,openshift4.X适用
oc api-resources -o wide # 查看shortnames、apiGroup、verbs
oc delete pod --selector logging-infra=fluentd # 按selector删除
oc get pods -n logger -w # watch 某个pod状态
oc explain pv.spec # 查看资源对象的定义
oc get MutatingWebhookConfiguration # 查看MutatingWebhook
oc get ValidatingWebhookConfiguration # 查看ValidatingWebhook
oc annotate validatingwebhookconfigurations <validating_webhook_name> service.beta.openshift.io/inject-cabundle=true # 给validatingwebhook注入CA
oc annotate mutatingwebhookconfigurations <mutating_webhook_name> service.beta.openshift.io/inject-cabundle=true # 给mutatingwebhook注入CA
oc get pv --selector=='path=testforocp' # 根据label查询pv
oc get cm --all-namespaces -o=jsonpath='{.items[?(@..metadata.annotations.cpuUsage=="0")].metadata.name}' # 根据自定义annotations查询,返回cm name
ns=$(oc get cm --selector='app=dev' --all-namespaces|awk '{print $1}'|grep -v NAMESPACE)
for i in $ns;do oc get cm dev -oyaml -n $i >> /tmp/test.txt; done;
问题排查
- 所有节点重启后,只有master状态是ready,其他是NotReady,如下
NAME STATUS ROLES AGE
infra.ocp.cn NotReady infra 6d
master.ocp.cn Ready master 6d
node1.ocp.cn NotReady compute 6d
node2.ocp.cn NotReady compute 6d
此时docker ps
会发现所有容器都未启动,然后手动启动所有容器:docker ps -aq | xargs docker restart
,发现容器之间是有依赖的,所以需要重复执行一下,在观察,发现总有那么几个容器一会自动退出了,并且在master上查看节点仍然是NotReady状态
原因有可能是node节点上的origin-node服务未启动,执行下列命令启动该进程即可
#启动
systemctl start origin-node
# 查看启动状态
systemctl status origin-node -l
这个服务会拉起所有的容器,那些起不来的容器会重新创建,然后稍等片刻,在master处可以看到节点是ready状态了。所以每次节点重启后除了master都要启动下该服务
那这个服务究竟是什么呢?k8s里面不是kubelet么,但是我没找到该服务,查看下这个服务的详情,它代表的应该是一个OpenShift Node
cat /etc/systemd/system/origin-node.service
[Unit]
Description=OpenShift Node
After=docker.service
After=chronyd.service
After=ntpd.service
Wants=docker.service
Documentation=https://github.com/openshift/origin
Wants=dnsmasq.service
After=dnsmasq.service
[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/origin-node
ExecStart=/usr/local/bin/openshift-node
LimitNOFILE=65536
LimitCORE=infinity
WorkingDirectory=/var/lib/origin/
SyslogIdentifier=origin-node
Restart=always
RestartSec=5s
TimeoutStartSec=300
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
参考
- https://www.jianshu.com/p/a83cece9f305
- https://docs.openshift.com/container-platform/3.10/install/running_install.html
- https://blog.aishangwei.net/?p=1179#2-2
- https://blog.csdn.net/qq_16240085/article/details/86004707
- https://docs.openshift.com/container-platform/3.11/welcome/index.html
- https://stackoverflow.com/questions/42363105/permission-denied-mkdir-in-container-on-openshift
以上是关于CentOS7.9上部署OpenShift3.11集群的主要内容,如果未能解决你的问题,请参考以下文章