Building a ZooKeeper + Mesos + Marathon Platform to Manage a Docker Cluster, Step by Step
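While watching Docker videos on YouTube recently, I had the misfortune of running into an introduction to Mesos, and it felt like meeting an old friend. In the end I followed the documentation on the Mesosphere website and built a ZooKeeper + Mesos + Marathon platform on IBM Bluemix virtual machines.
Before building anything, here is a quick look at what each component does (from Wikipedia and other online sources).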
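ZooKeeper: a distributed coordination service that started as a subproject of Apache Hadoop and is now a top-level Apache project. It mainly solves data management problems that distributed applications run into all the time, such as unified naming, state synchronization, cluster management, and management of distributed application configuration. Website: https://zookeeper.apache.org/
Mesos: Mesosphere defines Mesos as the kernel of the next-generation cloud data center. It is an open-source distributed resource management framework under Apache. One of its authors, Benjamin, keeps stressing at MesosCon that Mesos only does the kernel's job: it only does scheduling and does not actually run tasks. He has since joined Mesosphere to work on DC/OS, an operating system for the cloud data center (which looks like very exciting software).
Marathon: a framework for Mesos that supports long-running tasks, which is where the name fits: a marathon is itself a task that takes a long time to finish. It provides a REST API, and service discovery and load balancing can be implemented through HAProxy. (For load balancing, see Mesosphere's open-source marathon-lb: https://github.com/mesosphere/marathon-lb/)
Docker: needs no introduction. Just remember that it is an application container engine that gives microservices an ideal runtime environment; try to run only one service per container.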
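Now for the server environment: six servers in total, three running Mesos-master and four running Mesos-slave (one master doubles as a slave). All of them are virtual machines requested on IBM Bluemix, sitting in one VPN network. The layer underneath is an OpenStack-based architecture, and below that are SoftLayer and Cloud Foundry.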
Server Name | Internal IP | Operating System | Roles |
bastion.shanker | 192.168.0.33 | CentOS 6.7 | Mesos Master1, ZooKeeper, Slave, Jenkins, HAProxy |
dbmaster.shanker | 192.168.0.28 | CentOS 6.7 | Mesos Master2, ZooKeeper |
dbslave2.shanker | 192.168.0.31 | CentOS 6.7 | Mesos Master3, ZooKeeper |
dbslave3.shanker | 192.168.0.32 | CentOS 6.7 | Mesos Slave |
dbslave.shanker | 192.168.0.29 | Ubuntu 14.04 | Mesos Slave |
dbarbiter.shanker | 192.168.0.30 | Ubuntu 14.04 | Mesos Slave, MySQL Slave |
Software installation:
RedHat 6 / CentOS 6
# Add the repository
sudo rpm -Uvh http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
yum -y install mesos marathon zookeeper
Install Java on all machines with ansible:
ansible mesos -m shell -a "wget http://download.oracle.com/otn-pub/java/jdk/8u73-b02/jdk-8u73-linux-x64.tar.gz"
ansible mesos -m shell -a "tar zxf jdk-8u73-linux-x64.tar.gz -C /usr/java/"
Add the Java environment variables to .zshrc, then push the file out with ansible:
export JAVA_HOME=/usr/java/jdk1.8.0_73
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Master Node Setup
ZooKeeper configuration:
Set an ID on each of the three ZooKeeper servers. The number must be an integer from 1 to 255, and every node must have a different ID.
sudo zookeeper --server-initialize --myid=<YOUR ID HERE>
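Edit /etc/zookeeper/conf/zoo.cfg. On every machine, add the ID, IP address, and ports of each server in the zk cluster, in the form server.A=B:C:D:
server.1=192.168.0.33:2888:3888
server.2=192.168.0.28:2888:3888
server.3=192.168.0.31:2888:3888
A: an integer from 1 to 255 that identifies the server. You can pick any value, as long as it matches the ID set on that server.
B: the server's IP address.
C: the port this server uses to communicate with the cluster Leader.
D: the port used for leader election, for example to elect a new Leader when the current one goes down. If several zk instances run on the same host, they cannot share ports, so each instance needs its own C and D values.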
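The server.A=B:C:D format is easy to get wrong by hand when the cluster grows. Below is a minimal Python sketch that generates those lines from a mapping of ID to IP. The function name zk_server_lines and the variable ensemble are made up for illustration; the default ports are the ones used in this article.

```python
def zk_server_lines(hosts, peer_port=2888, election_port=3888):
    """Return the server.A=B:C:D lines for zoo.cfg, one per host.

    hosts maps each server ID (A) to its IP address (B).
    """
    lines = []
    for server_id, ip in sorted(hosts.items()):
        # The article requires IDs to be integers from 1 to 255.
        if not 1 <= server_id <= 255:
            raise ValueError(f"server ID {server_id} is outside 1-255")
        lines.append(f"server.{server_id}={ip}:{peer_port}:{election_port}")
    return lines


# The three ZooKeeper servers used in this article.
ensemble = {1: "192.168.0.33", 2: "192.168.0.28", 3: "192.168.0.31"}
print("\n".join(zk_server_lines(ensemble)))
```

Using a dict keyed by ID means two servers cannot end up with the same ID, which is one of the requirements stated above.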
The final zk configuration:
# egrepv /etc/zookeeper/conf/zoo.cfg
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=192.168.0.33:2888:3888
server.2=192.168.0.28:2888:3888
server.3=192.168.0.31:2888:3888
That completes a simple zk configuration. Restart the service with sudo service zookeeper restart.
Mesos & Marathon configuration:
On every master node, add the following file and write the IP addresses of the masters (which also run ZooKeeper) into it:
# cat /etc/mesos/zk
zk://192.168.0.33:2181,192.168.0.28:2181,192.168.0.31:2181/mesos
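The zk:// URL is just the list of ZooKeeper endpoints joined by commas, followed by the znode path. Here is a small sketch that builds it; the helper name mesos_zk_url is hypothetical, and the port and path defaults are the values used in this article.

```python
def mesos_zk_url(hosts, port=2181, path="/mesos"):
    """Build the zk:// URL stored in /etc/mesos/zk from a list of ZooKeeper hosts."""
    endpoints = ",".join(f"{host}:{port}" for host in hosts)
    return f"zk://{endpoints}{path}"


print(mesos_zk_url(["192.168.0.33", "192.168.0.28", "192.168.0.31"]))
# zk://192.168.0.33:2181,192.168.0.28:2181,192.168.0.31:2181/mesos
```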
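The quorum must be a strict majority of the Mesos masters: take the smallest integer greater than half the number of masters. We use 3 masters; 3 divided by 2 is 1.5, and the smallest integer above that is 2, so the value here is 2.
# cat /etc/mesos-master/quorum
2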
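The quorum rule can be written as integer division plus one, which gives a strict majority for both odd and even master counts. A minimal sketch, with a function name of my own choosing:

```python
def mesos_quorum(master_count):
    """Return the smallest number of masters that forms a strict majority."""
    if master_count < 1:
        raise ValueError("at least one master is required")
    return master_count // 2 + 1


for masters in (1, 3, 5):
    print(masters, "masters -> quorum", mesos_quorum(masters))
# 1 masters -> quorum 1
# 3 masters -> quorum 2
# 5 masters -> quorum 3
```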
Setting the hostname is optional; if DNS is available you can skip it. To avoid hostname resolution failures later on, I set it anyway: in the file /etc/mesos-{master,slave}/hostname on every machine (slaves included), write that machine's IP address. Note: each machine only needs its own IP, not the IPs of all machines:
$ ansible all -m shell -a 'cat /etc/mesos-master/hostname' -s
bastion | success | rc=0 >>
192.168.0.33
dbmaster | success | rc=0 >>
192.168.0.28
dbslave2 | success | rc=0 >>
192.168.0.31
Then copy the hostname file into the Marathon configuration directory:
cp /etc/mesos-master/hostname /etc/marathon/conf/
If you do not want the master machines to run a slave as well, override the slave service so that it does not start:
sudo stop mesos-slave
sudo sh -c "echo manual > /etc/init/mesos-slave.override"
If nothing has gone wrong, you can now start Mesos-master and Marathon.
ansible master -m shell -a "start mesos-master && start marathon"
Slave Node Setup
Install the mesos-slave software
Debian/Ubuntu
# Setup
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E56151BF
DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
CODENAME=$(lsb_release -cs)

# Add the repository
echo "deb http://repos.mesosphere.com/${DISTRO} ${CODENAME} main" |
  sudo tee /etc/apt/sources.list.d/mesosphere.list
sudo apt-get -y update
sudo apt-get -y install mesos
RedHat/CentOS6
# Add the repository
sudo rpm -Uvh
sudo yum -y install mesos
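On Ubuntu-like systems, the ZooKeeper that gets installed automatically needs to be disabled:
sudo service zookeeper stop
sudo sh -c "echo manual > /etc/init/zookeeper.override"
Configure zk with the same content as on the master nodes; you can distribute the file to the slave nodes with ansible:
# cat zk
zk://192.168.0.33:2181,192.168.0.28:2181,192.168.0.31:2181/mesos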
To run Docker, two more files are needed under /etc/mesos-slave/ on each slave node, containerizers and executor_registration_timeout, with the following content:
# ibmcloud at dbarbiter.shanker in /etc/mesos-slave [15:43:23]
$ cat containerizers
docker,mesos
# ibmcloud at dbarbiter.shanker in /etc/mesos-slave [15:43:27]
$ cat executor_registration_timeout
5mins
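Since each of these files holds a single value, creating them can be scripted. The sketch below writes both files into a directory of your choice; the names SLAVE_SETTINGS and write_slave_settings are made up for illustration, and the demo writes to a temporary directory rather than /etc/mesos-slave so that it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# File name -> file content, as shown in this article.
SLAVE_SETTINGS = {
    "containerizers": "docker,mesos",
    "executor_registration_timeout": "5mins",
}


def write_slave_settings(config_dir):
    """Write one file per setting into config_dir and return the written paths."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, value in SLAVE_SETTINGS.items():
        path = config_dir / name
        path.write_text(value + "\n")
        written.append(path)
    return written


# Demo against a scratch directory; on a real slave you would pass /etc/mesos-slave.
with tempfile.TemporaryDirectory() as scratch:
    for path in write_slave_settings(scratch):
        print(path.name, "->", path.read_text().strip())
```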
That is all the configuration needed. Next, disable mesos-master on the slave nodes, otherwise the election will fail with baffling errors:
sudo service mesos-master stop
sudo sh -c "echo manual > /etc/init/mesos-master.override"
Then start the slave service
Ubuntu:
sudo service mesos-slave restart
CentOS:
sudo start mesos-slave
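Then enter the IP of any master node followed by :5050 in a browser and check that the Mesos console appears. If the IP you entered is not the leader's, Mesos automatically redirects to the Leader's IP address:
After opening the Mesos-master dashboard you will see the Cluster Name, the current Leader's IP address, the number of Slaves, the Resources in detail, the Active Tasks (the tasks currently running), and the Completed Tasks:
If you are not sure which IP the current Marathon Leader is on, look it up on the Frameworks page of Mesos-master:
The Marathon UI looks like this:
This UI makes it easy to create applications, and you can edit them as JSON as long as JSON Mode in the upper-right corner is selected:
For example, let's create a Python application and check afterwards that Docker runs correctly:
With this platform, creating Docker-based applications is very easy, and scaling takes only seconds. For example, having created an nginx application that needs to scale to 5 instances, we just click Scale Application and enter 5, and the backend scales to 5 nginx instances automatically.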
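Scaling does not have to go through the UI. Marathon's REST API accepts a PUT to /v2/apps/<app id> with a new instances value. The sketch below only builds that request without sending it; the function name scale_request is hypothetical, and the Marathon address and the /nginx app ID are the ones used in this article.

```python
import json
import urllib.request


def scale_request(marathon_url, app_id, instances):
    """Build (but do not send) the PUT request that sets an app's instance count."""
    url = f"{marathon_url.rstrip('/')}/v2/apps/{app_id.strip('/')}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = scale_request("http://192.168.0.33:8080", "/nginx", 5)
print(req.get_method(), req.full_url, req.data.decode())
# PUT http://192.168.0.33:8080/v2/apps/nginx {"instances": 5}
```

To actually send it, pass the request to urllib.request.urlopen(req) from a machine that can reach Marathon.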
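Then there is automatic failover. Suppose we go to the backend and kill a running container: Marathon detects that the running tasks no longer match the configured count and immediately starts a new instance.
Besides creating tasks through the web UI, you can also use the API:
For example, I have a JSON file for a 2048 game here. It uses the nginx image (nginx:latest), maps the local /home/ibmcloud/2048-master/ to nginx's root directory inside the container, uses BRIDGE networking, exposes port 80 of the container, and runs an HTTP health check against the container: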
$ cat nginx-bridge-2048game.json
{
  "id": "/nginx",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "/usr/share/nginx/html/",
        "hostPath": "/home/ibmcloud/2048-master/",
        "mode": "RO"
      }
    ],
    "docker": {
      "image": "nginx",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "protocol": "tcp",
          "servicePort": 80,
          "labels": {}
        }
      ],
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "healthChecks": [
    {
      "path": "/",
      "protocol": "HTTP",
      "portIndex": 0,
      "gracePeriodSeconds": 120,
      "intervalSeconds": 30,
      "timeoutSeconds": 5,
      "maxConsecutiveFailures": 3,
      "ignoreHttp1xx": false
    }
  ],
  "portDefinitions": [
    {
      "port": 10003,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
Run this on the Marathon Leader machine:
# curl -u username:password -i -H 'Content-Type: application/json' -d@nginx-bridge-2048game.json 192.168.0.33:8080/v2/apps
HTTP/1.1 201 Created
Date: Tue, 07 Jun 2016 07:59:17 GMT
X-Marathon-Leader: http://192.168.0.33:8080
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Location: http://192.168.0.33:8080/v2/apps/nginx
Content-Type: application/json; qs=2
Transfer-Encoding: chunked
Server: Jetty(9.3.z-SNAPSHOT)

{"id":"/nginx","cmd":null,"args":null,"user":null,"env":{},"instances":1,"cpus":1,"mem":128,"disk":0,"executor":"","constraints":[],"uris":[],"fetch":[],"storeUrls":[],"ports":[80],"portDefinitions":[{"port":80,"protocol":"tcp","labels":{}}],"requirePorts":false,"backoffSeconds":1,"backoffFactor":1.15,"maxLaunchDelaySeconds":3600,"container":{"type":"DOCKER","volumes":[{"containerPath":"/usr/share/nginx/html/","hostPath":"/home/ibmcloud/2048-master/","mode":"RO"}],"docker":{"image":"nginx","network":"BRIDGE","portMappings":[{"containerPort":80,"hostPort":0,"servicePort":80,"protocol":"tcp","labels":{}}],"privileged":false,"parameters":[],"forcePullImage":false}},"healthChecks":[{"path":"/","protocol":"HTTP","portIndex":0,"gracePeriodSeconds":120,"intervalSeconds":30,"timeoutSeconds":5,"maxConsecutiveFailures":3,"ignoreHttp1xx":false}],"readinessChecks":[],"dependencies":[],"upgradeStrategy":{"minimumHealthCapacity":1,"maximumOverCapacity":1},"labels":{},"acceptedResourceRoles":null,"ipAddress":null,"version":"2016-06-07T07:59:18.003Z","residency":null,"tasksStaged":0,"tasksRunning":0,"tasksHealthy":0,"tasksUnhealthy":0,"deployments":[{"id":"c30b9f6e-786c-45bd-9e7e-cc6cc001d91d"}],"tasks":[]}
For convenience, the command can be put into a script:
$ cat runapp.sh
#!/bin/bash
curl -i -H 'Content-Type: application/json' -d@"$1" 192.168.0.33:8080/v2/apps
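If you would rather not depend on curl, the same POST to /v2/apps can be assembled with Python's standard library. This is only a sketch under a few assumptions: the function name create_app_request is made up, the app definition here is a cut-down version of the JSON file above, and the optional credentials correspond to curl's -u username:password (HTTP Basic authentication). The request is built but not sent.

```python
import base64
import json
import urllib.request


def create_app_request(marathon_url, app_definition, credentials=None):
    """Build (but do not send) the POST request that creates a Marathon app."""
    headers = {"Content-Type": "application/json"}
    if credentials is not None:
        # Same effect as curl -u username:password (HTTP Basic authentication).
        username, password = credentials
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        headers["Authorization"] = "Basic " + token.decode("ascii")
    return urllib.request.Request(
        f"{marathon_url.rstrip('/')}/v2/apps",
        data=json.dumps(app_definition).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Cut-down app definition for illustration; load the full JSON file in practice.
app = {"id": "/nginx", "cpus": 1, "mem": 128, "instances": 1}
req = create_app_request("http://192.168.0.33:8080", app, ("username", "password"))
print(req.get_method(), req.full_url)
# POST http://192.168.0.33:8080/v2/apps
```

Sending it is a matter of calling urllib.request.urlopen(req); a 201 Created response like the one shown above indicates that the app was accepted.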
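As you can see, our 2048 game is now playable:
Finally, a word on security. Because I exposed Marathon to the public internet without authentication, some miscreant scanned my machine and created a Metasploit task to try to penetrate my servers. Fortunately my security was decent and the attacker did not succeed. See this article: http://shanker.blog.51cto.com/1189689/1785797
Marathon is started with authentication enabled like this: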
marathon --http_credentials "username:password"
Things to watch out for:
1. MESOS_QUORUM must be added to /etc/default/mesos-master. Even if /etc/mesos-master/quorum already holds the value, this step is still required (a solution from Stack Overflow).
# cat /etc/default/mesos-master
PORT=5050
ZK=`cat /etc/mesos/zk`
MESOS_QUORUM=`cat /etc/mesos-master/quorum`
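2. If the slave fails to start, delete the slave.info file: /tmp/mesos/meta/slaves/latest/slave.info
3. When the node you visit is not the current leader and the browser redirects automatically, the redirect target comes from /etc/mesos-master/hostname. Each machine's hostname file contains only its own IP address, not the IPs of all master nodes.
Marathon's hostname is the same as the master's hostname; just cp the file over.
4. The following error appears because mesos-master was not disabled on your slave, or a mesos-master process was started there: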
Failed to create ZooKeeper, zookeeper_init: No such file or directory
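5. If you hit the following problem, check that /etc/mesos-master/quorum is set to the same number on every master and that the /etc/default/mesos-master configuration file also has the QUORUM setting; see item 1.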
Replica in VOTING status received a broadcasted recover request from 192.168.0.33:37798
6. Check the LOG on the Mesos web page. Output like the following means things are working normally:
I0524 07:53:33.242832 27506 http.cpp:312] HTTP GET for /master/state from 124.193.167.1:18148 with User-Agent='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36'
I0524 07:53:41.834223 27507 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
I0524 07:53:44.241183 27509 http.cpp:312] HTTP GET for /master/state from 124.193.167.1:18148 with User-Agent='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36'
I0524 07:53:52.034374 27502 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
I0524 07:53:55.241883 27504 http.cpp:312] HTTP GET for /master/state from 124.193.167.1:18148 with User-Agent='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36'
I0524 07:54:02.236343 27505 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
I0524 07:54:06.243017 27508 http.cpp:312] HTTP GET for /master/state from 124.193.167.1:18148 with User-Agent='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36'
I0524 07:54:13.141284 27507 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
I0524 07:54:22.238883 27509 http.cpp:312] HTTP GET for /master/state from 124.193.167.1:18148 with User-Agent='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36'
I0524 07:54:23.345165 27509 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
I0524 07:54:33.545162 27509 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
I0524 07:54:34.240898 27503 http.cpp:312] HTTP GET for /master/state from 124.193.167.1:18148 with User-Agent='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36'
I0524 07:54:43.755383 27504 http.cpp:312] HTTP GET for /master/state from 192.168.0.33:37798 with User-Agent='Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0'
7. To support Docker tasks, these two files must be added under the slave's configuration directory:
/etc/mesos-slave/{containerizers,executor_registration_timeout}
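Several of the problems above come down to a setting missing from /etc/default/mesos-master. A small check like the following can flag that before the master is started. It is only a sketch: the function name missing_master_settings is hypothetical, the parser handles nothing beyond plain NAME=value lines, and the sample text mirrors the file shown under item 1.

```python
def missing_master_settings(defaults_text, required=("MESOS_QUORUM",)):
    """Return the required variable names that defaults_text never assigns."""
    assigned = set()
    for line in defaults_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        assigned.add(line.split("=", 1)[0].strip())
    return [name for name in required if name not in assigned]


sample = (
    "PORT=5050\n"
    "ZK=`cat /etc/mesos/zk`\n"
    "MESOS_QUORUM=`cat /etc/mesos-master/quorum`\n"
)
print(missing_master_settings(sample))         # []
print(missing_master_settings("PORT=5050\n"))  # ['MESOS_QUORUM']
```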
References:
https://open.mesosphere.com/getting-started/install/
https://www.youtube.com/watch?v=hZNGST2vIds&feature=youtu.be
https://www.youtube.com/watch?v=EgYyf3bSb8Q
https://www.youtube.com/watch?v=_uw1ISM_uRU
Additions are welcome!
This article comes from the "天涯海阁" blog. Please be sure to keep this attribution: http://shanker.blog.51cto.com/1189689/1787008