Analyzing and fixing very high network bandwidth usage on an Ambari Server
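Ambari is an open source Hadoop management system from Hortonworks. Its server is written mainly in Java and its agents in Python. As far as I know it is currently the only open source Hadoop management system on the market, so although Ambari has plenty of problems and is not pleasant to use, there is no real alternative.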
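Recently the monitoring system kept alerting that a URL was unreachable. The URL belonged to one of our Ambari servers.
So I logged in to the server to find out what was going on.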
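Checking the network with iftop showed very high usage: about 700 Mbps, sustained. The NIC's maximum bandwidth is 1000 Mbps, so roughly 70% of it was in use. No wonder the alert fired; this is not normal.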
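iftop is interactive, so for a quick number that can be logged or compared across servers, the kernel's interface counters can be sampled instead. The following is a minimal sketch, not what was actually run here: the function names are made up for illustration, and `bond0` with a 5 second window are assumed defaults (the interface name is borrowed from the tcpdump command in this post).

```shell
#!/bin/sh
# Convert the growth of a byte counter over an interval into whole Mbps.
# Arguments: start_bytes end_bytes seconds
bytes_to_mbps() (
  start=$1; end=$2; seconds=$3
  echo $(( (end - start) * 8 / (seconds * 1000000) ))
)

# Sample the kernel's rx/tx byte counters for one interface twice and
# print the average rate over the window. Both defaults are assumptions.
iface_mbps() (
  iface=${1:-bond0}; seconds=${2:-5}
  stats=/sys/class/net/$iface/statistics
  rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
  sleep "$seconds"
  rx2=$(cat "$stats/rx_bytes"); tx2=$(cat "$stats/tx_bytes")
  echo "rx=$(bytes_to_mbps "$rx1" "$rx2" "$seconds") tx=$(bytes_to_mbps "$tx1" "$tx2" "$seconds") (Mbps)"
)

# Usage on the affected server: iface_mbps bond0 5
```

As a sanity check on the arithmetic, 87,500,000 bytes transferred in one second is 700 Mbps, the level seen on this server.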
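Looking more closely, the Ambari server was exchanging a large number of packets with every slave node, so some service had to be responsible. I suspected Ganglia.
To be safe, I logged in to another Ambari server and found that its network usage was very low, under 1 Mbps.
I captured packets with tcpdump and then analyzed them with Wireshark.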
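tcpdump -i bond0 -w traffic.cap -G 60 -W 1 'src host SLAVENODE'
-G: rotate the dump file every n seconds. Because the file name here contains no time format, each rotation would overwrite the previous file.
-W: used together with -G, limit the number of rotated files to n and exit once that limit is reached. With -G 60 -W 1, tcpdump captures for 60 seconds and then exits.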
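After the capture finished, I copied the file to my own machine and opened it in Wireshark. Nearly all of the packets were metrics related, which pointed to Ganglia, because the other Ambari server does not have Ganglia installed.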
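If copying the capture to a desktop is inconvenient, a rough first pass can be made on the server itself by counting packets per destination port. This is only a sketch of that idea, not the Wireshark analysis described above: `count_dst_ports` is a hypothetical helper, and it assumes IPv4 TCP/UDP lines in the text format printed by `tcpdump -nn`.

```shell
#!/bin/sh
# Read tcpdump's text output on stdin and print "port packets" for each
# destination port, busiest first. Only IPv4 lines whose destination has
# the form a.b.c.d.port are counted.
count_dst_ports() {
  awk '$2 == "IP" {
         dst = $5
         sub(/:$/, "", dst)          # "10.0.0.1.8649:" -> "10.0.0.1.8649"
         n = split(dst, part, ".")   # four address octets plus the port
         if (n == 5) count[part[5]]++
       }
       END { for (p in count) print p, count[p] }' | sort -k2,2nr
}

# Usage: tcpdump -nn -r traffic.cap | count_dst_ports | head
```

A single port dominating the list is a hint about which service to look at, but the port alone does not prove what the payload is, so it is still worth opening a few packets.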
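Although the Ganglia service had already been stopped, my guess is that the agents kept sending metrics to the server without pause, which caused the high network usage.
So I removed Ganglia. The Ambari web UI offers no way to delete it, so it has to be done through the Ambari API.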
# STOP THE GANGLIA SERVICE
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/services/GANGLIA
# STOP THE SERVER AND MONITOR ON THE GANGLIA SERVER
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/hosts/<GANGLIA_SERVER_FQDN>/host_components/GANGLIA_SERVER
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/hosts/<GANGLIA_SERVER_FQDN>/host_components/GANGLIA_MONITOR
# STOP THE GANGLIA MONITOR ON ***EVERY*** NODE (REPEAT FOR EACH NODE WHERE GANGLIA IS MONITORING):
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/hosts/<HOST_FQDN>/host_components/GANGLIA_MONITOR
# STOP EACH SERVICE COMPONENT:
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop All Components"},"Body":{"ServiceComponentInfo":{"state":"INSTALLED"}}}' http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/services/GANGLIA/components/GANGLIA_SERVER
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop All Components"},"Body":{"ServiceComponentInfo":{"state":"INSTALLED"}}}' http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/services/GANGLIA/components/GANGLIA_MONITOR
# RERUN TO CHECK ALL COMPONENTS ARE STOPPED:
curl --user admin:admin http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/services/GANGLIA
# REMOVE THE GANGLIA SERVICE:
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<AMBARI_NODE>:8080/api/v1/clusters/<CLUSTER_NAME>/services/GANGLIA
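On a cluster with many nodes, repeating the per-host call by hand gets tedious. The calls above can be wrapped in a few helper functions, sketched below under some assumptions: the function names and the `DRY_RUN` switch are hypothetical, `AMBARI_URL` and `CLUSTER` default to the placeholders used in this post, and the `admin:admin` credentials are the ones from the commands above. In dry-run mode (the default) nothing is sent; each request is only printed.

```shell
#!/bin/sh
# Placeholders from the post; set real values before running for real.
AMBARI_URL=${AMBARI_URL:-http://AMBARI_NODE:8080}
CLUSTER=${CLUSTER:-CLUSTER_NAME}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 prints requests, 0 sends them

# Build the JSON body used by the PUT calls: a context string plus the
# section name (ServiceInfo, HostRoles, ...) whose state is set to INSTALLED.
stop_payload() {
  printf '{"RequestInfo":{"context":"%s"},"Body":{"%s":{"state":"INSTALLED"}}}' "$1" "$2"
}

# Send (or, in dry-run mode, print) one API request.
# Arguments: METHOD path-below-the-cluster [json-body]
ambari_call() (
  method=$1; path=$2; body=${3:-}
  url="$AMBARI_URL/api/v1/clusters/$CLUSTER/$path"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$method $url${body:+ $body}"
  elif [ -n "$body" ]; then
    curl -sS -u admin:admin -H 'X-Requested-By: ambari' -X "$method" -d "$body" "$url"
  else
    curl -sS -u admin:admin -H 'X-Requested-By: ambari' -X "$method" "$url"
  fi
)

# Stop GANGLIA_MONITOR on every host given as an argument.
stop_monitors() (
  for host in "$@"; do
    ambari_call PUT "hosts/$host/host_components/GANGLIA_MONITOR" \
      "$(stop_payload 'Stop Component' HostRoles)"
  done
)

# Usage:
#   stop_monitors node1.example.com node2.example.com   # dry run
#   DRY_RUN=0; ambari_call DELETE services/GANGLIA       # really delete
```

Reviewing the dry-run output first makes it easy to confirm the host names and URLs before anything is changed on the cluster.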
After removing Ganglia, I checked with iftop again and network usage was back to a normal level.
This article comes from the "Linux运维" blog. Reposting is not permitted.