Setting up the CloudWatch exporter for Prometheus on a Mesosphere DCOS cluster

Posted 2023-02-16


Original title: Setting up cloud watch exporter for prometheus on mesosphere DCOS cluster

Asked: 2015-11-03 04:49:00

Question:

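I have set up the CloudWatch exporter for Prometheus on my AWS Mesosphere DCOS cluster, and I have enabled the "CloudWatchFullAccess" policy. However, the gauge "cloudwatch_exporter_scrape_error" shows a non-zero value. I would like to know why the scrape is failing.

Where can I view the logs, or how can I debug this problem?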

The config file I am using is:

    {
      "region": "ap-southeast-1",
      "metrics": [
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
         "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
         "aws_dimension_select": {"LoadBalancerName": ["name of my LB"]},
         "aws_statistics": ["Sum"]
        }
      ]
    }

But apart from cloudwatch_requests_total, cloudwatch_exporter_scrape_duration_seconds and cloudwatch_exporter_scrape_error, no metrics are exposed to Prometheus.

How can I get additional metrics from cloudwatch_exporter?

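Comments:

Author of the cloudwatch_exporter here. Can you verify the credentials shown under curl http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLENAME? Also, is there any output on stderr/stdout?

Thanks for looking into this. When I run the curl command, I get the error "Failed to connect to 169.254.169.254 port 80: Connection refused".

In my setup, cloudwatch_exporter is listening on port 9106, so I look for metrics at http://:9106/metrics. All metrics should be exposed there, right?

If you can't access 169.254.169.254, that sounds like a networking problem on your machine. As a workaround, I'd suggest creating an IAM user and putting the credentials in environment variables. Yes, they'd be there.

Answer 1: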

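You appear to be trying to use an IAM instance profile, but you cannot reach http://169.254.169.254. This is some form of problem with your network setup, because this should work out of the box on EC2.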
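A minimal Python sketch of the check suggested in the comments: ask the instance metadata endpoint which IAM role is attached, and report whether the endpoint is reachable at all. The function names, the 2-second timeout and the injectable `fetch` parameter are illustrative assumptions, not part of cloudwatch_exporter. It also assumes the endpoint answers plain GET requests as in this thread; an instance that requires session tokens would show up here as an error.

```python
import urllib.request

# Endpoint quoted in the comments above, without the trailing role name,
# so the response is expected to list the attached role(s).
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def http_get(url, timeout=2.0):
    """Default fetcher: return the decoded body of a plain GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def check_instance_profile(fetch=http_get, url=METADATA_URL):
    """Return (ok, detail) describing the instance metadata endpoint.

    `fetch` is injectable (hypothetical helper) so the check can be
    exercised without network access.
    """
    try:
        body = fetch(url)
    except OSError as exc:
        # URLError, HTTPError, timeouts and refused connections are all OSError.
        return False, f"cannot reach metadata endpoint: {exc}"
    roles = [line.strip() for line in body.splitlines() if line.strip()]
    if not roles:
        return False, "endpoint reachable, but no IAM role is attached"
    return True, "IAM role(s) found: " + ", ".join(roles)


# Usage on the host or inside the container that runs the exporter:
#   ok, detail = check_instance_profile()
#   print(ok, detail)
```

Running it from inside the container as well as from the host should help tell a host-level network problem apart from a container networking one.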

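You have two options. The first is to fix the network problem so that the instance profile works. The second is to create an IAM user with the cloudwatch:ListMetrics and cloudwatch:GetMetricStatistics permissions and provide its credentials through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or in ~/.aws/credentials.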

See https://github.com/prometheus/cloudwatch_exporter#credentials-and-permissions
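Before starting the exporter with static credentials, it can help to confirm that they are visible to the process. The sketch below only checks that the two values named in this answer are present, either as environment variables or in a credentials file. The function name, its parameters and the `default` profile are assumptions for illustration, and nothing is validated against AWS.

```python
import configparser
import os
from pathlib import Path

# Environment variable names taken from the answer above.
ENV_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def find_static_credentials(env=None, credentials_file=None, profile="default"):
    """Return 'environment', 'credentials file', or None.

    Presence check only: the values are not verified with AWS.
    """
    env = os.environ if env is None else env
    if all(env.get(key) for key in ENV_KEYS):
        return "environment"

    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently skipped
    # The file uses the lower-case spellings of the variable names.
    if parser.has_section(profile) and all(
        parser.get(profile, key.lower(), fallback="") for key in ENV_KEYS
    ):
        return "credentials file"
    return None


# Usage, run as the same user (or in the same container) as the exporter:
#   print(find_static_credentials())
```

A result of `None` inside the container while the host reports `environment` would suggest the variables were not passed through to the container.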

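Comments:

I am running the cloudwatch_exporter docker container. How do I provide AWS credentials to the docker container?

After providing the correct credentials in the docker container, I got the relevant metrics from the CloudWatch exporter. Thanks for your help!

Answer 2: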

Below is my config file:

    {
      "region": "us-west-2",
      "metrics": [
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
         "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "UnHealthyHostCount",
         "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "RequestCount",
         "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Sum"]},
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "Latency",
         "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]},
        {"aws_namespace": "AWS/ELB", "aws_metric_name": "SurgeQueueLength",
         "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Maximum", "Sum"]}
      ]
    }

I can see the output below:

cloudwatch_requests_total 10.0

cloudwatch_exporter_scrape_duration_seconds 2.571412647

cloudwatch_exporter_scrape_error 0.0

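But why are the other metrics missing?

Comments:

For the AWS/ELB namespace, the valid aws_dimensions are LoadBalancerName and AvailabilityZone. See link. You need to change the config file and enter either valid dimensions or no dimensions to get metrics for the AWS/ELB namespace. Hope this helps.

It worked for me. I have one question: if we deploy it in docker, will it only scrape metrics for the native host?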
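The dimension rule from the first comment can be checked mechanically. The sketch below flags any aws_dimensions entry that is not a valid dimension name for AWS/ELB, using only the two names given in that comment. The function name and the lookup table are illustrative. The "fixed" config assumes that "us-west-2a" and "test" were meant as an availability zone and a load balancer name, and moves them into aws_dimension_select as in the config from the question.

```python
import json

# Valid dimension names for AWS/ELB, as stated in the comment above.
VALID_DIMENSIONS = {"AWS/ELB": {"LoadBalancerName", "AvailabilityZone"}}


def find_invalid_dimensions(config):
    """Return (metric_name, bad_dimension) pairs for known namespaces."""
    problems = []
    for metric in config.get("metrics", []):
        valid = VALID_DIMENSIONS.get(metric.get("aws_namespace"))
        if valid is None:
            continue  # namespace not covered by this sketch
        for dimension in metric.get("aws_dimensions", []):
            if dimension not in valid:
                problems.append((metric.get("aws_metric_name"), dimension))
    return problems


# One metric from the config in Answer 2: values used where names belong.
broken = json.loads("""
{"region": "us-west-2", "metrics": [
  {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
   "aws_dimensions": ["us-west-2a", "test"], "aws_statistics": ["Average"]}
]}
""")

# Assumed correction: dimension names in aws_dimensions, values in the select.
fixed = json.loads("""
{"region": "us-west-2", "metrics": [
  {"aws_namespace": "AWS/ELB", "aws_metric_name": "HealthyHostCount",
   "aws_dimensions": ["AvailabilityZone", "LoadBalancerName"],
   "aws_dimension_select": {"AvailabilityZone": ["us-west-2a"],
                            "LoadBalancerName": ["test"]},
   "aws_statistics": ["Average"]}
]}
""")

print(find_invalid_dimensions(broken))  # two problems reported
print(find_invalid_dimensions(fixed))   # empty list
```

With invalid dimension names the exporter matches no CloudWatch metrics, which would be consistent with the output above: the scrape reports no error, yet only the exporter's own three metrics appear.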
