Prometheus Metric Aggregation and Filtering Techniques

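Monitoring is essential for running production systems. Prometheus, an open-source monitoring and alerting toolkit, is widely used because it is capable, flexible to configure, and easy to extend. This article looks at techniques for aggregating and filtering Prometheus metrics so that you can get more out of it.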

1. Introduction to Prometheus Metric Aggregation

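Metrics are the core concept in Prometheus: it monitors systems by scraping the metric data that targets expose. Prometheus defines four metric types: counter, gauge, histogram, and summary. The two basic ones are the gauge, which holds the value at a given moment and can go up or down, and the counter, which holds a cumulative value that only increases (or resets to zero when the process restarts).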
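The difference between a cumulative and an instantaneous value can be sketched in a few lines of Python. This is only an illustration: the function name `counter_increase` and the sample values are made up, and the calculation leaves out the extrapolation that PromQL's `increase()` applies at the window edges.

```python
def counter_increase(samples):
    """Total increase of a counter over a list of raw sample values.

    A drop between consecutive samples is treated as a counter reset
    (for example after a process restart), so the new value is counted
    as growth from zero.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr  # reset: the counter restarted from 0
    return total


# A counter only grows, except for the reset between 130 and 5.
requests_total = [100, 120, 130, 5, 25]
# A gauge can move freely; its latest sample is the current value.
memory_in_use_mb = [512, 640, 580]

print(counter_increase(requests_total))  # 55.0
print(memory_in_use_mb[-1])              # 580
```

For a counter, the useful information is the change between samples; for a gauge, the sample itself is already the value of interest.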

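To make metrics easier to manage and query, Prometheus provides relabeling, which this article groups under metric aggregation. Relabeling rules can rewrite metric names, add labels, modify labels, and drop labels or whole series. Rules under relabel_configs run on targets before a scrape, while rules under metric_relabel_configs run on the scraped samples, so any rule that reads the metric name (the __name__ label) belongs under metric_relabel_configs.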

2. Prometheus Metric Aggregation Techniques

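Renaming a metric

Renaming a metric can shorten its name and make it easier to read. Prometheus has no dedicated rename action; a metric is renamed by using the replace action to overwrite the __name__ label. For example, to rename http_requests_total to http_requests:

metric_relabel_configs:
  # Overwrite __name__ when it is exactly http_requests_total.
  - source_labels: [__name__]
    regex: http_requests_total
    target_label: __name__
    replacement: http_requests
    action: replace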
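How a replace rule is evaluated can be modelled in Python. The sketch below is a simplified model and not Prometheus code: `apply_replace` is a hypothetical helper, Python's `re` stands in for the RE2 syntax Prometheus uses, and only numeric group references such as `$1` or `${1}` are expanded.

```python
import re


def apply_replace(labels, source_labels, target_label,
                  regex="(.*)", replacement="$1", separator=";"):
    """Simplified model of one relabel rule with action: replace."""
    # Source label values are joined with the separator; a missing label is empty.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex has to match the whole joined value (anchored at both ends).
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Expand $1 / ${1} style references to capture groups.
    new_value = re.sub(r"\$\{?(\d+)\}?",
                       lambda m: match.group(int(m.group(1))) or "",
                       replacement)
    result = dict(labels)
    if new_value:
        result[target_label] = new_value
    else:
        result.pop(target_label, None)  # an empty result removes the label
    return result


sample = {"__name__": "http_requests_total", "status_code": "200"}
renamed = apply_replace(sample, ["__name__"], "__name__",
                        regex="http_requests_total",
                        replacement="http_requests")
print(renamed)  # {'__name__': 'http_requests', 'status_code': '200'}

# The anchored regex does not match a longer name, so nothing changes.
other = {"__name__": "my_http_requests_total"}
print(apply_replace(other, ["__name__"], "__name__",
                    regex="http_requests_total",
                    replacement="http_requests") == other)  # True
```

The second call shows why a pattern such as http_requests_total does not accidentally rename other metrics that merely contain that text.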

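Adding a label

Sometimes extra labels are needed to give a metric more context. The labelmap action only copies labels whose names match a regex, and relabel rules have no target_label_name or target_label_value fields, so a new label is also added with replace. For example, assuming the status code is encoded at the end of the metric name (as in http_requests_200), the following rule copies it into a status_code label:

metric_relabel_configs:
  # Capture the trailing digits of the metric name as the label value.
  - source_labels: [__name__]
    regex: http_requests_(\d+)
    target_label: status_code
    replacement: $1
    action: replace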

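Modifying a label

Label values are always strings, so there is nothing to convert between numbers and strings, and there is no separate action for modifying labels. The replace action covers this case too: it rewrites a value, or writes it to another label, using the regex capture groups. For example, to copy the value of status_code into a status_code_str label:

metric_relabel_configs:
  # Copy any non-empty status_code value into status_code_str.
  - source_labels: [status_code]
    regex: (.+)
    target_label: status_code_str
    replacement: $1
    action: replace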

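Dropping labels

Sometimes a label needs to be removed during relabeling, which the labeldrop action does. Its regex is matched against label names, not values, and source_labels is not used. The series that remain after the drop must still be uniquely labeled. To discard whole series by label value instead, such as those with status_code 200, use the drop action with source_labels.

metric_relabel_configs:
  # Remove the status_code label from every scraped sample.
  - action: labeldrop
    regex: status_code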
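The difference between dropping a label and dropping a series can be sketched as follows. Both helpers, `apply_labeldrop` and `should_drop`, are hypothetical names for a simplified model of the two actions, with Python's `re` standing in for RE2.

```python
import re


def apply_labeldrop(labels, regex):
    """Model of action: labeldrop. Removes labels whose NAME matches the regex."""
    return {name: value for name, value in labels.items()
            if re.fullmatch(regex, name) is None}


def should_drop(labels, source_labels, regex, separator=";"):
    """Model of action: drop. True if the joined source VALUES match the regex."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


ok = {"__name__": "http_requests", "status_code": "200", "path": "/"}
missing = {"__name__": "http_requests", "status_code": "404", "path": "/"}

# After labeldrop the two series carry identical labels and would collide.
print(apply_labeldrop(ok, "status_code") == apply_labeldrop(missing, "status_code"))  # True

# drop removes whole series instead: only the status_code="200" series matches.
print(should_drop(ok, ["status_code"], "200"))       # True
print(should_drop(missing, ["status_code"], "200"))  # False
```

The first result illustrates why labeldrop needs care: two series that differed only in the dropped label become indistinguishable.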

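3. Prometheus Metric Filtering Techniques

Prometheus provides a rich query language, PromQL, for filtering, aggregating, and computing over metrics. Some common filtering techniques follow.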

Label matching

Use the = matcher for an exact label match; != is its negation. For example, to query all requests whose status_code is 200:

http_requests{status_code="200"}

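Label range matching

Label values are strings, so the comparison operators >=, <=, > and < cannot be used inside a selector. A range of values is expressed with the regex matcher =~ (or its negation !~), whose pattern must match the whole label value. For example, to query requests whose status_code is in the 2xx range (200 to 299):

http_requests{status_code=~"2.."}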

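Combining label conditions

Matchers separated by commas inside one selector are combined with AND. The and and or keywords are set operators between whole expressions and cannot appear inside the braces, so alternatives on a single label are written as a regex. For example, to query requests whose status_code is 200 or 404:

http_requests{status_code=~"200|404"}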
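The matcher semantics described in this section can be modelled with a short Python function. This is a sketch for illustration only: `matches` and the sample series are hypothetical, and Python's `re` approximates the RE2 regex syntax used by PromQL.

```python
import re


def matches(labels, matchers):
    """Return True if a series' labels satisfy every (label, op, value) matcher."""
    for name, op, pattern in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == pattern
        elif op == "!=":
            ok = actual != pattern
        elif op == "=~":
            ok = re.fullmatch(pattern, actual) is not None  # fully anchored
        elif op == "!~":
            ok = re.fullmatch(pattern, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False  # matchers in one selector are ANDed
    return True


series = [{"status_code": code} for code in ("200", "204", "404", "500")]


def select(matchers):
    return [s["status_code"] for s in series if matches(s, matchers)]


print(select([("status_code", "=", "200")]))         # ['200']
print(select([("status_code", "=~", "2..")]))        # ['200', '204']
print(select([("status_code", "=~", "200|404")]))    # ['200', '404']
print(select([("status_code", "!~", "2..")]))        # ['404', '500']
```

Because the regex is anchored, a pattern such as 2.. matches exactly three-character values starting with 2 and nothing longer.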

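Time range matching

A time range is selected with a range selector in square brackets, not with the time() function, which only returns the evaluation timestamp. For example, to select the samples of http_requests from the past 5 minutes:

http_requests{status_code="200"}[5m]

A range selector yields a range vector, which is normally passed to a function such as rate() or increase() rather than graphed directly.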
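What a range selector such as [5m] picks out can be sketched as a filter over timestamped samples. The helper `range_select` and the sample data are hypothetical, and the exact treatment of the window boundaries is a simplification chosen for this sketch.

```python
def range_select(samples, now, window_seconds):
    """Keep the (timestamp, value) samples that fall inside the window ending at now."""
    return [(t, v) for t, v in samples if now - window_seconds < t <= now]


# One sample per minute for ten minutes; the counter grows by 10 each minute.
samples = [(t, t // 60 * 10) for t in range(0, 601, 60)]

window = range_select(samples, now=600, window_seconds=300)
print(window)
# [(360, 60), (420, 70), (480, 80), (540, 90), (600, 100)]

# Functions such as increase() then work on the samples inside the window.
print(window[-1][1] - window[0][1])  # 40
```

The selector itself only narrows the samples; turning them into a single number is the job of the function it is passed to.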

4. Case Study

Suppose you want to monitor the performance of a website and watch requests with status codes 200 and 404. An example Prometheus configuration file, prometheus.yml:

scrape_configs:
  - job_name: 'web'
    static_configs:
      - targets: ['192.168.1.1:9090']

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - '192.168.1.2:9093'

rule_files:
  - 'alerting_rules.yml'

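In alerting_rules.yml you can define the following rules. The expressions sum increase() over a 5-minute window across all matching series, which assumes http_requests is a counter:

groups:
- name: web_rules
  rules:
  - alert: HighHTTPStatus200
    # Requests with status 200 over the last 5 minutes, summed across series.
    expr: sum(increase(http_requests{status_code="200"}[5m])) > 100
    for: 1m
    labels:
      severity: "high"
    annotations:
      summary: "High number of 200 status codes"
      description: "More than 100 requests returned status 200 in the last 5 minutes."

  - alert: HighHTTPStatus404
    # Requests with status 404 over the last 5 minutes, summed across series.
    expr: sum(increase(http_requests{status_code="404"}[5m])) > 50
    for: 1m
    labels:
      severity: "high"
    annotations:
      summary: "High number of 404 status codes"
      description: "More than 50 requests returned status 404 in the last 5 minutes."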
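The effect of the for: 1m clause can be sketched as a small state machine. This is a simplified model, not Prometheus code: `alert_states` is a hypothetical helper, the evaluation timestamps are made up, and options such as keep_firing_for are ignored.

```python
def alert_states(evaluations, hold_seconds):
    """Map (timestamp, condition_is_true) evaluations to alert states.

    The alert is pending from the first evaluation where the expression
    holds and becomes firing once it has held for hold_seconds.
    """
    states = []
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None  # the condition cleared, so the timer restarts
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= hold_seconds else "pending")
    return states


# Rule evaluations every 30 seconds, with for: 1m (60 seconds).
evaluations = [(0, True), (30, True), (60, True), (90, False), (120, True)]
print(alert_states(evaluations, hold_seconds=60))
# ['pending', 'pending', 'firing', 'inactive', 'pending']
```

A short spike that clears before the hold time elapses never reaches the firing state, which is the point of the for clause.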

With this configuration, Prometheus scrapes the website's metrics and fires an alert when one of the conditions has held for one minute.

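In summary, metric aggregation and filtering techniques are central to using Prometheus effectively. With well-chosen relabeling rules and PromQL selectors, you can filter and query exactly the data you need and keep a complete view of system performance.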