Integrating Spring Boot with Prometheus: Monitoring and Alerting End to End

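Hi everyone. Today we'll look at how to integrate a Spring Boot application with Prometheus and cover the whole path from collecting metrics to sending alerts. With its dimensional data model, flexible query language, and efficient time-series storage, Prometheus has become a mainstay of cloud-native monitoring. Combining it with Spring Boot lets us see how an application is behaving in real time and catch problems early.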

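1. Introduction to Prometheus and Core Concepts

Prometheus is an open-source systems monitoring and alerting toolkit. It is built around time series data: it scrapes metrics from target services over HTTP and provides a query language, PromQL, for analyzing them.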

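Time series data: Everything Prometheus stores is a time series. A series is identified by a metric name and a set of labels, and each sample in it is a value with a timestamp. For example, http_requests_total{method="GET", endpoint="/api/users"} 100 1678886400 means that at timestamp 1678886400 the total number of GET requests to /api/users was 100.
Metrics: Metrics are at the core of Prometheus monitoring; they describe the state or performance of the monitored target. Prometheus supports four main metric types:
- Counter: a monotonically increasing value, such as the total number of requests or errors.
- Gauge: a value that can go up and down, such as CPU or memory usage.
- Histogram: counts observations in configurable buckets to capture a distribution, such as request latency.
- Summary: similar to a histogram, but it computes quantiles (such as the 95th percentile) directly on the client side.
PromQL (Prometheus Query Language): the query language used to select and aggregate time series data, including rates, aggregations, and arithmetic across series.
Exporters: programs that expose metrics from other systems in a form Prometheus can scrape. For example, node_exporter exposes machine-level metrics and mysqld_exporter exposes MySQL metrics.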
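To make the "metric name plus labels plus value" shape concrete, here is a minimal sketch that renders one sample as a line of text. SampleLine and format are hypothetical names made up for this illustration; they are not part of any Prometheus client library, and the sketch does not escape special characters in label values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not a Prometheus library class): renders one sample as text. */
final class SampleLine {

    /** Builds name{k="v",...} followed by the value; label values are NOT escaped here. */
    static String format(String name, Map<String, String> labels, double value) {
        String labelPart = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        String series = labels.isEmpty() ? name : name + "{" + labelPart + "}";
        return series + " " + value;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the labels in insertion order.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("method", "GET");
        labels.put("endpoint", "/api/users");
        // prints: http_requests_total{method="GET",endpoint="/api/users"} 100.0
        System.out.println(format("http_requests_total", labels, 100));
    }
}
```

Two samples with the same name but different label values belong to different time series, which is why labels are part of the series identity.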

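2. Options for Integrating Spring Boot with Prometheus

There are two main ways to integrate Spring Boot with Prometheus:

Micrometer: Micrometer is a metrics facade for applications. It offers a vendor-neutral API for exporting application metrics to different monitoring systems, Prometheus included. It is also what Spring Boot Actuator uses for its own metrics, which makes it the recommended approach.
Exposing a Prometheus-format endpoint directly: you can write your own Spring Boot endpoint that returns metrics in the Prometheus text format.

We'll focus on Micrometer, since it is more flexible and easier to maintain.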
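For comparison, the second option can be sketched without any framework. The example below uses the JDK's built-in HttpServer rather than a Spring Boot endpoint, so it is a stand-in for the idea and not the Spring approach itself. The class name, the metric name hello_requests_total, and port 9400 are all assumptions chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-alone example; a Spring Boot app would use a controller instead. */
final class DirectMetricsEndpoint {

    // Counter state kept by hand; with Micrometer the registry manages this.
    static final AtomicLong HELLO_REQUESTS = new AtomicLong();

    /** Renders the counter in the Prometheus text exposition format. */
    static String render() {
        return "# HELP hello_requests_total Number of hello requests\n"
                + "# TYPE hello_requests_total counter\n"
                + "hello_requests_total " + HELLO_REQUESTS.get() + "\n";
    }

    /** Starts an HTTP server that serves render() at /metrics on the given port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders()
                    .set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HELLO_REQUESTS.incrementAndGet();
        start(9400); // arbitrary port for this sketch; the server runs until the process is stopped
        System.out.println("Metrics at http://localhost:9400/metrics");
    }
}
```

Even in this tiny form, the counter state, the text format, and thread safety all have to be handled by hand, which is the maintenance cost Micrometer takes off your plate.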

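3. Integrating Prometheus with Micrometer

3.1 Adding Dependencies

First, add the Micrometer and Prometheus dependencies to pom.xml. The /actuator endpoints used in the next step also require spring-boot-starter-actuator. Versions are omitted on the assumption that the Spring Boot parent POM manages them:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>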

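3.2 Configuring the Prometheus Endpoint

Configure the Prometheus endpoint in application.properties or application.yml. The property names below are those of Spring Boot 2.x; Spring Boot 3.x renamed the second one to management.prometheus.metrics.export.enabled, and it defaults to true in both: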

management.endpoints.web.exposure.include=prometheus
management.metrics.export.prometheus.enabled=true

Or, in YAML:

management:
  endpoints:
    web:
      exposure:
        include: prometheus
  metrics:
    export:
      prometheus:
        enabled: true

This configuration exposes the /actuator/prometheus endpoint, which Prometheus can scrape for metrics.

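3.3 Collecting Application Metrics

Micrometer offers two ways to collect application metrics.

Automatic collection: Micrometer automatically collects common metrics such as JVM metrics and HTTP request metrics.
Manual collection: you can create and update your own meters through MeterRegistry.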

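3.3.1 Automatically Collected Metrics

With no extra configuration, a Spring Boot application with Actuator and Micrometer collects metrics such as the following:

jvm.memory.used: the amount of memory the JVM is currently using.
jvm.memory.max: the maximum amount of memory the JVM can use.
jvm.gc.memory.allocated: the amount of memory allocated in the young generation between garbage collections.
jvm.gc.pause: the time spent in garbage collection pauses.
process.cpu.usage: the CPU usage of the process.
http.server.requests: HTTP server request metrics, including request count and latency.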
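The JVM metrics in this list are read from the JVM's own management beans. As a rough illustration of where values like jvm.memory.used and jvm.memory.max come from, the sketch below reads heap usage directly through java.lang.management. JvmMemoryPeek is a hypothetical name, and this is not Micrometer's implementation, which reports usage per memory pool with additional tags.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reads heap figures from the JVM's MemoryMXBean. */
final class JvmMemoryPeek {

    /** Bytes of heap currently in use. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Maximum heap size in bytes, or -1 if the JVM reports it as undefined. */
    static long heapMaxBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("heap max  (bytes): " + heapMaxBytes());
    }
}
```

Both values behave like gauges: they can rise and fall between scrapes, unlike a counter.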

3.3.2 Manually Collected Metrics

You can create and update your own meters through MeterRegistry. Start by injecting MeterRegistry into a Spring bean:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class CustomMetrics {

    private final Counter customCounter;

    @Autowired
    public CustomMetrics(MeterRegistry registry) {
        customCounter = Counter.builder("custom.counter")
                .description("A custom counter metric")
                .tags("environment", "production")
                .register(registry);
    }

    public void incrementCustomCounter() {
        customCounter.increment();
    }
}

In this example we create a counter named custom.counter with the tag environment=production. Calling incrementCustomCounter() then increases the counter's value.
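When the counter is scraped it will not appear as custom.counter. The Prometheus registry renames meters to follow Prometheus conventions: dots become underscores, counters gain a _total suffix, and timers gain a _seconds suffix, so this counter shows up as custom_counter_total. The helper below is a simplified imitation of that renaming, written only to show the pattern. PrometheusNames is a hypothetical class, not Micrometer's code, and it ignores edge cases such as other invalid characters.

```java
/** Hypothetical, simplified imitation of how meter names map to Prometheus names. */
final class PrometheusNames {

    enum Kind { COUNTER, TIMER, GAUGE }

    /** Replaces dots with underscores and appends the conventional suffix for the meter kind. */
    static String toPrometheusName(String meterName, Kind kind) {
        String name = meterName.replace('.', '_');
        if (kind == Kind.COUNTER && !name.endsWith("_total")) {
            name += "_total";
        } else if (kind == Kind.TIMER && !name.endsWith("_seconds")) {
            name += "_seconds";
        }
        return name;
    }

    public static void main(String[] args) {
        // prints: custom_counter_total
        System.out.println(toPrometheusName("custom.counter", Kind.COUNTER));
        // prints: api_hello_requests_seconds
        System.out.println(toPrometheusName("api.hello.requests", Kind.TIMER));
    }
}
```

Keeping this mapping in mind helps when writing PromQL, since queries always use the Prometheus-side name.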

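3.3.3 Example: Monitoring API Calls

Suppose we want to monitor how often an API endpoint is called and how long it takes. A Timer covers both: it records each request's latency and also keeps a count of recorded requests, so no separate Counter is needed.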

import io.micrometer.core.annotation.Timed;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MyController {

    @GetMapping("/api/hello")
    @Timed(value = "api.hello.requests", description = "Time taken to process hello API")
    public String hello() {
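        // Simulate business logic
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }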
        return "Hello, world!";
    }
}

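In this example, the @Timed annotation records the latency of the /api/hello endpoint. Micrometer creates a Timer named api.hello.requests, and the timer's count doubles as the total number of requests. How @Timed on a controller method gets picked up depends on the Spring Boot version: Spring Boot 2.x handles it in its web metrics support, while 3.x typically needs annotation aspects such as Micrometer's TimedAspect. After calling /api/hello a few times, open /actuator/prometheus and you should see output similar to the following. The environment="production" label assumes a common tag has been configured (for example management.metrics.tags.environment=production), and the quantile lines appear only when percentiles are enabled on the timer: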

# HELP api_hello_requests_seconds Time taken to process hello API
# TYPE api_hello_requests_seconds summary
api_hello_requests_seconds{environment="production",quantile="0.5",} 0.1012492
api_hello_requests_seconds{environment="production",quantile="0.9",} 0.1012492
api_hello_requests_seconds{environment="production",quantile="0.95",} 0.1012492
api_hello_requests_seconds{environment="production",quantile="0.99",} 0.1012492
api_hello_requests_seconds_count{environment="production",} 3.0
api_hello_requests_seconds_sum{environment="production",} 0.3037476
# HELP api_hello_requests_seconds_max  api_hello_requests_seconds_max
# TYPE api_hello_requests_seconds_max gauge
api_hello_requests_seconds_max{environment="production",} 0.1012492
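The _count, _sum, and _max lines in this output are the three running values a timer keeps. The sketch below is a hand-rolled, much-simplified timer that tracks the same three numbers, just to show how they relate. TinyTimer is a hypothetical class, not Micrometer's Timer; among other differences, Micrometer's max decays over a time window, whereas this one never resets.

```java
/** Hypothetical, simplified timer: tracks count, total time, and max of recorded durations. */
final class TinyTimer {

    private long count;
    private double totalSeconds;
    private double maxSeconds;

    /** Records one observed duration in seconds. */
    synchronized void record(double seconds) {
        count++;
        totalSeconds += seconds;
        maxSeconds = Math.max(maxSeconds, seconds);
    }

    synchronized long count() { return count; }               // like ..._seconds_count
    synchronized double totalSeconds() { return totalSeconds; } // like ..._seconds_sum
    synchronized double maxSeconds() { return maxSeconds; }     // like ..._seconds_max

    /** Mean duration over everything recorded so far; 0 if nothing was recorded. */
    synchronized double meanSeconds() {
        return count == 0 ? 0.0 : totalSeconds / count;
    }

    public static void main(String[] args) {
        TinyTimer timer = new TinyTimer();
        timer.record(0.1);
        timer.record(0.3);
        System.out.println("count=" + timer.count()
                + " sum=" + timer.totalSeconds()
                + " max=" + timer.maxSeconds());
    }
}
```

Dividing the sum by the count gives the average latency, which is the same calculation the alert rule later performs over a time window.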

4. Configuring Prometheus to Scrape Metrics

Install and start Prometheus. Then edit prometheus.yml and add the Spring Boot application as a scrape target:

scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080'] # replace with the address of your Spring Boot application

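job_name: the name of the job, used to identify the scrape targets.
metrics_path: the path Prometheus scrapes, set here to /actuator/prometheus.
scrape_interval: how often Prometheus scrapes, set here to every 5 seconds.
static_configs: a static list of target addresses to scrape.

Restart Prometheus and it will start scraping metrics from the Spring Boot application.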

5. Querying Metrics with PromQL

In the Prometheus web UI you can query metrics with PromQL. For example, to query the api_hello_requests_seconds_count metric:

api_hello_requests_seconds_count{environment="production"}

To see how fast the counter has been growing, rate() returns its per-second average rate of increase over the last 5 minutes:

rate(api_hello_requests_seconds_count{environment="production"}[5m])
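The idea behind rate() can be sketched as "increase divided by elapsed time". The code below is a deliberately simplified version that uses only the first and last sample in the window. SimpleRate and perSecond are hypothetical names, and real PromQL does more: it accounts for counter resets and extrapolates to the edges of the window.

```java
/** Hypothetical, simplified take on rate(): increase per second between two counter samples. */
final class SimpleRate {

    /**
     * @param firstValue   counter value at the start of the window
     * @param firstSeconds timestamp of the first sample, in seconds
     * @param lastValue    counter value at the end of the window
     * @param lastSeconds  timestamp of the last sample, in seconds
     */
    static double perSecond(double firstValue, long firstSeconds,
                            double lastValue, long lastSeconds) {
        long elapsed = lastSeconds - firstSeconds;
        if (elapsed <= 0) {
            throw new IllegalArgumentException("last sample must be later than first sample");
        }
        // No counter-reset handling: assumes the counter only went up in this window.
        return (lastValue - firstValue) / elapsed;
    }

    public static void main(String[] args) {
        // A counter that went from 100 to 400 over 300 seconds grew by 1 request per second.
        System.out.println(perSecond(100, 0, 400, 300)); // prints: 1.0
    }
}
```

Because a counter only ever grows, its raw value is rarely interesting on a graph; the rate is what shows traffic rising or falling.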

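6. Configuring Alerting

Prometheus evaluates alert rules itself, but it does not send notifications; for that it works with Alertmanager. Alertmanager receives alerts from Prometheus, then deduplicates, groups, and routes them to the configured receivers.

6.1 Installing and Configuring Alertmanager

Download and install Alertmanager, then edit alertmanager.yml to configure routing and receivers: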

route:
  receiver: 'email-receiver'
  repeat_interval: 5m # how long to wait before re-sending a notification for an alert that is still firing

receivers:
  - name: 'email-receiver'
    email_configs:
      - to: '[email protected]' # replace with your email address
        from: '[email protected]'
        smarthost: 'smtp.example.com:587' # replace with your SMTP server address
        auth_username: 'alertmanager'
        auth_password: 'your_password'
        require_tls: true

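route: routing configuration that decides which receiver an alert is sent to.
receivers: receiver configuration that defines how notifications are delivered.

6.2 Configuring Prometheus Alert Rules

Edit prometheus.yml to load the rule file: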

rule_files:
  - "rules.yml"

Create a file named rules.yml and define the alert rule:

groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: sum(rate(api_hello_requests_seconds_sum{environment="production"}[5m])) / sum(rate(api_hello_requests_seconds_count{environment="production"}[5m])) > 0.2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "High request latency detected"
          description: "Request latency is higher than 0.2 seconds for more than 1 minute."

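alert: the name of the alert.
expr: the PromQL expression that decides whether the alert condition is met.
for: a duration; the alert fires only after the expression has stayed true for this long.
labels: labels used to classify the alert.
annotations: annotations that carry descriptive details about the alert.

In this example we define an alert rule named HighRequestLatency. The expression computes the average latency of /api/hello over a 5-minute window; if that average stays above 0.2 seconds for 1 minute, the alert fires.

Restart Prometheus and it will evaluate the rules against the collected metrics. To forward firing alerts to Alertmanager, prometheus.yml must also contain an alerting section whose alertmanagers entry points at the Alertmanager address (Alertmanager listens on port 9093 by default).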
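The rule combines two ideas: an average computed as "increase in total time divided by increase in request count", and a hold period before the alert fires. The sketch below walks through both in plain Java. LatencyAlertSketch and its methods are hypothetical names, the 15-second evaluation interval in main is an assumption, and the logic is a simplified model of how a for clause behaves, not Prometheus's implementation.

```java
/** Hypothetical model of the HighRequestLatency rule: threshold 0.2s, held for 60s. */
final class LatencyAlertSketch {

    static final double THRESHOLD_SECONDS = 0.2;
    static final long FOR_SECONDS = 60;

    /** Average latency over a window: increase in total time / increase in request count. */
    static double averageLatency(double sumIncrease, double countIncrease) {
        // With no requests this is 0/0 = NaN, and NaN > 0.2 is false, so no alert.
        return sumIncrease / countIncrease;
    }

    /**
     * Replays a sequence of rule evaluations and reports whether the alert is firing
     * after the last one. latencies[i] is the average latency at evaluation i, and
     * evaluations are intervalSeconds apart.
     */
    static boolean isFiring(double[] latencies, long intervalSeconds) {
        long heldFor = -1; // -1 means the condition is not currently met
        for (double latency : latencies) {
            if (latency > THRESHOLD_SECONDS) {
                // First breach starts the clock at 0; later breaches extend it.
                heldFor = (heldFor < 0) ? 0 : heldFor + intervalSeconds;
            } else {
                heldFor = -1; // any evaluation below the threshold resets the clock
            }
        }
        return heldFor >= FOR_SECONDS;
    }

    public static void main(String[] args) {
        // Numbers from the sample scrape output: sum 0.3037476 over 3 requests.
        System.out.println(averageLatency(0.3037476, 3));
        // Five consecutive breaches, 15s apart, span 60s.
        System.out.println(isFiring(new double[] {0.3, 0.3, 0.3, 0.3, 0.3}, 15)); // prints: true
    }
}
```

A single dip below the threshold resets the hold period, which is what keeps brief spikes from paging anyone.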

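7. Alert Notifications

When Alertmanager receives an alert from Prometheus, it sends a notification through the configured receiver. In the example above, the notification goes to the specified email address.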

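8. Visualizing with Grafana

The Prometheus web UI can query metrics, but its visualization features are limited. Grafana is a popular open-source visualization tool that integrates well with Prometheus.

8.1 Installing and Configuring Grafana

Download and install Grafana. Start it, add a Prometheus data source, and set the address of your Prometheus server.

8.2 Creating a Dashboard

Create a dashboard in Grafana, add a panel, select the Prometheus data source, enter a PromQL query, and adjust the chart's appearance.

For example, a panel showing the request rate of /api/hello (requests per second, averaged over 5 minutes):

Data source: Prometheus
PromQL: sum(rate(api_hello_requests_seconds_count{environment="production"}[5m]))
Visualization: Graph (named Time series in recent Grafana versions)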

9. Code Summary

Here is the complete code for a Spring Boot application integrated with Prometheus:

pom.xml

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-core</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>
</dependencies>

application.properties

management.endpoints.web.exposure.include=prometheus
management.metrics.export.prometheus.enabled=true

MyController.java

import io.micrometer.core.annotation.Timed;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MyController {

    @GetMapping("/api/hello")
    @Timed(value = "api.hello.requests", description = "Time taken to process hello API")
    public String hello() {
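        // Simulate business logic
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption
            Thread.currentThread().interrupt();
        }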
        return "Hello, world!";
    }
}

CustomMetrics.java (optional)

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class CustomMetrics {

    private final Counter customCounter;

    @Autowired
    public CustomMetrics(MeterRegistry registry) {
        customCounter = Counter.builder("custom.counter")
                .description("A custom counter metric")
                .tags("environment", "production")
                .register(registry);
    }

    public void incrementCustomCounter() {
        customCounter.increment();
    }
}

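10. Summary: Integrating Prometheus to Improve Application Monitoring

With the steps above, the Spring Boot application is integrated with Prometheus. Prometheus scrapes the application's metrics, PromQL queries and analyzes them, alert rules defined in Prometheus trigger notifications through Alertmanager, and Grafana visualizes the data. Together these let us see the application's state in real time, catch problems early, and improve reliability and stability.

11. Recap: Prometheus + Micrometer for Flexible Monitoring

Micrometer makes it easy to expose Spring Boot application metrics to Prometheus, and the @Timed annotation keeps the monitoring code short.

12. Recap: PromQL + Alertmanager for Powerful Alerting

PromQL's query capabilities let you tailor alert rules to your needs. Alertmanager receives the alerts and sends the notifications.

13. Recap: Grafana + Prometheus for Clear Visualization

Grafana integrates well with Prometheus and offers a wide range of chart types and configuration options, so monitoring data can be presented in a form that is easy to read.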
