Monitoring Spring Boot Applications with Prometheus

2026-02-19 07:41:09

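1. What is Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit with an active ecosystem. In short, it is an open-source monitoring solution.

Main features of Prometheus:

A multi-dimensional data model, with time series data identified by metric name and key/value pairs
PromQL, a flexible query language
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens via a pull model over HTTP
Pushing time series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support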

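Why pull rather than push?

Pulling has the following advantages:

You can run your monitoring on a laptop while developing changes
You can more easily tell whether a target is down
You can manually go to a target and inspect its health with a web browser

Each target exposes an HTTP endpoint, and the Prometheus server actively pulls data from it over HTTP. Because the server initiates every request, it can run anywhere that can reach the target endpoint, including your own machine. Each scrape also works like a heartbeat check, so a failed scrape shows that the target is down. And because the server decides what to pull, it can switch scrape targets freely.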
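The pull model can be illustrated with a minimal sketch in Python (standard library only). It is not how Prometheus itself is implemented: the demo target, the metric name demo_requests_total, and the helper names start_target and scrape are all made up for illustration. The point is only that the scraper initiates the request, so a failed request doubles as a "target is down" signal.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Hypothetical target: serves one hard-coded sample at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = b"demo_requests_total 3\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_target():
    """Start the demo target on a free local port; return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/metrics"


# Bypass any system proxy so that local targets are reached directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def scrape(url, timeout=2.0):
    """Pull one target; return (up, body) where up is 1 on success, else 0."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return 1, resp.read().decode("utf-8")
    except OSError:  # covers connection errors, timeouts and HTTP errors
        return 0, ""
```

Calling scrape(url) against a running target returns 1 and the metrics text; once the target is stopped, the same call returns 0, much like a heartbeat check.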

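Compare this with SkyWalking, which has a client side and a server side: an agent must be installed on the target service, where it collects the service's metrics and reports them to the OAP server. This is somewhat intrusive to the target, though acceptably so. Prometheus needs no agent, and where pushing is required, the Pushgateway can be used to get a push-like effect.

A note on terminology before going further: "metric" is used throughout this article for what is sometimes also called a measurement or an indicator.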

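2. Basic concepts

2.1. Data model

Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries.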

Metric names and labels

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.


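Samples form the actual time series data. Each sample consists of:

a float64 value
a millisecond-precision timestamp

Given a metric name and a set of labels, time series are frequently identified using this notation: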

<metric name>{<label name>=<label value>, ...}

For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" could be written like this:

api_http_requests_total{method="POST", handler="/messages"}
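As a rough sketch of this notation, the following Python renders a metric name and label set in the form shown above and builds an identity key for the series. The function names series_id and series_key are hypothetical; the sketch only illustrates that a series is identified by its name plus its full label set, regardless of label order.

```python
def series_id(name, labels):
    """Render a series as <metric name>{<label name>="<label value>", ...}."""
    if not labels:
        return name
    pairs = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{name}{{{pairs}}}"


def series_key(name, labels):
    """Identity of a series: the name plus the label set, ignoring label order."""
    return (name, frozenset(labels.items()))
```

Two label sets with the same pairs in a different order produce the same key, while changing a single label value (for example method="GET") produces a different series.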

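2.2. Metric types

Counter

A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

Do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.

Gauge

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

Gauges are typically used for measured values like temperatures or current memory usage, but also for "counts" that can go up and down, like the number of concurrent requests.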
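The difference between the two types can be sketched in a few lines of Python. These classes are illustrative only and are not the API of any Prometheus client library: the counter refuses to decrease, while the gauge can be set, incremented, and decremented freely.

```python
class Counter:
    """Monotonic counter: can only go up, and starts again at zero on restart."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single numeric value that may go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount
```

A count of served requests fits the Counter; the number of requests currently in flight fits the Gauge, with inc() on entry and dec() on exit.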

Histogram

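A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count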
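The word "cumulative" is easy to miss: each bucket counts every observation less than or equal to its upper bound, not only those that fall between two bounds. A minimal Python sketch follows; the function name histogram_series and the bucket bounds in the usage note are assumptions chosen for illustration.

```python
import math


def histogram_series(observations, bounds):
    """Return the values a histogram would expose for the given observations.

    bounds are the upper inclusive bucket bounds; a +Inf bucket is always added.
    """
    uppers = sorted(bounds) + [math.inf]
    buckets = {}
    for upper in uppers:
        # Buckets are cumulative: each counts every observation <= its bound.
        buckets[upper] = sum(1 for value in observations if value <= upper)
    return {
        "bucket": buckets,            # <basename>_bucket{le="..."}
        "sum": sum(observations),     # <basename>_sum
        "count": len(observations),   # <basename>_count
    }
```

With observations 0.05, 0.2, 0.2, 0.7 and 3.0 and bounds 0.1, 0.5 and 1.0, the bucket counts are 1, 3, 4 and, for +Inf, 5, which always equals the total count.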

Summary

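Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count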
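The following Python sketch shows the idea of quantiles computed over a sliding window. It makes two simplifying assumptions that real client libraries do not: the window is count-based rather than time-based, and quantiles are computed exactly with the nearest-rank method instead of a streaming approximation. The class name SlidingSummary is hypothetical.

```python
import math
from collections import deque


class SlidingSummary:
    """Summary sketch: quantiles over the most recent `window` observations."""

    def __init__(self, window):
        self.recent = deque(maxlen=window)  # old observations fall out automatically
        self.sum = 0.0                      # total over all observations ever made
        self.count = 0

    def observe(self, value):
        self.recent.append(value)
        self.sum += value
        self.count += 1

    def quantile(self, phi):
        """Nearest-rank phi-quantile (0 <= phi <= 1) of the current window."""
        if not self.recent:
            return math.nan
        ordered = sorted(self.recent)
        rank = max(1, math.ceil(phi * len(ordered)))
        return ordered[rank - 1]
```

Note that sum and count keep growing over the lifetime of the process, while the quantiles only reflect the observations still inside the window.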

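2.3. Jobs and instances

In Prometheus terms, an endpoint you can scrape is called an instance, usually corresponding to a single process. A collection of instances with the same purpose is called a job.

For example, a job with four instances: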

job: api-server
instance 1: 1.2.3.4:5670
instance 2: 1.2.3.4:5671
instance 3: 5.6.7.8:5670
instance 4: 5.6.7.8:5671

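When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the scraped target:

job: the configured job name that the target belongs to
instance: the <host>:<port> part of the target's URL that was scraped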
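As a small illustration of these two labels, the Python sketch below derives instance from a target URL and attaches it together with job to a series' label set. The function name with_target_labels is hypothetical, and the sketch ignores what happens when a scraped series already carries a job or instance label.

```python
from urllib.parse import urlsplit


def with_target_labels(series_labels, job, target_url):
    """Return a copy of series_labels with job and instance attached."""
    labels = dict(series_labels)
    labels["job"] = job
    # netloc is the <host>:<port> part of the URL.
    labels["instance"] = urlsplit(target_url).netloc
    return labels
```

For the first instance of the api-server job above, scraped at http://1.2.3.4:5670/metrics, every series gains job="api-server" and instance="1.2.3.4:5670".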

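3. Installation and configuration

Prometheus collects metrics from targets by scraping metrics HTTP endpoints. Since Prometheus exposes data about itself in the same way, it can also scrape and monitor its own health.

By default, with no configuration changes, simply running Prometheus is enough for it to scrape its own health data: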

# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path)

./prometheus --config.file=prometheus.yml

Then open localhost:9090 in a browser.

The metrics that Prometheus exposes about itself can be viewed at localhost:9090/metrics.
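The /metrics page is plain text with one sample per line, plus comment lines starting with #. The Python sketch below parses that text into tuples. It is deliberately simplified (it ignores escaped quotes inside label values and optional timestamps), and the name parse_metrics is made up for illustration.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples
```

Feeding it the body returned by localhost:9090/metrics yields one tuple per sample, which makes the "metric name plus labels plus value" structure of each line explicit.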

An example

Enter the following expression in the expression console and click "Execute":

prometheus_target_interval_length_seconds

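This should return a number of different time series (along with the latest value recorded for each), each with the metric name prometheus_target_interval_length_seconds but with different labels.

The expression browser presents the metric graphically; the same values can also be read directly from localhost:9090/metrics.

If we are only interested in the 99th percentile latencies, we can use this query: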

prometheus_target_interval_length_seconds{quantile="0.99"}

To count the number of returned time series, write the query like this:

count(prometheus_target_interval_length_seconds)
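The same expressions can also be sent to the Prometheus HTTP API at /api/v1/query instead of being typed into the browser. The Python sketch below builds such a URL and reads an instant-vector response; the helper names query_url and parse_vector are hypothetical, and error handling is kept to a bare minimum.

```python
import json
from urllib.parse import urlencode


def query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": expr})  # takes care of escaping { } " ( )
    return f"{base_url}/api/v1/query?{params}"


def parse_vector(payload):
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError("query failed")
    # Each result carries its labels in "metric" and [timestamp, "value"] in "value".
    return [(item["metric"], float(item["value"][1])) for item in doc["data"]["result"]]
```

For example, query_url("http://localhost:9090", "count(prometheus_target_interval_length_seconds)") gives a URL that can be fetched with urllib.request.urlopen, and the response body can then be passed to parse_vector.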

Next, let's use the Node Exporter to add a few more targets:

tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*

# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082

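Next, configure Prometheus to scrape these three new targets.

First, define a job named 'node' that scrapes the three target endpoints. Imagine that the first two endpoints are production targets and the third is a non-production (canary) instance. To tell them apart, we group them and label each group differently: in this example, the group="production" label is added to the first group of targets and group="canary" to the second.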

scrape_configs:
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
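To make the effect of the two target groups concrete, here is a small Python sketch that expands the 'node' job into the labels each target ends up with. The job is written as plain Python data mirroring the YAML above; the names expand_targets and NODE_JOB are made up, and this is an illustration of the outcome rather than of how Prometheus processes its configuration internally.

```python
def expand_targets(job):
    """List the label set attached to each target of a job."""
    expanded = []
    for group in job["static_configs"]:
        for target in group["targets"]:
            labels = {"job": job["job_name"], "instance": target}
            labels.update(group.get("labels", {}))  # e.g. group="production"
            expanded.append(labels)
    return expanded


# The 'node' job from the configuration above, as plain Python data.
NODE_JOB = {
    "job_name": "node",
    "static_configs": [
        {"targets": ["localhost:8080", "localhost:8081"], "labels": {"group": "production"}},
        {"targets": ["localhost:8082"], "labels": {"group": "canary"}},
    ],
}
```

expand_targets(NODE_JOB) returns three label sets: two with group="production" and one with group="canary", all sharing job="node".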

3.1. Configuration

To see all command-line flags, run:

./prometheus -h

The configuration file is in YAML format and is specified with the --config.file flag.

The main structure of the configuration file is as follows:

global:
 # How frequently to scrape targets by default.
 [ scrape_interval: <duration> | default = 1m ]

 # How long until a scrape request times out.
 [ scrape_timeout: <duration> | default = 10s ]

 # How frequently to evaluate rules.
 [ evaluation_interval: <duration> | default = 1m ]

 # The labels to add to any time series or alerts when communicating with
 # external systems (federation, remote storage, Alertmanager).
 external_labels:
  [ <labelname>: <labelvalue> ... ]

 # File to which PromQL queries are logged.
 # Reloading the configuration will reopen the file.
 [ query_log_file: <string> ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
 [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
 [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
 alert_relabel_configs:
  [ - <relabel_config> ... ]
 alertmanagers:
  [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
 [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
 [ - <remote_read> ... ]
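The <duration> values and the way a job-level setting overrides the global one can be sketched in Python as follows. This is a simplified illustration: it only handles single-unit durations such as 5s or 1m (not compound ones such as 1h30m), and the function names are hypothetical.

```python
import re

UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a single-unit duration such as '5s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def effective_scrape_interval(global_cfg, job_cfg):
    """A job-level scrape_interval overrides the global one (default 1m)."""
    text = job_cfg.get("scrape_interval") or global_cfg.get("scrape_interval", "1m")
    return parse_duration(text)
```

A job with scrape_interval: 5s is scraped every 5 seconds regardless of the global value, while a job without that key falls back to the global setting or, if that is also absent, to the 1m default.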

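4. Scraping a Spring Boot application

Prometheus expects to scrape or poll individual application instances for metrics. Spring Boot provides an actuator endpoint at /actuator/prometheus that presents a Prometheus scrape in the appropriate format.

To expose metrics in a format that the Prometheus server can scrape, add a dependency on micrometer-registry-prometheus: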

<dependency>
 <groupId>io.micrometer</groupId>
 <artifactId>micrometer-registry-prometheus</artifactId>
 <version>1.6.4</version>
</dependency>

Here is an example prometheus.yml:

scrape_configs:
  - job_name: 'spring'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['HOST:PORT']

Next, create a project named prometheus-example.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <parent>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-parent</artifactId>
  <version>2.4.3</version>
  <relativePath/> <!-- lookup parent from repository -->
 </parent>
 <groupId>com.cjs.example</groupId>
 <artifactId>prometheus-example</artifactId>
 <version>0.0.1-SNAPSHOT</version>
 <name>prometheus-example</name>
 <description>Demo project for Spring Boot</description>
 <properties>
  <java.version>1.8</java.version>
 </properties>
 <dependencies>
  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>
  <dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-web</artifactId>
  </dependency>

  <dependency>
   <groupId>io.micrometer</groupId>
   <artifactId>micrometer-registry-prometheus</artifactId>
   <scope>runtime</scope>
  </dependency>
 </dependencies>

 <build>
  <plugins>
   <plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
   </plugin>
  </plugins>
 </build>

</project>

application.yml

spring:
  application:
    name: prometheus-example
management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    tags:
      application: ${spring.application.name}

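Do not forget this setting: management.metrics.tags.application=${spring.application.name}. It adds an application tag to every metric, which dashboards can use to tell applications apart.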

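Spring Boot Actuator provides many built-in endpoints; for details see

https://docs.spring.io/spring-boot/docs/2.4.3/reference/html/production-ready-features.html

Start the project and open the /actuator/prometheus endpoint in a browser.

Configure Prometheus to scrape the application: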

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'springboot-prometheus'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['192.168.100.93:8080']
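The two jobs differ only in metrics_path, so it can help to see which URL each one actually results in. The Python sketch below applies the defaults mentioned in the comments above (scheme http, metrics_path /metrics); the function name scrape_url is hypothetical.

```python
def scrape_url(job, target):
    """Build the URL requested for one target of a scrape job."""
    scheme = job.get("scheme", "http")           # scheme defaults to 'http'
    path = job.get("metrics_path", "/metrics")   # metrics_path defaults to '/metrics'
    return f"{scheme}://{target}{path}"
```

For the 'prometheus' job this gives http://localhost:9090/metrics, and for 'springboot-prometheus' it gives http://192.168.100.93:8080/actuator/prometheus, which is the endpoint opened in the browser earlier.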

Restart Prometheus so that the new configuration takes effect:

./prometheus --config.file=prometheus.yml

4.1. Grafana

https://grafana.com/docs/

https://grafana.com/tutorials/

Download and extract

wget https://dl.grafana.com/oss/release/grafana-7.4.3.linux-amd64.tar.gz
tar -zxvf grafana-7.4.3.linux-amd64.tar.gz

Start

./bin/grafana-server web

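Open http://localhost:3000 in a browser.

The default account is admin/admin.

After the first login, we change the password to admin1234.

First configure a data source; it will be selected later when adding a dashboard.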
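The data source is configured through the Grafana UI in this article. As an alternative sketch, it could also be registered through Grafana's HTTP API (POST /api/datasources) using basic authentication. The Python below only builds the request; the data source name "Prometheus", the helper name datasource_request, and the assumption that Prometheus runs at http://localhost:9090 are illustrative choices, not something this setup requires.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user, password, prometheus_url):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",      # display name, chosen for illustration
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",         # Grafana's backend queries Prometheus itself
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the result of datasource_request("http://localhost:3000", "admin", "admin1234", "http://localhost:9090") to urllib.request.urlopen would send it to a running Grafana instance.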

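Grafana provides many official dashboard templates that can be used directly.

First, find the template we want.

Here, for example, we simply picked one.

A template can be imported either by downloading its JSON file or by entering its template ID; here we enter the ID directly.

The result is immediate: a polished dashboard appears right away.

Let's add one more dashboard (ID: 12856).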
