spring boot 监控处理方案实例详解
大家都知道 spring boot整合了很多很多的第三方框架,我们这里就简单讨论和使用 性能监控和JVM监控相关的东西。其他的本文不讨论虽然有些关联,所以开篇有说需要有相关spring boot框架基础说了这么多废话,下面真正进入主题。
这里首先给大家看下整体的数据流程图,其中两条主线一条是接口或方法性能监控数据收集,还有一条是spring boot 微服务JVM相关指标数据采集,最后都汇总到InfluxDB时序数据库中在用数据展示工具Grafara进行数据展示或报警。
〇、基础服务
基础服务比较多,其中包括RabbitMQ,Eureka注册中心,influxDB,Grafara(不知道这些东西 请百度或谷歌一下了解相关知识),下面简单说下各基础服务的功能:
RabbitMQ 一款很流行的消息中间件,主要用它来收集spring boot应用监控性能相关信息,为什么是RabbitMQ而不是什么别的 kafka等等,因为测试方便性能也够用,spring boot整合的够完善。
Eureka 注册中心,一般看过或用过spring cloud相关框架的都知道spring cloud注册中心主要推荐使用Eureka!至于为什么不做过多讨论不是本文主要讨论的关注点。本文主要用来同步和获取注册到注册中心的应用的相关信息。
InfluxDB和Grafara为什么选这两个,其他方案如 ElasticSearch 、Logstash 、Kibana,ELK的组合等!原因很显然 influxDB是时序数据库数据的压缩比率比其他(ElasticSearch )好的很多(当然本人没有实际测试过都是看一些文档)。同时InfluxDB使用SQL非常类似mysql等关系型数据库入门方便,Grafara工具可预警。等等!!!!!!!!!!!
好了工具就简单介绍到这里,至于这些工具怎么部署搭建请搭建先自行找资料学习,还是因为不是本文重点介绍的内容,不深入讨论。如果有docker相关基础的童鞋可以直接下载个镜像启动起来做测试使用(本人就是使用docker启动的上面的基础应用(Eureka除外))
一、被监控的应用
这里不多说被监控应用肯定是spring boot项目但是要引用一下相关包和相关注解以及修改相关配置文件
包引用,这些包是必须引用的
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.springframework.cloud</groupId> <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId> </dependency> <dependency> <groupId>org.springframework.cloud</groupId> <artifactId>spring-cloud-sleuth-zipkin-stream</artifactId> </dependency> <dependency> <groupId>org.springframework.cloud</groupId> <artifactId>spring-cloud-starter-stream-rabbit</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-actuator</artifactId> </dependency> <dependency> <groupId>org.springframework.cloud</groupId> <artifactId>spring-cloud-starter-hystrix</artifactId> </dependency>
简单说下呢相关包的功能spring-cloud-starter-netflix-eureka-client用于注册中心使用的包,spring-cloud-starter-stream-rabbit 发送RabbitMQ相关包,spring-boot-starter-actuator发布监控相关rest接口包,
spring-cloud-starter-hystrix熔断性能监控相关包。
相关注解
@EnableHystrix//开启性能监控 @RefreshScope//刷新配置文件 与本章无关 @EnableAutoConfiguration @EnableFeignClients//RPC调用与本章无关 @RestController @SpringBootApplication public class ServerTestApplication { protected final static Logger logger = LoggerFactory.getLogger(ServerTestApplication.class); public static void main(String[] args) { SpringApplication.run(ServerTestApplication.class, args); } }
配置文件相关
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 60000 hystrix.threadpool.default.coreSize: 100 spring: application: name: spring-cloud-server2-test rabbitmq: host: 10.10.12.21 port: 5672 username: user password: password encrypt: failOnError: false server: port: 8081 eureka: instance: appname: spring-cloud-server2-test prefer-ip-address: true client: serviceUrl: defaultZone: http://IP:PORT/eureka/#注册中心地址 eureka-server-total-connections-per-host: 500 endpoints: refresh: sensitive: false metrics: sensitive: false dump: sensitive: false auditevents: sensitive: false features: sensitive: false mappings: sensitive: false trace: sensitive: false autoconfig: sensitive: false loggers: sensitive: false
简单解释一下endpoints下面相关配置,主要就是 原来这些路径是需要授权访问的,通过配置让这些路径接口不再是敏感的需要授权访问的接口这应我们就可以轻松的访问注册到注册中心的每个服务的响应的接口。这里插一句接口性能需要在方法上面加上如下类似相关注解,然后才会有相关性能数据输出
@Value("${name}") private String name; @HystrixCommand(commandProperties = { @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "20000") }, threadPoolProperties = { @HystrixProperty(name = "coreSize", value = "64") }, threadPoolKey = "test1") @GetMapping("/testpro1") public String getStringtest1(){ return name; }
好了到这里你的应用基本上就具备相关性能输出的能力了。你可以访问
如果是上图的接口 你的应用基本OK,为什么是基本因为你截图没有体现性能信息发送RabbitMQ的相关信息。这个需要看日志,加入你失败了评论区在讨论。我们先关注主线。
好的spring boot 应用就先说道这里。开始下一主题
二、性能指标数据采集
刚才访问http://IP:port/hystrix.stream这个显示出来的信息就是借口或方法性能相关信息的输出,如果上面都没有问题的话数据应该发送到了RabbitMQ上面了我们直接去RabbitMQ上面接收相关数据就可以了。
性能指标数据的采集服务主要应用以下包
<dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-amqp</artifactId> </dependency> <!-- https://mvnrepository.com/artifact/com.github.miwurster/spring-data-influxdb --> <dependency> <groupId>org.influxdb</groupId> <artifactId>influxdb-java</artifactId> <version>2.8</version> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-autoconfigure</artifactId> </dependency>
直接贴代码
package application; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; /** * * @author zyg * */ @SpringBootApplication public class RabbitMQApplication { public static void main(String[] args) { SpringApplication.run(RabbitMQApplication.class, args); } }
package application; import org.springframework.amqp.core.Binding; import org.springframework.amqp.core.BindingBuilder; import org.springframework.amqp.core.Queue; import org.springframework.amqp.core.TopicExchange; import org.springframework.amqp.rabbit.connection.CachingConnectionFactory; import org.springframework.amqp.rabbit.connection.ConnectionFactory; import org.springframework.amqp.rabbit.core.RabbitTemplate; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; /** * * @author zyg * */ @Configuration public class RabbitMQConfig { public final static String QUEUE_NAME = "spring-boot-queue"; public final static String EXCHANGE_NAME = "springCloudHystrixStream"; public final static String ROUTING_KEY = "#"; // 创建队列 @Bean public Queue queue() { return new Queue(QUEUE_NAME); } // 创建一个 topic 类型的交换器 @Bean public TopicExchange exchange() { return new TopicExchange(EXCHANGE_NAME); } // 使用路由键(routingKey)把队列(Queue)绑定到交换器(Exchange) @Bean public Binding binding(Queue queue, TopicExchange exchange) { return BindingBuilder.bind(queue).to(exchange).with(ROUTING_KEY); } @Bean public ConnectionFactory connectionFactory() { //rabbitmq IP 端口号 CachingConnectionFactory connectionFactory = new CachingConnectionFactory("IP", 5672); connectionFactory.setUsername("user"); connectionFactory.setPassword("password"); return connectionFactory; } @Bean public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) { return new RabbitTemplate(connectionFactory); } }
package application; import java.util.Map; import java.util.concurrent.TimeUnit; import org.influxdb.InfluxDB; import org.influxdb.InfluxDBFactory; import org.influxdb.dto.Point; import org.influxdb.dto.Point.Builder; import org.influxdb.dto.Query; import org.influxdb.dto.QueryResult; /** * * @author zyg * */ public class InfluxDBConnect { private String username;// 用户名 private String password;// 密码 private String openurl;// 连接地址 private String database;// 数据库 private InfluxDB influxDB; public InfluxDBConnect(String username, String password, String openurl, String database) { this.username = username; this.password = password; this.openurl = openurl; this.database = database; } /** 连接时序数据库;获得InfluxDB **/ public InfluxDB influxDbBuild() { if (influxDB == null) { influxDB = InfluxDBFactory.connect(openurl, username, password); influxDB.createDatabase(database); } return influxDB; } /** * 设置数据保存策略 defalut 策略名 /database 数据库名/ 30d 数据保存时限30天/ 1 副本个数为1/ 结尾DEFAULT * 表示 设为默认的策略 */ public void createRetentionPolicy() { String command = String.format("CREATE RETENTION POLICY \"%s\" ON \"%s\" DURATION %s REPLICATION %s DEFAULT", "defalut", database, "30d", 1); this.query(command); } /** * 查询 * * @param command * 查询语句 * @return */ public QueryResult query(String command) { return influxDB.query(new Query(command, database)); } /** * 插入 * * @param measurement * 表 * @param tags * 标签 * @param fields * 字段 */ public void insert(String measurement, Map<String, String> tags, Map<String, Object> fields) { Builder builder = Point.measurement(measurement); builder.time(((long)fields.get("currentTime"))*1000000, TimeUnit.NANOSECONDS); builder.tag(tags); builder.fields(fields); // influxDB.write(database, "", builder.build()); } /** * 删除 * * @param command * 删除语句 * @return 返回错误信息 */ public String deleteMeasurementData(String command) { QueryResult result = influxDB.query(new Query(command, database)); return result.getError(); } /** * 创建数据库 * * @param dbName */ public void createDB(String dbName) { influxDB.createDatabase(dbName); } /** * 删除数据库 * * @param dbName */ public void deleteDB(String dbName) { influxDB.deleteDatabase(dbName); } public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } public String getOpenurl() { return openurl; } public void setOpenurl(String openurl) { this.openurl = openurl; } public void setDatabase(String database) { this.database = database; } }
package application; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; /** * * @author zyg * */ @Configuration public class InfluxDBConfiguration { private String username = "admin";//用户名 private String password = "admin";//密码 private String openurl = "http://IP:8086";//InfluxDB连接地址 private String database = "test_db";//数据库 @Bean public InfluxDBConnect getInfluxDBConnect(){ InfluxDBConnect influxDB = new InfluxDBConnect(username, password, openurl, database); influxDB.influxDbBuild(); influxDB.createRetentionPolicy(); return influxDB; } }
package application; import java.io.IOException; import java.util.HashMap; import java.util.List; import java.util.Map; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.amqp.rabbit.annotation.RabbitListener; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Component; import org.springframework.util.StringUtils; import com.fasterxml.jackson.databind.ObjectMapper; /** * * @author zyg * */ @Component public class Consumer { protected final static Logger logger = LoggerFactory.getLogger(Consumer.class); private ObjectMapper objectMapper = new ObjectMapper(); @Autowired private InfluxDBConnect influxDB; @RabbitListener(queues = RabbitMQConfig.QUEUE_NAME) public void sendToSubject(org.springframework.amqp.core.Message message) { String payload = new String(message.getBody()); logger.info(payload); if (payload.startsWith("\"")) { // Legacy payload from an Angel client payload = payload.substring(1, payload.length() - 1); payload = payload.replace("\\\"", "\""); } try { if (payload.startsWith("[")) { @SuppressWarnings("unchecked") List<Map<String, Object>> list = this.objectMapper.readValue(payload, List.class); for (Map<String, Object> map : list) { sendMap(map); } } else { @SuppressWarnings("unchecked") Map<String, Object> map = this.objectMapper.readValue(payload, Map.class); sendMap(map); } } catch (IOException ex) { logger.error("Error receiving hystrix stream payload: " + payload, ex); } } private void sendMap(Map<String, Object> map) { Map<String, Object> data = getPayloadData(map); data.remove("latencyExecute"); data.remove("latencyTotal"); Map<String, String> tags = new HashMap<String, String>(); tags.put("type", data.get("type").toString()); tags.put("name", data.get("name").toString()); tags.put("instanceId", data.get("instanceId").toString()); //tags.put("group", data.get("group").toString()); influxDB.insert("testaaa", tags, data); // for (String key : data.keySet()) { // logger.info("{}:{}",key,data.get(key)); // } } public static Map<String, Object> getPayloadData(Map<String, Object> jsonMap) { @SuppressWarnings("unchecked") Map<String, Object> origin = (Map<String, Object>) jsonMap.get("origin"); String instanceId = null; if (origin.containsKey("id")) { instanceId = origin.get("host") + ":" + origin.get("id").toString(); } if (!StringUtils.hasText(instanceId)) { // TODO: instanceid template instanceId = origin.get("serviceId") + ":" + origin.get("host") + ":" + origin.get("port"); } @SuppressWarnings("unchecked") Map<String, Object> data = (Map<String, Object>) jsonMap.get("data"); data.put("instanceId", instanceId); return data; } }
这里不多说,就是接收RabbitMQ信息然后保存到InfluxDB数据库中。
三、JVM相关数据采集
JVM相关数据采集非常简单主要思想就是定时轮训被监控服务的接口地址然后把返回信息插入到InfluxDB中
服务引用的包不多说这个服务是需要注册到注册中心Eureka中的因为需要获取所有服务的监控信息。
插入InfluxDB代码和上面基本类似只不过多了一个批量插入方法
package com.zjs.collection; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.cloud.netflix.eureka.EnableEurekaClient; /** * * @author zyg * */ @EnableEurekaClient @SpringBootApplication public class ApplictionCollection { public static void main(String[] args) { SpringApplication.run(ApplictionCollection.class, args); } }
/** * 批量插入 * * @param measurement * 表 * @param tags * 标签 * @param fields * 字段 */ public void batchinsert(String measurement, Map<String, String> tags, List<Map<String, Object>> fieldslist) { org.influxdb.dto.BatchPoints.Builder batchbuilder=BatchPoints.database(database); for (Map<String, Object> map : fieldslist) { Builder builder = Point.measurement(measurement); tags.put("instanceId", map.get("instanceId").toString()); builder.time((long)map.get("currentTime"), TimeUnit.NANOSECONDS); builder.tag(tags); builder.fields(map); batchbuilder.point(builder.build()); } System.out.println(batchbuilder.build().toString()); influxDB.write(batchbuilder.build()); }
package com.zjs.collection; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.LinkedBlockingQueue; import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.autoconfigure.SpringBootApplication; import org.springframework.cloud.client.ServiceInstance; import org.springframework.cloud.client.discovery.DiscoveryClient; import org.springframework.context.annotation.Bean; import org.springframework.http.client.SimpleClientHttpRequestFactory; import org.springframework.scheduling.annotation.EnableScheduling; import org.springframework.scheduling.annotation.Scheduled; import org.springframework.stereotype.Component; import org.springframework.web.client.RestTemplate; /** * 获取微服务实例 * * @author zyg * */ @Component @SpringBootApplication @EnableScheduling public class MicServerInstanceInfoHandle { protected final static Logger logger = LoggerFactory.getLogger(MicServerInstanceInfoHandle.class); final String pathtail = "/metrics/mem.*|heap.*|threads.*|gc.*|nonheap.*|classes.*"; Map<String, String> tags; ThreadPoolExecutor threadpool; @Autowired DiscoveryClient dc; @Autowired RestTemplate restTemplate; final static LinkedBlockingQueue<Map<String, Object>> jsonMetrics = new LinkedBlockingQueue<>(1000); /** * 初始化实例 可以吧相关参数设置到配置文件 */ public MicServerInstanceInfoHandle() { tags = new HashMap<String, String>(); threadpool = new ThreadPoolExecutor(4, 20, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(100)); } @Autowired private InfluxDBConnect influxDB; /** * metrics数据获取 */ @Scheduled(fixedDelay = 2000) public void metricsDataObtain() { logger.info("开始获取metrics数据"); List<String> servicelist = dc.getServices(); for (String str : servicelist) { List<ServiceInstance> silist = dc.getInstances(str); for (ServiceInstance serviceInstance : silist) { threadpool.execute(new MetricsHandle(serviceInstance)); } } } /** * 将数据插入到influxdb数据库 */ @Scheduled(fixedDelay = 5000) public void metricsDataToInfluxDB() { logger.info("开始批量将metrics数据insert-influxdb"); ArrayList<Map<String, Object>> metricslist = new ArrayList<>(); MicServerInstanceInfoHandle.jsonMetrics.drainTo(metricslist); if (!metricslist.isEmpty()) { logger.info("批量插入条数:{}", metricslist.size()); influxDB.batchinsert("metrics", tags, metricslist); } logger.info("结束批量metrics数据insert"); } @Bean public RestTemplate getRestTemplate() { RestTemplate restTemplate = new RestTemplate(); SimpleClientHttpRequestFactory achrf = new SimpleClientHttpRequestFactory(); achrf.setConnectTimeout(10000); achrf.setReadTimeout(10000); restTemplate.setRequestFactory(achrf); return restTemplate; } class MetricsHandle extends Thread { private ServiceInstance serviceInstanc; public MetricsHandle(ServiceInstance serviceInstance){ serviceInstanc=serviceInstance; } @Override public void run() { try { logger.info("获取 {}:{}:{} 应用metrics数据",serviceInstanc.getServiceId(),serviceInstanc.getHost(),serviceInstanc.getPort()); @SuppressWarnings("unchecked") Map<String, Object> mapdata = restTemplate .getForObject(serviceInstanc.getUri().toString() + pathtail, Map.class); mapdata.put("instanceId", serviceInstanc.getServiceId() + ":" + serviceInstanc.getHost() + ":" + serviceInstanc.getPort()); mapdata.put("type", "metrics"); mapdata.put("currentTime", System.currentTimeMillis() * 1000000); MicServerInstanceInfoHandle.jsonMetrics.add(mapdata); } catch (Exception e) { logger.error("instanceId:{},host:{},port:{},path:{},exception:{}", serviceInstanc.getServiceId(), serviceInstanc.getHost(), serviceInstanc.getPort(), serviceInstanc.getUri(), e.getMessage()); } } } }
这里简单解释一下这句代码 final String pathtail = "/metrics/mem.*|heap.*|threads.*|gc.*|nonheap.*|classes.*";
,metrics这个路径下的信息很多但是我们不是都需要所以我们需要有选择的获取这样节省流量和时间。上面关键类MicServerInstanceInfoHandle做了一个多线程访问主要应对注册中心有成百上千个服务的时候单线程可能轮序不过来,同时做了一个队列缓冲,批量插入到InfluxDB。
四、结果展示
如果你数据采集成功了就可以绘制出来上面的图形下面是对应的sql
SELECT mean("rollingCountFallbackSuccess"), mean("rollingCountSuccess") FROM "testaaa" WHERE ("instanceId" = 'IP:spring-cloud-server1-test:8082' AND "type" = 'HystrixCommand') AND $timeFilter GROUP BY time($__interval) fill(null) SELECT mean("currentPoolSize") FROM "testaaa" WHERE ("type" = 'HystrixThreadPool' AND "instanceId" = '10.10.12.51:spring-cloud-server1-test:8082') AND $timeFilter GROUP BY time($__interval) fill(null) SELECT "heap", "heap.committed", "heap.used", "mem", "mem.free", "nonheap", "nonheap.committed", "nonheap.used" FROM "metrics" WHERE ("instanceId" = 'SPRING-CLOUD-SERVER1-TEST:10.10.12.51:8082') AND $timeFilter
好了到这里就基本结束了。
五、优化及设想
上面的基础服务肯定都是需要高可用的,毋庸置疑都是需要学习的。如果有时间我也会向大家一一介绍,大家亦可以去搜索相关资料查看!
可能有人问有一个叫telegraf的小插件直接就能收集相关数据进行聚合结果监控,
其实我之前也是使用的telegraf这个小工具但是发现一个问题,
就是每次被监控的应用重启的时候相关字段名就会变,
因为他采集使用的是类实例的名字作为字段名,这应我们会很不方便,每次重启应用我们都要重新设置sql语句这样非常不友好,
再次感觉收集数据编码难度不大所以自己就写了收集数据的代码!如果有哪位大神对telegraf比较了解可以解决上面我说的问题记得给我留言哦!在这里先感谢!
有些地方是需要优化的,比如一些IP端口什么的都是可以放到配置文件里面的。
六、总结
从spring boot到现在短短的2、3年时间就迅速变得火爆,知识体系也变得完善,开发成本越来越低,
所以普及程度就越来越高,微服务虽然很好但是我们也要很好的善于运用,监控就是重要的一环,
试想一下你的机房运行着成千上万的服务,稳定运行和及时发现有问题的服务是多么重要的一件事情!