Per-thread CPU statistics in Java

2021-08-10  ShootHzj

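At work I recently needed to measure each thread's share of CPU, so that we could see how code changes move CPU usage at the thread level. The solution I ended up with samples every thread's CPU share once a minute and exports it to Prometheus. Here is how it works: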

Design

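Java's ThreadMXBean reports the CPU time each thread has consumed, in nanoseconds. Take the difference between two samples and divide it by the wall-clock nanoseconds that elapsed between them, and you get the thread's CPU share over that interval.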
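The calculation can be tried outside Spring first. The following is a minimal sketch, not the article's service code: the class name ThreadCpuShare and the helper cpuShare are hypothetical names chosen for illustration, and the busy loop in main exists only to give the current thread some CPU time to measure.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuShare {

    /**
     * CPU share = CPU nanoseconds consumed between two samples
     * divided by the wall-clock nanoseconds elapsed between them.
     * (Hypothetical helper, introduced here for illustration.)
     */
    static double cpuShare(long cpuBefore, long cpuAfter, long wallBefore, long wallAfter) {
        long elapsed = wallAfter - wallBefore;
        if (elapsed <= 0) {
            return 0.0; // no time elapsed, nothing meaningful to report
        }
        return (double) (cpuAfter - cpuBefore) / (double) elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();

        // First sample: CPU time used by this thread and the current timestamp
        long cpuBefore = bean.getThreadCpuTime(id);
        long wallBefore = System.nanoTime();

        // Burn CPU for roughly 100 ms, then sleep for roughly 100 ms
        long spinUntil = wallBefore + 100_000_000L;
        while (System.nanoTime() < spinUntil) {
            // busy wait
        }
        Thread.sleep(100);

        // Second sample
        long cpuAfter = bean.getThreadCpuTime(id);
        long wallAfter = System.nanoTime();

        System.out.printf("cpu share: %.3f%n",
                cpuShare(cpuBefore, cpuAfter, wallBefore, wallAfter));
    }
}
```

The result is a fraction of one core, so a thread that is busy for half of the interval should report a value near 0.5. The exact number depends on the machine and the scheduler.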

Implementation

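First, define a small holder class that stores a thread's CPU nanoseconds at the last sample, together with the system nanosecond timestamp (System.nanoTime()) taken at that moment.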

import lombok.Data;

/**
 * @author hezhangjian
 */
@Data
public class ThreadMetricsAux {

    private long usedNanoTime;

    private long lastNanoTime;

    public ThreadMetricsAux() {
    }

    public ThreadMetricsAux(long usedNanoTime, long lastNanoTime) {
        this.usedNanoTime = usedNanoTime;
        this.lastNanoTime = lastNanoTime;
    }
    
}

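Next, define a scheduled task in Spring Boot. It periodically computes each thread's CPU share and publishes it to the MeterRegistry, so the metric is available when you call the Spring Boot Actuator endpoint. Note that @Scheduled only fires if scheduling is enabled in the application, for example with @EnableScheduling.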

import com.google.common.util.concurrent.AtomicDouble;
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;

/**
 * @author hezhangjian
 */
@Slf4j
@Service
public class ThreadMetricService {

    @Autowired
    private MeterRegistry meterRegistry;

    private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

    private final HashMap<Long, ThreadMetricsAux> map = new HashMap<>();

    private final HashMap<Meter.Id, AtomicDouble> dynamicGauges = new HashMap<>();

    /**
     * Runs once a minute, at second 0.
     */
    @Scheduled(cron = "0 * * * * ?")
    public void schedule() {
        final long[] allThreadIds = threadBean.getAllThreadIds();
        for (long threadId : allThreadIds) {
            final ThreadInfo threadInfo = threadBean.getThreadInfo(threadId);
            if (threadInfo == null) {
                continue;
            }
            final long threadNanoTime = getThreadCPUTime(threadId);
            if (threadNanoTime == 0) {
                // A CPU time of 0 is treated as invalid data: clear this thread's history and skip it
                map.remove(threadId);
                continue;
            }
            final long nanoTime = System.nanoTime();
            ThreadMetricsAux oldMetrics = map.get(threadId);
            // Check whether a previous sample exists for this thread
            if (oldMetrics != null) {
                // If so, compute the CPU share since that sample and report it
                double percent = (double) (threadNanoTime - oldMetrics.getUsedNanoTime()) / (double) (nanoTime - oldMetrics.getLastNanoTime());
                handleDynamicGauge("jvm.threads.cpu", "threadName", threadInfo.getThreadName(), percent);
            }
            map.put(threadId, new ThreadMetricsAux(threadNanoTime, nanoTime));
        }
    }

    // Micrometer gauge handling
    private void handleDynamicGauge(String meterName, String labelKey, String labelValue, double snapshot) {
        Meter.Id id = new Meter.Id(meterName, Tags.of(labelKey, labelValue), null, null, Meter.Type.GAUGE);

        dynamicGauges.compute(id, (key, current) -> {
            if (current == null) {
                AtomicDouble initialValue = new AtomicDouble(snapshot);
                meterRegistry.gauge(key.getName(), key.getTags(), initialValue);
                return initialValue;
            } else {
                current.set(snapshot);
                return current;
            }
        });
    }

    long getThreadCPUTime(long threadId) {
        long time = threadBean.getThreadCpuTime(threadId);
        /* thread of the specified ID is not alive or does not exist */
        return time == -1 ? 0 : time;
    }

}
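One thing the service above does not do is drop history for threads that have exited, so its map can keep growing in an application that creates many short-lived threads. The following is a minimal sketch of one way to prune it, assuming a hypothetical helper pruneDeadThreads that would be called at the start of each run with the result of threadBean.getAllThreadIds(). It is not part of the original code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ThreadMapPruner {

    /**
     * Removes every entry whose thread id is not in liveThreadIds.
     * (Hypothetical helper, introduced here for illustration.)
     */
    static <V> void pruneDeadThreads(Map<Long, V> history, long[] liveThreadIds) {
        Set<Long> live = new HashSet<>();
        for (long id : liveThreadIds) {
            live.add(id);
        }
        // retainAll on the key set removes the matching map entries as well
        history.keySet().retainAll(live);
    }

    public static void main(String[] args) {
        Map<Long, String> history = new HashMap<>();
        history.put(1L, "main");
        history.put(2L, "worker-that-exited");
        history.put(3L, "scheduling-1");

        // Only threads 1 and 3 are still alive
        pruneDeadThreads(history, new long[] {1L, 3L});

        System.out.println(history.keySet());
    }
}
```

This only cleans up the sampling history. Gauges already registered for exited threads would still be exported with their last value unless they are removed separately.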

Other configuration

Dependencies

In the pom file:

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
        </dependency>

Prometheus endpoint configuration

application.yaml

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus

Result

Query the endpoint with curl: curl localhost:20001/actuator/prometheus|grep cpu

jvm_threads_cpu{threadName="RMI Scheduler(0)",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-10",} 0.0
jvm_threads_cpu{threadName="Signal Dispatcher",} 0.0
jvm_threads_cpu{threadName="Common-Cleaner",} 3.1664628758074733E-7
jvm_threads_cpu{threadName="http-nio-20001-Poller",} 7.772143763853949E-5
jvm_threads_cpu{threadName="http-nio-20001-Acceptor",} 8.586978352515361E-5
jvm_threads_cpu{threadName="DestroyJavaVM",} 0.0
jvm_threads_cpu{threadName="Monitor Ctrl-Break",} 0.0
jvm_threads_cpu{threadName="AsyncHttpClient-timer-8-1",} 2.524386571545477E-4
jvm_threads_cpu{threadName="Attach Listener",} 0.0
jvm_threads_cpu{threadName="scheduling-1",} 1.2269694160981585E-4
jvm_threads_cpu{threadName="container-0",} 1.999795692406262E-6
jvm_threads_cpu{threadName="http-nio-20001-exec-9",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-7",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-8",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-5",} 0.0
jvm_threads_cpu{threadName="Notification Thread",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-6",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-3",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-4",} 0.0
jvm_threads_cpu{threadName="Reference Handler",} 0.0
jvm_threads_cpu{threadName="http-nio-20001-exec-1",} 0.0012674719289349648
jvm_threads_cpu{threadName="http-nio-20001-exec-2",} 6.542541277148053E-5
jvm_threads_cpu{threadName="RMI TCP Connection(idle)",} 1.3998786340454562E-6
jvm_threads_cpu{threadName="Finalizer",} 0.0
jvm_threads_cpu{threadName="Catalina-utility-2",} 7.920883054498174E-5
jvm_threads_cpu{threadName="RMI TCP Accept-0",} 0.0
jvm_threads_cpu{threadName="Catalina-utility-1",} 6.80101662787773E-5
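The values above are fractions of a single core, not percentages, so 0.0012674719289349648 means about 0.13% of one core. The following is a minimal sketch for reading such lines by hand, assuming the exact line layout shown in this output. The class name ThreadCpuLine and its parse method are hypothetical, and the pattern does not handle escaped quotes inside thread names.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadCpuLine {

    // Matches lines such as: jvm_threads_cpu{threadName="Finalizer",} 0.0
    private static final Pattern LINE =
            Pattern.compile("^jvm_threads_cpu\\{threadName=\"(.*)\",?\\}\\s+(\\S+)$");

    /**
     * Returns (thread name, CPU fraction), or null if the line is not a jvm_threads_cpu sample.
     * (Hypothetical helper, introduced here for illustration.)
     */
    static Map.Entry<String, Double> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new AbstractMap.SimpleImmutableEntry<>(m.group(1), Double.parseDouble(m.group(2)));
    }

    public static void main(String[] args) {
        Map.Entry<String, Double> sample = parse(
                "jvm_threads_cpu{threadName=\"http-nio-20001-exec-1\",} 0.0012674719289349648");
        // Multiply by 100 to express the fraction as a percentage of one core
        System.out.printf("%s: %.4f%% of one core%n", sample.getKey(), sample.getValue() * 100);
    }
}
```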