使用JDK内置工具排查应用故障
0条评论JDK提供的许多内置的工具可以帮助我们监控应用,排查故障,本文用一个示例程序模拟死锁,CPU占用和内存泄漏的情况,然后展示使用工具排查的步骤思路
示例程序
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
/**
* 模拟死锁,内存,CPU占用
* <p>
* 使用jdk工具排查
* <p>
* -Xms32m -Xmx32m
*/
public class JdkTools {
private final static Object lock1 = new JdkTools();
private final static Object lock2 = new JdkTools();
private final static List<char[]> chars = new ArrayList<>();
private final static List<DemoObject> objects = new ArrayList<>();
private static void acquire1() {
synchronized (lock1) {
System.out.println(Thread.currentThread().getId() + "-acquire-lock1");
try {
Thread.sleep(1000L);
} catch (InterruptedException e) {
e.printStackTrace();
}
synchronized (lock2) {
System.out.println(Thread.currentThread().getId() + "-acquire-lock2");
}
}
}
private static void acquire2() {
synchronized (lock2) {
System.out.println(Thread.currentThread().getId() + "-acquire-lock2");
synchronized (lock1) {
System.out.println(Thread.currentThread().getId() + "-acquire-lock1");
}
}
}
private static void deadLock() {
new Thread(JdkTools::acquire1, "JdkTools.deadLock-thread1").start();
new Thread(JdkTools::acquire2, "JdkTools.deadLock-thread2").start();
}
private static void consumeCPU() {
new Thread(() -> {
int i = 0;
while (true) {
i++;
}
}, "JdkTools-consumeCPU-thread").start();
}
private static void outOfMemory() {
int n = 0;
while (true) {
try {
Thread t = new Thread(() -> {
int size = 1024 * 1024;
while (true) {
chars.add(new char[size]);
for (int i = 0; i < size; i++) {
objects.add(new DemoObject());
}
}
}, "JdkTools-outOfMemory-thread-" + n++);
t.start();
t.join();
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
Thread.sleep(1000 * 20L);
} catch (InterruptedException e) {
e.printStackTrace();
}
chars.clear();
objects.clear();
}
}
}
public static void main(String[] args) {
MemoryMXBean memoryMBean = ManagementFactory.getMemoryMXBean();
MemoryUsage usage = memoryMBean.getHeapMemoryUsage();
int maxSize = 1024 * 1024 * 32;
if (usage.getInit() > maxSize || usage.getMax() > maxSize) {
System.out.println("Use: -Xms32m -Xmx32m");
return;
}
deadLock();
consumeCPU();
outOfMemory();
}
}
class DemoObject {
int a = 10086;
long[] longs = new long[1024];
}
运行程序,输出:
10-acquire-lock1
11-acquire-lock2
Exception in thread "JdkTools-outOfMemory-thread-0" java.lang.OutOfMemoryError: Java heap space
at DemoObject.<init>(JdkTools.java:111)
at JdkTools.lambda$outOfMemory$1(JdkTools.java:73)
at JdkTools$$Lambda$4/142257191.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
这里我们忽略输出日志,仅使用jdk工具来排查问题。
首先我们使用 jps -l 查看java进程
l表示输出程序main方法所在类的全限定名,v打印jvm变量,m打印main方法参数,另外q参数可以不打印类名
5826 org.jetbrains.idea.maven.server.RemoteMavenServer
5172 com.intellij.idea.Main
130822 org.jetbrains.kotlin.daemon.KotlinCompileDaemon
68376 com.sun.tools.hat.Main
130941 org.jetbrains.jps.cmdline.Launcher
73836 JdkTools
5197 com.intellij.idea.Main
我们可以知道JdkTools类运行的进程号是73836
死锁
使用 jstack 73836 可查看进程73836所有的线程栈,如果检测到死锁末尾会有相应输出:
Found one Java-level deadlock:
=============================
"JdkTools.deadLock-thread2":
waiting to lock monitor 0x00007fc8a40062c8 (object 0x00000000fe042eb0, a JdkTools),
which is held by "JdkTools.deadLock-thread1"
"JdkTools.deadLock-thread1":
waiting to lock monitor 0x00007fc8a4004d78 (object 0x00000000fe042ec0, a JdkTools),
which is held by "JdkTools.deadLock-thread2"
Java stack information for the threads listed above:
===================================================
"JdkTools.deadLock-thread2":
at JdkTools.acquire2(JdkTools.java:42)
- waiting to lock <0x00000000fe042eb0> (a JdkTools)
- locked <0x00000000fe042ec0> (a JdkTools)
at JdkTools$$Lambda$2/303563356.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"JdkTools.deadLock-thread1":
at JdkTools.acquire1(JdkTools.java:33)
- waiting to lock <0x00000000fe042ec0> (a JdkTools)
- locked <0x00000000fe042eb0> (a JdkTools)
at JdkTools$$Lambda$1/471910020.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Found 1 deadlock.
可以看出,线程JdkTools.deadLock-thread2获取了一个类型为JdkTools,地址为0x00000000fe042ec0的锁,等待一个类型为JdkTools,地址为0x00000000fe042eb0的锁,目前运行的代码行为JdkTools.java:42;
而线程JdkTools.deadLock-thread1正好相反,获取了一个类型为JdkTools,地址为0x00000000fe042eb0的锁,等待一个类型为JdkTools,地址为0x00000000fe042ec0的锁,目前运行的代码行为JdkTools.java:33,于是发生了死锁。
查找CPU占用率高的线程
线程是调度的基本单位,进程是资源分配的基本单位,线程在linux中也叫轻量级进程,使用top的H参数可以以线程的维度查看系统资源占用情况
使用 top -Hp 73836 可以看到在进程73836创建的线程中,占用CPU最高的是线程id是73858。同时我们这里也可以看出,CPU占用不同,但是内存占用都是一样的,间接反映了内存是以进程为单位分配的
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
73858 zeng 20 0 3029304 67296 16220 R 97.7 1.1 30:01.47 java
73838 zeng 20 0 3029304 67296 16220 S 0.7 1.1 0:01.27 java
73841 zeng 20 0 3029304 67296 16220 S 0.7 1.1 0:01.32 java
73845 zeng 20 0 3029304 67296 16220 S 0.7 1.1 0:01.28 java
73840 zeng 20 0 3029304 67296 16220 S 0.3 1.1 0:01.27 java
...
jstack可以获取当前所有的线程栈,其中记录了每个线程的id,所以我们可以用这个线程id结合jstack匹配,找到对应线程栈,不过jstack记录的是16进制的我们需要把它转换为16进制
jstack 73836 | grep -A 10 `printf '%x' 73858`
"JdkTools-consumeCPU-thread" #12 prio=5 os_prio=0 tid=0x00007fc8ec192000 nid=0x12082 runnable [0x00007fc8cfffe000]
java.lang.Thread.State: RUNNABLE
at JdkTools.lambda$consumeCPU$0(JdkTools.java:59)
at JdkTools$$Lambda$3/135721597.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"JdkTools.deadLock-thread2" #11 prio=5 os_prio=0 tid=0x00007fc8ec190800 nid=0x12081 waiting for monitor entry [0x00007fc8d41de000]
java.lang.Thread.State: BLOCKED (on object monitor)
at JdkTools.acquire2(JdkTools.java:42)
- waiting to lock <0x00000000fe042eb0> (a JdkTools)
- locked <0x00000000fe042ec0> (a JdkTools)
于是我们知道,是JdkTools-consumeCPU-thread线程长时间占用CPU,目前执行到JdkTools.java:59
排查内存泄漏
线程间是共享堆内存的,线程结束了,其对应的工作内存会释放,栈上引用的堆上的内存由于引用不可达,也会被JVM自动回收。
一般出现内存泄漏是线程不结束,一些永远无法获取到的对象绑定到线程上,如ThreadLocal,或者静态字段引用的对象等。
可以认为内存泄漏都是业务代码逻辑造成的,一些可以不持有的对象却一直持有导致JVM无法垃圾回收
如果发生了内存泄漏,我们可以找到占用内存多的对象,然后分析哪些对象持有了这个对象的引用,最终定位问题
我们使用jmap和jhat结合分析内存泄漏情况:
- 将堆栈dump到文件 jmap -dump:live,format=b,file=dump.dat 73836
- 使用jhat分析dump文件,通过http查看分析
jhat dump.dat
Reading from dump.dat... Dump file created Thu Nov 07 17:43:42 CST 2019 Snapshot read, resolving... Resolving 14123 objects... Chasing references, expect 2 dots.. Eliminating duplicate references.. Snapshot resolved. Started HTTP server on port 7000 Server is ready.
访问 http://127.0.0.1:7000 即可进程追踪引用关系,查看Show heap histogram,可以看到 类-实例数-占用字节数 的列表,默认按照 占用字节数 排序
Class Instance Count Total Size
class [J 2876 23589856
class [C 1410 2210504
class [Ljava.lang.Object; 777 88456
class java.lang.Class 661 60812
class DemoObject 2874 34488
class [B 10 25214
class java.lang.String 1397 16764
class[J 表示long的数组类型,有2876个实例,点击进入
class [J
...
References to this object:
[J@0xff08a320 (8208 bytes) : ??
[J@0xfe8d3318 (8208 bytes) : ??
[J@0xfe752748 (8208 bytes) : ??
...
在 References to this object(引用该对象的对象,可以理解为每个实例都引用了对应类的对象,数组也不例外): 下方可以看到该类型对应的所有实例,点击0xff08a320进入
Object at 0xff08a320
References to this object:
DemoObject@0xfef2e8f0 (28 bytes) : field longs
可以看到DemoObject的longs字段引用了这个long数组,点击DemoObject@0xfef2e8f0,看看谁引用了DemoObject
Object at 0xfef2e8f0
instance of DemoObject@0xfef2e8f0 (28 bytes)
Class:
class DemoObject
Instance data members:
a (I) : 10086
longs (L) : [J@0xff08a320 (8208 bytes)
References to this object:
[Ljava.lang.Object;@0xfe068f38 (33328 bytes) : Element 1797 of [Ljava.lang.Object;@0xfe068f38
...
从上面可以得知0xfe068f38这个对象数组引用了DemoObject,继续点击0xfe068f38对象数组,查看谁引用了它
Object at 0xfe068f38
Array of 4164 objects
0 : DemoObject@0xfe06ff90 (28 bytes)
1 : DemoObject@0xfe06ff78 (28 bytes)
2 : DemoObject@0xfe06ff60 (28 bytes)
3 : DemoObject@0xfe06ff48 (28 bytes)
4 : DemoObject@0xfe06ff30 (28 bytes)
5 : DemoObject@0xfe06ff18 (28 bytes)
...
References to this object:
java.util.ArrayList@0xfe025b68 (32 bytes) : field elementData
由于这个对象数组有4164个元素,我们需要滑到最下面,java.util.ArrayList@0xfe025b68这个对象持有对象数组,有点眉目了,我们继续点击java.util.ArrayList@0xfe025b68
Object at 0xfe025b68
instance of java.util.ArrayList@0xfe025b68 (32 bytes)
Class:
class java.util.ArrayList
Instance data members:
elementData (L) : [Ljava.lang.Object;@0xfe068f38 (33328 bytes)
modCount (I) : 541773
size (I) : 2874
References to this object:
class JdkTools (108 bytes) : static field objects
查看最后一行,class JdkTools (108 bytes) : static field objects,到这里已经明确了,是JdkTools的静态变量objects导致OOM,接下来我们就可以从业务层分析原因,为什么会占用如此高的内存了
其他
参考文章和扩展阅读:
java命令--jstack 工具
https://www.cnblogs.com/kongzhongqijing/articles/3630264.html
使用Java监控工具出现 Can't attach to the process
https://www.cnblogs.com/wscit/p/6803354.html
JVM性能调优监控命令jps、jinfo、jstat、jmap+jhat、jstack使用详解
https://www.cnblogs.com/barrywxx/p/8576390.html
JDK内置工具使用(jps、jstack、jmap、jstat)
https://www.cnblogs.com/heishuichenzhou/p/10609085.html
JVM调优总结 -Xms -Xmx -Xmn -Xss
https://www.cnblogs.com/ceshi2016/p/8447989.html
java动态获取jvm参数
https://blog.csdn.net/liudezhicsdn/article/details/51058504
Java一个对象占用多少字节
https://www.cnblogs.com/zhi-leaf/p/11589711.html
JNI基础:JNI数据类型和类型描述符
https://blog.csdn.net/afei__/article/details/80899758
jmap命令详解
https://www.cnblogs.com/xingzc/p/5778014.html
Linux 常用命令 | top 详解
https://www.cnblogs.com/xiaote/p/10976362.html
Why do -Xmx and Runtime.maxMemory not agree
https://stackoverflow.com/questions/23701207/why-do-xmx-and-runtime-maxmemory-not-agree