Problem:
In our load test, we found frequent full gc caused by Metaspace is full. STW (stop the world) time is long.
Shameless fact: this issue probably will not happen in real prod since we clean browser cache every iteration in our load test tool. But it do help if we load class as less as possible.
Step-by-step troubleshooting:
1. From tomcat catalina.out, we found there are many "org.mozilla" related class is loaded:
[Loaded org.mozilla.javascript.gen.file__***_theming_less_source_map_js_fmt_***]
[Loaded org.mozilla.javascript.gen.file__***_theming_less_source_map_js_fmt_***]
Reason of we use "-all" option is "-all" option will ensure that JVM will not force gc (
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/ff3b27e6bcc2/src/share/vm/services/diagnosticCommand.cpp#l385):
void ClassHistogramDCmd::execute(DCmdSource source, TRAPS) {
VM_GC_HeapInspection heapop(output(),
!_all.value() /* request full gc if false */);
VMThread::execute(&heapop);
}
int ClassHistogramDCmd::num_arguments() {
ResourceMark rm;
ClassHistogramDCmd* dcmd = new ClassHistogramDCmd(NULL, false);
if (dcmd != NULL) {
DCmdMark mark(dcmd);
return dcmd->_dcmdparser.num_arguments();
} else {
return 0;
}
}
BTW, if we use GC.class_stats, it will force gc every time:
void ClassStatsDCmd::execute(DCmdSource source, TRAPS) {
if (!UnlockDiagnosticVMOptions) {
output()->print_cr("GC.class_stats command requires -XX:+UnlockDiagnosticVMOptions");
return;
}
VM_GC_HeapInspection heapop(output(),
true /* request_full_gc */);
heapop.set_csv_format(_csv.value());
heapop.set_print_help(_help.value());
heapop.set_print_class_stats(true);
if (_all.value()) {
if (_columns.has_value()) {
output()->print_cr("Cannot specify -all and individual columns at the same time");
return;
} else {
heapop.set_columns(NULL);
}
} else {
if (_columns.has_value()) {
heapop.set_columns(_columns.value());
} else {
heapop.set_columns(DEFAULT_COLUMNS);
}
}
VMThread::execute(&heapop);
}
3. From class histogram, we can analyze metaspace utilization in quantitative way
num #instances #bytes class name
----------------------------------------------
618: 621 64584 org.mozilla.javascript.gen.file****
4. Regarding which java class trigger this class loader, we can do a profiling and find clue from stack trace.
Follow-up:
Need check with dev team about possibility and pros/cons of caching these class data in server cache
Notes:
If we troubleshooting metaspace leak problem, we'd better force gc before we get class histogram to exclude noise as much as possible.