Wednesday, September 19, 2018

Analyze Metaspace Utilization in quantitative way

Problem:

In our load test, we found frequent full gc caused by Metaspace is full. STW (stop the world) time is long. 
Shameless fact: this issue probably will not happen in real prod since we clean browser cache every iteration in our load test tool. But it do help if we load class as less as possible.


Step-by-step troubleshooting:

1. From tomcat catalina.out, we found there are many "org.mozilla" related class is loaded:
[Loaded org.mozilla.javascript.gen.file__***_theming_less_source_map_js_fmt_***]
[Loaded org.mozilla.javascript.gen.file__***_theming_less_source_map_js_fmt_***]
2. Inspired by https://www.javacodegeeks.com/2016/03/jcmd-one-jdk-command-line-tool-rule.html, we use jcmd <tomcat pid> GC.class_histogram -all to get class histogram.
Reason of we use "-all" option is "-all" option will ensure that JVM will not force gc (http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/ff3b27e6bcc2/src/share/vm/services/diagnosticCommand.cpp#l385):

void ClassHistogramDCmd::execute(DCmdSource source, TRAPS) {
  VM_GC_HeapInspection heapop(output(),
                              !_all.value() /* request full gc if false */);
  VMThread::execute(&heapop);
}


int ClassHistogramDCmd::num_arguments() {

  ResourceMark rm;
  ClassHistogramDCmd* dcmd = new ClassHistogramDCmd(NULL, false);
  if (dcmd != NULL) {
    DCmdMark mark(dcmd);
    return dcmd->_dcmdparser.num_arguments();
  } else {
    return 0;
  }
}

BTW, if we use GC.class_stats, it will force gc every time:

void ClassStatsDCmd::execute(DCmdSource source, TRAPS) {
  if (!UnlockDiagnosticVMOptions) {
    output()->print_cr("GC.class_stats command requires -XX:+UnlockDiagnosticVMOptions");
    return;
  }

  VM_GC_HeapInspection heapop(output(),

                              true /* request_full_gc */);
  heapop.set_csv_format(_csv.value());
  heapop.set_print_help(_help.value());
  heapop.set_print_class_stats(true);
  if (_all.value()) {
    if (_columns.has_value()) {
      output()->print_cr("Cannot specify -all and individual columns at the same time");
      return;
    } else {
      heapop.set_columns(NULL);
    }
  } else {
    if (_columns.has_value()) {
      heapop.set_columns(_columns.value());
    } else {
      heapop.set_columns(DEFAULT_COLUMNS);
    }
  }
  VMThread::execute(&heapop);
}

3. From class histogram, we can analyze metaspace utilization in quantitative way
 num     #instances         #bytes  class name
----------------------------------------------
 618:           621          64584  org.mozilla.javascript.gen.file****
4. Regarding which java class trigger this class loader, we can do a profiling and find clue from stack trace.


Follow-up:

Need check with dev team about possibility and pros/cons of caching these class data in server cache

Notes:

If we troubleshooting metaspace leak problem, we'd better force gc before we get class histogram to exclude noise as much as possible.

No comments:

Post a Comment