Mastering Java Heap Dump Analysis: An Expert’s Guide to Solving Complex Memory Problems

When we have performance issues in Java, memory problems are high on the list of possible culprits. They can cause the system to slow down, become unresponsive, suffer from intermittent hangs, and eventually crash. High memory usage can also cause cloud computing costs to soar.

Java heap dump analysis is the fastest way to get to the root cause of memory-related headaches. In this article, we’ll discuss briefly how the JVM manages memory, before looking at how a heap dump can help us troubleshoot problems quickly.

Java Heap Dump Analysis

A heap dump is a snapshot of the JVM heap at a given moment. Since the entire heap is copied to the dump file, it may be very large. Huge enterprise applications may have a heap size of many gigabytes. The dump file is in binary, so it’s impractical to analyze it manually. Fortunately, there are several excellent heap dump analysis tools available. In this article, we’ll use HeapHero, although if you’re already familiar with other tools such as Eclipse MAT, you may prefer to use them.

You may like to read this article, which describes several ways to take a heap dump from a running program. If you’re developing for Android, you will find the article How to Capture Heap Dumps in Android helpful.
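As one illustration of taking a dump from inside a running program, the sketch below uses the HotSpot-specific HotSpotDiagnosticMXBean. The class name HeapDumper and the output location are hypothetical choices made for this example, and it assumes a HotSpot-based JVM; other JVMs may not provide this bean.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

/** Hypothetical helper: writes a heap dump of the current JVM to a file. */
class HeapDumper {

    /**
     * @param target   file to create; HotSpot expects a name ending in ".hprof"
     *                 and will not overwrite an existing file
     * @param liveOnly true to dump only objects still reachable from GC roots
     */
    static Path dump(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not write heap dump to " + target, e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Illustrative location only: a timestamped file in the temp directory.
        Path target = Path.of(System.getProperty("java.io.tmpdir"),
                "heap-" + System.currentTimeMillis() + ".hprof");
        System.out.println("Heap dump written to " + dump(target, true));
    }
}
```

A dump can also be produced automatically when the heap is exhausted by starting the JVM with -XX:+HeapDumpOnOutOfMemoryError.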

Understanding the JVM Memory Model

Java runtime memory consists of several memory areas, each with its own purpose. This is illustrated in the diagram below.

Fig: JVM Memory Model

The two main subdivisions are heap memory and native memory. These are sometimes referred to as heap and non-heap. Heap memory is managed by the JVM, and its initial and maximum size can be configured using runtime arguments. Native memory is managed by the operating system, although we can also configure maximum sizes for some areas.
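To see the heap and non-heap split from inside a program, a minimal sketch using the standard MemoryMXBean might look like the following. The class name MemoryOverview is hypothetical. Note that the "non-heap" figure reported by this bean covers only the non-heap pools the JVM tracks, such as the Metaspace and code cache; it is not a measure of all native memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints heap and non-heap usage of the current JVM. */
class MemoryOverview {

    /** Formats one MemoryUsage in megabytes; max is -1 when no limit is defined. */
    static String describe(String label, MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() >> 20) + " MB";
        return String.format("%-9s used=%d MB, committed=%d MB, max=%s",
                label, usage.getUsed() >> 20, usage.getCommitted() >> 20, max);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // The heap maximum reflects -Xmx if it was set on the command line.
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

Running it with different values of -Xms and -Xmx is a quick way to confirm that the heap settings we intended are the ones actually in effect.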

The heap is a shared storage space for objects created by all classes that make up the application. If local variables, in other words variables declared within a method, are Java primitives, they are stored within the stack rather than in the heap. If they are objects, they are stored in the heap, but a reference to them is stored in the stack. 

The heap is cleaned regularly by the garbage collector (GC) to remove any objects that are no longer in use. To do this, the GC begins with GC roots. These are references known to be still in use, and include static variables and references held in the stack. Starting from these, it recursively follows the object reference graph, marking all objects that can be reached from the GC roots. Any unmarked objects are cleaned from memory. Heap analysis tools derive a dominator tree from this graph: an object dominates another if every path from the GC roots to the second object passes through the first. See this article for more information about garbage collection.
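The marking step can be sketched on a toy object graph. This is a simplified illustration, not how any real collector is implemented: objects are represented by hypothetical names, and references by a map from each object to the objects it points at.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the mark phase: finds everything reachable from the roots. */
class MarkPhaseSketch {

    /** Returns every object reachable from the roots by following references. */
    static Set<String> mark(Map<String, List<String>> references, Set<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String target : references.getOrDefault(current, List.of())) {
                if (marked.add(target)) {   // add() returns false if already marked
                    pending.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references = Map.of(
                "staticCache", List.of("order1"),
                "order1", List.of("customer"),
                "orphanA", List.of("orphanB"),   // a cycle with no path from a root
                "orphanB", List.of("orphanA"));
        Set<String> live = mark(references, Set.of("staticCache"));
        System.out.println("Reachable: " + live);
        System.out.println("orphanA survives? " + live.contains("orphanA"));
    }
}
```

The two orphan objects refer to each other, but because neither can be reached from a root they stay unmarked and would be collected.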

To speed up GC, the heap is divided into smaller areas. The young generation (YG) holds newly-created objects, but if an object survives a few cycles of GC, it’s promoted to the old generation (OG). This is because it has been observed that most Java objects ‘die young’, so the YG can be kept small and cleaned frequently, while the larger OG is cleaned less often.

Native memory is the other major subdivision. It includes:

  • The Metaspace, which replaced the PermGen in Java 8. Class definitions are stored here, and it’s possible to set a maximum size for it using JVM arguments. If no maximum size is set, and if there is a memory leak related to classes, it can grow until the operating system kills the application or the device becomes unresponsive.
  • Thread space holds a stack for each active thread.
  • The code cache stores native code produced by the just-in-time (JIT) compiler for frequently-called methods.
  • The direct buffer area stores data used by fast, operating-system managed input-output. The heap stores an object holding information about the buffer, but the actual buffer is stored in this area.
  • The GC area is used internally by the garbage collector.
  • The JNI area holds memory allocated by native code that is called through the Java Native Interface.
  • The miscellaneous area holds various internal memory structures used by the JVM. For example, symbol tables are stored here.
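Some of these areas can be listed at runtime through the standard MemoryPoolMXBean interface. The sketch below is a minimal example with a hypothetical class name; the pool names it prints depend on the JVM and the garbage collector in use, and areas such as thread stacks and direct buffers do not appear as memory pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the memory pools the JVM reports, by type. */
class MemoryPoolLister {

    /** Names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools typically correspond to the young and old generations.
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        // Non-heap pools typically include the Metaspace and the code cache.
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```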

You can learn more about the JVM memory model in this video: JVM Explained in 10 Minutes.

This article concentrates on the kind of memory problems that can be diagnosed using a heap dump. In fact, memory errors can occur in any of the areas shown in the diagram, and the causes may not always be obvious from examining the heap. For specific information about different types of errors and how to solve them, see Types of Out Of Memory Error.

We can see trends in memory usage by monitoring GC logs using a tool such as GCeasy. By plotting memory usage before and after GC, this tool can help us spot patterns that help diagnose memory problems. For example, if the problem is in native memory rather than in heap memory, we’ll see that heap usage is not high, even though overall memory usage is unacceptable.

Causes of Memory Issues

Memory problems usually have one of four causes:

  • Memory leaks, where the program keeps references to objects it no longer needs, so the GC cannot reclaim them;
  • Memory wastage, where the program uses more memory than it needs;
  • An incorrectly configured heap;
  • A shortage of physical memory on the device or container as a whole.

To effectively solve the problem, we need to establish which of these is the cause of the issue.
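To make the first cause concrete, here is a sketch of one common leak pattern: a static collection that only ever grows. The class and method names are hypothetical, and the bounded cache built on LinkedHashMap is just one possible remedy, shown for contrast.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Contrasts an unbounded static cache with a size-limited one. */
class LeakSketch {

    // Leak pattern: a static field is a GC root, and nothing ever removes entries,
    // so every value stays reachable for the life of the application.
    static final Map<String, byte[]> LEAKY_CACHE = new HashMap<>();

    /** Bounded alternative: evicts the least recently used entry when full. */
    static <K, V> Map<K, V> boundedCache(int maxEntries) {
        // accessOrder = true keeps entries in least-recently-used order.
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, byte[]> bounded = boundedCache(100);
        for (int i = 0; i < 1_000; i++) {
            LEAKY_CACHE.put("session-" + i, new byte[1024]);
            bounded.put("session-" + i, new byte[1024]);
        }
        // prints "leaky=1000, bounded=100"
        System.out.println("leaky=" + LEAKY_CACHE.size() + ", bounded=" + bounded.size());
    }
}
```

In a heap dump, a leak of this kind tends to show up as a single map whose retained size keeps growing between dumps.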

Using Heap Dump Analysis to Find Root Causes

To find the root cause, we’d first take a heap dump, then submit it to a tool such as HeapHero or Eclipse MAT. The examples below are all taken from HeapHero. It’s also useful to examine the heap dump together with information taken from GC logs. In this article, the GC log examples are analyzed using GCeasy.

1. Look for ML Suggestions

HeapHero has the advantage that it uses machine learning to identify potential problems from a heap dump, and it makes suggestions at the front of the report. Quite often, we need look no further when diagnosing memory problems. The suggestions include links to more information and guidance on how to fix the problem. The image below gives an example of this.

Fig: ML Suggestion from HeapHero

Eclipse MAT includes a Leak Suspects report, which is also a good starting point.

2. Check if the Statistics Look Reasonable

Next, have a look at the statistics shown near the front of the report, together with statistics from GCeasy, to see if they look reasonable. These include the heap size, counts of class loaders and classes, the GC root count and the object creation rate. See the two diagrams below.

Fig: Statistics Produced by HeapHero

Fig: Object Creation Rate by GCeasy

With a bit of practice, we learn to tell whether these figures look reasonable for the type of application.

3. Establish the Type of Memory Problem

The next step is to decide whether the problem is a memory leak, memory wastage or simply an under-configured environment. To establish whether it’s a memory leak, the GC log patterns are the best diagnostic. GCeasy plots memory usage over time, with GC events marked as red triangles. Let’s look at the image below, which compares the GC pattern of a healthy application to one with a memory leak.

Fig: GCeasy Graphs Showing Healthy vs Memory Leak Patterns

In the first graph, memory usage fluctuates up and down as temporary variables are created and then released. The GC is always able to bring the memory down to a consistent level. In the second, the memory still fluctuates, but the bottom line increases gradually over time. The GC is unable to bring memory down to a consistent level, indicating a memory leak.
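The same idea can be expressed numerically. The sketch below is a rough illustration rather than what GCeasy does: it fits a straight line to heap usage measured just after each GC and flags a rising baseline. The sample values and the threshold are invented for the example, and in practice the samples should come from comparable collections taken after the application has warmed up.

```java
/** Rough check for a rising post-GC baseline, using a least-squares slope. */
class GcBaselineTrend {

    /** Least-squares slope of the samples, in megabytes per GC event. */
    static double slope(double[] postGcHeapMb) {
        int n = postGcHeapMb.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanX = (n - 1) / 2.0;   // sample indexes are 0..n-1
        double meanY = 0;
        for (double y : postGcHeapMb) {
            meanY += y / n;
        }
        double covariance = 0;
        double varianceX = 0;
        for (int i = 0; i < n; i++) {
            covariance += (i - meanX) * (postGcHeapMb[i] - meanY);
            varianceX += (i - meanX) * (i - meanX);
        }
        return covariance / varianceX;
    }

    /** Flags a rising baseline; the threshold is an arbitrary illustrative value. */
    static boolean looksLikeLeak(double[] postGcHeapMb, double thresholdMbPerGc) {
        return slope(postGcHeapMb) > thresholdMbPerGc;
    }

    public static void main(String[] args) {
        double[] healthy = {210, 205, 212, 208, 209, 211};   // invented samples
        double[] leaking = {210, 240, 275, 300, 335, 360};   // invented samples
        System.out.printf("healthy: slope=%.2f leak=%b%n",
                slope(healthy), looksLikeLeak(healthy, 5.0));
        System.out.printf("leaking: slope=%.2f leak=%b%n",
                slope(leaking), looksLikeLeak(leaking, 5.0));
    }
}
```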

If there is no memory leak, we then need to check whether memory is being wasted. If it isn’t, the problem is under-configuration. You may be interested in this article, which explains how to find the optimal configurations for the heap size. If there is not enough physical memory to increase the heap size, you would first need to add more RAM to the device or container.

Most applications, if examined carefully, waste a lot of memory. HeapHero shows a detailed breakdown of memory wastage, with lists of actual issues by category. For more information about wasted memory, read How Much Memory is My Application Wasting?.
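Duplicate strings are a commonly cited form of wasted memory: the same text held in many separate String objects. As a small sketch of how such duplication could be counted for a list of values, assuming hypothetical class and method names:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Counts how many redundant copies of each value a list of strings contains. */
class DuplicateStringReport {

    /** For each duplicated value, the number of occurrences beyond the first. */
    static Map<String, Integer> redundantCopies(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        counts.values().removeIf(count -> count == 1);    // keep duplicated values only
        counts.replaceAll((value, count) -> count - 1);   // first copy is not waste
        return counts;
    }

    public static void main(String[] args) {
        List<String> countries =
                List.of("Germany", "France", "Germany", "Germany", "France", "Spain");
        // prints "{France=1, Germany=2}"
        System.out.println(redundantCopies(countries));
    }
}
```

This only counts equal values; whether each occurrence is really a separate object depends on how the strings were created. When the duplicates are separate objects, sharing a single instance, for example through a lookup map, is the usual fix.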

4. Finding The Cause of Memory Leaks

In almost all cases, the memory leak will be found amongst the three or four largest objects in memory. HeapHero, therefore, produces a Largest Object report, showing all objects in descending order of memory used. We can interactively explore individual objects by clicking on the three dots indicated with the arrow in the image below.

Fig: Interactive Largest Object Report Produced by HeapHero

This brings up a menu that allows us to explore up and down the dominator tree from this object. Choosing outgoing references lets us explore the object’s children to see which of them are using a lot of space. We can also view the contents of the children, which can help with debugging. 

We can also explore incoming references to find the parents of the object. This tells us which objects hold references to it, so we know what is preventing it from being garbage collected.

To watch a demonstration of this process, see How to Analyze a Heap Dump Fast.

This step uncovers the majority of memory issues. However, if this doesn’t solve the problem, the HeapHero report contains several other useful sections.

The class histogram analyzes the objects in the heap by class, showing the number of objects in each class, the shallow heap (heap space used by the objects themselves) and the retained heap (heap space that would be freed if those objects were garbage collected, in other words the objects themselves plus everything that is reachable only through them). This is shown in the screenshot below.

Fig: Class Histogram
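The difference between shallow and retained heap can be worked through on a toy object graph. The sketch below is a brute-force illustration with hypothetical object names and sizes; real analyzers compute retained sizes far more efficiently from the dominator tree.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Brute-force retained size on a toy object graph. */
class RetainedSizeSketch {

    /** Objects reachable from the roots without passing through 'excluded' (may be null). */
    static Set<String> reachable(Map<String, List<String>> refs, Set<String> roots,
                                 String excluded) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(excluded) && seen.add(root)) {
                pending.push(root);
            }
        }
        while (!pending.isEmpty()) {
            for (String next : refs.getOrDefault(pending.pop(), List.of())) {
                if (!next.equals(excluded) && seen.add(next)) {
                    pending.push(next);
                }
            }
        }
        return seen;
    }

    /** Sum of shallow sizes of everything that becomes unreachable without 'object'. */
    static long retainedSize(Map<String, List<String>> refs, Map<String, Long> shallow,
                             Set<String> roots, String object) {
        Set<String> freed = reachable(refs, roots, null);
        freed.removeAll(reachable(refs, roots, object));
        long total = 0;
        for (String name : freed) {
            total += shallow.getOrDefault(name, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
                "root", List.of("cache", "other"),
                "cache", List.of("entryA", "shared"),
                "other", List.of("shared"));          // 'shared' has two owners
        Map<String, Long> shallow = Map.of(
                "root", 16L, "cache", 48L, "entryA", 24L, "shared", 100L, "other", 16L);
        // cache (48) + entryA (24); 'shared' stays alive through 'other'
        System.out.println(retainedSize(refs, shallow, Set.of("root"), "cache"));   // prints 72
    }
}
```

Although the cache references the 100-byte shared object, that object is not part of the cache's retained heap because another object also keeps it alive.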

If we suspect a memory leak, it’s a good idea to take a second heap dump of the application, and compare the two histograms. This quickly shows us whether the number of objects for a class is increasing with time.
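If the instance counts from two histograms are available as simple maps of class name to count, the comparison could be sketched as below. The class names and counts are invented for the example, and the input format is an assumption rather than an export format of any particular tool.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Compares instance counts from two class histograms. */
class HistogramDiff {

    /** Classes whose instance count grew, ordered by largest increase first. */
    static Map<String, Long> growth(Map<String, Long> first, Map<String, Long> second) {
        Map<String, Long> deltas = new HashMap<>();
        second.forEach((className, count) -> {
            long delta = count - first.getOrDefault(className, 0L);
            if (delta > 0) {
                deltas.put(className, delta);
            }
        });
        Map<String, Long> sorted = new LinkedHashMap<>();   // preserves sorted order
        deltas.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(entry -> sorted.put(entry.getKey(), entry.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Long> firstDump = Map.of(
                "byte[]", 1_200L, "com.example.Session", 50L, "java.lang.String", 9_000L);
        Map<String, Long> secondDump = Map.of(
                "byte[]", 1_250L, "com.example.Session", 4_050L, "java.lang.String", 9_100L);
        // prints "{com.example.Session=4000, java.lang.String=100, byte[]=50}"
        System.out.println(growth(firstDump, secondDump));
    }
}
```

A class that grows far faster than the rest, like the hypothetical Session class here, is a natural place to start looking.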

The GC Roots section lets us start at the roots and work down through the dominator tree. We can expand individual items to see their children.

Fig: GC Roots Report

Sometimes garbage collection can be held up by objects waiting to be finalized. This happens when a class overrides the finalize() method of java.lang.Object, and the finalization code is slow to execute. This could be because it’s waiting for resources to become available. Since finalizers are typically run one at a time from a single queue, slow finalizers can cause major GC delays, and, in fact, finalize() has been deprecated since Java 9. HeapHero lists objects waiting to be finalized, as shown below.

Fig: Objects Waiting for Finalization
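Where cleanup code is still needed, one alternative to finalize() is explicit cleanup with try-with-resources, backed by java.lang.ref.Cleaner (available from Java 9) as a safety net. The sketch below uses a hypothetical NativeResource class, and a boolean flag stands in for releasing a real resource.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical resource that is released explicitly, with a Cleaner as a fallback. */
class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state; it must not reference the NativeResource itself. */
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() {
            released.set(true);   // stand-in for freeing a real resource
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    boolean isReleased() {
        return state.released.get();
    }

    @Override
    public void close() {
        cleanable.clean();   // runs the cleanup action at most once
    }

    public static void main(String[] args) {
        NativeResource resource = new NativeResource();
        try (resource) {
            System.out.println("released inside try = " + resource.isReleased());   // false
        }
        System.out.println("released after try = " + resource.isReleased());        // true
    }
}
```

Because close() releases the resource immediately, nothing has to wait in a finalization queue; the Cleaner only acts if close() was never called and the object becomes unreachable.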

If we suspect rogue threads, we can explore HeapHero’s threads section. It shows an overview of what threads exist and their status, and lets us explore their stack traces. It also gives a report of threads with identical stack traces.

Fig: Thread Summary
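The idea of grouping threads with identical stack traces can be tried on a live JVM using the standard Thread.getAllStackTraces() method. This sketch inspects the running process rather than a heap dump, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Groups the live threads of the current JVM by identical stack trace. */
class StackTraceGroups {

    /** Maps each distinct stack trace to the names of the threads sharing it. */
    static Map<List<StackTraceElement>, List<String>> groupByStack() {
        Map<List<StackTraceElement>, List<String>> groups = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            // Lists of StackTraceElement compare by content, so they work as keys.
            groups.computeIfAbsent(Arrays.asList(entry.getValue()), key -> new ArrayList<>())
                    .add(entry.getKey().getName());
        }
        return groups;
    }

    public static void main(String[] args) {
        groupByStack().forEach((stack, names) -> {
            String top = stack.isEmpty() ? "(no frames)" : stack.get(0).toString();
            System.out.println(names.size() + " thread(s) at " + top + ": " + names);
        });
    }
}
```

A large group of threads sharing one stack trace often points to a pool of workers all blocked at the same place.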

HeapHero also lets us explore the heap dump further using Object Query Language (OQL).

Conclusion

Heap dumps contain valuable information that helps us diagnose memory issues quickly. Tools such as HeapHero and Eclipse MAT let us carry out complex Java heap dump analysis to find the root cause of any problems.

In this article, we’ve looked at how to use HeapHero to troubleshoot elusive memory issues, and at the role GC logs can play in determining whether or not we have a memory leak.
