Memory leaks can cause havoc in production. The system runs fine for a while, then performance degrades.
The classic advice is, ‘Look at the memory analyzer’s leak suspect report.’ In many cases, this is good enough: it leads us straight to the problem, we fix it, and everything’s fine.
Often, though, we have to go much deeper to find and fix the root cause. In this article, we’ll look at how we can track down elusive memory leaks that aren’t instantly pinpointed by the leak suspect report.
Memory Analyzer Leak Suspects Report: What Is It, and Why Might It Not Be Enough?
Heap dump analyzers such as HeapHero and Eclipse MAT begin by identifying leak suspects.
MAT presents an interactive chart, as shown below:

Fig: Eclipse MAT Leak Suspects Report
This chart identifies the largest objects, since leaking objects can almost always be found amongst the three or four biggest memory hogs. From there, we can explore the dominator tree to find the parents and children of these objects.
HeapHero goes a step further: it prefixes its memory analysis report with a list of any potential problems it finds in the heap dump.

Fig: HeapHero’s Interactive Problem Report
Conveniently, it lets us view the stack trace so we can see exactly where the object is used. It also lists the largest objects, and allows us to interactively explore the dominator tree.

Fig: HeapHero’s Largest Object Report
For a demonstration of how to explore the dominator tree to investigate leak suspects, watch this video: How to Analyze Heap Dumps Fast.
Often, this is all we need to do to find the leak. Let’s look at some reasons why the leak suspects report may not be enough.
- The problem may not actually be a memory leak at all. Memory issues can also be caused by memory wastage, or by a heap that is not optimally configured.
- The leak may be in native memory rather than the heap. For a breakdown of the areas that make up the JVM memory model, see JVM in 10 Minutes.
- The leak may not be obvious from exploring the dominator tree of the largest objects. This is likely to happen if the leak is caused by a proliferation of identical objects, each of which may be quite small.
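As an illustration of that last case, here's a minimal sketch (the class and field names are hypothetical) of a leak caused by a proliferation of small identical objects: a static cache that is populated on every request but never evicted. No single entry is large, so nothing stands out among the biggest objects; only the class histogram reveals the build-up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: a session cache that grows on every request
// but is never evicted. Each entry is tiny, so no single object
// dominates the heap -- yet collectively they grow without bound.
public class SessionCacheLeak {

    // The static map keeps every entry strongly reachable forever.
    private static final Map<String, String> SESSION_CACHE = new HashMap<>();

    static void handleRequest(String sessionId) {
        // Entries are added but never removed -- the leak.
        SESSION_CACHE.put(sessionId, "session-data-" + sessionId);
    }

    static int cacheSize() {
        return SESSION_CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            handleRequest("session-" + i);
        }
        // In a heap dump, no individual entry stands out; the class
        // histogram is what reveals 100,000 near-identical small objects.
        System.out.println("Cached entries: " + cacheSize());
    }
}
```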
Memory Analysis: Useful Tools
Next let’s look at some of the tools we may need for memory analysis.
- The JDK includes several useful tools, which we can use for taking heap dumps, tracking class loading, analyzing native memory and more. These include jcmd, jstat and jmap, all of which are documented on the Oracle website.
- A heap dump analyzer. In this article, we’ll be using HeapHero. There are several other good tools available, including Eclipse MAT and platform-specific tools.
- A Garbage Collector (GC) log analyzer. The sample reports in this article were produced by GCeasy.
- A Java profiler can be useful. In most cases, a simple tool such as VisualVM is adequate, especially if the VisualGC add-on is enabled. Older versions of the JDK include this tool as JVisualVM. There are also more complex and powerful profilers available.
Effective Memory Analysis for Elusive Memory Leaks
First, we need to establish whether it’s actually a leak, and if so, whether it affects the heap or native memory. The easiest way to do this is to analyze the GC logs, looking for memory usage patterns. It’s always worth enabling GC logging in production, since it adds very little overhead. If GC logging wasn’t enabled, and it’s not convenient to restart the JVM with logging, VisualVM’s add-on VisualGC also lets us look at GC behavior.
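If neither GC logs nor VisualGC is available, a rough view of GC activity can also be obtained from inside the application, via the standard GarbageCollectorMXBean API. A minimal sketch: sampling these counters periodically shows whether GC frequency and total GC time are climbing.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative GC statistics for each collector in this JVM.
// The counts and times accumulate from JVM start, so sampling them
// at intervals reveals whether GC activity is increasing over time.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}
```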
So, what should we look for? The graphs below are taken from GCeasy reports, and show memory usage over time, with GC events indicated as red triangles.
Let’s first look at a healthy GC pattern.

Fig: Healthy GC Pattern
In this pattern, memory may go up, for example when transactions are being processed, but the GC is consistently able to bring memory down to a similar level, indicated by the dotted line.
If the memory usage is consistently high, and the GC is running more frequently, it’s likely that the heap is under-configured.
A memory leak in the heap, on the other hand, may look like this:

Fig: Heap Memory Leak Pattern
The GC is still clearing memory regularly, but never down to the same level. As memory gets full, the GC runs more often, trying to keep enough memory clear for new requests. At this point, performance degrades, the system may experience long pauses, and CPU usage spikes.
If there is a leak in native memory, as opposed to the heap, the pattern will be different.

Fig: Native Memory Leak Pattern
In this application, GC appeared healthy at first. However, when native memory began running short, the pattern changed dramatically: although heap usage is fairly low and barely changing, the GC is running almost continuously. This indicates that the problem does not lie in the heap, and we need to look at native memory to find out why.
Based on these findings, we can then decide on the right strategy to solve the problem.
- If memory usage is consistently high, rather than increasing over time, and GC is happening frequently, we’d first check for memory wastage, then consider increasing the heap size. We’d also need to make sure the device or container has enough RAM to accommodate a larger heap.
- If the GC logs are showing the typical memory leak pattern, we’d need to investigate heap usage, as described in the next section.
- If the GC logs indicate problems with native memory, we’d need to investigate where it’s being used. This is described in a later section.
Memory Leaks Affecting the Heap
Leaks generally fall into two main categories: a single object that keeps growing, or a large number of objects of the same type. HeapHero’s ‘Largest Object’ report is sorted by default in descending order of retained heap size: the amount of memory used by an object and all its children.
By browsing the outgoing references, as demonstrated in the video we linked to earlier, we can see what the retained space is made up of, and, if necessary, see the actual contents of the objects. The incoming references tell us what objects hold references to an item, preventing it from being garbage collected.
The report also includes a class histogram, which is very useful for finding large numbers of the same type of object.
For difficult-to-find leaks, it’s a good tactic to take two heap dumps, with an interval between them. Comparing the largest objects and histograms from the two reports is a quick way to find objects that are growing or proliferating. If we’re using the root cause analyzer yCrash, it has a facility to automatically compare reports. This saves a lot of time.
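On a HotSpot JVM, those timed dumps can also be taken programmatically via the HotSpotDiagnosticMXBean, which is handy when attaching external tools isn’t convenient. A sketch, assuming HotSpot (the file names and interval are arbitrary; note the file name must end in .hprof and must not already exist):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Takes two heap dumps with an interval between them, for later
// comparison in a heap dump analyzer. HotSpot-specific API.
public class TimedHeapDumps {

    static void dumpHeap(String path) throws Exception {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // 'true' restricts the dump to live (reachable) objects.
        bean.dumpHeap(path, true);
    }

    public static void main(String[] args) throws Exception {
        dumpHeap("dump1.hprof");
        Thread.sleep(60_000);   // interval between dumps; tune as needed
        dumpHeap("dump2.hprof");
    }
}
```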
It’s also worth checking HeapHero’s ‘Objects Awaiting Finalization’ report. Any object whose class overrides the finalize() method is not immediately released when it’s eligible for GC. Instead, it’s added to a finalizer queue, which executes the method for each finalizable object in turn. If the queue is slow, it can cause a serious memory leak.
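We can also watch the finalizer queue from inside the application: the standard MemoryMXBean reports how many objects are currently awaiting finalization, and a persistently high or growing count suggests the queue can’t keep up. (As an aside, finalize() has been deprecated since Java 9; java.lang.ref.Cleaner is the recommended replacement.) A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Reports the current depth of the finalizer queue. Sampling this
// periodically shows whether finalizable objects are piling up.
public class FinalizerQueueCheck {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("Objects awaiting finalization: "
                + memory.getObjectPendingFinalizationCount());
    }
}
```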
Memory Leaks Affecting Native Memory
We can visualize the Java memory model like this:

Fig: JVM Memory Model
If the GC logs indicate a problem in native memory, the first step is to ascertain which area is affected.
To do this we need to enable native memory tracking by restarting the JVM with the following switch: -XX:NativeMemoryTracking=summary
Next, while the program is running, we can extract tracking data to a file, using the following CLI command:
jcmd <pid> VM.native_memory summary > MyProg_nmt.txt
<pid> represents the process ID of the running program. The easiest way to find the PID is to use the jps command, which is bundled with the JDK.
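If we need the PID from inside the application itself, on Java 9 and later the ProcessHandle API provides it directly:

```java
// Prints the current JVM's own process ID (Java 9+),
// useful for wiring jcmd invocations into scripts or logs.
public class ShowPid {
    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid();
        System.out.println("PID: " + pid);
    }
}
```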
This produces a text file showing memory usage by native area. Reading it manually can be time-consuming. A quick way to analyze it is to load the file into GCeasy, which produces useful information, including a breakdown as shown below:

Fig: Breakdown of Native Memory Usage
Ideally, we should repeat the extraction at regular intervals, using jcmd to append to the same file:
jcmd <pid> VM.native_memory summary >> MyProg_nmt.txt
This allows GCeasy to produce an interactive trend graph by memory area as shown below:

Fig: Native Memory Trend Graph
For this application, clicking on the Thread option showed that the Thread area of memory was increasing over time, indicating a probable memory leak.
Once we’ve determined which area is affected, we can follow the same procedures to find the problem as we would when troubleshooting an OutOfMemoryError. The following article discusses these steps, and contains links to specific debugging strategies relating to specific native memory areas: Types of OutOfMemoryErrors.
The three native areas most commonly affected by memory leaks are the thread space, the metaspace and the direct buffer area. For more details on both native memory leaks and heap leaks, see Java Memory Leaks: The Definitive Guide to Causes, Detection & Fixes.
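For the direct buffer area specifically, the JVM’s standard BufferPoolMXBean lets us watch usage from inside the application: memory allocated with ByteBuffer.allocateDirect() lives outside the heap but is tracked by the ‘direct’ pool. A minimal sketch:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Reports usage of the JVM's buffer pools ('direct' and 'mapped').
// Sampling the 'direct' pool over time reveals direct-buffer leaks.
public class DirectBufferWatch {
    public static void main(String[] args) {
        // Keep a strong reference so the buffer isn't collected.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);

        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s pool: %d bytes used, %d buffers%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getCount());
        }
        buffer.clear(); // keep the reference live until the end
    }
}
```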
Conclusion
Memory leaks drain performance, and can be time-consuming to troubleshoot.
A leak suspect report can be useful, especially if we use it as a starting point to explore the dominator tree. However, there will be times when we need to look further.
GC log analysis helps us to pinpoint which area of memory is affected, and whether or not the problem is actually caused by a leak.
A memory analyzer such as HeapHero is the most important debugging tool for any kind of memory issue. Taking two or more heap dumps at timed intervals allows us to compare reports to find out which objects are growing or proliferating.
This article has discussed strategies for finding obscure memory leaks. Questions and comments welcome!
