Diagnosing Java Native Memory Leaks (JNI, Direct Buffers, and OS Tools) 

Does your application’s performance gradually degrade over time? Or does it run perfectly for weeks, then suddenly crash with an OutOfMemoryError? Maybe your container application mysteriously disappears from time to time, thanks to the dreaded OOM killer. Chances are, you have a memory leak.

Plenty of diagnostic guides deal with finding heap-related memory leaks, but what if the leak is not in the heap? How do you troubleshoot native memory leaks? What tools do you need, and how should you go about finding out which memory area is affected? JNI memory leaks, in particular, can be a real challenge.

This article discusses the JVM memory model and how to determine which memory pool is causing the problem. We’ll look at some useful troubleshooting tools and work through the diagnostic steps we’d use to debug a sample program that has a native memory leak.

What Is Native Memory in Java? 

The heap is only one of many memory pools maintained by the JVM. We can visualize JVM memory like this:

Fig: The JVM Memory Model

The two major divisions are Heap Memory and Native Memory, sometimes referred to as heap and non-heap.

Heap memory is a central storage area where all objects used by the application are created. It’s managed entirely by the JVM, and cleaned regularly by the garbage collector (GC). The heap is usually, although not always, divided into the Young Generation and the Old Generation to speed up GC.

Native memory is memory allocated outside the heap, directly from the operating system, and it is not managed by the garbage collector. Functionally, we can visualize it as having several pools, each with its own purpose. These include:

  • Metaspace: Stores definitions and metadata for each currently-loaded class. In older JVMs, this information was held in an area called the PermGen.
  • Thread Space: Holds the stack for each running thread.
  • Code Cache: Stores native code compiled by the JIT compiler for hot methods.
  • Direct Buffers: Used for fast native I/O buffers.
  • GC: Used internally by the garbage collector.
  • JNI: Used by the Java Native Interface.
  • Misc: Reserved for the JVM’s internal use.

Of these, the Code Cache, GC and Misc areas are not usually subject to leaks, since they are used only by the JVM itself. Metaspace, Thread Space, Direct Buffers and JNI are all exposed to program bugs, and may therefore suffer from native memory leaks.
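To see which of these pools the JVM exposes to Java code, the standard java.lang.management API can list its non-heap memory pools. The sketch below is a minimal example with an illustrative class name. Pool names vary by JVM and garbage collector, and typically only Metaspace, Compressed Class Space and the code cache segments appear; thread stacks, JNI allocations and GC internals are not reported as memory pools, which is one reason native memory tracking is needed later on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Lists the JVM's non-heap memory pools as reported by the platform MXBeans.
// Pool names differ between JVMs and collectors (e.g. "Metaspace", "CodeHeap 'profiled nmethods'").
public class NonHeapPools {

    // Returns one line per non-heap pool: name, used bytes and committed bytes.
    public static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip Eden, Survivor, Old Gen and other heap pools
            }
            MemoryUsage usage = pool.getUsage();
            lines.add(String.format("%-40s used=%,d committed=%,d",
                    pool.getName(), usage.getUsed(), usage.getCommitted()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```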

If the system actually crashes with a Java OutOfMemoryError, identifying the problem area is simple: we need only examine the actual Java error message. This troubleshooting process is described in detail in this article: Types Of OutOfMemoryError in Java.

If it doesn’t crash, the problem can be much harder to diagnose. We may see these symptoms:

  • The system as a whole, not only the Java application, may gradually slow down.
  • If we examine operating system diagnostics such as top in Linux or Task Manager in Windows, we see the Java application consuming more and more memory as time goes on.
  • In some cases, but not always, the application may become progressively CPU-hungry.
  • Examining the GC logs shows heap usage as normal.
  • When running in Linux, or in containers under Kubernetes, the application may suddenly disappear. This is because it has been terminated by the OOM killer to prevent the entire system from crashing.
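On Linux, a quick way to check for this mismatch is to compare the process's resident set size (the RES figure in top) with the committed Java heap. The sketch below assumes a Linux /proc filesystem, and the class name is hypothetical. The difference is only a rough indication: RSS counts every resident page, heap included, while committed heap pages are not necessarily all resident.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Linux-only sketch: compares resident set size with the committed heap.
// A gap that keeps widening while the heap stays flat points towards native growth.
public class RssVsHeap {

    // Extracts the VmRSS value (in kB) from the text of /proc/<pid>/status.
    // Returns -1 if the field is absent.
    public static long parseVmRssKb(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("VmRSS:")) {
                // Expected format: "VmRSS:    123456 kB"
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        String status = new String(
                Files.readAllBytes(Paths.get("/proc/self/status")), StandardCharsets.UTF_8);
        long rssKb = parseVmRssKb(status);
        long heapCommittedKb = ManagementFactory.getMemoryMXBean()
                .getHeapMemoryUsage().getCommitted() / 1024;
        System.out.printf("RSS=%,d KB, committed heap=%,d KB, difference=%,d KB%n",
                rssKb, heapCommittedKb, rssKb - heapCommittedKb);
    }
}
```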

Best Tools to Diagnose Java Native Memory Leaks 

We need a well-stocked toolbox to efficiently troubleshoot native memory leaks.

Here are some tools you’re likely to find useful.

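OS | Tool | Comments
Linux | top | top -o RES sorts processes by resident memory usage
Linux | pmap | Memory map of a process; compare samples to spot growth
Linux | vmmap | Memory mappings by process
Linux | gdb | GNU debugger
Windows | Task Manager | High-level overview of processes
Windows | VMMap | Part of Sysinternals; similar to pmap
Windows | ProcExp | Part of Sysinternals; process details
Windows | WinDbg | Windows debugger
macOS | vmmap | Similar to pmap
macOS | leaks | Leak detection
macOS | gdb | GNU debugger
JDK Tools | jcmd | Extract diagnostics
JDK Tools | JConsole | Monitoring tool; view MBean info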

Useful diagnostic artifacts and third-party tools to analyze them include:

Artifact | How to Extract Artifact | Analysis Tools
Native Memory Tracking | JVM argument -XX:NativeMemoryTracking=summary, then jcmd <pid> VM.native_memory summary >> filename.txt | GCeasy
Heap Dump | How to Capture Heap Dumps | HeapHero, Eclipse MAT
Thread Dump | How to Capture Thread Dumps | fastThread
GC Log | GC Logging Best Practices | GCeasy, GCViewer

It’s worth mentioning here that since Java 5, MBeans have proved to be a highly useful diagnostic tool. These make diagnostic information available via JMX. The JVM provides standard platform MBeans, and developers can also define classes for application-specific diagnostics. They’re used in conjunction with tools such as JConsole, as we’ll see later in the worked example.
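As an illustration of an application-specific bean, here is a minimal MXBean registered with the platform MBean server, which makes it visible on JConsole's MBeans tab. The interface, class and ObjectName (com.example:type=AllocationStats) are hypothetical examples, not part of any existing API.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of an application-specific MXBean; all names here are hypothetical.
public class AllocationStatsDemo {

    // MXBean interfaces must be public; the "MXBean" suffix marks this one as an MXBean.
    public interface AllocationStatsMXBean {
        long getBuffersAllocated(); // exposed as the attribute "BuffersAllocated"
    }

    public static class AllocationStats implements AllocationStatsMXBean {
        private final AtomicLong buffersAllocated = new AtomicLong();

        // Called by application code whenever it allocates a buffer.
        public void recordAllocation() {
            buffersAllocated.incrementAndGet();
        }

        @Override
        public long getBuffersAllocated() {
            return buffersAllocated.get();
        }
    }

    // Registers a new bean under the given ObjectName with the platform MBean server.
    public static AllocationStats register(String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        AllocationStats stats = new AllocationStats();
        server.registerMBean(stats, new ObjectName(name));
        return stats;
    }

    public static void main(String[] args) throws Exception {
        AllocationStats stats = register("com.example:type=AllocationStats");
        stats.recordAllocation();
        // Keep the JVM alive for a minute so JConsole can connect and show the bean.
        Thread.sleep(60_000);
    }
}
```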

How to Troubleshoot Java Native Memory Leaks: Step-by-Step 

When troubleshooting, it always helps to have an action plan. Here are some suggestions. 

Step 1: Confirm It’s a Memory Problem 

If the device slows down or crashes, we suspect memory problems.

A quick look at the running processes tells us whether the problem is memory-related, and which applications are using the most memory. In Linux, we can use the top -o RES command to see the top memory users.

Fig: Output of top Command

In Windows, we can use the “Details” tab of Task Manager, sorted by memory usage:

Fig: Windows Task Manager

This confirms whether the problem is indeed memory-related. We can see at a glance whether free memory on the device is running short. If our Java application is one of the top memory users, it’s likely to be the culprit. We can watch this over a period of time to see whether the Java application’s memory is growing. If so, we probably have a memory leak.
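Deciding whether memory is really growing, rather than just fluctuating, is easier with a simple trend measure. This sketch, with hypothetical names, fits a least-squares slope to memory samples taken at equal intervals (for example, RES values from top noted once a minute). The growth threshold is a tuning assumption, not a standard value.

```java
// Sketch: estimates the trend of equally spaced memory samples with an
// ordinary least-squares slope (units per sample). Names are hypothetical.
public class GrowthTrend {

    // Least-squares slope of the samples plotted against their index.
    public static double slope(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            return 0.0; // a single point has no trend
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (double s : samples) {
            meanY += s;
        }
        meanY /= n;
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    // True if memory grows by more than minGrowthPerSample per sample on average.
    public static boolean isGrowing(double[] samples, double minGrowthPerSample) {
        return slope(samples) > minGrowthPerSample;
    }
}
```

For instance, GrowthTrend.slope(new double[] {100, 110, 120, 130}) is 10.0, meaning memory grows by 10 units per sample. A slope is less sensitive to a single spike than comparing only the first and last readings.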

Step 2: Determine Heap vs Native Memory Leak

The next thing to establish is whether the problem is heap-related or whether the leak is in native memory. The GC logs are our best indicator here. It’s always best to have GC logging running in production, since it adds very little overhead and contains valuable performance-related information.

Viewing the GC log with a tool such as GCeasy, we can easily see heap usage patterns over time. Here’s an example of a healthy pattern, with GC events shown as red triangles.

Fig: Healthy Heap Usage Pattern

The GC is always able to bring heap usage down to a consistently low level, and GC events are not becoming more frequent. If the GC pattern looks like this, the memory problem is not likely to be heap-related.

If, on the other hand, the memory leak is in the heap, we’re likely to see a pattern like this:

Fig: Memory Leak in the Heap

Although the GC is able to clear space on each cycle, it’s never able to bring memory down to the same level. Over time, the trend is for the lower memory level to increase. GC events occur more frequently as the GC has to work harder to clear memory for new allocations. This results in increased CPU usage, since the GC is processor-hungry. If we see this pattern, we should be looking for the problem in the heap, not in native memory. This article provides a comprehensive guide to troubleshooting heap-related memory leaks.
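The rising floor described above can also be watched from inside the application. MemoryPoolMXBean.getCollectionUsage() reports a pool's usage as measured after the JVM last collected it, so sampling it periodically approximates the low points of the GC graph. This is a minimal sketch with a hypothetical class name; which heap pools exist depends on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: reads each heap pool's usage immediately after its most recent collection.
// An old-generation value that keeps rising across samples matches a heap leak.
public class PostGcHeapFloor {

    public static Map<String, Long> usedAfterLastGc() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // only heap pools are relevant to the heap-leak pattern
            }
            // getCollectionUsage() returns null if the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                result.put(pool.getName(), afterGc.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        usedAfterLastGc().forEach((name, used) ->
                System.out.printf("%-30s used after last GC = %,d bytes%n", name, used));
    }
}
```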

If the problem is a native memory leak, the GC pattern may be healthy, or it may look like this:

Fig: Metaspace Leak Pattern

In this example, GC was initially healthy, then suddenly GC events began running back-to-back, without clearing heap memory. This pattern is likely to indicate problems in the metaspace, perhaps a ClassLoader leak. When metaspace is full, it triggers a full GC event, whereas native leaks in other memory pools usually don’t affect GC.
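To back up a metaspace suspicion, Metaspace usage can be sampled together with the class-loading counters. This sketch assumes a HotSpot JVM (Java 8 or later), where the pool is named "Metaspace"; the class name is illustrative. A loaded-class count that keeps climbing while few classes are ever unloaded is consistent with a ClassLoader leak.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Sketch: samples Metaspace usage alongside class-loading counters (HotSpot naming assumed).
public class MetaspaceWatch {

    // Returns Metaspace used bytes, or -1 if no pool with that name exists.
    public static long metaspaceUsed() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        System.out.printf("Metaspace used=%,d bytes, loaded=%d, total loaded=%d, unloaded=%d%n",
                metaspaceUsed(),
                classes.getLoadedClassCount(),      // classes currently loaded
                classes.getTotalLoadedClassCount(), // classes loaded since JVM start
                classes.getUnloadedClassCount());   // classes unloaded since JVM start
    }
}
```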

Step 3: Identify the Affected Native Memory Area 

Now we’ve established that we have a leak in native memory, the next step is to find out which native memory area is affected. To do this, we need to start the JVM with native memory tracking (NMT) enabled. The JVM argument is:

-XX:NativeMemoryTracking=summary

Or, for low-level details,

-XX:NativeMemoryTracking=detail

Requesting summary information doesn’t add too much overhead, and gives us enough information for most cases. It’s possible to enable NMT in production, but ideally, we should use a performance testing environment, where we’re able to test it under load.
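If in doubt whether a running JVM actually has NMT enabled, the setting can be read back through the HotSpot diagnostic MXBean. This sketch works only on HotSpot-based JVMs, since com.sun.management.HotSpotDiagnosticMXBean is not part of the standard Java SE API; the class name NmtSetting is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch: reports the NativeMemoryTracking flag of the running JVM.
// The value is "off" unless the JVM was started with =summary or =detail.
public class NmtSetting {

    public static String nmtMode() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption("NativeMemoryTracking").getValue();
    }

    public static void main(String[] args) {
        System.out.println("NativeMemoryTracking=" + nmtMode());
    }
}
```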

To actually obtain the NMT information, we’d use jcmd while the load test is running. We should sample the information at regular intervals so we can see trends over time. The command is:

jcmd <pid> VM.native_memory summary >> filename.txt

This appends each sample to the output file. A section of the file may look like this:

11532:
Native Memory Tracking:
(Omitting categories weighting less than 1KB)
Total: reserved=3570619KB, committed=247863KB
- Java Heap (reserved=2066432KB, committed=131072KB)
    (mmap: reserved=2066432KB, committed=131072KB)
- Class (reserved=1048653KB, committed=205KB)
    (classes #536)
    (  instance classes #453, array classes #83)
    (malloc=77KB #594)
    (mmap: reserved=1048576KB, committed=128KB)
    (  Metadata:   )
    (    reserved=8192KB, committed=256KB)
    (    used=134KB)
    (    waste=122KB =47.63%)
    (  Class space:)
    (    reserved=1048576KB, committed=128KB)
    (    used=5KB)
    (    waste=123KB =95.86%)
- Thread (reserved=19501KB, committed=749KB)
    (thread #19)
    (stack: reserved=19456KB, committed=704KB)
    (malloc=25KB #120)
    (arena=20KB #36)

The extract contains a breakdown by category of all memory used by the JVM, both heap and non-heap. For a breakdown of the NMT categories, see the Oracle NMT documentation.
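For a quick look without other tools, the category lines can be pulled out of a sample with a regular expression. This minimal sketch assumes the category header format shown in the extract above, "- <Category> (reserved=<n>KB, committed=<n>KB)"; the layout can differ between JDK versions, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracts committed KB per top-level NMT category from one summary sample.
public class NmtSummaryParser {

    // Matches e.g. "- Java Heap (reserved=2066432KB, committed=131072KB)".
    private static final Pattern CATEGORY = Pattern.compile(
            "^-\\s*(.+?)\\s*\\(reserved=(\\d+)KB, committed=(\\d+)KB");

    public static Map<String, Long> committedKb(String sample) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : sample.split("\\R")) {
            Matcher m = CATEGORY.matcher(line.trim());
            if (m.find()) {
                // group(1) = category name, group(3) = committed size in KB
                result.put(m.group(1), Long.parseLong(m.group(3)));
            }
        }
        return result;
    }
}
```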

Since it’s a text file, we can examine it manually, but because we took regular samples, the file will be quite long, and spotting trends by eye is time-consuming. There’s a quicker way.

In addition to GC logs, GCeasy can read NMT output from a .txt file and present the contents as a series of interactive graphs and charts, like this one:

Fig: Breakdown of Memory Usage by GCeasy

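This shows committed memory vs reserved memory in each NMT category. Reserved memory is address space the JVM has set aside, which isn’t necessarily backed by physical memory. Committed memory is the portion the JVM has asked the operating system to back with real memory (RAM or swap), so it is much closer to what the process actually consumes.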

We can also view trends over time for each category to establish which space is actually growing. We’ll see this in action in the next section, where we work through diagnostics for an actual memory leak.
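The same comparison can be sketched in code, given the committed size per category (in KB) from the first and last samples. The class and method names are hypothetical; the sketch subtracts the two samples and reports the category with the largest increase, treating categories absent from the first sample as starting at zero.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: compares two NMT samples (category -> committed KB) to find growth.
public class NmtGrowth {

    // Growth in committed KB per category between the first and last sample.
    public static Map<String, Long> growthKb(Map<String, Long> first, Map<String, Long> last) {
        Map<String, Long> growth = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : last.entrySet()) {
            long before = first.getOrDefault(e.getKey(), 0L);
            growth.put(e.getKey(), e.getValue() - before);
        }
        return growth;
    }

    // Name of the category that grew the most, or null if there are no categories.
    public static String fastestGrowing(Map<String, Long> first, Map<String, Long> last) {
        String best = null;
        long bestGrowth = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : growthKb(first, last).entrySet()) {
            if (e.getValue() > bestGrowth) {
                best = e.getKey();
                bestGrowth = e.getValue();
            }
        }
        return best;
    }
}
```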

Our troubleshooting tactics will differ, depending on which area is affected.

Step 4: Check Metaspace, PermGen, and Thread Leaks 

Some areas are simple to troubleshoot, since they correspond exactly to our JVM memory model. 

These are:

  • Metaspace. The causes include ClassLoader leaks, too many dynamically generated classes (e.g. in scripting languages such as Groovy), or simply a metaspace limit configured too small. For these, we can follow the same troubleshooting procedure outlined for Solving OutOfMemoryError in the Metaspace. We should take a heap dump and view the class histogram using a tool such as HeapHero. This lets us see exactly how many instances of each class are currently in memory.
  • PermGen. Prior to Java 8, the PermGen served the same purpose as the Metaspace. If the leak is in the PermGen area, the causes are the same as for the Metaspace. To troubleshoot the problem, see Solving OutOfMemoryErrors in the PermGen. A heap dump is again the most useful artifact here.
  • Thread Space. Thread space leaks rarely cause serious memory problems, since each thread’s stack is relatively small. They are far more likely to cause CPU spikes, or an OutOfMemoryError when the JVM can no longer create threads. Causes include threads that fail to terminate, or bugs that create threads in a non-terminating loop. If thread space is increasing, we can troubleshoot it as we would OutOfMemoryError: Unable to Create New Native Threads.
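Thread growth can be confirmed from the JVM’s own counters before taking a thread dump. In this minimal sketch (hypothetical class name), a live thread count that keeps rising alongside the total number of threads started, instead of levelling off, suggests threads are created but never terminate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: samples the JVM's thread counters via ThreadMXBean.
public class ThreadWatch {

    // Returns {live, peak, totalStarted}.
    public static long[] sample() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return new long[] {
            threads.getThreadCount(),            // live threads right now
            threads.getPeakThreadCount(),        // highest live count since start (or last reset)
            threads.getTotalStartedThreadCount() // all threads ever started
        };
    }

    public static void main(String[] args) {
        long[] s = sample();
        System.out.printf("live=%d peak=%d totalStarted=%d%n", s[0], s[1], s[2]);
    }
}
```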

Step 5: Investigate Direct Buffers, FFM, and JNI Memory Leaks

For other memory spaces, the source of the problem is less obvious. It helps if we know what APIs are prone to native memory leaks. The table below shows coding techniques commonly associated with native memory leaks, and the NMT space likely to be affected. These may differ depending on the JVM currently in use. The first step is to look at the program code to find out which of these techniques it actually uses. There’s no point, for example, in searching for JNI memory leaks in a program that doesn’t use JNI.

Category | Leak Type | NMT Area Likely to be Affected
Java-controlled | Memory-mapped file | Internal, Other
Java-controlled | Direct Buffers | Other, Internal, maybe NIO
Java-controlled | Memory Segment (FFM) | Other, Unknown, Internal
Native-code-controlled | JNI malloc leak | Other, Unknown, Internal
Native-code-controlled | JNI thread leak | Thread
Native-code-controlled | JNI GlobalRef leak | Heap, Class, Metaspace
Native-code-controlled | FFM native code leak | Other, Unknown, Internal

Currently, there are three main types of Java objects that allocate memory off-heap. Off-heap allocation is used extensively in modern frameworks, such as Spring WebClient.

  • Memory-mapped files allow a file to be mapped into virtual memory, so it can be accessed as if it were actual memory. The operating system is at liberty to page portions of it to disk if physical memory runs short.
  • Direct buffers are created with the method ByteBuffer.allocateDirect(). They are used for fast native I/O.
  • Memory Segments are created using the Java Foreign Function & Memory (FFM) API, and can be used to programmatically manage large chunks of off-heap memory.
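The first two object types can be created with standard NIO calls, and the JVM tracks them in the "direct" and "mapped" buffer pools exposed through BufferPoolMXBean. This sketch uses an arbitrary temporary file, arbitrary sizes and a hypothetical class name. It leaves out FFM memory segments, since the FFM API is only final from Java 22 onwards.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: creates one direct buffer and one memory-mapped buffer, then reports
// the platform buffer pools that track them.
public class OffHeapBuffers {

    // Current count for the named buffer pool ("direct" or "mapped"), or -1 if absent.
    public static long poolCount(String name) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(name)) {
                return pool.getCount();
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Direct buffer: 1,000,000 bytes allocated outside the heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(1_000_000);

        // Memory-mapped file: a 4 KB region of a temporary file mapped into memory.
        Path file = Files.createTempFile("mapped", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mapped.put(0, (byte) 1);
            System.out.println("direct buffers: " + poolCount("direct")
                    + ", mapped buffers: " + poolCount("mapped"));
        }
        direct.put(0, (byte) 1); // keeps the direct buffer reachable until this point
    }
}
```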

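Each of these creates a small object in the heap that references off-heap memory. When the object becomes unreachable and is garbage collected, the JVM runs clean-up actions to release the off-heap memory. (Memory segments allocated from a confined or shared arena are instead freed when the arena is closed.)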

To troubleshoot these problems, we can start with a heap dump, which allows us to identify what off-heap memory we’re referencing via these objects, and how many we have. For memory-mapped files and direct buffers, we can also gain insights through standard MBeans via JConsole. We’ll see this in action in the next section. 

Sometimes, all that’s needed is to migrate to a newer version of Java. Java 17 introduced several improvements to native memory management.

Troubleshooting steps follow the process outlined in the article How to Solve OutOfMemoryError: Direct Buffers.

Memory controlled by native code requires a more specialized approach, since we’ll need to actually trace memory allocations and deallocations in the foreign code. This category includes FFM and JNI memory leaks.

A thread dump can be helpful, since it contains the Java stack trace of each running thread. This can establish a starting point within the Java application.
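Besides external tools such as jcmd <pid> Thread.print, a similar snapshot can be taken from inside the application through ThreadMXBean. In this minimal sketch (hypothetical class name), native frames appear as "(Native Method)" in the stack trace lines, which can help locate the Java calls that enter JNI code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Sketch: builds a simple thread dump of every live thread's Java stack.
public class StackSnapshot {

    public static String snapshot() {
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor and ownable synchronizer details.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                // StackTraceElement.toString() marks native frames with "(Native Method)".
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```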

For debugging the actual native code, Oracle has an excellent guide to Diagnosing Memory Leaks in Native Code.

Example: Diagnosing a Java Native Memory Leak 

Let’s work through the process of diagnosing a memory leak in the Direct Buffer area. The program code below deliberately creates a leak by repeatedly creating direct buffers and storing references to them in an ArrayList. Since the list never releases the references, the buffers always remain reachable, so the GC can never reclaim them and their native memory is never freed.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// This program deliberately causes a memory leak in the Direct Buffer area
public class BuggyProg17 {
    public static void main(String[] args) {
        List<ByteBuffer> al = new ArrayList<>();
        while (true) {
            // Allocate a large (1,000,000-byte) direct ByteBuffer
            ByteBuffer directBuffer = ByteBuffer.allocateDirect(1000000);
            // Keep the reference in the list, so the buffer never becomes unreachable
            al.add(directBuffer);
            // Introduce a pause so we have time to collect
            // diagnostics before the program crashes
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}

We’ll run the program with both GC logging and native memory tracking enabled:

java -Xlog:gc*:file=BuggyProg17.log:time -XX:NativeMemoryTracking=summary BuggyProg17

Next, we’ll establish its PID:

>jps
12284 BuggyProg17
7804 Jps

We’ll then sample NMT data to a file at regular intervals:

jcmd 12284 VM.native_memory summary >> BuggyProg17.txt
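Rather than rerunning the jcmd command by hand, the sampling can be scripted. This is a minimal sketch: it assumes jcmd is on the PATH and takes the PID, interval in milliseconds, number of samples and output file as command-line arguments. The class name NmtSampler is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: runs "jcmd <pid> VM.native_memory summary" repeatedly,
// appending each result to a file, like the manual command with >>.
public class NmtSampler {

    public static List<String> command(long pid) {
        return Arrays.asList("jcmd", Long.toString(pid), "VM.native_memory", "summary");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = Long.parseLong(args[0]);
        long intervalMillis = Long.parseLong(args[1]);
        int samples = Integer.parseInt(args[2]);
        File out = new File(args[3]);
        for (int i = 0; i < samples; i++) {
            Process p = new ProcessBuilder(command(pid))
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.appendTo(out))
                    .start();
            p.waitFor();
            Thread.sleep(intervalMillis);
        }
    }
}
```

For example, java NmtSampler 12284 60000 60 BuggyProg17.txt would take one sample roughly every minute for about an hour.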

At the same time, we can run Linux top or Windows Task Manager, where we’ll see the memory size of the process growing over time.

Next, we can load the NMT data into GCeasy. We’ll get an interactive report with several graphs and charts.

The overall trend graph shows that native memory is definitely growing:

Fig: Native Memory Trend Graph

We’ll also see a breakdown of usage by category.

Fig: Breakdown by NMT Category

We can drill down to see trends by category. Since the ‘Other’ category appears to be quite large, we’ll examine it.

Fig: Native Memory Trend Graph for Category Other

This tells us we’re likely to have a leak in the ‘Other’ category. Referring to the table of leak types mapped to NMT categories in the previous section, the problem could relate to:

  • Memory-mapped files
  • Direct Buffers
  • Memory Segments (FFM)
  • FFM native leaks

We then ask ourselves which of these are actually used by the application. In a program this small, that’s easy: the only one we’re using is a direct buffer.

If we take a heap dump and view it with a heap dump analyzer, we can search for direct buffer objects. These are instances of java.nio.DirectByteBuffer, the JDK’s internal subclass of ByteBuffer.

Fig: Histogram by HeapHero

At the time the dump was taken, there were 255 DirectByteBuffers. At 1,000,000 bytes each, that’s about 255 MB of native memory, which is certainly excessive and likely indicates that the buffers are being created in a non-terminating loop.

We can also see this on the MBeans tab of JConsole, if we connect to the running application.

Fig: JConsole Showing MBean for Buffer Pools

We navigated to the MBeans tab, then under java.nio, we selected BufferPool and then direct (the MBean java.nio:type=BufferPool,name=direct). At the time this was run, there were 1663 direct buffers, definitely pointing to an out-of-control loop.
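The same figures JConsole displays can be read programmatically from the java.nio:type=BufferPool,name=direct MBean. This sketch queries the platform MBean server of the JVM it runs in; reading another process would require a JMX remote connection, which is not shown. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: reads the direct buffer pool attributes shown in JConsole's MBeans tab.
public class DirectPoolAttributes {

    private static final String DIRECT_POOL = "java.nio:type=BufferPool,name=direct";

    // Number of direct buffers currently tracked by the pool (the "Count" attribute).
    public static long directCount() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return (Long) server.getAttribute(new ObjectName(DIRECT_POOL), "Count");
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName direct = new ObjectName(DIRECT_POOL);
        for (String attribute : new String[] {"Count", "MemoryUsed", "TotalCapacity"}) {
            System.out.println(attribute + " = " + server.getAttribute(direct, attribute));
        }
    }
}
```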

How to Fix Java Native Memory Leaks: Key Takeaways 

Not all memory leaks are heap-related. We’ve discussed tools we can use to troubleshoot native memory leaks. In particular, native memory tracking lets us isolate the problematic memory area. 

For FFM and JNI memory leaks, we need to be able to debug the native code. The Oracle documentation guides us through this process.

Once we’ve identified which area of memory is affected, we can follow normal debugging strategies to find and fix the problem.
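For the sample program, the fix depends on why it holds on to the buffers. As a minimal sketch, assuming each buffer is only needed during one iteration, a single direct buffer can be allocated once and reused, so native memory stays flat. The class and method names are hypothetical, and the loop is bounded here only so the sketch terminates. If buffers genuinely must be retained, bounding the collection that holds them would serve the same purpose.

```java
import java.nio.ByteBuffer;

// Sketch of one possible fix for BuggyProg17: reuse one direct buffer
// instead of allocating (and retaining) a new one per iteration.
public class FixedProg17 {

    private static final int BUFFER_SIZE = 1_000_000;

    // Runs the given number of iterations and returns a checksum of the values
    // read back, so the reuse can be checked.
    public static int run(int iterations, long pauseMillis) throws InterruptedException {
        // Allocated once: native memory no longer grows with each iteration.
        ByteBuffer directBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int checksum = 0;
        for (int i = 0; i < iterations; i++) {
            directBuffer.clear();       // reset position and limit; no new allocation
            directBuffer.putInt(i);     // stand-in for real work with the buffer
            directBuffer.flip();
            checksum += directBuffer.getInt();
            Thread.sleep(pauseMillis);
        }
        return checksum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("checksum = " + run(20, 500));
    }
}
```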


Discover more from HeapHero – Java & Android Heap Dump Analyzer
