Beyond OutOfMemoryError: Using a Memory Analyzer for Subtle Memory Leaks

An OutOfMemoryError can cause havoc in production: the system crashes, clients can’t connect, and managers raise blood pressure – their own and everyone else’s. This is bad, but this kind of crash is often fairly simple to diagnose and correct. Just follow the stack trace, and in many cases the culprit becomes obvious.

What’s worse is a subtle memory leak. The system doesn’t crash immediately. Instead, it becomes progressively less responsive, and may intermittently time out. Clients become annoyed by the delays, and may take their business elsewhere. The IT troubleshooters receive complaints, but can’t put their fingers on the problem. CPU usage climbs, and other applications on the device may be affected. If the system is running in the cloud, bills climb as more and more resources are consumed.

In this article, we’ll look at how to use a memory analyzer and other tools not only to diagnose a memory leak, but also to spot the warning signs before it becomes a problem in production.

What Is a Memory Leak, and What Symptoms May Indicate Its Presence?

A memory leak occurs when memory usage continually increases even though the garbage collector (GC) is running regularly: unused objects remain reachable, so the GC cannot reclaim them.

Typically, if we look at garbage collection statistics for a healthy application and compare them to an application that has a memory leak, we would see something like this:

Fig: Comparison of Healthy GC Pattern to Memory Leak Pattern

The small red triangles indicate a GC event. In the first graph, every time GC runs, memory is cleared down to roughly the same level, so that the lower memory level forms almost a straight line.

In the second graph, GC is clearing memory, but it never gets back down to the same level; less and less memory is reclaimed on each GC event. A line drawn through the lower levels slopes continuously upwards. We also notice that GC events gradually happen more frequently as the GC tries, and fails, to keep memory usage down.

If the program runs for long enough, GC events will eventually run back-to-back, clearing almost no memory on each cycle. Eventually, this will probably result in an OutOfMemoryError.

If we’re seeing the symptoms listed below, it’s highly likely that we have a memory leak.

  • Memory usage increases gradually over time.
  • CPU usage increases. This is because GC, which is CPU-heavy, is running more frequently.
  • Response times and performance degrade over time.
  • Time-outs occur intermittently. This happens because the GC pauses application threads during stop-the-world events.
  • Eventually, after running for some time, an OutOfMemoryError occurs.

We can be proactive in finding memory leaks before they cause shouting and chaos in production. By introducing regular monitoring, we can diagnose memory leaks that may cause issues in the future. GC monitoring is an essential part of this. Are GC events happening more frequently with time? Does the GC fail to effectively clear enough memory on each cycle? Is CPU usage increasing? If so, it’s worth checking for memory leaks before they degrade the system.

Not all memory issues are necessarily memory leaks. The heap may have been configured too small for the task, or the application may simply be wasting space. What distinguishes a memory leak is that the problem gradually worsens over time, since GC becomes progressively unable to cope.

What Causes Memory Leaks?

Typically, a leak happens because:

  • Objects that are no longer needed by the application remain reachable, so the GC cannot reclaim them, and they build up in memory over time. For example, an application may fail to release resources created by an HTTP request once the request has completed. An object becomes eligible for collection only when no live references to it remain – for example, when the variable holding it goes out of scope or is explicitly set to null (and for external resources such as streams and connections, the object’s close() method, if it has one, should also be called). Variables that are defined in the wrong scope (e.g. as class variables instead of local variables) are a frequent cause of memory leaks.
  • A bug in the program repeatedly creates objects or adds data to collections. This is usually because a loop does not terminate when it should, often because of a faulty termination condition.
  • Duplicate items are added to a cache or a collection, possibly because the logic for removing unused items or assigning keys is faulty.
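The first cause above is easy to reproduce in a few lines. This hypothetical sketch (class and method names are illustrative, not from any real codebase) caches every response in a class-level collection and never evicts anything, so the GC can never reclaim the entries:

```java
import java.util.ArrayList;
import java.util.List;

public class LeakyHandler {
    // Class-level (static) collection: every entry stays reachable for the
    // lifetime of the class, so the GC can never reclaim it.
    private static final List<byte[]> responseCache = new ArrayList<>();

    static byte[] handleRequest(int requestId) {
        byte[] response = new byte[1024]; // simulated 1 KB response payload
        responseCache.add(response);      // added but never removed: the leak
        return response;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest(i);
        }
        // ~10,000 * 1 KB retained, even though no caller needs the data anymore
        System.out.println("Cached responses: " + responseCache.size());
    }
}
```

Had responseCache been a local variable inside handleRequest(), each payload would have become unreachable as soon as the method returned, and the GC would have reclaimed it on the next cycle.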

As we’ll see in later sections, these types of errors are usually fairly easy to find using a memory analyzer to examine a heap dump. In this article, we’ll use HeapHero to analyze the dump, but the same principles apply if you’re using another tool such as Eclipse MAT.

These types of leaks will manifest in one of two ways: when we look at the heap dump, we’ll either see a single object – usually a collection – that has grown extremely large, or we’ll see a proliferation of objects of the same class.

For more information, this video discusses common causes of memory leaks and how to find them.

Occasionally, memory leaks can be caused by more obscure issues, and these are often harder to find. Here are a few less common causes of memory leaks:

  • Mutable keys in collections: Some collections, such as HashMaps, can become problematic if the keys are changed. Because they use the key’s hash code to determine which bucket an entry is stored in, mutating a key after insertion can leave the entry unreachable, resulting in duplicate records or records that can never be removed.
  • Uncleared ThreadLocal variables: If the application has many threads, and ThreadLocal variables aren’t cleared (via remove()), they can build up in memory and cause problems.
  • Slow finalizer methods: At one time, developers were encouraged to include a finalize() method when coding a class that needed clean-up operations before being garbage collected. This has since been deprecated. If an object has a finalize() method, the GC adds it to a finalization queue, which invokes this method in turn for each finalizable object. This queue is single-threaded, so if any object takes a long time to finalize, it prevents all objects in the queue behind it from being garbage collected until it completes. For example, if the finalize() method waits on a network connection that has been dropped, it can hang indefinitely, and cause a memory leak.
  • Threads failing to terminate. This can result in a build-up of unused threads, each retaining memory.
  • Listeners that are not deregistered. If Object A is a listener for Object B, and Object B outlives Object A, Object A cannot be garbage collected while it’s still registered as a listener.
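The mutable-key problem from the list above is easy to demonstrate. In this hypothetical sketch, a key object is mutated after insertion, so its hash code no longer matches the bucket it was filed under; the entry can no longer be looked up or removed, yet it still occupies memory:

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyLeak {
    // A mutable key class: hashCode() changes when 'id' changes.
    static class Key {
        int id;
        Key(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "payload"); // stored in the bucket for hash(1)

        key.id = 2; // mutate the key AFTER insertion

        // The entry is now "lost":
        // - looking up Key(2) searches the bucket for hash(2), which is empty
        // - looking up Key(1) finds the right bucket, but equals() now fails
        System.out.println(map.get(new Key(2))); // null
        System.out.println(map.get(new Key(1))); // null
        System.out.println(map.size());          // 1 - still occupying memory
    }
}
```

The usual defences are to use immutable key types (String, Integer, records) or, at minimum, never to mutate any field that participates in hashCode() while the object is in a hash-based collection.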

Obviously, there can be other obscure causes, but this list will give you some idea of what to look for.

For more information, this video deals with less common causes of memory leaks.

What Do You Need to Troubleshoot a Memory Leak?

Firstly, check the application log or console output. Has the program actually thrown an OutOfMemoryError? If so, it should be followed by a stack trace, which is always useful in tracing errors.

Next, we’ll need a heap dump. This is a snapshot of the contents of memory at a given moment. It contains all objects in the heap, together with their incoming and outgoing references and contents. There are several ways to obtain a heap dump.
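Common options include the jmap and jcmd command-line tools, the -XX:+HeapDumpOnOutOfMemoryError JVM flag (which writes a dump automatically when the error is thrown), or triggering a dump from code. As a sketch, on a HotSpot JVM a dump can be captured programmatically via the HotSpotDiagnosticMXBean:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /**
     * Writes a heap dump to the given file path (must end in .hprof and
     * must not already exist).
     * @param live if true, dump only reachable objects (forces a GC first)
     */
    static void dumpHeap(String filePath, boolean live) throws Exception {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(filePath, live);
    }

    public static void main(String[] args) throws Exception {
        // Roughly equivalent to: jmap -dump:live,format=b,file=heap.hprof <pid>
        dumpHeap("heap.hprof", true);
        System.out.println("Heap dump written to heap.hprof");
    }
}
```

Setting live to false captures unreachable-but-uncollected objects as well, which is occasionally useful, but for leak hunting the live set is usually what we want.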

The heap dump is a very large binary file, almost impossible to analyze manually. Fortunately, there are tools that allow us to extract and browse through the information stored in the dump, such as the one we’ll be using as an example: HeapHero.

It can also be helpful to use a GC log analyzer such as GCeasy to monitor the performance of the garbage collector, and spot trends, such as the one shown in the image earlier in the article.

How Do You Use a Memory Analyzer to Find Memory Leaks?

If the application has thrown an OutOfMemoryError, check the error message carefully to establish the type of error. There are, in fact, nine different types of OutOfMemoryError in Java.

The next step is to create a heap dump and load it into HeapHero, or another memory analyzer. HeapHero can either be installed on site, or you can upload the dump to the cloud here.

HeapHero has machine learning algorithms that are often able to instantly diagnose the cause of the memory leak. If so, it will show its recommendations at the front of the report, as in the image below, and you may need to look no further.

Fig: Problem Detected by HeapHero Indicating Probable Memory Leak

Since most memory leaks fall into the two most common categories we spoke of earlier (either an object that’s continually growing, or a proliferation of similar objects), it makes sense to look at these first. Even if the leak is caused by one of the obscure issues mentioned earlier, the tactics described below are likely to uncover clues to the problem.

Almost always, the memory leak will be found among the top four largest objects in the heap dump. Most memory analyzers, including HeapHero, provide a list of objects in memory sorted by size in descending order. Our first step, then, is to investigate the top items on this list. The image below shows an example of this list.

Fig: List of Largest Objects Produced by HeapHero

At this point, it’s worth clarifying the two size columns: the shallow heap is the memory consumed by the object itself, whereas the retained heap is the memory that would be freed if the object were garbage collected – that is, the object plus everything that is reachable only through it.

In the example above, the largest object is a thread. This indicates that one of the local variables of this thread is either expanding or proliferating. The next thing to do is to expand this by looking at its child objects. We can do this by looking at its outgoing references, as shown in the next image. In HeapHero, we’d do this by clicking the ‘more’ link.

Fig: HeapHero Report of Outgoing References

We can expand the larger items on this list using the arrow keys, moving down as many levels as we need to find the actual object that is causing the problem. We may find a very large object, in this case the TreeMap, indicating that this map is expanding. Alternatively, we may have found a large number of similar objects, indicating that an object is proliferating.

Once we’ve found a suspect, we would next explore why it is not being garbage collected. We can do this by exploring its parent objects, or incoming references. Again, we can do this using the ‘more’ link.

Once we know which object is using up memory, and which objects are holding a reference to it, we can easily zoom in on the area of code that’s causing the problem. 

You may like to watch this video to see a demonstration of using HeapHero to find a memory leak.

Let’s now look at a few other areas of the HeapHero report that we may find useful in looking for more obscure errors.

The image below is a class histogram:

Fig: Class Histogram

This shows heap usage by class, and it’s useful especially for finding proliferations of similar objects.

If we suspect that there may be thread issues, HeapHero has a link through which its sister tool, fastThread, can analyze thread activity from the dump. The image below shows the link in the HeapHero report, and extracts from the resulting fastThread report.

Fig: Accessing Thread Analysis Tools 

In this case, we can see that we have a very large number of identical threads. We can interactively explore their stack trace to find out why this is happening.

It’s also worth checking the section of the HeapHero report that lists objects waiting for finalization, as per the image below.

Fig: Objects Awaiting Finalization

If this queue is large, it indicates that the object at the head of the queue may be hung waiting for resources, preventing the objects behind it in the queue from being garbage collected.

Conclusion

With the right tools, detecting, finding and fixing memory leaks need not be difficult.

Analysis of GC performance can indicate whether a memory leak is present. By monitoring GC performance regularly, we can proactively spot memory leaks before they become a problem in production.

Finding the leak requires a heap dump, which is a large binary file. To use it for troubleshooting, we need a memory analyzer such as HeapHero.

By having the right tools, knowing what to look for and approaching the problem logically, it’s possible to find and fix obscure memory leaks without wasting hours of valuable developer time.
