Hold on… What are unreachable objects, and why would we want to analyze them?
Let’s start with this scenario.
You have a large web server running in the cloud. It runs fine initially, but after a while, as traffic increases, you start to notice problems. Response time is erratic, and it’s no longer coping well with the workload. You’ve checked all the usual suspects. The heap’s not too full, and you don’t have a memory leak.
The finance department also notices a problem: why are cloud costs higher than they used to be?
These elusive problems are often due to object churn: the situation where the application is creating temporary objects faster than the GC is comfortably able to collect them.
To find the root cause of object churn, the best place to look for clues is among the unreachable objects. In this article, we’ll look at what they are and how to use a heap dump analyzer to find them. We’ll also discuss how to avoid object churn when coding.
And since we’re talking about cloud applications, there’s a good chance the heap may be huge. We’ll therefore point out some of the things we need to think about when we want to analyze large heap dumps.
What is Object Churn, and How Does it Affect Cloud Computing Costs?
The garbage collector (GC) normally does a great job of making sure there’s always enough heap space for new objects. However, if we’re creating and then releasing objects too frequently, the GC can get behind, and start struggling. If you’d like a deeper understanding of how GC works, you can refer to this detailed guide on Java Garbage Collection.
Let’s take this small program, which illustrates object churn at its worst.
public class BuggyProg10 {
    // ===================================================================
    // Tight loop creating objects, which go out of scope and are eligible
    // for GC almost immediately. This illustrates object churn.
    // ===================================================================
    public static void main(String[] args) {
        while (true) {
            Double[] db = new Double[200000];
            for (int i = 0; i < 200000; i++)
                db[i] = new Double(i);
            try { Thread.sleep(5); } catch (Exception e) {}
        }
    }
}
Not surprisingly, GC has to work really hard to keep memory clear, even though the array is eligible for collection as soon as it goes out of scope. If we run it, and analyze heap usage over time from the GC logs, we’ll see something like this:
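To capture the data behind a graph like this, run the program with GC logging enabled. With Java 9+ unified logging, the invocation might look like this (the class name is from the sample above; the log file path is arbitrary):

```shell
# Enable detailed GC logging to gc.log, then feed the file
# to a GC log analyzer such as GCeasy
java -Xlog:gc*:file=gc.log BuggyProg10
```
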

Fig: GCeasy graph of Heap Usage Over Time Showing Object Churn
Notice a few things here:
- Heap usage rises steeply, then drops dramatically when GC runs;
- GC is not always able to drop the heap back down to the same level;
- GC is running far too frequently.
Obviously, all this activity is going to affect performance. Response times increase when GC is running. Because the GC is very processor-hungry, CPU usage spikes. Throughput drops, and pause times increase. In the worst case, if the GC really can’t keep up, the application may crash with this error: java.lang.OutOfMemoryError: GC overhead limit exceeded. This means that almost all CPU time is being used by the GC, not by the application.
Besides annoying your users, this can also inflate your cloud bills. Most hosts charge by the actual resources you use: CPU time, RAM, disk I/O and network traffic.

Fig: Finance Manager Receives Cloud Computing Bill
Here are some of the ways object churn can increase cloud costs:
- Increased CPU time;
- We usually configure memory according to peak usage during load testing or live monitoring. Object churn causes higher spikes, so the heap has to be allowed enough memory to cater for this. This means we’re paying for memory we don’t really need;
- When system performance drops, the system administrator will quite likely respond by increasing the heap size. If the application is running in a container, the container will need more memory to allow for this;
- If we’re running via a load balancer, it may spawn more instances to counteract performance problems.
How to Analyze Large Heap Dumps in Cloud Environments
In enterprise or specialized systems, the heap can be enormous. Heap sizes of 32GB or more are no longer uncommon, and sizes in terabytes are not unknown. When we’re dealing with these, analyzing a heap dump to diagnose memory issues requires some special considerations.
The size of the dump file on disk is likely to be between 30% and 70% of the heap size if we’re only dumping live objects. If we’re dumping all objects, as we would when troubleshooting object churn, the dump file will be between 80% and 130% of the heap size. As you can see, the dump file will be huge. This often leads to frustration when we’re troubleshooting in a hurry.
Extracting the heap is likely to be slow, and adds considerable overhead. If it’s taken from a production machine, performance will be affected.
Analyzing the dump could take some time, as the analyzer has to arrange and index it.
Let’s look at some of the issues we may encounter when troubleshooting huge systems in the cloud.
1. Choosing Tools to Analyze Large Heap Dumps
Many lightweight heap dump analyzers won’t handle very large heap dumps. Go with well-known tools such as HeapHero or Eclipse MAT.
2. Choosing the Environment Where the Tool Will Run
We can’t analyze big dumps on small machines. The machine we use for the analysis must have at least 8GB more memory than the size of our heap dump. If the heap dump occupies 56GB, the analysis machine will need at least 64GB of RAM. If the number of CPUs or the processor speed is inadequate, the analysis could take many hours to run.
Options include:
- Keep a powerful server available for the analysis;
- Spin up a temporary powerful VM in the cloud purely to run the analysis;
- Work with a tool such as HeapHero that offers its own large servers for the analysis task.
3. Storing the Dump File
When choosing the dump file destination:
- Make sure there is enough space on the device;
- Choose a high-speed device, or the dump may take hours. If we’re dumping to a device on the network, we need a high-speed connection as well.
- If the application is running in a container, always dump to persistent storage.
4. Transferring the Dump File
If possible, send the dump directly to the analysis machine. If not, either use a removable drive with enough capacity, or compress the file and send it via a high-speed link. If transferring over a network, make sure it’s secure, as heap dumps often contain sensitive information.
5. Security
Heap dumps are a snapshot of memory. If the application processes any sensitive information, such as credit card numbers or social security numbers, these may be visible to anyone analyzing the dump. The dump should therefore be treated as highly confidential, and good security must be enabled during storage and transit. When it’s no longer needed, it must be destroyed completely.
The enterprise version of HeapHero has a facility for masking confidential data.
Using Unreachable Objects When Analyzing Heap Dumps
When an object has no more live references pointing to it, it becomes eligible for GC. This happens either when the object goes out of scope, or when its last reference is reassigned or set to null. The object itself, however, will only be removed on the next GC cycle. Until then, it remains in memory as an unreachable object.
These unreachable objects are the best possible clue as to why we’re experiencing object churn. They are the objects created and then discarded since the last GC cycle. However, they’re usually left out when we extract a heap dump, because they add to both the time it takes to do the dump, and the size of the file. For most troubleshooting tasks we don’t need them, but in this case we do.
We need to explicitly request them when we extract the dump. For example, if we’re using jmap for the extract, the command would normally be:
jmap -dump:live,format=b,file=heap.hprof <pid>
This tells it to extract live objects only. Instead, we’d use this syntax:
jmap -dump:format=b,file=heap.hprof <pid>
HeapHero automatically analyzes unreachable objects if they’re included, but if we’re using Eclipse MAT we’d have to set preferences to load them, and use a different strategy when doing the analysis. This is well described in the MAT documentation.
If you’re using a different tool, refer to the product documentation.
Let’s have a look at the HeapHero analysis report for the sample program in the section above. Remember, it illustrated object churn by creating an array of Double in a tight loop.
Below various graphs and charts related to live objects, there is an expandable section headed ‘Unreachable Objects’:

Fig: HeapHero Unreachable Objects Report for the Sample Program
Right at the top of the list, we see over nine hundred thousand instances of Double. All of these have been created and discarded since the last GC event, and are waiting to be cleared away next time GC runs.
The report is interactive, and we can browse to find the contents of each object in the array:

Fig: Contents of One of the Items in the Array
Given the type of objects that are churning and their contents, it shouldn’t be difficult to pinpoint which part of the program is responsible for object churn.
Excess Unreachable Objects: Causes and Fixes
Occasionally, object churn can be caused by inefficient GC settings. This article contains links to tuning guides for the various GC algorithms.
It’s far more likely that the problem is caused by poor coding practices. Let’s look at a few of the many ways we can tighten up our code.
1. Repeated String Concatenation
Look at the following code snippet, which loops through an array of names and appends each name, followed by a comma, to a string:
String allNames = "";
for (int i = 0; i < nameList.length; i++)
    allNames = allNames + nameList[i] + ",";
Since String objects are immutable, the JVM can only carry this out by creating a new String on each iteration to hold the intermediate result. Suppose there were 2000 items in the array. This would create 2000 short-lived String objects, all of which would be eligible for GC almost immediately, but would not actually be removed until the next GC event. This is a classic cause of object churn.
How can we fix it?
Let’s look at this code snippet.
StringBuilder nameWork = new StringBuilder();
for (int i = 0; i < nameList.length; i++)
    nameWork.append(nameList[i]).append(",");
String allNames = nameWork.toString();
The StringBuilder class maintains a mutable character buffer that can be appended to in place, rather than creating a new object each time. Instead of 2000 short-lived Strings, we create only a handful of objects: the StringBuilder itself, its occasional internal buffer resizes, and the final String.
2. Reuse Objects Wherever Possible
Wherever we create a new object, especially if it’s resource-heavy, we should think about whether we could re-use an existing object instead.
- Use object pooling for things like database connectors, network objects and threads;
- Classes that implement the Collection interface have a clear() method. Creating a collection once, then clearing and re-using it, is much better than creating a new collection every time.
- If a constant doesn’t change for the duration of a loop, create it once outside the loop.
- Boxed numbers are numbers held within a wrapper class, for example Integer and Double (the same advice applies to heavyweight numeric classes such as BigDecimal). If you have to use a boxed number in a calculation within a loop, create it once outside the loop and, where possible, work with primitives inside the loop rather than creating a new boxed number on every iteration.
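The collection-reuse point can be sketched as follows (the class and method names are my own, for illustration only). One list is allocated up front and emptied with clear() between batches, instead of allocating a fresh ArrayList per batch:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Process many batches, reusing one list instead of creating a new
    // ArrayList per batch. clear() empties the list but keeps its
    // underlying array, so no new collection object is created per batch.
    static int processAll(int batches, int batchSize) {
        List<Integer> buffer = new ArrayList<>(batchSize); // created once, pre-sized
        int total = 0;
        for (int b = 0; b < batches; b++) {
            buffer.clear();                // reuse, don't reallocate
            for (int i = 0; i < batchSize; i++)
                buffer.add(i);             // autoboxing; small values come from the Integer cache
            total += buffer.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(processAll(1000, 50)); // 1000 batches of 50
    }
}
```

With this pattern, a loop that would otherwise churn one collection per batch creates just one for the whole run.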
3. Growing Collections Within Loops
Let’s take this example: we have a database result set that is likely to contain about 100 000 entries, but could occasionally contain a few more. We want to copy the customers’ names from this set into an ArrayList. Look at the following code snippet:
ArrayList<String> nameList = new ArrayList<>();
while (resultSet.next())
    nameList.add(resultSet.getString("name"));
What happens here is that the ArrayList is created empty. On the first iteration, when we add an element, it allocates an underlying array capable of holding 10 entries. On the eleventh iteration, the underlying array is full. It resizes itself by a factor of roughly 1.5, creating a new underlying array capable of holding 15 entries and copying the data across from the original array. By the time we’ve processed 100 000 entries, it’s resized itself around 23 times (you can do the math yourself if you don’t believe me). That’s a lot of unnecessary, fairly large, objects. It’s also a lot of copying.
Instead, we can create the ArrayList with a sensible size, to allow for the 100 000 entries plus a margin of safety.
ArrayList<String> nameList = new ArrayList<>(120000);
while (resultSet.next())
    nameList.add(resultSet.getString("name"));
This sets an initial capacity, and the list will not have to resize itself unless the volume of data grows beyond expected limits. The bottom line: it’s much more efficient to create collections with a sensible initial size than to let them grow piecemeal.
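To sanity-check the resize count above, here’s a short sketch that simulates ArrayList’s growth rule as implemented in current OpenJDK: each new capacity is the old one plus half, starting from the default of 10 allocated on the first add. The class and method names are invented for illustration:

```java
public class GrowthCount {
    // Count how many times an ArrayList-style backing array must grow
    // to reach the target capacity, using newCap = oldCap + oldCap/2.
    static int resizes(int target) {
        int capacity = 10;  // capacity allocated on the first add
        int count = 0;
        while (capacity < target) {
            capacity = capacity + (capacity >> 1); // grow by roughly 1.5x
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(resizes(100_000)); // prints 23
    }
}
```

Around 23 resizes and 23 array copies for 100 000 entries, all avoided by one well-chosen constructor argument.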
Conclusion
In most troubleshooting scenarios, we’re only interested in examining live objects within a heap dump. However, if we suspect object churn, we need to look at unreachable objects to find out why they are building up and causing the GC to work overtime.
When we’re looking at cloud applications with large heaps, the overhead of dumping all objects as opposed to only live objects can be huge. For this reason, we’ve also looked at best practices for when we need to analyze large heap dumps.
