Memory Analysis for Containerized Java Applications (Docker, Kubernetes)

Containerized applications are increasingly popular: they are easy to deploy and work reliably across different platforms. Containers are especially useful in a microservices architecture.

Modern applications may consist of a large number of containers, and to manage them efficiently, software such as Kubernetes is invaluable.

However, unless troubleshooting has been planned for in advance, it isn’t always easy to investigate an individual container when problems occur in production. Memory issues in containerized applications are fairly common, and Java memory analyzers aren’t always readily available on the spot.

In this article, we’ll look at some of the reasons memory problems occur, and at how to overcome the common obstacles we may meet when trying to diagnose them.

Symptoms that May Indicate Memory Issues in Containerized Applications

Memory problems generally begin by degrading performance, and may progress to causing the application to crash. Since Kubernetes generally restarts a crashed application, what we’re actually likely to see from the user’s point of view is increasingly slow response, followed by disappearing sessions. When the application restarts, the users can reconnect, and performance may be acceptable for a while, before the cycle repeats itself.

With performance monitoring in place, we would see high usage of both CPU and memory as the garbage collector struggles to free memory. If the system crashes, we may see either:

  • java.lang.OutOfMemoryError in the application log or
  • A message in either the operating system logs or the Kubernetes logs indicating the process has been killed by the OOM killer.

The first happens when the Java application is unable to allocate new memory on request. The second occurs when either Kubernetes or the operating system detects that memory on the host is running low: it then picks a process to kill to make sure the system as a whole doesn’t crash.

Java Memory Analysis

To find the cause of the problem, we need to be able to look at the JVM memory, and how it’s being used.

The first and most useful tool is a heap dump analyzer such as HeapHero or Eclipse MAT. For a demonstration of how to use HeapHero to diagnose memory problems, you may like to watch How to Analyze a Heap Dump Fast. We would first take a heap dump, then submit it to one of these Java memory analyzer tools to interactively explore the heap. In later sections of this article, we’ll look at why obtaining a heap dump may not always be so simple within container environments, and how to get around these problems.
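Assuming the JDK tools are available inside the container (we’ll discuss below how to arrange this), a heap dump can be captured with either jcmd or jmap; for example:

```sh
# Capture a heap dump of a running JVM (either command works;
# <pid> is the Java process ID, e.g. obtained from jps):
jcmd <pid> GC.heap_dump /tmp/heapdump.hprof
jmap -dump:live,format=b,file=/tmp/heapdump.hprof <pid>
```

The resulting .hprof file can then be uploaded to a heap dump analyzer for interactive exploration.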

Memory issues may not always relate to the heap. They could relate to other areas of the JVM memory, to the container as a whole, or to the environment. JVM memory can be visualized as shown in the diagram below:

Fig: The JVM Memory Model

For a full explanation of what each memory area is used for, watch JVM Explained in 10 Minutes.

We need to take into account the total memory used, not just the heap. This article describes how to analyze the amount of native memory in use: Understanding Native Memory Tracking in Java.
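As described in that article, native memory usage can be inspected with Native Memory Tracking (NMT). A minimal sketch, assuming tracking is enabled at startup (app.jar is a placeholder):

```sh
# Start the JVM with Native Memory Tracking enabled (small overhead):
java -XX:NativeMemoryTracking=summary -jar app.jar

# Then query the running process for a breakdown of native memory use:
jcmd <pid> VM.native_memory summary
```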

The best way to find out if memory issues relate to the heap, or to native memory, is to analyze the garbage collection logs using a tool such as GCeasy. Studying memory usage over time can uncover GC patterns that give us important diagnostic information.
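To have GC logs available for such analysis, GC logging must be enabled when the JVM starts. The file path here is illustrative; write it to persistent storage:

```sh
# Java 9+ unified logging: write GC events to a file with timestamps
-Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags

# Java 8 equivalent
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/gc.log
```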

The underlying causes of memory problems fall into three categories:

  • Memory leaks: Memory is not being released when it’s no longer needed, and builds up over time;
  • Memory Wastage: Poor coding practices can result in a large amount of wasted memory;
  • Incorrect Configuration: This is by far the most common cause of memory problems in containerized environments, and we’ll discuss it further below.
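As an illustration of the first category, here is a minimal sketch of a classic leak pattern: a static collection that grows on every request and is never cleared. The class and method names are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// A classic leak pattern: a static collection that only ever grows.
// Entries added per request are never removed, so they accumulate
// on the heap and the garbage collector cannot reclaim them.
public class LeakyCache {
    private static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest(int requestId) {
        CACHE.add(new byte[1024]); // ~1 KB retained per request, forever
    }

    static int entryCount() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest(i);
        }
        System.out.println("Retained entries: " + entryCount());
    }
}
```

In a heap dump analyzer such as HeapHero or Eclipse MAT, the static CACHE list would show up as a dominator whose retained size keeps growing between successive dumps.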

It’s important when configuring a container to ensure enough memory is allocated to allow for:

  • The heap
  • Native memory
  • The operating system, and any other services or background processes that may be running.

Too often, administrators size the container based on the heap alone. Instead, before putting a system into production, it’s important to use memory analyzer tools to discover the memory the application actually uses. As a rule of thumb, we’re likely to need 30-40% more memory over and above the heap size.
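As a back-of-the-envelope sketch of that rule of thumb (the numbers are illustrative, not measurements):

```java
// Rough container sizing sketch: given a measured heap requirement,
// add 30-40% headroom for metaspace, thread stacks, code cache,
// direct buffers, and the OS/background processes.
public class ContainerSizing {
    static long recommendedContainerMb(long heapMb, double overheadFraction) {
        return Math.round(heapMb * (1.0 + overheadFraction));
    }

    public static void main(String[] args) {
        long heapMb = 2048; // measured heap requirement, in MB
        long low  = recommendedContainerMb(heapMb, 0.30);
        long high = recommendedContainerMb(heapMb, 0.40);
        // prints "Container memory: 2662-2867 MB"
        System.out.println("Container memory: " + low + "-" + high + " MB");
    }
}
```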

The JVM should be given container-friendly runtime arguments, to ensure it sizes itself against the resources allocated to its cgroup rather than those of the host as a whole. This allows it to make more accurate decisions regarding memory allocation.

Use:

  • -XX:+UseContainerSupport to advise the JVM that it will be running in a container (enabled by default since JDK 10 and 8u191);
  • -XX:MaxRAMPercentage to express the maximum memory to allocate as a percentage of the memory available within the cgroup. 60-70% is usually a good setting.
  • -XX:InitialRAMPercentage (optional) sets the initial memory allocation.
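A quick way to verify these flags are taking effect is to print the maximum heap the JVM has actually chosen, from inside the container. A minimal sketch: with -XX:MaxRAMPercentage=70.0 in a 2 GiB cgroup, we’d expect roughly 1.4 GiB.

```java
// Prints the maximum heap size the JVM will use. Run inside the
// container to confirm it reflects the cgroup limit and
// -XX:MaxRAMPercentage, not the host's total memory.
public class MaxHeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}
```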

Additionally, ensure enough CPU resources are allocated to the container. If they’re not, the garbage collector may suffer from CPU throttling, where it’s unable to access enough CPU time to do its job. This results in memory-related failures even where enough memory is available, as it’s unable to clear unused objects efficiently. It can also result in long pauses, where the garbage collector stops application threads for long periods.

When running under a container manager such as Kubernetes, it’s also necessary to make sure that enough resources are allocated to the pod, and that the node has enough resources overall.
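In Kubernetes, this is done with resource requests and limits on the container spec. A sketch, with illustrative values:

```yaml
# Fragment of a pod spec; values are illustrative, not recommendations.
resources:
  requests:        # guaranteed to the container when scheduled
    memory: "2Gi"
    cpu: "1"
  limits:          # exceeding the memory limit triggers the OOM killer
    memory: "3Gi"
    cpu: "2"
```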

Troubleshooting in a Container Environment

Most Java troubleshooting techniques apply anywhere, whether or not we’re using containers. If you’d like to up your skills in this area, you may be interested in this JVM Performance Tuning Masterclass. However, there are additional considerations when working with containers, and particularly with container management systems.

Where possible, the first step is to open a terminal within the container to run diagnostics.

In Docker, we’d use a command like this:

docker exec -it <container_name> /bin/bash

If the container is being managed by Kubernetes, we would instead use a command similar to this:

kubectl exec -it <pod-name> -- /bin/bash

If bash isn’t included in the container, we can use sh instead.

This may not always work because:

  • The tools we need, such as the JDK utilities, may not have been included in the container. In fact, if we’re working with a distroless container, we may not even have access to a shell.
  • If the application crashed, the container will have been restarted, so we won’t be able to analyze memory at the time of the crash.
  • Any diagnostic data, such as logs, will only have been retained if it was written to persistent storage.

With proper planning before the container is put live, we can overcome these difficulties. Here are some suggestions.

1. Set Up a Troubleshooting Image

Create a Docker image with all the tools you’ll need to troubleshoot your running application. If you’re running under Docker alone, you can use Docker Debug, which allows you to install additional troubleshooting tools into the debugging image. If you’re running under Kubernetes, you can set up your own debugging image and attach it to the pod as an ephemeral container.

Alternatively, you can run your debugging image as a sidecar at startup, but this may not be the best solution from a security point of view.
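For example, on a recent Kubernetes version, an ephemeral debugging container can be attached to a running pod with kubectl debug (the debug image is whatever troubleshooting image you built):

```sh
# Attach an ephemeral container to the pod, sharing the process
# namespace of the target container so its JVM is visible:
kubectl debug -it <pod-name> --image=<debug-image> --target=<container-name>
```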

2. Configure the JVM to Run a Heap Dump or Script When an OutOfMemoryError is Encountered

A good practice is to always configure the JVM to provide diagnostics before crashing with an OutOfMemoryError.

To provide a heap dump only, use the following JVM switches:

-XX:HeapDumpPath=<path to heap dump location on persistent storage>
-XX:+HeapDumpOnOutOfMemoryError

It’s also possible to run a script on OutOfMemoryError:

-XX:OnOutOfMemoryError="<command>"

This is more powerful, since we can gather other diagnostic information as well as the heap dump. The ideal script for this is the free, open-source yc-360 script from yCrash, which gathers 360° artifacts preserving troubleshooting information about the application and its environment. If this output is written to persistent storage, we will have everything we need to diagnose the root cause. The image below describes the information that will be captured.

Fig: Artifacts Captured by yc-360 Script

Note that this solution only works for OutOfMemoryErrors thrown by the JVM. If the application is killed by the Kubernetes or Linux OOM killer, it’s terminated before any action can take place.

3. Enable Remote Java Diagnostics Using JMX

To run tools such as the JDK troubleshooting utilities remotely, we can enable remote JMX on the container. This means that if the container is not performing well, we can debug it from another machine on the network. This may, however, have security implications for confidential applications.
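As a sketch, remote JMX is enabled with system properties like the following. The port number and hostname are illustrative, and disabling authentication and SSL, as here, is acceptable only on a trusted network:

```sh
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.rmi.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=<container-host-or-ip>
```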

4. Ensure Adequate Logging and Monitoring is In Place

When building the container image, be sure to enable logging to persistent storage. This may include:

  • Application logs;
  • Garbage Collection Logs;
  • System logs.

GC logs in particular are invaluable for debugging memory problems.

Oracle’s Java Flight Recorder is a low-overhead option for recording diagnostic information from a running JVM.
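For example, a recording can be started at JVM startup, or on demand against a running process (the file paths are illustrative; point them at persistent storage):

```sh
# Start a 2-minute recording at startup:
-XX:StartFlightRecording=duration=120s,filename=/persist/startup.jfr

# Or start and dump a recording on a running JVM:
jcmd <pid> JFR.start name=diag
jcmd <pid> JFR.dump name=diag filename=/persist/diag.jfr
```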

yCrash is a good tool for proactively diagnosing problems before they become critical, ensuring troubleshooting artifacts are available when problems are encountered, and discovering root causes. A low-overhead agent process regularly samples aspects of JVM performance. If it detects an impending problem, full diagnostic information is transferred to the yCrash server, which runs on a different machine. yCrash then analyzes this, gives recommendations, produces interactive troubleshooting reports, and raises an alert. Ideally, the yCrash agent should run in a sidecar.

Conclusion

Using a Java memory analyzer to diagnose problems in containerized applications can be tricky.

Pointers to simplify this process include:

  • Planning beforehand how diagnostics can be gathered from the application;
  • Becoming familiar with Docker and Kubernetes debugging procedures;
  • Including good logging and monitoring tools.

It’s also important to understand JVM memory management principles to avoid under-configuring resources.

Containerized applications are notorious for having hard-to-debug memory related issues, but this need not be the case. 
