Debugging OutOfMemoryError in a Microservices Architecture: Unique Challenges and Container-Specific Solutions

With the modern trend towards cloud computing, microservices running in easy-to-deploy containers are becoming more and more widespread. Services can be packaged and deployed painlessly using tools such as Docker, and orchestrated with platforms such as Kubernetes.

Containers are great – until they go wrong. In particular, OutOfMemoryErrors in a microservices architecture are common, and can be challenging to troubleshoot.

This article looks at the special difficulties of troubleshooting when a Java application running in a container has memory issues, and suggests some solutions to make debugging simpler.

Challenges When Encountering an OutOfMemoryError in a Microservices Architecture

OutOfMemoryErrors in Java are not usually difficult to diagnose. This article, Common Memory Errors in Java and How to Fix Them, is a good guide to troubleshooting most OutOfMemoryErrors. I’d also recommend reading Types of OutOfMemoryErrors for a good understanding of Java memory issues.

However, when working with containers, there are a few special challenges we’re likely to encounter when troubleshooting.

  • The container often automatically restarts itself when an error occurs. Since not all storage in the container is persistent across reboots, this may result in the loss of critical information that could assist in troubleshooting.
  • Containers are usually designed to use the least possible resources, so it’s unlikely that troubleshooting tools are installed within the container.
  • Earlier versions of Java weren’t designed to work with containers. If no sizes are configured for the various memory areas, the JVM defaults to a percentage of the available memory. In a container, however, the available memory does not depend on the amount of physical memory in the machine, but on the limits set for the cgroup in which the container runs. Prior to Java 10, the JVM didn’t take this into account, so the container would frequently run out of memory unless the JVM was carefully configured. (A quick way to check what the JVM actually sees is shown after this list.)
  • If the container runs out of memory, it’s likely to silently kill the application. The application log won’t show an error, although the kernel log will record the event.
  • Microservices are often business-critical, so extended downtime while we debug is rarely acceptable.
  • Issues may not affect every instance of the container, making them harder to debug. This is because instances may be running in different environments, and may also be contending for resources with other containers.
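
As a quick illustration of the cgroup point above, running the JVM with -XX:+PrintFlagsFinal inside a memory-limited container shows the heap size it has actually derived. The image name and memory limit here are just examples.

    # Start a 512 MB-limited container and print the derived max heap size.
    # eclipse-temurin:17 is used as an example image; any JDK image works.
    docker run --rm -m 512m eclipse-temurin:17 sh -c \
      "java -XX:+PrintFlagsFinal -version | grep -i maxheapsize"

On a container-aware JVM, the reported MaxHeapSize will be a fraction of the 512 MB limit, rather than of the host’s physical memory.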

Suggested Solutions

Let’s look at some solutions to these issues.

1. Ensure Log Files and Diagnostic Information Are Placed on Persistent Storage

Disk storage within a container is not, by default, retained between restarts. We need to ensure that troubleshooting artifacts such as GC logs, application logs and kernel logs are directed to persistent storage so they will still be available after the system has crashed. If you’re not familiar with creating persistent storage, the Docker and Kubernetes documentation on volumes is a good place to start.
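
As a minimal sketch, assuming Docker and Java 9+ unified logging (the image name and paths are illustrative): mount a host directory as a volume and point the GC log at it, so the log survives a container restart.

    # Mount a host directory into the container and write GC logs there.
    docker run -v /var/log/myapp:/logs my-java-app \
      java "-Xlog:gc*:file=/logs/gc.log" -jar /app/app.jar

The same approach applies to application logs and heap dumps; in Kubernetes, a volume backed by a persistentVolumeClaim serves the same purpose.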

2. Choose the Right JVM Version and Configure it Correctly

JVM versions prior to Java 10 may not work well with containers, so it makes sense to use a Java version that was designed with containers in mind. A recent release of Java is highly recommended for microservices.

Java 10 introduced full container awareness via the -XX:+UseContainerSupport option, which is enabled by default from that release onwards (and was backported to Java 8u191). It isn’t available in earlier versions.

Configure a sensible size for the heap rather than relying on system defaults. For hints on heap sizing, read this article: Sizing Your Heap Correctly. In containers, however, it’s better to specify the heap size as a percentage of available memory rather than as a fixed value, so the same image remains flexible across different environments. The JVM switch for this is -XX:MaxRAMPercentage=<percentage>.
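
A minimal sketch of a percentage-based heap configuration (the values are illustrative, not recommendations):

    # Let the heap grow to at most 75% of the container's memory limit.
    java -XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0 -jar app.jar

Leaving headroom below 100% is important: the metaspace, thread stacks and direct buffers all live outside the heap but still count towards the container’s limit.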

For applications that make heavy use of the metaspace or direct buffers, we should also configure maximum sizes for these areas, or they may grow to the point where the container runs short of memory and kills the application.
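
For example, assuming the same illustrative values as above, we might cap these areas explicitly:

    # Cap the non-heap memory areas so they can't grow unbounded.
    java -XX:MaxRAMPercentage=75.0 \
         -XX:MaxMetaspaceSize=256m \
         -XX:MaxDirectMemorySize=128m \
         -jar app.jar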

3. Use REST APIs for Diagnostics

These can be extremely helpful for sending diagnostic data to monitoring services without placing an undue load on the container. Tools such as HeapHero, fastThread and GCeasy all provide REST APIs for incorporating diagnostics into applications. This means that performance can be monitored proactively, and diagnostic data will be available after a crash.
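
As a hedged sketch of what this looks like in practice, here is a GC log upload to GCeasy’s REST API with curl. The endpoint shown is as documented at the time of writing, so check the tool’s API documentation before relying on it; the API key and log path are placeholders.

    # POST a GC log to GCeasy for automated analysis.
    curl -X POST "https://api.gceasy.io/analyzeGC?apiKey=YOUR_API_KEY" \
         --data-binary @/logs/gc.log

The response is a machine-readable report that can be stored or fed into alerting, so the analysis survives even if the container itself is gone.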

4. Configure an OnOutOfMemoryError Action in the JVM

The JVM argument -XX:OnOutOfMemoryError=<command> allows us to specify an action to take if the JVM runs out of memory. This could be an operating system command, such as a request to take a heap dump, or a script that initiates several actions. Remember to direct any heap dump to persistent storage.
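
A minimal sketch combining this with a heap dump on OOM (the paths and script name are hypothetical):

    # Dump the heap to persistent storage and run a diagnostic script on OOM.
    # %p is expanded by the JVM to the process id.
    java -XX:+HeapDumpOnOutOfMemoryError \
         -XX:HeapDumpPath=/logs/dumps \
         -XX:OnOutOfMemoryError="/opt/scripts/collect-diagnostics.sh %p" \
         -jar app.jar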

Using yCrash to Simplify Diagnostic Processes

Let’s now look at a troubleshooting tool that works really well with microservices in containers.

The basis for fast resolution of production issues is comprehensive diagnostic information. We need to know as much as we can, not only about the application’s performance, but also about its operating environment.

This is where yCrash comes in. It gathers 360° data, covering everything we need to know in order to troubleshoot both the application and its container. The image below shows the artifacts that make up these comprehensive diagnostics.

Fig: Data Gathered by yCrash

Among other things, the data includes:

  • GC monitoring
  • Thread analysis
  • Heap and memory usage
  • Application logs
  • Processes
  • Networking statistics
  • Resources
  • Disk usage
  • Kernel log

yCrash can run on almost any platform, including popular containers.

The software can be run in two ways.

1. The yCrash Script

This is a free, open-source script available from yCrash’s GitHub. When triggered, it collects 360° data into a zip file, which can either be uploaded to the yCrash server or used as input to other troubleshooting tools. It can be configured to run automatically whenever an OutOfMemoryError is encountered, using the JVM command-line argument -XX:OnOutOfMemoryError described above. It can also be invoked manually whenever the system is experiencing performance problems.
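
A hedged sketch of that configuration, assuming the script has been installed at a path of our choosing (the path and script name below are hypothetical; see yCrash’s GitHub for the actual script):

    # Run the yCrash data-capture script automatically on OutOfMemoryError.
    java -XX:OnOutOfMemoryError="/opt/ycrash/yc-data-script.sh" -jar app.jar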

2. The yCrash Server and Agent

This is a powerful, lightweight tool for constant system monitoring and performance analysis. The yCrash agent gathers essential diagnostics and transmits them to the yCrash server, which can either be running in the cloud or on a local machine. If the server detects performance issues or impending crashes, it gathers 360° data and raises an alert.

The diagram below illustrates the server/agent architecture.

Fig: yCrash Server/Agent Architecture

The yCrash agent has very little overhead. It gathers micrometrics that can be used to forecast outages before they happen, allowing us to take action to prevent production problems. Most diagnostic tools focus on macro metrics such as CPU usage, memory usage and response times; unfortunately, by the time these are affected by an issue, the application is already in trouble and user experience is already suffering.

yCrash concentrates on micrometrics such as GC throughput, GC pause time, thread patterns and states, and thread-level CPU time. It uses machine learning to recognize patterns, and proactively raises the alarm before performance is degraded or the system crashes. This is ideal for microservices running in containers, because it can make diagnostic information available so we can solve problems while the system is still up and running.

The agent can run in the container, as a sidecar, or on the host machine; for Docker, see the yCrash documentation. For Kubernetes, the yCrash script can be configured as a preStop hook to gather diagnostics before the pod is recycled, as in the sketch below.
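
A minimal sketch of such a hook, assuming the script is available inside the container image (the names and paths are illustrative):

    # Pod spec fragment: run the yCrash script before the container stops.
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-service
    spec:
      containers:
      - name: app
        image: my-java-app:latest
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "/opt/ycrash/yc-data-script.sh"]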

Conclusion

An OutOfMemoryError in a microservices architecture need not be a show-stopper. With sensible configuration, and by ensuring diagnostic information is available in the event of a crash, we can quickly get the system back up and running.

With tools such as yCrash, we can also monitor the system to prevent outages and performance issues.
