Optimizing Heap for Java on Serverless (SnapStart/GraalVM)

Serverless computing can be a huge cost-saver for some applications – but only if it’s optimized correctly.

Huge strides have been made recently in solving various teething problems that sometimes made this technology unreliable, and it’s likely to become more and more popular as time goes on.

With serverless applications, costs are directly tied to usage, so it’s highly important to use as few resources as possible. With features such as AWS Lambda SnapStart, performance depends heavily on efficient heap management, making Java serverless memory optimization a critical factor. This article looks at improving memory use when writing serverless applications; for foundational techniques, see our guide on Best Practices for Writing Memory-Efficient Java Code.

What Are Serverless Applications?

Serverless applications are an alternative to renting and managing a virtual machine in the cloud. Instead, the host offers the ability to run a task within their own managed environment. The client pays only for the resources used, rather than hiring a fixed set of resources. 

Commonly, the host offers Function as a Service (FaaS). Tasks run in response to a trigger and should be short in duration: there is usually a limit of 10 to 15 minutes. Examples of popular services include AWS Lambda, Azure Functions and Google Cloud Functions. Triggers could include:

  • A file or image being uploaded to a server;
  • An HTTP request, possibly initiated by a click on a web page;
  • A database update;
  • A CRON job.

Advantages and Disadvantages of Serverless Computing

The main advantages and drawbacks of serverless computing are summarized below.

Advantages:

  • Lower costs if optimized correctly: you pay for what you use;
  • No need to use in-house skills to manage infrastructure;
  • Fast and simple deployment;
  • Simple, or even automatic, scaling.

Disadvantages:

  • Cold start latency (see below);
  • Vendor lock-in: the customer is heavily reliant on the diagnostic tools and infrastructure provided by the vendor;
  • Execution time limits;
  • No access to standard troubleshooting tools within the run environment;
  • State is not reliably carried over between invocations;
  • Large packages may not work, or may raise costs unacceptably.

Looking at the disadvantages, cold start latency is one of the major problems with serverless applications.

It can take 3 to 10 seconds to spin up a Java task if it’s not currently active. For frequently-used tasks, this isn’t a problem, since tasks that have been used recently don’t need to spin up. For intermittent tasks, cold start latency can add an unacceptable overhead.

What Applications Work Well With Serverless Computing?

Not every application works well with FaaS. 

Here are some examples of applications that work well with this technology:

  • Processing a file uploaded to a server.
  • Resizing an image on request.
  • Lightweight backend processes, for example, dealing with form submissions.
  • Short scheduled tasks, such as periodic cleanups.

Applications that shouldn’t be used with FaaS include:

  • Long-running batch jobs such as data warehousing.
  • Stateful applications, where state must be maintained between invocations. Chat rooms, for example, won’t work well.
  • Applications needing predictable high performance, for example forex trading platforms.
  • Large, complex applications with many dependencies.

In summary, serverless computing is suitable for event-driven, short-lived applications with variable loads. It doesn’t work well for applications that are long-running, need persistent state, or rely critically on consistent low latency.

Cold Start Latency Solutions: Java SnapStart and GraalVM

Several solutions have evolved to reduce the problem of cold start latency. SnapStart and GraalVM are two excellent examples. Let’s look at how they work.

SnapStart is a feature of AWS Lambda, specifically designed for applications that are prone to cold starts. The first time the function is called, the system carries out the initial startup phase, which includes loading classes and running static initializers (blocks of code defined outside methods and prefixed with the keyword static). It then saves a snapshot of this state, and, on a subsequent cold start, loads the function from the snapshot. This eliminates the need to load the JVM and carry out static initialization on each cold start, resulting in a huge improvement in latency.

When coding for Java SnapStart, always be aware that the function could be:

  • An initial cold start call, which happens on the first call after the system has been booted up, or when additional instances are created to deal with increasing loads;
  • A cold start loaded from a snapshot;
  • A warm start, in which case variables may still be set from the previous call to the function.

Therefore, you should always leave the function in a consistent state ready for the next call.
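This discipline can be sketched in plain Java (the class, fields and method signature here are hypothetical, and the real AWS Lambda handler interface is omitted for brevity): static state is initialized once and captured in the snapshot, while mutable per-invocation state is reset at the start of every call rather than assumed clean.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderHandler {
    // Runs once during the initial startup phase; captured in the
    // SnapStart snapshot and reused on snapshot-based cold starts.
    private static final List<String> LOOKUP = loadLookupTable();

    // Mutable per-invocation state: on a warm start it may still hold
    // values from the previous call, so it must always be reset.
    private static List<String> results = new ArrayList<>();

    static List<String> loadLookupTable() {
        List<String> table = new ArrayList<>();
        table.add("default");
        return table;
    }

    public static List<String> handleRequest(String input) {
        results = new ArrayList<>();   // reset: never assume a clean state
        results.add(input + ":" + LOOKUP.get(0));
        return results;
    }
}
```

Whichever of the three start types occurs, the function behaves the same way, because it never relies on leftover state.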

GraalVM takes a different approach. With GraalVM Native Image, you can compile the function ahead of time to machine code, as opposed to bytecode. This greatly reduces memory footprint and execution time, and can bring cold start latency down to a few milliseconds. The main downside is that some Java libraries, particularly those relying heavily on reflection, may not work with it out of the box. Again, be aware that when the function is called, it could be a cold start, in its initial state, or a warm start, with the possibility of variables retaining values from the previous call.
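As a sketch of what such a build looks like (the jar and image names are hypothetical, and the exact flags depend on your project and GraalVM version):

```shell
# Compile the function jar ahead of time to a native binary.
# --no-fallback fails the build rather than silently falling back
# to a JVM-based fallback image.
native-image --no-fallback -jar my-function.jar my-function
```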

How Serverless Charges are Calculated

When you configure a serverless function, you specify the memory requirements only. The host will then allocate CPU time in proportion to the memory requested. 

Charges are calculated according to:

  • Number of requests for the service;
  • Execution time;
  • Memory allocation.

In some cases, you may also be charged for additional resources, such as network, storage and API calls.

We can express the charges approximately as:

Total cost ≈ (Requests × per-request fee) + (Execution time × Memory × per-GB-second rate) + Extra services

It’s best to benchmark the function over a fixed number of requests, trying out different memory settings to find the optimum. Increasing the memory may sometimes reduce the costs by reducing the execution time, but this is very difficult to estimate in advance.
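As an illustration of the formula (the rates below are placeholder values, not any provider’s actual pricing), a small estimator also shows why more memory isn’t automatically more expensive: doubling memory while halving execution time leaves the compute term unchanged.

```java
public class CostEstimate {
    // Illustrative placeholder rates, not real provider pricing.
    static final double PER_REQUEST = 0.0000002;      // $ per request
    static final double PER_GB_SECOND = 0.0000166667; // $ per GB-second

    // Total cost ≈ request charges + (execution time × memory) charges.
    static double monthlyCost(long requests, double avgSeconds, double memoryGb) {
        return requests * PER_REQUEST
             + requests * avgSeconds * memoryGb * PER_GB_SECOND;
    }
}
```

For example, one million invocations at 2 s on 0.5 GB cost the same as one million invocations at 1 s on 1 GB, because the GB-second product is identical.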

Optimizing the Heap for Serverless Performance

As we’ve seen, the memory size has a direct impact on cost. This is where Java serverless memory optimization plays a key role, as efficient heap usage directly influences both execution time and cost per invocation. Note that the configured memory doesn’t only cover the heap: the stack space, metaspace and other native areas also need memory. As a rule of thumb, allow for the heap making up roughly 70% of your total memory requirements.

Will reducing the heap size automatically reduce costs? Usually, but not necessarily. CPU-intensive applications, such as resizing very large images, could actually cost more by reducing the memory size. This is because CPU time allocated by the host is directly proportional to the memory size configured, so the function may take longer to complete, and therefore cost more. 

All the same, saving heap space is a high priority. Wasted memory certainly won’t help your budget. Not only will you pay more for each invocation of the function, but if the garbage collector (GC) is struggling to free enough memory, it becomes CPU-intensive, and your function will take longer to complete. We also need to make sure the application never crashes with an OutOfMemoryError, as this could result in requests that are never fully actioned, making the system highly unstable.

It’s important not to pack too much into the same function. A function should perform a single task, thereby keeping its memory footprint small. Different tasks should have their own functions, so that no unnecessary coding or memory space is included.

There are three areas we need to look at:

  • Eliminating memory wastage;
  • Ensuring the GC is not struggling to free memory;
  • Eliminating memory leaks.

1. Memory Wastage

Most Java applications, if analyzed for wasted memory, produce surprising results. For example, the Spring Boot sample Pet Clinic application was analyzed, and 65% of its heap usage was found to be pure waste.

Here are a few areas where careless coding often wastes huge amounts of memory:

  • Loading large amounts of data into memory at one time, instead of using streams or database cursors to page through the data as needed. Some applications also waste memory by reading individual rows from a database, instead of summarizing the information in the query.
  • Duplicate Strings. See this article: String Deduplication in Java.
  • Collections with no initial size specified. If left to grow according to defaults, the result is frequent resizing, which is not only CPU-heavy, but can cause the collection to grow exponentially. Pick a sensible initial size for the collection, and specify this in the constructor.
  • Creating separate caches for each thread.
  • Unnecessary object headers. Wherever possible, use Java primitives such as byte, which occupies a single byte of memory, rather than the wrapper class Byte, whose instances take around 16 bytes once the object header and padding are included. For the same reason, prefer a single object holding a group of data to a separate object for each part.
  • Using complex data structures where simpler, less memory-heavy, structures are adequate.

In addition to this, when coding for serverless functions, use lightweight frameworks such as Micronaut or Quarkus, instead of heavier ones like Spring Boot.

For more information, see Coding for Memory Efficiency.

2. Ensure the GC is Working Efficiently

The GC is very CPU-hungry, and we need to make sure it runs as seldom as possible. To ensure this is the case:

  • Reduce object churn. This occurs when many objects are created and released very quickly. The classic example is string concatenation within a loop, e.g. namelist = namelist + currentName;. Since strings are immutable, a new object is created every time this instruction is executed. Instead, use the StringBuilder class, which can append a string without creating a new object. Whenever possible, reuse objects instead of creating new ones. Also beware of using unsized collections, as discussed in Point 1 above, within loops: this can cause excessive resizing, with the underlying array frequently recreated and discarded.
  • Make sure the heap is sized correctly. If heap usage is too close to the size limit, the GC begins to churn.
  • Select the best GC algorithm for the task. For small memory sizes on serverless, choose the serial GC, since only one CPU will be allocated, and serial is the most efficient in these circumstances. For larger memory sizes, choose G1GC, which is the best all-rounder for this type of application.
  • When configuring the heap size, set -Xms (initial heap size) to the same value as -Xmx (maximum heap size). This stops the GC from attempting to keep the heap at the smaller limit, and also saves resources used by heap resizing.
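The string-concatenation point above can be sketched as follows (method and variable names are illustrative): both methods return the same result, but the first allocates a new String on every iteration, feeding the GC, while the second appends into a single presized buffer.

```java
public class ChurnDemo {
    // Churn-heavy: each += allocates a new String and discards the old one.
    static String concat(String[] names) {
        String out = "";
        for (String n : names) out = out + n + ",";
        return out;
    }

    // Churn-free: one StringBuilder, appended in place.
    static String build(String[] names) {
        StringBuilder sb = new StringBuilder(names.length * 8); // presized
        for (String n : names) sb.append(n).append(',');
        return sb.toString();
    }
}
```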

3. Eliminate Memory Leaks

Memory leaks occur when objects that are no longer needed build up in memory. For more information about memory leaks, I recommend this article: Memory Leaks in Java. Causes of memory leaks include:

  • Unbounded caches, in other words, caches that have no growth limits;
  • Caches with no eviction policy;
  • Objects declared in the wrong scope, so they are retained longer than needed;
  • Failing to release objects or close resources when they’re no longer needed;
  • Relying on finalizers to close resources. The finalize() method has been deprecated since Java 9, since there is no guarantee when or if it will be run. This is especially true in serverless applications, where the function may be re-used many times.

When coding for serverless, always make sure any memory or other resource is released when the function completes, otherwise if the function is warm-started, previous objects will still exist.
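The two cache-related causes above can be addressed with a size-bounded, LRU-evicting cache. A minimal sketch using LinkedHashMap’s removeEldestEntry hook (the capacity is an arbitrary example, and a production function might prefer a library cache such as Caffeine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access-order gives LRU behaviour
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the limit is exceeded,
        // so the cache can never grow without bound across warm starts.
        return size() > maxEntries;
    }
}
```

Because the cache bounds itself, it is safe to keep it in static scope across warm starts without it turning into a leak.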

Configuring and Monitoring Serverless Applications

Serverless tasks are notoriously awkward to troubleshoot. As well as configuring for top performance, it’s important to configure settings that will make debugging easier if problems arise.

Here are some points to consider:

  • If you’re compiling to a fully native binary with GraalVM, you should look into enabling monitoring, for example by passing --enable-monitoring=heapdump at build time; otherwise it will be impossible to take an easily-analyzed heap dump if the task crashes.
  • Check your cloud provider’s documentation to find out how to set JVM arguments. AWS, for example, expects JVM arguments to be set in the environment variable JAVA_TOOL_OPTIONS. Arguments should include:
    • -Xms and -Xmx to configure the initial and maximum heap size. These should be set to the same value.
    • -Xlog:gc*:stdout:time,level,tags to enable GC logging. This is an invaluable artifact for analyzing memory usage, GC performance and key performance indicators over time. Logging to stdout writes the logs to the cloud log. Including the tags option makes it simple to filter out GC logs from other messages. The cloud provider may have a GC log analysis tool, or the filtered log can be submitted to a tool such as GCeasy.
    • Choose either serial GC for small memories, or G1GC for bigger ones. The argument will be either -XX:+UseSerialGC or -XX:+UseG1GC.
    • Enable heap dumps on OutOfMemoryError, and set the path for the dump: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/<task name>.hprof. The dump can only be written to the /tmp directory, which is lost when the task completes. To retrieve the dump, you will need to write a script that copies it to permanent storage, also activated on OutOfMemoryError: -XX:OnOutOfMemoryError="sh <script name>.sh". The dump can then be analyzed by a tool such as HeapHero.
  • When configuring the task on your service provider, set enough memory for the heap + native memory. (Remember, the heap usually occupies about 70% of the total memory). It’s also a good idea to set an alert if memory usage exceeds a given percentage of the configured heap: perhaps 80%.
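Putting these points together, a hypothetical configuration for a function with 512 MB configured (so roughly 384 MB of heap under the 70% rule of thumb) might look like this; the values and the dump file name are illustrative, not recommendations:

```shell
# Set as an environment variable on the function (console or IaC template).
# All values below are illustrative.
JAVA_TOOL_OPTIONS="-Xms384m -Xmx384m \
-XX:+UseSerialGC \
-Xlog:gc*:stdout:time,level,tags \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/mytask.hprof"
```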

It’s important to monitor the application regularly to proactively deal with problems before they cause performance issues or crashes. Monitoring is usually done using the tools provided by the cloud host. Monitor:

  • Peak memory usage;
  • Cold start frequency;
  • Execution duration;
  • Cost per invocation.

At the beginning, you may need to monitor very closely and adjust settings to achieve good performance. Since it’s very difficult to calculate the optimum beforehand, test different memory size settings to see which gives the best combination of execution speed and cost per invocation.

Conclusion

Serverless computing can be a real cost saver for the right applications.

Since memory size directly impacts costs, it’s important to implement good memory management techniques. Eliminate wastage, ensure there are no memory leaks, and configure adequate heap space for the task. An under-configured heap makes the GC work harder and increases the execution duration, affecting costs. Additionally, understanding how things work and when they are executed helps you get the best GraalVM and Java SnapStart performance through effective Java serverless memory optimization.
