Recently, I’ve heard some discussion as to whether Java’s innovative Virtual Threads technology results in heap pollution.
Does it?
Firstly, let’s define what we mean by heap pollution, as the term is sometimes used to describe different scenarios. The most widely accepted meaning, as used in the Oracle documentation, is the situation where a variable of a parameterized type refers to an object that is not of that parameterized type. Using this definition, virtual threads definitely don’t cause heap pollution.
An earlier and broader definition of heap pollution is simply the situation where the heap is cluttered with things that shouldn’t really be there. Let’s ask the question again. Given this definition, do virtual threads cause heap pollution? Is it better to use reactive streams?
To answer this question, we need to examine Java memory models, compare how they’re used by the two different approaches, and be in a position to make informed design decisions when creating high-concurrency applications.
JVM Memory Models
Before we look at how virtual threads use memory, let’s make sure we understand how runtime memory is organized in Java applications.
We can visualize the JVM as shown in the diagram below. For this discussion, the two most relevant memory areas are the heap and the thread space, often known as the stack space.

Fig: The JVM Memory Model With Relevant Areas Highlighted
The two major divisions of JVM memory are heap memory and native memory. They are managed very differently. The JVM is fully responsible for managing the heap, whereas native memory is managed by the operating system. Let’s look briefly at what each memory pool is used for.
1. The Heap
The heap is an area shared by all running threads in the application. All objects created by the program are stored here. It’s cleaned regularly by the garbage collector (GC), which removes any item that can no longer be reached by live references. Most GC algorithms split the heap into the young generation (YG) and old generation (OG) to speed up the cleaning process. Objects are created in YG. If they survive several GC cycles, they are moved to the OG.
2. Native Memory
This has several different memory pools. The pool most relevant to this article is the thread space, which stores the stack for each live platform thread. Originally, JVM threads corresponded one-to-one to operating system threads (platform threads). With virtual threads, this is no longer the case, as several virtual threads may be mapped to a single operating system thread.
The stack space is arranged like this:

Fig: Thread Space Containing a Stack For Each Platform Thread
Each platform thread has a program counter holding the address of the next instruction to be executed. When the thread calls a method, a frame is pushed to the stack, containing the method context. This frame includes:
- A return address
- Primitive local variables
- Pointers to objects in the heap that the method references
If this method calls another method, the old method’s frame remains on the stack, and a frame for the new method is pushed onto the top of the stack. When a method completes, its frame is popped from the stack.
Each thread is allocated a fixed amount of memory for its stack. The size is configurable and, depending on the JVM version and operating system, typically defaults to 1MB.
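We can see this fixed stack limit in action with a minimal sketch of our own (the class name is illustrative): unbounded recursion keeps pushing frames onto the thread's stack until the fixed allocation is exhausted.

```java
public class StackDemo {
    static int depth = 0;

    // Each call pushes a new frame onto the current thread's stack
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The fixed stack allocation has been exhausted
            System.out.println("Stack overflowed after ~" + depth + " frames");
        }
    }
}
```

The exact frame count varies with the stack size setting (`-Xss`) and the size of each frame.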
Each platform thread may therefore occupy around 1MB of non-heap memory. In high-concurrency applications, this adds up to a huge overhead: roughly 1GB for every 1000 threads. Virtual threads, on the other hand, store their stacks in heap space that can shrink and grow as needed.
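The per-thread cost comes from the fact that every platform thread reserves its own native stack the moment it starts. A minimal sketch, assuming a JDK 21+ runtime with the `Thread.ofPlatform()` builder (the thread count is kept small here for illustration):

```java
public class PlatformThreadCost {
    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            // Each platform thread reserves its own native stack
            // (typically ~1MB) outside the heap for its whole lifetime
            threads[i] = Thread.ofPlatform().unstarted(() -> {
                try { Thread.sleep(100); } catch (InterruptedException e) {}
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("All platform threads finished");
    }
}
```

Scaling the array to 1000 threads would reserve on the order of 1GB of native memory, which is exactly the overhead discussed above.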
Other pools held in native memory include:
- The Metaspace: stores information about each class that makes up the application;
- The Code Cache: stores precompiled code for hot methods;
- Direct Buffers: used for fast I/O;
- GC: reserved for the garbage collector;
- JNI: Used for the Java Native Interface;
- Misc: Reserved for JVM internal use.
Related to this memory model, there are 9 different types of OutOfMemory error in Java. Some relate to the heap; others relate to native memory pools. When we’re optimizing memory, the heap is not the only area we need to look at. Native memory, including thread space, is equally important. The following errors described in the linked article are especially relevant to reactive vs virtual thread memory models:
- OutOfMemoryError: Java Heap Space
- OutOfMemoryError: GC Overhead Limit Exceeded
- OutOfMemoryError: Unable to create new native threads
The heap, however, is the area most likely to impact the behavior of the garbage collector (GC). Since GC events are CPU-heavy, performance can suffer badly if the GC is struggling. This can happen for many reasons, including:
- Heap memory is close to its size limit;
- The program is creating too many short-term small objects, resulting in object churn. This causes the GC to run more frequently.
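Object churn is easy to demonstrate with a small sketch of our own (not taken from the measured examples): every iteration below allocates an object that becomes garbage immediately, so the young generation fills quickly and must be collected often.

```java
public class ChurnDemo {
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            // This small array is unreachable as soon as the iteration
            // ends, so it piles up in the young generation until a GC runs
            byte[] temp = new byte[128];
            sum += temp.length;
        }
        System.out.println("Churned " + sum + " bytes of short-lived objects");
    }
}
```

Running this with `-verbose:gc` shows the resulting minor-collection activity.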
When we’re looking at whether or not a memory model is efficient, we therefore need to consider the heap size, the native memory size, and the object churn rate.
How JVM Applications Scaled Before Lightweight Threads
In older versions of the JVM, concurrency was problematic.
- The number of threads was limited by the number of platform threads available;
- High concurrency came with a considerable memory overhead due to the size of the stack space. This could result in OutOfMemory errors;
- Complex code was needed if the threads needed to be synchronized;
- Locking had to be carefully managed to avoid long waits and deadlocks.
Java’s Project Loom was developed in response to these issues. Before it arrived, other tactics evolved to work around these concurrency problems.
Thread Pools allow a fixed number of threads to be reallocated as required.
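A minimal sketch of the thread-pool approach (the pool size and task here are illustrative): a fixed number of platform threads is created once and then reused across all submitted tasks, capping the thread-creation cost.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FixedPoolExample {
    public static void main(String[] args) throws Exception {
        // A fixed pool of 4 platform threads, reused for every task
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Submit a task and wait for its result
        Future<Integer> result = pool.submit(() -> 2 + 3 + 4);
        System.out.println("Result: " + result.get());

        pool.shutdown();
    }
}
```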
Reactive programming breaks a problem down into a series of asynchronous operations. Callbacks allow the developer to specify a method to be called when an operation completes. In Java 5, the Future interface simplified this process, and in Java 8, CompletableFuture improved on this model.
This works with the common ForkJoinPool, which is created automatically by the JVM when the first task is submitted. This pool allocates worker threads as needed when tasks are submitted through the CompletableFuture methods. It works very well for non-blocking tasks, such as in-memory sorting and image resizing. It’s less efficient for tasks that are likely to block waiting for resources, such as I/O operations or calls to REST APIs. This is because even though the task is blocked, it still retains its worker thread, and therefore the stack memory allocated to that thread.
Let’s look at a small sample program that uses the CompletableFuture class. It carries out three small calculations in parallel. Once all three have completed, it adds the three results together and prints them.
It includes a wait at the beginning of each task to simulate thread blocking and allow time for a heap dump and a thread dump so we can analyze what’s happening in memory.
Note that this example captures the heap and thread dumps programmatically, rather than using standard dump techniques. This is because we need the dumps to take place at a specific point in the program’s execution, which can’t be achieved with external tools.
```java
import java.util.concurrent.CompletableFuture;

public class CFExample {
    public static void main(String[] args) {
        // a) Sum: 2 + 3 + 4
        CompletableFuture<Integer> sumFuture = CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(2000); } catch (Exception e) {} // Allow time for dumps
            return 2 + 3 + 4;
        });

        // b) Product: 2 * 3 * 4
        CompletableFuture<Integer> productFuture = CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(2000); } catch (Exception e) {} // Allow time for dumps
            return 2 * 3 * 4;
        });

        // c) Power: (2^3)^4
        CompletableFuture<Integer> powerFuture = CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(2000); } catch (Exception e) {} // Allow time for dumps
            int ans1 = (int) Math.pow(2, 3);
            return (int) Math.pow(ans1, 4);
        });

        // While these tasks are running in parallel, we capture
        // diagnostics so we can analyze JVM memory later
        try {
            // Capture heap and thread dumps
            DiagnosticDumps.captureThreadDump("CFExampleThreads1.txt");
            DiagnosticDumps.dumpHeap("CFExampleHeap1.hprof", true);
        } catch (Exception e) {
            System.out.println(e.toString());
        }

        // Combine all results once all the futures are available
        CompletableFuture<Integer> finalResult =
            sumFuture.thenCombine(productFuture, Integer::sum)  // (sum + product)
                     .thenCombine(powerFuture, (partial, power) -> partial + power);

        // Print result
        System.out.println("Final result: " + finalResult.join());
    }
}
```
The custom class below is used by this program to capture heap and thread dumps automatically. We’ve done this rather than take the dumps manually, so that we can control the timing of the dumps.
```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class DiagnosticDumps {

    // Dump the heap
    // =============
    public static void dumpHeap(String filePath, boolean live) throws IOException {
        HotSpotDiagnosticMXBean mxBean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // 'live': if true, only includes objects with active references
        mxBean.dumpHeap(filePath, live);
    }

    // Dump threads
    // ============
    public static void captureThreadDump(String filePath) throws FileNotFoundException {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(boolean lockedMonitors, boolean lockedSynchronizers)
        ThreadInfo[] threadInfos = threadMXBean.dumpAllThreads(true, true);
        try (PrintWriter writer = new PrintWriter(filePath)) {
            for (ThreadInfo info : threadInfos) {
                writer.println(info.toString());
            }
        }
    }

    // Extract Native Memory Tracking
    // Note: NMT tracking must have been enabled on the command line
    // =============================================================
    public static void extractNMT(String filePath) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
            String[] params = {"summary"};
            String result = (String) server.invoke(
                name,
                "vmNativeMemory",
                new Object[]{ params },
                new String[]{ "[Ljava.lang.String;" }
            );
            Files.writeString(Path.of(filePath), result);
        } catch (Exception e) {
            System.out.println("Error extracting NMT");
            System.out.println(e.toString());
            e.printStackTrace();
        }
    }
}
```
We can analyze the diagnostics we extracted during the application to see what’s happening inside the JVM.
First, we’ll look at the thread dump. Because all three threads block, this task requires three worker threads in the ForkJoinPool. The image below is taken from a report produced by the thread diagnostic tool fastThread, which was used to analyze the thread dump.

Fig: ForkJoinPool Worker Threads: CompletableFuture
The number of threads in the common ForkJoinPool defaults to the number of available processors minus one, but the pool can temporarily expand to compensate for blocked threads. A higher parallelism level can be configured with the command-line argument:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=N
This example was run on a machine with 2 processors, but the pool expanded to three worker threads because all the submitted tasks were blocked.
Let’s now see what’s been created in the heap. The image below was taken from a HeapHero analysis of the heap dump.

Fig: CompletableFuture Objects Created in the Heap
We see several small, short-lived objects related to CompletableFuture.
If we look further, we’ll also see that the heap contains entries required for the ForkJoinPool.

Fig: ForkJoinPool Objects in the Heap
Note that there is an object of class ForkJoinWorkerThread$InnocuousForkJoinWorkerThread for each thread in the ForkJoinPool. The overhead for this is minimal, since we only have three threads. As we’ll see later, it can become significant if there are a large number of threads.
The Evolution of Java Concurrency: Project Loom
The purpose of Project Loom is to reimagine the whole ecosystem of Java concurrency. Here are some of its objectives:
- Simplify the code;
- Improve scalability;
- Reduce overheads;
- Improve reliability.
As of Java 26, three modules have been released:
- Virtual Threads provide a lightweight option for heavily multi-threaded applications;
- Structured Concurrency greatly reduces the complexity of concurrent operations by allowing us to group tasks together into a unit of work;
- Scoped Values provide a more reliable alternative to thread-local variables.
Structured concurrency still has experimental status in Java 26, but the other two modules are now stable.
High-Scalability Java Applications with Virtual Threads
Virtual threads allow several Java threads to be mapped to a single platform thread. They too are scheduled on a ForkJoinPool (a dedicated scheduler pool, separate from the common pool), but unlike with CompletableFuture, a virtual thread that blocks is unmounted from its carrier thread, and another virtual thread can take its place.
For each virtual thread, a VirtualThread object is created in the heap. This contains a Continuation object, which in turn holds a StackChunk object. The thread’s stack is held here. Unless the thread has deeply nested methods, this is generally very small, and the JVM can resize it as needed. A portion of this stack is loaded into the carrier thread’s stack whenever the thread is actively running.
This is why many developers question whether virtual threads put too much pressure on the heap.
We can create a virtual thread in several ways. One is to use Executors.newVirtualThreadPerTaskExecutor() to create an ExecutorService; any task submitted to it runs in a virtual thread. Another is the Thread.startVirtualThread(Runnable task) method added to the Thread class.
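Here's a minimal sketch of our own combining both options:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VTCreation {
    public static void main(String[] args) throws Exception {
        // Option 1: an executor service that starts one virtual thread
        // per submitted task; try-with-resources waits for tasks to
        // finish before closing the executor
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> System.out.println("running in " + Thread.currentThread()));
        }

        // Option 2: start a virtual thread directly from the Thread class
        Thread vt = Thread.startVirtualThread(() -> System.out.println("direct virtual thread"));
        vt.join();
    }
}
```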
Let’s rewrite the sample program CFExample using virtual threads instead of CompletableFuture.
```java
import java.util.concurrent.*;

public class VTExample {
    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // a) Sum: 2 + 3 + 4
            Future<Integer> sumFuture = executor.submit(() -> {
                try { Thread.sleep(30000); } catch (Exception e) {} // Allow time for dumps
                return 2 + 3 + 4;
            });

            // b) Product: 2 * 3 * 4
            Future<Integer> productFuture = executor.submit(() -> {
                try { Thread.sleep(30000); } catch (Exception e) {} // Allow time for dumps
                return 2 * 3 * 4;
            });

            // c) Power: (2^3)^4
            Future<Integer> powerFuture = executor.submit(() -> {
                try { Thread.sleep(30000); } catch (Exception e) {} // Allow time for dumps
                int ans1 = (int) Math.pow(2, 3);
                return (int) Math.pow(ans1, 4);
            });

            // Take dumps
            try {
                DiagnosticDumps.captureThreadDump("VTExampleThreads.txt");
                DiagnosticDumps.dumpHeap("VTExampleHeap.hprof", true);
            } catch (Exception e) {
                System.out.println(e.toString());
            }

            // Wait for results
            int sum = sumFuture.get();
            int product = productFuture.get();
            int power = powerFuture.get();

            // Combine results
            int finalResult = sum + product + power;
            System.out.println("Final result: " + finalResult);
        }
    }
}
```
This carries out exactly the same tasks as the previous program, but submits them to an executor service, which creates a virtual thread for each one.
Let’s look at the thread dump to see what’s happening inside the JVM.

Fig: ForkJoinPool has a Single Worker Thread
Here, the ForkJoinPool needs only a single worker thread, because the worker can be reused as soon as a virtual thread blocks.
Let’s have a look at the virtual thread objects in the heap:

Fig: Virtual Thread Objects in the Heap
As we can see, the amount of heap space used is very small. Of course, for an application with deeply nested methods, the StackChunk objects would be larger.
Additionally, the heap contains the ForkJoinPool, just as it did for the CompletableFuture example. However, for this program, there is only one worker thread, so some heap space is saved.
Memory Efficiency in Java: Virtual Threads vs CompletableFuture
To make a meaningful comparison, we scaled up the previous example programs by creating the threads in a for loop that runs 1000 times.
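The scaled-up virtual-thread version might look like the sketch below (the class name and sleep duration are illustrative; the measured runs also captured dumps using the code shown earlier):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VTScaled {
    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {   // one virtual thread per task
                final int n = i;
                futures.add(executor.submit(() -> {
                    Thread.sleep(100);         // simulate a blocking call
                    return n;
                }));
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get();
            }
            System.out.println("Sum: " + sum); // 0 + 1 + ... + 999 = 499500
        }
    }
}
```

The CompletableFuture version was scaled up in the same way, with each task submitted via supplyAsync to the common pool.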
We used the following JVM arguments to request native memory tracking and to set parallelism to 1000.
java -XX:NativeMemoryTracking=detail -Djava.util.concurrent.ForkJoinPool.common.parallelism=1000
Additionally, we used the DiagnosticDumps.extractNMT(<filename>) method to extract native memory tracking data to a file.
This gives us a text file with a breakdown of how much JVM memory is used in total, and how it’s broken down. We can then easily compare the memory models of the two solutions to see which is more efficient.
We can read this file manually, but it’s easier to visualize if we load the files into the GCeasy tool. This displays the information graphically.
Here’s the analysis of memory used by the CompletableFuture version.

Fig: Native Memory Usage for the CompletableFuture Version
Note that “Reserved” is the amount of memory allocated to the application, whereas “Committed” is the amount of memory actually used.
Let’s compare this to the VirtualThreads version.

Fig: Native Memory Usage for the Virtual Threads Version
To summarize this, let’s look at the total memory, heap size, and thread space size for the two memory models.
Memory Usage Comparison

| | CompletableFuture | Virtual Threads | Difference |
|---|---|---|---|
| Total Memory (MB) | 186.72 | 172.69 | 14.03 |
| Heap (MB) | 66.00 | 63.00 | 3.00 |
| Thread Space (MB) | 17.74 | 0.41 | 17.33 |
Interestingly, the heap size is actually slightly lower in the Virtual Thread example. Most of the heap space difference is taken up by objects of class ForkJoinWorkerThread$InnocuousForkJoinWorkerThread. The ForkJoinPool holds a separate instance of this class for each worker thread. The CompletableFuture option has the full 1000 threads, whereas the Virtual Thread option is able to reduce this number by sharing platform threads. This is why the heap is smaller when using Virtual Threads.
A few points to note:
- These examples use blocking threads. The difference in performance is likely to be less marked if CompletableFuture is running a task that doesn’t block, for example, in-memory sorting.
- The examples are very simple, without nested methods in the threads. The difference would be more marked with heavily-nested methods.
- This difference would be highly significant in very large servers with hundreds of thousands of threads.
We can conclude from this that Virtual Threads do not cause heap pollution in any sense of the term. They are, in fact, likely to result in small savings in the heap.
CompletableFuture is still a good solution for non-blocking tasks. However, it’s worth looking into the new Structured Concurrency model, which is simpler to write and easier to understand and maintain.
In short: platform threads in high-concurrency applications are susceptible to OutOfMemoryError: Unable to create new native threads, either by exceeding OS thread limits or by exhausting the available stack space. Virtual threads may add some overhead to the heap by moving the stack into heap space; the concern is that this could instead cause OutOfMemoryError: Java Heap Space. This is certainly something we should monitor and configure for, but since virtual threads store their stacks very efficiently, it’s unlikely to cause a problem.
Conclusion
Each of these techniques has use cases where it shows to advantage. For blocking threads, Virtual Threads result in considerable memory savings, performance gains, and scalability. Reactive programming can be a good solution for non-blocking tasks, since the ForkJoinPool manages these efficiently.
Comparing the memory models of the two solutions shows that, far from creating heap pollution, Virtual Threads result in memory saving both in the heap and the stack space.