Some time ago I published a new chapter for my book on performance testing in a blog post. You can find the first edition here at Amazon. This is a copy of that blog post. One day I may actually publish a next edition of my book…
If you know more of this subject than I do, let me know if I made mistakes. If you don’t, let me know if this helps you understand the way Java handles memory.
Some of Java’s features make performance testing and especially performance monitoring and analysis different. If you only look at the usual monitoring items you can easily be deceived. This is mainly caused by the way Java handles memory. Java has a very interesting model for handling memory which makes programming in Java easier and helps against memory leaks. It does however come with some caveats which you need to be aware of.
Java’s way of handling memory
When you program software in a programming language you end up dealing with memory. You will put items or objects in memory. An object which has been placed in memory at one point has to leave that memory. Essentially when no piece of running code is referencing that memory it has to go.
In languages such as C and C++, and most of their brethren, that means that the programmer needs to ensure that the memory is released in the program. If no mistakes are made this works well and efficiently, since an object is immediately released when it is no longer referenced. It does however make programming more complex, and all too often mistakes are made, resulting in the dreaded memory leak.
Java has a different way of handling memory. As a programmer you don’t have to release the memory anymore. The Java virtual machine which runs your code will from time to time, usually when a threshold is reached, scan the memory and remove objects that are no longer referenced. This is called garbage collection. The big advantages: as a programmer you are relieved of taking care of this and we have fewer memory leaks.
There are also downsides. It is a little less efficient since it holds objects in memory longer than required, but more importantly: the garbage collection itself takes time and CPU cycles. This is aggravated by the fact that some parts of the garbage collection are what’s called stop the world (STW) events. This means that while such a part of the garbage collection is running the software will freeze and not respond to anything.
Java divides its memory in a section called the heap and a part outside of the heap usually called perm space. Most tuning usually takes place in the heap section. This is the largest section.
The perm area is filled with metadata, class information and methods. If it is not large enough to contain all the data needed it may lead to nasty errors, the dreaded java.lang.OutOfMemoryError: PermGen space, but it is less prone to issues coming from a high load. As long as it is large enough, you will usually not have to deal with it.
(Note that java.lang.OutOfMemoryError can be raised for other memory issues too. If you see that error, like java.lang.OutOfMemoryError: Java heap space, always look at what it says after the colon.)
The heap is divided in what’s called young and old memory. When an object is first created it is placed in the Eden memory. As soon as the Eden memory is full a garbage collection is started, a garbage collection which only looks at the young memory. It will look at all the objects in Eden which are no longer referenced and will delete them. Any object that is still referenced will be placed in one of the survivor spaces (S0 or S1). It will do the same for the survivor space.
The survivor space is a bit peculiar. At each young collection it will switch. So if it was using S0 before a collection, afterwards it will use S1. If it finds objects in S0 that are still referenced it will either promote them to the old memory or copy them to S1. At the end of the young collection S0 will be empty. At the next young collection this will be the other way around.
The decision to copy an object to the other survivor space or promote it to old depends mainly on how often it has already survived. This is a setting you can configure. If you want objects to be promoted after surviving just a few collections, you can set the tenuring threshold to a low value.
The idea behind this is that some objects will end up being referenced for a long time whilst others are very volatile and will disappear at one of the young collections. The young collector doesn’t have to scan the objects that apparently are to be long lived.
Also the young collector is deliberately fast but not thorough. It may consider objects still referenced that are not. It is a bit sloppy. If the young collector were made more thorough it would take much longer to run. And contrary to what many believe, the young collection is a stop the world collection. So it needs to be so fast that you can’t really notice.
As a result we get a slowly growing old generation. This is also cleaned by a garbage collection after the used memory reaches a certain threshold. Such full collections have much more impact than a young collection. They take place a lot less frequently, but the time that they freeze the application is longer. Even for a well tuned system it may take seconds.
The collection algorithm however is thorough and will find all objects that can be deleted.
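The areas described above can be made visible from inside a running JVM. The following is a minimal sketch that uses the standard java.lang.management API to list the memory pools. The class name MemoryPoolLister is my own, and the pool names you get back depend on the collector and JVM version in use (for example an Eden, a Survivor and an Old Gen pool, plus Perm Gen or Metaspace outside the heap), so treat the output as illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the memory pools of the running JVM (hypothetical helper class). */
class MemoryPoolLister {

    /** Returns the names of all pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            System.out.printf("%-16s %-30s used=%d KB max=%d%n",
                    pool.getType(), pool.getName(),
                    usage.getUsed() / 1024, usage.getMax());
        }
    }
}
```

Running it with different collector options is a quick way to see how the young and old areas are named and sized on your own system.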
The life of an object
To understand this a bit better we will look at an example of 3 objects that get created at the same time by a Java process which is acting as a web server.
The first is not related to any request by clients directly but for instance will keep track of a backend system. The other two are related to requests by a client.
Just after these objects are created a young collection takes place. As the server is still busy handling the requests of the clients all three objects are still referenced and survive. They end up in S1. 10 seconds later another young garbage collection takes place and after EDEN is cleared, the collector also checks S1. The three objects are evaluated. The object used for tracking the back end is still referenced. It is copied to S0. One of the other two objects is no longer referenced and is deleted. The other object which was created for a client request is still referenced due to a backend system which for some reason is stuck and still hasn’t answered. So it too is placed in S0.
At the next young collection both objects get evaluated again when S0 is cleaned. If they are still referenced, there is enough space in S1 and the MaxTenuringThreshold is greater than 1, they will be copied to S1.
This will go on until they either reach the tenuring threshold or the survivor space to which they would otherwise be copied runs out of space. Then they get promoted (copied) to the old generation.
Now in our example we assume the backend really was unresponsive and both objects end up being promoted to the old generation. Both will stay there until a Global Garbage collection takes place. Let’s assume that is 10 minutes after both were promoted. At that point the object that was used for the client request would be no longer referenced. Either the client timed out or the process towards the backend itself. So that object would be cleaned from the old generation. The other object however is still referenced so it survives even the global collection and stays in old memory.
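The life cycle above can be replayed with a toy model. This is only a sketch to make the reasoning concrete: it is not how HotSpot is implemented, the class TenuringModel and its fields are invented for illustration, survivor capacity is counted in objects rather than bytes, and an object is promoted as soon as its age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of object aging in the young generation (illustration only). */
class TenuringModel {

    /** A tracked object: whether it is still referenced and how many collections it survived. */
    static final class Obj {
        final String name;
        boolean referenced = true;
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    final int maxTenuringThreshold; // hypothetical stand-in for the JVM setting
    final int survivorCapacity;     // objects one survivor space can hold in this model
    List<Obj> young = new ArrayList<Obj>();      // Eden plus the survivor space in use
    final List<Obj> old = new ArrayList<Obj>();  // the old generation

    TenuringModel(int maxTenuringThreshold, int survivorCapacity) {
        this.maxTenuringThreshold = maxTenuringThreshold;
        this.survivorCapacity = survivorCapacity;
    }

    void allocate(Obj o) { young.add(o); }

    /** One young collection: drop unreferenced objects, age the rest, then copy or promote. */
    void youngCollection() {
        List<Obj> targetSurvivor = new ArrayList<Obj>();
        for (Obj o : young) {
            if (!o.referenced) {
                continue; // garbage is simply not copied
            }
            o.age++;
            boolean oldEnough = o.age >= maxTenuringThreshold;
            boolean noRoom = targetSurvivor.size() >= survivorCapacity;
            if (oldEnough || noRoom) {
                old.add(o);            // promoted to old memory
            } else {
                targetSurvivor.add(o); // copied to the other survivor space
            }
        }
        young = targetSurvivor; // the space that was just evacuated is now empty
    }

    /** A global collection removes unreferenced objects from both young and old memory. */
    void fullCollection() {
        young.removeIf(o -> !o.referenced);
        old.removeIf(o -> !o.referenced);
    }

    public static void main(String[] args) {
        TenuringModel heap = new TenuringModel(3, 10);
        Obj tracker = new Obj("backend tracker");
        Obj requestA = new Obj("client request A");
        Obj requestB = new Obj("client request B");
        heap.allocate(tracker);
        heap.allocate(requestA);
        heap.allocate(requestB);

        heap.youngCollection();        // all three survive
        requestA.referenced = false;   // request A has been answered
        heap.youngCollection();        // A is deleted, the other two are copied
        heap.youngCollection();        // threshold reached: both are promoted
        requestB.referenced = false;   // the client timed out
        heap.youngCollection();        // a young collection does not touch old memory
        System.out.println("old before full collection: " + heap.old.size());
        heap.fullCollection();
        System.out.println("old after full collection:  " + heap.old.size());
    }
}
```

In this run the timed-out request stays in old memory until the full collection, which is exactly why objects that are promoted too early are expensive.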
Different garbage collectors
So there is a big difference between the young and old garbage collection. The young collection is loose, but generally very fast and only looks at EDEN and both survivor spaces. Global Garbage Collectors are thorough and will clean both old and young memory.
Global Garbage Collectors are written in plural since there are various options. The right one for you depends largely on whether your application should be optimized for throughput or responsiveness.
Remember how garbage collectors have activities which are called stop the world (STW). During an STW event, the application does nothing other than garbage collection and doesn’t respond to anything. This is often called pause time. And the pause times can be measured.
If your application is, for instance, a batch process, you care most about how long it takes for the batch process to finish. If during that process it is unresponsive for a while due to a global garbage collection with a long STW event, you don’t care as long as the net result is that the total processing time is less.
If your application however is a web server you care more about it being responsive. You’d rather have smaller pauses, even if when you add them all up the total pause time may be a bit larger.
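The trade-off between throughput and responsiveness is easy to put into numbers. Below is a small sketch with made-up figures: the class PauseSummary and both pause profiles are hypothetical, chosen only to show that a run with a few long pauses can have a better throughput than a run with many short ones.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Summarises a list of stop the world pauses (hypothetical helper class). */
class PauseSummary {

    static double totalPauseMs(List<Double> pausesMs) {
        double total = 0.0;
        for (double p : pausesMs) {
            total += p;
        }
        return total;
    }

    static double maxPauseMs(List<Double> pausesMs) {
        double max = 0.0;
        for (double p : pausesMs) {
            max = Math.max(max, p);
        }
        return max;
    }

    /** Percentage of the elapsed time in which the application was not paused. */
    static double throughputPercent(List<Double> pausesMs, double elapsedMs) {
        return 100.0 * (elapsedMs - totalPauseMs(pausesMs)) / elapsedMs;
    }

    public static void main(String[] args) {
        double elapsedMs = 60000.0; // one minute of run time, invented for the example

        // Profile 1: two long pauses, acceptable for a batch process.
        List<Double> batch = Arrays.asList(1500.0, 1500.0);
        // Profile 2: forty short pauses, preferable for a web server.
        List<Double> web = Collections.nCopies(40, 100.0);

        System.out.printf("batch: max pause %.0f ms, throughput %.2f%%%n",
                maxPauseMs(batch), throughputPercent(batch, elapsedMs));
        System.out.printf("web:   max pause %.0f ms, throughput %.2f%%%n",
                maxPauseMs(web), throughputPercent(web, elapsedMs));
    }
}
```

With these numbers the batch profile loses 3 seconds in total and the web profile 4 seconds, yet the web profile never freezes for longer than 100 ms.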
Different garbage collectors exist. Let’s start with the basics. To select what garbage collector your application will use, an argument in the startup of the application is used. Many applications have configuration files where you can add this to something like JAVA_OPTS. If none is selected a default collector is used. As an example, the argument -XX:+UseConcMarkSweepGC will tell the Java Virtual Machine (JVM) to use the CMS collector.
If you want to know with what options the JVM started your application, on Unix the best way to check is to use the command: ps aux | grep java in a terminal session. This should show you the running Java processes with the options given to them at startup.
To select the right one you should remember that some parts of the garbage collection are stop the world. The speed of the collection itself is mostly dependent on CPU power.
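If you have access to the code, or can run a small class inside the same JVM, the startup options can also be read from within Java itself. This is a minimal sketch using the standard RuntimeMXBean. The class name JvmOptions and the filter on the -X prefix are my own choices.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Shows the options the current JVM was started with (hypothetical helper class). */
class JvmOptions {

    /** Keeps only the -X and -XX options, which is where heap and GC settings live. */
    static List<String> tuningFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the JVM options, not the arguments of main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : tuningFlags(jvmArgs)) {
            System.out.println(flag);
        }
    }
}
```

Started with, for example, java -XX:+UseConcMarkSweepGC JvmOptions, it should print that option back. Started without any options it prints nothing, which tells you the defaults are in use.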
The serial collector (-XX:+UseSerialGC)
Java has been around since the nineties, when single CPUs with one core were the standard. These days however multicore and multi CPU systems (real or virtual) are practically the norm for servers.
The simplest garbage collector around is the serial collector. It does not utilize multiple cores or CPUs. It will run on only one. The other core(s) are not used during a garbage collection. Therefore this one is usually not the best choice unless you have a special use case.
The parallel collector (-XX:+UseParallelGC -XX:+UseParallelOldGC)
Obviously there is also a parallel collector. This collector does use all processors. You can make it use parallel collecting for the young collections as well as for major collections for the old memory.
To use it for young collections use: -XX:+UseParallelGC or -XX:+UseParNewGC
To enable the parallel collector for old memory select -XX:+UseParallelOldGC
The parallel collector is fast. If you allow it to use all the CPU power of your system it will stop the running Java program during the entire collection, but it will be done quickly. If your application is the only application running on the system you really care about, which is common for server applications, you would allow it to use all cores.
The concurrent mark sweep collector (CMS) (-XX:+UseConcMarkSweepGC)
The parallel collector is fast. And if you don’t care about the application being unresponsive during such a collection, that is fine. However Java is often used for server applications that need to be responsive such as web applications.
The CMS collector was designed to give you the best of both worlds. Without getting into too many technical details, it is able to make sure only a few phases are STW. The downside is that the entire collection process will take a bit longer.
To enable the CMS collector set -XX:+UseConcMarkSweepGC. The CMS has several options to optimize it. If you really want to optimize this collector, many tests with proper monitoring are required.
The G1 collector (-XX:+UseG1GC)
The G1 collector is relatively new. It was designed especially for large heaps and to be responsive. Although it has had production status for quite some time now, some still consider it a little less stable or robust than the CMS collector.
It has some advantages though:
- It is the only collector allowed to use more than half of the heap for young memory. This is a big plus for large heaps, as often most of the memory is used for volatile short lived objects that should not get promoted to old memory. For web applications, more sessions usually mean more short lived objects, whereas the objects that survive tend to even out regardless of how many sessions you have. With large heaps, having to reserve at least half for old memory then turns into waste
- It doesn’t require tuning. It will optimize itself. It may take a few collections before it is at optimal performance, but this is in my experience done very fast. You simply set the target for the maximum pause time you desire (default is 200ms)
To use this collector you should make sure you are on a recent patch level of JDK 7 or using JDK 8. But you should do this anyway for security reasons. Also make sure you have enough CPU power. It needs power for the large heap.
Monitoring the garbage collection
To optimize the garbage collection some rules of thumb can be used. But you will have to monitor the collection to know what is happening, whether the GC is performing and whether your settings have actually achieved their goal.
There are several ways to do this and many tools. I am not going to mention them all. Fundamentally there are two ways of monitoring. One is live, during the running of the program as the collections are running (using tools such as jstat or jconsole). The other is by having the JVM log each collection into a file. Both have their advantages and disadvantages.
- jstat: a console tool that will show you a lot of information from a running Java process. If you have the JDK installed you can simply make it connect to the PID (process ID). It will take a sample at a preset interval and tell you the state of the memory at that moment, which gives you quick access to some information
- jconsole: a standard GUI based application to monitor a Java process. It will collect similar information to jstat but will display it in graphs. Very useful to follow what is going on with a running Java process.
- JVM logging: similar to setting the GC options at startup, you can tell the JVM to create a log file in which the JVM will log each collection. It has many options to increase the log level and information. Although you can of course ‘tail’ the log file to follow the Java process, it is meant more for analysis afterwards. If you have an operational intelligence platform in place you can of course make sure this log file is indexed in real time by such a platform and create graphs and dashboards. (-XX:+PrintGCDetails -Xloggc:<file>)
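The counters that the live tools display are also available through the standard java.lang.management API. The sketch below reads them from inside the JVM. The class name GcCounters is my own, and note one assumption: getCollectionTime() reports accumulated elapsed collection time, which for collectors with concurrent phases is not the same thing as the stop the world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads the collection counters of the running JVM (hypothetical helper class). */
class GcCounters {

    /** Sum of the accumulated collection time in milliseconds over all collectors. */
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Usually one bean for the young collector and one for the old collector;
        // the names depend on the collector that was selected at startup.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-25s collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total: " + totalCollectionTimeMs() + " ms");
    }
}
```

Because these are running totals you need to sample them at intervals and look at the differences, which is also why a sampling tool can miss what happened between two samples.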
My personal favorite is using the JVM logging at all times. It uses very few resources and gives you a lot of information. The JVM only has to log what it does after each collection. This means you miss no collection and it can give you many details. The raw data can be very useful if you are troubleshooting. There is however an open source program that will read the file and turn it into graphs and summary information. You can download it here: https://github.com/chewiebug/GCViewer/wiki (be sure to use Java 8 to run GCviewer).
Make sure you use at least these options:
-XX:+PrintGCDetails will give you detailed information of the heap before and after a collection
-Xloggc:<file> tells the JVM to log to a file (replace <file> by the location and filename of your choice, e.g. /var/log/jvmlogs/gc.log)
-XX:+PrintGCDateStamps makes sure that each log line is started with a date and time stamp. The timestamp is on the millisecond. This is essential for analysis.
-XX:+PrintGCCause is advisable. It is not required for GCviewer, but if you’re troubleshooting it lets you find out why the JVM is collecting.
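If you want to pull a few figures out of the raw log yourself, a regular expression goes a long way. The following is only a sketch. The sample line is hand-written to resemble what a JDK 8 JVM writes with the options above, and the exact layout differs per collector and JVM version, so the pattern will need adjusting for your own logs. The class GcLogLine is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts a few fields from one GC log line (hypothetical helper, JDK 8 style layout assumed). */
class GcLogLine {

    // Group 1: date stamp, group 2: GC or Full GC, group 3: cause, group 4: real (wall clock) seconds.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+): .*?\\[(Full GC|GC) \\((.+?)\\) .*real=(\\d+\\.\\d+) secs");

    final String dateStamp;
    final boolean full;
    final String cause;
    final double realSeconds;

    private GcLogLine(String dateStamp, boolean full, String cause, double realSeconds) {
        this.dateStamp = dateStamp;
        this.full = full;
        this.cause = cause;
        this.realSeconds = realSeconds;
    }

    /** Returns the parsed line, or null when the line does not look like a collection entry. */
    static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcLogLine(m.group(1), m.group(2).equals("Full GC"),
                m.group(3), Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        // Illustrative line only, not taken from a real system.
        String sample = "2016-10-12T10:15:30.123+0200: 12.345: [GC (Allocation Failure) "
                + "[PSYoungGen: 65536K->10720K(76288K)] 65536K->10728K(251392K), 0.0123456 secs] "
                + "[Times: user=0.03 sys=0.01, real=0.01 secs]";
        GcLogLine parsed = parse(sample);
        System.out.println(parsed.dateStamp + " full=" + parsed.full
                + " cause=" + parsed.cause + " real=" + parsed.realSeconds + "s");
    }
}
```

For anything beyond a quick check I would still use a tool that is maintained against the real log formats.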
GCviewer is distributed as a jar file (Java Archive). To start it simply type in a console: java -jar <filename>. You can open a single log as created by the JVM. You can also load a tar.gz file containing multiple files. The screenshot above shows a standard screen and chart. Note the tabs at the top. Switching away from the chart tab will give you a good summary.
For the performance tester
So as a performance tester you will need to realize:
- You can’t limit yourself to monitoring the memory on the OS. The JVM will never use more than the max heap. The OS may have more than enough memory free and available whilst the Java process doesn’t have enough
- The garbage collections, especially the stop the world phases, can really affect the performance. You will have to monitor and make sure that you have seen major collections and know how long the pause times are.
- Heap usage will always grow. That does not mean there is a memory leak. It will grow at various rates, but it will grow until it has a major collection.
- There are situations in which the JVM just can’t handle the memory demands and even a normal major collection is not capable of reclaiming enough memory. The JVM in such events will fall back on the serial collector, leading to very long stop the world collections.
- Be careful with requirements. Requirements are hard to define beforehand. You must be able to report on pause times, but also on how often they occur. Sporadic longer collections may be better than more frequent but shorter collections. You don’t want to end up providing worse performance that just happens to be within the requirements or has a higher risk of issues in case of other components temporarily being slower.
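To illustrate the first point: what matters is the room left inside the heap, not the free memory the OS reports. A minimal sketch using the standard Runtime class follows. The class name HeapHeadroom is my own, and the values are a snapshot taken at one moment.

```java
/** Prints how much of the maximum heap is in use (hypothetical helper class). */
class HeapHeadroom {

    /** Percentage of the maximum heap that is currently used. */
    static double usedPercent(long usedBytes, long maxBytes) {
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // the most the JVM will attempt to use for the heap
        long committed = rt.totalMemory(); // what the JVM has currently claimed
        long used = committed - rt.freeMemory();

        System.out.printf("max=%d MB committed=%d MB used=%d MB (%.1f%% of max)%n",
                max / (1024 * 1024), committed / (1024 * 1024),
                used / (1024 * 1024), usedPercent(used, max));
    }
}
```

A server can show gigabytes of free memory at OS level while this figure sits close to 100%.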
When load testing a Java application first establish whether throughput or low latency is the goal. These days that will usually be low latency, as you will mostly be testing applications with many concurrent users. Throughput is in general for batch processes. An example is the nightly batch of a bank as it processes transfers, where the bank wants to know how many it can handle in the time window allocated.
In both situations you should have garbage collection monitoring in place.
For low latency applications the most important metrics are the maximum pause time and the frequency of garbage collections. It’s hard to put up rules of thumb for requirements. If the application for instance has a young collection every minute and it takes 0.8 ms you will have an application which has no noticeable pause times. If it collects twice per second, 0.8 ms can be way too much. As we know, getting requirements beforehand is a challenge on its own. So make sure that you can explain the actual behaviour after testing and report on the actual performance.
So make sure you know how it behaves. Explore different settings to know what you can do to improve the different aspects and discuss with the stakeholders possible improvements.
As a performance tester it is very rewarding to be able to actually improve the situation. With Java there are many tuning options. Which means you can actually improve the performance without having to send it back to the development team. Naturally it all depends on the application, the performance goals and the available hardware resources but there are some rules that apply in general:
- Bigger isn’t always better. Having a large heap means the JVM has more than enough space. However when the garbage collections do take place they will take longer. Often it is okay to have frequent collections, knowing that the actual collections take less time
- Bigger can be safer. If you have optimized your JVM for performance it may show great performance until there is an issue somewhere down the line. When the Java process communicates with a different service or a database server and experiences a slowdown it needs to wait. That will occupy memory. Especially when your young memory is tight, this may lead to nasty effects, with objects being promoted to old memory too quickly and even with not enough space to hold all valid objects. Make sure you add some extra breathing space for anomalies
- Fixate the heap size. This is advice you will often see and it makes sense, especially if you have a limited set of Java processes and the server it runs on is there for this purpose. Java by default does not expect to be the only puppy in the petshop and will not allocate more memory than it needs. Therefore it has a mechanism to grow the heap size if the usage makes it necessary. To grow the heap however, it needs to perform a full collection to do so. So just as your application is receiving stress, it ends up doing a full collection to cope with the higher demand for memory space. On servers, your application is often the only puppy in the petshop. So if you set the initial size equal to the max size, it never needs to grow and it never even tries it.
- Fixate the perm space. With Java 8 perm space was replaced with metaspace. But with older versions perm space still exists. Perm space is usually a lot smaller than the heap. Common practice is to limit the space. But similar to the heap, if you don’t set the initial value it will need to grow, for which it first has to perform a full collection. It usually starts with a small space and grows in small chunks. It is not uncommon to see some applications showing several full collections at startup due to the need to grow the space. So similar to the heap, set the initial value equal to the max. Best is to measure during a performance test how much was used as a maximum and use that figure with some extra space to determine the size. Either let the perm space grow during the test or set a high value. For example, if you’ve measured that the perm space never used more than 150 MB, assign a max and initial value of 256 MB. If you don’t add the extra space you may end up with out of memory messages if at some point it does need more space.
- Watch out for System.gc() collections. Aside from the JVM controlling the garbage collections, you can provoke a full collection either by code in the application (that calls the System.gc() method) or via the command line. This can be useful, but often it is not. You can simply check the logs for the GC cause System.gc(). You can tell the JVM to ignore these calls (-XX:+DisableExplicitGC). Try to find out why the calls are made. Sometimes they make sense when the developer makes sure that the full collection is triggered at convenient times. Often they just provoke unnecessary collections.
- Watch the survivor space. If at a young collection the JVM can’t keep in survivor what it is supposed to, it will promote it to old. And that may be much too quick. There is a way to tell if you have GC logging enabled with -XX:+PrintTenuringDistribution. Look for the desired survivor size line and check the values new threshold and max threshold. If the new threshold consistently stays at 1, whilst the max threshold is higher, objects get promoted to old memory despite not having survived to the max threshold. This is done when there just isn’t enough space
- Set the young memory. Except if you use the G1 collector the JVM will never allow you to have a young memory size larger than the old memory. If you don’t set the young memory size, the JVM will set the size of the young memory and change it if it decides that this is required. This adaptation however can only be done after a full GC. On top of the undesired full GC, with larger heaps as are more common these days, the ratio of old versus young is changing. You simply don’t need so much extra old memory, whereas extra young memory will prevent short lived objects from making it to old memory and therefore prevent full GCs. Be careful though with assigning half to young and half to old memory. In some cases the JVM just isn’t capable of cleaning enough young memory, which will result in it overflowing a lot of objects into old memory. For that it needs space or it will invoke a full GC at exactly the wrong moment.
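Whether the heap size is actually fixed is something you can verify rather than assume. The sketch below compares the -Xms and -Xmx options the JVM was started with. The class HeapSizing is my own invention, and it only understands the k, m and g suffixes, so treat it as a starting point.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Checks whether the initial heap size equals the maximum (hypothetical helper class). */
class HeapSizing {

    /** Converts a size such as 512m or 2g to bytes; only k, m and g suffixes are handled here. */
    static long parseSize(String size) {
        char unit = Character.toLowerCase(size.charAt(size.length() - 1));
        long factor = 1L;
        if (unit == 'k') {
            factor = 1024L;
        } else if (unit == 'm') {
            factor = 1024L * 1024L;
        } else if (unit == 'g') {
            factor = 1024L * 1024L * 1024L;
        }
        String digits = (factor == 1L) ? size : size.substring(0, size.length() - 1);
        return Long.parseLong(digits) * factor;
    }

    /** Returns the value of the last option starting with the prefix, or null when absent. */
    static String optionValue(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True when both -Xms and -Xmx are given and denote the same number of bytes. */
    static boolean heapIsFixed(List<String> jvmArgs) {
        String xms = optionValue(jvmArgs, "-Xms");
        String xmx = optionValue(jvmArgs, "-Xmx");
        return xms != null && xmx != null && parseSize(xms) == parseSize(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("heap size fixed: " + heapIsFixed(jvmArgs));
    }
}
```

Started with java -Xms2g -Xmx2g HeapSizing it reports true. Without those options it reports false, meaning the JVM is free to grow the heap under load.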
In the past we had to deal with some issues that were a surprise. Some are described here:
- Don’t forget the virtualisation platform. We were dealing with a large cluster that in some cases just started behaving weirdly: longer GC pause times, unable to really clean. We couldn’t understand why. In the end it turned out that the CPUs allocated were overbooked. As an example, if the real server which hosts the virtual machines has 16 CPUs and you have 5 VMs on it with 4 CPUs assigned each, you have 4 CPUs overbooked. Usually that’s actually fine. The VMs usually don’t need CPU power at the same time, and this allows each VM to have more power during peak times. However if the minimum CPU power each VM needs isn’t guaranteed you may end up with the process not having enough CPU power even though on OS level you don’t see it. So if your performance test shows that during peak loads e.g. 4 CPUs are required, make sure those are guaranteed.
- Watch the threads. Garbage collection is not the only thing impacting performance. If your application under stress starts many threads you don’t want it to max out. On Linux check the process ID (PID) of your Java process and run cat /proc/<pid>/limits. This will show you the maximum number of processes your Java process may start (the Max processes line). This is effectively the number of threads the Java process can start. This is more reliable than checking /etc/security/limits.conf as e.g. Red Hat often uses a different file even though limits.conf will exist.
- Don’t just optimize for performance. To get the best performance you might choose a smaller Heap size. However if any other system doesn’t perform, you need the extra memory to cope. So make sure you add space for anomalies.
- Use GCcause in your monitoring. This will quickly explain why you are getting full collections if you don’t expect them.
- Tune the TCP/IP settings. This is a completely different subject, but remember that Linux and Windows installations are out of the box not necessarily optimized for high amounts of connections.
- Check your JDBC (database) connections. Each application is different. Some that have huge numbers of clients connected only need a few JDBC connections, whereas other applications may need many for just a few clients.
- Rapid young collections are not always a bad sign. For instance a service bus will often handle processes in milliseconds rather than seconds. So even if you have multiple collections in a second it may not lead to unwanted promotions. Increasing the heap size will in that case increase the stop the world phases without benefits.
- Just when you think you understand the behaviour of your application and tune for it, it ends up behaving counterintuitively. So experiment, experiment and experiment.
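For the point about threads, the number of threads a Java process is running can be read from the standard ThreadMXBean and compared with the limit the OS reports. A minimal sketch, with the class name ThreadWatch being my own:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Reports thread counts of the running JVM (hypothetical helper class). */
class ThreadWatch {

    /** Number of live threads, daemon and non-daemon, at this moment. */
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        System.out.println("live threads:   " + threads.getThreadCount());
        System.out.println("daemon threads: " + threads.getDaemonThreadCount());
        // The peak since JVM start is the figure to compare with the OS limit.
        System.out.println("peak threads:   " + threads.getPeakThreadCount());
    }
}
```

During a load test the peak value is the interesting one: if it approaches the maximum the OS allows, the application will fail to start new threads well before memory runs out.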
So all in all
Java’s way of handling memory makes it easier for the developers. No need to properly remove objects from memory to avoid memory leaks. The JVM will do that for you by garbage collecting. But for performance engineers and testers it gets a bit more complicated.
The most important thing to remember is that the OS monitoring of memory is not enough as the memory is in a sandbox called the heap.
Java cleans the memory for you. If it is just another application running on your system it will perform just fine and take up no more resources than it needs. But if the purpose of the system is mainly, if not only, to run that Java process, there is a lot to be gained by tuning the garbage collection. You need to understand how it works, the pitfalls and what to look for when testing. Most of all:
- Make sure you know how long the stop the world phases are (pause time)
- Make sure you know how often collections take place
- Have the JVM log each collection. It hardly has any overhead and will tell you much more than just monitoring with tools that periodically check the heap
- Experiment with different settings, different young vs old ratios, different collectors, different heap sizes etc. during your load testing. It is not uncommon to see the counterintuitive behaviour
- Don’t just focus on performance, make sure the JVM has enough ‘breathing space’ to handle temporary issues with other systems
- Don’t forget other performance impacting items such as TCP/IP settings, threads, JDBC connection pools
At first understanding the garbage collection and its impact on performance may seem complex and difficult, but, as there are so many ways to tune the behaviour, it can also be very rewarding for the performance professional. You can in the end really make a difference.
If you want to start experimenting you need to know the options that are available. Check out http://blog.ragozin.info/2016/10/hotspot-jvm-garbage-collection-options.html . It has a cheat sheet with all the options you may need.