So I spent three hours today tweaking and tuning the data -> mesh -> GPU pipeline because there was a significant performance issue when I loaded 293 chunks. I messed with the threads, thinking there was some unintentional thread-blocking going on, maybe some locked resources? Nope. I messed with the queuing order, thinking that maybe a different type of queue or a slower queue submission rate would help… Nope. I read that the JVM can end up using as much room for a single byte as for a 4-byte int (a short is only 2 bytes), whereas multiple bytes stored in an array are tightly packed. It has something to do with the underlying hardware having faster access to aligned 4-byte words. So, thinking I might get a performance boost, I re-coded the chunk-tile container to use a tightly packed byte array. OK, heap usage got smaller, but the performance issue was still there. The engine would still periodically drop to one frame per second during chunk generation!
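
For illustration, here is a minimal sketch of what a tightly packed chunk-tile container could look like. The class name `ChunkTiles`, the 16x16x16 dimensions, and the index layout are all assumptions made for the example, not the engine's actual code.

```java
/** Hypothetical chunk-tile container: one byte per tile in a single flat array. */
final class ChunkTiles {
    // Assumed chunk dimensions, for illustration only.
    static final int SIZE_X = 16;
    static final int SIZE_Y = 16;
    static final int SIZE_Z = 16;

    // One contiguous byte[] instead of one object (or one field) per tile.
    private final byte[] tiles = new byte[SIZE_X * SIZE_Y * SIZE_Z];

    // Map 3D tile coordinates to a position in the flat array.
    private static int index(int x, int y, int z) {
        return (y * SIZE_Z + z) * SIZE_X + x;
    }

    byte get(int x, int y, int z) {
        return tiles[index(x, y, z)];
    }

    void set(int x, int y, int z, byte tileId) {
        tiles[index(x, y, z)] = tileId;
    }

    int tileCount() {
        return tiles.length;
    }

    public static void main(String[] args) {
        ChunkTiles chunk = new ChunkTiles();
        chunk.set(1, 2, 3, (byte) 7);
        System.out.println("tile at (1,2,3) = " + chunk.get(1, 2, 3));
        System.out.println("tiles per chunk = " + chunk.tileCount());
    }
}
```

A layout like this trims heap usage, but as noted above, it did not cure the frame drops on its own.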

Then I decided to break out the VisualVM profiler, and immediately sussed out the culprit: the JVM garbage collector (GC). According to my sources, the GC is triggered when the heap is close to full. Since the initial heap size starts low, as soon as your application starts chewing through a lot of memory, the GC is triggered again and again while the heap is expanded. All I had to do was allocate more memory to the heap from the CLI and the GC calmed down significantly. As a bonus, all of the other tuning I did wasn't wasted: it reduced the memory usage!
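
If you want to check the same thing without a profiler, here is a rough sketch using the standard `java.lang.management` API to print heap figures and GC totals. The class name `GcReport` is made up for the example. The heap itself is sized with the JVM's `-Xms` (initial heap) and `-Xmx` (maximum heap) options, for example `java -Xms1g -Xmx2g -jar game.jar`, where the sizes and the jar name are placeholders rather than the values used here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reports heap usage and GC totals for the running JVM. */
final class GcReport {

    /** Total collections across all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated GC time in milliseconds; undefined times (-1) are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Heap currently in use: what the JVM has reserved minus what is still free. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        System.out.printf("heap used:      %d MiB%n", usedHeapBytes() / mib);
        System.out.printf("heap committed: %d MiB%n", rt.totalMemory() / mib);
        System.out.printf("heap max:       %d MiB%n", rt.maxMemory() / mib);
        System.out.printf("GC runs: %d, GC time: %d ms%n",
                totalCollections(), totalCollectionMillis());
    }
}
```

Calling something like this before and after chunk generation gives a crude version of the VisualVM graphs: if the number of GC runs jumps while the committed heap keeps growing, a larger initial heap is worth trying.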

In the photo, the blue line on the left side represents GC activity, and the blue line on the right side is the used heap. Both measurements were 30 seconds long and recorded the generation of 293 chunks.