The importance of ThreadDump"S"

 


The Problem

Application that becomes unresponsive is always a pain and find what is going wrong is one of the hardest activity that developers, architects and operators have to deal with.


Recently I had the "plaesure" to see all the VM where an application was deployed crashing for unknown reason: the application did not respond anymore and tomcat was not able to serve any request.


Usually if we look just at the machine, we do not have any information about what is wrong: log stop to be written..we just see that CPU is increasing its usage (and RAM of course).


To really understand what is going wrong, it is important to have a Thread Dump of the JVM to see effectively what each thread is doing at the moment. But there is a problem on that:

if you just take one thread dump you have just a snapshot of the application at a specific time while you are missing the "evolution" of the application


Thread dump overview

For example just looking on the first Thread Dump received, it was not clear what is going wrong. To analyze it, we use one of the best tool which is fastThread.io


The report shows that there are many many TimedWaiting Threads and there are also 2 blocked thread. So what people usually do is to observe the blocked threads and to find what is the problem. This is the principal guilty of the application situation, at the first glance.


Thread Dump mistake

And this is a huge mistake. Thread dumps are ok only if you take more than one and can compare all of them. This is important because you can see the evolution of the application. Blocked doesn't mean that we have a problem, maybe they are using the same resource and in a while the resource will be free to be used, so it's just an instant situation.


Collecting other 2 dumps waiting 2 minutes among them (but you can also have more dumps with less waiting time) you can zip together and upload to the tool and see the comparison:


And if you scroll the comparison and go to CPU spike we see that



So there was 5 threads that were in infinite loop (they are executing the same code, but in different point this is why they are not grouped all together).


The real problem

Infinite loop is not always pure evil, but in this case the loop was related to listing element in a List structure. But at the end of each loop, the list is changed adding one element. The loop was not implemented just using iterator (this prevent list change) but using indexes. So the problem was:


  • Condition on list was always true. The JVM need to check the dimension of the list every loop
  • An element is added at the end of the loop to the list. So the list is increasing the RAM occupied
  • GC is not able to find elements to be discarded because - after million of loops - the list is referring all the elements that can't be removed

The solution was just to use another list, because the goal of the algorithm was to just enrich the list.


Conclusion


So it is important to have a comparison among different Dumps because just analyzing one dump, you can just see only temporary problems that can be a normal situation of a JVM. Remember to use FastThread.io for this purpose because it is free for simple activities, but if you pay you can have ML and experts that can help you to find solution on unresponsive applications













Commenti

Post popolari in questo blog

HTML and AI - A simple request, and a beautiful response for ElasticSearch without Kibana

A simple CD using AWS Pipeline

Websocket Chat with Lambda (but all in Java)