In this post, I will talk about an interesting deadlock and immediate steps taken to resolve it.
The system I work on is distributed in nature where instances communicate among themselves so they don't have to do the same work. We encountered an issue where batching was not being done.
On log analysis, it seemed the batch file was not generated because the thread couldn't acquire the necessary lock. This lock (a database lock) was held by another thread.
Being a self proclaimed concurrency expert in my team, I went straight into the machine where the webapp was running and executed jstack command to get the thread dump.
Following is the snippet of the thread dump which had acquired the lock:
And following is the snippet of the thread dump which was waiting for the lock to be released:
Taking several thread dumps at regular intervals revealed that http-bio-41080-exec-10 thread was doing something fishy line 72 in CLASSNAME.createPartnerTransaction.
The line number 72 surrounding the block is as follows:
Initial glance at the above block seemed fine, I asked myself what could be wrong with above while of code.... The issue was lurking which would only be executed when the account number was less than 9. Noticed it? On second look, the bug seemed obvious...
But just to reconfirm the issue, I took the jar file this class was present in and used a jad de-compiler to get back the java file.
The above block of code was converted into following line by compiler.
Noticed it??
Strings in java are immutable and result of concat method needs to be appended to the original variable (in this case, filler) to get the new value.
Immediate step to resolve this issue:
As it turns out, an infinite loop was present and would be executed if length of accountNumber+filler (always 0) would be less than 9. On detecting this issue, resolution was as simple as replacing the above block of code with apache commons StringUtils.padLeft.
Thoughts on improving this code on a long term basis:
The major issue here is not the infinite loop (thats a symptom), but I think its the combination of Starvation (couldn't acquire the lock held in database, which is the resource) and Livelock (retrying the operation of getting the resource which will always fail because it is acquired by another thread).
Was there a need to hold this lock when executing this method operation (or the nested methods for that matter)? This is one of the important question that we need to answer by revisiting its architecture....
The system I work on is distributed in nature where instances communicate among themselves so they don't have to do the same work. We encountered an issue where batching was not being done.
On log analysis, it seemed the batch file was not generated because the thread couldn't acquire the necessary lock. This lock (a database lock) was held by another thread.
Being a self proclaimed concurrency expert in my team, I went straight into the machine where the webapp was running and executed jstack command to get the thread dump.
Following is the snippet of the thread dump which had acquired the lock:
"http-bio-41080-exec-10" daemon prio=10 tid=0x000000004d387000 nid=0x6620 runnable [0x0000000043e12000] java.lang.Thread.State: RUNNABLE at com.mycompany.integration.partner.CLASSNAME.createPartnerTransaction(CLASSNAME.java:72) ....
And following is the snippet of the thread dump which was waiting for the lock to be released:
"scheduler_Worker-8" prio=10 tid=0x00002aaab05c9000 nid=0x653d waiting for monitor entry [0x0000000044d25000] java.lang.Thread.State: BLOCKED (on object monitor) at com.mycompany.integration.service.impl.CLASSNAME1.acquireNonBlockingLock(CLASSNAME1.java:35) - waiting to lock <0x00000000c7bc4b60> (a java.lang.String) ....
Taking several thread dumps at regular intervals revealed that http-bio-41080-exec-10 thread was doing something fishy line 72 in CLASSNAME.createPartnerTransaction.
The line number 72 surrounding the block is as follows:
String filler = ""; while (accountNumber.length() + filler.length() < 9) { filler.concat(" "); } transaction.setPayeeAccountNumber(filler.concat(accountNumber));
Initial glance at the above block seemed fine, I asked myself what could be wrong with above while of code.... The issue was lurking which would only be executed when the account number was less than 9. Noticed it? On second look, the bug seemed obvious...
But just to reconfirm the issue, I took the jar file this class was present in and used a jad de-compiler to get back the java file.
The above block of code was converted into following line by compiler.
for(filler = ""; accountNumber.length() + filler.length() < 9; filler.concat(" "));
Noticed it??
Strings in java are immutable and result of concat method needs to be appended to the original variable (in this case, filler) to get the new value.
Immediate step to resolve this issue:
As it turns out, an infinite loop was present and would be executed if length of accountNumber+filler (always 0) would be less than 9. On detecting this issue, resolution was as simple as replacing the above block of code with apache commons StringUtils.padLeft.
Thoughts on improving this code on a long term basis:
The major issue here is not the infinite loop (thats a symptom), but I think its the combination of Starvation (couldn't acquire the lock held in database, which is the resource) and Livelock (retrying the operation of getting the resource which will always fail because it is acquired by another thread).
Was there a need to hold this lock when executing this method operation (or the nested methods for that matter)? This is one of the important question that we need to answer by revisiting its architecture....
No comments:
Post a Comment