• Bug
  • Status: Closed
  • Priority: 2 Major
  • Resolution: Duplicate
  • Assignee: cdennis
  • Reporter: tgautier
  • October 01, 2008
  • 0
  • Watchers: 0
  • July 27, 2012
  • October 21, 2008

Description

Run the attached test - it requires 5 processes.

After some time, kill one or more processes.

Then, without starting anew, start some more processes. When the total process count reaches 5 again, the loop should continue, but it doesn’t.

In fact, the barrier can report a number of waiting parties greater than the number specified in the constructor:

Waiters: 9 Waiting for other nodes to join…
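
For reference, the invariant being violated can be shown in a single JVM with the plain java.util.concurrent.CyclicBarrier. This is not the attached test (which spans 5 clustered processes); it is only a minimal sketch, and the class name BarrierInvariantSketch and its run method are made up for illustration. It runs 5 threads through a few barrier cycles and records the largest getNumberWaiting() value any thread observes, which should never exceed the party count.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierInvariantSketch {
    static final int PARTIES = 5;

    /** Runs PARTIES threads through `rounds` cycles; returns {completedRounds, maxWaitersSeen}. */
    static int[] run(int rounds) {
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger maxWaiters = new AtomicInteger();
        // The barrier action runs once per completed cycle.
        CyclicBarrier barrier = new CyclicBarrier(PARTIES, completed::incrementAndGet);

        Thread[] workers = new Thread[PARTIES];
        for (int i = 0; i < PARTIES; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        // Sample the waiter count before joining the barrier.
                        maxWaiters.accumulateAndGet(barrier.getNumberWaiting(), Math::max);
                        barrier.await();
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return new int[] { completed.get(), maxWaiters.get() };
    }

    public static void main(String[] args) {
        int[] result = run(3);
        System.out.println("Completed rounds: " + result[0]);
        System.out.println("Max waiters seen: " + result[1] + " (parties: " + PARTIES + ")");
    }
}
```

In a healthy barrier the observed waiter count stays at or below 5, so a report of 9 waiters means the shared state itself is corrupt, not that too many processes joined.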

Comments

Taylor Gautier 2008-10-01

The problem ONLY shows up after the assertion in CDV-923 is printed - so you have to keep restarting processes until that assertion shows.

Then the barrier gets into a bad state.

Chris Dennis 2008-10-20

The bug occurs because the mutations made by the final thread arriving at the barrier are performed under two locks, which results in two separate transactions. If the JVM in question is killed after one of the transactions but before the other, all locks are released, the second transaction is never committed, and the resulting unlocked (and therefore unprotected) CyclicBarrier instance is left in an invalid state.
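
The failure mode described above can be sketched in plain Java. This is not the Terracotta CyclicBarrier code: BarrierState, its two fields, the SimulatedCrash exception, and the order of the two steps are all assumptions chosen for illustration, and each separately locked section merely stands in for one clustered transaction. The point is only that when the last arrival updates shared state in two independently committed steps, dying between them leaves the first change visible and the second one missing.

```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLockBarrierSketch {

    /** Hypothetical, simplified barrier state; not the Terracotta implementation. */
    static final class BarrierState {
        final int parties;
        int waiting;     // guarded by countLock
        int generation;  // guarded by generationLock
        final ReentrantLock countLock = new ReentrantLock();
        final ReentrantLock generationLock = new ReentrantLock();

        BarrierState(int parties) { this.parties = parties; }
    }

    /** Stands in for the JVM being killed between the two critical sections. */
    static final class SimulatedCrash extends RuntimeException {}

    /** A party arrives and is counted as waiting. */
    static void arrive(BarrierState s) {
        s.countLock.lock();
        try { s.waiting++; } finally { s.countLock.unlock(); }
    }

    /** The last arrival trips the barrier in two separately locked steps. */
    static void trip(BarrierState s, boolean crashBetween) {
        s.generationLock.lock();
        try { s.generation++; } finally { s.generationLock.unlock(); } // first "transaction" commits
        if (crashBetween) throw new SimulatedCrash();                   // no lock is held at this point
        s.countLock.lock();
        try { s.waiting = 0; } finally { s.countLock.unlock(); }       // second "transaction"
    }

    /** Five parties arrive, the last one trips (optionally crashing), then four more arrive. */
    static BarrierState simulate(boolean crashBetween) {
        BarrierState s = new BarrierState(5);
        for (int i = 0; i < 5; i++) arrive(s);
        try {
            trip(s, crashBetween);
        } catch (SimulatedCrash e) {
            // The node is gone; the shared state stays exactly as it was left.
        }
        for (int i = 0; i < 4; i++) arrive(s);
        return s;
    }

    public static void main(String[] args) {
        System.out.println("clean:   waiting=" + simulate(false).waiting);
        System.out.println("crashed: waiting=" + simulate(true).waiting);
    }
}
```

With a clean trip the waiter count is reset before the next arrivals, so it ends at 4. With the simulated crash the reset never happens and the count climbs past the 5 parties given to the constructor. The arrival counts here are arbitrary and are not taken from the attached test.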

Various fixes are under investigation.

Alex Miller 2008-10-21

Targeting 2.7.1, as Chris has a fix.