• Bug
  • Status: Closed
  • 2 Major
  • Resolution: Cannot Reproduce
  • cdennis
  • Reporter: agolos
  • October 20, 2008
  • 0
  • Watchers: 0
  • July 27, 2012
  • November 19, 2008

Attachments

  • agolos (4.00 M) application/zip src.zip

Description

CyclicBarrier failure. How to reproduce: Unzip the file into a directory on the client machine. Start the Terracotta server on the server machine. On the client machine type “run terra”. The program starts printing statistics (every 1250 iterations) not too often. If desirable, you can change configuration parameters in the file sim.properties. It runs for a while and sometimes dies like this: Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Wake up, but local generation the same as new generation. Entering debug block. Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected] Is local generation equal to new generation: true Current Status CyclicBarrier: – parties: 9 – barrierCommand: null – generation: [email protected] – trip: [email protected]

If you want to look into the code, the interesting for you files are in the package com.accenture.grids.al. The are Scheduler.java and Sequencer.java. Sequencer implements barrier and the waits, and Scheduler implements worker threads and cycle management thread who also participates in the barrier.

Comments

Fiona OShea 2008-10-20

Can we get an LOE on this? thanks

Chris Dennis 2008-10-20

Could you clarify the exact situation in which you see this behavior. This is very similar to another bug observed with CyclicBarrier whereby when multiple JVMs are using the same barrier, the killing of a JVM can leave the barrier in an invalid state. Do you see this behavior only in this sort of case, or do you also see it during normal operation? (Incidentally, this may also happen if you are ending threads by calling Thread.stop() or Thread.destroy(), although I could see no evidence of this in your code).

Thanks

Arie Golos 2008-10-20

No things like Thread.destroy() or killing the JVM. This is just a performance benchmark. I just run it and it sometimes stops like that during the normal operation. I observed it 3 times. Once on Sunday with 5 threads and today with 5 threads and 9 threads. This is not a multiple JVM case either. There is one client jvm only, but multiple threads running through the barrier. I understand that this is not the way to use the barrier and I should probably use cascading barriers, but I wanted to investigate the scalability of the barrier which does not look good either, but this is a different issue.

Chris Dennis 2008-11-19

After applying the fix for CDV-923 I have been unable to reproduce this bug. The fix for CDV-923 was in a related area so it is possible that as a side effect it has heavily mitigated or even resolved the issue (extensive running on a development machine (a week of thrashing) has failed to trigger the errant behavior). I’m therefore temporarily resolving this bug as “Cannot Reproduce” but would strongly encourage anyone who sees it again to reopen it.