CDV ❯ Clients are not receiving operations enabled event properly
-
Bug
-
Status: Closed
-
2 Major
-
Resolution: Fixed
-
DSO:L1,DSO:L2
-
-
kkannaiy
-
Reporter: rsingh
-
December 08, 2009
-
0
-
Watchers: 0
-
February 12, 2013
-
January 04, 2010
Attachments
Description
Attached is the app which reproduces this problem
Steps to reproduce
- Start an active and passive server.
- Start 5 clients C0-C4 using the attached app on the same machine.
- Kill the active server.
- Kill C4 and start a new client, C5, while the passive is taking over.
- When the passive takes over, all clients should receive the operations enabled event and the connected clients should resume their work. Instead, the cluster freezes.
Comments
Raghvendra Singh 2009-12-08
It seems the servers are indeed firing the events, but the clients are stuck here:
"WorkerThread(client_coordination_stage, 0)" daemon prio=10 tid=0x00002aab32814400 nid=0x925 in Object.wait() [0x0000000042951000..0x0000000042951aa0]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00002aab0b51adc0> (a com.tc.object.ClusterMetaDataManagerImpl)
    at java.lang.Object.wait(Object.java:485)
    at com.tc.object.ClusterMetaDataManagerImpl.waitUntilRunning(ClusterMetaDataManagerImpl.java:297)
    - locked <0x00002aab0b51adc0> (a com.tc.object.ClusterMetaDataManagerImpl)
    at com.tc.object.ClusterMetaDataManagerImpl.retrieveMetaDataForDsoNode(ClusterMetaDataManagerImpl.java:139)
    at com.tc.cluster.DsoClusterImpl.retrieveMetaDataForDsoNode(DsoClusterImpl.java:247)
    at com.tc.cluster.DsoClusterImpl.fireNodeJoinedInternal(DsoClusterImpl.java:328)
    at com.tc.cluster.DsoClusterImpl.fireNodeJoined(DsoClusterImpl.java:322)
    at com.tc.object.handler.ClientCoordinationHandler.handleClusterMembershipMessage(ClientCoordinationHandler.java:54)
    at com.tc.object.handler.ClientCoordinationHandler.handleEvent(ClientCoordinationHandler.java:30)
    at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:127)
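The thread dump shows the worker of client_coordination_stage blocked in waitUntilRunning() while handling a node-joined message. One way a wait like this can hang, offered only as an illustration and not as the confirmed root cause of this ticket, is when the event that would release the wait is queued behind the blocked handler on the same single-threaded stage. The sketch below reproduces that pattern with plain JDK classes; StateGate, nodeJoinedCompletes, and the class name are hypothetical and are not Terracotta code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SingleThreadStageHang {

    // Hypothetical stand-in for a manager whose callers block until it is "running".
    static final class StateGate {
        private boolean running = false;

        synchronized void waitUntilRunning() throws InterruptedException {
            while (!running) {
                wait(); // same Object.wait() pattern as in the thread dump
            }
        }

        synchronized void markRunning() {
            running = true;
            notifyAll();
        }
    }

    // Returns true if the simulated "node joined" handler finishes within the timeout.
    static boolean nodeJoinedCompletes(long timeoutMillis) throws Exception {
        StateGate gate = new StateGate();
        // A stage with exactly one worker thread (assumption made for this sketch).
        ExecutorService stage = Executors.newSingleThreadExecutor();
        try {
            // First event: the handler blocks until the gate is running.
            Future<Void> nodeJoined = stage.submit(() -> {
                gate.waitUntilRunning();
                return null;
            });
            // Second event: would open the gate, but it is queued behind the first.
            stage.execute(gate::markRunning);
            try {
                nodeJoined.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the worker never reached the second event
            }
        } finally {
            stage.shutdownNow(); // interrupt the blocked worker so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("node-joined handler completed: " + nodeJoinedCompletes(500));
    }
}
```

In this sketch the handler never completes, because the only thread that could run the unblocking event is the one that is waiting. Whether the actual fix in this ticket addressed an ordering problem of this kind is not stated in the comments.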
Piero Positivo 2009-12-08
Here are the logs of the postOfficeApp. I have run it many times on both Mac OS X and Linux machines, and the problem reproduces on all of them. There are two TC servers, TC1 and TC2, in active-passive mode, and 3 clients. I have included the 4 client logs; the fourth is from the client that attempts to join the cluster while the passive takes over, after client 3 has been killed.
Steve Harris 2009-12-08
If this is a bug, we should probably look at it in the darwin timeframe.
Raghvendra Singh 2010-01-04
Fixed in trunk with r14254; merged into 3.2 with r14255.
More discussion of this issue is at http://forums.terracotta.org/forums/posts/list/2775.page