  • Type: New Feature
  • Status: Open
  • Priority: 2 Major
  • Resolution:
  • Assignee: nmahor
  • Reporter: teck
  • Created: June 16, 2010
  • Votes: 0
  • Watchers: 5
  • Updated: September 23, 2013

Description

I’ve seen a handful of VM shutdown-related problems as of late. CDV-1483 is the latest one, but we’ve had reports of this behavior forever.

The basic problem is that the L1 (in both express and custom) registers a VM shutdown hook that makes sure the transaction buffer is flushed before the L1 VM exits. That is a good and necessary thing, but the bad part is that the client is also shut down at the same time. That means that any other use of a clustered object will hang forever, likely preventing the VM from ever exiting.
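
To make the failure mode concrete, here is a minimal, self-contained sketch under stated assumptions (ClusteredCache, stopClient, and put are hypothetical stand-ins, not Terracotta’s real classes) of two independently registered JVM shutdown hooks, where the application hook can block forever once the L1 hook has stopped the client:

    import java.util.concurrent.CountDownLatch;

    public class ShutdownHangSketch {

        // Hypothetical stand-in for a clustered object backed by the L1 client.
        static class ClusteredCache {
            private final CountDownLatch clientStopped = new CountDownLatch(1);

            // What the L1 hook effectively does today: flush, then stop the client.
            void stopClient() {
                clientStopped.countDown();
            }

            void put(String key, String value) throws InterruptedException {
                if (clientStopped.getCount() == 0) {
                    // Once the client is stopped the txn can never be acked,
                    // so the caller blocks forever.
                    new CountDownLatch(1).await();
                }
                // ... normal clustered put would happen here ...
            }
        }

        public static void main(String[] args) {
            ClusteredCache cache = new ClusteredCache();

            // The L1's own hook: flush txns, then stop the client.
            Runtime.getRuntime().addShutdownHook(new Thread(cache::stopClient));

            // An application hook registered independently. Hook ordering is undefined,
            // so this may run after the client is stopped and then hang, which in turn
            // keeps the VM from ever exiting.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    cache.put("lastState", "clean");
                } catch (InterruptedException ignored) {
                }
            }));
        }
    }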

Perhaps we can make our shutdown a little smarter and kill the issue forever by changing our shutdown behavior to instead just flush the current txns and place the client in a mode where all new txns are promoted to “sync write”-like semantics. If we can do that, we don’t need to actually stop the client in the shutdown hook.
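
A minimal sketch of that idea, assuming hypothetical names (Client, enterShutdownMode, syncWrite, and asyncWrite are illustrative, not the real L1 API): the hook flushes the buffer and flips a flag, and new writes go through a synchronous, server-acked path instead of hanging.

    import java.util.concurrent.atomic.AtomicBoolean;

    public class SyncWriteOnShutdownSketch {

        // Hypothetical client facade.
        static class Client {
            private final AtomicBoolean shutdownMode = new AtomicBoolean(false);

            void enterShutdownMode() {
                flushTransactions();     // push the current txn buffer to the L2
                shutdownMode.set(true);  // but do NOT stop the client
            }

            void write(String key, String value) {
                if (shutdownMode.get()) {
                    syncWrite(key, value);   // new txns wait for the server ack before returning
                } else {
                    asyncWrite(key, value);  // normal buffered path
                }
            }

            private void flushTransactions()            { /* ... */ }
            private void syncWrite(String k, String v)  { /* ... */ }
            private void asyncWrite(String k, String v) { /* ... */ }
        }

        public static void main(String[] args) {
            Client client = new Client();
            // The L1 hook only flips the mode; other shutdown hooks can keep writing without hanging.
            Runtime.getRuntime().addShutdownHook(new Thread(client::enterShutdownMode));
            client.write("k", "v");
        }
    }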

Comments

Fiona OShea 2010-06-22

When do you want to do this? Soon, like next release? Or later?

Tim Eck 2010-06-22

Seems like a reasonable thing to try for in the next release. I’ll put it in there for now.

Tim Eck 2012-10-23

If you guys think this is still a reasonable thing to do, then stick a target on it somewhere or close it.

Saravanan Subbiah 2012-10-23

I do believe we shouldn’t prevent VMs from exiting because someone is making mutations concurrently, but I don’t think sync write is the answer. Once we identify that the VM is shutting down, we should go into a mode where further operations are not possible, and any pending transactions should be sent to the server and waited on for an ack.
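
A rough sketch of that alternative, with hypothetical names (TxnManager, beginShutdown, flushPending, and waitForServerAck are illustrative, not the actual remote transaction manager API): once shutdown starts, new operations fail fast while pending transactions are flushed and acked.

    import java.util.concurrent.atomic.AtomicBoolean;

    public class RejectOnShutdownSketch {

        static class TxnManager {
            private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

            void beginShutdown() {
                shuttingDown.set(true);  // 1. refuse any new operations from here on
                flushPending();          // 2. send pending transactions to the server
                waitForServerAck();      // 3. block until the server acknowledges them
            }

            void commit(Object txn) {
                if (shuttingDown.get()) {
                    // Fail fast instead of blocking the caller (e.g. a user shutdown hook).
                    throw new IllegalStateException("Client is shutting down; operation rejected");
                }
                // ... normal commit path ...
            }

            private void flushPending()     { /* ... */ }
            private void waitForServerAck() { /* ... */ }
        }

        public static void main(String[] args) {
            TxnManager mgr = new TxnManager();
            Runtime.getRuntime().addShutdownHook(new Thread(mgr::beginShutdown));
            mgr.commit(new Object());  // accepted before shutdown; rejected afterwards
        }
    }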

I think with the new toolkit it might be easier to prevent changes after or during shutdown, so we can take a look and see what it takes to do that.

Abhishek Maheshwari 2013-04-18

I checked the code. As of today, when the client VM is going down, we do the following through the shutdown hook (ClientShutdownManager.execute), with a rough code paraphrase after the list:

  1. shut down the rejoin manager
  2. execute all runnables registered as before-shutdown hooks
  3. shut down all L1 managers (lockMgr, clientMgr, remoteTxnMgr, etc.)
  4. execute RemoteTxnMgr.stop(), which flushes to the L2
  5. close the channel
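
Here is a code paraphrase of those five steps, assuming hypothetical interface names; apart from ClientShutdownManager.execute and RemoteTxnMgr.stop(), the identifiers below are guesses, not the real classes.

    import java.util.List;

    public class ClientShutdownSequenceSketch {

        interface RejoinManager { void shutdown(); }
        interface L1Manager     { void shutdown(); }   // lockMgr, clientMgr, remoteTxnMgr, ...
        interface RemoteTxnMgr  { void stop(); }       // flushes outstanding txns to the L2
        interface Channel       { void close(); }

        static void execute(RejoinManager rejoin, List<Runnable> beforeShutdownHooks,
                            List<L1Manager> l1Managers, RemoteTxnMgr remoteTxnMgr, Channel channel) {
            rejoin.shutdown();                          // 1. shut down the rejoin manager
            beforeShutdownHooks.forEach(Runnable::run); // 2. run the registered before-shutdown runnables
            l1Managers.forEach(L1Manager::shutdown);    // 3. shut down the L1 managers
            remoteTxnMgr.stop();                        // 4. flush outstanding txns to the L2
            channel.close();                            // 5. close the channel
        }

        public static void main(String[] args) {
            // Trivial stand-ins just to show the order of calls.
            RejoinManager rejoin = () -> System.out.println("rejoin manager down");
            L1Manager lockMgr    = () -> System.out.println("L1 manager down");
            RemoteTxnMgr txnMgr  = () -> System.out.println("remote txn mgr stopped, txns flushed");
            Channel channel      = () -> System.out.println("channel closed");
            execute(rejoin,
                    List.of(() -> System.out.println("before-shutdown runnable")),
                    List.of(lockMgr),
                    txnMgr,
                    channel);
        }
    }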

When we are coming from the shutdown hook, we do not call client.shutdown(). All of our managers will throw TCNotRunningExp once shutdown() is executed. I don’t think our threads are waiting in any manager as per the code today.
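
For illustration, a tiny sketch of that fail-fast guard: the real managers throw Terracotta’s TCNotRunningException, but this standalone snippet uses IllegalStateException as a stand-in, and LockManagerSketch is a hypothetical name.

    public class LockManagerSketch {
        private volatile boolean shutdown = false;

        void shutdown() {
            shutdown = true;
        }

        void lock(Object lockId) {
            if (shutdown) {
                // Fail fast so callers (including user shutdown hooks) never block here.
                throw new IllegalStateException("client is not running");
            }
            // ... normal lock acquisition ...
        }

        public static void main(String[] args) {
            LockManagerSketch mgr = new LockManagerSketch();
            mgr.lock("id-1");   // fine before shutdown
            mgr.shutdown();
            try {
                mgr.lock("id-2");
            } catch (IllegalStateException expected) {
                System.out.println("rejected after shutdown: " + expected.getMessage());
            }
        }
    }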

@TimE - do we have any reproducible case for this? How do I reproduce it?

Tim Eck 2013-06-25

Sorry, coming back to this one pretty late. I don’t have a test case on hand, but the general test case is an app that has a VM-level shutdown hook, and that hook tries to interact with a clustered data structure (like a toolkit cache, for example).

Provided the app shutdown hook gets exceptions in this case, I think this is maybe okay. Previously we would have user shutdown hooks blocked in Terracotta calls, which would indefinitely block the user shutdown hook (which has the further effect of preventing the VM from exiting).

A new test case couldn’t hurt if nothing along those lines exists.
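
Since nothing along those lines seems to exist, here is a hedged sketch of the shape such a test could take; the clustered calls are left as comments because the real toolkit cache API isn’t shown in this ticket, and ShutdownHookInteractionTest / ShutdownHookApp are hypothetical names. The parent launches a child VM whose own shutdown hook touches a clustered structure, and the test asserts that the child still exits.

    import java.util.concurrent.TimeUnit;

    public class ShutdownHookInteractionTest {

        public static void main(String[] args) throws Exception {
            // Spawn a child JVM that registers an app-level shutdown hook, then let it exit.
            Process child = new ProcessBuilder(
                    System.getProperty("java.home") + "/bin/java",
                    "-cp", System.getProperty("java.class.path"),
                    ShutdownHookApp.class.getName())
                    .inheritIO()
                    .start();

            boolean exited = child.waitFor(60, TimeUnit.SECONDS);
            if (!exited) {
                child.destroyForcibly();
                throw new AssertionError("Child VM did not exit: shutdown hook likely hung on a clustered call");
            }
        }

        // The child app: its own shutdown hook interacts with a clustered data structure.
        public static class ShutdownHookApp {
            public static void main(String[] args) {
                // Object cache = /* start the clustered client and get a toolkit cache */ null;
                Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                    // cache.put("shutdown", "seen");  // should throw or return, never block forever
                }));
                // Normal exit triggers the hooks; the parent asserts the VM actually terminates.
            }
        }
    }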

Abhishek Maheshwari 2013-09-05

This is all fixed, but we need to add a test case for it. Let’s discuss it and add it.

Fiona OShea 2013-09-05

Where is the fix? Trunk? 4.1? Thanks

Abhishek Maheshwari 2013-09-05

While working on toolkit 2.0, we fixed it in the L1 managers code. It will be available in 4.0.x, 4.1.x, and trunk.