DiskStore spool thread may die

Type: Bug
Status: Closed
Priority:
Resolution: Fixed
Component/s:
Labels:

Assignee: drb
Reporter: sourceforgetracker
Created: September 21, 2009
Votes: 0
Watchers: 0
Updated: September 22, 2009
Resolved: September 22, 2009

Description

We’ve run into two cases where the DiskStore spool thread died but cache elements continued to be spooled. Generally no error is logged in this condition, and the disk store effectively becomes another memory store because the spooled elements are never flushed to disk. Here are the two cases that caused the spool thread to die:

The first case was caused by an error in the serialization of an object. The object had an error in its implementation of writeReplace and threw a NullPointerException, which was uncaught and caused the spool thread to die. This case can be handled relatively easily by catching any exception during serialization, logging a message, and discarding the object if it fails to serialize.
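A minimal sketch of that handling, for illustration only. This is not the actual DiskStore code: SpoolSerializer, serializeOrNull and BrokenValue are hypothetical names, and java.util.logging is used just to keep the example self-contained. A failed serialization is logged and reported as null, so the calling thread can skip the element and keep running.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; not part of Ehcache. */
final class SpoolSerializer {
    private static final Logger LOG = Logger.getLogger(SpoolSerializer.class.getName());

    private SpoolSerializer() { }

    /**
     * Serializes a value destined for the disk store. Returns null instead of
     * throwing when serialization fails, so the caller can discard the element.
     */
    static byte[] serializeOrNull(Object key, Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            return buffer.toByteArray();
        } catch (IOException | RuntimeException e) {
            // IOException covers NotSerializableException; RuntimeException covers
            // unchecked failures such as a NullPointerException from writeReplace().
            LOG.log(Level.WARNING, "Discarding element " + key + ": serialization failed", e);
            return null;
        }
    }
}

/** Reproduces the reported failure: writeReplace() throws a NullPointerException. */
final class BrokenValue implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name = null;

    private Object writeReplace() {
        return name.trim(); // name is null, so this throws NullPointerException
    }
}
```

With this in place, a flush loop would skip any element for which serializeOrNull returns null. Errors such as OutOfMemoryError are deliberately not caught here; that is the second case below.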

The other case was more difficult to deal with: the spool thread encountered an OutOfMemoryError while flushing the cache. The error caused the spool thread to die, but elements continued to be spooled. In this case there isn’t much the spool thread can do about the error, so it’s probably best for it to just die. The problem is that this condition isn’t detected, so elements continue to spool.

In order to handle the second condition, the DiskStore needs to detect that the spool thread has died and stop spooling elements. A simple check to see if the spool thread is alive should suffice. However, the v1.2 enhancements to the DiskStore’s thread synchronization make this a bit more difficult, because there is no lock preventing elements from being spooled while the spool is being flushed by the spool thread. While the spool thread is flushing, new elements that come along still get added to the spool, and if the spool thread dies before starting a new flush operation, those elements will be left in the spool. In this case, the spool will contain the elements that were spooled between the time the spool thread started its last flush and the time it died. The only thing I could come up with for this condition, other than forcing put to wait for the flush operation to complete (which is undesirable for performance reasons), was to handle it the next time an element is put in the store. At that time the store will detect that the spool thread has terminated, clear the spool, and not add any more objects.
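A rough sketch of the check on put, under stated assumptions. SketchDiskStore is a hypothetical, heavily simplified stand-in and not the real DiskStore; the spool is modelled as a ConcurrentHashMap and the flush loop is passed in as a Runnable. It only shows the liveness check, the clearing of leftover elements, and the refusal to spool more.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Logger;

/** Hypothetical, simplified stand-in for a disk store; not Ehcache's DiskStore. */
final class SketchDiskStore {
    private static final Logger LOG = Logger.getLogger(SketchDiskStore.class.getName());

    private final Map<Object, Object> spool = new ConcurrentHashMap<>();
    private final AtomicBoolean deathLogged = new AtomicBoolean(false);
    private final Thread spoolThread;

    SketchDiskStore(Runnable flushLoop) {
        spoolThread = new Thread(flushLoop, "sketch-spool-thread");
        spoolThread.setDaemon(true);
        spoolThread.start();
    }

    /** Returns true if the element was spooled, false if it was rejected. */
    boolean put(Object key, Object value) {
        if (!spoolThread.isAlive()) {
            // Nothing will ever flush these entries, so drop whatever was left
            // behind when the thread died and refuse to spool anything new.
            spool.clear();
            if (deathLogged.compareAndSet(false, true)) {
                LOG.warning("Spool thread has died; no further elements will be written to disk");
            }
            return false;
        }
        // No lock is held here: the thread may still die right after the check,
        // in which case this element is cleared on the next call to put.
        spool.put(key, value);
        return true;
    }

    int spoolSize() {
        return spool.size();
    }

    /** Waits for the spool thread to terminate (used to make the demo deterministic). */
    void awaitSpoolThreadExit() {
        try {
            spoolThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because put takes no lock, an element added just before the thread dies stays in the spool until the following put clears it, which matches the trade-off described above: put never waits on a flush, at the cost of a short window where stale elements remain in memory.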

I also wasn’t sure what the best behavior is if the spool thread does die. Should the store just log messages and stop writing elements to disk by not spooling any more elements, or should it dispose and throw exceptions? The problem here is that the state of the store file isn’t really known; the dead thread may have corrupted the file before it died. Ultimately I opted for logging messages and not spooling more elements. The implementation of flushing the spool to the file seems pretty robust, and the chances of having a reference to a corrupt entry seem slim to none. If the cache is not disposed it can still operate as a store, but more elements cannot be added.

Sourceforge Ticket ID: 1432458 - Opened By: nobody - 15 Feb 2006 20:55 UTC

Comments

Fiona OShea 2009-09-22

Re-opening so that I can properly close out these issues and have correct Resolution status in Jira