• Bug
  • Status: Closed
  • 2 Major
  • Resolution: Fixed
  • ehcache-core
  • hsingh
  • Reporter: mads1980
  • May 25, 2010
  • 0
  • Watchers: 2
  • January 17, 2013
  • May 25, 2010

Attachments

Description

Suppose we have the following scenario:

1) Thread A invokes get() on the DiskStore, and obtains a DiskElement. 2) Thread B invoked remove() on the DiskStore, and obtains the same DiskElement. 3) Thread B invoked freeBlock() for the DiskElement, reusing the DiskElement instance for adding to the freeSpaces list 4) Thread A attempts to use the DiskElement instance (which has been since mutated) to retrieve an Element from disk by invoking loadElementFromDiskElement(). The mutated DiskElement has been set diskElement.payloadSize = 0; so now loadElementFromDiskElement() reads a 0-byte array from disk. When attempting to read an object from the resulting ObjectInputStream (which itself wraps a 0-byte-based ByteArrayInputStream), the operation fails with an EOFException.

This did not use to happen in previous versions since get() and remove() as well as many other operations were synchronized on the DiskStore instance. However, the recent concurrency performance improvements have left the door open to this problem.

The proposed solution is simply NOT reusing the DiskElement instance when performing the freeBlock() operation, and instead allocating a new DiskElement before adding it to the freeSpaces list. See below:

private void freeBlock(final DiskElement diskElement) {
    totalSize -= diskElement.payloadSize;
    // Instantiate new DiskElement representing free block
    DiskElement freeBlock = new DiskElement();
    freeBlock.position = diskElement.position;
    freeBlock.blockSize = diskElement.blockSize;
    freeBlock.payloadSize = 0;
    freeBlock.key = null;
    freeBlock.hitcount = 0;
    freeBlock.expiryTime = 0;
    freeSpace.add(freeBlock);
}

This still leaves open the possibility of get() attempting to load the element from disk, by invoking loadElementFromDiskElement(), after the block has been marked as free by another concurrent thread that has just removed / expired / evicted the entry. Theoretically the spoolAndExpiryThread could kick in and reuse this free space between the time get() obtains the DiskElement instance and loadElementFromDiskElement() is invoking - this could cause an incorrect Element to be returned simply by occupying the same block, even if both Elements are completely unrelated.

The proposed solution involves declaring an extra field for DiskElement:

    /**
     * Indicates whether the DiskElement block has been marked as free.
     */
    private volatile transient boolean freed = false;

freeBlock() would be modified as follows:

private void freeBlock(final DiskElement diskElement) {

    synchronized (diskElement) {
    	diskElement.freed = true;
    }

    totalSize -= diskElement.payloadSize;

	// Instantiate new DiskElement representing free block
    DiskElement freeBlock = new DiskElement();
    freeBlock.position = diskElement.position;
    freeBlock.blockSize = diskElement.blockSize;
    freeBlock.payloadSize = 0;
    freeBlock.key = null;
    freeBlock.hitcount = 0;
    freeBlock.expiryTime = 0;
    freeSpace.add(freeBlock);
}

And finally, the loadElementFromDiskElement(diskElement) invocation at get() would be replaced by the following snippet:

        synchronized (diskElement) {
            if (!diskElement.freed) {
                return loadElementFromDiskElement(diskElement);
            } else {
                return null;
            }
        }

This would ensure that only non-freed blocks would be read from disk. The extra field just for this purposes may seem like too much just to solve this problem, better alternatives / suggestions are welcome. I considered directly checking freeSpaces, but since it would require (perhaps lengthy) list iteration, I decided to go with the proposed option.

Comments

Manuel Dominguez Sarmiento 2010-05-25

DiskStore patch including mods for EHC-721, EHC-722, EHC-723, EHC-725, and EHC-726

Manuel Dominguez Sarmiento 2010-05-25

EHC-725 is related to this issue.

Himadri Singh 2010-06-27

Tweaked patch has been checked in.