EHC ❯ DiskStore corruption can cause Cache Corruption

Type: Bug
Status: Closed
Priority: 1 Critical
Resolution: Fixed
Component: ehcache-core
Reporter: hsingh
Watchers: 2
Created: October 20, 2009
Resolved: November 18, 2009
Updated: January 17, 2013
Description
Hello, how is it possible that a key returned from Cache.getKeys() differs from Element.getKey()? Have a look at this:
for (Object k : cache.getKeys()) {
    Element e = cache.get(k);
    if (!k.equals(e.getKey())) {
        System.out.println(k + ":" + cache.get(k));
    }
}
That will output something like this:
1115:[ key = 1063, value=Volkswagen Scirocco [08-] #ML1063, version=1, hitCount=1, CreationTime = 1250175156170, LastAccessTime = 1251192946288 ]
543:[ key = 1047, value=Dacia Sandero [08-] #ML1047, version=1, hitCount=1, CreationTime = 1250170668653, LastAccessTime = 1251192946288 ]
969:[ key = 709, value=Volkswagen Polo GP [05-09] #ML709, version=1, hitCount=1, CreationTime = 1250175156170, LastAccessTime = 1251192946335 ]
Any help will be appreciated.
I know where the problem is. My cache is initialized with overflowToDisk and diskPersistent turned on. The problem arises when the JVM is shut down suddenly (e.g. kill -9) while Ehcache is either reading from or writing to the disk store. Please run the following code three times: the first time, wait until it finishes; the second time, kill the JVM process after 1-3 seconds; on the third run you will see output on the console that illustrates the error.
package test;

import java.io.Serializable;
import java.util.Random;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.store.MemoryStoreEvictionPolicy;

public class CacheTest implements Serializable {

    private static final long serialVersionUID = -5105608893105533825L;

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        CacheManager cc = CacheManager.getInstance();
        // In-memory LRU cache of 50 elements with overflowToDisk=true and diskPersistent=true.
        cc.addCache(new Cache("mltest", 50, MemoryStoreEvictionPolicy.LRU, true, null,
                true, 0, 0, true, 120, null, null, 0, 2, false));
        Cache ca = cc.getCache("mltest");
        CacheTest tt = new CacheTest();
        Random random = new Random();

        for (int iterator1 = 0; iterator1 < 100; iterator1++) {
            // Put entries with random keys and a short time-to-live.
            for (int i = 0; i < 100; i++) {
                int id = random.nextInt();
                Element e = ca.get(id);
                if (e == null || e.isExpired()) {
                    Element p = new Element(id, tt);
                    p.setTimeToLive(10);
                    ca.put(p);
                }
            }
            // Report any element whose key does not match the key it was retrieved by.
            for (Object k : ca.getKeys()) {
                Element e = ca.get(k);
                if (e != null && !k.equals(e.getKey())) {
                    System.out.println(k + ":" + e);
                }
            }
        }

        cc.shutdown();
    }
}
For the full forum thread see http://sourceforge.net/projects/ehcache/forums/forum/322278/topic/3377937
Comments
Steve Harris 2009-11-16
For now I just want an estimate of how much work this is and how much risk it adds. If it's going to delay the release then let's just fix it in 1.7.2 for the Canberra release on Dec 21st. If we push out 1.7.1 we are holding up a lot of valuable fixes and moving it too close to 1.7.2.
gluck 2009-11-17
Ok, reproduced with your test.
This bug was introduced to the code base in 1.6.0-beta4.
The corruption is caused by a .data file that has changed since startup while the .index file never gets rewritten, because there is no orderly shutdown.
It turns out this is very easy to detect: the .index file should never be older than the .data file on startup. The following is the state of the files after the corruption event caused by a kill -9.
-rw-r--r--  1 gluck  gluck  2422406 18 Nov 13:41:23 2009 mltest.data
-rw-r--r--  1 gluck  gluck   359519 18 Nov 13:41:06 2009 mltest.index
The very simple and safe fix is to add an extra validation check on startup which checks the dates of the files.
gluck 2009-11-17
Added the following logic to the existing index file validator:
if (dataFile.exists() && indexFile.exists()) {
    if (dataFile.lastModified() > indexFile.lastModified()) {
        LOG.warn("The index for data file {} is out of date, probably due to an unclean shutdown. Deleting index file {}",
                dataFile.getName(), getIndexFileName());
        indexFile.delete();
    }
}
Nov 18, 2009 1:59:04 PM net.sf.ehcache.store.DiskStore deleteIndexIfNoData
WARNING: The index for data file mltest.data is out of date, probably due to an unclean shutdown. Deleting index file mltest.index
A quick way to test this is to touch the .data file after a normal shutdown and start the CacheManager.
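For anyone who wants to script that manual check, here is a minimal sketch of the touch-and-restart test. It assumes the failsafe disk store location (java.io.tmpdir) and reuses the cache configuration from the reproduction test above; the class name and file path are illustrative, not part of the fix.

import java.io.File;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.store.MemoryStoreEvictionPolicy;

public class StaleIndexSmokeTest {

    // Same persistent, overflow-to-disk configuration as the reproduction test above.
    private static Cache newCache() {
        return new Cache("mltest", 50, MemoryStoreEvictionPolicy.LRU, true, null,
                true, 0, 0, true, 120, null, null, 0, 2, false);
    }

    public static void main(String[] args) throws Exception {
        // 1. Populate the cache and shut down cleanly so that both mltest.data and
        //    mltest.index are written out.
        CacheManager cm = CacheManager.create();
        cm.addCache(newCache());
        cm.getCache("mltest").put(new Element(1, "one"));
        cm.shutdown();

        // 2. "Touch" the data file so it becomes newer than the index file, imitating a
        //    data file that changed after the index was last written. The path assumes
        //    the failsafe <diskStore path="java.io.tmpdir"/> setting; adjust it if your
        //    ehcache.xml puts the disk store somewhere else.
        File dataFile = new File(System.getProperty("java.io.tmpdir"), "mltest.data");
        dataFile.setLastModified(System.currentTimeMillis() + 60000);

        // 3. Restart with the same cache configuration. A patched DiskStore should log
        //    the "index ... is out of date" warning and delete mltest.index instead of
        //    loading stale index entries.
        CacheManager restarted = CacheManager.create();
        restarted.addCache(newCache());
        restarted.shutdown();
    }
}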
Himadri Singh 2009-11-19
Got it reproduced with ehcache-core 1.7.0. Working fine with 1.7.1-SNAPSHOT. Verified with rev 1441.