HashMap corruption within aspectwerkz through ClassLoader.defineClass lead to deadlock

Type: Bug
Status: Closed
Priority: 2 Major
Resolution: Fixed
Component/s: X-AspectWerkz
Labels:

Assignee: teck
Reporter: fern
Created: September 14, 2010
Votes: 0
Watchers: 2
Updated:
March 24, 2011
Resolved:
October 28, 2010

Description

We are seeing our production servers hitting a deadlock randomly. And it keeps coming back to this stack trace. The addClassInfo is calling a HashMap.put, which never returns. I can only think that there was some sort of corruption within that map causing the deadlock, (cyclic linked list for a bucket). We are using a webframework (Tapestry 5.1.0.5), which does a lot of class enhancement, thus it probably calls ClassLoader.defineClass a lot exposing this bug.

“http-8086-Processor51” daemon prio=10 tid=0x0000002b60dbb000 nid=0x1ecb runnable [0x000000004b5d8000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.__tc_put(Unknown Source) at java.util.HashMap.put(Unknown Source) at com.tc.aspectwerkz.reflect.impl.asm.AsmClassInfoRepository.addClassInfo(AsmClassInfoRepository.java:142) at com.tc.aspectwerkz.reflect.impl.asm.AsmClassInfo.(AsmClassInfo.java:183) at com.tc.aspectwerkz.reflect.impl.asm.AsmClassInfo.getClassInfo(AsmClassInfo.java:292) at com.tc.object.bytecode.hook.impl.DefaultWeavingStrategy.transformInternal(DefaultWeavingStrategy.java:147) at com.tc.object.bytecode.hook.impl.DefaultWeavingStrategy.transform(DefaultWeavingStrategy.java:124) at com.tc.object.bytecode.hook.impl.DSOContextImpl.preProcess(DSOContextImpl.java:168) at com.tc.object.bytecode.hook.impl.ClassProcessorHelper.defineClass0Pre(ClassProcessorHelper.java:665) at java.lang.ClassLoader.defineClass(ClassLoader.java:616) at java.lang.ClassLoader.defineClass(ClassLoader.java:462) at javassist.Loader.findClass(Loader.java:379) at org.apache.tapestry5.internal.services.ComponentInstantiatorSourceImpl$PackageAwareLoader.findClass(ComponentInstantiatorSourceImpl.java:94) at javassist.Loader.loadClass(Loader.java:311) - locked <0x0000002a9f846f68> (a java.lang.String)

Comments

Fernando Padilla 2010-09-14

could it be as simple as making sure AsmClassInfoRepository uses a ConcurrentHashMap?

Fernando Padilla 2010-09-14

I’m confused, while you’re at it, can you discuss why you’re still using Aspectwerkz, since it looks like it hasn’t been in development since 2005?

http://aspectwerkz.codehaus.org/index-merge.html

Fernando Padilla 2010-09-14

http://www.java2s.com/Open-Source/Java-Document/Net/Terracotta/com/tc/aspectwerkz/reflect/impl/asm/AsmClassInfoRepository.java.htm

It looks like you guys have your own internal copy of AspectWerkz? If so, I guess we just have to change the m_repository and m_repositories to be ConcurrentHashMap to fix this issue. :)

(that is while you’re porting over to aspectj)

Fernando Padilla 2010-09-14

http://svn.terracotta.org/fisheye/browse/Terracotta/dso/trunk/code/base/aspectwerkz/src/com/tc/aspectwerkz/reflect/impl/asm/AsmClassInfoRepository.java?r=HEAD

Sorry for the previous link, this is the latest code in your repository. :)

And yup, either change the HashMaps to ConcurrentHashMap, or wrap all access to m_repository with synchronized() blocks ( like you did to m_repositories ), easy fix. :)

Tim Eck 2010-09-14

I agree that code looks pretty suspect with unsynchronized access to HashMap instances. At least for that class I can’t see how changing HashMap->ConcurrentHashMap would cause any new problems (while preventing the possibility of infinite loops!)

If you can get this problem to happen more or less reliably I’d be happy to give you a new build that uses concurrent hash maps to see if it goes away.

Tim Eck 2010-09-14

short of bugs like this we haven’t had any need to move off that old aspectwerkz stuff. It is quite old (and certainly not bug free) I’ll give you that.

Fernando Padilla 2010-09-14

We are basically taking the dso-boot and replacing the bad class with our own to see what happens. Yes, it’s happening way too ‘reliably’ under load. :)

Tim Eck 2010-10-05

Hi Fernando – did your testing with the modified class prove successful?

Fernando Padilla 2010-10-05

very much so, we haven’t run into the issue again. have you fixed it on your trunk yet?

Tim Eck 2010-10-05

I had to go check since I didn’t remember, but it looks like I made this change:

http://svn.terracotta.org/fisheye/changelog/Terracotta/?cs=16273