• Bug
  • Status: Resolved
  • 2 Major
  • Resolution: Won't Fix
  • cdennis
  • Reporter: cdennis
  • November 30, 2009
  • 0
  • Watchers: 1
  • June 06, 2013
  • June 06, 2013

Description

The basic problem here is as follows:

ClassLoader X is loading a class and TC instrumentation code is inspecting it. The TC inspection triggers the class load of a TC class.

Locking Sequence: ,

Concurrently a TC class is being loaded and the installed Java Agent attempts to transform it, triggering the load of one of the agents dependent classes.

Locking Sequence: ,

In the event that ClassLoaders X and Y are the same loader, a deadlock can occur. This is equivalent to the deadlocks seen when building classloaders that do not follow the normal classloader delegation models. The most likely identity for ClassLoader X/Y is the system class loader since that is the loader for the agent.

This deadlock has been seen three times in the monkeys in the InvalidClassBytesTest (which uses an agent). The test has failed twice recently, and I believe that this is because I moved the ASM and AspectWerkz classes out of the bootjar and into the tc.jar. This means instrumentation code is triggering many more loads from the TC ClassLoader, i.e. the first locking sequence has become more common.

We believe that this problem can be heavily supressed by proactively loading the common classes (especially those needed to confirm a class’s non-instrumented nature). We can do this by simply throwing a simple set of class bytes through the instrumentation code at runtime.

Comments

Alex Miller 2009-12-07

For the peanut gallery, this possibility probably existed before but we loaded most of our classes via the boot jar. Now that we’ve moved a lot of classes out of the boot jar and into the tc.jar, we’ve widened the possibility of deadlocks here (although it still isn’t too wide). We *might* see this once people move to 3.2 if they are using agents (but even in this test it only is failing a few times per month).

If the classloader deadlock does happen, it should present an easily detectable pattern in the thread dump.

Some possibilities to fix:

  • move stuff back into boot jar (not desired)
  • pre-load more classes by forcing an early (possibly fake) instrumentation
  • something else we haven’t thought of yet

Given all that, think we can push this out for the time being.

Fiona OShea 2009-12-16

due to Darwin is this a non-issue? if so please close as will not fix

Chris Dennis 2009-12-17

No this isn’t an issue with our java agent, but rather an issue with our compatibility with java agents in general. Using any kind of java agent with TC could lead to a classloading deadlock. Like I say we think this is unlikely, but may be more likely post Santiago since we have changed the bootjar contents quite significantly.

Steve Harris 2010-06-17

Is this a dso only issue or does it hit express too?

Chris Dennis 2010-06-17

I believe this is highly unlikely to affect express users. It would require a third-party java agent to want to instrument a type being loaded (and adapted) by the clustered state loader, and then in turn the agent would have to trigger a load in the clustered state loader itself. I.e. someone wrote a java agent that used express functionality. If someone does this then I think they would have to be on their own…