• Bug
  • Status: Closed
  • 1 Critical
  • Resolution: Fixed
  • DSO:L2
  • hhuynh
  • Reporter: fuzy
  • May 27, 2008
  • 0
  • Watchers: 0
  • July 01, 2008
  • June 13, 2008

Attachments

Description

The Terracotta Server on my production machine crashed 3 times within the last 8 hours, always caused by a NPE in MarkAndSweepGarbageCollector. See log of latest crash:

2008-05-27 09:20:05,685 [main] INFO com.terracottatech.console - Terracotta Server has started up as ACTIVE node on 0:0:0:0:0:0:0:0:9510 successfully, and is now ready for work. 2008-05-27 09:20:16,663 [TCWorkerComm # 2] INFO com.tc.net.protocol.transport.ServerMessageTransport - ConnectionID(41.d9c3bbd3a59b4e10a0832b98cffed290): Attaching new connection: [email protected]: connected: true, closed: false local=127.0.0.1:9510 remote=127.0.0.1:54778 connect=[Tue May 27 09:20:16 CEST 2008] idle=73ms 2008-05-27 09:20:16,683 [WorkerThread(client_handshake_stage,0)] INFO com.tc.objectserver.handshakemanager.ServerClientHandshakeManager - Client connected ClientID[41] 2008-05-27 09:20:16,773 [WorkerThread(jmxremote_tunnel_stage,0)] INFO com.tc.management.remote.protocol.terracotta.ClientTunnelingEventHandler - L1[ChannelID=[41]] notified us that their JMX server is now available 2008-05-27 09:23:11,602 [TC Memory Monitor] INFO com.tc.runtime.TCMemoryManagerImpl - Sleep time changed to : 1500 2008-05-27 09:23:19,143 [TC Memory Monitor] INFO com.tc.runtime.TCMemoryManagerImpl - Sleep time changed to : 750 2008-05-27 10:20:08,734 [TC Memory Monitor] INFO com.tc.runtime.TCMemoryManagerImpl - Sleep time changed to : 375 2008-05-27 10:20:10,123 [TC Memory Monitor] INFO com.tc.runtime.TCMemoryManagerImpl - Sleep time changed to : 150 2008-05-27 10:20:10,424 [TC Memory Monitor] INFO com.tc.runtime.TCMemoryManagerImpl - Sleep time changed to : 60 2008-05-27 10:20:11,007 [GC] ERROR com.tc.server.TCServerMain - Thread:Thread[GC,5,TC Thread Group] got an uncaught exception. About to sleep then exit. java.lang.NullPointerException at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.collectRoot(MarkAndSweepGarbageCollector.java:228) at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.collect(MarkAndSweepGarbageCollector.java:206) at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.rescue(MarkAndSweepGarbageCollector.java:109) at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.gc(MarkAndSweepGarbageCollector.java:146) at com.tc.objectserver.impl.ObjectManagerImpl$1.run(ObjectManagerImpl.java:705) 2008-05-27 10:20:14,008 [CommonShutDownHook - [email protected]] INFO com.tc.net.protocol.TCNetworkMessage - No of times Buffers wasted = 0 Buffers wasted count = 0

I am using Terracotta 2.5.4 on Ubuntu Hardy Server amd64 with java -version java version “1.6.0_06” Java(TM) SE Runtime Environment (build 1.6.0_06-b02) Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)

See also http://forums.terracotta.org/forums/posts/list/1086.page

Comments

Steve Harris 2008-05-27

Would it be possible to attach all the complete logs? Also, when you upgraded, did you try to run newer versions of TC with the old jdb files?

Stefan F 2008-05-27

complete log file from startup to crash (another crash than the the previous)

Saravanan Subbiah 2008-05-27

I am attaching a modified tc.jar that will produce more info on the NPE. Please run with this and send us the log files.

Saravanan Subbiah 2008-06-11

Some interesting prints from log when it first started happening …

2008-05-27 00:56:12,410 [WorkerThread(apply_changes_stage,0)] WARN com.tc.objectserver.managedobject.PhysicalManagedObjectState - Recoverable Incompatible Class Change Identified : Not found ! field = at.molindo.songtexte.SongtexteSession._mailConfirmationRequired value = false 2008-05-27 00:58:22,317 [WorkerThread(apply_changes_stage,0)] WARN com.tc.objectserver.managedobject.PhysicalManagedObjectState - Recoverable Incompatible Class Change Identified : Not found ! field = at.molindo.panel.login.LoginPanel._userService value = ObjectID=[62591104] 2008-05-27 00:58:22,336 [WorkerThread(apply_changes_stage,0)] WARN com.tc.objectserver.managedobject.PhysicalManagedObjectState - Recoverable Incompatible Class Change Identified : Not found ! field = at.molindo.panel.login.LoginPanel._userService value = ObjectID=[62591104] 2008-05-27 01:13:16,531 [GC] ERROR com.tc.server.TCServerMain - Thread:Thread[GC,5,TC Thread Group] got an uncaught exception. About to sleep then exit. java.lang.NullPointerException at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.collectRoot(MarkAndSweepGarbageCollector.java:228) at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.collect(MarkAndSweepGarbageCollector.java:206) at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.rescue(MarkAndSweepGarbageCollector.java:109) at com.tc.objectserver.core.impl.MarkAndSweepGarbageCollector.gc(MarkAndSweepGarbageCollector.java:146) at com.tc.objectserver.impl.ObjectManagerImpl$1.run(ObjectManagerImpl.java:705)

Fiona OShea 2008-06-11

Upping priority for your jira list

Tim Eck 2008-06-11

This bug can be caused in more than one way, but the basic trigger condition is adding a field to a non-static inner class. The specific scenario for this report involved an abstract base class that had a non-static inner subclass. A field was added to the abstract base class. Something like this:

public class Foo {

class NonStaticInner extends AbstractBase { // } }

class AbstractBase { // }

An instance of Foo$NonStaticInner is shared, and then later (in another VM instance), the definition of AbstractBase is updated with the addition of a new field. When an instance of the non-static inner with the updated base class is shared, it ends up generating an incorrect physical state object in the server. Attached are the decompiled problematic state classes (with some names changes to protect the innocent)

Tim Eck 2008-06-11

Attached State1_V133.java and State2_V790.java. The problem with the generated subclass (State2) is that is contains repeated fields and getter/setters for the parent ID. When getObjectReferences is called on the new subclass the field in the parent class will always be null

Tim Eck 2008-06-13

8819 and 8820 in 2.6 and trunk respectively should resolve this issue