Resolve Spectrum Cluster error on distributed transaction commit

Product affected: Spectrum™ Technology Platform

Issue

Below are some errors from the wrapper.log file:

ERROR [OTxTask] Error on distributed transaction commit
ERROR [ODistributedStorage] Cannot route TX operation against distributed node
com.orientechnologies.orient.core.exception.OTransactionException: Error on committing distributed transaction

This is the error I get when I save a job:

There was an error saving this dataflow to the server:

System.Web.Services.Protocols.SoapHeaderException: Unable to commit; nested exception is com.pb.spectrum.api.persistence.PersistenceException: com.orientechnologies.orient.core.exception.OStorageException: Cannot route TX operation against distributed node
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   at Group1.ESD.WebServiceProxies.Dataflow.DataflowManagerService.update(DataflowDefinition arg0)
   at Group1.ESD.SAL.Services.Internal.DataflowManager.update(DataflowDefinition arg0)
   at Group1.ESD.SAL.Services.DataflowService.Update(DataflowDefinition dataflowDefinition)
   at Group1.ESD.Common.Applications.Dataflows.Editing.Saving.DataflowSaveHelper.InternalSave(Boolean newDataflow, DataflowDefinition dataflowDefinition, SaveResults results, ChangeTypes dirtyType)

In addition, we are getting a lot of commit errors. We stopped one server and restarted the other. We are in a clustered environment.

ERROR [OTxTask] Error on distributed transaction commit
ERROR [ODistributedStorage] Cannot route TX operation against distributed node
com.orientechnologies.orient.core.exception.OTransactionException: Error on committing distributed transaction
ERROR [JobProcessImpl] Error executing job Error in transform script: Recode Seed ID
 

ERROR [OTxTask] Error on distributed transaction commit
| com.orientechnologies.orient.core.exception.OStorageException: Error during transaction commit.
|     at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:938) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.server.distributed.ODistributedStorage.commit(ODistributedStorage.java:847) ~[orientdb-server-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:488) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:147) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2437) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.commit(ODatabaseDocumentTx.java:2407) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.server.distributed.task.OTxTask.execute(OTxTask.java:116) ~[orientdb-server-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.executeOnLocalNode(OHazelcastPlugin.java:753) [orientdb-distributed-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.server.hazelcast.ODistributedWorker.onMessage(ODistributedWorker.java:298) [orientdb-distributed-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.server.hazelcast.ODistributedWorker.run(ODistributedWorker.java:121) [orientdb-distributed-2.0.14.jar:2.0.14]
| Caused by: java.lang.NullPointerException: null
|     at com.orientechnologies.orient.core.index.sbtreebonsai.local.OSBTreeBonsaiLocal.load(OSBTreeBonsaiLocal.java:455) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.db.record.ridbag.sbtree.OIndexRIDContainerSBTree.<init>(OIndexRIDContainerSBTree.java:80) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.serialization.serializer.binary.impl.legacy.OStreamSerializerSBTreeIndexRIDContainer_1_7_9.deserializeFromDirectMemoryObject(OStreamSerializerSBTreeIndexRIDContainer_1_7_9.java:255) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.serialization.serializer.binary.impl.legacy.OStreamSerializerSBTreeIndexRIDContainer_1_7_9.deserializeFromDirectMemoryObject(OStreamSerializerSBTreeIndexRIDContainer_1_7_9.java:49) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.deserializeFromDirectMemory(ODurablePage.java:129) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.index.sbtree.local.OSBTreeBucket.getEntry(OSBTreeBucket.java:215) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.index.sbtree.local.OSBTree.get(OSBTree.java:192) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.index.engine.OSBTreeIndexEngine.get(OSBTreeIndexEngine.java:224) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.index.OIndexMultiValues.putInSnapshot(OIndexMultiValues.java:183) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.index.OIndexAbstract.applyIndexTxEntry(OIndexAbstract.java:1117) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.index.OIndexAbstract.addTxOperation(OIndexAbstract.java:749) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.tx.OTransactionOptimistic$CommitIndexesCallback.run(OTransactionOptimistic.java:98) ~[orientdb-core-2.0.14.jar:2.0.14]
|     at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:924) ~[orientdb-core-2.0.14.jar:2.0.14]
|     ... 9 common frames omitted
| ERROR [OrientDBAuditManagement] Error save audit data: Error on committing distributed transaction
| ERROR [OrientDBAuditManagement] Data: AuditLog [id=5fd7bc38-7770-44b9-b3ff-660e39b19ef4, logLevel=INFO, username=system, timestamp=Wed Jul 19 11:13:09 CDT 2017, module=Platform, details, objectId=integrator, objectType=Role, event=Update, tags=[Configuration, Security], additional={}]
 

Cause

Because the Spectrum nodes were shut down and started up in the wrong order, synchronization between the nodes of the Spectrum cluster has been corrupted.

Resolution

UPDATED: September 6, 2018
To resolve, shut down all Spectrum nodes in the reverse of the order in which they were started, so that the seed node is the last to be shut down. Then start the nodes one at a time, beginning with the seed node, without submitting any requests to any of the nodes. This means no jobs or services running, no one modifying dataflows, and so on.
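The ordering rule above can be sketched as a small Python helper. This is an illustration only, not a Spectrum tool: `restart_plan` and the node names are hypothetical, and the sketch assumes you already know which node is the seed node and the order in which the other nodes were started. It only computes the order; each node is still stopped and started with your usual Spectrum server controls.

```python
from typing import List, Tuple


def restart_plan(startup_order: List[str], seed: str) -> Tuple[List[str], List[str]]:
    """Return (shutdown_order, startup_order) for a full cluster restart.

    startup_order: hypothetical node names, in the order they were started.
    seed:          name of the seed node (hypothetical).
    """
    if seed not in startup_order:
        raise ValueError(f"seed node {seed!r} is not in the node list")
    others = [n for n in startup_order if n != seed]
    # Stop the non-seed nodes in reverse start order, then the seed node last.
    shutdown = list(reversed(others)) + [seed]
    # Start the seed node first, then the others one at a time (no requests yet).
    startup = [seed] + others
    return shutdown, startup


if __name__ == "__main__":
    stop, start = restart_plan(["node-a", "node-b", "node-c"], seed="node-a")
    for i, node in enumerate(stop, 1):
        print(f"Stop step {i}: {node}")
    for i, node in enumerate(start, 1):
        print(f"Start step {i}: {node} (wait until it is fully up before the next)")
```

With three nodes started as `node-a`, `node-b`, `node-c` and `node-a` as the seed, the plan stops `node-c`, `node-b`, `node-a` and starts them again as `node-a`, `node-b`, `node-c`.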

If the above does not help, first make a backup of the entire server/app/repository/store directory.
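A minimal Python sketch of the backup step follows. The function name `backup_store`, the install root (the folder that contains `server/`), and the backup location are hypothetical placeholders, not paths given in this article. It copies the whole `server/app/repository/store` tree to a new time-stamped folder. As a precaution (an assumption, not one of this article's steps), take the copy while the node is stopped, because copying database files that are being written to may produce an inconsistent backup.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_store(spectrum_root: Path, backup_parent: Path) -> Path:
    """Copy <spectrum_root>/server/app/repository/store to a time-stamped folder.

    spectrum_root and backup_parent are hypothetical placeholders for your paths.
    """
    store = spectrum_root / "server" / "app" / "repository" / "store"
    if not store.is_dir():
        raise FileNotFoundError(f"store directory not found: {store}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_parent / f"store-backup-{stamp}"
    # copytree creates missing parent folders and refuses to overwrite an
    # existing target, so an earlier backup is never clobbered.
    shutil.copytree(store, target)
    return target
```

For example, `backup_store(Path("/path/to/Spectrum"), Path("/path/to/backups"))`, with both paths replaced by your own.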

Then stop all the nodes in the reverse of the order in which they were started (seed node last), and move the server/app/repository/store/databases/audit folder out of the store directory, keeping it as a backup. Restart the cluster in the required order, starting the seed node first. Spectrum should recreate the audit directory that was removed.
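The audit-folder step can be sketched the same way. Again, `move_audit_db_out`, the install root, and the destination folder are hypothetical placeholders, and the function should only be run after every node has been stopped. It moves the `databases/audit` folder to a time-stamped location outside the store, so the data is kept as a backup and Spectrum should recreate the directory on the next start.

```python
import shutil
from datetime import datetime
from pathlib import Path


def move_audit_db_out(spectrum_root: Path, backup_parent: Path) -> Path:
    """Move <spectrum_root>/server/app/repository/store/databases/audit aside.

    Hypothetical helper: run only while all Spectrum nodes are stopped.
    """
    audit = (spectrum_root / "server" / "app" / "repository"
             / "store" / "databases" / "audit")
    if not audit.is_dir():
        raise FileNotFoundError(f"audit directory not found: {audit}")
    backup_parent.mkdir(parents=True, exist_ok=True)
    target = backup_parent / f"audit-{datetime.now():%Y%m%d-%H%M%S}"
    # shutil.move would nest the folder inside an existing target, so refuse.
    if target.exists():
        raise FileExistsError(f"backup target already exists: {target}")
    shutil.move(str(audit), str(target))
    return target
```

The `databases` folder itself is left in place; only its `audit` subfolder is moved.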