Rollback Behavior and Member Failures

Within the scope of a transaction, GemFire XD automatically initiates a rollback if it encounters a constraint violation.

Errors that occur while parsing queries or while binding parameters in a SQL statement do not cause a rollback. For example, a syntax error that occurs while executing a SQL statement does not cause previous statements in the transaction to roll back. However, a column constraint violation causes all previous SQL operations in the transaction to roll back.
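As a rough illustration of the distinction above, the following sketch separates constraint violations from other statement errors by inspecting the SQLState. It assumes that GemFire XD reports constraint violations in the standard SQLState class 23 (integrity constraint violation); the class name RollbackOnConstraint and both method names are hypothetical and are not part of any GemFire XD API.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class RollbackOnConstraint {

    /** True if the SQLState is in standard class 23 (integrity constraint violation). */
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("23");
    }

    /**
     * Executes one statement inside an already open transaction
     * (auto-commit is assumed to be off on the connection).
     *
     * Returns true if the statement succeeded. Returns false if it failed with a
     * constraint violation, in which case the text above says the earlier work in
     * the transaction has already been rolled back and must be redone by the caller.
     * Any other error (for example a syntax error) is rethrown; per the text above,
     * earlier statements in the transaction are not rolled back by such an error.
     */
    static boolean executeInTx(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(sql);
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false; // transaction was rolled back by the system
            }
            throw e; // transaction state is left as it was before this statement
        }
    }
}
```

A caller that receives false would typically restart the unit of work from its first statement, while a caller that catches a rethrown syntax error can correct the statement and continue in the same transaction.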

Handling Member Failures

Note: This version of GemFire XD does not support transparent failover for transactions. If a member that is participating in a transaction goes down in the middle of the transaction or during a commit, then the commit throws a commit failure exception (SQLState: X0Z16) and the transaction is rolled back. Similarly, GemFire XD does not support automatically copying an existing transactional state to new members that join the system. If a member joins the distributed system in the middle of a transaction and that member is a data store for one of the tables being updated in the transaction, then the transaction fails at commit and is rolled back (SQLState: X0Z16).
The following steps describe specific events that can occur depending on which member fails and when the failure occurs during a transaction:
  1. If the transaction coordinator member fails before a commit is fired, then each of the cohort members aborts the ongoing transaction.
  2. If a participating member fails before commit is fired, then it is simply ignored. If the copies/replicas go to zero for certain keys, then any subsequent update operations on those keys throw an exception, as in the case of non-transactional updates. If a commit is fired in this state, then the whole transaction is aborted.
  3. If the transaction coordinator fails before completing the commit process (with or without sending the commit message to all cohorts), the surviving cohorts determine the outcome of the transaction.

    If all of the cohorts are in the PREPARED state and successfully apply changes to the cache without any unique constraint violations, the transaction is committed on all cohorts. Otherwise, if any member reports failure or the last copy of the associated rows goes down during the PREPARED state, the transaction is rolled back on all cohorts.

  4. If a participating member fails before acknowledging to the client, then the transaction continues on other members without any interruption. However, if that member contains the last copy of a table or bucket, then the transaction is rolled back.
  5. The transaction coordinator might also fail while executing a rollback operation. In this case, a thin client would see such a failure as a SQLState error. If the client was performing a SELECT statement in a transaction, the member failure would result in SQLState error X0Z01::
    ERROR X0Z01: Node 'node-name' went down or data no longer available while iterating the results (method 'rollback()'). Please retry the operation. 

    Clients that were performing a DML statement in the context of a transaction would fail with one of the SQLState errors: X0Z05, X0Z16, 40XD2, or 40XD0.

    Note: Outside the scope of a transaction, a DML statement would not see an exception due to a member failure. Instead, the statement would be automatically retried on another GemFire XD member. However, SELECT statements would receive the X0Z01 error even outside of a transaction.

Should this type of failure occur, the remaining members of the GemFire XD distributed system clean up the open transactions for the failed node, and no additional steps are needed to recover from the failure. A peer client connection would not see this exception because the peer client itself acts as the transaction coordinator.
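Because transactions are not failed over transparently, an application usually has to re-run the whole transaction after one of the member-failure errors listed above. The following is a minimal sketch of such a retry loop. The SQLState values come from this section; the class MemberFailureRetry, the TxBody callback, and the choice to retry on every listed state are assumptions made for illustration. Only the X0Z01 message explicitly asks for a retry, so applications may want a narrower set.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MemberFailureRetry {

    // SQLStates named in this section for member failures during a transaction.
    static final Set<String> MEMBER_FAILURE_STATES = new HashSet<>(
            Arrays.asList("X0Z01", "X0Z05", "X0Z16", "40XD2", "40XD0"));

    /**
     * Hypothetical callback: one complete transaction, from its first statement
     * through commit. It should obtain a usable connection itself, because the
     * member the client was connected to may be the one that failed.
     */
    interface TxBody<T> {
        T run() throws SQLException;
    }

    static boolean isMemberFailure(SQLException e) {
        return MEMBER_FAILURE_STATES.contains(e.getSQLState());
    }

    /** Runs the body, re-running it from the start after a member-failure error. */
    static <T> T runWithRetry(TxBody<T> body, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.run();
            } catch (SQLException e) {
                if (!isMemberFailure(e)) {
                    throw e; // unrelated error: do not retry
                }
                last = e; // transaction was rolled back by the system; try again
            }
        }
        throw last; // every attempt hit a member failure
    }
}
```

The loop does not call rollback() itself, since the text above states that the surviving members clean up the open transaction for the failed node. A production version would likely add a delay between attempts.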

Note: In this release of GemFire XD, a transaction fails if any of the cohorts depart abnormally.
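The rule that surviving cohorts apply when the coordinator fails during commit (step 3 above) can be modeled as a simple decision over per-cohort reports. This is only a conceptual sketch of the rule as described in this section, not GemFire XD code: the CohortOutcome class, the Report type, and the lastCopyLost flag are hypothetical names, and treating an empty survivor list as a rollback is a conservative assumption.

```java
import java.util.List;

final class CohortOutcome {

    enum Decision { COMMIT, ROLLBACK }

    /** Hypothetical per-cohort report; not a GemFire XD type. */
    static final class Report {
        final boolean prepared;       // cohort reached the PREPARED state
        final boolean appliedCleanly; // changes applied with no unique constraint violation

        Report(boolean prepared, boolean appliedCleanly) {
            this.prepared = prepared;
            this.appliedCleanly = appliedCleanly;
        }
    }

    /**
     * Commit only if every surviving cohort is PREPARED and applied its changes
     * cleanly, and no last copy of the affected rows was lost. Otherwise roll back.
     */
    static Decision decide(List<Report> survivors, boolean lastCopyLost) {
        if (lastCopyLost || survivors.isEmpty()) {
            return Decision.ROLLBACK; // empty list: conservative assumption
        }
        for (Report r : survivors) {
            if (!r.prepared || !r.appliedCleanly) {
                return Decision.ROLLBACK;
            }
        }
        return Decision.COMMIT;
    }
}
```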

Other Rollback Scenarios

GemFire XD may cancel an executing statement due to low memory, a timeout, or a manual request to cancel the statement (see Cancelling Long-Running Statements). If a statement that is being executed within the context of a transaction is canceled due to low memory or a manual cancellation request, then GemFire XD rolls back the associated transaction. Note that GemFire XD does not roll back a transaction if a statement is canceled due to a timeout.
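The cancellation rules above amount to a small decision table, sketched here so that application error handling can branch on it. The CancelOutcome class and the CancelReason enum are hypothetical; how an application determines the reason for a cancellation (for example, from the SQLState of the resulting exception) is not specified in this section and is left out of the sketch.

```java
final class CancelOutcome {

    /** Reasons for statement cancellation named in this section. */
    enum CancelReason { LOW_MEMORY, TIMEOUT, MANUAL_CANCEL }

    /**
     * Returns true if the enclosing transaction is rolled back when a statement
     * is canceled for the given reason, following the rules described above.
     */
    static boolean transactionRolledBack(CancelReason reason) {
        switch (reason) {
            case LOW_MEMORY:
            case MANUAL_CANCEL:
                return true;  // transaction is rolled back along with the statement
            case TIMEOUT:
                return false; // only the statement is canceled; transaction stays open
            default:
                throw new IllegalArgumentException("Unknown reason: " + reason);
        }
    }
}
```

After a timeout, the application still holds an open transaction and must decide explicitly whether to continue, commit, or call rollback().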