Overview of GemFire XD Distributed Transactions

All statements in a transaction are atomic. A transaction is associated with a single connection (and database) and cannot span connections. In addition to providing linear scaling, the GemFire XD transaction design minimizes messaging requirements, so that short-lived transactions are efficient.

Main Features of the GemFire XD Transaction Model

The GemFire XD transaction model uses these important features:
  • Each GemFire XD member that participates in a transaction maintains its own transaction state. Queries on the database always see committed data, and they do not need to acquire any locks; as a result, reads and writes can occur in parallel in the READ_COMMITTED isolation level.
  • During transactional writes, GemFire XD individually locks each copy of a row that is being updated on each member. This alleviates the need for a distributed lock manager, and it allows for greater scalability.

    Also, GemFire XD uses special read locks for REPEATABLE_READ and foreign key checks to ensure that those rows do not change for the duration of a transaction.

  • GemFire XD locks generally fail eagerly (fail-fast) with a conflict exception (SQLState: "X0Z02") if a lock cannot be obtained due to concurrent writes from other active transactions.

    An exception to this fail-fast behavior occurs when the GemFire XD member that initiates the transaction also hosts data for the transaction. In this case, GemFire XD batches the transaction on the local member for performance reasons, and conflicts may not be detected on other nodes until just before commit time when GemFire XD flushes the batched data.

    GemFire XD never batches operations for SELECT ... FOR UPDATE statements.

How the Transaction Model Works

When data is managed in partitioned tables, each row is implicitly owned by a single member for non-transactional operations. However, with distributed transactions, all copies of a row are treated as being equivalent, and updates are routed to all copies in parallel. This makes the transactional behavior for partitioned tables similar to the behavior for replicated tables. The transaction manager works closely with the GemFire XD membership management system to make sure that, irrespective of failures or adding/removing members, changes to all rows are either applied to all available copies at commit time, or they are applied to none.

Note: GemFire XD does not support adding new members to a cluster for an ongoing transaction. If you add a new member to the cluster in the middle of a transaction and the new member is to store data involved in the transaction, GemFire XD implicitly rolls back the transaction and throws a SQLException (SQLState: "X0Z05").

There is no centralized transaction coordinator in GemFire XD. Instead, the member on which a transaction was started acts as the coordinator for the duration of the transaction. If the application updates one or more rows, the transaction coordinator determines which owning members are involved, and acquires local "write" locks on all of the copies of the rows. At commit time, all changes are applied to the local cache and any redundant copies. If another concurrent transaction attempts to change one of the rows, the local "write" acquisition fails for the row, and that transaction is automatically rolled back. In the case where there is no persistent table involved, there is no need to issue a two-phase commit to redundant members; in this case, commits are efficient, single-phase operations.

Unlike traditional distributed databases, GemFire XD does not use write-ahead logging for transaction recovery in case the commit fails during replication or redundant updates to one or more members. The most likely failure scenario is one where the member is unhealthy and gets forced out of the distributed system, guaranteeing the consistency of the data. When the failed member comes back online, it automatically recovers the replicated/redundant data set and establishes coherency with the other members. If all copies of some data go down before the commit is issued, then this condition is detected using the group membership system, and the transaction is rolled back automatically on all members.