The key design goal for achieving linear scaling is to use a partitioning strategy that
allows most data access (queries) to be pruned to a single partition. This avoids expensive
locking operations across multiple partitions during query execution.
In a highly concurrent system with thousands of connections, queries are generally
spread uniformly across the entire data set, and therefore across all partitions.
When most queries are pruned to a single partition, each additional data store adds
capacity in proportion, which enables linear scalability. Given sufficient network
performance, additional connections can be supported without degrading the response
time for queries.
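
The pruning idea above can be illustrated with a minimal Python sketch. It is not GemFire XD code: the modulo hash, the partition count, and the function names `partition_for` and `partitions_touched` are all assumptions made for illustration. The sketch shows that a query filtered on the partitioning key visits one partition, while a query without such a filter must visit all of them.

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partitioning-key value to exactly one partition (simple modulo hash)."""
    return key % num_partitions


def partitions_touched(predicate_keys, num_partitions=NUM_PARTITIONS):
    """Return the set of partitions a query must visit.

    predicate_keys is the list of partitioning-key values in the query filter,
    or None when the query does not filter on the partitioning key at all.
    """
    if predicate_keys is None:
        # No partitioning-key filter: nothing can be pruned.
        return set(range(num_partitions))
    return {partition_for(k, num_partitions) for k in predicate_keys}


# A query filtered on a single key value is pruned to a single partition.
single = partitions_touched([42])

# A query with no partitioning-key filter has to visit every partition.
full = partitions_touched(None)

# Keys spread uniformly over the data set load every partition equally.
load = Counter(partition_for(k) for k in range(1000))
```

In this toy model, adding partitions shrinks each partition's share of uniformly distributed single-key queries, which is the intuition behind the linear-scaling claim.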
The general strategy for designing a GemFire XD database is to identify the tables to
partition or replicate in the GemFire XD cluster, and then to determine the correct
partitioning key(s) for partitioned tables. This usually requires an iterative process
to produce the optimal design:
- Read "Identify Entity Groups and Partitioning Keys" and
"Replicate Dimension Tables"
to understand the basic rules for defining partitioned or replicated tables.
- Evaluate your data access patterns to
define those entity groups that are candidates for partitioning. Focus your
efforts on commonly-joined entities. In this release, GemFire XD supports
joins only where the data is co-located, so all join queries must be
performed on co-located data. Co-located data is also important
for transaction updates, because the transaction can execute without requiring
distributed locks in a multi-phase commit protocol.
- Identify all of the tables in
the entity groups.
- Identify the "partitioning key" for
each partitioned table. The partitioning key is the column or set of columns
that is common across a set of related tables. Look for parent-child
relationships in the joined tables. The primary key of a root entity is
generally also the best choice for the partitioning key.
GemFire XD supports
distributed queries by parallelizing the query execution across data stores.
However, each query instance on a partition can only join rows that are
co-located with the partitioned data. This means that queries can join rows
between a partitioned table and any number of replicated tables hosted on
the data store with no restrictions. Queries that join multiple
partitioned tables, however, must be filtered based on the partitioning key. Query
examples are provided in this section and in "Query Capabilities and Limitations".
- Identify all of the tables that are
candidates for replication. You can replicate table data for high availability,
or to co-locate table data that is necessary to execute joins.
"Example: Adapting a Database Schema for GemFire XD" shows how
to apply these steps to deploy a customer order management system schema to GemFire XD.
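
As a rough illustration of the co-location rules in the steps above, the following Python sketch models data stores as plain dictionaries. It is not GemFire XD code. The `customers`, `orders`, and `countries` tables, the modulo placement, and all function names are hypothetical. Customers (the root entity) and orders (the child entity) are both placed by `customer_id`, and the small `countries` table is copied to every store, so each store can complete its part of the join using only local data.

```python
NUM_STORES = 3  # hypothetical number of data stores


def store_for(customer_id, n=NUM_STORES):
    """Place a row by its partitioning key (customer_id) using a modulo hash."""
    return customer_id % n


# Hypothetical sample rows.
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")]  # (customer_id, name)
orders = [(100, 1, "US"), (101, 1, "DE"), (102, 3, "US"), (103, 4, "DE")]
# orders rows are (order_id, customer_id, country_code)
countries = {"US": "United States", "DE": "Germany"}  # replicated table


def build_stores():
    """Distribute rows: partitioned tables by customer_id, countries everywhere."""
    stores = [
        {"customers": {}, "orders": [], "countries": dict(countries)}
        for _ in range(NUM_STORES)
    ]
    for cid, name in customers:
        stores[store_for(cid)]["customers"][cid] = name
    for oid, cid, code in orders:
        # Child rows use the parent's key, so they land beside the parent row.
        stores[store_for(cid)]["orders"].append((oid, cid, code))
    return stores


def local_join(store):
    """Join orders to customers and countries using only this store's data."""
    return [
        (oid, store["customers"][cid], store["countries"][code])
        for oid, cid, code in store["orders"]
    ]


def distributed_join(stores):
    """Run the local join on every store and merge the partial results."""
    results = []
    for store in stores:
        results.extend(local_join(store))
    return sorted(results)


stores = build_stores()
joined = distributed_join(stores)
```

If `orders` were placed by `order_id` instead, the lookup in `local_join` could fail because a customer row might live on a different store. That is the situation the co-location requirement is meant to avoid.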