Example: Adapting a Database Schema for GemFire XD

If you have an existing database design that you want to deploy to GemFire XD, translate the entity-relationship model into a physical design that is optimized for GemFire XD design principles.

Guidelines for Adapting a Database to GemFire XD

This example shows tables from the Microsoft Northwind Traders sample database.



In order to adapt this schema for use in GemFire XD, follow the basic steps outlined in Design Principles of Scalable, Partition-Aware Databases:
  1. Determine the entity groups.
    Entity groups are generally course-grained entities that have children, grand children, and so forth, and they are commonly used in queries. This example chooses these entity groups:
    Entity group Description
    Customer This group uses the customer identity along with orders and order details as the children.
    Product This group uses product details along with the associated supplier information.
  2. Identify the tables in each entity group.
    Identify the tables that belong to each entity group. In this example, entity groups use the following tables.
    Entity group Tables
    Customer

    Customers

    Orders

    Shippers

    Order Details

    Product

    Product

    Suppliers

    Category

  3. Define the partitioning key for each group.

    In this example, the partitioning keys are:

    Entity group Partitioning key
    Customer CustomerID
    Product ProductID

    This example uses customerID as the partitioning key for the Customer group. The customer row and all associated orders will be collocated into a single partition. To explicitly colocate Orders with its parent customer row, use the colocate with clause in the create table statement:

    create table orders (<column definitions, constraints>)
    partition by (customerID)
    colocate with (customers);
    

    In this way, GemFire XD supports any queries that join any the Customers and Orders tables. This join query would be distributed to all partitions and executed in parallel, with the results streamed back to the client:

    select * from customer c , orders o where c.customerID = o.customerID;

    A query such as this would be pruned to the single partition that stores "customer100" and executed only on that GemFire XD member:

    select * from customer c, orders o where c.customerID = o.customerID 
    and c.customerID = 'customer100';

    The optimization provided when queries are highly selective comes from engaging the query processor and indexing on a single member rather than on all partitions. With all customer data managed in memory, query response times are very fast. Consider how the above query would execute if the primary key was not used to partition the table. In this case, the query would be routed to each partition member where an index lookup would be performed, even though only a single member might have any data associated with the query.

    Finally, consider a case where an application needs to access customer order data for several customers:

     select * from customer c, orders o 
    where c.customerID = o.customerID and c.customerID IN ('cust1', 'cust2', 'cust3');

    Here, GemFire XD prunes the query execution to only those partitions that host 'cust1', 'cust2', and 'cust3'. The union of the results is then returned to the caller.

    Note that the selection of customerID as the partitioning key means that the OrderDetails and Shippers tables cannot be partitioned and colocated with Customers and Orders (because OrderDetails and Shippers do not contain the customerID value for partitioning). If joins are required between these tables, then you may choose to replicate OrderDetails and Shippers to the datastores that host Customers and Orders, as described in the next step.

  4. Identify replicated tables.

    If we assume that the number of categories and suppliers rarely changes, those tables can be replicated in the GemFire XD cluster (replicated to all of the GemFire XD members that host the entity group). If we assume that the Products table does change often and can be relatively large in size, then partitioning is a better strategy for that table.

    So for the product entity group, table Products is partitioned by ProductID, and the Suppliers and Categories tables are replicated to all of the members where Products is partitioned.

    Applications can now join Products, Suppliers and categories. For example:

    select * from Products p , Suppliers s, Categories c 
    where c.categoryID = p.categoryID and p.supplierID = s.supplierID 
    and p.productID IN ('someProductKey1', ' someProductKey2', ' someProductKey3');

    In the above query, GemFire XD prunes the query execution to only those partitions that host 'someProductKey1', ' someProductKey2', and ' someProductKey3.'