Implement a Partitioning Strategy

In this tutorial you drop all tables in the ToursDB schema and then recreate them using a new partitioning and replication strategy.

The ToursDB schema in this tutorial is similar to a 'STAR' schema, having only a few fact tables and several dimension tables. Dimension tables are generally small and change infrequently, but are commonly used in join queries. Dimension tables are good candidates for replicating across GemFire XD members, because join queries can execute in parallel.

The AIRLINES, CITIES, COUNTRIES, and MAPS tables are treated as dimension tables, and are replicated across the GemFire XD cluster. In the tutorial it is assumed that applications frequently join these related tables based on the FLIGHT_ID column, which is chosen as the partitioning column.

FLIGHTS, FLIGHTS_HISTORY, and FLIGHTAVAILABILITY are fact tables, and they will be partitioned. You will co-locate these tables to ensure that all rows that are associated with FLIGHT_ID are maintained in a single partition. This step ensures that frequent join queries based on a selected flight are pruned to a single member and executed efficiently.

Procedure
  1. In a separate terminal window or GUI editor, open the create_colocated_schema.sql file in the /quickstart directory to examine the included DDL commands. The SQL script begins by dropping the existing tables in the schema:

    DROP TABLE AIRLINES;
    DROP TABLE CITIES;
    DROP TABLE COUNTRIES;
    DROP TABLE FLIGHTAVAILABILITY;
    DROP TABLE FLIGHTS;
    DROP TABLE MAPS;
    DROP TABLE FLIGHTS_HISTORY;

    Dimension tables can be replicated using the same basic CREATE statement from the preceding section of the tutorial. However, this script explicitly adds the REPLICATE keyword for clarity. For example:

    CREATE TABLE AIRLINES
       (
          AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY,
          AIRLINE_FULL VARCHAR(24),
          BASIC_RATE DOUBLE PRECISION,
          DISTANCE_DISCOUNT DOUBLE PRECISION,
          BUSINESS_LEVEL_FACTOR DOUBLE PRECISION,
          FIRSTCLASS_LEVEL_FACTOR DOUBLE PRECISION,
          ECONOMY_SEATS INTEGER,
          BUSINESS_SEATS INTEGER,
          FIRSTCLASS_SEATS INTEGER
       ) REPLICATE;

    The FLIGHTS table is partitioned based on the FLIGHT_ID column:

    CREATE TABLE FLIGHTS
       (
          FLIGHT_ID CHAR(6) NOT NULL ,
          SEGMENT_NUMBER INTEGER NOT NULL ,
          ORIG_AIRPORT CHAR(3),
          DEPART_TIME TIME,
          DEST_AIRPORT CHAR(3),
          ARRIVE_TIME TIME,
          MEAL CHAR(1),
          FLYING_TIME DOUBLE PRECISION,
          MILES INTEGER,
          AIRCRAFT VARCHAR(6),
          CONSTRAINT FLIGHTS_PK PRIMARY KEY (
                                FLIGHT_ID,
                                SEGMENT_NUMBER),
          CONSTRAINT MEAL_CONSTRAINT
              CHECK (meal IN ('B', 'L', 'D', 'S'))
    
       )
       PARTITION BY COLUMN (FLIGHT_ID);

    The remaining facts tables are also partitioned, and are colocated with the FLIGHTS table. For example:

    CREATE TABLE FLIGHTAVAILABILITY
       (
          FLIGHT_ID CHAR(6) NOT NULL ,
          SEGMENT_NUMBER INTEGER NOT NULL ,
          FLIGHT_DATE DATE NOT NULL ,
          ECONOMY_SEATS_TAKEN INTEGER DEFAULT 0,
          BUSINESS_SEATS_TAKEN INTEGER DEFAULT 0,
          FIRSTCLASS_SEATS_TAKEN INTEGER DEFAULT 0,
          CONSTRAINT FLIGHTAVAIL_PK PRIMARY KEY (
                                      FLIGHT_ID,
                                      SEGMENT_NUMBER,
                                      FLIGHT_DATE),
          CONSTRAINT FLIGHTS_FK2 Foreign Key (
                FLIGHT_ID,
                SEGMENT_NUMBER)
             REFERENCES FLIGHTS (
                FLIGHT_ID,
                SEGMENT_NUMBER)
    
       )
       PARTITION BY COLUMN (FLIGHT_ID)
       COLOCATE WITH (FLIGHTS);
  2. In the gfxd session, execute the create_colocated_schema.sql script to drop the existing tables and recreate them with the new partitioning and replication strategy. Execute loadTables.sql to populate the tables with data:

    gfxd> run 'create_colocated_schema.sql';
    gfxd> run 'loadTables.sql';
  3. Confirm that the tables are created:

    gfxd> show tables in APP;
    TABLE_SCHEM      |TABLE_NAME                    |REMARKS            
    --------------------------------------------------------------------
    APP              |AIRLINES                      |                   
    APP              |CITIES                        |                   
    APP              |COUNTRIES                     |                   
    APP              |FLIGHTAVAILABILITY            |                   
    APP              |FLIGHTS                       |                   
    APP              |FLIGHTS_HISTORY               |                   
    APP              |MAPS                          |                   
    
    7 rows selected
  4. Verify whether individual tables are replicated or partitioned:
    gfxd> select tablename, datapolicy from sys.systables where tableschemaname='APP';
    TABLENAME                                                     |DATAPOLICY     
    ------------------------------------------------------------------------------
    AIRLINES                                                      |REPLICATE      
    COUNTRIES                                                     |REPLICATE      
    CITIES                                                        |REPLICATE      
    FLIGHTS_HISTORY                                               |PARTITION      
    FLIGHTAVAILABILITY                                            |PARTITION      
    FLIGHTS                                                       |PARTITION      
    MAPS                                                          |REPLICATE      
    
    7 rows selected
  5. The FLIGHTS table and others are now partitioned across the GemFire XD cluster. Query the sys.members table again to display the member IDs that host the table:

    gfxd> select id from sys.systables s, sys.members where s.tablename='FLIGHTS';
    ID                                                                             
    -------------------------------------------------------------------------------
    192.168.125.147(39486)<v1>:40258                                               
    192.168.125.147(39598)<v2>:23162                                               
    
    2 rows selected
  6. Now use the DSID Function to see how many rows of the partitioned FLIGHT table are stored on that GemFire XD server. For example:

    gfxd> select count(*) memberRowCount, dsid() from flights group by dsid();
    MEMBERROWCOUNT|2                                                               
    -------------------------------------------------------------------------------
    282           |192.168.125.147(39486)<v1>:40258                                
    260           |192.168.125.147(39598)<v2>:23162                                
    
    2 rows selected
    Note: The exact row count may be different for your GemFire XD distributed system.
  7. Execute a join on both partition members in parallel.

    gfxd> select flight_date, economy_seats_taken, firstclass_seats_taken from flights f, flightavailability fa 
          where f.flight_id = fa.flight_id and f.flight_id = 'AA1116';
    FLIGHT_DATE|ECONOMY_SEATS_TAKEN|FIRSTCLASS_SEATS_TAKEN
    ------------------------------------------------------
    2004-03-31 |2                  |2                     
    2004-04-11 |1                  |1                     
    2004-04-12 |2                  |2                     
    2004-04-15 |5                  |0                     
    2004-04-20 |10                 |0                     
    2004-04-23 |1                  |1                     
    2004-04-24 |2                  |2                     
    2004-05-03 |11                 |0                     
    2004-05-05 |1                  |1                     
    2004-05-06 |2                  |2                     
    2004-05-17 |1                  |1                     
    2004-05-18 |2                  |2                     
    2004-05-29 |1                  |1                     
    2004-05-30 |2                  |2                     
    
    14 rows selected
    The combined results are returned. Because the table is partitioned by FLIGHT_ID, the execution of the join is pruned to the partition that stores the value 'AA1116.' You can verify that flight_id 'AA1116' is located on only one data store using the query:
    gfxd> select count(*), dsid() from flights where flight_id = 'AA1116';
    1          |2                                                                  
    -------------------------------------------------------------------------------
    1          |192.168.125.147(39598)<v2>:23162                                   
    
    1 row selected