Example Query Plan Analysis

This example uses the installed ToursDB database to explain how to analyze basic query plans in GemFire XD.

To follow along with this example, you must have access to a GemFire XD installation and the ToursDB scripts in the quickstart subdirectory.

  1. Set your PATH environment variable to include the bin subdirectory of the GemFire XD installation.
  2. Start a single GemFire XD server. For example:
    $ mkdir query-plan-example
    $ cd query-plan-example
    $ gfxd server start
  3. Change to the quickstart subdirectory of your GemFire XD installation:
    $ cd ~/Pivotal_GemFireXD_13_bNNNNN_platform/quickstart
  4. Connect to the server as a peer, and execute the scripts to create the sample database:
    $ gfxd
    gfxd> connect peer 'host-data=false;mcast-port=10334';
    gfxd> run 'ToursDB_schema.sql';
    gfxd> run 'loadTables.sql';
  5. Increase the gfxd maximum display width in order to show the full query plan output:
    gfxd> maximumdisplaywidth 10000;
  6. Execute the EXPLAIN command with the following query:
    gfxd> explain select * from flights f, flightavailability fa where f.flight_id=fa.flight_id and f.flight_id='AA1116';
    You receive output similar to:
    MEMBER_PLAN                                                                                                                     
    --------------------------------------------------------------------------------------------------------------------------------
    stmt_id 00000001-ffff-ffff-ffff-000100000509
    SQL_stmt select * from flights f, flightavailability fa where f.flight_id=fa.flight_id and f.flight_id=<?>
     begin_execution 2013-12-04 15:01:02.864 end_execution 2013-12-04 15:01:02.92 (0 seconds elapsed) 
    |   |(1)QUERY-SCATTER (48.58%) execute_time 50.816ms (serialize/deserialize=0.054ms + process=50.762ms + throttle=0.0ms) member_node ward(1335)<v0>:21787
    |   |   |(5)QUERY-SEND (0.31%) execute_time 0.32ms (serialize/deserialize=0.054ms + process=0.266ms + throttle=0.0ms) member_node ward(1335)<v0>:21787
    |   |   |---RESULT-RECEIVE (0.21%) execute_time 0.223ms member_node ward(1335)<v0>:21787
    |   |   |(3)SEQUENTIAL-ITERATION (1.33%) execute_time 1.391ms returned_rows 14 no_opens 1
    |   |   |   |(4)RESULT-HOLDER (1.22%) execute_time 1.279ms (serialize/deserialize=0.0ms + process=1.279ms + throttle=0.0ms) returned_rows 14 no_opens 1 member_node ward(1335)<v0>:21787
    |   |   |(2)DISTRIBUTION-END (48.35%) execute_time 50.568ms (construct=0.0ms + open=50.457ms + next=1.436ms + close=0.066ms) returned_rows 14
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
    member ward(1335)<v0>:21787 begin_execution 2013-12-04 15:01:02.866 end_execution 2013-12-04 15:01:02.914 (0 seconds elapsed) 
    |   |(1)QUERY-RECEIVE (69.12%) execute_time 47.138ms (serialize/deserialize=0.0ms + process=47.138ms + throttle=0.0ms) member_node ward(1342)<v1>:14611
    |   |   |---RESULT-SEND (0.06%) execute_time 0.044ms member_node ward(1342)<v1>:14611
    |   |   |   |---RESULT-HOLDER (4.27%) execute_time 2.911ms returned_rows 14 no_opens 1 member_node NULL
    |   |   |   |   |---NLJOIN (0.36%) execute_time 0.246ms returned_rows 14 no_opens 1
    |   |   |   |   |   |(5)FILTER (4.50%) execute_time 3.069ms (construct=0.0040ms + open=0.127ms + next=9.794ms + close=0.737ms) returned_rows 1 no_opens 1 node_details SELECT :FLIGHT_ID SEGMENT_NUMBER ORIG_AIRPORT DEPART_TIME DEST_AIRPORT ARRIVE_TIME MEAL FLYING_TIME MILES AIRCRAFT WHERE :(F.FLIGHT_ID = CONSTANT:AA1116)
    |   |   |   |   |   |   |(3)ROWIDSCAN (6.62%) execute_time 4.514ms (construct=0.041ms + open=0.119ms + next=6.742ms + close=0.732ms) returned_rows 542 no_opens 1 node_details FLIGHTS : 
    |   |   |   |   |   |   |   |(4)INDEXSCAN (4.62%) execute_time 3.154ms (construct=0.034ms + open=0.107ms + next=2.284ms + close=0.729ms) returned_rows 542 no_opens 1 visited_rows 542 scan_qualifiers None next_qualifiers NULL scanned_object DESTINDEX scan_type  node_details 
    |   |   |   |   |   |---ROWIDSCAN (0.24%) execute_time 0.163ms returned_rows 14 no_opens 1 node_details FLIGHTAVAILABILITY : 
    |   |   |   |   |   |   |(2)CONSTRAINTSCAN (10.21%) execute_time 6.962ms (construct=0.0050ms + open=1.28ms + next=0.158ms + close=5.519ms) returned_rows 14 no_opens 1 visited_rows 17 scan_qualifiers None next_qualifiers NULL scanned_object APP.6__FLIGHTAVAILABILITY__FLIGHT_ID__SEGMENT_NUMBER:base-table:APP.FLIGHTAVAILABILITY scan_type  node_details WHERE : ((F.FLIGHT_ID = FA.FLIGHT_ID) and true)
    
    
    2 rows selected

    In this example, the query is distributed from the peer client to the single data store in the cluster.

    The body of a GemFire XD query plan is divided into two main sections identified by the query plan codes QUERY-SCATTER and QUERY-RECEIVE. Each section contains indented rows showing nested tasks. Because each task feeds into the task above it, you should read the query plan starting with the bottom, most-nested task in each section. Individual task rows show the task execution time in milliseconds, as well as the overall percentage of time spent completing that task relative to all of the tasks in the section, combined.

    QUERY-SCATTER is displayed only when you use the EXPLAIN stmt_id syntax. QUERY-SCATTER includes all of the tasks associated with delivering the query from the query coordinator to one or more data stores in the cluster, and then receiving and processing results on the query coordinator. This example uses a single data store member to provide all results, so very little time is spent on the query coordinator. When data is distributed across multiple data stores (for partitioned and/or replicated tables) additional time will be spent on the query coordinator to merge the results from each data store.

    The QUERY-RECEIVE section of the plan describes the query processing tasks that a data store member takes in order to execute a query. This header displays the member ID and process ID of the data store, as well as the total aggregate time for all of the subordinate tasks. If the query is distributed to multiple data store members, then each member will display a separate QUERY-RECEIVE section. Each section should be identical, unless tables are partitioned across different sets of machines of VMs.

    The example shows that a nested loop join (NLJOIN) is performed in order to produce the results. NLJOIN always has two nested subtasks that represent the outer and inner tables in the join. In this example, a FILTER is first performed on the FLIGHTS table, which holds the foreign key FLIGHT_ID. The scan returns a single row. The inner table then uses the results of the filter in a CONSTRAINTSCAN performed on the FLIGHTAVAILABILITY table.

    Table scans are very expensive operations in query processing, and even this quick query can be improved by avoiding a scan of the FLIGHTS table. The remaining steps show how creating an index changes the query plan and improves performance.

  7. To avoid the costly table scan used in the sample query, execute the following command to create an index on the FLIGHT_ID column of the FLIGHTS table:
    gfxd> CREATE INDEX fa_index ON flights (flight_id);
  8. Re-run the query, creating a new query plan. Then examine the new plan:
    gfxd> explain select * from flights f, flightavailability fa where f.flight_id=fa.flight_id and f.flight_id='AA1116';
    You should receive output similar to:
    MEMBER_PLAN                                                                                                                     
    --------------------------------------------------------------------------------------------------------------------------------
    stmt_id 00000001-ffff-ffff-ffff-000100000511
    SQL_stmt select * from flights f, flightavailability fa where f.flight_id=fa.flight_id and f.flight_id=<?>
     begin_execution 2013-12-04 15:14:43.015 end_execution 2013-12-04 15:14:43.033 (0 seconds elapsed) 
    |   |(1)QUERY-SCATTER (48.82%) execute_time 15.154ms (serialize/deserialize=0.019ms + process=15.135ms + throttle=0.0ms) member_node ward(1335)<v0>:21787
    |   |   |(5)QUERY-SEND (0.68%) execute_time 0.212ms (serialize/deserialize=0.019ms + process=0.193ms + throttle=0.0ms) member_node ward(1335)<v0>:21787
    |   |   |---RESULT-RECEIVE (0.21%) execute_time 0.064ms member_node ward(1335)<v0>:21787
    |   |   |(3)SEQUENTIAL-ITERATION (1.15%) execute_time 0.357ms returned_rows 14 no_opens 1
    |   |   |   |(4)RESULT-HOLDER (0.89%) execute_time 0.276ms (serialize/deserialize=0.0ms + process=0.276ms + throttle=0.0ms) returned_rows 14 no_opens 1 member_node ward(1335)<v0>:21787
    |   |   |(2)DISTRIBUTION-END (48.25%) execute_time 14.978ms (construct=0.0ms + open=14.926ms + next=0.385ms + close=0.024ms) returned_rows 14
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
    member ward(1335)<v0>:21787 begin_execution 2013-12-04 15:14:43.015 end_execution 2013-12-04 15:14:43.03 (0 seconds elapsed) 
    |   |(1)QUERY-RECEIVE (71.84%) execute_time 13.975ms (serialize/deserialize=0.0ms + process=13.975ms + throttle=0.0ms) member_node ward(1342)<v1>:14611
    |   |   |---RESULT-SEND (0.11%) execute_time 0.021ms member_node ward(1342)<v1>:14611
    |   |   |   |(3)RESULT-HOLDER (8.35%) execute_time 1.624ms (serialize/deserialize=0.593ms + process=1.031ms + throttle=0.0ms) returned_rows 14 no_opens 1 member_node NULL
    |   |   |   |   |(5)NLJOIN (1.05%) execute_time 0.204ms (construct=0.0010ms + open=0.219ms + next=3.074ms + close=0.506ms) returned_rows 14 no_opens 1
    |   |   |   |   |   |---ROWIDSCAN (0.21%) execute_time 0.04ms returned_rows 1 no_opens 1 node_details FLIGHTS : 
    |   |   |   |   |   |   |(4)INDEXSCAN (3.22%) execute_time 0.627ms (construct=0.0080ms + open=0.097ms + next=0.021ms + close=0.501ms) returned_rows 1 no_opens 1 visited_rows 0 scan_qualifiers None next_qualifiers NULL scanned_object FA_INDEX scan_type  node_details WHERE : ((F.FLIGHT_ID = CONSTANT:AA1116) and true)
    |   |   |   |   |   |---ROWIDSCAN (0.67%) execute_time 0.13ms returned_rows 14 no_opens 1 node_details FLIGHTAVAILABILITY : 
    |   |   |   |   |   |   |(2)CONSTRAINTSCAN (14.56%) execute_time 2.833ms (construct=0.0040ms + open=0.073ms + next=0.113ms + close=2.643ms) returned_rows 14 no_opens 1 visited_rows 17 scan_qualifiers None next_qualifiers NULL scanned_object APP.6__FLIGHTAVAILABILITY__FLIGHT_ID__SEGMENT_NUMBER:base-table:APP.FLIGHTAVAILABILITY scan_type  node_details WHERE : ((F.FLIGHT_ID = FA.FLIGHT_ID) and true)
    
    
    2 rows selected

    Although the query on the original, unindexed table completed very quickly, note that the index scan query completed in less than half the time. This performance improvement is magnified as the number of rows increases, so always ensure that indexes are used when possible for long-running queries.