Query Capabilities and Limitations

GemFire XD queries have a unique set of capabilities and limitations that are inherent to the distributed database design.

A general prerequisite for multi-table queries is that all of the partitioned tables involved in the query must reside in the same server groups. The partitioned tables' server groups must be a subset of the server groups for any replicated tables in the query. Essentially, this means that GemFire XD peers that host partitioned tables data stores should also host replicated tables. Configuring server groups in this manner enables GemFire XD to service equijoins that involve partitioned and replicated tables locally on the data stores.

Certain types of equijoin queries and inner queries are not supported depending on the table types involved (replicated or partitioned). Updates are not supported on the partitioning or primary key columns.

The following table summarizes GemFire XD support for different types of queries.

Query type

Support level

Notes

Nested subqueries

Limited

GemFire XD does not support subqueries having more than one level of nesting.

Single-table queries

Supported with some exceptions

Basic queries that involve only a single table (either partitioned or replicated) work without any limitations. This includes queries with aggregate functions, group by clause, order by clause, and so forth.

Note:

GemFire XD does not support queries against a partitioned table that include a UNION, INTERSECTION, or EXCEPT clause.

A query that references a partitioned table cannot include a table query in the FROM clause. See Query of a partitioned table with a table query in the FROM clause.

Queries that involve one partitioned table and multiple replicated tables

Supported

Queries that involve a single partitioned table are supported without restriction.

See Query of a single partitioned table.

Queries that involve two or more partitioned tables

Supported with colocation criteria

Queries that involve two or more partitioned tables must satisfy the colocation criteria.

See About the colocation requirement for querying multiple, partitioned tables.

Outer join queries that involve multiple, partitioned tables

Supported with colocation criteria

Queries that involve two or more partitioned tables must satisfy the colocation criteria.

See Outer join query of multiple, partitioned tables.

Outer join queries that involve a single partitioned table

Supported

See Query of a single partitioned table.

See Outer join query of multiple replicated tables and a single partitioned table.

Non-correlated subquery with outer query of a partitioned table

Supported*

*Non-correlated queries of partitioned and replicated tables are supported. However, GemFire XD may attempt to converts such queries into equijoin queries. This optimization is possible only if the query satisfies the colocation criteria.

Non-correlated subquery with an outer query of a partitioned table

Non-correlated subquery with outer query of a replicated table

Supported

Non-correlated subquery with an outer query of a replicated table

Correlated subquery with outer and inner queries of partitioned tables

Supported with colocation criteria

Correlated subqueries involving partitioned tables must satisfy the colocation criteria.

Correlated subquery with outer and inner queries of partitioned tables

Correlated subquery with inner query of a replicated table

Supported

Correlated subquery with inner query of replicated table

Correlated subquery with outer query of a replicated table inner query having one or more partitioned tables

Not Supported

Correlated subquery with outer query of a replicated table inner query of a partitioned table

The sections that follow use the following set of example tables to describe query support and limitations.

create table trade.securities (sec_id int not null, symbol varchar(10) not null, 
     price decimal (30, 20), exchange varchar(10) not null, tid int, constraint sec_pk primary key (sec_id) )replicate

create table trade.customers (cid int not null, cust_name varchar(100), since date, addr varchar(100), tid int, primary key (cid)) 
     partition by column (cust_name) 

create table trade.portfolio (cid int not null, sid int not null, qty int not null, availQty int not null, subTotal decimal(30,20), 
     tid int, constraint portf_pk primary key (cid, sid), constraint cust_fk foreign key (cid) references trade.customers (cid) on delete restrict, 
     constraint sec_fk foreign key (sid) references trade.securities (sec_id)) partition by column (cid) colocate with (trade.customers) 

create table trade.sellorders (oid int not null constraint orders_pk primary key, cid int, sid int, qty int, ask decimal (30, 20), 
     status varchar(10) default 'open', tid int, constraint portf_fk foreign key (cid, sid) references trade.portfolio (cid, sid)) replicate 

create table trade.networth (cid int not null, cash decimal (30, 20), securities decimal (30, 20), loanlimit int, availloan decimal (30, 20), 
     tid int, constraint netw_pk primary key (cid), constraint cust_newt_fk foreign key (cid) references trade.customers (cid) on delete restrict ) 
     partition by column (cid) colocate with(trade.customers) 

create table trade.buyorders(oid int not null constraint buyorders_pk primary key, cid int, sid int, qty int, bid decimal (30, 20), status varchar(10), 
     tid int, constraint bo_sec_fk foreign key (sid) references trade.securities (sec_id) on delete restrict) partition by column (cid)

In the above system, securities and sellorders are replicated tables, while customers, portfolio,networth and buyorders are partitioned tables. The portfolio and networth tables are colocated with customers, partitioned on cid column. The buyorders table is not colocated with any of the partitioned tables.

All tables are assigned to the default server group.

Query of a single partitioned table

Simple queries that involve only one partitioned table (and possibly multiple replicated tables) work without any limitations. For example each of the following queries is supported:

select * from trade.portfolio, trade.securities, trade.sellorders
select * from trade.securities s, trade.portfolio f where sec_id = f.sid and f.tid = ?
select * from trade.securities s, trade.portfolio f, trade.sellorders so where s.sec_id = f.sid and so.sid = s.sec_id and f.tid = ?

About the colocation requirement for querying multiple, partitioned tables

Queries that involve two or more partitioned tables (with or without additional, replicated tables) are supported only if they satisfy both of these colocation criteria:

  1. The query's WHERE clause has equijoin conditions on all of the partitioning columns for all of the partitioned tables in the query.

  2. All partitioned tables in the query are colocated.

These criteria allow GemFire XD to distribute a join query to all of the data stores that host the data. The query is executed concurrently on each local data store without having to move table data from one member to another to perform the join. Joins are performed on the local data set of each data store, and the main query member aggregates the results from each data store to obtain the result set.

When determining colocation, note that if a table B is colocated with table A, and Table C is colocated with Table A, then tables B and C are also colocated. Similarly, if a table C is colocated with table B, and B is colocated with table A, then tables C and A are also colocated. For example, consider the following query:

select * from trade.customers c, trade.securities s, trade.portfolio f where c.cid = f.cid and sec_id = f.sid and f.tid = ?

The query above has an equijoin condition c.cid = f.cid, which associates the portfolio and customers table on the partitioning column, cid. The two tables are colocated on cid, so GemFire XD can execute this query. The fact that the table securities is replicated does not impose any requirement for the equijoin criteria.

The following example shows an example query that meets the equijoin requirements:

select * from trade.customers c, trade.networth n, trade.portfolio p where n.cid = c.cid and n.cid = p.cid and n.tid = ? and c.cid >?

The query above involves three partitioned tables. Because the networth table is colocated with customers, and portfolio is colocated with customers, the three tables are mutually colocated. The partitioning column of each table forms an equi-join condition with the other. The presence of two equi-join conditions n.cid = c.cid and n.cid = p.cid satisfies the colocation criteria, so GemFire XD can execute this query.

Keep in mind that both colocation criteria must be met. Even if the partitioned tables in a query are colocated, GemFire XD does not support queries where the join condition is applied to non-partitioning columns. Queries of this type satisfy only one but not both of the criteria. For example, even though both the customers and portfolio tables are colocated, GemFire XD does not support this query because tid is not a partitioning column:

select * from trade.customers c, trade.portfolio f where c.tid = f.tid

Similarly, join queries on non-colocated partitioned tables are not supported. This query has an equijoin condition on the partitioning column of both tables, but GemFire XD cannot be execute it because the tables are not colocated:

select * trade.buyorders b, trade.customers c where b.cid = c.cid

Finally, GemFire XD does not support queries involving multiple partitioned tables that do not have a join condition. For example, this query is not supported:

Select * from trade.customers c, trade.portfolio f where f.tid = ?

Outer join query of multiple, partitioned tables

In order to execute an outer join against more than one partitioned table, you must satisfy the colocation criteria, which requires a join condition on the partitioning columns. For example, consider the query:

select * from trade.customers c LEFT OUTER JOIN trade.portfolio f LEFT OUTER JOIN trade.sellorders so on f.cid = so.cid on c.cid= f.cid where f.tid = ?

This query involves two partitioned tables and one replicated table. Because the cid column is the partitioning column of the customers and portfolio tables, GemFire XD can execute the query.

Outer join query of multiple replicated tables and a single partitioned table

Outer join queries that involve only one partitioned table (and possibly multiple replicated tables) are supported without meeting any additional criteria. For example, GemFire XD can execute this query because it references only one partitioned table:

select * from trade.portfolio f LEFT OUTER JOIN trade.sellorders so on f.cid = so.cid on c.cid= f.cid where f.tid = ? 

Non-correlated subquery with an outer query of a partitioned table

GemFire XD supports non-correlated subqueries where the outer query table is partitioned and the inner query table is either replicated or partitioned. However, GemFire XD only optimizes the independent queries into an equi-join query if the colocation criteria are satisfied. If the colocation criteria is not satisfied, GemFire XD evaluates the subquery independently (using map and reduce functions).

For example, GemFire XD converts this query into an equijoin query because customer and portfolio are colocated on cid:

select * from trade.customers c where c.cid IN (select f.cid from trade.portfolio f where qty > 297) and tid =?

GemFire XD executes this query without converting it into an equijoin, because the tables are not colocated on tid:

select * from trade.customers c where c.tid IN (select f.tid from trade.portfolio f where qty > 297)

This query is also executed by converting it into an equijoin, because the inner table is replicated and does not require colocation criteria:

select * from trade.portfolio f where f.sid IN ( select sec_id from trade.securities ).

Non-correlated subquery with an outer query of a replicated table

GemFire XD supports non-correlated subqueries where the outer query table is replicated and the inner query table is either replicated or partitioned. If the inner query table is replicated, GemFire XD optimizes the independent queries into an equi-join query if possible. If the inner query table is partitioned, then GemFire XD executes the query without attempting an equijoin conversion.

For example, GemFire XD executes this query without converting it to an equijoin, because the inner query table is partitioned:

select * from trade.securities s where s.sec_id IN ( select f.sid from trade.portfolio f ) and s.tid = ?

GemFire XD converts this query into an equijoin because both the inner and outer query tables are replicated:

select * from trade.securities s where s.sec_id IN ( select so.sid from trade.sellorders so ) and s.tid = ?

Correlated subquery with outer and inner queries of partitioned tables

GemFire XD can execute a correlated subquery having inner and outer queries of partitioned tables only if the query satisfies the colocation criteria.

For example, GemFire XD can execute this query because of the presence of an equijoin condition on cid columns of the customers and portfolio tables:

select * from trade.customers c where exists (select * from trade.portfolio f where c.cid = f.cid and qty > 297) and tid =?

Correlated subquery with inner query of replicated table

GemFire XD can execute a correlated subquery when inner query is against a replicated table and the outer query is against a partitioned or replicated table.

For example, the following query contains two correlated subqueries. GemFire XD can execute the first subquery because both the customers and portfolio tables satisfy the colocation criteria, and it can execute the second subquery because the inner query is against a replicated table, sellorders.

select * from trade.customers c where EXISTS (select * from trade.portfolio f where c.cid = f.cid and tid =?) 
   and NOT EXISTS (select * from trade.sellorders s where c.cid = s.cid and status IN ('open','filled'))

Table Subquery having table expression in a From Clause and involving Partitioned Table

Query of a partitioned table with a table query in the FROM clause

Any query that includes a partitioned table cannot include a table query in the FROM clause. For example, GemFire XD cannot execute this query because the FROM clause includes a table subquery and the query references a partitioned table:

select * from (select cid, sid, qty, tid from trade.portfolio f) where tid = 7

Correlated subquery with outer query of a replicated table inner query of a partitioned table

If an outer query involves only replicated tables, then the presence of a partitioned table in the correlated subquery renders the query incompatible with GemFire XD. For example, GemFire XD cannot execute the query:

select sec_id, price from securities s 
     where price > (select Avg(qty) from portfolio f where sec_id = f.sid and f.tid =4 and qty <> 0 ) and tid =4