Programming Data-Aware Procedures and Result Processors

A procedure is an application function or subroutine that the database server manages. Because multiple GemFire XD members operate together as a distributed system, GemFire XD can also parallelize procedure execution so that it runs concurrently on multiple members. A procedure that executes concurrently on multiple GemFire XD members is called a data-aware procedure.

Note: GemFire XD does not support executing DDL statements in the body of a procedure or function.

Data-aware procedures use an extended CALL syntax with an ON clause to designate the GemFire XD members on which the procedure executes. When you invoke a procedure, this syntax lets you parallelize execution on:
  • All data stores in the GemFire XD cluster
  • A subset of data stores (on members that belong to one or more server groups)
  • All data store members that host a table
  • All data store members that host a subset of data in a table
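The four execution targets listed above can be sketched as CALL statement text. This is a minimal illustration, not a definitive reference: it assumes the ON ALL, ON SERVER GROUPS (...), and ON TABLE ... [WHERE ...] forms of the extended CALL syntax, and the procedure name (order_summary), table name (orders), column name (region), and server group names (sg1, sg2) are hypothetical. Check the CALL statement reference for your release before relying on the exact clause forms.

```java
import java.util.List;

// Illustrative sketch only: builds CALL statement text for each execution target.
// Procedure, table, column, and server group names are hypothetical, and the ON
// clause forms are assumed from the extended CALL syntax described in the text.
class DataAwareCallExamples {

    // Target: all data stores in the cluster.
    static String onAll(String call) {
        return "CALL " + call + " ON ALL";
    }

    // Target: data stores that belong to one or more server groups.
    static String onServerGroups(String call, List<String> groups) {
        return "CALL " + call + " ON SERVER GROUPS (" + String.join(", ", groups) + ")";
    }

    // Target: all data store members that host the given table.
    static String onTable(String call, String table) {
        return "CALL " + call + " ON TABLE " + table;
    }

    // Target: members that host the subset of table data selected by the predicate.
    static String onTableWhere(String call, String table, String predicate) {
        return onTable(call, table) + " WHERE " + predicate;
    }

    public static void main(String[] args) {
        String call = "order_summary(?)"; // hypothetical procedure with one parameter
        System.out.println(onAll(call));
        System.out.println(onServerGroups(call, List.of("sg1", "sg2")));
        System.out.println(onTable(call, "orders"));
        System.out.println(onTableWhere(call, "orders", "region = 'EAST'"));
    }
}
```

In an application, a string like these would typically be passed to the standard JDBC method Connection.prepareCall, with parameters bound on the resulting CallableStatement.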

GemFire XD executes the user code in the same process that hosts the data, which provides very low-latency access to colocated data. (This contrasts with map-reduce job execution frameworks such as Hadoop, where data must be streamed from other processes into a Task process.)

Procedures often return one or more result sets. GemFire XD streams each result set to a single coordinating member, which can perform the reduction step on the results (typically aggregation, as in map-reduce). In GemFire XD, the reduction step is carried out by a result processor. GemFire XD provides a default result processor, and you can also develop your own result processor implementations to customize the reduction step.
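To make the reduction step concrete, the following plain-Java sketch merges partial aggregates, as a coordinating member might after each data store returns its own result. It deliberately does not use the GemFire XD result processor API. The class name, the region keys, and the per-member totals are all hypothetical, and in-memory maps stand in for the result sets that would be streamed from each member.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: simulates the reduction step with in-memory maps.
// This is NOT the GemFire XD result processor interface; all names and values
// here are hypothetical.
class ReductionSketch {

    // Combine per-member partial sums into one total per key.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys for stable output
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Each map stands in for the result set produced on one member.
        Map<String, Long> member1 = Map.of("EAST", 10L, "WEST", 5L);
        Map<String, Long> member2 = Map.of("EAST", 7L);
        Map<String, Long> member3 = Map.of("WEST", 1L, "NORTH", 4L);

        // prints {EAST=17, NORTH=4, WEST=6}
        System.out.println(reduce(List.of(member1, member2, member3)));
    }
}
```

A custom result processor would apply the same idea to the rows arriving from each member, instead of to maps that are already fully in memory.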

The sections that follow describe how to develop, configure, and invoke procedure and result processor implementations in GemFire XD.