Pivotal GemFire, Pivotal HD, and Apache Derby Components

GemFire XD incorporates core Pivotal GemFire technology and Apache Derby RDBMS components to provide a high-performance, distributed database management system. GemFire XD extends standard SQL statements where necessary for creating and managing tables and configuring the GemFire XD system. GemFire XD also provides Pivotal HD integration components to make HDFS persistence files available to Hadoop tools and HAWQ.

The sections that follow describe how GemFire XD uses Pivotal GemFire, Pivotal HD, and Apache Derby components.

Pivotal GemFire Technology

GemFire XD incorporates the following Pivotal GemFire technology:
  • Reliable data distribution
  • High-performance replication and partitioning
  • Caching framework
  • Parallel 'data-aware' application behavior routing

The Pivotal community site provides a comparison of Pivotal GemFire XD to other data management systems, including Pivotal GemFire.

Pivotal HD Components

Note: GemFire XD 1.4 is certified only for standalone use. It is not certified for installation or interoperability with Pivotal HD 2.1. Pivotal is working to certify GemFire XD 1.4 with an upcoming version of Pivotal HD for HDFS table support. See the Pivotal GemFire XD 1.4.0 Release Notes for more information.

If you are currently using GemFire XD 1.3.x with Pivotal HD for HDFS table support, continue to use your installed version of the product. Pivotal will provide maintenance releases for version 1.3.x until GemFire XD 1.4 is certified with an upcoming version of Pivotal HD.

GemFire XD stores table data in HDFS in an indexed format that supports read/write operations on the persisted data. This enables GemFire XD to persist large amounts of data to HDFS while providing indexed access to that data for quick retrieval when SQL queries cannot be satisfied purely by the data available in memory.

GemFire XD also provides an InputFormatter component to enable direct processing of the HDFS data in the Hadoop ecosystem. Pivotal HD tools such as MapReduce can use the InputFormatter to access table data without acting as GemFire XD clients. Similarly, an OutputFormatter enables these tools to push table data back into the in-memory tier in a GemFire XD-compatible format.

GemFire XD also supports a PXF driver that enables HAWQ to query GemFire XD data stored in HDFS, without starting or accessing a GemFire XD distributed system. (The PXF driver is installed with HAWQ.)

Apache Derby RDBMS Components

GemFire XD integrates Pivotal GemFire functionality with several components of the Apache Derby relational database management system (RDBMS):
  • JDBC driver. GemFire XD supports a native, high-performance JDBC driver (peer driver) and a thin JDBC driver. The peer driver is based on the Derby embedded driver and JDBC 4.0 interfaces, but all communication with GemFire XD servers is implemented through the Pivotal GemFire distribution layer.
  • Query engine. GemFire XD uses Derby to parse the SQL queries and generate parse trees. GemFire XD injects its own logic for intermediate plan creation and distributes the plan to data stores in the cluster. GemFire XD also capitalizes on some aspects of the built-in optimizer in Derby to generate query plans. The query execution itself uses memory-based indexes and custom storage data structures. When query execution requires distribution, GemFire XD uses a custom algorithm to execute the query in parallel on multiple data stores.
  • Network server. GemFire XD servers embed the Derby network server for connectivity from thin JDBC, ODBC, and .NET clients. The communication protocol is based on the DRDA standard that is used by IBM DB2 drivers.
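
The difference between the peer driver and the thin driver shows up mainly in the connection URL. The following Java sketch illustrates this under stated assumptions: the `jdbc:gemfirexd:` URL prefix, the `localhost` host name, and port 1527 are assumptions that should be checked against the JDBC reference for your installation, and the class and method names are hypothetical. The sketch relies on JDBC 4.0 driver auto-loading, so it names no driver class.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class GemFireXdConnectSketch {
    // Assumption: URL prefix shared by the peer and thin drivers.
    static final String URL_PREFIX = "jdbc:gemfirexd:";

    // Peer driver URL: the application joins the distributed system as a member.
    static String peerUrl() {
        return URL_PREFIX;
    }

    // Thin driver URL: the application connects to a server's network listener.
    static String thinClientUrl(String host, int port) {
        return URL_PREFIX + "//" + host + ":" + port + "/";
    }

    public static void main(String[] args) {
        // Assumption: a GemFire XD server is listening on localhost:1527.
        String url = thinClientUrl("localhost", 1527);
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to "
                + conn.getMetaData().getDatabaseProductName());
        } catch (SQLException e) {
            // Reached when the driver JAR is not on the classpath
            // or no server is reachable at the URL.
            System.err.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

Because both drivers implement the standard JDBC interfaces, application code that runs after the connection is opened should not need to change when switching from one driver to the other.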

GemFire XD modifies and extends the query engine and SQL interface to provide support for partitioned and replicated tables, data-aware procedures, data persistence, data eviction, and other features unique to the distributed GemFire XD architecture. GemFire XD also adds SQL commands, stored procedures, system tables, and functions to help easily manage features of the distributed system, such as persistent disk stores, listeners, and locators.
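
As an illustration of these SQL extensions, the following Java sketch issues two CREATE TABLE statements, one for a replicated table and one for a partitioned table with a redundant copy. The table and column names are hypothetical, and the REPLICATE, PARTITION BY COLUMN, and REDUNDANCY clauses are shown as they are commonly written for GemFire XD; confirm the exact syntax in the CREATE TABLE reference before relying on it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class GemFireXdDdlSketch {
    // Hypothetical replicated table: each data store that hosts the table
    // keeps a full copy of its rows.
    static final String REPLICATED_DDL =
        "CREATE TABLE airlines ("
        + " airline_code CHAR(2) NOT NULL PRIMARY KEY,"
        + " airline_name VARCHAR(40)"
        + ") REPLICATE";

    // Hypothetical partitioned table: rows are spread across data stores by
    // flight_id, with one redundant copy of each partition.
    static final String PARTITIONED_DDL =
        "CREATE TABLE flights ("
        + " flight_id INT NOT NULL PRIMARY KEY,"
        + " airline_code CHAR(2)"
        + ") PARTITION BY COLUMN (flight_id) REDUNDANCY 1";

    // Runs both statements on an already-open JDBC connection.
    static void createTables(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(REPLICATED_DDL);
            stmt.execute(PARTITIONED_DDL);
        }
    }
}
```

The column definitions are plain SQL; only the trailing clauses are specific to the distributed architecture, which is the sense in which GemFire XD extends standard statements rather than replacing them.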