GemFire XD Glossary

bucket

The container for data that determines its storage site (or sites when there is redundancy), and the unit of migration for rebalancing.

colocation

A relationship between two tables whereby the buckets that correspond to the same values of their partitioning fields are guaranteed to be physically located on the same server or peer client. In GemFire XD, a table configured to be colocated with another table depends on that table: if the other table needs to be dropped, the colocated tables must be dropped first.
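The following Java sketch models why colocation works. If a colocated table reuses its parent's bucket layout, rows with equal partitioning values map to the same bucket and therefore to the same member. The table names, the bucket-to-member list, and the use of Object.hashCode are illustrative assumptions, not GemFire XD internals. The canDrop check mirrors the drop-order dependency described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class ColocationSketch {
    // Hypothetical table model: list index = bucket id, value = member hosting that bucket.
    static final class Table {
        final String name;
        final List<String> bucketOwners;
        final List<Table> colocatedChildren = new ArrayList<>();

        Table(String name, List<String> bucketOwners, Table colocatedWith) {
            this.name = name;
            this.bucketOwners = bucketOwners;
            // Record the dependency on the parent table, if there is one.
            if (colocatedWith != null) {
                colocatedWith.colocatedChildren.add(this);
            }
        }

        // Illustrative hash-partitioning: hash the partitioning value into a bucket,
        // then look up which member hosts that bucket.
        String memberFor(Object partitioningValue) {
            int bucket = Math.floorMod(Objects.hashCode(partitioningValue), bucketOwners.size());
            return bucketOwners.get(bucket);
        }

        // A table that still has colocated dependents cannot be dropped first.
        boolean canDrop() {
            return colocatedChildren.isEmpty();
        }
    }

    public static void main(String[] args) {
        Table customers = new Table("CUSTOMERS",
                List.of("server1", "server2", "server3", "server1"), null);
        // The colocated table shares the parent's bucket layout instead of its own.
        Table orders = new Table("ORDERS", customers.bucketOwners, customers);

        for (int customerId = 0; customerId < 100; customerId++) {
            if (!customers.memberFor(customerId).equals(orders.memberFor(customerId))) {
                throw new IllegalStateException("not colocated for key " + customerId);
            }
        }
        System.out.println("rows with equal keys share a member");
        System.out.println("can drop CUSTOMERS first? " + customers.canDrop()); // false
    }
}
```

Sharing one bucket layout is the design point: a join on the partitioning column can then be evaluated on each member without moving rows between members.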

data store

A server or peer client process that is connected to the distributed system and has the host-data property set to true. A data store is automatically part of the default server group, and may be configured to be part of other server groups.

default colocation

A colocation relationship that GemFire XD sets up automatically between tables when the CREATE TABLE statement has no COLOCATE WITH clause.

default server group

The anonymous server group that implicitly includes all servers in the distributed system. This is the server group that hosts the data for a table where there is no SERVER GROUPS clause in the CREATE TABLE statement, and there were no server groups specified in the CREATE SCHEMA statement for the schema that this table belongs to.
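A minimal Java sketch of the hosting-group rule described above. It assumes that a SERVER GROUPS clause on CREATE TABLE takes precedence over the server groups of the table's schema; DEFAULT_GROUP is a hypothetical stand-in for the anonymous default server group.

```java
import java.util.Set;

class ServerGroupResolution {
    // Hypothetical marker for the anonymous default server group (all servers).
    static final String DEFAULT_GROUP = "<default>";

    // Assumed precedence: table clause, then schema clause, then the default group.
    static Set<String> hostingGroups(Set<String> tableGroups, Set<String> schemaGroups) {
        if (!tableGroups.isEmpty()) {
            return tableGroups;       // SERVER GROUPS clause in CREATE TABLE
        }
        if (!schemaGroups.isEmpty()) {
            return schemaGroups;      // server groups from CREATE SCHEMA
        }
        return Set.of(DEFAULT_GROUP); // neither specified
    }

    public static void main(String[] args) {
        System.out.println(hostingGroups(Set.of(), Set.of()));      // [<default>]
        System.out.println(hostingGroups(Set.of(), Set.of("SG1"))); // [SG1]
    }
}
```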

distributed system

A typical GemFire XD deployment is made up of a number of distributed processes that connect to each other to form a peer-to-peer network. These member processes may or may not host any data. Peer client (embedded JDBC client) processes and all servers are always peer members of the distributed system; thin clients are not. The members discover each other dynamically through a built-in multicast-based discovery mechanism, or through the GemFire XD locator service when TCP-based discovery is preferred. A distributed system is also referred to as a GemFire XD cluster.

hash-partitioning

A partitioning strategy based on the hash code of one or more fields, such that all rows whose partitioning fields have the same hash code are placed in the same bucket.
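A minimal Java sketch of hash-partitioning, assuming a fixed bucket count. Arrays.hashCode stands in for whatever hash function GemFire XD actually uses, so the bucket numbers it produces are illustrative only.

```java
import java.util.Arrays;

class HashPartitioningSketch {
    // Returns the bucket for a row, given the values of its partitioning fields.
    // Arrays.hashCode is a stand-in for the product's real hash function.
    static int bucketFor(int numBuckets, Object... partitioningFields) {
        int hash = Arrays.hashCode(partitioningFields);
        // floorMod (unlike %) never returns a negative bucket for a negative hash.
        return Math.floorMod(hash, numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 16; // hypothetical bucket count
        int first = bucketFor(numBuckets, "ACME", 2014);
        int second = bucketFor(numBuckets, "ACME", 2014);
        // Rows with identical partitioning values always share a bucket.
        System.out.println(first == second); // true
    }
}
```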

HAWQ

A parallel SQL query engine that can read data from and write data to HDFS. HAWQ uses standards-compliant SQL. GemFire XD uses the PXF driver installed with HAWQ to provide HDFS table data to HAWQ external tables.

HDFS

Hadoop Distributed File System. GemFire XD supports the HDFS implementation provided with Pivotal HD Enterprise.

heap

Memory allocated for use by the JVM. Heap memory undergoes garbage collection.

horizontal vs. vertical partitioning

Horizontal partitioning refers to partitioning strategies where a table is split by rows so that a bucket always contains entire rows. Vertical partitioning refers to strategies where a table is split by columns so that a bucket always contains entire columns. GemFire XD currently supports only horizontal partitioning strategies.

list-partitioning

A partitioning strategy based on specified lists of values of one or more fields. For example, a table could be list-partitioned on a string-valued field so that all rows whose field value appears in a given list of strings are placed in the same bucket.
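A minimal Java sketch of list-partitioning on a string field. Each declared list maps to one bucket. The state-code lists and the -1 result for unlisted values are assumptions of this sketch, not documented product behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListPartitioningSketch {
    private final Map<String, Integer> bucketByValue = new HashMap<>();

    // Each list of values is assigned its own bucket, in declaration order.
    ListPartitioningSketch(List<List<String>> valueLists) {
        for (int bucket = 0; bucket < valueLists.size(); bucket++) {
            for (String value : valueLists.get(bucket)) {
                bucketByValue.put(value, bucket);
            }
        }
    }

    // Returns the bucket for a value; values outside every list get -1 here
    // (how the real product handles them is not modeled).
    int bucketFor(String value) {
        return bucketByValue.getOrDefault(value, -1);
    }

    public static void main(String[] args) {
        ListPartitioningSketch regions = new ListPartitioningSketch(List.of(
                List.of("CA", "OR", "WA"),
                List.of("NY", "NJ", "CT")));
        System.out.println(regions.bucketFor("OR")); // 0
        System.out.println(regions.bucketFor("NJ")); // 1
    }
}
```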

locator

A locator facilitates discovery of all members in a distributed system. It is a component that maintains a registry of all peer members in the distributed system at any given moment. Although a locator is typically started as a separate process (with redundancy for high availability), it can also be embedded in any peer member, such as a server. The locator opens a TCP port, and all new members connect to it to get initial membership information for the distributed system.

off-heap memory

Memory that is not part of the JVM's allocated heap but is allocated upon server startup for data storage. Off-heap memory is not managed by JVM garbage collection processes.

partitioned table

A table that manages large volumes of data by partitioning it into manageable chunks and distributing it across all the servers in its hosting server groups. Partitioning attributes, including the partitioning strategy, can be specified by supplying a PARTITION BY clause in a CREATE TABLE statement. See also replicated table, partitioning strategy.

partitioning strategy

The policy used to determine the specific bucket for a row in a partitioned table. GemFire XD currently supports only horizontal partitioning, so an entire row is stored in the same bucket. You can hash-partition a table based on its primary key, or on an internally generated unique row ID if the table has no primary key. Other partitioning strategies can be specified in the PARTITION BY clause of a CREATE TABLE statement. The strategies supported by GemFire XD include hash-partitioning on columns other than the primary key, range-partitioning, and list-partitioning.

peer client

Also known as the embedded client, this is a process that is connected to the distributed system using the GemFire XD peer driver. The member may or may not host data, depending on the configuration property host-data. By default, all peer clients host data. Configuration describes how this property can be set at connection time. In effect, a peer client can be configured either as a "pure" client or as both a client and a data store. When hosting data, the member can be part of one or more server groups.

peer driver

The JDBC driver packaged in gemfirexd.jar. A peer client connects to the distributed system using this driver with the URL jdbc:gemfirexd:, which does not specify a host or port. This driver provides single-hop access to all the data managed in the distributed members. (The GemFire XD JDBC thin client driver also supports one-hop access for lightweight client applications.)
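A hedged Java sketch of opening a peer-client connection with standard JDBC. It assumes gemfirexd.jar is on the classpath and that host-data can be passed as a JDBC connection property, as the peer client entry suggests; any settings needed to discover the distributed system (multicast or locator) are deliberately omitted.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

class PeerClientSketch {
    // Peer driver URL: no host or port, because the process joins the
    // distributed system itself instead of connecting to a single server.
    static final String PEER_URL = "jdbc:gemfirexd:";

    // Connection properties for a "pure" peer client that does not host data.
    static Properties pureClientProperties() {
        Properties props = new Properties();
        props.setProperty("host-data", "false");
        return props;
    }

    public static void main(String[] args) {
        // Assumes the peer driver is available; discovery settings are not shown.
        try (Connection conn = DriverManager.getConnection(PEER_URL, pureClientProperties())) {
            System.out.println("connected as peer client: " + !conn.isClosed());
        } catch (SQLException e) {
            System.err.println("could not join the distributed system: " + e.getMessage());
        }
    }
}
```

Leaving host-data at its default instead would make the same process a data store as well as a client.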

PXF

A driver plug-in that enables HAWQ to query HDFS table data as an external table. The PXF driver is installed with HAWQ.

query coordinator

The process that executes the query and determines the overall plan. It may distribute the query to the appropriate servers that host the data. When using a peer client, the query coordinator is the peer client itself. When using a thin client, the query coordinator is the server member to which the client is connected.

range-partitioning

A partitioning strategy based on specified contiguous ranges of values of one or more fields. For example, a table could be range-partitioned on a date field so that all the values within a range of years are placed into the same bucket.
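A minimal Java sketch of range-partitioning by year using TreeMap.floorEntry. The half-open ranges [low, high) and the -1 result for out-of-range values are choices made for this sketch, not documented GemFire XD semantics.

```java
import java.util.Map;
import java.util.TreeMap;

class RangePartitioningSketch {
    // Key: inclusive lower bound of a range; value: {exclusive upper bound, bucket}.
    private final TreeMap<Integer, int[]> ranges = new TreeMap<>();

    void addRange(int lowInclusive, int highExclusive, int bucket) {
        ranges.put(lowInclusive, new int[] {highExclusive, bucket});
    }

    // Finds the range with the greatest lower bound <= value,
    // then checks that value is below that range's upper bound.
    int bucketFor(int value) {
        Map.Entry<Integer, int[]> e = ranges.floorEntry(value);
        if (e == null || value >= e.getValue()[0]) {
            return -1; // value falls outside every declared range
        }
        return e.getValue()[1];
    }

    public static void main(String[] args) {
        RangePartitioningSketch byYear = new RangePartitioningSketch();
        byYear.addRange(1990, 2000, 0);
        byYear.addRange(2000, 2010, 1);
        byYear.addRange(2010, 2020, 2);
        System.out.println(byYear.bucketFor(1999)); // 0
        System.out.println(byYear.bucketFor(2000)); // 1
        System.out.println(byYear.bucketFor(2025)); // -1
    }
}
```

A sorted map keeps each lookup logarithmic in the number of ranges, which is why it is a natural fit for contiguous, non-overlapping ranges.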

replicated table

A table that keeps a copy of its entire dataset locally on every data store in its server groups. GemFire XD creates replicated tables by default if you do not specify a PARTITION BY clause. See also partitioned table.

server

A JVM started with the gfxd server command, or any JVM that calls the FabricServer.start method. A GemFire XD server may or may not also be a data store, and may or may not also be a network server.

server group

A logical grouping of servers used to specify which members host data for a table. Server groups are also used for load balancing thin client connections.

thin client

A process that is not part of the distributed system but is connected to the distributed system through the JDBC thin client driver. The thin client connects to a single server in the distributed system, which in turn may delegate requests to other members of the distributed system. JDBC thin clients can also be configured to provide one-hop access to data for lightweight client applications.

thin client driver

The JDBC thin client driver bundled in the product (gemfirexd-client.jar). A thin client uses this driver to connect to the distributed system without becoming a member of it. The connection URL for this driver has the form jdbc:gemfirexd://hostname:port/.
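A hedged Java sketch that builds a thin-client URL in the form given above and opens a connection with standard JDBC. The host name and port 1527 are placeholders, thinClientUrl is a hypothetical helper, and the code assumes gemfirexd-client.jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class ThinClientSketch {
    // Hypothetical helper: builds a URL of the form jdbc:gemfirexd://hostname:port/
    static String thinClientUrl(String hostname, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("bad port: " + port);
        }
        return "jdbc:gemfirexd://" + hostname + ":" + port + "/";
    }

    public static void main(String[] args) {
        // Placeholder address: point this at a server that accepts client connections.
        String url = thinClientUrl("localhost", 1527);
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("connected to " + conn.getMetaData().getURL());
        } catch (SQLException e) {
            System.err.println("connection failed: " + e.getMessage());
        }
    }
}
```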