Estimating GemFire XD Heap Overhead and Table Memory Requirements

GemFire XD requires different amounts of heap memory overhead per table and index entry depending on whether you persist table data or configure tables for overflow to disk. If you already have representative data, query the SYS.MEMORYANALYTICS table to obtain a more accurate picture of the memory required to store your data.

Note: All heap overhead values are approximate. Be sure to validate your estimates in a test environment with representative data.
Table 1. Approximate Heap Memory Overhead for GemFire XD Table Entries
Table is persisted?   Overflow is configured?   Approximate heap overhead
-------------------   -----------------------   -------------------------
No                    No                        64 bytes
Yes                   No                        120 bytes
Yes                   Yes                       152 bytes
Note: For a persistent, partitioned table, GemFire XD uses an additional 16 bytes per entry to improve the speed of recovering data from disk. When an entry is deleted, the member that hosts the table temporarily retains the entry in order to detect possible conflicts with operations that have already occurred; this retained entry is referred to as a tombstone. Each tombstone consumes approximately 13 bytes of heap and is maintained until it expires or is garbage-collected.
Note: These figures represent JVM heap overhead and apply even if you have off-heap memory enabled for your table. To obtain a rough estimate of your heap memory requirements, add these overhead figures to the estimated size of each table stored in the heap, and then add the estimated overhead associated with your index entries.
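As a rough sketch, the per-entry figures in Table 1 can be combined into a single estimate. The function below is illustrative only; the figures are the approximate values from the tables above, and the exact accounting is an assumption:

```python
# Approximate per-entry heap overhead figures from Table 1 (bytes).
ENTRY_OVERHEAD = {
    (False, False): 64,   # not persisted, no overflow
    (True, False): 120,   # persisted, no overflow
    (True, True): 152,    # persisted with overflow
}
RECOVERY_OVERHEAD = 16   # extra bytes/entry for persistent, partitioned tables
TOMBSTONE_OVERHEAD = 13  # approximate bytes per unexpired tombstone

def table_entry_overhead(num_rows, persisted, overflow,
                         partitioned=False, tombstones=0):
    """Estimate total heap overhead (bytes) for a table's entries."""
    per_entry = ENTRY_OVERHEAD[(persisted, overflow)]
    if persisted and partitioned:
        per_entry += RECOVERY_OVERHEAD
    return num_rows * per_entry + tombstones * TOMBSTONE_OVERHEAD

# Example: 1 million rows in a persistent, partitioned table with overflow.
print(table_entry_overhead(1_000_000, True, True, partitioned=True))
# 168,000,000 bytes (~160 MB) of heap overhead before any row data
```

Remember that these are planning estimates; validate them against SYS.MEMORYANALYTICS with representative data.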
Table 2. Approximate Heap Memory Overhead for GemFire XD Index Entries
Type of index entry                 Approximate heap overhead
-------------------                 -------------------------
New index entry                     80 bytes
First non-unique index entry        24 bytes
Subsequent non-unique index entry   8 bytes to 24 bytes*

*If a single index entry accumulates more than 100 entries, the heap overhead per entry increases from 8 bytes to approximately 24 bytes.

Note: Table indexes are always stored in the JVM heap even if you have off-heap memory enabled for your table.
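The Table 2 figures can likewise be turned into a per-key estimate. Note that the exact per-key accounting below is an assumption based on the table above, not a documented formula:

```python
# Approximate heap overhead figures for index entries (Table 2, bytes).
NEW_INDEX_ENTRY = 80    # creating the index entry for a new key
FIRST_NON_UNIQUE = 24   # first additional row sharing an existing key
SUBSEQUENT_LOW = 8      # each further row while a key has <= 100 entries
SUBSEQUENT_HIGH = 24    # approximate cost once a key exceeds 100 entries

def index_key_overhead(rows_for_key):
    """Estimate heap overhead (bytes) for one index key (assumed model)."""
    if rows_for_key <= 1:
        return NEW_INDEX_ENTRY
    per_row = SUBSEQUENT_HIGH if rows_for_key > 100 else SUBSEQUENT_LOW
    return NEW_INDEX_ENTRY + FIRST_NON_UNIQUE + (rows_for_key - 2) * per_row

# A unique index over 1 million rows: every key has exactly one entry.
print(1_000_000 * index_key_overhead(1))  # 80,000,000 bytes (~76 MB)
```

Because indexes always live in the JVM heap, include this overhead in your heap sizing even for off-heap tables.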

Calculating Total Memory Requirements with Overhead for Table Data

To calculate the total amount of memory that your table will require, use the SYS.MEMORYANALYTICS table on a representative data set that is as close as possible to your production data set.

If you are storing all your table data in the Java heap, multiply TOTAL_SIZE by the total number of rows (NUM_ROWS) for your table, and then add the per-entry and per-index overhead figures described in the tables above. The result is an estimate of how much memory you need to allocate to the JVM heap.

If you are storing your table data in off-heap memory, you must calculate both heap and off-heap memory requirements. For the heap portion, use the tables in the previous section to calculate the per-entry and per-index heap overhead. For the off-heap portion, multiply VALUE_SIZE_OFFHEAP by the number of rows (NUM_ROWS) that will be in your table; the result is the amount of off-heap memory you need to allocate for this table on your data stores.
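Putting the two calculations together, a rough capacity worksheet might look like the following. The per-row figures are hypothetical stand-ins for values you would read from SYS.MEMORYANALYTICS, and the 120-byte overhead assumes a persisted table without overflow (Table 1):

```python
# Hypothetical per-row figures read from SYS.MEMORYANALYTICS for one table.
total_size_per_row = 512          # TOTAL_SIZE: heap bytes per row (assumed)
value_size_offheap_per_row = 480  # VALUE_SIZE_OFFHEAP: off-heap bytes per row
num_rows = 2_000_000              # NUM_ROWS

# Heap-resident table: data size times row count, plus per-entry overhead.
heap_bytes = num_rows * (total_size_per_row + 120)

# Off-heap table: the off-heap allocation is value size times row count.
offheap_bytes = num_rows * value_size_offheap_per_row

print(heap_bytes, offheap_bytes)
```

Index overhead (Table 2) would be added to the heap figure in either case, since indexes are always heap-resident.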

See Viewing Memory Usage in SYS.MEMORYANALYTICS for more details.

Calculating Heap Memory Requirements for DML and Query Execution against Off-Heap Tables

If you store table data in off-heap memory, GemFire XD also consumes heap memory to perform UPDATE and DELETE operations against those tables, as well as to perform certain types of queries against the tables. The approximate amount of heap memory required for a DML operation or query is m * n * 64 bytes, where m is the estimated number of concurrent operations or qualifying queries performed against the table and n is the estimated number of rows modified or accessed during each operation or query. Repeat this calculation for each DML operation or qualifying query and use the maximum value in your capacity planning.
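The m * n * 64 formula above is simple enough to sketch directly. The workload numbers here are hypothetical examples, not recommendations:

```python
def dml_heap_bytes(m, n):
    """Approximate heap consumed by DML or qualifying queries against
    off-heap tables: m * n * 64 bytes, where m is the number of concurrent
    operations and n is the rows modified or accessed per operation."""
    return m * n * 64

# Example: 10 concurrent UPDATE statements, each modifying 50,000 rows.
print(dml_heap_bytes(10, 50_000))  # 32,000,000 bytes (~30.5 MB)
```

Repeat this calculation for each DML operation or qualifying query in your workload and size the heap for the maximum result.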

The off-heap table operations that consume additional heap memory are:
  • UPDATE operations.
  • DELETE operations.
  • Single-table queries that use the ORDER BY clause.
  • Single-table GROUP BY queries that use the DISTINCT clause. For example: select max(distinct c1) from T1 group by c2;
  • Single-table GROUP BY queries that cannot use an index to sort results on the grouped column. For example: select sum(c1) from T1 where c3 > 10 group by c2; consumes additional heap memory if the optimizer cannot use an index on c2.
  • Union-based queries where each subquery is on a single table, such as: select * from t1 union select * from t2;
  • Intersection-based queries where each subquery is on a single table, such as: select * from t1 intersect select * from t2;
  • Correlated subqueries that the optimizer cannot convert into an equi-join.

In general, multi-table queries that have an explicit join (or that the optimizer can convert into an equi-join) do not consume additional heap memory during execution.