Thursday, November 28, 2013

Optimizing Multi Value Compression

Sponsored by PRISE Ltd.
www.prisetools.com

Teradata MVC optimization
Techniques and effects

What is Multi Value Compression (MVC)?

Teradata RDBMS supports a nice feature: multi-value-compression. It enables to reduce the storage space allocated by the tables in the database, while - this is incredible - processing compressed data usually requires less resources (CPU and I/O) than the uncompressed.
The feature needs no additional licence or hardware components.

How does MVC work?

I give a short summary, if you are interested in the details please refer to Teradata documentation.

MVC can be defined in CREATE TABLE DDL or later added/modified by ALTER TABLE statements. User must define a 1..255 element list of values for each compressable columns. Those will be stored as compressed value, while others will be uncompressed.
If a column is compressed, each row has an additional area of 1..8 bits allocated (if N value is listed: upper(log2(N)) bits will be allocated). One of bit combinations means that the value is uncompressed (and allocates its corresponding space within the row layout), but all others mean compressed value, which will not allocate the value's place in the row.
The compress bits are allocated in every rows regardless the actual value is compressed or not.
Compress bits are "compacted", eg.: 3 + 8 + 4 = 15 compress bits will allocate 2 bytes with only 1 wasted bit instead of 3 byte aligned values.
The value belonging to each bit combinations are stored in the table header.

Multi Value Compression is:
  • Column level
    Have to be defined on each applicable columns of a table separately
  • "Manual"
    You have to calculate which values are worth to compress - Teradata gives no automatism
  • Static
    Once you defined the values it will not adapt to the changing conditions by itself
It is obvious that the current optimal settings of the compression depends on the data demography and the applied data types. Optimal setting may be different later, when data demography may be different.

Summary of most important properties of MVC once again:
  • Can be defined in the CREATE TABLE statement
  • Can be applied or modified later in an ALTER TABLE statement
  • Must be set on COLUMN level
  • Optimal value list must be calculated by you
  • Optimal settings may change in time. Optimize regularly.

Storage effects

Using MVC tables will allocate less PERM space, as can be calculated - simple.
What about the increment?
The table sizes usually grow along the time as more and more data is generated. The change in growth speed depands on the volatility of data demography. If it is stable then the growth speed will drop by the rate of compression. If typical values change in time than growth will not drop, or may speed up in extreme cases. However theese cases are when regular optimization is neccessary.

The growth look like this in stable demography cases:




Performance effects

It is a key question - what have to be payed for less storage

It is obvious that compression process requires resources during both compress and decompress phase.However there are processing gains also, which usually dominate the costs. How?

Compressed table will reside in proportionally less data blocks, therefore data fetching requires less I/O operations. In addition moving data in-memory (during processing) requires less CPU cycles.
While SELECTing table data usually small fragment of the row is used, and not used coulmns will not be decompressed.
Caching is a CPU intensive operation also, which is more effective if less data blocks are processed.
Compression helps tables to be treated as "small enough to cache 100% into memory", which results more effective execution plans.

Summary:
  • INSERT into a compressed table usually consume more CPU by 10..50% (only final step!)
  • SELECT usually cost no more, or less CPU than at uncompressed tables
  • SELECT and INSERT usually cost proportionally less I/O like the compression ratio
  • System level CPU and I/O usage usually drops by 5..10% (!) when compressing the medium and big tables of the system (caused by more effective caching)

How to set MVC?

Setting up the MVC compression on a single table should consist of the following 4 steps:
  1. Analyze the data demography of each compressible columns of the table *1.
  2. Calculate the optimal compress settings for the columns. Notice that
    •   Optimum should be calculated not on separated columns, but on table level, since compress bits are packed into whole bytes.
    •   The more values are listed as compressed, the more overhead is on compress. Proper mathematical formula is to be used for calculating the optimum. *2
    •   Take care of the exceptions: PI / FK / etc.columns and some data types are not compressible (varies in different Teradata versions).
  3. Assemble the corresponding scripts
    CREATE TABLE DDL + INSERT SELECT + RENAME / ALTER TABLE DDL
  4. Implement the compression by running the script
     Concern to take good care of data protection like: backups, locking, documenting.
*1 Simplified sample: 
     select top 256 <columnX> , count(*), avg(<length(columnX)>) from <table> group by 1 order by 2 desc; for each columns
 *2 About the algorithm: It is a maximum-seeking function (n) based on the expression of gains when specific TOP {(2^n)-1} frequent values are compressed. The expression is far more complex to discuss here because different datatypes, exceptions and internal storing constructions.

 One time or regular?

Optimal MVC setting is valid for a specific point in time, since your data changes along your business. The daily change is usually negligible, but it accumulates.
Practice shows that it is worth to review compress settings every 3..6 months, and continually optimize new tables, couple of weeks after coming into production.


Estimate how much space and processing capacity is lost if compress optimization is neglected!

 

Solution in practice

There are "magic excels" on the net, which can calculate the optimal settings if you load the data demography, but it requires lots of manual work in addition (Running the calculations, DDL assembling, transformation script writing, testing, etc.)
 
If you want a really simple solution, try PRISE Compress Wizard , that supplies a comprehensive solution:
  • Assists to collect good candidate tables to compress
  • Automatically analyses the tables, and gives feedback:
    • How much space can be saved by compress
    • What is the current compress ratio (if there is compress already applied)
    • How much resources were used for analysis
    • What is the optimal structure
  • Generates transforming script (+ checks, lock, logging) along with
    • Backup (arcmain)
    • Revert process (for safety and documentation)
    • Reverse engineering (for E/R documentation update)
  • Log implementation
    •  Reflect achieved space saving: success measurement
    •  Report used CPU and I/O resources for transformation

Sponsored by PRISE Ltd.
www.prisetools.com

1 comment: