What can EHCC do for you?
By now you have probably heard about Exadata Hybrid Columnar Compression (EHCC), but what benefits can EHCC give you in terms of storage and performance?
As always, it depends on your data. Below I'll share some of the results I came across when testing EHCC. The data used for these tests is a short version of a fact table. The tests were performed on a quarter-rack Exadata database machine (2 database nodes with 16 cores each and 3 storage servers).
As you may already have noticed, English is my second language, so please excuse the spelling, grammar and whatever other errors you may find in this post ;-)
- Test block compression against EHCC-compressed tables
First let’s look at the time it took to create the test data and the compression rate.
-- Non-compressed table
SQL> create table he100 PARALLEL 64 as select /*+ PARALLEL(64) */ * from he100_load_test;
Table created.
Elapsed: 00:00:33.97  
It did not seem to be CPU bound - I saw CPU utilization around 10-25%.
-- Block compressed table
SQL> create table he100_block compress for all operations PARALLEL 64 as
         select /*+ PARALLEL(64) */ * from he100_load_test;
Elapsed: 00:00:28.51
Noticed CPU utilization to be around 40 - 70%
-- Table creation with EHCC query option
SQL> create table he100_query compress for query PARALLEL 64 as
         select /*+ PARALLEL(64) */ * from he100_load_test;
Elapsed: 00:00:21.01
Noticed CPU utilization around 80-90%
-- Table creation with EHCC archive option
SQL> create table he100_archive compress for archive PARALLEL 64 as
         select /*+ PARALLEL(64) */ * from he100_load_test;
Elapsed: 00:00:24.24
Noticed CPU utilization around 80-90%
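To confirm which compression setting actually took effect on each table, you can query the data dictionary (a quick check I find useful; it assumes the tables live in the current schema, and the COMPRESS_FOR column is available from 11.2):

```sql
-- Compression attribute recorded for each test table
select table_name, compression, compress_for
  from user_tables
 where table_name like 'HE100%';
```

If no level is specified, COMPRESS FOR QUERY and COMPRESS FOR ARCHIVE should show up as their defaults (QUERY HIGH and ARCHIVE LOW, to the best of my knowledge), while COMPRESS FOR ALL OPERATIONS is reported as OLTP block compression.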
SQL> @desc he100
         Name            Null?    Type
         --------------- -------- --------------
   1     EVENT                    NUMBER
   2     C_ID                     NUMBER
   3     E_ID                     NUMBER
   4     T_ID                     NUMBER
   5     BEGIN_TIME               NUMBER
   6     END_TIME                 NUMBER
   7     NAME                     VARCHAR2(50)
   8     A_ID                     NUMBER
   9     R_ID                     NUMBER
  10     I_ID                     NUMBER
  11     VALUE_TYPE               NUMBER
  12     VALUE                    VARCHAR2(4000)
SQL> select count(*) from he100;

  COUNT(*)
----------
  77930585
The compressed data appeared to be created faster than the non-compressed data; however, it is likely that I was not able to push the CPU/cores hard enough. I tried a few different parallel approaches, and most of the cores did not do a lot (<10%). I also looked at the session stats during the non-compressed table creation: the waits were on direct path write, whereas the other table creations were CPU bound.
Let’s check the compression rates compared to the non-compressed data:

SEGMENT_NAME      SIZE (MB)
---------------   ---------
HE100                 6,719                      = 100.0%
HE100_BLOCK           2,942   (2,942/6,719)*100 = ~43.8%
HE100_QUERY             449   (  449/6,719)*100 =  ~6.7%
HE100_ARCHIVE           438   (  438/6,719)*100 =  ~6.5%
Let’s see how a basic aggregation query performs on each of the tables.
SQL> select /*+ PARALLEL(64) */ t_id, e_id, min(value), avg(value), max(value)
       from he100 group by t_id, e_id;
Elapsed: 00:00:17.27 - no compression
Elapsed: 00:00:13.27 - no compression
Elapsed: 00:00:10.93 - block compression:   around 30-50% faster than on the non-compressed table
Elapsed: 00:00:06.99 - query compression:   around 50% faster than on the non-compressed table
Elapsed: 00:00:06.67 - archive compression: around 50% faster than on the non-compressed table
I also examined the session statistics: the number of block visits and I/Os was clearly lower for the query and archive compression options, but for both of these options we also saw new "CC" statistics generated:
--   Example of a few CC statistics from the query compression tests
NAME                                              VALUE
--------------------------------------------- ----------
CC CUs Decompressed                                22591
CC Query High CUs Decompressed                     11466
CC Compressed Length Decompressed              807050919
CC Decompressed Length Decompressed           8367891105
CC Columns Decompressed                            68403
CC Total Columns for Decompression                271092
CC Total Rows for Decompression                113019839
CC Pieces Buffered for Decompression               22698
CC Total Pieces for Decompression                  22698
CC Scan CUs Decompressed                              70
CC Turbo Scan CUs Decompressed                     11126
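If you want to pull these statistics for your own session, a simple query against v$mystat will do (a minimal sketch; it assumes the relevant statistic names all start with "CC", as in the output above, and that you have access to the v$ views):

```sql
-- Columnar-compression statistics for the current session
select n.name, s.value
  from v$mystat  s
  join v$statname n on n.statistic# = s.statistic#
 where n.name like 'CC%'
   and s.value > 0
 order by n.name;
```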
From the testing done at the time of writing, it looks like all HCC decompression occurs on the database nodes and all compression to HCC takes place on the storage servers.
So how does data load performance hold up against non-compressed tables?
-- Test data load performance on block compressed and EHCC compressed tables
SQL> @show_segm

SEGMENT_NAME      SIZE (MB)
---------------   ---------
HE100                 6,719
HE100_BLOCK           2,942
HE100_QUERY             449
HE100_ARCHIVE           438
-- Testing insert append of 77.9 million rows into the non-compressed table
SQL> insert /*+ APPEND */ into he100 select /*+ PARALLEL(64) */ * from he100_load_test;
77930585 rows created.
Elapsed: 00:00:28.23
For the regular table we inserted about 2.8 million rows per second.
SEGMENT_NAME      SIZE (MB)
---------------   ---------
HE100                13,436
HE100_BLOCK           2,942
HE100_QUERY             449
HE100_ARCHIVE           438
We added an extra 6.7 GB of data, writing about 239 MB per second.
Again, as we saw during the table creation for the non-compressed data, only a few processes were working on each node, and they maxed out at 40% CPU utilization.
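The throughput figures above are simple arithmetic over the elapsed time; for the non-compressed insert append (77,930,585 rows and 6,719 MB in 28.23 seconds), the same numbers can be reproduced directly:

```sql
-- Rows/sec and MB/sec for the non-compressed insert append
select round(77930585 / 28.23) rows_per_sec,
       round(6719 / 28.23)     mb_per_sec
  from dual;
```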
-- Testing insert append of 77.9M rows into the table with block compression
SQL> insert /*+ APPEND */ into he100_block select /*+ PARALLEL(64) */ * from he100_load_test;
77930585 rows created.
Elapsed: 00:00:27.49
So for the block-compressed table we inserted about 2.8 million rows per second.
One instance had CPU utilization around 70% and the 2nd instance around 20-25%.
SEGMENT_NAME      SIZE (MB)
---------------   ---------
HE100                13,436
HE100_BLOCK           5,881
HE100_QUERY             449
HE100_ARCHIVE           438
We added another 2.9 GB of data, writing about 109 MB of block-compressed data per second.
-- Testing insert append of 77.9M rows into the table with the query compression option
SQL> insert /*+ APPEND */ into he100_query select /*+ PARALLEL(64) */ * from he100_load_test;
77930585 rows created.
Elapsed: 00:00:23.20
For the query-compressed table that is about 3.4 million rows inserted per second.
On both DB instances I saw around 90% CPU utilization.
SEGMENT_NAME      SIZE (MB)
---------------   ---------
HE100                13,436
HE100_BLOCK           5,881
HE100_QUERY             897
HE100_ARCHIVE           438
And only another 448 MB of data was added, writing about 20 MB of query-compressed data per second.
-- Testing insert append of 77.9M rows into the table with the archive compression option
SQL> insert /*+ APPEND */ into he100_archive select /*+ PARALLEL(64) */ * from he100_load_test;
77930585 rows created.
Elapsed: 00:00:23.98
Archive compression did nearly as well as query compression in this case, inserting about 3.2 million rows per second.
On both DB instances I saw around 90% CPU utilization.
SEGMENT_NAME      SIZE (MB)
---------------   ---------
HE100                13,436
HE100_BLOCK           5,881
HE100_QUERY             897
HE100_ARCHIVE           872
And we only added another 434 MB of data, writing about 18 MB of archive-compressed data per second.
Conclusion: EHCC minimizes the footprint of your data significantly. For the cases I tested above, it also increased performance; however, as Kevin Closson has been so kind to remind me, HCC should never be seen as a load-time performance enhancement! This is also what Note ID 1094934.1 mentions; I have copied a little from it below regarding data compression. I have executed similar tests on data warehouse fact tables, and there EHCC did decrease load performance. So test EHCC on your own environment and data and go from there.
For the tests I ran, I was not CPU bound on the non-compressed table creation or insert append (the main wait was on direct path write), while for HCC I was CPU bound. I have tried to run further tests on the non-compressed data creation and insert append but have not been able to become CPU bound, so maybe it is my quarter rack that cannot handle more writes per second, or it could be a bundle patch 4 related issue. I do plan to upgrade to bundle patch 5 within a few weeks, so I'll run the tests again on that patch set to see if it makes a difference.
I’m using EHCC for our warehouse, where I mainly use bulk loading and exchange partition for data loading, so I have not tested the impact of updates and deletes. If you use a different approach, you should test the impact of other types of DML before using EHCC in production.
As I only have a quarter rack, the space savings are significant to us, and being able to load more than 3M rows per second is good enough for me at this point in time. EHCC also provides some nice side effects, such as less time spent on backups, and benefits for DR and caching.
From Note ID 1094934.1 - Oracle Sun Database Machine Application Best Practices for Data Warehousing
Data Compression
Using table compression obviously reduces disk and memory usage, often resulting in better scale-up performance for read-only operations. Table compression can also speed up query execution by minimizing the number of round trips required to retrieve data from the disks. Compressing data however imposes a performance penalty on the speed of the data loading and any subsequent DML operations.
Exadata Hybrid Columnar Compression (EHCC) achieves its compression using a different compression technique. A logical construct called the compression unit is used to store a set of Exadata Hybrid Columnar-compressed rows. When data is loaded, a set of rows is pivoted into a columnar representation and compressed. After the column data for a set of rows has been compressed, it is fit into the compression unit. If conventional DML is issued against a table with EHCC, the necessary data is uncompressed in order to do the modification and then written back to disk using a block-level compression algorithm.
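The behaviour described in the note's last sentence can be observed per row with the DBMS_COMPRESSION package (available in 11.2): after conventional DML, the touched rows typically come back as block-compressed rather than columnar. A sketch, assuming the HE100_QUERY table from the tests above:

```sql
-- Inspect the compression type of an individual row; the returned number
-- maps to DBMS_COMPRESSION constants such as COMP_FOR_QUERY_HIGH
select dbms_compression.get_compression_type(user, 'HE100_QUERY', rowid) comp_type
  from he100_query
 where rownum = 1;
```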

Updated September 9 2010

Test cases and conclusion updated based on feedback from Kevin Closson.
