
Shrinking Tables to Aid Full Scans

{This blog is about shrinking tables where the High Water Mark is higher than it needs to be. Prompted by one of the comments, I wrote a follow-up post on finding the High Water Mark and tables that consist mostly of empty space, which would be candidates for shrinking.}

This post is about one of those things I do almost autonomously, a "not thinking about it" performance housekeeping task, and one which I have been meaning to mention for ages.

There can be quite a lot to gain by checking out full scans on "small" tables and seeing whether they are as efficient as they can be. Sometimes they are, sometimes they are not. Often they are "good enough". Occasionally they are awful.

Recently I was casting an eye over the "top 20" SQL on a system, looking for any code generating a lot of consistent gets, when I came across the following:

FRST_LOAD_TIME SQL_ID PRSE EXCS BUFFS DISCS RWS
-------------------- ------------- --------- ----------- -------------- ---------- ------------
CPU_MS ELAPSD_MS SORTS DIR_W OPT_COST
-------------- -------------- ---------- ---------- --------------
SQL_TEXT
----------------------------------------------------------------------------------------------------
2-03-14/18:00:10 bk9b5u6zyvy59 36,262 36,262 90,634,158 7 36,261
320,102.3 326,920.9 0 0 546
SELECT count(*) RUNNING_SESSIONS from SNP_SESSION WHERE SESS_STATUS = :1 and AGENT_NAME=:2

It caught my eye as it struck me as a lot of buffer gets for what I thought was quite a small table, and the statement is executed quite often: a few thousand times a day in this case. That works out at around 2,500 buffer gets per count(*). All the blocks are buffered, but the CPU still has to chunk through them.
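For anyone wanting to do a similar sweep, the core of it is just a query against V$SQL ordered by buffer gets. The below is a rough sketch of the sort of thing I use (my real script formats the columns and pulls a few more of them; adjust to taste):

select *
from ( select sql_id
             ,executions                                excs
             ,buffer_gets                               buffs
             ,round(buffer_gets/greatest(executions,1)) gets_per_exec
             ,substr(sql_text,1,100)                    sql_text
       from   v$sql
       order by buffer_gets desc )
where rownum <= 20;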

I check, and it is a small table:

ABCD01> select count(*) from dev_ODI_XXXX.snp_session

COUNT(*)
----------
73

ABCD01> @tab_lst
Enter value for tab_name: snp_session

TABLE_NAME OWNER NUM_ROWS
------------------------------ ------------------------------ -------------
SNP_SESSION DEV_ODI_XXXX 49

-- and a quick check on those key columns

OWNER COLUMN_NAME NUM_DISTINCT N_NULLS LOW_V HI_V BKTS AVG_L
-------- -------------------- ------------ ---------- --------------- --------------- ---- -----
DEV_ODI_ SESS_STATUS 4 0 D W 4 2
DEV_ODI_ AGENT_NAME 4 6 AAA_ODI_AGENT BBB_ODI_AGENT 4 13

The table IS small, only 73 rows. It is also interesting that the stats are out by quite a percentage, but the table is still seen as small according to the stats. This indicates the table is quite volatile, but consistently small. Another thing to notice is that the number of distinct values for each column in the WHERE clause is only 4, so an index lookup would not be very selective. With so few records in the table and such an unselective index, I "know" that an index lookup would not be of much benefit {BTW, everything that you "know" in this way: do check it in reality every now and again...}. So I don't look at adding an index; I look at why such a small table needs so many buffer gets.
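In case you are wondering where those column stats came from, they are just pulled from DBA_TAB_COLUMNS by a little script of mine. A stripped-down sketch is below; decoding LOW_VALUE and HIGH_VALUE into the readable LOW_V/HI_V values is the fiddly bit and I have left it out here:

select owner, column_name, num_distinct, num_nulls, num_buckets, avg_col_len
from   dba_tab_columns
where  table_name  = 'SNP_SESSION'
and    column_name in ('SESS_STATUS','AGENT_NAME');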

So, why so many buffer gets?

First of all, I need to be careful. The original source of information I was looking at was V$SQL, and that holds cumulative stats for all executions of the same SQL statement. The buffer gets could have been very high a while ago (say, yesterday) and be small now, so the average of 2,500 buffer gets per select could be historic and the statement might be running fine right now. So I verify that the code is still doing a lot of work for such a small number of records:

set autotrace on
SELECT count(*) RUNNING_SESSIONS from dev_ODI_XXXX.SNP_SESSION 
WHERE SESS_STATUS = 'D' --:1 
and AGENT_NAME= 'DEV_ODI_AGENT'--:2

RUNNING_SESSIONS
----------------
1
Elapsed: 00:00:00.99

Execution Plan
----------------------------------------------------------
Plan hash value: 1837310352
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 546 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
|* 2 | TABLE ACCESS STORAGE FULL| SNP_SESSION | 1 | 15 | 546 (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')
filter("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2013 consistent gets
0 physical reads

That's still a lot of consistent gets to scan all of 73 rows and do a count (which is the SORT AGGREGATE step). A second to process and 2,013 consistent gets to look at 73 rows in a skinny table (just trust me, it is a skinny table). And the optimiser costs the full scan step at 546 "equivalent single IOs".
{Oh, if anyone is wondering what the "TABLE ACCESS STORAGE FULL" operation and the storage("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT') predicate step are, it's because this system is an Exadata box. Just ignore them for now, they are not relevant to this issue. I'll blog about it *next week*}.

What is the problem?

Well, let’s see how big the table is.

ABCD01> @seg_dets
Enter value for seg_name: snp_session
Enter value for owner: dev_odi%
Any Key>

OWNER SEG_NAME SEG TS_NAME BYTES_K BLOCKS exts INI_K NXT_K
-------- --------------- --- -------- ---------- --------- ---- ------- -------
DEV_ODI_ SNP_SESSION TAB DEV_ODI_ 16,384 2,048 31 64 1024
REPO USER

The table is 2,048 blocks in size and the consistent gets to scan it came to 2,013. That's close.

The problem is that when you full scan a table, Oracle scans up to the High Water Mark (HWM) of that table, i.e. the highest block that Oracle knows has ever held data. As Oracle inserts new data into a table and extends it, it keeps a note of the highest block data has been put into, the High Water Mark. When you delete data from a table, Oracle marks individual blocks to be reused as they empty (drop below a certain percentage full), but it does NOT check whether whole swathes of blocks have been emptied so that the HWM could be brought back down. In fact, there could well be a few records in the first block of the table and then nothing until a few records in the last block ever used, the block at the HWM. A full scan still reads every block up to the HWM and counts the relevant records it finds. The HWM is actually a very long-standing performance tweak: you can allocate GBs of space to a table, but Oracle knows the very highest block it has put data into for that segment and only scans up to there. Usually, the HWM is useful.
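As an aside, if you want a rough-and-ready way of spotting tables whose HWM is far higher than the data they hold warrants, comparing the stats-based block count with what the rows should actually need is a reasonable first pass. The below is only a sketch: it relies on reasonably fresh stats and assumes an 8K block size.

select owner, table_name, num_rows, blocks
      ,ceil(num_rows*avg_row_len/8192)  est_blocks_needed
from   dba_tables
where  num_rows is not null
and    blocks   > 128                                             -- ignore genuinely tiny tables
and    blocks   > 10*greatest(ceil(num_rows*avg_row_len/8192),1)  -- 10x more blocks than the data needs
order by blocks desc;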

I happen to know that this table used to have several thousand records in it but is now used far less extensively and holds only a few records. They are constantly deleted and new ones created. I have seen {very regularly, at least once a year} some extreme situations where a table has had many millions of rows in it at one point, setting the HWM to several hundred thousand blocks into the table segment, but the table now only holds a few hundred records. But a Full Table Scan scans all the way up to the HWM, taking several minutes to access a few hundred records. Often that table is being inefficiently cleared with a “delete from table_x” statement and then re-populated.

What is the answer? Well, I could TRUNCATE the table, which would reset the HWM to zero, but this table actually holds some real data, so I can't truncate it in this situation. I could MOVE the table, i.e. get Oracle to recreate the segment and move the existing data into it. That would start the new segment with the HWM at block one and only raise it as the data is inserted neatly into the first few blocks. The third option, and the one I took, is to SHRINK the table. This basically tells Oracle to shuffle the data down to the start of the segment and reset the HWM. You can do this on the fly and I have done it on relatively active systems, but the usual advice applies: test it yourself first, as you may hit locking issues on a busy system where the table you shrink is being hammered. Also, you do need to enable row movement on the table for the shrink to work, as is demonstrated below:

ABCD01> alter table dev_ODI_XXXX.snp_session shrink space cascade;
alter table dev_ODI_XXXX.snp_session shrink space cascade
*
ERROR at line 1:
ORA-10636: ROW MOVEMENT is not enabled

ABCD01> alter table dev_ODI_XXXX.snp_session enable row movement;

Table altered.

ABCD01> alter table dev_ODI_XXXX.snp_session shrink space cascade;

Table altered.

Elapsed: 00:00:01.98
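For completeness, the MOVE alternative I mentioned would look something like the below. I did not run it here, and note that a move rewrites the segment and leaves any indexes on the table UNUSABLE, so they have to be rebuilt afterwards (the index name below is made up purely for illustration):

alter table dev_ODI_XXXX.snp_session move;
-- any indexes on the table are now unusable and must be rebuilt, e.g.
-- (hypothetical index name)
alter index dev_ODI_XXXX.snp_session_pk rebuild;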

So, how does my little select perform now?

set autotrace on
SELECT count(*) RUNNING_SESSIONS from dev_ODI_XXXX.SNP_SESSION 
WHERE SESS_STATUS = 'D' --:1 
and AGENT_NAME= 'DEV_ODI_AGENT'--:2

RUNNING_SESSIONS
----------------
1
Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1837310352
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 546 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
|* 2 | TABLE ACCESS STORAGE FULL| SNP_SESSION | 1 | 15 | 546 (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')
filter("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads

Fabulous. The run time is very short and the number of consistent gets is tiny. You might notice that the Explain Plan still thinks the cost of the full scan will be 546 (i.e. the equivalent of 546 single block reads), but that is because I have not re-gathered the stats. I left the system to do that automatically that night.
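Had I wanted the optimiser cost to reflect the new size straight away, a quick manual gather along the lines below would have done it:

exec dbms_stats.gather_table_stats(ownname=>'DEV_ODI_XXXX', tabname=>'SNP_SESSION')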

I'll just mention one more thing. Shrinking a table is a bit like the old trick of rebuilding indexes to compact them. People misunderstood what rebuilding an index actually achieved and why the index was so much larger than it "needed" to be, and would spend CPU and IO rebuilding indexes overnight, only for them to quickly explode back to their original size (creating a lot of overhead as they did so). Shrinking oversized tables can give a great benefit, but keep in mind that if some process pushes the HWM right up every day by shoving lots of data into the table and then clearing it down later, shrinking the table is just a waste of time. It gives benefit when the table really has shrunk down in size for a while, or only occasionally explodes to a massive size. I have in the past written "regular shrink" scripts for tables I know explode once in a while (e.g. once a week) but are usually small, but you need to know what the pattern is. Don't just blindly shrink tables.
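If you do decide a table genuinely warrants a scheduled shrink, the heart of such a script is nothing more than issuing the shrink for each table on your candidate list. The sketch below assumes ROW MOVEMENT is already enabled on the tables concerned and hard-codes the candidate list purely for illustration; wrap it in whatever scheduling and logging you favour, and test the locking impact on your own system first:

begin
  for t in ( select owner, table_name
             from   dba_tables
             where  owner = 'DEV_ODI_XXXX'          -- hypothetical candidate list
             and    table_name in ('SNP_SESSION') )
  loop
    execute immediate 'alter table "'||t.owner||'"."'||t.table_name
                    ||'" shrink space cascade';
  end loop;
end;
/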

For me in this situation, the code scanning that table dropped out of my "top 20 most demanding SQL statements" and has stayed out of it for weeks. The statement now completes in considerably under 0.1 seconds as opposed to around a whole second, but no one has noticed. I'd removed about 5% of the workload of the system, but no one noticed that either. It took me half an hour to investigate and fix, and I'd freed up a non-trivial chunk of processing resource. In my performance tuning heart I know I did a good job that day.