Oakies Blog Aggregator

Foreign Archived Log in #Oracle – what does it mean?

When you look into V$RECOVERY_AREA_USAGE, you see a strange row at the bottom:

SQL> select * from v$recovery_area_usage;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         10.18                         0              73          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

Curious what that could be? You will see values other than zero on a Logical Standby Database:

SQL> connect sys/oracle@logst as sysdba
Connected.
SQL> select database_role from v$database;

DATABASE_ROLE
----------------
LOGICAL STANDBY

SQL> select * from v$recovery_area_usage;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                         14.93                         0               9          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                  2.03                         0              26          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

In contrast to a Physical Standby Database, a Logical Standby writes not only into standby logs but also into its own online logs while being in the standby role. That leads to two different kinds of archive logs:

Foreign Archived Logs: When DML (like insert and update) is done on the primary, that leads to redo entries in the primary’s online logs, which are simultaneously shipped to the standby and written there into standby logs as well. The online logs on the primary and the standby logs on the standby will eventually be archived. So far that is the same for both physical and logical standby. But now the difference: a logical standby database does SQL Apply by logmining the standby logs or the archive logs that came from the primary. That generates similar DML on the standby, which in turn leads LGWR there to write redo into the standby’s own online logs, and those will eventually get archived as well. The archive logs that originate from the primary are the foreign archived logs; the ones produced by the standby itself are its own (local) archived logs.

A logical standby could do recovery only with its own archive logs (if there was a backup taken before) but not with the foreign archive logs. Therefore, those foreign archive logs can and do get deleted automatically. V$ARCHIVED_LOG and V$FOREIGN_ARCHIVED_LOG can be queried to monitor the two different kinds of logs.
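
The following is a minimal sketch of such a monitoring query on the logical standby (the column names are assumed to match V$ARCHIVED_LOG; adjust them to your version if necessary):

-- Compare the two kinds of archive logs known to the logical standby
select 'LOCAL'   as log_kind, count(*) as logs, max(completion_time) as newest
from   v$archived_log
union all
select 'FOREIGN' as log_kind, count(*), max(completion_time)
from   v$foreign_archived_log;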

That was one topic of the course Oracle Database 12c: Data Guard Administration that I’m delivering as an LVC this week, by the way. Hope you find it useful :-)

Tagged: Data Guard, High Availability

Parallel Fun

As I write, there’s an ongoing thread on Oracle-L that started with the (paraphrased) question: “I’ve got this query that returns 7 million rows; when I change it to ‘select count(*)’ it returns in 4 seconds but when we display the full result set on screen it takes hours, and every second or two the screen pauses; how do I make it go faster.”

The general rapid response was: “You shouldn’t be running 7M rows to a screen – the time is the time for the network traffic and display.”

The first part of the statement is right – the second part is quite likely to be wrong, and there’s a very strong hint in the question that makes me say that: it’s the “pauses every second or two”. Of course we don’t know what the OP isn’t telling us, and we don’t know how accurate he is in what he is telling us, so any ideas we have may be completely wrong. For example, we haven’t been given any idea of how long a “pause” is, we don’t really know how accurate that “second or two” might be and whether “every” is an exaggeration, and maybe the query is returning CLOB columns (and that could make a big difference to what you can do to improve performance).

If we take the statement at face value, though, there is one very obvious inference: although some of the time will be due to network traffic time, most of the time is probably due to Oracle doing something expensive for a significant fraction of the rows returned. The pattern of activity probably looks like this:

  • client: call server to fetch next array of rows
  • server: spend some time populating array  — this is where the client sees a pause
  • client: display result array
  • client: call server to fetch next array of rows
  •  etc…

Here’s a trivial example:

connect / as sysdba
set arraysize 500
set pagesize 40

select
        o1.spare1 ,
        (
        select  max((ctime))
        from    obj$    o2
        where   o2.owner# = o1.owner#
        and     o2.obj# < o1.obj#
        ) ct
from obj$ o1
;

On my laptop, running an instance of 11.2.0.4 with about 80,000 rows in obj$ (and a lot of them owned by SYS), I can count seconds and find that (approximately) I alternate between one second watching results scrolling up the screen and one second waiting as the server generates the next 500 rows.

Of course it’s possible to argue that the problem really is the network and nothing but the network struggling to cope with the never-ending stream of little packets produced by 7M rows. Could there be a choke point that causes the data to stop and start with great regularity? Maybe – but previous experience says probably not. I have experienced bad network problems in the past, but when they’ve occurred I’ve always observed extremely random stop/go behaviour. The regularity implied in the question makes the Oracle-based problem seem far more likely.

Conveniently a couple of people asked for more clues – like the query text and the execution plan; even more conveniently the OP supplied the answers in this response. Since the email format makes them a little hard to read I’ve copied them here:


SELECT  bunch of stuff.....,

        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr1.RELATED_SID
                        ||
                        ',')
                ORDER BY sr1.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr1
                WHERE   sr1.SID                    = slv.SID
                        AND sr1.RELATIONSHIP_LEVEL = '1'
                GROUP BY sr1.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL1,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr2.RELATED_SID
                        ||
                        ',')
                ORDER BY sr2.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr2
                WHERE   sr2.SID                    = slv.SID
                        AND sr2.RELATIONSHIP_LEVEL = '2'
                GROUP BY sr2.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL2,
        (
               SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr3.RELATED_SID
                        ||
                        ',')
                ORDER BY sr3.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr3
                WHERE   sr3.SID                    = slv.SID
                        AND sr3.RELATIONSHIP_LEVEL = '3'
                GROUP BY sr3.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL3,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr4.RELATED_SID
                        ||
                        ',')
                ORDER BY sr4.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr4
                WHERE   sr4.SID                    = slv.SID
                        AND sr4.RELATIONSHIP_LEVEL = '4'
                GROUP BY sr4.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL4,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr5.RELATED_SID
                        ||
                        ',')
                ORDER BY sr5.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr5
                WHERE   sr5.SID                    = slv.SID
                        AND sr5.RELATIONSHIP_LEVEL = '5'
                GROUP BY sr5.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL5
FROM    service_lookup slv
        LEFT JOIN service_location sl
        ON      sl.service_location_id = slv.service_location_id;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1570133209

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                      |  7331K|  5593M|  1877   (5)| 00:00:01 |        |      |            |
|   1 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   2 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000             |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   6 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   7 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   8 |    PX SEND QC (RANDOM)           | :TQ20000             |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |
|   9 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWC |            |
|* 10 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWP |            |
|  11 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  12 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  13 |    PX SEND QC (RANDOM)           | :TQ30000             |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | P->S | QC (RAND)  |
|  14 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWC |            |
|* 15 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWP |            |
|  16 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  17 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  18 |    PX SEND QC (RANDOM)           | :TQ40000             |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | P->S | QC (RAND)  |
|  19 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWC |            |
|* 20 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWP |            |
|  21 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  22 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  23 |    PX SEND QC (RANDOM)           | :TQ50000             |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | P->S | QC (RAND)  |
|  24 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWC |            |
|* 25 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWP |            |
|  26 |  PX COORDINATOR                  |                      |       |       |            |          |        |      |            |
|  27 |   PX SEND QC (RANDOM)            | :TQ60002             |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | P->S | QC (RAND)  |
|* 28 |    HASH JOIN RIGHT OUTER BUFFERED|                      |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  29 |     PX RECEIVE                   |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,02 | PCWP |            |
|  30 |      PX SEND HASH                | :TQ60000             |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | P->P | HASH       |
|  31 |       PX BLOCK ITERATOR          |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWC |            |
|  32 |        TABLE ACCESS STORAGE FULL | SERVICE_LOCATION     |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWP |            |
|  33 |     PX RECEIVE                   |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  34 |      PX SEND HASH                | :TQ60001             |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | P->P | HASH       |
|  35 |       PX BLOCK ITERATOR          |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWC |            |
|  36 |        TABLE ACCESS STORAGE FULL | SERVICE_LOOKUP       |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------

We have a simple two-table outer join, and five scalar subqueries in the select list. (Not being very familiar with the various XML calls, I had no idea of what the scalar subqueries were doing, or how they produced a result, beyond the fact that they were querying and aggregating multiple rows. In fact the combination of calls does much the same as listagg(), though it allows for a CLOB result (which could be part of the performance problem, of course) rather than being limited to a varchar2() result.)
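
As a rough illustration of that point (a sketch only, reusing the table and column names from the posted query), the first of those scalar subqueries could be written with listagg() like this, with the caveat that listagg() returns a varchar2 and would raise an error if the concatenated string grew beyond the varchar2 limit:

        (
                select  listagg(sr1.related_sid, ',')
                                within group (order by sr1.related_sid)
                from    service_relationship sr1
                where   sr1.sid                    = slv.sid
                and     sr1.relationship_level     = '1'
                group by sr1.sid
        ) as relatedserviceinstanceidlevel1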

Would you like to guess at this point why I constructed my demonstration query against obj$ the way I did when presenting the idea of high-cost-per-row queries as a reason for regular pauses in the output? The execution plan matched one of my two initial guesses about what the query was going to look like. When you “select count(*) from {this query}”, the optimizer will factor out the scalar subqueries and only have to count the result set from the hash join – and it might even manage to use a couple of parallel index fast full scans to get that result rather than doing the tablescans. When you run the query you have to run the scalar subqueries.
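
You can see the same effect with the obj$ demonstration above; wrapping it in a count (a quick sketch, timing left as an exercise) lets the optimizer throw the scalar subquery away and the query comes back almost immediately:

select count(*)
from    (
        select
                o1.spare1 ,
                (
                select  max((ctime))
                from    obj$    o2
                where   o2.owner# = o1.owner#
                and     o2.obj# < o1.obj#
                ) ct
        from obj$ o1
        );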

If we trust the statistics, we have 5 subqueries to run for each row of the hash join – and the hash join is predicted to return 7.3 million rows. Given that the subqueries are all going to run parallel tablescans against a fairly large table (note – the cost of the tablescans on SERVICE_RELATIONSHIP is 368, compared to the cost of the tablescan on SERVICE_LOCATION which is 366 to return 3.1M rows) that’s an awful lot of work for each row returned – unless we benefit from an enormous amount of scalar subquery caching.

Here’s another performance threat that the plan shows, though: notice where the PX SEND QC operation appears – that means the PX slaves send their (7M) rows to the Query Co-ordinator and the QC is responsible for doing all the work of running the scalar subqueries. Another interesting little threat visible in the plan shows up in the TQ column – the plan uses six “data flow operations” (using the original naming convention, though that changed some time ago but survived in the column names of v$pq_tqstat). In principle each DFO could allocate two sets of PX slaves (and every DFO could have a different degree of parallelism); in this example DFO number 6 (the driving hash join) uses two sets of slaves, and the other five DFOs (the scalar subqueries) use a single set each. The upshot of this is that if the default degree of parallelism in play is N this query will allocate 7N parallel query slaves. It gets a little nastier than that, though (based on checking the output from v$sql_plan_monitor), because each time one of the scalar subqueries runs Oracle seems to allocate and deallocate the slaves that are supposed to run it – which is probably going to cause some contention if there are other parallel queries trying to run at the same time.
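
If you want to check the DFO numbering and slave usage for yourself, one (hedged) approach is to query v$pq_tqstat from the same session immediately after the statement completes; it reports one row per producer/consumer process per table queue:

-- run in the same session, straight after the parallel query has finished
select  dfo_number, tq_id, server_type, instance, process, num_rows, bytes
from    v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process;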

Optimisation

So what could you do with this query ? It depends on how much change you want to make to the code.

It’s possible that an index on service_relationship(relationship_level, sid) – with compress 1 – might help if it’s very precise, and if the target table stays in the buffer cache for the duration of the query – but, in the absence of scalar subquery caching, that could still leave the query co-ordinator executing 35 million (5 queries x 7 million rows) subqueries in a serialised process.
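
A sketch of that index (the index name is my own invention):

create index srel_level_sid_idx
        on service_relationship(relationship_level, sid)
        compress 1;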

A better bet may be to convert from subqueries to joins – remembering that the listagg() / xmlserialize() calls will require you to aggregate (which means sorting in this case) an estimated 25 rows per driving row per relationship_level; in other words you may need to sort 7M * 125 = 875M rows – but at least you could do that in parallel, and there’s always the possibility that the estimated 25 drops off as you work through the different levels. You could choose to do 5 outer hash joins or (as Iggy Fernandez outlined in the thread) you could do a single outer join with a decode on the relationship_level. Another variation on this theme (which would probably have a plan showing ‘join then aggregate’) would be to ‘aggregate then join’. It’s possible that creating a non-mergeable inline view for the 5 values of relationship_level from a single table access, aggregating it to produce the five required columns, then using the result in an outer join, would be the most efficient option. In the absence of a detailed understanding of the data volume and patterns it’s hard to make any prediction of which strategy would work best.
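
To make the “aggregate then join” option a little more concrete, here is an untested sketch of the general shape it might take, using listagg() with a decode() for brevity (so it inherits the varchar2 length limit mentioned earlier) and showing only the first two of the five levels:

select  /* bunch of stuff..... */
        agg.level1_ids          as relatedserviceinstanceidlevel1,
        agg.level2_ids          as relatedserviceinstanceidlevel2
        -- ... levels 3 to 5 follow the same pattern
from    service_lookup slv
        left join service_location sl
        on      sl.service_location_id = slv.service_location_id
        left join (
                select  /*+ no_merge */
                        sr.sid,
                        listagg(decode(sr.relationship_level, '1', sr.related_sid), ',')
                                within group (order by sr.related_sid)  as level1_ids,
                        listagg(decode(sr.relationship_level, '2', sr.related_sid), ',')
                                within group (order by sr.related_sid)  as level2_ids
                        -- ... levels 3 to 5 follow the same pattern
                from    service_relationship sr
                group by
                        sr.sid
        ) agg
        on      agg.sid = slv.sid
;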

Footnote:

I may be wrong in my analysis of this problem. When I first saw the question the reason for the performance pattern suggested an “obvious” design error in either the SQL or the infrastructure, and when I saw that the query and execution plan matched my prediction it became very hard for me to think that there might be some other significant cause.

There were a couple of interesting details in the execution plan that made me pursue the problem a little more. In the first case I built a very simple model to get an estimate of the time needed to display 7M rows of a reasonable width in SQL*Plus running across a typical LAN (my estimate was in the order of 45 minutes – not hours); then I spent a little more time (about 10 minutes) to build a model that reproduced the key features of the execution plan shown.

I then spent two or three hours playing with the model, and I’ll be writing a further blog with some of the results later on. One detail to carry away today, though, is that in 12c Oracle can do a new form of subquery unnesting which transformed the query from its 5 scalar subquery form into the seven table join form that was one of the suggestions made on the thread; even more interestingly, if I blocked the unnesting (to force the subquery execution) Oracle 12.1.0.2 came up with a new operator (EXPRESSION EVALUATION) that allowed it to run the subqueries from the PX slaves before passing the results to the query co-ordinator – in other words eliminating the serialisation point.

To be continued …

Quick Links to Helpful Literature on Oracle Database In-Memory Column Store.

I’m surprised to find that Google is not cleanly ranking the helpful set of blog posts by Oracle’s Maria Colgan on the Oracle Database 12c In-Memory Column Store feature so I thought I’d put together this convenient set of links. Google search seems to only return a few of them in random order.

Over time I may add other helpful links regarding Oracle’s new, exciting caching technology.

Starter Information

Getting Started With Oracle Database In-Memory. Part I.

Getting Started With Oracle Database In-Memory. Part II.

Getting Started With Oracle Database In-Memory. Part III.

Getting Started With Oracle Database In-Memory. Part IV. 

In-Memory Column Store With Real Application Clusters

The following are links to information about Oracle Database In-Memory on Real Application Clusters:

Oracle Database In-Memory on RAC. Part I.

In-Memory Product That Requires Proprietary Storage?

How could the brand of storage matter for an in-memory cache feature? Good question.

Fellow Oaktable Network member Christian Antognini has produced a very important article regarding how Oracle Database 12c In-Memory Column Store with Real Application Clusters is questionable unless you are using Oracle storage (Exadata, SPARC SuperCluster). I found Christian’s article very interesting because, after all, the topic at hand is an in-memory cache product (a.k.a., In-Memory Column Store). I fail to see any technical reason why Oracle wouldn’t support an in-memory product with blocks from any and all storage. It is in-memory after all, isn’t it? Please visit Christian’s article here: The Importance of the In-Memory DUPLICATE Clause for a RAC System.

Filed under: oracle

Audio semi-Visual Presentation on Clustering Data in Oracle

I suppose it had to happen eventually but one of my presentations has ended up on YouTube. It’s a recent presentation I did for the Oracle Midlands user group in September.

The topic is (as the title of this blog post hints at!) Boosting select performance by clustering data. The video consists of the slides I presented, changing as the presentation progresses, with my audio over the top. It goes on for a bit, close to an hour, but you could watch a section and then go and do something else before watching a bit more.

I have to say, it is very odd hearing my voice (and the slight touch of the “brummie” {Birmingham} accent coming through) and I do wince at the places where I blather or say something slightly wrong or make a joke that involved a visual element that is lost. Oh well, at least you don’t see me wandering around and jumping up, literally, to point out bits on the slides.

I’m glad to say I will be repeating a slightly more polished version of the presentation at this year’s UKOUG Tech14 conference in December. I was a bit under the weather back on September the 16th, due to having just returned to the Working Life, and with this perfect example of what I did then I should be able to make the next shot at it a cracker… maybe.

On the topic of Oracle Midlands, I like this user group that is run by Mike Mckay Dirden; the meetings consist of evening presentations in Birmingham with a bit of support from Redgate. This includes half-time samosas to keep you going. The next meeting is described here and is on Tuesday 25th November. This meeting has two presentations by my friend Bjoern Rost, who is an Oracle Ace Director {gasps of appreciation from the audience} and a consummate presenter. I wish I could make it there as I would like to share a pint with Bjoern (well, he’ll have wine or a cocktail I suspect, as he is not a beer fan) as well as some of my other friends up in that part of the country.

EM12c, Rel. 4, OMS and OMR Health, Part II

There are a large number of “moving parts” when performance tuning or troubleshooting an Enterprise Manager environment. The new EM performance features (available in release 12.1.0.4) are there to assist you in understanding the source of the issue, and they can really make the difference for those who are unfamiliar with the challenges of WebLogic, Java, networking or the other complexities that make up EM12c and aren’t commonly thought of as part of the DBA’s job role.

Now that we’ve finished with the Health Overview, we can look deeper into the health and performance of the two most well known components of the EM12c architecture, the Oracle Management Repository (OMR) and the Oracle Management Service (OMS).

Due to the impressive features offered in the new EM performance consoles, I’m going to break these up into multiple posts and start with OMR and focus on the Repository Tab.

The Repository

The Repository Tab is accessed via the same Setup menu in EM12c console:

rep1

Once accessed, there are a number of tabs at the top of the page: Repository, Metrics and Schema. Starting with the Repository tab (left to right), we’ll inspect what specific performance data is important when reviewing an OMR.

Repository Tab

The Repository page is going to display a number of graphs that tell you everything from specific information about the OMR database, to incidents involving the OMR database, and even how collections perform at the database repository level. It is important to remember that this tab is all about the Repository (OMR) and should not be confused with the Service (OMS).

Basic Information

We begin by viewing information about the database name, space allocated, space used, and the number of current sessions actively connected to the OMR.

rep5

All of these are links, so you can click on the link and you’ll be taken to a detailed view of the data and more information to investigate if you have questions.

For the Target type for the OMR, you can click on the target name, (db name) and the console will take you to the home page for the OMR database.

Click on Last Backup date and the console will take you to the backup activity report for the OMR database.

Click on Space Used and the console will then bring you to the Management Services and Repository page drill down for Repository Tablespace Used.

rep6

There is a ton of information in this location and we’ll dig deeper into it in a separate post, but just to understand how user friendly the interface is, note the links you have at your fingertips right here.

If you click on Management Service Repository Sessions, the following table with session type and counts will display:

rep7

Incidents

On the right hand side of the top of the page, we have access to the incidents linked to the OMR. No other incidents will be listed except the ones connected to the OMR, so this is a great place to check first when you are experiencing issues.

rep8

Notice that it includes incidents for page processing time outs to the OMS and collection timeouts.  This can be very helpful when you are experiencing slow response and need to know where the issue is sourced from.

Initialization Parameters for the OMR

Not only does the next graph identify what size category you fall into for your Enterprise Manager environment (small, medium or large), but it also lets you know if any of your parameters are outside of the recommended sizing for that category.

rep9

In our example, you can see that we don’t have a MEMORY_TARGET value set, and that is flagged as outside of compliance since the recommendation is to have one set. We can also view each of the values we do have set and how they compare to what Oracle thinks the minimum value for that category of OMR size should be.
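
For reference, a quick way to see what is actually set on the OMR database is a simple query against v$parameter (a sketch only; the value you eventually choose should come from the sizing recommendations shown in this graph):

-- check the memory-related parameters flagged in the compliance graph
select  name, display_value, isdefault
from    v$parameter
where   name in ('memory_target', 'sga_target', 'pga_aggregate_target');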

Job Scheduler Status

To the right of the Init parameters is the graph with information pertaining to all the jobs running in the OMR to support the Enterprise Manager environment. Unlike Job Activity in the console, this is reporting all those jobs that are taking care of the Enterprise Manager itself.

If a job fails and you have the option to edit the schedule to run again (the glasses icon), then you can click on the glasses and the following popup will show, where you can enter a new time for the job to retry:

rep11

Once you enter the new time to run the job, click on Save and verify that the job has been successful in the console view (green check mark vs. a red X).

Collections Performance

At the bottom left, Collections is the next category that’s covered. If collections aren’t uploading to the OMR, then the console isn’t able to provide the most up to date data, and notifications of incidents and alerts aren’t sent out to notify administrators of issues. Timely collections and the performance of collections are of great concern to an EM Cloud Control administrator.

rep12

The graph is well laid out and clearly shows the number of collections in backlog and the throughput performance. The top of the graph, when hovered over, will show you the warning and critical threshold line for the number of backlogs allowed.

Backlog is an issue: if it gets too high and hits the threshold, your agent can stop uploading. You can also see the average duration of the collections and view over time whether the duration is increasing. If you use a lot of metric extensions or plug-ins, this is something you’ll want to monitor, so this graph is extremely helpful when inspecting collection performance.

By hovering the cursor over the Collections Backlog line in the graph, you are offered a number of options to look into the performance:

rep13

You have the option to click on Problem Analysis to look into the backlog, Metrics Detail or go to the Target Home.

Problem Analysis

As my EM environment is running quite smoothly at the OMR level, there isn’t a lot to show you in the Problem Analysis, but I wanted to at least give everyone a peek into this cool, new tool.

rep14

First of all, if I did have an issue, there would be collections showing in backlog.  This is very important for an administrator to check and ensure that backlog is not occurring.

As there is no backlog, you can see that resource usage by my collections is pretty consistent and well below the thresholds expected for most of the resource types shown:

rep15

You can also export the data from the table view (small link at the bottom right of the screen, not shown) if you need the raw data.

You will note that my memory utilization is creeping, little by little, toward the critical threshold. This is commonly due to Java garbage collection causing a small memory leak and should be reviewed from time to time. If it is considerable, the Java heap should be examined and a more efficient value set.

Adding Metrics to the Performance Analysis

On the right hand side of the Performance Analysis, you will notice the Metric Palette. This offers you the opportunity to go from the standard configuration to displaying more data on the existing metrics, or to add analysis on other metrics, such as Agents and Page Performance.

It’s important to know that, even though you can be brought to this page from many different links within the OMR/OMS performance pages, while you are in the Performance Analysis you can inspect performance metrics other than just the original ones you were reviewing.

For our example, we’ll add an additional metric graph (Time estimates for clearing backlog) for review on the left hand analysis page:

rep16

We now have an additional graph on the left hand side analysis to compare to our existing data to see if the load times correlate to resource usage:

rep17

This can be done for dozens of metrics and offers some incredible analysis power when researching performance issues with EM12c. The Performance Analysis link is one of the most powerful tools for locating, very quickly, where a bottleneck in performance is coming from. The fluid ability to add metrics to the graphs section and see how they correspond to the other resource usage is incredibly beneficial as well.

Metric Details

Now back to our Collections graph: if you remember, we had three options when we clicked on the blue line:

rep13

By clicking on the Metrics Details link, we’ll then go to the performance page for All Metrics.

rep18

This page displays information about the number of short and long running collections in backlog and will display the status if the threshold value has been hit for backlog quantity. The page functions similarly to Incidents, in that you can click on the right middle button to display the highlighted Task Class information in a full page view.

You are also offered the option to modify thresholds if the current values don’t meet the demands the system is currently under, but know that the recommended values are there for a reason and the option to change them should be seriously researched beforehand.

Target Home

This link takes you to the Overview and Health page for the OMR.  I get to save a lot of typing by just sending you to my blog post on this great feature! :)

A final clarification, too: the three options (Performance Analysis, Metric Details and Target Home) are available for each metric by double-clicking in the Repository Collections Performance graph or in the Metric Data Rollup graph, which we’ll discuss next.

Metric Data Rollup Performance

The last graph, in the bottom right hand corner, is for metric data. This graph displays the number of metric records rolled up and the throughput per minute for this data to be uploaded into the OMR.

We again have the ability to inspect performance analysis by double-clicking on the metric in the graph.

rep19

Each of the three options work almost exactly the same way as I demonstrated for the Collections Performance, but the data is based on the metrics rollup.

The main point of each of these sections is to show how many different ways you can do performance analysis on different performance data:

rep20

Yes, even the legend can be clicked on and a detail option chosen.

That completes the review of the Repository tab. Remember, I have two more tabs to cover in upcoming posts before we dig into the Management Services and Agents performance consoles.

rep21

Copyright © DBA Kevlar [EM12c, Rel. 4, OMS and OMR Health, Part II], All Right Reserved. 2014.

The Importance of the In-Memory DUPLICATE Clause for a RAC System

With the INMEMORY clause you can specify 4 sub-clauses:

  • The MEMCOMPRESS clause specifies whether and how compression is used
  • The PRIORITY clause specifies the priority (“order”) in which the segments are loaded when the IMCS is populated
  • The DISTRIBUTE clause specifies how data is distributed across RAC instances
  • The DUPLICATE clause specifies whether and how data is duplicated across RAC instances

The aim of this post is not to describe these attributes in detail. Instead, this post shows you the impact of the DUPLICATE clause when a query is executed in a RAC environment. For a short description, have a look at this table in the Database Reference guide. For more details, have a look at the Oracle Database In-Memory white paper.
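
Purely as a syntax sketch (the values below are illustrative only; see the SQL Language Reference for the full set of options), all four sub-clauses can be combined in a single statement like this:

-- low query compression, high population priority, automatic distribution,
-- and a copy of every IMCU on every instance
ALTER TABLE t INMEMORY
  MEMCOMPRESS FOR QUERY LOW
  PRIORITY HIGH
  DISTRIBUTE AUTO
  DUPLICATE ALL;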

It is essential to understand that the DUPLICATE clause (also known as “Fault Tolerant In-Memory Column Store”), according to the Oracle Database Licensing Information guide, requires Exadata or Supercluster. Also note that, according to this Twitter conversation, Oracle also plans to support the feature on ODA. Hence, the idea is to support it on all engineered systems.

In this post I would like to show you examples that make clear why the DUPLICATE clause is key to get good performance on a RAC system. To do so, I will execute a simple query like “SELECT sum(col) FROM tab” with different settings.

First of all, let me set the scene…

  • Since the In-Memory option is required, I have to use version 12.1.0.2.0
SQL> SELECT banner
  2  FROM v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE    12.1.0.2.0      Production
TNS for Linux: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production
  • What you cannot see from the previous output is that the database is hosted on a RAC/Exadata system
  • The test table stores 12GB of data in about 1.5 million blocks (BTW, this is the same table I used in this post)
SQL> SELECT blocks, round(blocks*8192/1024/1024) AS size_mb
  2  FROM user_tables
  3  WHERE table_name = 'T';

    BLOCKS    SIZE_MB
---------- ----------
   1550061      12110
  • Note that parallel processing is disabled at the table level
  • At the beginning, make sure that the In-Memory Column Store (IMCS) is not used for the test table
SQL> ALTER TABLE t NO INMEMORY;

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

no rows selected



TEST 1

Let me prepare the environment for the first test:

  • One instance is up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         1
  • Specify to use the IMCS for the test table and trigger the population
SQL> ALTER TABLE t INMEMORY;

SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
  • After waiting a few minutes, check that the population is over (POPULATE_STATUS=COMPLETED) and that the test table is completely stored in the IMCS (BYTES_NOT_POPULATED=0).
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
  • Finally, flush the buffer cache (note that the IMCS is not impacted by this action)
SQL> ALTER SYSTEM FLUSH BUFFER_CACHE;

Now that the environment is ready, I execute the test query and, at the same time, I activate extended SQL trace and gather runtime information:

  • The execution took 2.75 seconds (UPDATE 2014-12-11: The execution isn’t as fast as you might expect because SQL trace was enabled during the test. Without SQL trace the execution is about 10 times faster.)
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:02.75

SQL> execute dbms_session.session_trace_disable
  • A serial execution taking advantage of an IM scan (TABLE ACCESS INMEMORY FULL) was used
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:02.73 |       4 |      2 |
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:02.73 |       4 |      2 |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:00.91 |       4 |      2 |
-------------------------------------------------------------------------------------------------------
  • The corresponding extended SQL trace information as shown by TKPROF is the following
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.72       2.73          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.72       2.73          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2730184 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=907260 us cost=17717 size=300000000 card=100000000)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

In summary, the execution was, for a query that had to process 12GB of data, quite fast. This was possible because all data was extracted from the IMCS.



TEST 2

The second test uses two instances. As a result, after starting the second instance, the In-Memory attributes of the test table needed to be changed, and the IMCS had to be re-populated.

  • Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         2
  • Data is duplicated on both instances
SQL> ALTER TABLE t INMEMORY DUPLICATE;
  • Trigger population
SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
  • After waiting a few minutes, check that the population is over and that the test table is completely stored in the IMCS of both instances
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED    1.0921E+10 1.2707E+10                   0
         1 COMPLETED    1.0921E+10 1.2707E+10                   0
  • Flush the buffer cache of both instances

Of course the test is the same as for TEST 1. The only difference is that the test table is stored in the IMCS of both instances.

  • The execution took 2.97 seconds (UPDATE 2014-12-11: The execution isn’t as fast as you might expect because SQL trace was enabled during the test. Without SQL trace the execution is about 10 times faster.)
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:02.97

SQL> execute dbms_session.session_trace_disable
  • It goes without saying that the execution plan is the same as for TEST 1
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:02.97 |       4 |      2 |
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:02.97 |       4 |      2 |
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:01.02 |       4 |      2 |
-------------------------------------------------------------------------------------------------------
  • And here is the extended SQL trace information
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.95       2.96          2          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.95       2.97          2          4          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=4 pr=2 pw=0 time=2967854 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=4 pr=2 pw=0 time=1023321 us cost=17717 size=300000000 card=100000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read                 2        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

In summary, there is no real difference between TEST 1 and TEST 2.



TEST 3

The third test also uses two instances. But, this time, data is not fully duplicated on each instance. Instead, it is only distributed. This means that every row of the test table is stored in the IMCS of either one instance or the other.

Let me prepare the environment:

  • Two instances are up and running
SQL> SELECT count(*)
  2  FROM gv$instance;

  COUNT(*)
----------
         2
  • Data is not duplicated on both instances
SQL> ALTER TABLE t INMEMORY NO DUPLICATE;
  • Trigger population
SQL> SELECT count(*)
  2  FROM t;

  COUNT(*)
----------
 100000000
  • After waiting a few minutes, check that the population is over. Notice that since no instance contains all data, the BYTES_NOT_POPULATED column remains greater than 0
SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
  2  FROM gv$im_user_segments
  3  WHERE segment_name = 'T';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED    5726535680 1.2707E+10          6029524992
         1 COMPLETED    5194317824 1.2707E+10          6652174336
  • Flush the buffer cache of both instances

Once more, the same SQL statements are used to run the test:

  • The execution took 2 minutes and 40 seconds!!!!!
SQL> execute dbms_session.session_trace_enable

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:02:39.95

SQL> execute dbms_session.session_trace_disable
  • The execution plan is the same as for TEST 1 and TEST 2. The difference is that a huge number of logical and physical I/O were performed
SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

-------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:02:39.94 |    1086K|   1086K|
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:02:39.94 |    1086K|   1086K|
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:02:31.06 |    1086K|   1086K|
-------------------------------------------------------------------------------------------------------
  • The information provided by extended SQL trace confirms what dbms_xplan shows. In addition, it also shows that the poor performance is due to the high number of single-block physical reads that were performed (about 275K)
SELECT /*+ gather_plan_statistics */ sum(n_32)
FROM t

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     93.50     159.94    1086979    1086981          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     93.50     159.94    1086979    1086981          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 95
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=1086981 pr=1086979 pw=0 time=159942519 us)
 100000000  100000000  100000000   TABLE ACCESS INMEMORY FULL T (cr=1086981 pr=1086979 pw=0 time=151063172 us cost=18478 size=300000000 card=100000000)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00
  cell single block physical read            274946        0.03         73.01
  gc cr multi block request                    6368        0.03          4.19
  cell multiblock physical read                6368        0.02         10.00
  latch: cache buffers lru chain                  6        0.00          0.00
  latch: gcs resource hash                        2        0.00          0.00
  latch: gc element                             113        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00

Since the bad performance is caused by the number of physical I/O, it goes without saying that if only logical I/O are taking place, the performance is much better. In this case, an immediate re-execution of the query took 8.28 seconds.

SQL> SELECT /*+ gather_plan_statistics */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:08.28

SQL> SELECT * FROM table(dbms_xplan.display_cursor('7qry07h06c2xs',0,'iostats last'));

---------------------------------------------------------------------------------------
SQL_ID  7qry07h06c2xs, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ sum(n_32) FROM t

Plan hash value: 2966233522

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |      1 |        |      1 |00:00:08.28 |    1086K|
|   1 |  SORT AGGREGATE             |      |      1 |      1 |      1 |00:00:08.28 |    1086K|
|   2 |   TABLE ACCESS INMEMORY FULL| T    |      1 |    100M|    100M|00:00:08.26 |    1086K|
----------------------------------------------------------------------------------------------

In summary, only part of the data was accessed through the IMCS. In addition, despite the fact that an IM scan was processed, unless the data was already in the buffer cache, many single-block physical reads were carried out. As a result, the performance was bad. This behaviour shows that a session is only able to access the IMCS managed by the instance it is connected to. This is also a good example showing that accessing data through the IMCS is faster than accessing the same data through the buffer cache.



TEST 4

The fourth and final test is a continuation of the previous one. Its aim is to check the performance of a parallel query. Hence, I added the PARALLEL hint to the test query. Also note that I flushed the buffer cache of both instances before executing the test query.

SQL> SELECT /*+ gather_plan_statistics parallel */ sum(n_32)
  2  FROM t;

 SUM(N_32)
----------
1550000000

Elapsed: 00:00:01.49

SQL> SELECT * FROM table(dbms_xplan.display_cursor('br2as2yzy7gk4',2,'iostats last'));

SQL_ID  br2as2yzy7gk4, child number 2
-------------------------------------
SELECT /*+ gather_plan_statistics parallel */ sum(n_32) FROM t

Plan hash value: 3126468333

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |      1 |00:00:01.41 |       7 |      1 |
|   1 |  SORT AGGREGATE                 |          |      1 |      1 |      1 |00:00:01.41 |       7 |      1 |
|   2 |   PX COORDINATOR                |          |      1 |        |     10 |00:00:01.34 |       7 |      1 |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE              |          |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   5 |      PX BLOCK ITERATOR          |          |      0 |    100M|      0 |00:00:00.01 |       0 |      0 |
|*  6 |       TABLE ACCESS INMEMORY FULL| T        |      0 |    100M|      0 |00:00:00.01 |       0 |      0 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - inmemory(:Z>=:Z AND :Z<=:Z)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 10
   - parallel scans affinitized for inmemory

The execution took 1.49 seconds. In other words, the performance of the parallel query is way better than that of TEST 3. The reason can be seen in the Note section of the dbms_xplan output: "parallel scans affinitized for inmemory". This means that parallel slaves were started on both instances and that every set of parallel slaves accessed only the data stored in the IMCS of the instance they were attached to. As a result, even though the data was distributed, the performance was good. The only "overhead" compared to the serial execution is that the parallel slaves had to communicate with the query coordinator.



WHAT WE LEARNED

  • DUPLICATE is only applicable to engineered systems (as I write this blog, ODA does not yet support it)
  • On a RAC system, without DUPLICATE, data is distributed across all instances
  • A process can only access data stored in the IMCS of the instance it is connected to
  • Even though an IM scan is used, serial executions on RAC systems might require many logical/physical I/Os. (UPDATE 2014-12-11: Parallel executions using Auto DOP don't suffer from this problem.)

In summary, I would not advise using the In-Memory option on a RAC system that does not support the DUPLICATE clause. At least, not in general.

APEX Woman In Technology Award for 2014

Based on the advice of a colleague, and also because of my desire to become more involved in the wider technology community in Colorado, I joined the Colorado Technology Association (CTA) last year.  I had been involved with the Rocky Mountain Oracle Users Group (RMOUG) for over 20 years, much of that time on the board of directors, including two 3-year stints as president, but I wanted to participate in the wider community beyond my beloved Oracle technology community.

I’ve only attended three CTA events to date, but this past summer I learned that CTA is a great advocate of Women In Technology (WIT) and has done a great deal of work to encourage women to enter the technology industry and to stay in the industry.

When I began my own career in the IT industry in the early 1980s, I recall many women as colleagues, working as programmers and system administrators.  Imperceptibly, over the intervening years, the proportion of women in IT seems to have declined.  When my wife Kellyn Pot’vin-Gorman and I began dating in 2011, she made me aware that the number of women had dropped precipitously, to less than 25% overall.

It is a confusing trend.  Information technology, especially the technical jobs like programming, application development, testing, and systems administration, should be dominated by the girls with whom I grew up, who were so much better at math, science, and physics than us boys.  Society places pressures on girls and women to consider family ahead of career, but that makes IT even more of a natural fit, particularly today with the ability to work remotely from anywhere.

What the heck is going on?  Why is this happening?  Why has a profession in which logic and cerebral prowess is so important proved to be so antediluvian?

Kellyn has been concerned as well, and she is alarmed that she is becoming increasingly isolated as the only woman in the room in many places in the IT industry.  She has been acutely aware of the exodus of women from the IT industry, she has heard the explanations given by women leaving, and she has felt the pressures, large and small, herself.  I know she loves getting into the geeky stuff up to her elbows, as do I, so it is not "natural inclination" that leads her or our women colleagues away from this work, any more than it would lead me away.  If you read her blog DBAKevlar, you'll understand and witness her fascination with technology, and you'll be amazed at her facility in sharing that wonder.  Likewise, her Twitter handle @DBAKevlar is a powerful channel for her interests and fun perspective and has been cited by Forbes magazine as one of the "top 50 twitter handles that everyone interested in Oracle should follow".

My wife is a geek.  And I admire her.

Over the past few years, Kellyn decided to take action.  It is not clear what the root cause(s) or problems might be, so it was not clear what form that action should take.  Everyone has their own personal viewpoint on the matter, ranging from outright misogyny to hints of discrimination, with flavors of cultural preferences and misunderstandings on occasion, and outdated stereotypes stirred in for good measure.

It is an explosive brew.

Clearly, anyone who wants to get near the third rail of sexual politics in the high-flying technology field has to tread carefully.

But you just can’t tread carefully when the root cause is the third rail itself.

As a technologist, Kellyn had learned that it is easier to socialize a solution if one has marshaled the necessary facts.  She has researched the issues facing women in the IT industry, relying on studies and basic research from academia, government, and professional societies.  She has the facts close to hand, with references.  Coupled with her own first-hand experiences and those of her female colleagues, she started expressing her findings to her colleagues, discussing women's issues at technology conferences, asking the basic question "what do you think?"

She understands that men are not necessarily the full cause of the problem.  Often, it is women who seek to oppress other women, in a bid to be the "queen bee".  Often, men are undermined or thwarted in any attempt to support the women in their lives.  Consider the ribbing that a man who takes on more responsibility for childcare receives from other women and men, the raised eyebrows, the leering jokes.  The suspicion directed toward a man in the grocery store during work hours.  The challenges of male nurses, who face the implicit assumption that they are homosexual, or questions about why they are not doctors.

These are the sorts of issues that women in technology face.

At times, these efforts have come at a personal cost.  Kellyn was nominated for the CTA APEX Women In Technology award for her many tireless (and sometimes personally costly) efforts encouraging and mentoring women in the technology industry.  About 2 months ago, we learned that she was considered a finalist for the award.

And this past Wednesday evening, at the Hyatt Regency Denver at the Colorado Convention Center, we learned that her non-stop work had won her the award for 2014.

Concurrent statistics gathering in 12.1.0.2

Prompted by an actual task at hand, I spent some time investigating a feature introduced in 11.2.0.2: concurrent statistics gathering. It has been on my to-do list for quite some time, but so far I didn't have a use case, and use cases make it so much easier. The question was: how can I gather statistics on a really large, partitioned table? Previously, you could resort to the degree argument of dbms_stats.gather_table_stats to ensure that statistics were gathered in parallel. This is all good, but sometimes you need more oomph. Some DBAs wrote scripts to execute individual statistics gathering jobs against partitions in parallel, using the tabname and partname arguments in dbms_stats.gather_table_stats(). But that requires manual effort, and the not-quite-so-new concurrent option is so much nicer. Let me take you along for the ride. Actually, I have to tell the story starting with the happy ending, as I hit a few snags along the way. This is 12.1.0.2.1 on Oracle Linux 6.5.

Enabling concurrent statistics gathering

A flag needs to be set to enable the new concurrent mechanism for gathering statistics. The initial blog post on the subject set the flag globally:

https://blogs.oracle.com/optimizer/entry/gathering_optimizer_statistics_is_one

Scroll down to "configuration and settings" for the details. I would personally prefer to set this at table level, but despite the Oracle documentation suggesting that it is possible, it is not. You can check the header of DBMS_STATS to see the different values that can be passed as PNAME to set_table_prefs().
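For reference, enabling the feature globally, as described in the referenced post, boils down to a single call. This is just a sketch; the value TRUE matches the get_prefs() output shown below (newer releases also accept values such as MANUAL, AUTOMATIC, ALL and OFF).

exec dbms_stats.set_global_prefs('CONCURRENT', 'TRUE')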

After enabling the concurrent flag on the database level, you can query it:

SQL> select dbms_stats.get_prefs('CONCURRENT')
  2  from dual;

DBMS_STATS.GET_PREFS('CONCURRENT')
------------------------------------------------------------------------------------
TRUE

SQL> select dbms_stats.get_prefs('CONCURRENT',user,'BIGTAB_PART') flag from dual;

FLAG
-------------------------------------------------------------------------------------
TRUE

We should be ready for using the concurrent method now. Here is some background information about the table in question:

SQL>  select partition_name, num_rows, sample_size, last_analyzed 
  2   from user_tab_partitions where table_name = 'BIGTAB_PART'
  3  ;

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254
SYS_P1253
SYS_P1252
SYS_P1251
SYS_P1250
SYS_P1249
SYS_P1248
SYS_P1247
SYS_P1246
SYS_P1245
P_MAN

11 rows selected.

SQL> select partition_name, bytes/power(1024,2) m from user_segments where segment_name = 'BIGTAB_PART';

PARTITION_NAME                          M
------------------------------ ----------
SYS_P1245                            8064
SYS_P1246                            8064
SYS_P1247                            8064
SYS_P1248                            8064
P_MAN                                8064
SYS_P1249                            8064
SYS_P1250                            8064
SYS_P1251                            8064
SYS_P1252                            8064
SYS_P1253                            8064
SYS_P1254                               8

11 rows selected.

So each partition is about 8 GB in size. A little helper script prints all the table preferences:

getting table preferences for table BIGTAB_PART
value for CASCADE is DBMS_STATS.AUTO_CASCADE
value for DEGREE is NULL
value for ESTIMATE_PERCENT is DBMS_STATS.AUTO_SAMPLE_SIZE
value for METHOD_OPT is FOR ALL COLUMNS SIZE AUTO
value for NO_INVALIDATE is DBMS_STATS.AUTO_INVALIDATE
value for GRANULARITY is AUTO
value for PUBLISH is TRUE
value for INCREMENTAL is FALSE
value for STALE_PERCENT is 10
value for CONCURRENT is TRUE
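Such a helper does not need to be fancy. A minimal sketch that simply loops over the preference names shown above and calls dbms_stats.get_prefs() for each could look like this:

set serveroutput on
declare
  type t_prefs is table of varchar2(30);
  l_prefs t_prefs := t_prefs('CASCADE','DEGREE','ESTIMATE_PERCENT','METHOD_OPT',
                             'NO_INVALIDATE','GRANULARITY','PUBLISH','INCREMENTAL',
                             'STALE_PERCENT','CONCURRENT');
begin
  dbms_output.put_line('getting table preferences for table BIGTAB_PART');
  for i in 1 .. l_prefs.count loop
    dbms_output.put_line('value for ' || l_prefs(i) || ' is ' ||
      dbms_stats.get_prefs(l_prefs(i), user, 'BIGTAB_PART'));
  end loop;
end;
/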

Now you can simply execute the call to dbms_stats.gather_table_stats(user, 'BIGTAB_PART', degree=>4) and, thanks to all the defaults, the gathering of statistics will be performed by multiple jobs (see the sketch below). In order for this to work you need job_queue_processes set to a value greater than 4 (the default is 1000 in 12.1.0.2) and you need a resource manager plan to be active. Refer to the Oracle blog post referenced earlier for more information about statement queueing and the execution of these jobs. The post also recommends setting parallel_adaptive_multi_user to false. Just be careful changing PX parameters: they can have undesired side effects.
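Putting the prerequisites and the call together, a sketch of the whole thing looks like this (the resource plan name is only an example; any active plan will do):

show parameter job_queue_processes

-- a resource manager plan must be active; DEFAULT_PLAN is just an example
alter system set resource_manager_plan = 'DEFAULT_PLAN';

-- CONCURRENT is already TRUE (see the preferences above), so this call
-- spawns multiple scheduler jobs that gather partition and global statistics concurrently
exec dbms_stats.gather_table_stats(ownname => user, tabname => 'BIGTAB_PART', degree => 4)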

Using the diagnostic queries taken from the Oracle blog post you can see what's going on. In my case, 6 jobs were created:

select job_name, state, comments from dba_scheduler_jobs where job_class like 'CONC%'

JOB_NAME             STATE           COMMENTS
-------------------- --------------- ----------------------------------------------------------------------------------------------------
ST$T3606_6_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1254";"MARTIN"."BIGTAB_PART"
ST$T3606_5_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1252";"MARTIN"."BIGTAB_PART"."SYS_P1253"
ST$T3606_4_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1250";"MARTIN"."BIGTAB_PART"."SYS_P1251"
ST$T3606_3_B2        RUNNING         "MARTIN"."BIGTAB_PART"."SYS_P1248";"MARTIN"."BIGTAB_PART"."SYS_P1249"
ST$T3606_2_B2        SCHEDULED       "MARTIN"."BIGTAB_PART"."SYS_P1246";"MARTIN"."BIGTAB_PART"."SYS_P1247"
ST$T3606_1_B2        RUNNING         "MARTIN"."BIGTAB_PART"."P_MAN";"MARTIN"."BIGTAB_PART"."SYS_P1245"

6 rows selected.

By the way, if you are interested in the payload, you can use DBMS_METADATA:

SQL> get get_job
  1* SELECT DBMS_METADATA.GET_DDL('PROCOBJ', '&1') from dual;
SQL>

You pass it the job name and you get some pretty cryptic looking output :)

Interlude

Actually, you have to see this…

DBMS_METADATA.GET_DDL('PROCOBJ','ST$T3982_1')
-----------------------------------------------------------------------------------------


BEGIN
dbms_scheduler.create_job('"ST$T3982_1"',
job_type=>'PLSQL_BLOCK', job_action=>
'declare c dbms_stats.CContext := dbms_stats.CContext(); begin c.extend(30); c(1):=''ST$T3982_1;1;855;1;FALSE;FALSE;''; c(2):=q''#"MARTIN";#''; c(3):=q''#"BIGTAB_PART";#''; c(4):=q''#"SYS_P1250";#''; c(5):=''0;''; c(6):=''FALSE;''; c(7):=q''#FOR ALL
COLUMNS SIZE AUTO;#''; c(8):=''NULL;''; c(9):=''PARTITION;''; c(10):=''TRUE;''; c(11):=q''#NULL;#''; c(12):=q''#NULL;#''; c(13):=q''#NULL;#''; c(14):=''NULL;''; c(15):=''DATA;''; c(16):=''FALSE;''; c(17):=''GLOBAL AND PARTITION;''; c(18):=''TRUE;'';
c(19):=''0;''; c(20):='';''; c(21):=''FALSE;''; c(22):=''FALSE;''; c(23):=''4;''; c(24):=''ST$T3982;''; c(25):=''TRUE;''; c(26):=''FALSE;''; c(27):=''4.562199;''; c(28):=''228.109937;''; c(29):=''0;''; c(30):=''0;''; dbms_stats.gather_table_stats('''
','''',context=>c); end;'
, number_of_arguments=>0,
start_date=>NULL, repeat_interval=>
NULL
, end_date=>NULL,
job_class=>'"CONC_ST$T3982"', enabled=>FALSE, auto_drop=>TRUE,comments=>
'"MARTIN"."BIGTAB_PART"."SYS_P1250"'
);
dbms_scheduler.set_attribute('"ST$T3982_1"','raise_events',38);
dbms_scheduler.enable('"ST$T3982_1"');
COMMIT;
END;

This looks better when you format it a bit, so that it reads:

DECLARE
  c dbms_stats.CContext := dbms_stats.CContext();
BEGIN
  c.extend(30);
  c(1) :='ST$T3982_1;1;855;1;FALSE;FALSE;';
  c(2) :=q'#"MARTIN";#';
  c(3) :=q'#"BIGTAB_PART";#';
  c(4) :=q'#"SYS_P1250";#';
  c(5) :='0;';
  c(6) :='FALSE;';
  c(7) :=q'#FOR ALL COLUMNS SIZE
AUTO;#';
  c(8) :='NULL;';
  c(9) :='PARTITION;';
  c(10):='TRUE;';
  c(11):=q'#NULL;#';
  c(12):=q'#NULL;#';
  c(13):=q'#NULL;#';
  c(14):='NULL;';
  c(15):='DATA;';
  c(16):='FALSE;';
  c(17):='GLOBAL AND PARTITION;';
  c(18):='TRUE;';
  c(19):='0;';
  c(20):=';';
  c(21):='FALSE;';
  c(22):='FALSE;';
  c(23):='4;';
  c(24):='ST$T3982;';
  c(25):='TRUE;';
  c(26):='FALSE;';
  c(27):='4.562199;';
  c(28):='228.109937;';
  c(29):='0;';
  c(30):='0;';
  dbms_stats.gather_table_stats('','',context=>c);
END;

Interesting!

Concurrent statistics gathering works for incremental statistics as well, and it is a good way to gather the initial set of statistics that incremental statistics need anyway (see Doc ID 1541543.1 for background on incremental statistics gathering).
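If you want to combine the two, a minimal sketch (assuming BIGTAB_PART should use incremental statistics from now on) is:

exec dbms_stats.set_table_prefs(user, 'BIGTAB_PART', 'INCREMENTAL', 'TRUE')
exec dbms_stats.gather_table_stats(user, 'BIGTAB_PART', degree => 4)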

The end result is statistics!

SQL> select partition_name, num_rows, sample_size, last_analyzed
  2   from user_tab_partitions where table_name = 'BIGTAB_PART';

PARTITION_NAME                   NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
SYS_P1254                              32          32 07-NOV-14
SYS_P1253                        25600000    25600000 07-NOV-14
SYS_P1252                        25600000    25600000 07-NOV-14
SYS_P1251                        25600000    25600000 07-NOV-14
SYS_P1250                        25600000    25600000 07-NOV-14
SYS_P1249                        25600000    25600000 07-NOV-14
SYS_P1248                        25600000    25600000 07-NOV-14
SYS_P1247                        25600000    25600000 07-NOV-14
SYS_P1246                        25600000    25600000 07-NOV-14
SYS_P1245                        25600000    25600000 07-NOV-14
P_MAN                            25599968    25599968 07-NOV-14

11 rows selected.

Stats not gathered concurrently if partitions are small

This all looks pretty straightforward, except that it wasn't when I first tried. I set all the parameters as documented but still had no jobs running in the background. The first table I tried to gather statistics on concurrently was partitioned exactly like BIGTAB_PART, with the difference that all partitions were compressed for QUERY HIGH. This means that a lot less space is taken by each partition. I suspected that segment size might be related, so I compressed BIGTAB_PART and here is the result:

SQL> select partition_name, bytes/power(1024,2) m from user_segments 
  2   where segment_name = 'BIGTAB_PART';

PARTITION_NAME                                   M
------------------------------ -------------------
P_MAN                                       321.75
SYS_P1245                                   321.00
SYS_P1246                                   322.50
SYS_P1247                                   321.50
SYS_P1248                                   320.75
SYS_P1249                                   322.25
SYS_P1250                                   322.25
SYS_P1251                                   322.25
SYS_P1252                                   322.25
SYS_P1253                                   322.00
SYS_P1254                                     8.25

11 rows selected.

When I then tried to gather statistics concurrently on the compressed table (without changing the table preferences at all), no concurrent jobs were started. So remember: if you gather statistics concurrently and don't see multiple jobs for intra-table statistics gathering, your partitions might simply be too small.

Presentations to go to at #DOAG2014

As every year, there’s a long list of great speakers with interesting talks to attend at the DOAG (German Oracle User Group) annual conference. Sadly I cannot attend them all, so I’ve got to make a choice:

First day

Datenbank-Upgrade nach Oracle 12.1.0.2 – Aufwand, Vorgehen, Kunden by Mike Dietrich, Oracle

Die unheimliche Begegnung der dritten Art: XML DB für den DBA by Carsten Czarski, Oracle

Advanced RAC Programming Features by Martin Bach, Enkitec

Automatische Daten Optimierung, Heatmap und Compression 12c live by Ulrike Schwinn, Oracle

Second day

Understanding Oracle RAC Internals – The Cache Fusion Edition by Markus Michalewicz, Oracle

Die Recovery Area: Warum ihre Verwendung empfohlen ist – I have to go to that one because I present it myself :-)

Geodistributed Oracle GoldenGate and Oracle Active Data Guard: Global Data Services by Larry Carpenter, Oracle

Oracle Database In-Memory – a game changer for data warehousing? by Hermann Baer & Maria Colgan, Oracle

Oracle Distributed Transactions by Joel Goodman, Oracle

Third day

High Noon – Bessere Überlebenschancen beim Datenbank Security Shoot Out by Heinz-Wilhelm Fabry, Oracle

Tuning Tools für echte Männer und Sparfüchse – vom Leben ohne EM12c by Björn Rost, portrix Systems

Best Practices in Managing Oracle RAC Performance in Real Time by Mark Scardina, Oracle

Maximum Availability with Oracle Multitenant: Seeing Is Believing by Larry Carpenter, Oracle

Tagged: #DOAG2014

Cache Buffer Chains Demystified

You must have seen sessions waiting on the event "latch: cache buffers chains" from time to time. If you ever wondered what this means and how you can reduce the time spent on it, read on. Here you will learn how the buffer cache works, how Oracle multi-versioning works, how buffers are allocated and deallocated, what a hash chain is and how buffers are linked to it, what the role of the cache buffers chains latch is and why sessions wait on it, how to find the objects causing the contention, and how to reduce the time spent on that event.

While exploring the reasons for the slowness in database sessions, you check the wait interface and see the following output:

 SQL> select state, event from v$session where sid = 123;

STATE   EVENT
------- ---------------------------
WAITING latch: cache buffers chains

This event is most common in applications that repeatedly scan a small set of blocks. To resolve it, you should understand what the cache buffers chains latch is and why sessions have to wait on it. To understand that, you must understand how the Oracle buffer cache works. We will explore these topics one by one and close with ways to reduce cache buffers chains latch waits.

This is the fifth in the Series "100 Things You Probably Didn't Know About Oracle". If you haven't already, I urge you to read the other parts -

  • Part 1 (Commit does not force writing of buffers into the disk),
  • Part 2 (How Oracle Locking Works),
  • Part 3 (More on Interested Transaction Lists),
  • Part 4 (Can I Fit a 80MB Database Completely in a 80MB Buffer Cache?)

You will learn
•    How the buffer cache works
•    How buffers are populated
•    About buffer states and versioning
•    How buffers are flushed
•    About the role of the cache buffers chains latch
•    How to reduce CBC latch waits
•    About other kinds of latches

How Buffer Cache Works

The buffer cache (BC), resident inside the SGA of an Oracle instance, is designed to hold blocks that come from the database. When a user issues a statement such as the following:

update EMP
set NAME = 'ROB'
where EMPNO = 1

the Oracle server process assigned to the session performs the following actions:
1)    locates the block that contains the record with EMPNO = 1
2)    loads the block from the database file into a free buffer in the buffer cache
3)    if a free buffer is not immediately found, forces the DBWn process to write some dirty buffers to disk to make room
4)    updates the NAME column to ROB in the buffer

In step 1, we assume an index is present and hence the server process can locate the single block immediately. If the index is not present, Oracle will need to load all the blocks of the table EMP into the buffer cache and check for matching records one by one.

The above description introduces two very important concepts:
1)    a block, which is the smallest unit of storage in the database
2)    a buffer, which is a placeholder in the buffer cache used to hold a block.

Buffers are just placeholders, which may or may not be occupied, and they can hold exactly one block at a time. Therefore, for a typical database where the block size is set to 8KB, the buffers are also 8KB in size. If you use additional block sizes, e.g. 4KB or 16KB, you have to define additional buffer caches corresponding to those block sizes; the buffers in those caches then match the corresponding block sizes.
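As a sketch of what that means in practice: before a tablespace with a non-default 16KB block size can be used, a matching cache must exist (the cache size below is only an example, and the datafile clause assumes OMF is configured):

alter system set db_16k_cache_size = 128m;

create tablespace ts_16k datafile size 100m blocksize 16k;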

When buffers come into the cache, the server process must scan through them to get the value it wants. In the example shown above, the server process must find the record where EMPNO=1. To do so, it has to know the location of the blocks in the buffers. The process scans the buffers in sequence, so buffers should ideally be placed in a sequence, e.g. 10 followed by 20, then 30, etc. However, this creates a problem. What happens when, after this careful placement of buffers, buffer #25 comes in? Since it falls between 20 and 30, it must be inserted in between, i.e. Oracle must move all the buffers after 20 one step to the right to make room for the new buffer #25. Moving memory areas around is not a good idea: it costs expensive CPU cycles, requires all actions on the buffers (even reads) to stop for the duration, and is prone to errors.

Therefore, instead of moving the buffers around, a better approach is to put them in something like a linked list. Fig 1 shows how that is done. Each buffer has two pointers: one to the buffer ahead of it and one to the buffer behind it. In this figure, buffer 20 points to 10 as the one ahead and 30 as the one behind, regardless of the actual physical position of the buffers. When 25 comes in, all we have to do is update the "behind" pointer of 20 and the "ahead" pointer of 30 to point to 25. Similarly, the "ahead" and "behind" pointers of 25 are updated to point to 20 and 30 respectively. This simple update is much quicker, does not require activity to stop on any buffers except the ones being updated, and is less error-prone.

However, there is another problem: this is just one of the lists buffers have to be on. For instance, the LRU algorithm needs a list of buffers in LRU order, the DBWn process needs a list of buffers to write to disk, and so on. Physically moving the buffers onto several specific lists at the same time is not just impractical, it is impossible. Oracle employs a simpler technique to overcome the obstacle. Rather than placing the actual buffers in the linked list, Oracle creates a much lighter structure called a buffer header, which acts as a pointer to the actual buffer. It is the buffer header that is moved around, leaving the actual buffer in place. This way, a buffer header can appear in many types of lists at the same time. These buffer headers are located in the shared pool, not the buffer cache, which is why you will find references to buffers in the shared pool.

Buffer Chains

The buffers are placed in strings. Compare that to rows of spots in a parking lot. Cars come in and take an empty spot in a row; if they don't find one, they go to the next row, and so on. Similarly, buffers are organized in the cache in rows. However, unlike parking spots, which are physically located next to each other, the buffers are only logically placed in a sequence in the form of a linked list, as described in the section above. Each linked list of buffers is known as a buffer chain, as shown in Fig 2.

Notice how each of the three chains has a different number of buffers. This is quite normal. Buffers are occupied only when some server process brings a block into them; otherwise the buffers are free and not linked to anything. When buffers are freed up, perhaps because a process such as DBWn writes their contents to disk, they are removed from the list, a process known as unlinking from the chain. So, in a normal database, buffers are constantly linked to and unlinked from chains, making each chain longer or shorter depending on the frequency of either activity. The number of buffer chains is determined by the hidden database parameter _db_block_hash_buckets, which is automatically calculated from the size of the buffer cache.

When a server process wants to access a specific buffer in the cache, it starts at the head of the chain and inspects each buffer in sequence until it finds what it needs. This is called walking the chain. You might be wondering about a nagging question here: when a buffer comes into the cache, who decides which of the three chains it should be linked to, and how? A corollary is the challenge the server process faces when trying to find a specific buffer in the cache: how does it know which chain to walk? If it always started at chain 1, it would take an extraordinary amount of time to locate a block. Typical buffer caches are huge, so the number of chains may run into the tens of thousands, if not hundreds of thousands, and searching all of them is not practical. On the other hand, maintaining a memory table showing which blocks are located in which buffers is not practical either, because maintaining that memory table would be time consuming and would serialize access; several processes could then no longer read chains in parallel.

Oracle solves the problem in a neat manner. Consider the parking lot example earlier. What if you forget where you parked your car? Suppose after you come out of the mall, you find that all the cars have been buried under a thick pile of snow making identification of any of the cars impossible. So, you would have to start at the first car at the first row, dust off the snow from the license plate, check for your car, move on to the next, and so on. Sounds like a lot of work, doesn’t it? So, to help forgetful drivers, the mall marks the rows with letter codes and asks the drivers to park in the row matching the first letter of their last name. John Smith will need to park in row S, and in row S only, even if row T or row R are completely empty. In that case, when John returns to find his car and forgets where it is, he will know to definitely find it in row S. That will be the domain of his search—much, much better than searching the entire parking lot.

Similarly, Oracle determines which specific chain a buffer should be linked to. Every block is uniquely identified by a data block address (DBA). When the block comes into the buffer cache, Oracle applies a hash function to the DBA to determine the buffer chain number and places the block in a buffer on that chain alone. Similarly, while looking up a specific buffer, Oracle applies the same hash function to the DBA, instantly knows the chain on which the buffer will be found, and walks that specific chain only. This makes accessing a buffer much easier compared to searching the entire cache.

To find out the data block address, you need to first get the relative file# and block#. Here is an example where I want to find out the blocks of the table named CBCTEST.

 SQL> select
  2     col1,
  3    dbms_rowid.rowid_relative_fno(rowid) rfile#,
  4    dbms_rowid.rowid_block_number(rowid) block#
  5  from cbctest;

      COL1     RFILE#     BLOCK#
---------- ---------- ----------
         1          6        220
         2          6        220
         3          6        220
         4          6        221
         5          6        221
         6          6        221

6 rows selected.

From the output we see that there are 6 rows in this table and they are all located in two blocks in a file with relative file# 6. The blocks are 220 and 221. Using this, we can get the data block address. To get the DBA of the block 220:

SQL> select dbms_utility.make_data_block_address(6,220) from dual; 

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(6,220)
-------------------------------------------                                   
25166044

The output shows that the DBA of that block is 25166044. If there were three chains, we could apply a modulo function that returns the remainder after dividing the DBA by 3:

SQL> select mod(25166044,3) from dual; 

MOD(25166044,3)
---------------              
1

So, we will put it in chain #1 (assuming there are three chains and chain numbering starts at 0). The other block of that table, block# 221, will end up in chain #2:

SQL> select dbms_utility.make_data_block_address(6,221) from dual; 

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(6,221)
-------------------------------------------                                   
25166045

SQL> select mod(25166045,3) from dual;

MOD(25166045,3)
---------------              
2

And so on. Conversely, given a DBA, we can apply the mod() function and the output tells us the chain on which the buffer can be found. Oracle does not use the exact mod() function shown here but a more sophisticated hash function; the exact mechanics are not important, the concept is the same. Oracle can identify the exact chain a buffer needs to go to by applying a hash function to the DBA of the buffer.

Multi-versioning of Buffers

Consider the update SQL statement shown at the beginning of this post. When Oracle updates a buffer that already exists in the buffer cache, it does not update it directly. Instead, it creates a copy of the buffer and updates that copy. Likewise, when a query selects data from the block as of a certain SCN, Oracle creates a copy of the buffer as of the point in time of interest and returns the data from that copy. As you can see, there may be more than one copy of the same block in the buffer cache, and while searching for a buffer the server process needs to search through these versions as well. This makes the buffer chain even longer.

To find out the specific buffers of a block, you can check the view V$BH (the buffer headers). The column OBJD is the object_id (actually it's the DATA_OBJECT_ID; in this case both are the same, but that is not always true). Here are the columns of interest to us:

•    FILE# - the file_id
•    BLOCK# - the block number
•    CLASS# - the type of the block, e.g. data block, segment header, etc. Shown as a code
•    STATUS - the status of the buffer, Exclusive Current, Current, etc.

To make it simpler to understand, we will use a decode() on the class# field to show the type of the block. With that, here is our query: 

select file#, block#,
       decode(class#,
           1,'data block',
           2,'sort block',
           3,'save undo block',
           4,'segment header',
           5,'save undo header',
           6,'free list',
           7,'extent map',
           8,'1st level bmb',
           9,'2nd level bmb',
          10,'3rd level bmb',
          11,'bitmap block',
          12,'bitmap index block',
          13,'file header block',
          14,'unused',
          15,'system undo header',
          16,'system undo block',
          17,'undo header',
          18,'undo block') class_type,
       status
from v$bh
where objd = 99360
order by 1,2,3
/

     FILE#     BLOCK# CLASS_TYPE        STATUS
---------- ---------- ----------------- ----------
         6        219 segment header    cr
         6        221 segment header    xcur
         6        222 data block        xcur
         6        220 data block        xcur

4 rows selected.

There are 4 buffers. In this example the cache has not been flushed since the earlier activity, so there are two buffers for the segment header and one buffer for each data block. The status "xcur" stands for Exclusive Current: it means that the buffer was acquired (or filled by a block) with the intention of being modified. If the intention were merely to select, the status would show CR (Consistent Read). In this case, since rows were inserted and the buffers were therefore modified, the blocks were acquired in xcur mode. Now, from a different session, update a single row. For easier identification I have used Sess2> as the prompt:

Sess2> update cbctest set col2 = 'Y' where col1 = 1;

1 row updated.

From the original session, check the buffers:

     FILE#     BLOCK# CLASS_TYPE        STATUS
---------- ---------- ----------------- ----------
         6        219 segment header    cr
         6        220 segment header    xcur
         6        220 data block        xcur
         6        220 data block        cr
         6        221 data block        xcur

5 rows selected.

There are 5 buffers now, up one from the previous four. Note that there are two data block buffers for block 220, one xcur and one CR. Why two?

It's because the update statement modified the block. Instead of modifying the existing buffer, Oracle creates a "copy" of the buffer and modifies that copy. The copy now has the XCUR status because it was acquired for the purpose of being modified, while the previous buffer of this block, which used to be xcur, is converted to CR. There can't be more than one XCUR buffer for a specific block; that's why it is called exclusive. If a process wants the most recently updated version of the block, it just has to look for the copy with the XCUR status; all the others are marked CR.

Now, from a third session, update a different row in the same block.

Sess3> update cbctest set col2 = 'Y' where col1 = 2;
1 row updated. 

From the original session, check the buffers once more:


     FILE#     BLOCK# CLASS_TYPE        STATUS
---------- ---------- ----------------- ----------
         6        219 segment header    xcur
         6        219 segment header    cr
         6        221 data block        xcur
         6        220 data block        xcur
         6        220 data block        cr
         6        220 data block        cr
         6        220 data block        cr
         6        220 data block        cr
         6        220 data block        cr

9 rows selected.


Whoa! There are 9 buffers now. Block 220 alone has six buffers (one xcur and five CR copies), up from two data block buffers earlier. The third session merely updated a single, different row in that block. Why did Oracle create so many additional buffers?

Again, the answer is CR processing. To give each reader a consistent image of the block, Oracle clones the buffer and rolls the clone back (or forward) to the SCN of interest, and that is what produced the additional CR copies. From one block you now have six buffers, several of them created purely for consistent reads. This is how Oracle creates multiple versions of the same block in the buffer cache.

Latches

Now that you know how many buffers can be created and how they are located on the chains in the buffer cache, consider another problem: what happens when two sessions want to access the buffer cache at the same time? There are several possibilities:

1)    Both processes could be after the same buffer
2)    The processes are after different buffers but the buffers are on the same chain
3)    The buffers are on different chains

Possibility #3 is not an issue, but #2 is: Oracle does not allow two processes to walk the same chain at the same time. So there needs to be a mechanism that prevents other processes from performing an action while another process is performing it. This is provided by a mechanism called a latch. A latch is a memory structure that processes compete to acquire. Whoever gets it is said to "hold the latch"; all others must wait until the latch is available. In many respects it sounds like a lock. The purpose is the same, to provide exclusive access to a resource, but locks have queues: several processes waiting for a lock will get it, when the lock is released, in the same sequence in which they started waiting. Latches, on the other hand, are not sequential. Whenever a latch becomes available, every interested process jumps into the fray to capture it; again, only one gets it and the others must wait. A process first loops, typically 2,000 times, actively checking for the availability of the latch. This is called spinning. After that the process sleeps for 1 ms and retries; if still unsuccessful, it sleeps for 1 ms, then 2 ms, 2 ms, 4 ms, 4 ms, and so on until the latch is obtained. The process is said to be in the sleep state in between attempts.
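You can see how often processes obtained a latch by spinning versus having to sleep by looking at V$LATCH; a quick sanity-check sketch for the latch we are interested in:

select name, gets, misses, spin_gets, sleeps
from   v$latch
where  name = 'cache buffers chains';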

So, latches are the mechanism for making sure no two processes access the same chain at the same time. The latch protecting the buffer chains is known as the cache buffers chains (CBC) latch. There is one parent CBC latch and several child CBC latches. However, latches consume memory and CPU, so Oracle does not create as many child latches as there are chains. Instead, a single child latch may protect two or more chains, as shown in Fig 3. The number of child latches is determined by the hidden parameter _db_block_hash_latches.

Latches are identified by latch# and child# (in the case of child latches). A specific latch instance is identified by its address in memory (the latch address). To find out the latch that protects a specific buffer, get the file# and block# as shown earlier and issue this SQL:

select hladdr
from x$bh
where dbarfil = 6
and dbablk = 220;

Going back to CBC latches, let’s see how you can find out the correlation between chains and latches. First, find the Latch# of the CBC latch. Latch# may change from version to version or across platforms; so it’s a good idea to check for it.

select latch# from v$latch
where name = 'cache buffers chains';

LATCH#
------
   203

This is the parent latch. To find out the child latches (the ones that protect the chains), you should look into another view, V$LATCH_CHILDREN. To find out how many child latches there are:

SQL> select count(1) cnt from v$latch_children where latch# = 203; 

CNT
-------
  16384

If you check the values of the two hidden parameters explained earlier, you will see:

_db_block_hash_buckets 524288
_db_block_hash_latches 16384

The parameter _db_block_hash_buckets determines how many buffer chains there are, and the parameter _db_block_hash_latches determines the number of CBC latches. Did you notice the value 16384? It matches the count we got from V$LATCH_CHILDREN, confirming that it is indeed the number of CBC latches.
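If you want to look up these hidden parameters yourself, one common approach is a query against the undocumented x$ tables (as SYS; treat this as a diagnostic sketch only):

select p.ksppinm  parameter,
       v.ksppstvl value
from   x$ksppi p, x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm in ('_db_block_hash_buckets', '_db_block_hash_latches');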

Diagnosis of CBC Latch Waits

Let’s now jump into resolving the CBC latch issues. The sessions suffering from CBC latch waits will show up in V$SESSION. Suppose one such session is SID 366. To find out the CBC latch, check the P1, P1RAW and P1TEXT values in V$SESSION, as shown below:

select p1, p1raw, p1text
from v$session where sid = 366;

P1         P1RAW          P1TEXT
---------- ---------------- -------
5553027696 000000014AFC7A70 address

P1TEXT shows the description of the P1 column, i.e. the address of the latch. In this case the address is 000000014AFC7A70. We can check the name of the latch and examine how many times it has been requested by sessions and how many of those requests missed.

 SQL> select gets, misses, sleeps, name
  2  from v$latch where addr = '000000014AFC7A70';

GETS  MISSES SLEEPS NAME
----- ------ ------ --------------------
49081     14     10 cache buffers chains

From the output we confirm that this is a CBC latch. It has been acquired 49,081 times, missed 14 times, and 10 times processes have gone to sleep waiting for it.

Next, identify the object whose buffer is so popular. Get the file# and block# of the buffers protected by the latch address we identified as the problem:

select dbarfil, dbablk, tch
from x$bh
where hladdr = '000000014AFC7A70';

DBARFIL DBABLK TCH
------- ------ -----
      6    220 34523

The TCH column shows the touch count, i.e. how many times the buffer has been accessed. It is a measure of the buffer's popularity and hence of how likely it is to be subject to CBC latch waits. From the file# and block# we can get the object ID. The easiest way is to dump the block and read the object ID from the dump file. Here is how you dump the above-mentioned block.


alter system dump datafile 6 block min 220 block max 220;

This produces a tracefile, a part of which is shown below.


Start dump data blocks tsn: 4 file#:6 minblk 220 maxblk 220
Block dump from cache:
Dump of buffer cache at level 4 for pdb=0 tsn=4 rdba=25166044
BH (0x7ff72f6b918) file#: 6 rdba: 0x018000dc (6/220) class: 1 ba: 0x7ff7212a000
  set: 12 pool: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 39,28
  dbwrid: 0 obj: 93587 objn: 93587 tsn: [0/4] afn: 6 hint: f

Get the object ID (the value after “objn”). Using that value you can get the object name:


 SQL> select object_name
  2 from dba_objects
  3 where object_id = 93587;

OBJECT_NAME
----------------------------------------
CBCTEST

Now you know the table whose blocks are so popular that they are causing CBC latch waits.
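If you prefer not to dump blocks, an alternative is to map the file# and block# to a segment through DBA_EXTENTS. This is only a sketch and can be slow on databases with many extents, but it avoids the trace file:

select owner, segment_name, segment_type
from   dba_extents
where  file_id = 6
and    220 between block_id and block_id + blocks - 1;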

Resolving CBC Latch Waits

From the above discussion you will have made one important observation: CBC latch waits are caused by the popularity of certain blocks among different processes. If you reduce that popularity, you reduce the chance that two processes will wait for the same buffer. Note that you can't completely eliminate the waits; you can only reduce them. To reduce them, reduce logical I/O. For instance, nested loop joins revisit the same object several times, causing the same buffers to be accessed multiple times. If you rewrite the query to avoid nested loops, you will significantly reduce the chance that one process has to wait for the CBC latch.

Similarly, if you write code that accesses the blocks of a table over and over again, you will see those blocks getting too popular as well. Here is an example of such code:

 for i in 1..100000 loop
   select …
   into l_var
   from tablea
   where …;
   exit when sql%notfound;
end loop;

You can rewrite the code by selecting the data from the table into a collection using BULK COLLECT and then reading from that collection rather than from the table, as sketched below. The SQL_ID column of V$SESSION shows you which SQL statements are causing the CBC latch waits, and the object identification shown above tells you which specific object in those queries is causing the problem, allowing you to devise a better solution.
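Here is a minimal sketch of that approach, using the hypothetical table TABLEA from the loop above: the table is visited once, and all further processing happens against the in-memory collection instead of the hot buffers.

declare
  type t_tab is table of tablea%rowtype;
  l_rows t_tab;
begin
  -- one pass over the table instead of 100,000 single-row lookups
  select * bulk collect into l_rows from tablea;

  for i in 1 .. l_rows.count loop
    null;  -- process l_rows(i) here instead of re-querying the table
  end loop;
end;
/

For a very large table you would fetch in chunks with the LIMIT clause to keep PGA usage under control.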

You can also proactively look for objects contributing to the CBC latch wait in the Active Session History, as shown below:

select event, p1raw, count(*)
from v$active_session_history
where sample_time > sysdate - 1/24
and event = 'latch: cache buffers chains'
group by event, p1raw
order by 3 desc;

The P1RAW value shows the latch address, from which you can easily find the file# and block#:


select o.name, bh.dbarfil, bh.dbablk, bh.tch
from x$bh bh, sys.obj$ o
where tch > 0
and hladdr= ''
and o.obj#=bh.obj
order by tch;

With the approach shown earlier, you can then get the object information from the file# and block#. Once you know the objects contributing to the CBC latch waits, you can reduce the waits by reducing the number of times the latch is requested. You do that by making the blocks of the table less popular: the fewer rows in a block, the less popular the block will be. You can reduce the number of rows in a block by increasing PCTFREE or by using ALTER TABLE ... MINIMIZE RECORDS_PER_BLOCK, for example as shown below. If that does not help, you can partition the table: the rows are then spread over more blocks with different data block addresses, making it more likely that the buffers end up on different buffer chains, so there is less competition for any single chain.
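As a sketch, the two statements mentioned above look like this. Both only influence blocks used by future inserts, so existing rows need to be rebuilt (for example with ALTER TABLE ... MOVE) to benefit, and MINIMIZE RECORDS_PER_BLOCK cannot be used while a bitmap index exists on the table:

alter table cbctest pctfree 80;                  -- leave most of each new block empty

alter table cbctest minimize records_per_block;  -- cap future inserts at the current
                                                 -- maximum number of rows per block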

Conclusion

In this blog you learned how Oracle manages the buffer cache and how latches are used to ensure that only one process at a time can walk a chain to access a buffer. This latch is known as the cache buffers chains latch. You learned why this latch is taken by Oracle and how to reduce the possibility that two processes will want the same latch at the same time. I hope this helps you understand and resolve cache buffers chains latch related waits. Your feedback will be highly appreciated.