At the recent Trivadis Performance Days in Zurich, Chris Antognini answered a question that had been bugging me for some time. Why would Oracle want to set the default value of _cursor_obsolete_threshold to a value like 8192 in 12.2 ?
In 11.2.0.3 the parameter was introduced with the default value 100; then in 11.2.0.4, continuing into 12.1, the default value increased to 1,024 – what possible reason could anyone have for thinking that 8192 was a good idea ?
The answer is PDBs – specifically the much larger number of PDBs a single CBD can (theoretically) support in 12.2.
In fact a few comments, and the following specific explanation, are available on MoS in Doc ID 2431353.1 “High Version Counts For SQL Statements (>1024) Post Upgrade To 12.2 and Above Causing Database Slow Performance”:
The default value of _cursor_obsolete_threshold is increased heavily (8192 from 1024) from 12.2 onwards in order to support 4096 PDBs which was only 252 PDBs till 12.1. This parameter value is the maximum limit for obsoleting the parent cursors in an multitenant environment and cannot be increased beyond 8192.
Having said, this is NOT applicable for non-CDB environment and hence for those databases, this parameter should be set to 12.1 default value manually i.e. 1024. The default value of 1024 holds good for non-CDB environment and the same parameter can be adjusted case-to-case basis should there be a problem.
It’s all about PDBs – more precisely, it’s all about CDBs running a huge number of PDBs, which is not necessarily the way that many companies are likely to use PDBs. So if you’re a fairly typical companyy running a handful of PDBs in a single CDB then it’s probably a good idea to set the parameter down to the 12.1 value of 1024 (and for bad applications I’d consider going even lower) – and this MOS note actually makes that an official recommendation.
What’s the worst that could happen if you actually have many PDBs all executing the same application and that application has a few very popular and frequently executed statements? Chris Antognini described a model he’d constructed and some tests he’d done to show the effects. The following code is a variation onhis work. It addresses the following question:
If you have an application that repeatedly issues (explicitly or implicitly) parse calls but doesn’t take advantage of the session cursor cache it has to search the library cache by hash_value / sql_id for the parent cursor, then has to walk the chain of child cursors looking for the right child. What’s the difference in the work done if this “soft parse” has to walk the list to child number 8,191 instead of finding the right cursor at child number 0.
Here’s the complete code for the test:
create table t1 select 1 id from dual / alter table t1 add constraint t1_pk primary key (id) / spool cursor_obsolete.lst alter system flush shared_pool; alter system flush shared_pool; set serveroutput off select /*+ index(t1) */ id from t1 where id > 0; select * from table(dbms_xplan.display_cursor); execute snap_my_stats.start_snap execute snap_my_stats.start_snap declare m_id number; begin for i in 100+1..100+8192 loop execute immediate 'alter session set optimizer_index_cost_adj = ' || i ; select /*+ index(t1) */ id into m_id from t1 where id > 0; end loop; end; / set serveroutput on execute snap_my_stats.end_snap column sql_text format a60 select sql_id, child_number, loaded_versions, executions, sql_text from v$sql where sql_text like 'SELECT%T1%' order by child_number; prompt =============== prompt Low child reuse prompt =============== set serveroutput off execute snap_my_stats.start_snap declare m_id number; begin for i in 100+1..100+1024 loop execute immediate 'alter session set optimizer_index_cost_adj = ' || i ; select /*+ index(t1) */ id into m_id from t1 where id > 0; end loop; end; / set serveroutput on execute snap_my_stats.end_snap prompt ================ prompt High child reuse prompt ================ set serveroutput off execute snap_my_stats.start_snap declare m_id number; begin for i in 7168+1..7168+1024 loop execute immediate 'alter session set optimizer_index_cost_adj = ' || i ; select /*+ index(t1) */ id into m_id from t1 where id > 0; end loop; end; / set serveroutput on execute snap_my_stats.end_snap spool off
I’ve created a table with just one row and given it a primary key. My testing query is going to be very short and simple. A query hinted to return that one row by primary key index range scan.
I’ve flushed the shared pool (twice) to minimise fringe contention from pre-existing information, then executed the statement to populate the dictionary cache and some library cache information and to check the execution plan.
The call to the package snap_my_stats is my standard method for reporting changes in v$mystat across the test. I’ve called the start_snap procedure twice in a row to make sure that its first load doesn’t add some noise to the statistics that we’re trying to capture.
The test runs in three parts.
What I’m looking for is the extra work it takes for Oracle to find the right child cursor when there’s a very long chain of child cursors. From my memory of dumping the library cache in older versions of Oracle, the parent will point to a “segmented array” of pointers to child cursors, and each segment of the array will consist of 16 pointers, plus a pointer to the next segment. So if you have to find child cursor 8191 you will have to following 512 segment pointers, and 16 pointers per segment (totalling 8708 pointers) before you find the child you want – and you’re probably holding a mutex (or latch) while doing so.
One preipheral question to ask, of course, is whether Oracle keeps appending to the segmented array, or whether it uses a “pushdown” approach when allocating a new segment so that newer child cursors are near the start of the array. (i.e. will searching for child cursor 0 be the cheapest one or the most expensive one).
And the results, limited to just the second and third parts, with just a couple of small edits are as follows:
host sdiff -w 120 -s temp1.txt temp2.txt >temp.txt =============== | ================ Low child reuse | High child reuse =============== | ================ Interval:- 0 seconds | Interval:- 6 seconds opened cursors cumulative 2,084 | opened cursors cumulative 2,054 recursive calls 6,263 | recursive calls 6,151 recursive cpu usage 33 | recursive cpu usage 570 session logical reads 1,069 | session logical reads 1,027 CPU used when call started 33 | CPU used when call started 579 CPU used by this session 37 | CPU used by this session 579 DB time 34 | DB time 580 non-idle wait count 16 | non-idle wait count 5 process last non-idle time 1 | process last non-idle time 6 session pga memory 524,288 | session pga memory 65,536 enqueue requests 10 | enqueue requests 3 enqueue releases 10 | enqueue releases 3 consistent gets 1,069 | consistent gets 1,027 consistent gets from cache 1,069 | consistent gets from cache 1,027 consistent gets pin 1,039 | consistent gets pin 1,024 consistent gets pin (fastpath) 1,039 | consistent gets pin (fastpath) 1,024 consistent gets examination 30 | consistent gets examination 3 consistent gets examination (fastpath) 30 | consistent gets examination (fastpath) 3 logical read bytes from cache 8,757,248 | logical read bytes from cache 8,413,184 calls to kcmgcs 5 | calls to kcmgcs 3 calls to get snapshot scn: kcmgss 1,056 | calls to get snapshot scn: kcmgss 1,026 table fetch by rowid 13 | table fetch by rowid 1 rows fetched via callback 6 | rows fetched via callback 1 index fetch by key 9 | index fetch by key 1 index scans kdiixs1 1,032 | index scans kdiixs1 1,024 session cursor cache hits 14 | session cursor cache hits 0 cursor authentications 1,030 | cursor authentications 1,025 buffer is not pinned count 1,066 | buffer is not pinned count 1,026 parse time cpu 23 | parse time cpu 558 parse time elapsed 29 | parse time elapsed 556 parse count (total) 2,076 | parse count (total) 2,052 parse count (hard) 11 | parse count (hard) 3 execute count 1,050 | execute count 1,028 bytes received via SQL*Net from client 1,484 | bytes received via SQL*Net from client 1,486
Two important points to note:
You might question are the 2,048-ish parse count(total) – don’t forget that we do an “execute immediate” to change the optimizer_index_cost_adj on each pass through the loop. That’s probably why we double the parse count, although the “alter session” doesn’t then report as an “execute count”.
The third call to a statement is often an important one – it’s often the first one that doesn’t need “cursor authentication”, so I ran a similar test executing the last two loops a second time – there was no significant change in the CPU or parse activity between the 2nd and 3rd executions of each cursor. For completeness I also ran a test with the loop for the last 1,024 child cursors ran before the loop for the first child cursors. Again this made no significant difference to the results – the low number child cursors take less CPU to find than the high number child cursors.
The longer the chain of child cursors the more time (elapsed and CPU) you spend searching for the correct child; and when a parent is allowed 8,192 child cursors the extra time can become significant. I would claim that the ca. 5 seconds difference in CPU time appearing in this test corresponds purely to an extra 5 milliseconds walking an extra 7,000 steps down the chain.
If you have a well-behaved application that uses the session cursor cache effectively, or uses “held cursors”, then you may not be worried by very long chains of child cursors. But I have seen many applications where cursor caching is not used and every statement execution from the client turns into a parse call (usually implicit) followed by a hunt through the library cache and walk along the child chain. These applications will not scale well if they are cloned to multiple PDBs sharing the same CDB.
The odd thing about this “cursor obselete” feature is that I have a distinct memory that when PDBs were introduced at an ACE Director’s meeting a few years ago the first thought that crossed my mind was about the potential for someone running multiple copies of the same application as separate PDBs seeing a lot of library cache latch contention or cursor mutex contention because any popular statement would now be hitting the same parent cursor from multiple PDBs. I think the casual (i.e. neither formal, nor official) response I got when I raised the point was that the calculation of the sql_id in future releases would take the con_id into consideration. It seems that that idea fell by the wayside.
If you do see a large number of child cursors for a single parent then you will probably end up looking at v$sql_shared_cursor for the sql_id to see if that gives you some good ideas about why a particular statement has generated so many child cursors. For a list of explainations of the different reasons captured in this view MOS Doc Id 296377.1“Troubleshooting: High Version Count Issues” is a useful reference.
Recent comments
3 years 5 weeks ago
3 years 17 weeks ago
3 years 22 weeks ago
3 years 23 weeks ago
3 years 27 weeks ago
3 years 48 weeks ago
4 years 16 weeks ago
4 years 46 weeks ago
5 years 30 weeks ago
5 years 31 weeks ago