Oakies Blog Aggregator

Index Engineering

This is a case study based on a question that appeared on the Oracle Developer Community forum a few days ago.

What I’m aiming to present in this note is the pattern of thinking that you should adopt in cases like this. The final suggestion in this note isn’t necessarily the best answer to the question posed (at the time of writing the OP hadn’t supplied enough information to allow anyone to come up with a best solution), but the point of the exercise is to talk about the journey and (perhaps) remind you of some of the extreme engineering you can do with indexes.

The (massaged) problem statement is as follows:

I have a table of more than 200 million rows that is used for inserts, updates and queries. I have a query on this table and want to know what index I could create to speed up the query.

The supplied definition of the table was not consistent with the names used in the query, so I’ve had to do a little editing, but table, current indexes, and query were as follows:

rem
rem     Script:         extreme_indexing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

create table tbl (
        r_id                    varchar2(30) not null,
        c_id                    number,
        n_id                    varchar2(40),
        created_by              varchar2(30) not null,
        last_modified_by        varchar2(30),
        c_status                char(1),
        a_action                char(1),
        r_creation_dt           timestamp(6),
        cnt                     number(38)
)
;

create        index tbl_1 on tbl(cnt, r_creation_dt, c_id, a_action, last_modified_by);  
create        index tbl_2 on tbl(cnt, c_status, r_creation_dt);  
create bitmap index tbl_3 on tbl(c_status); 

select
        /*+ index(tbl) */
        c_id,
        a_action,
        cnt,
        last_modified_by
from
        tbl
where
        c_status in(
            'N',
            'F'
        )
and     cnt <= 5 -- > comment to avoid wordpress format issue
and     r_creation_dt is not null
group by
        cnt,
        r_creation_dt,
        c_id,
        a_action,
        last_modified_by,
        c_status
order by
        r_creation_dt
fetch 
        first 1000 rows only
;


The first thing to point out is that the bitmap index tbl_3 is almost certainly a bad idea – bitmaps and transactional activity do not mix. It seems quite likely that the OP in this case had read one of the many Internet notes that make the “not totally wrong” but very misleading statement “bitmap indexes are good when you have a small number of distinct values”, and applied the principle to a column that looks like a “status” column holding only a few distinct values.

Having got that error out of the way we can start to think about the query.  It’s using the (fairly new) “Fetch first N rows” syntax, which means we may have to find a lot of data and sort it before returning a subset: performance issues can be very deceptive in cases like this because we might want a small result set but have to do a large amount of work to get it.

In this case we’re after the first 1,000 rows – which makes you think that maybe there will be a lot of data satisfying the query. So we have two targets to meet to optimise the query:

  • acquire the data we need as efficiently as possible
  • post-process the data we acquire to derive the 1,000 rows as efficiently as possible

The query is just a single table access – which means we’re either going to do a full tablescan or find a good indexed access path, we don’t have to worry about join strategies.  So the first thing to consider is the volume (and scatter) of data that matches the predicates. If there’s only a “small” amount of data where “c_status in (‘N’,’F’) and cnt <= 5” then an index on – or starting with – (c_status, cnt) may be very helpful. (Note how I’ve specified the column with the equality predicate first – that’s part of a generic strategy for creating multi-column indexes.)
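
To make this concrete, here’s a sketch of such an index – the OP didn’t supply one, so the definition and the name (tbl_i1, which matches the execution plan further down) are my own:

create index tbl_i1 on tbl(c_status, cnt);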

This, though, raises several questions that need to be answered:

  • How small is “small”? In the context of 200 million rows, 100,000 is small; but if you had to visit 100,000 different blocks in the table and do 100,000 real single block reads from disc that might still be a very bad thing.
  • How many rows have status ‘N’, how many have status ‘F’, how many have cnt <= 5 ? Maybe a really tiny number of rows have cnt<=5 and lots have c_status in (‘N’,’F’) which could make this a case where ignoring the generic column-ordering strategy would be very effective.  Maybe the number of rows satisfying the individual conditions is high but the number satisfying the combination is very low.
  • Is this the ONLY combination of c_status and cnt that is of interest, or (for example) was 5 just the number that was picked as an example? Would different c_status values be of interest? Would some required combinations of c_status and cnt have to use completely different execution paths for the best performance? (A quick check of the data, sketched below, would answer some of these questions.)
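
Here’s a sketch of the sort of query that would give you a first impression of the data distribution – against 200M rows it would be expensive, so you might want to run it against a sample:

select
        c_status,
        count(*)                                        total_rows,
        sum(case when cnt <= 5 then 1 else 0 end)       rows_with_cnt_le_5
from
        tbl
group by
        c_status
order by
        c_status
;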

I’m going to make some decisions in order to proceed – they may be totally wrong as far as the OP is concerned – so remember that this note is just for discussion purposes. Let’s assume that the common query is always exactly as stated. Perhaps it’s a query that runs every few minutes to clear up some outstanding work with the expectation that new rows matching the query keep appearing while older rows are processed, change status, and disappear from the result set. Let’s also assume that the result set is always “small”, and that it’s small because ‘N’ and ‘F’ are rare (even if the total number of rows with cnt <= 5 is large).

With these assumptions we could start by creating an index on (c_status, cnt), which gets us to exactly the rows we want from the table with no “throwaway” after visiting the table. Here’s the execution plan if that’s our choice of index (running on 12.2.0.1, and with an index() hint to force the use of the index when necessary):

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |        |      1 |        |   1000 |00:00:00.03 |    1573 |     34 |       |       |          |
|*  1 |  VIEW                           |        |      1 |   1000 |   1000 |00:00:00.03 |    1573 |     34 |       |       |          |
|*  2 |   WINDOW NOSORT STOPKEY         |        |      1 |   6451 |   1000 |00:00:00.03 |    1573 |     34 |   219K|   219K|          |
|   3 |    SORT GROUP BY                |        |      1 |   6451 |   1001 |00:00:00.03 |    1573 |     34 |  1186K|   567K| 1054K (0)|
|   4 |     INLIST ITERATOR             |        |      1 |        |  13142 |00:00:00.02 |    1573 |     34 |       |       |          |
|*  5 |      TABLE ACCESS BY INDEX ROWID| TBL    |      2 |  13743 |  13142 |00:00:00.02 |    1573 |     34 |       |       |          |
|*  6 |       INDEX RANGE SCAN          | TBL_I1 |      2 |  13743 |  13142 |00:00:00.01 |      33 |     34 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1000)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "R_CREATION_DT")<=1000)
   5 - filter("R_CREATION_DT" IS NOT NULL)
   6 - access((("C_STATUS"='F' OR "C_STATUS"='N')) AND "CNT"<=5)

I’ve enabled rowsource_execution_statistics (alter session set statistics_level = all) and pulled my execution plan from memory. As you can see from the A-rows for the index range scan and table access by index rowid, I’ve identified and acquired exactly the rows from the table that might be relevant (all 13,142 of them), then I’ve done a sort group by of all that data, sorting in a way that means the rows will be produced in exactly the order I need for the windowing function that Oracle will use to select the 1,000 rows I want.
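
If you want to reproduce this type of output, the mechanism is roughly as follows (a sketch – display_cursor() with null arguments reports the last statement executed by the session, so run it immediately after the query):

alter session set statistics_level = all;

-- run the query, then:

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));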

If you’re curious, here (courtesy of dbms_utility.expand_sql_text() but cosmetically enhanced) is the transformed SQL that was actually optimised and executed:

SELECT 
        A1.C_ID C_ID,A1.A_ACTION A_ACTION,A1.CNT CNT,A1.LAST_MODIFIED_BY LAST_MODIFIED_BY 
FROM  (
        SELECT 
                /*+ INDEX (A2) */ 
                A2.C_ID C_ID,
                A2.A_ACTION A_ACTION,
                A2.CNT CNT,
                A2.LAST_MODIFIED_BY LAST_MODIFIED_BY,
                A2.R_CREATION_DT rowlimit_$_0,
                ROW_NUMBER() OVER ( ORDER BY A2.R_CREATION_DT) rowlimit_$$_rownumber 
        FROM 
                TEST_USER.TBL A2 
        WHERE 
                (A2.C_STATUS='N' OR A2.C_STATUS='F') 
        AND     A2.CNT<=5 
        AND     A2.R_CREATION_DT IS NOT NULL 
        GROUP BY 
                A2.CNT,A2.R_CREATION_DT,A2.C_ID,A2.A_ACTION,A2.LAST_MODIFIED_BY,A2.C_STATUS
        ) A1 
WHERE 
        A1.rowlimit_$$_rownumber<=1000 
ORDER BY 
        A1.rowlimit_$_0

There are three main drawbacks to this choice of index.

  • I’ve acquired all the rows in the table that match the predicate even though I only really needed a subset
  • I’ve done a massive sort
  • I’ve created an index that includes every row in the table

Remember that the OP has a table of 200M rows, and we are assuming (pretending) that only a very small fraction of them match the initial predicates. Creating an index on 200M rows because we’re interested in only a few tens of thousands is wasteful of space and (given we have a “status” column) probably wasteful of processing resources as the status moves through several values. So I’m going to address that issue first. Let’s create a “function-based” index that ignores most of the data, and change the code to take advantage of that index – but since this is 12c, let’s do it by adding a virtual column and indexing that column.


alter table tbl add nf_r_creation_dt invisible 
        generated always as (
                case
                        when c_status in ('N','F') and cnt <= 5
                        then r_creation_dt
                end
        ) virtual
/

create index tbl_i2 on tbl(nf_r_creation_dt)
/

I’ve introduced an invisible virtual column called nf_r_creation_dt (nf_ for status N/F) which uses a CASE expression matching the original predicate to return the r_creation_dt for rows that match and null for all the other (ca. 200M) rows. So when I create an index on the column the only entries in the index are for rows that I might want to see.
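
If you want to confirm that the index really does hold entries only for the interesting rows, a quick sanity check is to gather stats and look at the index statistics (a sketch – entirely null keys are simply not stored in a B-tree index, which is why the trick works):

execute dbms_stats.gather_table_stats(user, 'tbl', cascade=>true)

select index_name, num_rows, leaf_blocks
from   user_indexes
where  index_name = 'TBL_I2'
;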

I have to edit the SQL to match – which simply means changing every appearance of r_creation_dt to nf_r_creation_dt, and eliminating the original predicate giving the following text and execution plan:


select
        /*+ index(tbl) */
        c_id,
        a_action,
        cnt,
        last_modified_by
from
        tbl
where
        nf_r_creation_dt is not null
group by
        nf_r_creation_dt,
        cnt,
        c_id,
        a_action,
        last_modified_by,
        c_status
order by
        nf_r_creation_dt
fetch 
        first 1000 rows only    -- 1,000 rows in the original
/

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |      1 |        |   1000 |00:00:00.02 |   13139 |     35 |       |       |          |
|*  1 |  VIEW                          |        |      1 |   1000 |   1000 |00:00:00.02 |   13139 |     35 |       |       |          |
|*  2 |   WINDOW NOSORT STOPKEY        |        |      1 |     48 |   1000 |00:00:00.02 |   13139 |     35 | 73728 | 73728 |          |
|   3 |    SORT GROUP BY               |        |      1 |     48 |   1001 |00:00:00.02 |   13139 |     35 |  1116K|   556K|  991K (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| TBL    |      1 |   2500 |  13142 |00:00:00.02 |   13139 |     35 |       |       |          |
|*  5 |      INDEX FULL SCAN           | TBL_I2 |      1 |  13142 |  13142 |00:00:00.01 |      36 |     35 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1000)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "NF_R_CREATION_DT")<=1000)
   5 - filter("NF_R_CREATION_DT" IS NOT NULL)

The plan shows an index full scan on the new index. Since the index holds only those rows that might be interesting this isn’t a threat. However, we still have to visit all the matching rows in the table – and that might result in more random I/O than we like. So the next step in enhancing performance is to consider adding all the columns we want to the index. There’s a little problem with that: if we add the columns as they are we will go back to having an index entry for every single row in the table, so we need to use the same CASE mechanism to create more virtual columns:

alter table tbl add nf_c_status invisible 
        generated always as (
                case
                        when c_status in ('N','F') and cnt <= 5
                        then c_status
                end
        ) virtual
/

alter table tbl add nf_last_modified_by invisible 
        generated always as (
                case
                        when c_status in ('N','F') and cnt <= 5
                        then last_modified_by
                end
        ) virtual
/

alter table tbl add nf_a_action invisible 
        generated always as (
                case
                        when c_status in ('N','F') and cnt <= 5
                        then a_action
                end
        ) virtual
/

alter table tbl add nf_c_id invisible 
        generated always as (
                case
                        when c_status in ('N','F') and cnt <= 5
                        then c_id
                end
        ) virtual
/

alter table tbl add nf_cnt invisible 
        generated always as (
                case
                        when c_status in ('N','F') and cnt <= 5
                        then cnt
                end
        ) virtual
/

create index tbl_i3 on tbl(
        nf_r_creation_dt,
        nf_cnt,
        nf_c_id,
        nf_a_action,
        nf_last_modified_by,
        nf_c_status
)
;

It looks like a bit of a pain to go through all this rigmarole to get all those columns that are null most of the time but echo the original values when the rows match our original predicate; and then we have to modify the query to match:


select
        /*+ index(tbl) */
        nf_c_id,
        nf_a_action,
        nf_cnt,
        nf_last_modified_by
from
        tbl
where
        nf_r_creation_dt is not null
group by
        nf_r_creation_dt,
        nf_cnt,
        nf_c_id,
        nf_a_action,
        nf_last_modified_by,
        nf_c_status
order by
        nf_r_creation_dt
fetch 
        first 1000 rows only    -- 1,000 rows in the original
/

But the big payoff comes from the execution plan:


----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |      1 |        |   1000 |00:00:00.01 |      74 |     12 |
|*  1 |  VIEW                  |        |      1 |   1000 |   1000 |00:00:00.01 |      74 |     12 |
|*  2 |   WINDOW NOSORT STOPKEY|        |      1 |   2500 |   1000 |00:00:00.01 |      74 |     12 |
|   3 |    SORT GROUP BY NOSORT|        |      1 |   2500 |   1001 |00:00:00.01 |      74 |     12 |
|*  4 |     INDEX FULL SCAN    | TBL_I3 |      1 |   2500 |   1003 |00:00:00.01 |      74 |     12 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1000)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "NF_R_CREATION_DT")<=1000)
   4 - filter("NF_R_CREATION_DT" IS NOT NULL)

Notice how the SORT GROUP BY operation is a NOSORT, and the WINDOW operation is both NOSORT and STOPKEY?

We’ve got the smallest index possible that only gets modified as rows move into, or out of, the interesting state, and when we run the query Oracle does a full scan of the index maintaining “running totals” but stops as soon as it has aggregated enough results.

tl;dr

For very special cases it’s really amazing what you can (sometimes) do – if you can modify the code – with carefully engineered indexes to minimise the work done by a query AND the work done maintaining the infrastructure needed for that query. Virtual columns are a fantastic aid, especially now that 12c allows them to be invisible.

Video : Schema Only Accounts in Oracle Database 18c Onward

Today’s video is a demonstration of schema only accounts, introduced in Oracle Database 18c.
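
If you haven’t seen the feature, a schema only account is simply one created with the NO AUTHENTICATION clause, so it has no password and nobody can log into it directly – you administer it through a proxy connection. A minimal sketch (not taken from the video, and the user names are made up):

create user app_schema no authentication;
grant create table to app_schema;

-- administer the schema by proxying through a real account
alter user app_schema grant connect through admin_user;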

This is based on the following articles.

The star of today’s video is Paul Vallee, of Pythian and Tehama fame.

Cheers

Tim…



Oracle Database 19c Automatic Indexing: Minimum Number Of Required Indexes (Low)

  As I discussed in my previous posts, Oracle Automatic Indexing will try and create as few indexes as possible to satisfy existing workloads, even if that means reordering the columns in an existing index. To illustrate how Automatic Indexing creates as few indexes as possible, I’ll create the following table which has a number […]

Has my scheduler been turned off?

When the scheduler came into existence in Oracle 10g, there was a cool API call that could be used to temporarily turn the entire scheduler off.  That command was:


SQL> exec dbms_scheduler.set_scheduler_attribute('SCHEDULER_DISABLED','TRUE')

PL/SQL procedure successfully completed.

However, a quick check of the documentation in later releases no longer shows any reference to this attribute:

I did some testing, and if you call SET_SCHEDULER_ATTRIBUTE to disable the scheduler in 12c onwards, it still does seem to have the effect of turning off the scheduler. So the question is – why is it no longer in the documentation?

The reason is that the effects of the attribute were never totally clear – for example, would it stop DBMS_JOB? Would it stop running jobs? What if someone issued an explicit RUN_JOB command? What about the automatic task infrastructure that submits scheduler jobs on the database’s behalf?

Since 12c, the supported way to turn off the scheduler is now to set job_queue_processes to zero. You should not use the SCHEDULER_DISABLED attribute.
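
For example (a sketch – scope=both assumes you are running with an spfile):

SQL> alter system set job_queue_processes = 0 scope=both;

System altered.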

Because it is no longer supported, that raises an interesting issue. What if someone is still calling it? If you attempt to retrieve the attribute, null is returned in all instances:


SQL> variable x varchar2(10)
SQL> exec dbms_scheduler.get_scheduler_attribute('SCHEDULER_DISABLED',:x)

PL/SQL procedure successfully completed.

SQL> print x

X
--------------------------------
(null)

Luckily, we do still expose it in the dictionary, so you can easily double check:


SQL> exec dbms_scheduler.set_scheduler_attribute('SCHEDULER_DISABLED','TRUE')

PL/SQL procedure successfully completed.

SQL> select * from DBA_SCHEDULER_GLOBAL_ATTRIBUTE;

ATTRIBUTE_NAME                           VALUE
---------------------------------------- ----------------------------------------
MAX_JOB_SLAVE_PROCESSES
LOG_HISTORY                              30
DEFAULT_TIMEZONE                         Australia/Perth
EMAIL_SERVER
EMAIL_SERVER_ENCRYPTION                  NONE
EMAIL_SERVER_CREDENTIAL
EMAIL_SENDER
LAST_OBSERVED_EVENT
EVENT_EXPIRY_TIME
FILE_WATCHER_COUNT                       0
CURRENT_OPEN_WINDOW
SCHEDULER_DISABLED                       TRUE   

Postgres@CERN

For once, I was at a conference not as a speaker but as a co-organizer. The idea came up only 3 months ago at the Swiss PostgreSQL Users Group dinner organized by dbi services in Milan before the PostgreSQL Conference, when Laetitia Avrot asked me if it would be possible to organize a meetup at CERN.

SwissPUGOrg dinner in Milan, organized by dbi services

My answer was “yes” of course, but I had just resigned from CERN so it had to happen before February, when I return to consulting life at dbi services. Laetitia organized everything in a short time: finding the sponsors and the speakers, organizing the drinks and food, advertising the event, …

Huge thanks to Laetitia for all that work in such a short time! I was involved in the smaller things: finding the venue at CERN, managing the registrations, a visit to the LHC for the speakers, and some little details to get people in (and out without getting lost in the 1960s buildings…)

And it was a huge success! When the registrations opened in December, the 100 seats of the IT Amphitheater were taken in 3 days. Fortunately, I was able to get a larger room: the Council Chamber with 160 seats. The feedback I got was really good, from CERN employees as well as from people coming from near and far. It was also recorded (available soon) and webcast, which you can still watch if you have good internet bandwidth:

Postgres@CERN - PostgreSQL Meetup at CERN - January 17th | CERN Webcast Website

Here are the times at which each talk begins in the recording:

  • 00:34:20 Gülçin Yıldırım — Evolution of Fault Tolerance in PostgreSQL
    explaining all the evolutions of transaction protection and replication through WAL, physical standby, streaming, failover and switchover.
  • 01:28:00 Pavlo Golub — Professional PostgreSQL scheduling made easy on advanced job scheduling with pg_timetable
  • 02:43:30 Anastasia Lubennikova — Advanced PostgreSQL Backup and Recovery methods for a comparison of backup/recovery solutions
  • 03:29:50 Vik Fearing — Advent of Code Using PostgreSQL showing the power of SQL as a programming language
  • 03:50:10 Romuald Thion — PostgreSQL at the university: some lessons learned using PostgreSQL from a teacher/student point of view
  • 04:14:30 Oleg Bartunov — All You Need Is Postgres! an awesome talk on the past and the future of the fastest-growing open-source database, and how this future fits with enterprise-level usage.

Oleg Bartunov — All You Need Is Postgres!

There’s a 1'30 audio blackout in Oleg’s talk, unfortunately. The mike went off but, thanks to his loud voice, he was clearly audible in the room, so I didn’t immediately realize there was a problem. Sorry for this.

While we’re talking about the little issues: there was some confusion at the CERN reception. The registered participants got an access card by e-mail, which is supposed to be printed and worn visibly on the CERN campus. But today we are used to keeping those things on our phones, so many had to print the card at the reception. I should have pre-printed all of them and notified the reception guard. Sorry for the confusion, and thanks to my colleagues who went to fetch people and guide them to the venue.

You may also have had some problems connecting to the CERN public WiFi. Yes, that may sound weird, given that we were 50 meters from the office where the World Wide Web was invented. I don’t know exactly how it was related, but there was a big Swisscom outage with consequences at CERN and elsewhere. The speakers had no problems, as I shared the network from my laptop ;)

Besides those little things, everything was perfect. It was the first time for me organizing an event of this size (more than 100 people on site and up to 90 online). My tip from this experience: try to think of every little detail in the months/weeks/days in advance, and then, on the day of the event, just do your best with the little unexpected things. And big thanks to the colleagues who helped me, and to the CERN teams managing the rooms, the webcasts, the recordings, the Indico software…

All participants got access to the CERN premises for the whole day. They had access to the permanent exhibitions and to the restaurant. It would have been awesome to offer a special visit for everyone but… that’s impossible for 100 people.

https://twitter.com/obartunov/status/1218130688131641346?s=20

For the speakers, I organized a special visit to the ATLAS Detector, one of the 4 experiments on the LHC, 100 meters underground. Here is a map of the CERN accelerator complex: https://panoramas-outreach.cern.ch/home. Currently everything is stopped for the long shutdown maintenance, which is why it can be visited – but only in small groups (max. 6 people, which was perfect for our speakers) because there’s a lot of work in progress. But you can also do a virtual visit with “pegman” on Google Maps: http://www.google.com/maps/preview#!data=!1m8!1m3!1d3!2d6.055071!3d46.235832!2m2!1f192.55!2f98!4f75!2m4!1e1!2m2!1skWZ2TA53b9AAAAQJODkDDg!2e0&fid=5

I also guided a visit through the CERN campus to the Antiproton Decelerator – again limited to 12 people, so I had to pick some people in pseudo-random order from among the first registered. I’m sorry for all the people who would have enjoyed joining but, given the current maintenance on those experiments, this was the maximum possible.

Being a guide is one of the things I enjoyed during my year at CERN. I like to share the little I’ve learned about the research that is done there. I like to meet people coming from all countries.

If you are ever around Geneva, there’s a regular PostgreSQL meetup:

PostgreSQL User Group Genève (Genève, Switzerland)

(all photos stolen from twitter here… Thanks to @westermanndanie, @Pybrehier, @MendiSaid, @obartunov, @Karenhjex, @apatheticmagpie for sharing them)

How to obtain semaphore information in gdb when the symbols are missing

This post was created while trying to understand how the Oracle executable works – specifically the logwriter. If the logwriter is posted by a process (which is done using semop()), it signals that process back using semop() too, provided it happens to be in post/wait mode and is not using the ‘scalable logwriter mode’ (in which it uses additional worker processes).

To be more specific, I tried investigating something that is not Oracle specific, but rather specific to the usage of semaphores on Linux, with an executable for which you do not have the source code and which is not compiled with debugging symbols.

I attached to the process using gdb, and put a break on semop:

$ gdb -p 1000
...
(gdb) break semop
Breakpoint 1 at 0x7fb92b0410c0: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) c
Continuing.

A word here: you probably will not see “file ../sysdeps/unix/syscall-template.S, line 81.”. This is because I installed the following debuginfo packages:

kernel-uek-debuginfo-4.14.35-1902.9.2.el7uek.x86_64
nss-softokn-debuginfo-3.44.0-5.0.1.el7.x86_64
kernel-uek-debuginfo-common-4.14.35-1902.9.2.el7uek.x86_64
glibc-debuginfo-common-2.17-292.0.1.el7.x86_64
libaio-debuginfo-0.3.109-13.el7.x86_64
numactl-debuginfo-2.0.12-3.el7_7.1.x86_64
glibc-debuginfo-2.17-292.0.1.el7.x86_64

When using Oracle Linux (version 7), this is actually really easy: you add the debuginfo packages repo by creating the file /etc/yum.repos.d/debug.repo and putting this in the file:

[ol7_debuginfo]
name=Oracle Linux 7 debuginfo
baseurl=http://oss.oracle.com/ol7/debuginfo
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

You can now even use the ‘debuginfo-install’ executable that gdb tells you to use. A word of warning too: this repository is not very closely maintained by Oracle (sadly – I blogged about this in the past), so things might be missing. For example, the debuginfo package for the libgcc on my system cannot be found by yum. Another issue I encountered was that when I tried installing the debuginfo package for my kernel, I couldn’t just say debuginfo-install kernel-uek, because that installed the debuginfo package for the latest kernel. So I had to specifically point it to my exact kernel version. When installing the kernel debuginfo package, which is very bulky, another word of warning: the repo (at least for me) is limited to a very low bandwidth, so downloading the file (+200MB) took a long time.

I installed this with the idea to have all system variables, like the ones for semaphores, present, so I could look into them. This turned out not to be the case:

Continuing.

Breakpoint 1, semop () at ../sysdeps/unix/syscall-template.S:81
81	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb)

Gdb broke execution because it encountered semop. Now let’s investigate. Because of the pseudo system call handler, we can only investigate the semop call indirectly. But how do we know what to investigate? That’s where the manpages come in:

$ man semop
SEMOP(2)                Linux Programmer's Manual                  SEMOP(2)

NAME
       semop, semtimedop - System V semaphore operations

SYNOPSIS
       #include <sys/types.h>
       #include <sys/ipc.h>
       #include <sys/sem.h>

       int semop(int semid, struct sembuf *sops, unsigned nsops);

       int semtimedop(int semid, struct sembuf *sops, unsigned nsops,
                      struct timespec *timeout);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       semtimedop(): _GNU_SOURCE

So, semop takes 3 arguments: the semid as an integer, a struct sembuf that holds the actual operation to be executed, and the number of operations in the sembuf.
We can still investigate this, because we know how arguments are passed to a function on Linux x86_64:
– The first argument is in the CPU register $rdi
– The second argument is in the CPU register $rsi
– The third argument is in the CPU register $rdx
Well, let’s look at our session:

(gdb) p $rdi
$1 = 229376
(gdb) p $rsi
$2 = 140721769476432
(gdb) p $rdx
$3 = 1

So, the simple information is available directly, the semid is 229376, and there is 1 operation.
Let’s look at semid 229376 (warning: you have to have access to the semaphore array to be able to see it):

$ ipcs -si 229376

Semaphore Array semid=229376
uid=54321	 gid=54321	 cuid=54321	 cgid=54321
mode=0600, access_perms=0600
nsems = 250
otime = Sun Jan 19 16:36:30 2020
ctime = Sun Jan 19 15:52:22 2020
semnum     value      ncount     zcount     pid
0          1          0          0          3524
1          9065       0          0          3524
2          13900      0          0          3524
3          32766      0          0          3524
4          0          0          0          0
5          0          0          0          0
6          0          1          0          9340
7          0          1          0          9347
8          0          1          0          9356
9          0          0          0          0
10         0          1          0          10146
11         0          1          0          10163
12         0          1          0          30940
13         0          1          0          10189
14         0          1          0          10189
15         0          1          0          0
...and so on...

Okay, so in order to understand what that semop call does, we need to look into the struct.

But this is what gdb says:

(gdb) p $rsi
$4 = 140721769476432

Wait a minute, didn’t the man page say: struct sembuf *sops? That asterisk (‘*’) means it’s a pointer. Let’s try that:

(gdb) p * $rsi
$5 = 65574

Well…not sure what that means…

(gdb) ptype *$rsi
type = int

Ah…it thinks it’s an integer, and displays that… That’s not very helpful.

You can cast a variable (declare it to be of a certain type – not the magician type of thing), so let’s try that:

(gdb) p (struct sembuf *) $rsi
No struct type named sembuf.

Mhhh, despite installing all these debuginfo packages, it turns out the struct definition is not available.

But I really want to know the semaphore information!

I found this gdb feature:

(gdb) help add-symbol-file
Load symbols from FILE, assuming FILE has been dynamically loaded.
Usage: add-symbol-file FILE ADDR [-s <SECT> <SECT_ADDR> -s <SECT> <SECT_ADDR> ...]
ADDR is the starting address of the file's text.
The optional arguments are section-name section-address pairs and
should be specified if the data and bss segments are not contiguous
with the text.  SECT is a section name to be loaded at SECT_ADDR.

So, I can add symbols from a file, provided that file is dynamically loadable. What if I create a mini file with the definition of sembuf? Would that work??

First create a very small C program that only defines a sembuf variable:

$ cat semh.c
#include <sys/sem.h>
struct sembuf mysembuf;

That’s two lines, that really is small, isn’t it?
Then compile it, but do not link it; we only need the object file:

$ gcc -c -g semh.c -o semh.o

(the ‘-c’ switch makes it only compile, not link)

Now we have an object file, semh.o. Let’s try to “side-load” that:

(gdb) add-symbol-file semh.o 0
add symbol table from file "semh.o" at
	.text_addr = 0x0
(y or n) y
Reading symbols from /home/oracle/pin-3.11-97998-g7ecce2dac-gcc-linux/semh.o...done.
(gdb)

(you have to say ‘y’ for it to load the symbol table at address 0x0)

Now let’s try casting again:

(gdb) print (struct sembuf *) $rsi
$3 = (struct sembuf *) 0x7ffc5714f150

And now we can ask gdb to print the casted variable:

(gdb) p *$3
$5 = {sem_num = 38, sem_op = 1, sem_flg = 0}

And that’s because it now knows what it looks like:

(gdb) ptype *$3
type = struct sembuf {
    unsigned short sem_num;
    short sem_op;
    short sem_flg;
}

Now this information can be used to find the process that the semop call is executed for:

$ ipcs -si 229376 | grep ^38
38         0          1          0          32324

So process 32324.

Retrieve PostgreSQL variable-length storage information thanks to pageinspect

Introduction

In PostgreSQL a variable-length datatype value can be stored in-line or out-of-line (as a TOAST). It can also be compressed or not (see the documentation for more details).

Let’s make use of the pageinspect extension and the information about variable-length datatypes found in postgres.h to build a query to retrieve tuples’ variable-length storage information.
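
Note that the extension has to be installed in the database first, which requires superuser privileges:

postgres=# CREATE EXTENSION IF NOT EXISTS pageinspect;
CREATE EXTENSION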

The query

The query is the following:

$ cat toast_info.sql
select
t_ctid,
-- See postgres.h
CASE
  WHEN (fo is NULL) THEN 'null'
  -- VARATT_IS_EXTERNAL_ONDISK: VARATT_IS_EXTERNAL (fo = 'x01') && tag == VARTAG_ONDISK (x12)
  -- rawsize - VARHDRSZ > extsize
  WHEN (fo = 'x01') AND (tag = 'x12') AND (osize - 4 > ssize) THEN 'toasted (compressed)'
-- rawsize - VARHDRSZ <= extsize
  WHEN (fo = 'x01') AND (tag = 'x12') AND (osize - 4 <= ssize) THEN 'toasted (uncompressed)'
  -- VARATT_IS_EXTERNAL_INDIRECT: VARATT_IS_EXTERNAL && tag == VARTAG_INDIRECT (x01)
  WHEN (fo = 'x01') AND (tag = 'x01') then 'indirect in-memory'
  -- VARATT_IS_EXTERNAL_EXPANDED: VARATT_IS_EXTERNAL && VARTAG_IS_EXPANDED(VARTAG_EXTERNAL)
  WHEN (fo = 'x01') AND (tag = 'x02' OR tag = 'x03') then 'expanded in-memory'
  -- VARATT_IS_SHORT (va_header & 0x01) == 0x01)
  WHEN (fo & 'x01' = 'x01') THEN 'short in-line'
  -- VARATT_IS_COMPRESSED (va_header & 0x03) == 0x02)
  WHEN (fo & 'x03' = 'x02') THEN 'long in-line (compressed)'
  ELSE 'long in-line (uncompressed)'
END as toast_info
from
(
select
page_items.t_ctid,
substr(page_items.t_attrs[1]::text,2,3)::bit(8) as fo,
('x'||substr(page_items.t_attrs[1]::text,5,2))::bit(8) as tag,
('x'||regexp_replace(substr(page_items.t_attrs[1]::text,7,8),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as osize ,
('x'||regexp_replace(substr(page_items.t_attrs[1]::text,15,8),'(\w\w)(\w\w)(\w\w)(\w\w)','\4\3\2\1'))::bit(32)::int as ssize
from
generate_series(0, pg_relation_size('bdttoast'::regclass::text) / 8192 - 1) blkno ,
heap_page_item_attrs(get_raw_page('bdttoast',blkno::int), 'bdttoast'::regclass) as page_items
) as hp;

As you can see, the query focuses on the bdttoast table. Let’s create this table and put some data in it to see the query in action.

Let’s see the query in action

Let’s create this table:

postgres=# CREATE TABLE bdttoast ( message text );
CREATE TABLE

the message field storage type is “extended”:

postgres=# \d+ bdttoast
                                 Table "public.bdttoast"
 Column  | Type | Collation | Nullable | Default | Storage  | Stats target | Description
---------+------+-----------+----------+---------+----------+--------------+-------------
 message | text |           |          |         | extended |              |

extended allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.

Let’s add one tuple:

postgres=# INSERT INTO bdttoast VALUES ('default');
INSERT 0 1

and check how the message field has been stored thanks to the query:

postgres=# \i toast_info.sql
 t_ctid |  toast_info
--------+---------------
 (0,1)  | short in-line
(1 row)

as you can see it has been stored in-line.

Add another tuple with more data in the message field, and check its storage information:

postgres=# INSERT INTO bdttoast VALUES (repeat('a',10000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
(2 rows)

as you can see this field value has been stored as long in-line and is compressed.

Add another tuple with even more data in the message field, and check its storage information:

postgres=# INSERT INTO bdttoast VALUES (repeat('b',1000000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
(3 rows)

this time it has been stored as TOAST-ed and is compressed.

Let’s change the message column storage to external:

postgres=# ALTER TABLE bdttoast ALTER COLUMN message SET STORAGE EXTERNAL;
ALTER TABLE
postgres=# \d+ bdttoast
                                 Table "public.bdttoast"
 Column  | Type | Collation | Nullable | Default | Storage  | Stats target | Description
---------+------+-----------+----------+---------+----------+--------------+-------------
 message | text |           |          |         | external |              |

external means it allows out-of-line storage but not compression.

Now, let’s add 3 tuples with the same message field sizes as the 3 previously added ones and compare the storage information.

Let’s add one tuple with the same message size as the one previously stored as “short in-line”:

postgres=# INSERT INTO bdttoast VALUES ('externa');
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
 (0,4)  | short in-line
(4 rows)

this one is still short in-line (same as t_ctid (0,1)).

Let’s add one tuple with the same message size as the one previously stored as “long in-line (compressed)”:

postgres=# INSERT INTO bdttoast VALUES (repeat('c',10000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
 (0,4)  | short in-line
 (0,5)  | toasted (uncompressed)
(5 rows)

this one is TOAST-ed and uncompressed with storage external (compare with t_ctid (0,2), which used storage extended).

Let’s add one tuple with the same message size as the one previously stored as “toasted (compressed)”:

postgres=# INSERT INTO bdttoast VALUES (repeat('d',1000000));
INSERT 0 1
postgres=# \i toast_info.sql
 t_ctid |        toast_info
--------+---------------------------
 (0,1)  | short in-line
 (0,2)  | long in-line (compressed)
 (0,3)  | toasted (compressed)
 (0,4)  | short in-line
 (0,5)  | toasted (uncompressed)
 (0,6)  | toasted (uncompressed)
(6 rows)

this one is TOAST-ed and uncompressed with storage external (compare with t_ctid (0,3), which used storage extended).

Remarks

  • use this query on little-endian machines only (bit layouts would not be the same on big-endian and would impact the query accuracy)
  • t_attrs[1] is used in the query to retrieve the information. This is because the message field is the 1st column of the relation

Conclusion

Thanks to the pageinspect extension we have been able to write a query to retrieve variable-length storage information. We have been able to compare how our data has been stored depending on the column storage being used (extended or external).

What’s new with Oracle database 18.8 versus 18.9

The difference between Oracle database versions 18.8 and 18.9 also follows the recent pattern of a small number of changes.

As always, there are some parameters that have changed from being undocumented spares to being undocumented with a name.
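
For reference, here’s a sketch of how such a list of parameter names can be generated on each version before diffing (run as SYS, since the X$ tables are not visible to ordinary users):

select ksppinm
from   x$ksppi
where  ksppinm like '\_%' escape '\'
order by ksppinm;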

Also, the DBA and CDB tables (DBA|CDB)_REGISTRY_BACKPORTS are back again. The disappearance of these tables in 18.8 turned out to be a bug. There is a patch for 18.8 if you need them.

parameters unique in version 18.8 versus 18.9

NAME
--------------------------------------------------
_eighth_spare_parameter
_one-hundred-and-forty-eighth_spare_parameter
_second_spare_parameter

parameters unique in version 18.9 versus 18.8

NAME
--------------------------------------------------
_bug29825525_bct_public_dba_buffer_dynresize_delay
_enable_ptime_update_for_sys
_session_modp_list

dba tables unique to 18.8 versus 18.9

dba tables unique to 18.9 versus 18.8

NAME
----------------------------------------------------------------------------------------------------
DBA_REGISTRY_BACKPORTS

cdb tables unique to 18.8 versus 18.9

cdb tables unique to 18.9 versus 18.8

NAME
----------------------------------------------------------------------------------------------------
CDB_REGISTRY_BACKPORTS

On the C function side, it becomes apparent that this update is truly an update: the numbers of functions that have disappeared and appeared are roughly the same. A large portion of the functions removed in 18.9 have to do with AWR, with some others dealing with ASM and other areas.

The functions that have been added seem to be quite diverse.

code symbol names unique in version 18.8 versus 18.9

NAME                                                         RESOLVE                                                      ANNOTATION
------------------------------------------------------------ ------------------------------------------------------------ --------------------------------------------------------------------------------
R_CR_entropy_resource_init                                   R_CR_entropy_resource_init                                   ??
kcfis_clear_lob                                              (kcfis)_clear_lob                                            kernel cache file management intelligent storage ??
kds_update_turbo_scan_pivot_statistics                       (kds)_update_turbo_scan_pivot_statistics                     kernel data seek/scan ??
kewramcs_app_map_condbid_str                                 (kewr)amcs_app_map_condbid_str                               kernel event AWR repository ??
kewramvn_append_mdb_vvwname                                  (kewr)amvn_append_mdb_vvwname                                kernel event AWR repository ??
kewraxc_attribute_xcb                                        (kewr)axc_attribute_xcb                                      kernel event AWR repository ??
kewrbtsg_build_topseg                                        (kewr)btsg_build_topseg                                      kernel event AWR repository ??
kewrccsq_collect_csql                                        (kewr)ccsq_collect_csql                                      kernel event AWR repository ??
kewrgcfes_get_cacheid_from_enum_str                          (kewr)gcfes_get_cacheid_from_enum_str                        kernel event AWR repository ??
kewrmplvl_map_snap_level                                     (kewr)mplvl_map_snap_level                                   kernel event AWR repository ??
kewrpfbue_pdb_from_buffer_entry                              (kewr)pfbue_pdb_from_buffer_entry                            kernel event AWR repository ??
kewrptsq_prep_topsql                                         (kewr)ptsq_prep_topsql                                       kernel event AWR repository ??
kewrrdsi_rank_dstat_item                                     (kewr)rdsi_rank_dstat_item                                   kernel event AWR repository ??
kewrrtsq_rank_topsql                                         (kewr)rtsq_rank_topsql                                       kernel event AWR repository ??
kewrsaobn_set_all_objnames                                   (kewr)saobn_set_all_objnames                                 kernel event AWR repository ??
kewrsonie_set_object_names_in_entry                          (kewr)sonie_set_object_names_in_entry                        kernel event AWR repository ??
kewrsqlc_sql_iscolored_cb                                    (kewr)sqlc_sql_iscolored_cb                                  kernel event AWR repository ??
kfatknHsh                                                    (kfa)tknHsh                                                  kernel automatic storage management alias operations ??
kfdDskComputeHashForRow                                      (kfd)DskComputeHashForRow                                    kernel automatic storage management disk ??
kfioSrMsgBuf_ack                                             (kfio)SrMsgBuf_ack                                           kernel automatic storage management translation I/O layer ??
kfkcrRefresh                                                 (kfk)crRefresh                                               kernel automatic storage management KFK ??
kfnFreeKfnpnmMem                                             (kfn)FreeKfnpnmMem                                           kernel automatic storage management networking subsystem ??
kkqtutlSelItemMatches                                        (kkqt)utlSelItemMatches                                      kernel compile query  table ??
kkquIsCorrTransInList                                        (kkqu)IsCorrTransInList                                      kernel compile query  subquery unnesting ??
krvxgtf                                                      (krvx)gtf                                                    kernel redo recovery extract ??
krvxrsr_SkipRedo                                             (krvx)rsr_SkipRedo                                           kernel redo recovery extract ??
ksmg_estimate_sgamax                                         (ksm)g_estimate_sgamax                                       kernel service  memory ??
kzagetcid                                                    (kza)getcid                                                  kernel security audit  ??
lxCharsetIsByteUnique                                        (l)xCharsetIsByteUnique                                      core library functions ??
qeroiFirstPart                                               (qeroi)FirstPart                                             query execute rowsource extensibel indexing query component ??
qkspmTravInit                                                (qkspm)TravInit                                              query kernel sql plan management ??
ri_entcb_cmd_func                                            ri_entcb_cmd_func                                            ??
zt_yield_entropy_source_cb                                   (zt)_yield_entropy_source_cb                                 security encryption ??

code symbol names unique in version 18.9 versus 18.8

NAME                                                         RESOLVE                                                      ANNOTATION
------------------------------------------------------------ ------------------------------------------------------------ --------------------------------------------------------------------------------
apagwnrn                                                     (apa)gwnrn                                                   SQL Access Path Analysis ??
apagwnrnprd                                                  (apa)gwnrnprd                                                SQL Access Path Analysis ??
apatwnrn                                                     (apa)twnrn                                                   SQL Access Path Analysis ??
kafcpy_one_row                                               (kaf)cpy_one_row                                             kernel access fetch ??
kcbz_eff_bsz                                                 (kcbz)_eff_bsz                                               kernel cache buffers subroutines for kcb ??
kdilm_row_diskcompress_policy_type                           (kdil)m_row_diskcompress_policy_type                         kernel data index load ??
kdsReadAheadSafe                                             (kds)ReadAheadSafe                                           kernel data seek/scan ??
kewrcc_bind_cb                                               (kewr)cc_bind_cb                                             kernel event AWR repository ??
kewrcc_check_columns                                         (kewr)cc_check_columns                                       kernel event AWR repository ??
kewrcc_fetch_cb                                              (kewr)cc_fetch_cb                                            kernel event AWR repository ??
kfdp_getNormalFgCnt                                          (kfdp)_getNormalFgCnt                                        kernel automatic storage management disk PST ??
kfkIsAFDLoaded                                               (kfk)IsAFDLoaded                                             kernel automatic storage management KFK ??
kgskltyp                                                     (kgsk)ltyp                                                   kernel generic service resource manager ??
kjcts_syncseq_incident_dump                                  (kjc)ts_syncseq_incident_dump                                kernel lock management communication ??
kkoRowNumLimit_Int                                           (kko)RowNumLimit_Int                                         kernel compile optimizer ??
kkoWnRowNumLimit                                             (kko)WnRowNumLimit                                           kernel compile optimizer ??
kkqljpUpdateXplAnn                                           (kkq)ljpUpdateXplAnn                                         kernel compile query  ??
kkqoreAndDriver                                              (kkqore)AndDriver                                            kernel compile query  or-expansion ??
kpcxdrBindReorderInfo                                        (kp)cxdrBindReorderInfo                                      kernel programmatic interface ??
kpdbCheckCommonprofileCbk                                    (kpdb)CheckCommonprofileCbk                                  kernel programmatic interface pluggable database ??
kpdbSyncCreateProfile                                        (kpdbSync)CreateProfile                                      kernel programmatic interface pluggable database DBMS_PDB.KPDBSYNC SYNC_PDB ??
kpossGetEncNonTemplateOverflow                               (kpo)ssGetEncNonTemplateOverflow                             kernel programmatic interface oracle ??
kpossSetEncNonTemplateOverflow                               (kpo)ssSetEncNonTemplateOverflow                             kernel programmatic interface oracle ??
kpossTemplateMatch_                                          (kpo)ssTemplateMatch_                                        kernel programmatic interface oracle ??
kpossTemplateSet                                             (kpo)ssTemplateSet                                           kernel programmatic interface oracle ??
kpussTemplateSend                                            (kpu)ssTemplateSend                                          kernel programmatic interface user ??
kpuxcSessionSignatureRecv                                    (kpuxc)SessionSignatureRecv                                  kernel programmatic interface user db replay? ??
kpuxcSessionTemplateSend                                     (kpuxc)SessionTemplateSend                                   kernel programmatic interface user db replay? ??
kpuxcSessionTemplatesFree                                    (kpuxc)SessionTemplatesFree                                  kernel programmatic interface user db replay? ??
krvxdsr                                                      (krvx)dsr                                                    kernel redo recovery extract ??
ksmg_estimate_nonimc_sga_size                                (ksm)g_estimate_nonimc_sga_size                              kernel service  memory ??
ksp_init_modp_send                                           (ksp)_init_modp_send                                         kernel service  parameter ??
ksp_modp_get_enckeyvals                                      (ksp)_modp_get_enckeyvals                                    kernel service  parameter ??
ksp_modp_get_keyvals                                         (ksp)_modp_get_keyvals                                       kernel service  parameter ??
ksp_modp_set_enckeyvals                                      (ksp)_modp_set_enckeyvals                                    kernel service  parameter ??
ksp_modp_set_keyvals                                         (ksp)_modp_set_keyvals                                       kernel service  parameter ??
ksp_modp_update_sign                                         (ksp)_modp_update_sign                                       kernel service  parameter ??
kspdecbuf                                                    (ksp)decbuf                                                  kernel service  parameter ??
kspencbuf                                                    (ksp)encbuf                                                  kernel service  parameter ??
qergiSetFirstPartFlag                                        (qergi)SetFirstPartFlag                                      query execute rowsource granule iterator (partitioning? or PX granules?) ??
qeroiFindGranuleIter                                         (qeroi)FindGranuleIter                                       query execute rowsource extensibel indexing query component ??
qjsnIsDollarOnly                                             (qjsn)IsDollarOnly                                           query json ??
qjsnJsonSerialize_optim                                      (qjsn)JsonSerialize_optim                                    query json ??
qkaIsRTRIMRequiredForViewCol                                 (qka)IsRTRIMRequiredForViewCol                               query kernel allocation ??
qksopCheckConstOrOptWithBindInAndChains                      (qksop)CheckConstOrOptWithBindInAndChains                    query kernel sql operand processing ??

(disclaimer: I can’t look at the source code, which means I look at the Oracle executable with normal, modern tools. This also means that there’s a lot of stuff that I don’t see – for example, if functionality has been added inside an existing function, that’s totally invisible to me)

Group by Elimination

Here’s a bug that was highlighted a couple of days ago on the Oracle Developer Community forum; it may be particularly worth thinking about if you haven’t yet got up to Oracle 12c, as it appeared with an optimizer feature that arrived in 12.2 and hasn’t been completely fixed even in the latest release of 19c (currently 19.6).

Oracle introduced “aggregate group by elimination” in 12.2, protected by the hidden parameter “_optimizer_aggr_groupby_elim”. The notes on MOS about the feature tell us that Oracle can eliminate a group by operation from a query block if a unique key from every table in the query block appears in the group by clause. Unfortunately there were a couple of gaps in the implementation in 12.2 that can produce wrong results. Here’s some code to model the problem.

rem
rem     Script:         group_by_elim_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

create table ref_clearing_calendar(
        calendar_name   char(17),
        business_date   date,
        update_ts       timestamp (6) default systimestamp,
        constraint pk_ref_clearing_calendar 
                        primary key (business_date)
)
/

insert into ref_clearing_calendar (business_date)
select
        sysdate + 10 * rownum
from 
        all_objects 
where 
        rownum <= 40 -- > comment to avoid wordpress format issue
/

commit;

execute dbms_stats.gather_table_stats(null,'ref_clearing_calendar',cascade=>true)

set autotrace on explain

select
        to_char(business_date,'YYYY') , count(*)
from
        ref_clearing_calendar
group by 
        to_char(business_date,'YYYY')
order by 
        to_char(business_date,'YYYY')
/

set autotrace off

I’ve created a table with a primary key on a date column, and then inserted 40 rows which are spaced every ten days from the current date; this ensures that I will have a few dates in each of two consecutive years (future proofing the example!). Then I’ve aggregated to count the rows per year using the to_char({date column},’YYYY’) conversion option to extract the year from the date. (Side note: the table definition doesn’t follow my normal pattern as the example started life in the ODC thread.)

If you run this query on Oracle 12.2 you will find that it returns 40 (non-unique) rows and displays the following execution plan:


---------------------------------------------------------------------------------------------
| Id  | Operation        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   1 |  SORT ORDER BY   |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| PK_REF_CLEARING_CALENDAR |    40 |   320 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

The optimizer has applied “aggregate group by elimination” because it hasn’t detected that the primary key column that appears in the group by clause has been massaged in a way that means the resulting value is no longer unique.

Fortunately this problem with to_char() is fixed in Oracle 18.1 where the query returns two rows using the following execution plan (which I’ve reported from an instance of 19.5):

---------------------------------------------------------------------------------------------
| Id  | Operation        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   1 |  SORT GROUP BY   |                          |    40 |   320 |     2  (50)| 00:00:01 |
|   2 |   INDEX FULL SCAN| PK_REF_CLEARING_CALENDAR |    40 |   320 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Unfortunately there is still at least one gap in the implementation. Change the to_char(business_date) to extract(year from business_date) at all three points in the query, and even in 19.6 you’re back to the wrong results – inappropriate aggregate group by elimination and 40 rows returned.
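
For clarity, here’s the modified query – it’s simply the original with the function swapped at all three points:

select
        extract(year from business_date) , count(*)
from
        ref_clearing_calendar
group by 
        extract(year from business_date)
order by 
        extract(year from business_date)
/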

There are a couple of workarounds. One is to set the hidden parameter _optimizer_aggr_groupby_elim to false at the system or session level, or through an opt_param() hint at the statement level (possibly injected through a SQL patch). The other option is to set a fix_control – again at the system, session, or statement level – but there seems to be little point in using the fix_control approach (which might be a little obscure for the next developer to see in the code) when it seems to do the same as the explicitly named hidden parameter.

select
        /*+ opt_param('_optimizer_aggr_groupby_elim','false') */
        extract(year from business_date) , count(*)
from ...

select
        /*+ opt_param('_fix_control','23210039:0') */
        extract(year from business_date) , count(*)
from ...
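
If you can’t touch the SQL at all, a SQL patch is one way of injecting the opt_param() hint from outside. Here’s a sketch using dbms_sqldiag.create_sql_patch(), which is a documented function in recent releases (in early 12c you would have to use dbms_sqldiag_internal.i_create_patch() instead, with a slightly different signature); note that hints supplied through a patch follow outline syntax, so the two arguments to opt_param() are separated by a space rather than a comma:

declare
        v_patch_name    varchar2(64);
begin
        -- sql_text must match the text of the statement to be patched
        v_patch_name := dbms_sqldiag.create_sql_patch(
                sql_text  => q'{select extract(year from business_date), count(*) from ref_clearing_calendar group by extract(year from business_date) order by extract(year from business_date)}',
                hint_text => q'{opt_param('_optimizer_aggr_groupby_elim' 'false')}',
                name      => 'groupby_elim_patch'
        );
end;
/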

One final thought about this “not quite fixed” bug. It’s the type of “oversight” error that gives you the feeling that there may be other special cases that have been overlooked. The key question would be: are there any other functions (and not necessarily datetime functions) that might be applied (perhaps implicitly) to a primary or unique key and produce duplicate results from distinct inputs – and, if so, has the code that checks the validity of eliminating the aggregate operation been written to notice the threat?

Footnote

The problem with extract() has been raised as a bug on MOS, but it was not public at the time of writing this note.

Update (about 60 seconds after publication)

Re-reading my comment about “other functions” it occurred to me that to_nchar() might, or might not, behave the same way as to_char() in 19c – so I tested it … and got the wrong results in 19c.


Announcing SLOB 2.5.2.2

SLOB 2.5.2.2 is available via the SLOB Resources Page.

SLOB 2.5.2.2 is a bug-fix release. After announcing the undocumented Obfuscated Column Data Feature, a few SLOB users reported bugs. The bugs have been fixed in this release.