Oakies Blog Aggregator

Transitive Closure

This is a follow-up to a note I wrote nearly 12 years ago, looking at the problems of transitive closure (or absence thereof) from the opposite direction. Transitive closure gives the optimizer one way of generating new predicates from the predicates you supply in your where clause (or, in some cases, your constraints); but it’s a mechanism with some limitations. Consider the following pairs of predicates:


    t1.col1 = t2.col2
and t2.col2 = t3.col3

    t1.col1 = t2.col2
and t2.col2 = 'X'

A person can see that the first pair of predicates allows us to infer that “t1.col1 = t3.col3” and the second pair of predicates allows us to infer that “t1.col1 = 'X'”. The optimizer is coded to recognize only the second inference. This has an important side effect that can have a dramatic impact on performance in a way that's far more likely to appear if your SQL is generated by code. Consider this sample data set (reproduced from the 2006 article):

rem
rem     Script:         transitive_loop.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem

create table t1 
as
select
        mod(rownum,100) col1,
        rpad('x',200)   v1
from
        all_objects
where   
        rownum <= 2000
;

create table t2
as
select
        mod(rownum,100) col2,
        rpad('x',200)   v2
from
        all_objects
where   
        rownum <= 2000
;

create table t3
as
select
        mod(rownum,100) col3,
        rpad('x',200)   v3
from
        all_objects
where   
        rownum <= 2000
;

-- gather stats if necessary

set autotrace traceonly explain

prompt  =========================
prompt  Baseline - two hash joins
prompt  =========================

select 
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
;

prompt  ================================================
prompt  Force mismatch between predicates and join order
prompt  ================================================

select 
        /*+
                leading(t1 t3 t2)
        */
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
;

The first query simply joins the tables in the from clause order on a column we know will have 20 rows for each distinct value, so the result sets will grow from 2,000 rows to 40,000 rows to 800,000 rows. Looking at the second query we would like to think that when we force Oracle to use the join order t1 -> t3 -> t2 it would be able to use the existing predicates to generate the predicate “t3.col3 = t1.col1” and therefore be able to do the same amount of work as the first query (and, perhaps, manage to produce the same final cardinality estimate).

Here are the two plans, taken from an instance of 12.2.0.1:


=========================
Baseline - two hash joins
=========================

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   800K|   466M|    48  (38)| 00:00:01 |
|*  1 |  HASH JOIN          |      |   800K|   466M|    48  (38)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T3   |  2000 |   398K|    10   (0)| 00:00:01 |
|*  3 |   HASH JOIN         |      | 40000 |    15M|    21   (5)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   |  2000 |   398K|    10   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."COL3"="T2"."COL2")
   3 - access("T2"."COL2"="T1"."COL1")

================================================
Force mismatch between predicates and join order
================================================

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   800K|   466M| 16926   (3)| 00:00:01 |
|*  1 |  HASH JOIN            |      |   800K|   466M| 16926   (3)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | T2   |  2000 |   398K|    10   (0)| 00:00:01 |
|   3 |   MERGE JOIN CARTESIAN|      |  4000K|  1556M| 16835   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T1   |  2000 |   398K|    10   (0)| 00:00:01 |
|   5 |    BUFFER SORT        |      |  2000 |   398K| 16825   (2)| 00:00:01 |
|   6 |     TABLE ACCESS FULL | T3   |  2000 |   398K|     8   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."COL2"="T1"."COL1" AND "T3"."COL3"="T2"."COL2")

As you can see, there’s a dramatic difference between the two plans, and a huge difference in cost (though the predicted time for both is still no more than 1 second).

The first plan, where we leave Oracle to choose the join order, builds an in-memory hash table from t3, then joins t1 to t2 with a hash table and uses the result to join to t3 by probing the in-memory hash table.

The second plan, where we force Oracle to use a join order that (I am pretending) we believe to be better, results in Oracle doing a Cartesian merge join between t1 and t3 that explodes the intermediate result set to 4 million rows (and the optimizer's estimate is correct) before eliminating a huge amount of redundant data.

As far as performance is concerned, the first query took 0.81 seconds to generate its result set, the second query took 8.81 seconds. In both cases CPU time was close to 100% of the total time.

As a follow-up demo I added the extra predicate “t3.col3 = t1.col1” to the second query, allowing the optimizer to use a hash join with the join order t1 -> t3 -> t2, and this brought the run time back down (with a slight increase due to the extra predicate check on the second join).
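For reference, here is a sketch of what that follow-up query would look like (reconstructed from the description above rather than copied from the original script):

select
        /*+
                leading(t1 t3 t2)
        */
        t1.*, t2.*, t3.*
from
        t1, t2, t3
where
        t2.col2 = t1.col1
and     t3.col3 = t2.col2
and     t3.col3 = t1.col1       -- the predicate transitive closure failed to supply
;

With all three predicates present the optimizer can join t1 to t3 with a hash join on col1 = col3 instead of falling back to a Cartesian merge join.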

Summary

The choice of columns in join predicates may stop Oracle from choosing the best join order because it is not able to use transitive closure to generate all the extra predicates that the human eye can see. If you are using programs to generate SQL rather than writing it by hand, you are more likely to see this limitation result in execution plans that are less efficient than they could be.


Virtual Patching or Good Security Design instead?

I got an email from someone recently who asked me about virtual patching for Oracle, as they were running an out-of-date version of Oracle and were thinking that virtual patching may be a good solution to make their database… [Read More]

Posted by Pete On 19/12/18 At 01:32 PM

UKOUG “Lifetime Achievement Award” Speaker Award

I was recently very honoured and flattered to have received the “Lifetime Achievement Award” Speaker Award from the UKOUG. I have only managed to get to the excellent UKOUG Tech conferences on a couple of occasions, so it was both a thrill and a surprise to be so honoured. Unfortunately, I wasn’t able to make […]

NULL predicate

People ask me from time to time if I’m going to write another book on the Cost Based Optimizer – and I think the answer has to be no because the product keeps growing so fast it’s not possible to keep up and because there are always more and more little details that might have been around for years and finally show up when someone asks me a question about some little oddity I’ve never noticed before.

The difficulty with the “little oddities” is the amount of time you could spend trying to work out whether or not they matter and whether it's worth writing about them. Here's a little example to show what I mean – first the data set:


rem
rem     Script:         null_filter.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem

create table t1
nologging
as
select  *
from    all_objects
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
commit;

create index t1_i1 on t1(object_type, data_object_id, object_id, created);

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/

It’s a simple data set with a single index. The only significant thing about the index is that the second column (data_object_id) is frequently null. This leads to a little quirk in the execution plans for a very similar pair of statements:


set serveroutput off
alter session set statistics_level = all;

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id = 20002
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id is null
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

How much difference would you expect in the execution plans for these two queries? There is, of course, the side effect of the “is null” predicate disabling the “implicit column group” that is the index distinct_keys value, but in this case I’ve got a range-based predicate on one of the columns so Oracle won’t be using the distinct_keys anyway.

Of course there's the point that you can't use the equality operator with null, you have to use “is null” – and that might make a difference, but how? Here are the two execution plans:


----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID"=20002 AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID" IS NULL AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))
       filter(("OBJECT_ID"=20002 AND "CREATED">TRUNC(SYSDATE@!-90)))

The query with the predicate “data_object_id is null” repeats the object_id and sysdate predicates as both access predicates and filter predicates. This seems a little surprising, and a potential performance threat. In the first query the run-time engine will hit the correct index leaf block in exactly the right place very efficiently and then walk along it, supplying every rowid to the parent operator until it hits the end of the range.

With the “is null” plan the run-time engine will be checking the actual values of object_id and created for every index entry on the way – how much extra CPU will this use and, more importantly, might Oracle start with the first index entry where object_type = 'TABLE' and data_object_id is null and walk through every index entry that has that null, checking for the correct object_id as it goes?

That last question is the reason for running the query with rowsource execution stats enabled. The first query did a single physical read while the second didn't have to, but the more important detail is that both queries did the same number of buffer gets – and there is, by the way, a set of eight rows where the object_id and data_object_id are 20,002, but they were created several years ago so the index range scan returns no rows in both cases.

Based on that comparison, how do we show that Oracle has not walked all the way from the first index entry where object_type = ‘TABLE’ and data_object_id is null checking every entry on the way or, to put it another way, has Oracle really managed to prune down the index range scan to the minimum “wedge” indicated by the presence of the predicates “OBJECT_ID”=20002 AND “CREATED”>TRUNC(SYSDATE@!-90) as access predicates?

Let’s just count the number of leaf blocks that might be relevant, using the sys_op_lbid() function (last seen here) that Oracle uses internally to count the number of leaf blocks in an index. First we get the index object_id, then we scan it to see how many leaf blocks hold entries that match our object_type and data_object_id predicates but appear in the index before our target value of 20,002:


column object_id new_value m_index_id

select
        object_id
from
        user_objects
where
        object_type = 'INDEX'
and     object_name = 'T1_I1'
;

select  distinct sys_op_lbid(&m_index_id, 'L', rowid)
from    t1
where   object_type    = 'TABLE'
and     data_object_id is null
and     object_id      < 20002
;


SYS_OP_LBID(159271
------------------
AAAm4nAAFAAACGDAAA
AAAm4nAAFAAACF9AAA
AAAm4nAAFAAACGCAAA
AAAm4nAAFAAACF/AAA
AAAm4nAAFAAACF+AAA
AAAm4nAAFAAACGFAAA
AAAm4nAAFAAACGEAAA
AAAm4nAAFAAACGGAAA

8 rows selected.


This tells us that there are 8 leaf blocks in the index that we would have to range through before we found object_id 20,002, and we would have seen 8 buffer gets, not 3, in the rowsource execution stats if Oracle had not actually been clever with its access predicates and narrowed down the wedge of the index it was probing.

Bottom line: for a multi-column index there seems to be a difference in execution plans between “column is null” and “column = constant” when the column is one of the earlier columns in the index – but even though the “is null” option results in some access predicates re-appearing as filter predicates in the index range scan the extra workload is probably not significant – Oracle still uses the minimum number of index leaf blocks in the index range scan.


Oracle VPD as a safeguard for DML

A new blog post on the Databases at CERN blog about using VPD Row-Level Security (DBMS_RLS) as a safeguard for the privileged users who need to bypass the application and run SQL directly: https://db-blog.web.cern.ch/blog/franck-pachot/2018-12-oracle-vpd-safeguard-dml

Of course, your data should be guarded behind a hard shell (see Bryn Llewellyn's presentation https://community.oracle.com/docs/DOC-1018915) but there may be some exceptional reasons to modify data directly with SQL, because some information was not originally supposed to be changed and the application has no GUI or API for it. If all security is implemented through the application, then everything is possible when connected directly, and a mistake (like a where clause predicate lost in a copy-paste) can be critical. Flashback features are awesome for reacting to this kind of error, but VPD rules can be used as a proactive safeguard by allowing, by default, only a subset of data to be touched.


Little things worth knowing: the latest public-yum-ol7.repo configuration file enables an upgrade to UEK R5

For a little while now I have been using Ansible for all installation/configuration tasks I do in the lab. I can’t really be bothered to do these things by typing commands anymore: once you get the hang of Ansible, you can develop an urge to automate everything. As part of my playbook installing the Oracle database on Oracle Linux, I replace /etc/yum.repos.d/public-yum-ol7.repo with the current version from Oracle’s server to make sure I have the latest and greatest software available.

This step is not needed outside playground/lab environments. For “real” deployments you would use a repository file provisioned during build time, pointing to your local Spacewalk/Satellite/insert favourite tool server. I should briefly point out this is my lab system and I’m perfectly happy to point yum at Oracle’s public YUM servers. If this were a system outside my own lab I’d of course use proper channels to maintain software releases on my  servers. These however are outside the scope of this post.

The setup

I haven’t downloaded the latest Oracle Linux 7.6 ISO yet so I’m starting off on Oracle Linux 7.4 with Kernel UEK 4. I know, it’s a looooong upgrade. Here is my current configuration:

[root@server4 ~]# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="7.4"
ID="ol"
VERSION_ID="7.4"
PRETTY_NAME="Oracle Linux Server 7.4"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:4:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.4
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.4

[root@server4 ~]# rpm -q kernel-uek
kernel-uek-4.1.12-94.3.9.el7uek.x86_64
[root@server4 ~]#

Oracle’s Unbreakable Enterprise Kernel Release 4 is based on the upstream release 4.1.x. 

Upgrading!

With the latest version of the /etc/yum.repos.d/public-yum-ol7.repo in place I upgrade the freshly provisioned system – after it has been created using virt-install and my kickstart file – to “latest” using yum upgrade …

A little later, when the system was upgraded and my prompt returned, I rebooted the VM. When it came up, I was a little surprised to see a change in kernel version:

[root@server4 ~]# reboot
Connection to server4 closed by remote host.
Connection to server4 closed.
[martin@server1 ~]$
[martin@server1 ~]$ ssh root@server4
root@server4's password:
Last login: Sat Dec 15 12:16:38 2018 from 192.168.100.1
[root@server4 ~]# uname -r
4.14.35-1818.5.4.el7uek.x86_64
[root@server4 ~]#

So what happened? First of all, I have 2 UEK kernels on my system now. 

[root@server4 ~]# rpm -qi kernel-uek
Name : kernel-uek
Version : 4.1.12
Release : 94.3.9.el7uek
Architecture: x86_64
Install Date: Thu 13 Dec 2018 19:17:37 GMT
Group : System Environment/Kernel
Size : 160287394
License : GPLv2
Signature : RSA/SHA256, Sat 15 Jul 2017 05:06:17 BST, Key ID 72f97b74ec551f03
Source RPM : kernel-uek-4.1.12-94.3.9.el7uek.src.rpm
Build Date : Sat 15 Jul 2017 04:54:05 BST
Build Host : x86-ol7-builder-02.us.oracle.com
Relocations : (not relocatable)
Vendor : Oracle America
URL : http://www.kernel.org/
Summary : The Linux kernel
Description :
The kernel package contains the Linux kernel (vmlinuz), the core of any
Linux operating system. The kernel handles the basic functions
of the operating system: memory allocation, process allocation, device
input and output, etc.
Name : kernel-uek
Version : 4.14.35
Release : 1818.5.4.el7uek
Architecture: x86_64
Install Date: Sat 15 Dec 2018 12:24:50 GMT
Group : System Environment/Kernel
Size : 61690109
License : GPLv2
Signature : RSA/SHA256, Tue 04 Dec 2018 05:22:41 GMT, Key ID 72f97b74ec551f03
Source RPM : kernel-uek-4.14.35-1818.5.4.el7uek.src.rpm
Build Date : Tue 04 Dec 2018 05:12:59 GMT
Build Host : ca-build85.us.oracle.com
Relocations : (not relocatable)
Vendor : Oracle America
URL : http://www.kernel.org/
Summary : Oracle Unbreakable Enterprise Kernel Release 5
Description :
The kernel package contains the Linux kernel (vmlinuz), the core of any
Linux operating system. The kernel handles the basic functions
of the operating system: memory allocation, process allocation, device
input and output, etc.
[root@server4 ~]#

The first entry in the list – 4.1.12 – is UEK R4 whilst the second – 4.14.35 – is UEK R5. This can easily be explained looking at the enabled repositories:

[root@server4 ~]# yum repolist
Loaded plugins: ulninfo
repo id repo name status
ol7_UEKR5/x86_64 Latest Unbreakable Enterprise Kernel Release 5 for Oracle Linux 7Server (x86_64) 115
ol7_latest/x86_64 Oracle Linux 7Server Latest (x86_64) 11,708
repolist: 11,823

When comparing with another Oracle Linux 7.4 installation using the stock public-yum-ol7.repo I can see this instead:

[root@rac18pri1 ~]# yum repolist
Loaded plugins: ulninfo
repo id repo name status
ol7_UEKR4/x86_64 Latest Unbreakable Enterprise Kernel Release 4 for Oracle Linux 7Server (x86_64) 124
ol7_latest/x86_64 Oracle Linux 7Server Latest (x86_64) 11,708
repolist: 11,832

While I appreciate being upgraded to the latest kernel automatically, there might be people out there who don’t. The good news is that my system still had the ability to boot to UEK R4 should I need to:

[Figure: the interactive boot menu after the UEK R5 update, with UEK R5 highlighted as the default kernel]

If you have to go back to UEK R4, you might be able to change the default entry back to UEK R4, reboot, and see what happens.

It’s all downhill past 30

Yes, it sounds like a lamentation of the aging process, and from my own experience it is definitely true that the wheels started falling off for my knees, back and plenty of other body parts once I turned 30. But that is perhaps a legacy of too much alcohol, not eating well in my youth and failing to stretch, rather than any particular chronological milestone :-)

But this post is not about that. This one is about the magical 30-character limit on identifiers that we finally lifted in version 12c. For static SQL this is obviously a no-impact change – you either define your tables and columns with longer names or you don't, and the static SQL you write reflects that position.

But dynamic SQL is a different proposition, as shown by a discovery I made with DBMS_SQL recently. DBMS_SQL has a number of package data types that reflect the maximum identifier length in the database, for example:


  type desc_rec is record (
        col_type            binary_integer := 0,
        col_max_len         binary_integer := 0,
        col_name            varchar2(32)   := '',
        col_name_len        binary_integer := 0,
        col_schema_name     varchar2(32)   := '',
        col_schema_name_len binary_integer := 0,
        col_precision       binary_integer := 0,
        col_scale           binary_integer := 0,
        col_charsetid       binary_integer := 0,
        col_charsetform     binary_integer := 0,
        col_null_ok         boolean        := TRUE);

For dynamic SQL, where we need to determine the column names dynamically from an arbitrary SQL statement, DBMS_SQL uses this data type in its API calls. The example below lists out some user names from the database and dynamically derives the column names (even though we obviously know them in advance for this simple example):


SQL> set serverout on format wrapped
SQL> declare
  2    c clob := 'select username, created from dba_users';
  3    l_cur     int := dbms_sql.open_cursor;
  4    l_val     varchar2(4000);
  5    l_status  int;
  6    l_desc    dbms_sql.desc_tab;
  7    l_cnt     int;
  8    l_longcol int := 0;
  9  begin
 10    dbms_sql.parse(l_cur,c,dbms_sql.native );
 11    dbms_sql.describe_columns(l_cur,l_cnt,l_desc);
 12    for i in 1 .. l_cnt loop
 13      dbms_sql.define_column(l_cur,i,l_val,4000);
 14      l_longcol := greatest(l_longcol,length(l_desc(i).col_name));
 15    end loop;
 16    l_status := dbms_sql.execute(l_cur);
 17    while dbms_sql.fetch_rows(l_cur) > 0
 18    loop
 19      for i in 1 .. l_cnt
 20      loop
 21        dbms_sql.column_value(l_cur, i,l_val);
 22        dbms_output.put_line(
 23          rpad(l_desc(i).col_name,l_longcol)||': '||l_val );
 24      end loop;
 25      dbms_output.new_line;
 26    end loop;
 27  exception
 28    when others then
 29      dbms_output.put_line(dbms_utility.format_error_backtrace);
 30      raise;
 31  end;
 32  /
USERNAME: SYS
CREATED : 08-MAR-17

USERNAME: SYSTEM
CREATED : 08-MAR-17

USERNAME: XS$NULL
CREATED : 08-MAR-17

USERNAME: OJVMSYS
CREATED : 08-MAR-17

USERNAME: SYSADMIN
CREATED : 19-JUL-18

It would seem a trivial change to increase the size limit as the version increases from 11g to 12c, but DBMS_SQL is caught between a rock and a hard place here. If you do change the limit, then you risk issues with backward compatibility for customers. If you do not change the limit, then you run into problems like the demo below:


SQL> create table t ( using_my_new_cool_long_column_names_in_12c )
  2  as select 1 from dual;

Table created.

SQL>
SQL>
SQL> set serverout on format wrapped
SQL> declare
  2    c clob := 'select * from t';
  3    l_cur     int := dbms_sql.open_cursor;
  4    l_val     varchar2(4000);
  5    l_status  int;
  6    l_desc    dbms_sql.desc_tab;
  7    l_cnt     int;
  8    l_longcol int := 0;
  9  begin
 10    dbms_sql.parse(l_cur,c,dbms_sql.native );
 11    dbms_sql.describe_columns(l_cur,l_cnt,l_desc);
 12    for i in 1 .. l_cnt loop
 13      dbms_sql.define_column(l_cur,i,l_val,4000);
 14      l_longcol := greatest(l_longcol,length(l_desc(i).col_name));
 15    end loop;
 16    l_status := dbms_sql.execute(l_cur);
 17    while dbms_sql.fetch_rows(l_cur) > 0
 18    loop
 19      for i in 1 .. l_cnt
 20      loop
 21        dbms_sql.column_value(l_cur, i,l_val);
 22        dbms_output.put_line(
 23          rpad(l_desc(i).col_name,l_longcol)||': '||l_val );
 24      end loop;
 25      dbms_output.new_line;
 26    end loop;
 27  exception
 28    when others then
 29      dbms_output.put_line(dbms_utility.format_error_backtrace);
 30      raise;
 31  end;
 32  /
ORA-06512: at "SYS.DBMS_SQL", line 2084
ORA-06512: at line 11

declare
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at line 30
ORA-06512: at "SYS.DBMS_SQL", line 2084
ORA-06512: at line 11

The solution here is a simple one. The existing definitions in DBMS_SQL were left unchanged to preserve that backward compatibility, and additional structures were added to handle longer column names. We simply need to alter our code to use the new “desc_tab2” data type and its accompanying “describe_columns2” call.


SQL> declare
  2    c clob := 'select * from t';
  3    l_cur     int := dbms_sql.open_cursor;
  4    l_val     varchar2(4000);
  5    l_status  int;
  6    l_desc    dbms_sql.desc_tab2;
  7    l_cnt     int;
  8    l_longcol int := 0;
  9  begin
 10    dbms_sql.parse(l_cur,c,dbms_sql.native );
 11    dbms_sql.describe_columns2(l_cur,l_cnt,l_desc);
 12    for i in 1 .. l_cnt loop
 13      dbms_sql.define_column(l_cur,i,l_val,4000);
 14      l_longcol := greatest(l_longcol,length(l_desc(i).col_name));
 15    end loop;
 16    l_status := dbms_sql.execute(l_cur);
 17    while dbms_sql.fetch_rows(l_cur) > 0
 18    loop
 19      for i in 1 .. l_cnt
 20      loop
 21        dbms_sql.column_value(l_cur, i,l_val);
 22        dbms_output.put_line(
 23          rpad(l_desc(i).col_name,l_longcol)||': '||l_val );
 24      end loop;
 25      dbms_output.new_line;
 26    end loop;
 27  exception
 28    when others then
 29      dbms_output.put_line(dbms_utility.format_error_backtrace);
 30      raise;
 31  end;
 32  /
USING_MY_NEW_COOL_LONG_COLUMN_NAMES_IN_12C: 1


PL/SQL procedure successfully completed.

In fact, a column name could always be more than 32 characters, because an un-aliased column expression can yield a column name of almost arbitrary length, so these additions were made several releases ago – but 12c has brought the issue to the fore because even “normal” column names can now exceed the previous limit.
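As a quick illustration (my own hypothetical example, not from the original article), an un-aliased expression produces a derived column name far longer than 32 characters, which is why desc_tab2 and describe_columns2 were already needed well before long identifiers arrived:

--
-- with no column alias, the name reported for this column is the expression
-- text itself, which is far longer than the 32 bytes allowed by desc_rec
--
select
        username||' (account created '||to_char(created,'DD-MON-YYYY')||')'
from
        dba_users;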

Now if only I could add a “2” to the end of my aging body to get it to handle years up to 128, just like the column names in 12c :-)

CPU Capacity planning from OEM metrics

The CPU used by your Oracle Database is expensive because it is the metric used for licensing. The more you can control it and know what you need, the more freedom you have to optimize the costs. With instance caging, available in all editions, you can put a soft limit on CPU usage. This means that:

  • you run on a limited number of threads and after a while, this gives a good idea of what you really need. You can forecast the capacity for a future consolidation.
  • you monitor ‘resmgr: cpu quantum’ and if activity is high you can decide to scale-up immediately, throttle some services, or do some query/design tuning.

In order to set instance caging, you need to define a value for CPU_COUNT according to past activity; this post details how I derive that value from OEM metrics.
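For completeness (my own addition, not part of the scripts below): once a value has been chosen, enabling instance caging itself is just two parameter changes. A minimal sketch, where the plan name and the value 8 are placeholders rather than recommendations, and scope=both assumes an spfile:

-- instance caging = a CPU_COUNT limit plus an active resource manager plan
alter system set resource_manager_plan = 'DEFAULT_PLAN' scope=both;
alter system set cpu_count = 8 scope=both;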

Oracle Enterprise Manager Cloud Control collects many metrics from the target databases and maintains a history of them with daily minimum, maximum and average values. The Oracle documentation mentions the view MGMT$METRIC_DAILY but I prefer MGMT_METRICS_1DAY because the ‘underscore’ views are not mentioned in the Licensing Information documentation, while the ‘dollar’ metric views require the Diagnostic Pack. In any case, these scripts are meant for databases where the Diagnostic Pack is enabled.

Average, Maximum or Percentile?

The goal is to define a CPU_COUNT which accepts the peak activity, so I'll get the maximum value from the daily metric: the maximum number of sessions in CPU observed during each day. However, we may have experienced an issue where some queries take a lot of resources. If this is caused by an application bug or a bad execution plan we need to fix it – we don't want to pay for CPU to cope with these issues. So, rather than taking the maximum, I'll look at a percentile. Here is a graph from 4 instances over 2 years:

Do you want to pay for 120 threads just because of one peak? Probably not. Taking the 99th percentile gives a value that copes with 99% of the days.
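If percentile_cont() is unfamiliar, here is a tiny self-contained illustration (invented numbers: 100 “days” of load with one exceptional spike) showing how a single spike dominates the maximum but hardly moves the 99th and 90th percentiles:

with daily_max as (
        select
                case when rownum = 50 then 120 else 10 + mod(rownum,5) end     max_cpu_load
        from    dual
        connect by level <= 100
)
select
        max(max_cpu_load)                                               max_load,
        percentile_cont(0.99) within group (order by max_cpu_load)     pctl_99,
        percentile_cont(0.90) within group (order by max_cpu_load)     pctl_90
from    daily_max
;

-- max_load is 120, while pctl_99 and pctl_90 stay at roughly 15 and 14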

My query for these instances returns the following: 125 threads is the maximum, but 74 threads are sufficient for 99% of the days observed, and 50 threads are ok for 90% of the days:

You see the same number for multiple instances because they are nodes for the same database.

RAC instances

Here is another example with two nodes of the same database. In addition to the peaks, where we can accept throttling for a short time, there is a service that has been relocated several times:

Then which CPU_COUNT do you want to set on each instance? The safest would be to accept, on each node, the possibility to run the load observed on the whole cluster. This can happen in an exceptional case where only one node remains. But this costs a lot. Rather than setting the sum of cluster load, we can set the maximum observed, so that each node can accept a peak observed on one of the nodes. Or, if we don’t expect service relocation, or if we accept some response time degradation in those cases, we can just set what has been observed on the node.

The following query joins the daily metrics with some target information: “Line of Business” and “Department” to group the databases from a business point of view, “LifeCycle Status” to differentiate production, test, and development.

The metric selected is the per-instance CPU second per second, which is the average number of sessions in CPU.

“CPU load/instance/day” does the join and calculates the value, the max, and the sum among the cluster. With single-instance, you don’t have to worry. With RAC, you may decide, in “Target CPU load/instance/day” which one you use. Then “Proposed instance caging” calculates the percentiles. The main query, at the end, does a ‘group by rollup’ to add some aggregates per business and environment. Of course, you will customize for your needs. My goal is to show the useful sources of information and how to mine them.

with 
TARGET_PROPERTIES as (
select target_guid,property_name,property_value
from sysman.MGMT_TARGET_PROPERTIES
where property_type='INSTANCE'
),
TARGET_PROP_DEFS as (
select property_name,property_display_name
from sysman.MGMT$ALL_TARGET_PROP_DEFS
where defined_by='SYSTEM'
),
TARGET_LINE_OF_BUSINESS as (
select target_guid,property_value "Line of Business"
from TARGET_PROP_DEFS
natural left outer join
TARGET_PROPERTIES where property_display_name='Line of Business'
),
TARGET_DEPARTMENT as (
select target_guid,property_value "Department"
from TARGET_PROP_DEFS
natural left outer join
TARGET_PROPERTIES where property_display_name='Department'
),
TARGET_LIFESYCLE_STATUS as (
select target_guid,property_value "LifeCycle Status"
from TARGET_PROP_DEFS
natural left outer join
TARGET_PROPERTIES
where property_display_name='LifeCycle Status'
),
TARGET_RAC_DATABASES as (
select member_target_name target_name
,member_target_guid target_guid
,composite_target_name "RAC Database"
from sysman.MGMT_TARGET_MEMBERSHIPS
where composite_target_type='rac_database'
and member_target_type='oracle_database'
),
TARGETS_INSTANCES as (
select target_guid, target_type, type_meta_ver
,category_prop_1, target_name
from sysman.mgmt_targets
where target_type='oracle_database'
),
METRICS_CPU as (
select metric_guid
,target_type, type_meta_ver, category_prop_1, metric_name
,metric_label, key_column, num_keys, column_label
,description, short_name, source, eval_func
from sysman.mgmt_metrics
where column_label = 'CPU Usage (per second)'
),
METRICS_1DAY as (
select target_guid
,metric_guid,rollup_timestamp "Day"
,value_maximum/100 "Max CPU load"
from sysman.mgmt_metrics_1day
),
"CPU load/instance/day" as (
select "Line of Business","Department","LifeCycle Status"
,"RAC Database",target_name "Instance","Day","Max CPU load"
-- sums over the RAC cluster because we should be able to afford running all services on one instance (but the sum of maxima can be large when a service has been relocated during the day)
,sum("Max CPU load") over (partition by "Line of Business","Department","LifeCycle Status","RAC Database","Day") "Sum instances CPU load"
-- or just need to ensure that each node can run the maximum observed per node
,max("Max CPU load") over (partition by "Line of Business","Department","LifeCycle Status","RAC Database","Day") "Max instances CPU load"
from
METRICS_1DAY
natural join
METRICS_CPU
natural join
TARGETS_INSTANCES
natural left outer join
TARGET_LINE_OF_BUSINESS
natural left outer join
TARGET_RAC_DATABASES
natural left outer join
TARGET_LIFESYCLE_STATUS
natural left outer join
TARGET_DEPARTMENT
),
"Target CPU load/instance/day" as (
select "Line of Business","Department","LifeCycle Status","RAC Database","Instance","Day"
-- choice of the metric used when in RAC:
-- - "Max CPU load" when not counting relocation of services,
-- - "Sum instances CPU load" when counting that each node can accept the maximum load seen in the whole cluster
-- - "Max instances CPU load" when counting that each node can accept the maximum load seen on any one node
,"Max CPU load" "Max CPU"
from "CPU load/instance/day"
),
"Proposed instance caging" as (
select
"Line of Business","Department","LifeCycle Status","RAC Database","Instance"
,ceil(max("Max CPU")) "From Max"
-- here we add a percentile calculation because we do not count exceptional peaks
,ceil(percentile_cont(0.99) within group(order by "Max CPU")) "From percentile 99%"
,ceil(percentile_cont(0.90) within group(order by "Max CPU")) "From percentile 90%"
from "Target CPU load/instance/day"
group by "Line of Business","Department","LifeCycle Status","RAC Database","Instance"
)
select
"Line of Business","Department","LifeCycle Status"
,"RAC Database","Instance"
,sum("From Max")
,sum("From percentile 99%"),sum("From percentile 90%")
from "Proposed instance caging"
group by rollup (
"Line of Business","Department","LifeCycle Status","RAC Database","Instance"
)
order by
grouping("Line of Business") desc,grouping("Department") desc,grouping("LifeCycle Status") desc,grouping("RAC Database") desc,grouping("Instance") desc,
"Line of Business","Department","LifeCycle Status","RAC Database","Instance"
;

You may or may not like my way of writing SQL queries. I use Common Table Expressions to define the sources of data, name each column according to its role in the final result, and use natural join because the names clearly define the join columns. I find this very easy to develop and test, one step at a time.

The “CPU load/instance/day” result can easily be exported to an Excel pivot graph to look at the whole picture, as above, before deciding which percentile and which cluster aggregation to use. You may even have to look at your logs to see if the peaks are related to business activity (where you want to scale up) or to a problem (which you want to cage). And remember that the goal is to set a base for instance caging, which can easily be adapted later. Once you have run for a while with a controlled number of threads, you can consider consolidation and licensing optimization. And don't forget to see if you can reduce the CPU_COUNT with some tuning. Please don't hesitate to comment, here or on Twitter, with remarks or improvements.


Extreme Nulls

This note is a variant of a note that I wrote a few months ago about the impact of nulls on column groups. The effect showed up recently on a client site with a little camouflage that confused the issue for a little while, so I thought it would be worth a repeat.  We’ll start with a script to generate some test data:

rem
rem     Script:         pt_hash_cbo_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:        
rem
rem     Last tested 
rem             12.1.0.2
rem

create table t1 (
        hash_col,
        rare_col,
        n1,
        padding
)
nologging
partition by hash (hash_col)
partitions 32
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        mod(rownum,128),
        case when mod(rownum,1021) = 0 
                then rownum + trunc(dbms_random.value(-256, 256))
        end case,
        rownum,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1048576 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(hash_col, rare_col) nologging
local compress 1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                granularity => 'ALL',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve got a hash-partitioned table with 32 partitions; the partitioning key is called hash_col, and there is another column called rare_col that is almost always null – roughly 1 row in every 1,000 holds a value. I’ve added a local index on (hash_col, rare_col), compressing the leading column since hash_col is very repetitive, and gathered stats on the partitions and table. Here’s a view of the data for a single value of hash_col, and a summary report of the whole data set:

select  
        hash_col, rare_col, count(*)
from
        t1
where
        hash_col = 63
group by
        hash_col, rare_col
order by
        hash_col, rare_col
;

  HASH_COL   RARE_COL   COUNT(*)
---------- ---------- ----------
        63     109217          1
        63     240051          1
        63     370542          1
        63     501488          1
        63     631861          1
        63     762876          1
        63     893249          1
        63    1023869          1
        63                  8184

9 rows selected.

select
        count(*), ct
from    (
        select
                hash_col, rare_col, count(*) ct
        from
                t1
        group by
                hash_col, rare_col
        order by
                hash_col, rare_col
        )
group by ct
order by count(*)
;

  COUNT(*)         CT
---------- ----------
         3       8183
       125       8184
      1027          1

Given the way I’ve generated the data, any one value for hash_col will have 8,184 (or 8,183) rows where rare_col is null; but there are 1,027 rows which have a value for both hash_col and rare_col, with just one row for each combination.

Now we get to the problem. Whenever rare_col is non null the combination of hash_col and rare_col is unique (though this wasn’t quite the case at the client site) so when we query for a given hash_col and rare_col we would hope that the optimizer would be able to estimate a cardinality of one row; but this is what we see:


variable n1 number
variable n2 number

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   908 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

The optimizer has predicted a massive 908 rows. A quick check of the object stats shows us that this is “number of rows in table” / “number of distinct keys in index” (1,048,576 / 1,155, rounded up).
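If you want to check those inputs for yourself, a quick query against the data dictionary is enough (a simple sketch using the standard user_indexes view; num_rows and distinct_keys are the documented column names):

select
        num_rows,
        distinct_keys,
        ceil(num_rows/distinct_keys)    predicted_rows
from
        user_indexes
where
        index_name = 'T1_I1'
;

-- with the data above this should report 1048576, 1155 and the optimizer's 908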

Any row with rare_col set to null cannot match the predicate “rare_col = :n2”, but because the optimizer is looking at the statistics of complete index entries (and there are 1048576 of them, with 1155 distinct combinations, and none that are completely null) it has lost sight of the frequency of nulls for rare_col on its own. (The same problem appears with column groups – which is what I commented on in my previous post on this topic).

I’ve often said in the past that you shouldn’t create histograms on data unless your code is going to use them. In this case I need to stop the optimizer from looking at the index.distinct_keys and one way to do that is to create a histogram on one of the columns that defines the index; and I’ve chosen to do this with a fairly arbitrary size of 10 buckets:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns rare_col size 10')

explain plan for
select /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

========================================

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |     1 |    12 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Bonus observation

This problem came to my attention (and I’ve used a partitioned table in my demonstration) because I had noticed an obvious optimizer error in the client’s execution plan for exactly this simple a query. I can demonstrate the effect the client saw by running the test again without creating the histogram but declaring hash_col to be not null. Immediately after creating the index I’m going to add the line:


alter table t1 modify hash_col not null;

(The client’s system didn’t declare the column not null, but their equivalent of hash_col was part of the primary key of the table which meant it was implicitly declared not null). Here’s what my execution plan looked like with this constraint in place:


--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |    28 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))

Spot the difference.

The estimate of index rowids is far smaller than the estimate of the rows that will be fetched using those rowids. This is clearly an error.

If you’re wondering how Oracle got this number divide 908 by 32 (the number of partitions in the table) – the answer is 28.375.

Fortunately it’s (probably) an error that doesn’t matter despite looking worryingly wrong. Critically the division hasn’t changed the estimate of the number of table rows (we’ll ignore the fact that the estimate is wrong anyway thanks to a different error), and the cost of the index range scan and table access have not changed. The error is purely cosmetic in effect.

Interestingly if you modify the query to be index-only (i.e. you restrict the select list to columns in the index) this extra division disappears.
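For example, here is a sketch of what an “index-only” version of the test query might look like (my reconstruction: selecting only columns that are in the t1_i1 index, so that the table access, and with it the extra division, disappears):

explain plan for
select /*+ index(t1) */
        hash_col, rare_col
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);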

Summary

1) If you have a B-tree index where one (or more) of the columns is null for a large fraction of the entries then the optimizer may over-estimate the cardinality of a predicate of the form: “(list of all index columns) = (list of values)” as it will be using the index.distinct_keys in its calculations and ignore the effects of nulls in the individual columns. If you need to work around this issue then creating a histogram on one of the index columns will be sufficient to switch Oracle back to the strategy of multiplying the individual column selectivities.

2) There are cases of plans for accessing partitioned tables where Oracle starts by using table-level statistics to get a suitable set of estimates but then displays a plan with the estimate of rows for an index range scan scaled down by the number of partitions in the table. This results in a visible inconsistency between the index estimate and the table estimate, but it doesn’t affect the cardinality estimate for the table access or either of the associated costs – so it probably doesn’t have a destabilising effect on the plan.

Linux grub2 boot loader manipulation

This post is about how to manage grub2 in an easy way.

grub1

In the past, before Linux EL7, the boot loader was grub, the grand unified bootloader (version 1). Things were very simple: if you installed another kernel (using rpm) it would add an entry to grub’s configuration in /boot/grub/menu.lst. If you wanted grub to boot that newly installed kernel by default, you edited /boot/grub/menu.lst and set ‘default’ to the number, counting from zero, of the newly installed kernel in the order of the kernels listed. If you wanted a certain option set for booting the kernel, you added it to the kernel line.

grub2

Then came grub2. Reading up on the grub2 homepage, it turns out that grub 1 apparently was old code, and it was clunky, hard to maintain and extend with new features. Other reasons that probably contributed to grub2 acceptance were UEFI support, as well as limited support for architectures other than x86 and x86_64.

grub2: the mechanics

The heart of the grub2 configuration is by default in /boot/grub2/grub.cfg. The start of the file looks like a bash shell script. A little further in the file, the installed kernels and their arguments are found:

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Oracle Linux Server (4.1.12-124.18.6.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-124.18.6.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  4fc4063c-861b-4b07-8c28-e847a535c6cb
        else
          search --no-floppy --fs-uuid --set=root 4fc4063c-861b-4b07-8c28-e847a535c6cb
        fi
        linux16 /vmlinuz-4.1.12-124.18.6.el7uek.x86_64 root=/dev/mapper/ol-root ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never
        initrd16 /initramfs-4.1.12-124.18.6.el7uek.x86_64.img
}
menuentry 'Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-112.16.4.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {

A knee-jerk reaction would be to see if this file can be edited, but the heading of the file quite clearly says to not do that and that any modification could be overwritten:

#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub2-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

So, this points us to /etc/default/grub. To be sure what to actually do, the documentation/manual makes it clear how to control which kernel and options are chosen, in chapter 6.1, “Simple configuration handling”:

grub-mkconfig does have some limitations. While adding extra custom menu entries to the end of the list can be done by editing /etc/grub.d/40_custom or creating /boot/grub/custom.cfg, changing the order of menu entries or changing their titles may require making complex changes to shell scripts stored in /etc/grub.d/. This may be improved in the future. In the meantime, those who feel that it would be easier to write grub.cfg directly are encouraged to do so (see Booting, and Shell-like scripting), and to disable any system provided by their distribution to automatically run grub-mkconfig.

The file /etc/default/grub controls the operation of grub-mkconfig. It is sourced by a shell script, and so must be valid POSIX shell input; normally, it will just be a sequence of ‘KEY=value’ lines, but if the value contains spaces or other special characters then it must be quoted.

What this means is that for basic operation and selection, the /etc/default/grub file must be edited, after which grub-mkconfig must be run to “build” the actual grub settings. That’s not extremely intuitive, from my point of view.

Let’s look at the /etc/default/grub on my test system, which is not touched as far as I know:

[root@o184 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never"
GRUB_DISABLE_RECOVERY="true"

The first thing to look at is ‘GRUB_DEFAULT’, which is the default grub entry/kernel that is booted. It is set to ‘saved’, which is not really helpful. The utility ‘grub2-editenv’ can be used to list what the saved entry is:

[root@o184 ~]# grub2-editenv list
saved_entry=0

Which means the first entry (counting from zero) in /boot/grub2/grub.cfg. I have not come across a utility installed by default that lists the menu entries with their numbers in grub.cfg, but it’s easy to do with some shell commands, although that feels clumsy to me:

[root@o184 ~]# cat /boot/grub2/grub.cfg | grep ^menuentry | nl -v0
     0	menuentry 'Oracle Linux Server (4.1.12-124.18.6.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-124.18.6.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
     1	menuentry 'Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-112.16.4.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
     2	menuentry 'Oracle Linux Server (3.10.0-862.11.6.el7.x86_64 with Linux) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-862.11.6.el7.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
     3	menuentry 'Oracle Linux Server (0-rescue-069a6c1ff25b409fa87b8e587a2f8b4d with Linux) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-069a6c1ff25b409fa87b8e587a2f8b4d-advanced-64c9f287-84a8-48ec-b048-364295362114' {

So in this case kernel 4.1.12-124.18.6.el7uek.x86_64 is started by default.

In fact, the value of ‘saved’ is stored in /boot/grub2/grubenv:

[root@o184 ~]# cat /boot/grub2/grubenv
# GRUB Environment Block
saved_entry=0
################...

There is more to tell about GRUB_DEFAULT and especially GRUB_SAVEDEFAULT: if GRUB_SAVEDEFAULT is set to ‘true’ and GRUB_DEFAULT is set to ‘saved’, then any kernel that is chosen in grub at boot will be saved as the default. However, as far as I can see, this option is not set in a default /etc/default/grub file.

Another caveat is that if you select the default by menuentry number, the order could change when a new kernel is installed and the numbering may be different. For that reason, it’s also possible to use either the entry description after menuentry, or a name (“identifier”) set with --id on the menuentry line.

grubby

This is all nice, but it feels really indirect to me: multiple files must be combined to understand the current settings, and multiple commands must be entered to make a change. This also makes it a little less easy to automate.

Now on to the actual reason for this blogpost. There actually is a utility that can do most of the manipulation using a single command, which is installed by default. However, it is not very well known. Let me introduce ‘grubby’ to you!

List grub entries:

[root@o184 ~]# grubby --info=ALL
index=0
kernel=/boot/vmlinuz-4.1.12-124.18.6.el7uek.x86_64
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never "
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-124.18.6.el7uek.x86_64.img
title=Oracle Linux Server (4.1.12-124.18.6.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5
index=1
kernel=/boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never "
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-112.16.4.el7uek.x86_64.img
title=Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5
...

List current default kernel:

[root@o184 ~]# grubby --default-kernel
/boot/vmlinuz-4.1.12-124.18.6.el7uek.x86_64

Set another kernel to boot:

[root@o184 ~]# grubby --set-default /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
[root@o184 ~]# grubby --default-kernel
/boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64

List current settings of the grub2 menuentry:

[root@o184 ~]# grubby --info /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
index=1
kernel=/boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never"
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-112.16.4.el7uek.x86_64.img
title=Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5

Grubby also facilitates making changes to the arguments of the kernel.
For example, if you want to change the setting ‘numa=off’ to ‘numa=on’, you can remove the argument:

[root@o184 ~]# grubby --update-kernel /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 --remove-args="numa=off"
[root@o184 ~]# grubby --info /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 | grep ^args
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet transparent_hugepage=never"

And then add it with the correct argument:

[root@o184 ~]# grubby --update-kernel /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 --args="numa=on"
[root@o184 ~]# grubby --info /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 | grep ^args
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet transparent_hugepage=never numa=on"

Grubby makes it easy to list and manage the grub2 boot loader settings, and it can also easily be used in scripts.