Oracle

12c Access Control Lists

There is already plenty of information about the new simplified 12c way to define Access Control Lists, for example at oracle-base.
I’m just posting my example here to show how easy it is.
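As a minimal sketch, here is the kind of single call that replaces the old create_acl/add_privilege/assign_acl sequence. The host name and the grantee (SCOTT) are placeholders I chose for the illustration:

begin
  dbms_network_acl_admin.append_host_ace(
    host       => 'www.example.com',           -- placeholder host
    lower_port => 80,
    upper_port => 80,
    ace        => xs$ace_type(
                    privilege_list => xs$name_list('connect'),
                    principal_name => 'SCOTT',  -- placeholder grantee
                    principal_type => xs_acl.ptype_db));
end;
/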

Create constraints in your datawarehouse – why and how

We still see developers not declaring referential integrity constraints in data warehouse databases because they think they don’t need them (the integrity of the data has already been validated by the ETL). Here is a small demo I did to show why you need to declare them, and how to do it without adding any overhead to the ETL.

Test case

I create 3 dimension tables and 1 fact table:

21:01:18 SQL> create table DIM1 (DIM1_ID number, DIM1_ATT1 varchar2(20));
Table DIM1 created.
 
21:01:19 SQL> create table DIM2 (DIM2_ID number, DIM2_ATT1 varchar2(20));
Table DIM2 created.
 
21:01:20 SQL> create table DIM3 (DIM3_ID number, DIM3_ATT1 varchar2(20));
Table DIM3 created.
 
21:01:21 SQL> create table FACT (DIM1_ID number, DIM2_ID number, DIM3_ID number,MEAS1 number);
Table FACT created.
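The usual way to get the optimizer benefits without any checking overhead at load time is to declare the constraints as RELY DISABLE NOVALIDATE. A sketch of the pattern, shown for DIM1 only (the constraint names are mine; the same applies to DIM2 and DIM3):

alter table DIM1 add constraint DIM1_PK primary key (DIM1_ID) rely disable novalidate;
alter table FACT add constraint FACT_DIM1_FK foreign key (DIM1_ID)
  references DIM1 (DIM1_ID) rely disable novalidate;

With query_rewrite_integrity set to trusted, the optimizer can rely on these declarations (for example to eliminate unnecessary joins) even though they are never enforced during the ETL.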

The full table scan direct path read decision for version 12.2

This post is about the decision the Oracle database engine makes when it is using a full segment scan approach. The choices the engine has are to store the blocks that are physically read in the buffer cache, or to read the blocks into the process’ PGA. The first choice is what I refer to as a ‘buffered read’: it places the blocks in the database buffer cache so that the process itself and other processes can bypass the physical read and use the blocks from the cache, until they are evicted from the cache. The second choice is what is commonly referred to as a ‘direct path read’: it places the blocks that are physically read into the process’ PGA, which means they are kept for only a short duration and are not shared with other processes.
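One simple way to see, for your own session, which of the two choices was made is to compare the relevant session statistics before and after the scan. A minimal sketch, using the standard statistic names exposed in v$statname:

select n.name, s.value
from   v$mystat s
join   v$statname n on n.statistic# = s.statistic#
where  n.name in ('physical reads',
                  'physical reads direct',
                  'table scans (long tables)',
                  'table scans (direct read)');

Run it before and after the full table scan: if ‘physical reads direct’ and ‘table scans (direct read)’ increase, the engine chose the direct path read; otherwise the blocks went through the buffer cache.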

impdp content=metadata_only locks the stats

With Oracle you can learn something every day. Today, preparing a migration to 12.2, I found that all tables had locked statistics. I learned that this has been the expected behavior since 10.2 when importing metadata_only including statistics: the statistics are locked so that the automatic gathering job does not come along and replace them with ‘0 rows’ statistics.

It is documented in DataPump Import Without Data Locks Table Statistics (Doc ID 415081.1), but as I was really surprised by it (and also frustrated to be learning a 10.2 behavior while migrating to 12.2), I wanted to test it myself.
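If you hit the same situation, a quick way to see which tables are affected, and to unlock them once you have proper statistics in place, is the standard DBMS_STATS API (the schema and table names below are just placeholders):

select owner, table_name, stattype_locked
from   dba_tab_statistics
where  owner = 'MY_SCHEMA'
and    stattype_locked is not null;

exec dbms_stats.unlock_schema_stats('MY_SCHEMA')
-- or, per table:
exec dbms_stats.unlock_table_stats('MY_SCHEMA','MY_TABLE')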

Understanding Data Gravity as a DBA

Data gravity, and the friction it causes within the development cycle, is an incredibly obvious problem in my eyes.

Data gravity suffers from the Von Neumann bottleneck, a basic limitation on how fast computers can be. Put simply, it states that the speed at which data can move between where it resides and where it is processed is the limiting factor in computing speed.

12c dbms_stats.gather_table_stats on GTT do not commit

In my UKOUG OracleScene article on 12c online statistics and GTT I mentioned the following:

A final note about those 12c changes in statistics gathering on GTTs. In 11g, dbms_stats did a commit at the start. So if you gathered stats after the load, you had to create the GTT as ON COMMIT PRESERVE ROWS, or you simply lost what you had loaded. That has changed in 12c: if you now choose to do a conventional insert followed by dbms_stats (having set private statistics, of course), you don’t need ON COMMIT PRESERVE ROWS anymore.

Today, I realized that I’ve never explained exactly when dbms_stats.gather_table_stats commits the transaction and when it doesn’t. Because, of course, it depends. In summary, it does not commit when all of the following are true: 12c, non-SYS owner, GTT, with private statistics.
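As a minimal sketch of that case, run as a regular (non-SYS) user (the table name is a placeholder; since 12c the GLOBAL_TEMP_TABLE_STATS preference defaults to SESSION for GTTs, but it is set explicitly here for clarity):

create global temporary table MY_GTT (id number, att varchar2(20)) on commit delete rows;

exec dbms_stats.set_table_prefs(user,'MY_GTT','GLOBAL_TEMP_TABLE_STATS','SESSION')

insert into MY_GTT select rownum, 'x' from dual connect by level <= 1000;
exec dbms_stats.gather_table_stats(user,'MY_GTT')

-- no commit happened, so the rows loaded in this transaction are still visible:
select count(*) from MY_GTT;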

When PDB name conflicts with CDB name

Going to the multitenant architecture is not a big change. The administration tasks (DBA, monitoring, backups) connect to the CDB and the application connects to the PDB. Even without the multitenant option, it is still recommended to go to the CDB architecture: the non-CDB is deprecated and the multitenant architecture brings interesting features. People often ask how to name the CDB and the PDB, especially when they have naming rules or policies in the company. My recommendation is to name the PDB as you are used to naming databases: the name often gives an idea of the data that is inside, the application, and the environment. The CDB is the container and, in my opinion, you should apply the same naming rules as for servers. Don’t forget that pluggable databases are made to be moved across CDBs, so the CDB name should not depend on the content.
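When you are not sure which name refers to what, a quick check of the current container and of the CDB and PDB names can be done with the standard dictionary views:

show con_name
select name, cdb from v$database;
select name, open_mode from v$pdbs;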

Solaris Cluster and Decreasing Resource Needs After Upgrade

Delphix Engineering and Support are pretty amazing folks. They keep pursuing solutions, no matter how much time it takes or how complex the challenges they face in supporting heterogeneous environments, hardware configurations and customer needs.

This post describes the effort from our team that restored stability to a previously impacted Solaris 11.2 cluster configuration. The research, patching, testing and the resulting certification from Oracle were a massive undertaking from our team, and I hope this information serves the community, but it is in no way a recommendation from Delphix. It is simply what was done to resolve the problem, after logical decisions made by our team about the use of the system.

Hadoop for Database Professionals – St. Louis (7. Sep)

Here’s some more free stuff by Gluent!

We are running another half-day course together with Cloudera, this time in St. Louis on September 7, 2017.

We will use our database background and explain, in database professionals’ terminology, why “new world” technologies like Hadoop will take over some parts of enterprise IT, why those platforms are so much better for advanced analytics over big datasets, and how to use the right tool from the Hadoop ecosystem for the right problem.

More information below. See you there!

Hadoop for Database Professionals – St. Louis

Postgres vs. Oracle access paths XI – Sample Scan

I was going to end this series with the previous post because the last access path available in Postgres is a bit special: a Seq Scan that returns only a random sample of the rows. However, it is a good occasion to come back to the difference between random and sequential reads.

I’m still working on the same table as in the previous posts, with 10000 rows in 1429 pages. 5% of rows is 500 rows and 5% of blocks is about 72 pages.
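Those two percentages map to the two standard sampling methods: BERNOULLI samples rows, SYSTEM samples pages. A minimal sketch, assuming the demo table is called demo1 as in the earlier posts of the series:

explain (analyze, buffers) select * from demo1 tablesample bernoulli(5);
explain (analyze, buffers) select * from demo1 tablesample system(5);

BERNOULLI(5) visits every page and keeps about 5% of the rows (around 500 here), whereas SYSTEM(5) picks about 5% of the pages (around 72 here) and returns all rows from those pages.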