Oakies Blog Aggregator

Invoker Rights in Oracle Database 12c : Some more articles

I wrote about the Code Based Access Control (CBAC) stuff in Oracle Database 12c a while back.

I’ve recently “completed the set” by looking at the INHERIT PRIVILEGES and BEQUEATH CURRENT_USER stuff for PL/SQL code and views respectively.
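
For anyone who hasn’t read the articles yet, here’s a minimal sketch of the two features (the object names are purely illustrative, not taken from the articles):

-- INHERIT PRIVILEGES: the invoking user must allow the owner of an
-- invoker rights unit to run code with the invoker's privileges.
grant inherit privileges on user hr_user to code_owner;

-- BEQUEATH CURRENT_USER: invoker rights functions referenced by the
-- view run as the current user, rather than as the view owner.
create or replace view emp_v
bequeath current_user
as
select * from emp;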

It’s pretty cool, but I’m not sure how much of it I will see in the wild as it will require developers to do a bit more thinking, rather than doing what they’ve always done… :)

Cheers

Tim…


Invoker Rights in Oracle Database 12c : Some more articles was first posted on July 9, 2014 at 12:03 pm.

12c Index Like Table Statistics Collection (Wearing The Inside Out)

This change introduced in 12c has caught me out on a number of occasions. If you were to create a new table: And then populate it with a conventional insert: We find there are no statistics associated with the table until we explicitly collect them: But if we were to now create an index on this […]
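
In outline, the behaviour being described in that excerpt looks like this (a sketch with illustrative names, not the ones from the full post):

SQL> create table bowie ( id number, name varchar2(42) );
SQL> insert into bowie select rownum, 'ZIGGY' from dual connect by level <= 10000;
SQL> commit;
SQL> -- no table statistics after a conventional insert:
SQL> select num_rows from user_tables where table_name = 'BOWIE';
SQL> -- but index statistics are computed as part of the index creation:
SQL> create index bowie_id_i on bowie(id);
SQL> select num_rows, leaf_blocks from user_indexes where index_name = 'BOWIE_ID_I';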

Oracle Linux and MySQL : Progress

A few months ago I wrote about some MySQL on Oracle Linux migrations we were working through. It’s been a long time coming, but last weekend was the go-live for this batch of migrations. So far so good! :)

Most of the elapsed time since my last post on this subject has been spent with the developers and users testing the migrations.

The process has taken a bit longer than some people would have liked. Rather than doing a quick and dirty upgrade, I’ve been pushing to get things done properly. Since I was the person who set up the infrastructure, I’ve been extremely anal about the levels of privilege I’m granting. This has caused some minor rewrites of applications, which were essentially relying on admin privileges to perform some actions. Not any more! :)
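
To give a flavour of what I mean by levels of privilege, the application accounts end up with something along the lines of this minimal MySQL sketch (the user, host and schema names are invented for illustration), rather than anything admin-like:

-- application account restricted to DML on its own schema only
CREATE USER 'app'@'app-server.example.com' IDENTIFIED BY 'changeme';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app'@'app-server.example.com';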

I’m no MySQL guru, but I think what we have now is pretty darn decent, especially compared to where we started. I guess time will tell how bold a statement that is. :)

Onwards and upwards…

Cheers

Tim…


Oracle Linux and MySQL : Progress was first posted on July 8, 2014 at 6:26 pm.

It was 12 years ago today…

It was exactly 12 years ago today that I first presented my Index Internals – Rebuilding The Truth presentation at a local ACT Oracle User Group event. And so my association with Oracle indexes started. It would be an interesting statistic to know how many people have subsequently read the presentation :) It would no doubt result in […]

KeePass 2.27 Released

I just noticed that KeePass 2.27 has been released.

I was introduced to KeePass at my current job and now I use it for everything at home too. You can read how I use KeePass here.

Happy upgrading…

Cheers

Tim…


KeePass 2.27 Released was first posted on July 7, 2014 at 8:41 pm.

Avoiding Data Doomsday: succeed now or pay later

 “Virtualized data is fast becoming the new norm. No one can justify using storage like before” – Tim Gorman

#222222;">
#222222;">How to Prevent a Data Doomsday 
#222222;">
#222222;">Gartner has famously noted the exponential growth of data and predicted a #1155cc;" href="http://www.gartner.com/newsroom/id/2672515" target="_blank">data doomsday. While IT organizations have been struggling against current data sizes for a while, they largely hobbled along using old approaches to data management. One of those approaches, snapshotting, is becoming increasingly prohibitive now for two reasons: the bureaucracy and the technology.

#222222;">
  • Bureaucracy: A task that might take an hour when someone is idle can take multiple days as that person starts to be 95% busy or more. The more people involved such as DBAs, sys admins, storage admins, network admins then the longer it takes to manage the snapshots leading to days and weeks of delays.
  • Technology: All of the technical solutions that have been used traditionally require specialized expertise, manual intervention and scripting for third parties.
As a result of these two major bottlenecks in data management, IT organizations are learning that a new approach of virtualized storage is stripping away the time spent in bureaucratic cycles and the difficulties in managing snapshotting technologies.
#222222;">
#222222;">Virtualized storage is quickly becoming the new norm. CIOs can either get on board now to reap competitive advantage, or wait a few years and realize that the reason they’re having to play catch-up with their competitors is that their infrastructures have become obsolete.
#222222;">

Already the leading companies in their industries, such as:

  • Fortune #1 Walmart
  • #1 pharmaceutical Pfizer
  • #1 social network Facebook
  • #1 US bank Wells Fargo
  • #1 networking Cisco
  • #1 cable provider Comcast
  • #1 auction site eBay
  • #1 insurer New York Life
  • #1 chip manufacturer Intel

use data virtualization and are extending their lead in their markets by increasing application productivity 20-50% and sharpening business intelligence with fast access to the newest data. Companies stuck in the old data management paradigms of cloning and copying data will fall farther and farther behind. Ultimately, only the companies using data virtualization will manage to succeed.


#222222;">Data virtualization is the most powerful method to improve data flow and development output as well as quality in the 20+ years I’ve been working with IT departments. It improves businesses by eliminating the enormous infrastructure, bureaucracy and time drag that it takes to provision databases and data for business intelligence groups, development environments, auditing requirements and backups. Development environments and business intelligence groups depend on having a copies of production data and databases and data virtualization allows provisioning in a few minutes with almost no storage overhead by sharing duplicate blocks among all the copies. As opposed to snapshot technologies, data virtualization automates the complete end to end process of collecting changed data from data sources, storing that data in a timeflow and provides provisioning of that data to target machines in minutes with a few clicks of a mouse. Because virtualized data is centrally managed the full chain of custody can be audited and managed as well.
#222222;">


#222222;">For more on data virtualization and the over all process of data supply chain that data virtualization accomplishes, see how Delphix accomplished this technology.

 


Recurring Conversations: AWR Intervals (Part 1)

I've seen plenty of blog posts and discussions over the years about the need to increase the default AWR retention period beyond the default value of 8 days. Experienced Oracle folk understand how useful it is to have a longer history of performance metrics to cover an entire workload period so that we can, for example, compare the acceptable performance of the last month end batch processes to the living hell of the current month end. You'll often hear a suggested minimum of 35-42 days and I could make good arguments for even more history for trending and capacity management.

That subject has been covered well enough, in my opinion. (To pick one example, this post and its comments are around 5 years old.) Diagnostics Pack customers should almost always increase the default AWR retention period for important systems, even allowing for any additional space required in the SYSAUX tablespace.
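
For reference, both the retention and the interval are controlled through the standard DBMS_WORKLOAD_REPOSITORY package; a minimal sketch for 42 days of retention at the default 1 hour interval (both parameters are in minutes):

begin
  dbms_workload_repository.modify_snapshot_settings(
    retention => 42 * 24 * 60,   -- keep 42 days of snapshots
    interval  => 60);            -- default 1 hour snapshot interval
end;
/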

However, I've found myself talking about the best default AWR snapshot *interval* several times over recent months and years and realising that I'm slightly out of step with the prevailing wisdom on the subject, so let's talk about intervals.

I'll kick off by saying that I think people should stick to the default 1 hour interval, rather than the 15 or 30 minute intervals that most of my peers seem to want. Let me explain why.

Initially I was influenced by some of the performance guys working in Oracle and I remember being surprised by their insistence that one hour is a good interval, which is why they picked it. Hold on, though - doesn't everyone know that a 1 hour AWR report smoothes out detail too much?

Then I got into some discussions about Adaptive Thresholds and it started to make more sense. If you want to compare performance metrics over time and trigger alerts automatically based on apparently unusual performance events or workload profiles, then comparing specific hours today to specific hours a month ago makes more sense than getting down to 15 minute intervals, which would be far too sensitive to subtle changes. Adaptive Thresholds would become barking mad if the interval granularity was too fine. But when it turned out that hardly anybody used Adaptive Thresholds much, even though they seemed like a good idea (sorry JB ;-)), this argument started to make less sense to me.

However, I still think that there are very solid reasons to stick to 1 hour and they make more sense when you understand all of the metrics and analysis tools at your disposal and treat them as a box of tools appropriate to different problems.

Let's go back to why people think that a 1 hour interval is too long. The problem with AWR, Statspack and bstat/estat is that they are system-wide reporting tools that capture the differences (or deltas) between the values of various metrics over a given interval. At least a couple of problems with that come to mind.

1) Although this is a bit of a simplification, almost all of the metrics are system-wide, which makes them a poor data source for analysing an individual user's performance experience or an individual batch job, because systems generally have a mixture of different activities running concurrently. (Benchmarks and load tests are notable exceptions.)

2) Problem 1 becomes worse when you are looking at *all* of the activity that occurred over a given period of time (the AWR Interval), condensed into a single data set or report. The longer the AWR period you report on, the more useless the data becomes. What use is an AWR report covering a one week period? So much has happened during that time and we might only be interested in what was happening at 2:13 am this morning.

In other words, AWR reports combine a wide activity scope (everything on the system) with a wide time scope (hours or days if generated without thought). Intelligent performance folks reduce the impact of the latter problem by narrowing the time scope and reducing the snapshot interval so that if a problem has just happened or is happening right now, they can focus on the right 15 minutes of activity [1].

Which makes complete sense in the Statspack world they grew up in, but makes a lot less sense since Oracle 10g was released in 2004! These days there are probably better tools for what you're trying to achieve.

But, as this post is already getting pretty long, I'll leave that for Part 2.

[1] The natural endpoint to this narrowing of time scope is when people use tools like Swingbench for load testing and select the option to generate AWR snapshots immediately before and after the test they're running. Any AWR report of that interval will only contain the relevant information if the test is the only thing running on the system. At last year's OpenWorld, Graham Wood and I also covered narrowing the *activity* scope by, for example, running the AWR SQL report (awrsqrpt.sql) to limit the report to a single SQL statement of interest. It's easy for people to forget - it's a *suite* of tools and worth knowing the full range so that you pick the appropriate one for the problem at hand.
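
For completeness, a sketch of the snapshot-bracketing approach from the footnote, using the standard package and scripts:

SQL> exec dbms_workload_repository.create_snapshot
SQL> -- run the load test or batch of interest here
SQL> exec dbms_workload_repository.create_snapshot
SQL> -- then report on just that pair of snapshots:
SQL> @?/rdbms/admin/awrrpt.sql
SQL> -- or, narrowing the activity scope to one statement:
SQL> @?/rdbms/admin/awrsqrpt.sql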

Why is P1 the only parameter populated in cell smart table scan?

Anyone who has looked at Exadata might ask the question, and I did so too. After all, cell smart table scan is in wait class User IO so there should be more, right? This is what you find for a smart scan:

NAME                           PARAMETER1           PARAMETER2           PARAMETER3                     WAIT_CLASS
------------------------------ -------------------- -------------------- ------------------------------ ---------------
cell smart table scan          cellhash#                                                                User I/O
cell smart index scan          cellhash#                                                                User I/O

Compare this to the traditional IO request:

NAME                           PARAMETER1           PARAMETER2           PARAMETER3                     WAIT_CLASS
------------------------------ -------------------- -------------------- ------------------------------ ---------------
db file sequential read        file#                block#               blocks                         User I/O
db file scattered read         file#                block#               blocks                         User I/O
direct path read               file number          first dba            block cnt                      User I/O
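
Both listings come straight from V$EVENT_NAME, for anyone who wants to reproduce them:

select name, parameter1, parameter2, parameter3, wait_class
  from v$event_name
 where name in ('cell smart table scan', 'cell smart index scan',
                'db file sequential read', 'db file scattered read',
                'direct path read');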

At first I didn’t understand why that was the case, until I traced a lot more. Using the method described by my colleague Frits Hoogland, I traced the Exadata calls as well as the IB interconnect, in addition to a SQL trace (level 8). Please don’t do this outside the lab; the trace will get huge!
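
The exact events to set are in Frits’ post; purely as a hedged sketch of the general shape of such a tracing session (I’m assuming the 12c unified tracing syntax for the KXD component here):

-- sketch only: see Frits Hoogland's post for the exact events;
-- KXD covers the Exadata (kcfis) layer, 10046 level 8 is the
-- classic SQL trace including wait events
alter session set events 'trace[KXD.*] disk highest';
alter session set events '10046 trace name context forever, level 8';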

Now that produces an interesting file, and the SQL trace information is just the odd intermittent line drowning in a sea of other information. So I used a sed one-liner to make those lines stand out a bit:

$ sed  "/WAIT/{x;p;x;G;}" db12c1_ora_11916.trc > kxd_libcell_sqltace.trc

This introduces a newline before and after each line containing WAIT. The SQL Trace information is familiar:

WAIT #139970814085680: nam='cell smart table scan' ela= 251 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911250866

By the way, the obj# is the DATA_OBJECT_ID, not the OBJECT_ID, in DBA_OBJECTS.
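
So to map the obj# from the trace back to a segment, query on DATA_OBJECT_ID (61471 is simply the value from my trace):

select owner, object_name, object_id, data_object_id
  from dba_objects
 where data_object_id = 61471;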

Stepping through the trace

The trace information reflects what is happening during the various smart scan processing steps. This first example shows some of the processing during the smart scan initialisation phase, when the cells are being initialised. I initially struggled to map the “appliance” hexadecimal number to an actual cell, but it’s easy if you grep for kcfis_open_appliance_fd and match the cellhash to v$cell.cell_hashval. Continuing the example:

kcfis_open_appliance:
throttle: initialized for appliance 0x7f4d7e9bc4c0
kcfis_find_appliance_fd: appliance fd not found. appliance 192.168.12.10/ Cached appliance fd count 0
kcfis_open_appliance_fd: 0x7f4d7e9bc4c0
oss_open called for o/192.168.12.10//box/predicate
Sending message RQ_Tag_2827385864_11: 0xf9b3f08 with frag # 0, to box my_box_refid: 2 (box inc: 12)
Command with band 1, payload length 116, payload 0xf9b4110
RQ_Tag_2827385864_11: Command name OPEN, refid 11
Open flags: readwrite - 1, sync - 0, char - 1.
 Device name - /box/predicate
Vrpc completion - RQ_Tag_2827385864_11: cmd 0xf9b3f48, req 0xf9b3a40, frag # 1
Vrpc reply 0xf9b4000 with refid 11 for existing message: 0xf9b3f08
Reply with band 0, payload length 0, payload (nil)
RQ_Tag_2827385864_11: Command name REPLY OPEN, refid 11
Returned open file descriptor  1
opening file number 11

WAIT #139970814085680: nam='cell smart table scan' ela= 259 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911167788

So here the cell with IP 192.168.12.10 maps to 0x7f4d7e9bc4c0 in the trace. As you can see, immediately before the wait event is written to the trace file a file is opened on the cell. This causes the line to be emitted in the SQL trace file. So far there hasn’t been any significant scanning activity against the data. But the trace line makes sense: the IO request was issued against the cell with hash 822451848. We don’t know what it did though, and which file it accessed.

SYS:db12c1> select cell_path from v$cell where cell_hashval = '822451848';

CELL_PATH
----------------------------
192.168.12.10

I was quite pleased with myself when I found this out.

More trace information. A lot more.

Lesson learned trawling through the trace: just because there is a single line in the trace file doesn’t mean there aren’t other requests either in flight or under preparation.

The next few thousand (!) lines show you the mapping between RDBMS and cell addressing. This part is well explained in Frits’ post I referenced earlier. Further down in the trace it gets more interesting from a SQL trace perspective again. Here, payload maps are generated and issued against the cells. Remember that these IO requests to cells are performed asynchronously. I found quite a few calls to the payload-generating code between individual SQL trace WAIT lines. This makes reading the trace a bit confusing; it almost appears as if the SQL trace couldn’t keep up with the speed at which requests are created and sent.

Looking at the file I found the following pattern preceding a smart scan line in the first half of the process (before data comes back from the cells). Beginning immediately after the last WAIT line, in abbreviated format:

Ioctl completed.
appliance 0x7f4d7e9bc4c0 : cellsrv pid: 9563: predicate: /box/predicate37843
...
kcfis_create_maps_payload. appliance 0x7f4d7ea65570 num_disks 12
throttle: allowing map push: appliance 0x7f4d7ea65570, maps_to_push 7
...
Pushing request : disknumber = 0, offset = 936606973952, len = 1032192 rdba: 43778306 version 0 reqid=10 cver=1
disk=0x7f4d7e9c0aa8 state=1
trans_req_element = 0x7f4d7e9cba00

The appliance name, cellsrv (thread) ID, and the box predicate appear to remain constant for the lifetime of the smart scan. You can find them in other occurrences of the trace as well (but not used in the same context).

The preceding lines are then followed by many more lines beginning with “Pushing request”, iterating over all 12 disks (-> num_disks) for a range of data block addresses. After the map payload has been created, it is time to push it to the cell. Here it gets a bit murky: the KXD.* traces show requests being pushed, but not necessarily to the cell mentioned in the event line. Consider this:

kcfis_create_maps_payload. alloc_len 272 num maps 2
throttle: mappayload: maps_to_push 4 iosize_being_pushed 8388608
kcfis_push: pushing maps to appliance 0x7f4d7e9bc4c0
kcfis_issue_ioctl: payload_type 2
oss_ioctl open file: 11, opcode: 99
Sending message RQ_Tag_2827385864_54: 0xf9c5728 with frag # 0, to box my_box_refid: 2 (box inc: 12)
Command with band 1, payload length 272, payload 0x7f4d7e9b9ef8
RQ_Tag_2827385864_54: Command name IOCTL, refid 54
Ioctl arguments fd 1 opcode 99 size 272
oss_ioctl done
oss_wait called for request: 0xf9b3a40
Vrpc completion - RQ_Tag_2341095748_52: cmd 0xf9c6ec8, req 0xf9b3a40, frag # 1
Vrpc reply 0xf9c6f80 with refid 52 for existing message: 0xf9c6e88
Reply with band 1, payload length 40, payload 0x7f4d7ea65a90
RQ_Tag_2341095748_52: Command name REPLY IOCTL, refid 52
oss_wait done, request 0xf9b3a40, result 0x7f4d837e4c48

WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172

The kcfis_push call is addressed to appliance 0x7f4d7e9bc4c0, which maps to 192.168.12.10. The cell hash from the event (3249924569) maps to a different cell though. You can find the corresponding kcfis_push earlier in the trace. In fact, it is the first occurrence after the last WAIT event line.

trans_req_element = 0x7f4d7e870780
Pushing request : disknumber = 5, offset = 937011707904, len = 4194304 rdba: 44570624 version 0 reqid=1021 cver=1
disk=0x7f4d7e9b5980 state=1
disk=0x7f4d7e9b4968 state=1
disk=0x7f4d7e9b3460 state=1
disk=0x7f4d7e9b2f70 state=1
disk=0x7f4d7e9df5a0 state=1
disk=0x7f4d7e9df0b0 state=1
disk=0x7f4d7e9dea78 state=1
kcfis_create_maps_payload. alloc_len 2648 num maps 35
throttle: mappayload: maps_to_push 10 iosize_being_pushed 102039552
kcfis_push: pushing maps to appliance 0x7f4d7ea65570
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.8 OSS OS Pid 4541
oss_ioctl open file: 2, opcode: 99
Sending message RQ_Tag_2341095748_52: 0xf9c6e88 with frag # 0, to box my_box_refid: 0 (box inc: 16)
Command with band 3, payload length 2648, payload 0x7f4d7ea62e98
RQ_Tag_2341095748_52: Command name IOCTL, refid 52
Ioctl arguments fd 1 opcode 99 size 2648
oss_ioctl done

This is getting too complex, can you say it again?

Now let’s put this together again, simplified:

WAIT #139970814085680: nam='cell smart table scan' ela= 24 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911267234

kcfis_push: num-appliances 3. payload_size 0x7f4d837e4cf0 ioctl_issued 0x7f4d837e4cc8 results 0x7f4d837e4c48 payload_type 0x7f4d837e4d20
kcfis_create_maps_payload. appliance 0x7f4d7ea65570 num_disks 12
...
kcfis_create_maps_payload. alloc_len 2648 num maps 35
throttle: mappayload: maps_to_push 10 iosize_being_pushed 102039552
kcfis_push: pushing maps to appliance 0x7f4d7ea65570
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.8 OSS OS Pid 4541
oss_ioctl open file: 2, opcode: 99
Sending message RQ_Tag_2341095748_52: 0xf9c6e88 with frag # 0, to box my_box_refid: 0 (box inc: 16)
Command with band 3, payload length 2648, payload 0x7f4d7ea62e98
RQ_Tag_2341095748_52: Command name IOCTL, refid 52
Ioctl arguments fd 1 opcode 99 size 2648
oss_ioctl done
...
kcfis_create_maps_payload. appliance 0x7f4d7e9bf518 num_disks 12
...
kcfis_create_maps_payload. alloc_len 1208 num maps 15
throttle: mappayload: maps_to_push 10 iosize_being_pushed 34471936
kcfis_push: pushing maps to appliance 0x7f4d7e9bf518
kcfis_issue_ioctl: payload_type 2
SKGXP-connecting to the box 192.168.12.9 OSS OS Pid 9547
...
oss_ioctl done
...
kcfis_create_maps_payload. appliance 0x7f4d7e9bc4c0 num_disks 12
...
kcfis_create_maps_payload. alloc_len 272 num maps 2
throttle: mappayload: maps_to_push 4 iosize_being_pushed 8388608
kcfis_push: pushing maps to appliance 0x7f4d7e9bc4c0
...
oss_ioctl done
...
oss_wait done, request 0xf9b3a40, result 0x7f4d837e4c48

WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172

Interestingly, the results for the requests come back in that order as well. That seems to be coincidence, judging by the rest of the trace. Compare the appliance values with the summary you just read:

WAIT #139970814085680: nam='cell smart table scan' ela= 44 cellhash#=3249924569 p2=0 p3=0 obj#=61471 tim=3470911270172

Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7ea65570
appliance 0x7f4d7ea65570 : cellsrv pid: 4541: predicate: /box/predicate114
oss_wait called for request: 0xf9c69c0
Vrpc completion - RQ_Tag_2825762783_53: cmd 0xf9c6768, req 0xf9c69c0, frag # 1
Vrpc reply 0xf9c6820 with refid 53 for existing message: 0xf9c6728
Reply with band 1, payload length 40, payload 0x7f4d7e9bfa38
RQ_Tag_2825762783_53: Command name REPLY IOCTL, refid 53
oss_wait done, request 0xf9c69c0, result 0x7f4d837e4c68

WAIT #139970814085680: nam='cell smart table scan' ela= 64 cellhash#=674246789 p2=0 p3=0 obj#=61471 tim=3470911270269

Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7e9bf518
appliance 0x7f4d7e9bf518 : cellsrv pid: 9547: predicate: /box/predicate37211
oss_wait called for request: 0xf9c5260
Vrpc completion - RQ_Tag_2827385864_54: cmd 0xf9c5768, req 0xf9c5260, frag # 1
Vrpc reply 0xf9c5820 with refid 54 for existing message: 0xf9c5728
Reply with band 1, payload length 40, payload 0x7f4d7e9bc9e0
RQ_Tag_2827385864_54: Command name REPLY IOCTL, refid 54
oss_wait done, request 0xf9c5260, result 0x7f4d837e4c88

WAIT #139970814085680: nam='cell smart table scan' ela= 54 cellhash#=822451848 p2=0 p3=0 obj#=61471 tim=3470911270351

Ioctl completed. Payload type 2
Ioctl quarantine response 1 for appliance 0x7f4d7e9bc4c0

SQL trace wait events frequently seem to be written after an IOCTL (standard Linux call) or oss_wait (Oracle) has completed. So it appears as if requests are created and handled on a cell-by-cell basis. This explains why you get so many cell smart table scan events in general: it’s not because they are necessarily small in size, as the payloads always seem to include more than one disk to scan. Thinking about it now it makes sense, but as an analyst I would have appreciated more insight into what happens under the covers. But we can trace, right?

SQL Plan Baselines

Here’s a thread from Oracle-L that reminded me of an important reason why you still have to hint SQL sometimes (rather than following the mantra “if you can hint it, baseline it”).

I have a query that takes 77 seconds to optimize (it’s not a production query, fortunately, but one I engineered to make a point). I can enable SQL plan baseline capture and create a baseline for it, and given the nature of the query I can be confident that the resulting plan will always be exactly the plan I want. If I have to re-optimize the query at any time (because it runs once per hour, say, and is constantly being flushed from the library cache), how much time will the SQL plan baseline save me?

The answer is NONE.

The first thing that the optimizer does for a query with a stored sql plan baseline is to optimize it as if the baseline did not exist.

If I want to get rid of that 77 seconds I’ll have to extract (most of) the hints from the SQL Plan Baseline and write them into the query.  (Or, maybe, create a Stored Outline – except that they’re deprecated in the latest version of Oracle, and I’d have to check whether the optimizer used the same strategy with stored outlines or whether it applied the outline before doing any optimisation). Maybe we could do with a hint which forces the optimizer to attempt to use an existing, accepted SQL Baseline without attempting the initial optimisation pass.
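
If you do go down the extraction route, the hints stored with a baseline can be displayed with the standard DBMS_XPLAN call; a sketch (the sql_handle value is a placeholder):

select *
  from table(dbms_xplan.display_sql_plan_baseline(
         sql_handle => 'SQL_1234567890abcdef',   -- placeholder handle
         format     => 'basic +outline'));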

 

No Hakan factor for IOT

Sadly there seems to be no concept of the Hakan factor for an IOT.

I have an application which merges into an IOT, the merge incrementally populating a swag of initially null columns, hence growing the rows in size.  Some simple benchmarking shows the overhead of this versus merging into a table with pre-populated values:

SQL> create table T1
  2   ( x int primary key,
  3     y1 number(10),
  4     y2 number(10),
  5     y3 number(10),
  6     y4 number(10),
  7     y5 number(10),
  8     y6 number(10),
  9     y7 number(10),
 10     y8 number(10),
 11     y9 number(10),
 12     y0 number(10)
 13  )
 14  organization index
 15  /

Table created.

SQL> create table T2
  2   ( x int primary key,
  3     y1 number(10),
  4     y2 number(10),
  5     y3 number(10),
  6     y4 number(10),
  7     y5 number(10),
  8     y6 number(10),
  9     y7 number(10),
 10     y8 number(10),
 11     y9 number(10),
 12     y0 number(10)
 13  )
 14  organization index
 15  /

Table created.

SQL> insert into t1 (x) select rownum from dual connect by level <= 100000;

100000 rows created.

SQL> insert into t2 select rownum,123,123,123,123,123,123,123,123,123,123 from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user,'T2')

PL/SQL procedure successfully completed.

SQL> alter session set events = '10046 trace name context forever, level 8';

Session altered.

SQL> merge into t1
  2  using ( select rownum x,
  3                        12 y1,
  4                        23 y2,
  5                        34 y3,
  6                        45 y4,
  7                        56 y5,
  8                        67 y6,
  9                        78 y7,
 10                        89 y8,
 11                        90 y9,
 12                        100 y0
 13          from dual
 14          connect by level <= 20000
 15        ) m
 16  on ( t1.x = m.x )
 17  when matched then
 18  update
 19  set
 20  t1.y1 = m.y1,
 21  t1.y2 = m.y2,
 22  t1.y3 = m.y3,
 23  t1.y4 = m.y4,
 24  t1.y5 = m.y5,
 25  t1.y6 = m.y6,
 26  t1.y7 = m.y7,
 27  t1.y8 = m.y8,
 28  t1.y9 = m.y9,
 29  t1.y0 = m.y0;

20000 rows merged.

SQL> merge into t2
  2  using ( select rownum x,
  3                        12 y1,
  4                        23 y2,
  5                        34 y3,
  6                        45 y4,
  7                        56 y5,
  8                        67 y6,
  9                        78 y7,
 10                        89 y8,
 11                        90 y9,
 12                        100 y0
 13          from dual
 14          connect by level <= 20000
 15        ) m
 16  on ( t2.x = m.x )
 17  when matched then
 18  update
 19  set
 20  t2.y1 = m.y1,
 21  t2.y2 = m.y2,
 22  t2.y3 = m.y3,
 23  t2.y4 = m.y4,
 24  t2.y5 = m.y5,
 25  t2.y6 = m.y6,
 26  t2.y7 = m.y7,
 27  t2.y8 = m.y8,
 28  t2.y9 = m.y9,
 29  t2.y0 = m.y0;

20000 rows merged.

SQL> disc

So T1 was the table with mainly null columns. The trace file shows this result:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.57       0.57          0        770      56302       20000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.57       0.58          0        770      56302       20000

whereas when you compare it to T2 with the pre-populated "junk":

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.49       0.49          0       1046      20884       20000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.49       0.49          0       1046      20884       20000

Now of course you can get those better results with T1 by rebuilding the IOT with plenty of free space to hold the row expansion. Repeating the merge after an "alter table move pctfree 90" gives a trace result of:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.51       0.54        367       6396      20418       20000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.53       0.56        367       6396      20418       20000

but of course, I’ve also just smashed a PCTFREE 90 growth into all of the already merged (or "filled") rows.

The Hakan factor would be a nice fix… but alas:

SQL> alter table T1 minimize records_per_block;
alter table T1 minimize records_per_block
*
ERROR at line 1:
ORA-28601: invalid [no]MINIMIZE option 

(Tested on v10, 11 and 12)
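
For contrast, the same command is accepted on a conventional heap table; a quick illustrative sketch:

SQL> create table T3 ( x int );

Table created.

SQL> insert into T3 select rownum from dual connect by level <= 1000;

1000 rows created.

SQL> commit;

Commit complete.

SQL> alter table T3 minimize records_per_block;

Table altered.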