Here’s a funny little bug – which probably won’t cause any damage – that may remind you that (most of) the people who work for Oracle are just ordinary people like you and me who make ordinary little mistakes in their programming. It’s a bug I discovered by accident because I just wanted to check something about how a particular undo tablespace had been defined, and I called dbms_metadata instead of querying dba_tablespaces. Here’s the cut-n-paste from an SQL*Plus session on 11.2.0.2:
SQL> select dbms_metadata.get_ddl('TABLESPACE','UNDOTBS1') text from dual;
TEXT
----------------------------------------------------------------------------------------------------
CREATE UNDO TABLESPACE "UNDOTBS1" DATAFILE
SIZE 209715200
AUTOEXTEND ON NEXT 5242880 MAXSIZE 32767M,
'/u01/app/oracle/oradata/TestDB11/redo03.log' SIZE 209715200
AUTOEXTEND ON NEXT 5242880 MAXSIZE 32767M,
'/u01/app/oracle/oradata/TestDB11/undotbs01.dbf' SIZE 209715200
AUTOEXTEND ON NEXT 5242880 MAXSIZE 32767M
BLOCKSIZE 8192
EXTENT MANAGEMENT LOCAL AUTOALLOCATE
ALTER DATABASE DATAFILE
'/u01/app/oracle/fast_recovery_area/TESTDB11/flashback/o1_mf_8v4qrxy1_.flb' RESIZE 959447040
ALTER DATABASE DATAFILE
'/u01/app/oracle/oradata/TestDB11/redo03.log' RESIZE 959447040
ALTER DATABASE DATAFILE
'/u01/app/oracle/oradata/TestDB11/undotbs01.dbf' RESIZE 959447040
SQL> select file_name from dba_data_files where tablespace_name = 'UNDOTBS1';
FILE_NAME
----------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/TestDB11/undotbs01.dbf
SQL> select member from v$logfile;
MEMBER
----------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/TestDB11/redo01.log
/u01/app/oracle/oradata/TestDB11/redo02.log
/u01/app/oracle/oradata/TestDB11/redo03.log
As you can see, a call to dbms_metadata.get_ddl() is telling me that my undo tablespace is made up of two files (and then wants to alter three (!) of them). One of which looks suspiciously like it might be one of my redo log files, and the third file is one of the flashback logs in my fast recovery area! So I queried dba_tablespaces to find that (as expected) I had only one file in my undo tablespace, and then I queried v$logfile to check whether I really did have a log file called redo03.log and I checked the fast recovery area to see when that last file had appeared.
I don’t suppose this error in dbms_metadata will have any serious side effects – although I did wonder if there might be some ramifications for a “full transportable export” (an 11.2.0.3 option) from datapump.
A quick check on MOS revealed this to be a known bug fixed in 12.1: “Bug 10177856 : DBMS_METADATA(‘TABLESPACE’, …) INCLUDES WRONGLY TEMPFILES AND ONLINE REDO-LOGF” The specific bug (and there were three reported by my search) described the SYSTEM tablespace report also reporting the temporary tablespace(s) – so I did a quick check on my SYSTEM and SYSAUX to see what would happen, and the results made it easy to see how the problem had happened – when it was collecting filenames the code in dbms_metadata had failed to check file types properly.
My query for SYSTEM reported system.dbf along with temp01.dbf and redo01.log and an OMF-named flashback log.
My query for SYSAUX report sysaux.dbf along with temp_special.dbf and redo02.log and an OMF-named flashback log.
Here’s a query (runnable only by SYS) showing why those particular extra files appeared on my system:
SQL> select fnfno, fntyp,fnnam from x$kccfn where fnfno <= 3 order by fnfno, fntyp;
FNFNO FNTYP FNNAM
---------- ---------- ---------------------------------------------------------------------------
0 200 /u01/app/oracle/oradata/TestDB11/bct.dbf
1 3 /u01/app/oracle/oradata/TestDB11/redo01.log
4 /u01/app/oracle/oradata/TestDB11/system01.dbf
7 /u01/app/oracle/oradata/TestDB11/temp01.dbf
24 /u01/app/oracle/fast_recovery_area/TESTDB11/flashback/o1_mf_8v439xsc_.flb
2 3 /u01/app/oracle/oradata/TestDB11/redo02.log
4 /u01/app/oracle/oradata/TestDB11/sysaux01.dbf
7 /u01/app/oracle/oradata/TestDB11/temp_special.dbf
24 /u01/app/oracle/fast_recovery_area/TESTDB11/flashback/o1_mf_8v43b0hp_.flb
3 3 /u01/app/oracle/oradata/TestDB11/redo03.log
4 /u01/app/oracle/oradata/TestDB11/undotbs01.dbf
24 /u01/app/oracle/fast_recovery_area/TESTDB11/flashback/o1_mf_8v4qrxy1_.flb
The code is reporting the names of the corresponding files of types 3, 7 and 24 whenever it reports a file of type 4 (which is the data files). There are other possible file types, of course, and if I had other features enabled perhaps I would see some of them. Running dbms_metadata.get_ddl with tracing enabled showed me that it queries view ku$_tablespace_view, which includes a join to view ku$_file_view, which includes a three-part union all of x$kccfn including subqueries for files of type 4 and 7 specifically, and a subquery with no restriction on file type but a join on file number to the flashback file list x$ktfthc.
Although the bug is reported as fixed in 12.1, it seems to be fixed in 11.2.0.3 (so that deals with my worry about datapump) – it was just unlucky (or lucky, depending on your viewpoint) that I happened to run my first query on version 11.2.0.2.
Oracle OEM 12c introduces a new feature that enables the creation of Oracle database thin clones by leveraging file system snapshot technologies from either Oracle ZFS Storage Appliance or Netapp. The OEM adds a graphic interface to the process of making database thin clones. The feature that enables database thin cloning in OEM is called Snap Clone and is part of OEM’s Cloud Control Self Service for data cloning. Snap Clone is available via the feature Database as a Service (DBaaS). Snap clone leverages the copy on write technologies available in some storage systems for database cloning. Support is initially available for NAS storage and specifically on Oralce ZFS Storage Appliacen and NetApp Storage.
In order to use Snap Clone, one has to install the source database such that the source database data files are on a ZFS storage appliance or Netapp array and have the storage managed by agents on a LINUX machine and then one can thin clone data files on that same storage array.
Snap Clone offers role based access, so storage admin can login in and only have access to areas they are responsible for as well as limiting access to source databases, clones and resource by end users.
The prerequisites for getting start with Snap Clone are having available storage on ZFS Storage Appliance or Netapp storage array as well as having access to a master test database. A master test databse is a database that has a sanatized version of a production database such that it is either a subset and or masked. The test master database has to be registered with OEM. After the test master is registered with OEM, Snap Clone can be setup. To set up snap clone, come into Oracle Cloud Control 12c as “cloud administrator” role with “storage adminstator” priviledge or super administrator and register the storage. To register the storage navigate to “ setup -> provisining patching -> storage registration”.
Figure 1. Shows the entry page in OEM 12c Cloud Control. From here go to the top right and choose setup, then provisining patching then storage registration as shown above.
To setup Snap Clone navigate to storage registyratiom choose the menus “setup -> provisining patching -> storage registration”.
Figure 2 Shows a zoom into the menus to choose
Figure 3. Storage Registration Page
Figure 4. Choose the type of storage array
Once on the Storage Registration page, choose “Register” and then choose the storage, either Netapp or Sun ZFS.
Figure 5. Storage Registration Page
To register the storage supply the following information
All of which is documented in cloud administration guide.
Then define agents used to manage storage. Agents have are required to run on a LINUX host. More than one agen can define to provide redundancy. The agents will be the path by which OEM communicates with the storage. For each agent, supply the following information
And finally define the frequency with which the agent synchronizes with the storage to gather the sorage hardware details such as information on aggregates shares volumes.
After the storage information, agent information and agent synchronization information has been filled out, then hit “submit” button in the top right. Hitting the submit button will return the UI back to the “Storage Registration”. On the “Storage Registration”, click on the storage appliance listed in the top, then click on the contents tab on the bottom half of the page. This will list all the volumns and aggregates in the storage appliance.
Figure 6. Editing storage ceiling by clicking on a aggregate and hitting the “Edit Storage Config” tab.
For each aggregate one can set storage ceilings. Click on the aggregate or FlexVol and the click “Edit Storage Ceilings” tab.
On the database tab is a list of databases that can be used for cloning. OEM detects the database automatically on the hosts it is managing. OEM will also automatically correlate databases that have storage on the storage array added storage registration. OEM looks for all databases that have files on the registered storage. Click on database, then the show files tabs which will show the files and volumes for this database.
Figure 7. List of files by volumn for database
Figure 8. Enable Snap Clone for databases that will be used as test masters.
Nominating a database as test master requires enabling snap clone. To enable snap clone for a database, click on the chosen database, then click “Enable Snap Clone” tab just above the list of databases. This will automatically validate that all the volumes are flex clone enabled (in the case of Netapp).
The next step is to configure zone which can be used to organize cloud resources
Choose the menu option “Enterprise -> Cloud -> Midelware and Database Home”
Figure 9. Navigate first to “Cloud -> Middleware and Database Home”
Figure 10. Middleware and Databawe Cloud page
In order to see the zones defined, click on the number next to the title “Paas Infrastructure Zones” in the top left under General Information.
Figure 11. PaaS Infrastructure Zones
To create a zone, click the tab “Create”.
Figure 12. first page of wizard to create a PaaS Infrastructure Zone, give a meaningful name and description of the zone and define maximum CPU utilizaiton and memory allocation.
In the first page of the “PaaS Infrastructure Zone”, give zones a meaningful name and description. Define constraints such as maximum host CPU and memory allocations.
Figure 13. Second page of the “PaaS Infrastructure Zone” wizard, add hosts that are available to the zone.
Next define hosts that are members of the zone and provide credentials that operate across all members of this zone
Figure 14. Third page of the “PaaS Infrastructure Zone” wizard, limit which roles can see the zone.
Next define what roles can see and access this szone. The visibiliy of the zone can be limited to a certain class of users via roles like Dev, QA etc
Figure 15. Final review page for “PaaS Infrastructure Zone” wizard
Finally review settings and click submit
Figure 16. Showing the Confirmation that the PaaS Infranstructure Zone has been successfully created.
The remaining steps required to enable snap clone is to create a database pools which is a collection of servers or nodes that have database software installed. The remaining part of the setup is done by a differnet user who is the administrator for database as a service.
Log in as DBAAS_ADMIN.
For the next part navigate to the menu “Setup -> Cloud -> Database”.
Figure 17. Middleware and Database Cloud page
Figure 18. Navigate to “Setup -> Cloud -> Database”.
Figure 19. Database Cloud Self Service Portal Setup. To create a database pool choose the “Create” button in the center of the page and from the pull down, choose “For Database”.
To create a new pool click on the “Create” button in the center of the page, and chose “For Database” from the pull down menu that appears.
Figure 20. Choose “Create -> For Database”
Figure 21. Edit pool page. Provide a meaningful name a descrpition of the pool. Add Oracle home directories in the bottom of the page. At the very bottom of the page set a constraint on the number of databases instances that can be created in the pool. On the top right, set the host credentials.
Set
In the “Edit Pool” page, at the top left of the screen, provide a meaningful name and description for the pool. In the middle of the screen add Oracle homes that will be used for databse provisioning. Every member of a database pool is required to be homogeneous. Homogenous requires that the platform and Oracle version is the same across all the hosts and Oracle homes in the pool. All the Oracle installations also have to be of the same type either single instance or RAC. In the top right add the Oracle home provide oracle credentials and root credentials. Finally at the bottom of the page a constraint can be set on the number of database instances that can be started in this pool.
Figure 22. Set request limits on the pool
The next page sets the request settings. The first restriction sets how far in advanced can requrest can be made. Second restricts how long a request can be kept which is the archive retension. After the archive retention time the requests will be deleted. Finally is the request duration which is the maximum duration for which the request can be made.
Figure 23. Set memory and storage quotas per role for the pool. The quotas cover memory, storage, database requests and schema requests.
The above page configures quotas. Quota is allocated to each and every self service user. The quotas controls the amount fo resources users have access to. Quotas are assigned to a role and users inherit quota values from the role. Click “Create” in the middle of the screen.
Figure 24. Editing the quotas on a pool for a role.
The popup dialogue has for these entries
Figure 25. Profiles and Service Templates
Profiles and service templates
.A profile is use to capture information about the source database which can then be used for provisioning.
A service template is a standardized service definition for a database configuration that is offered to the self service users. A collection of service templates forms the service catalogue. A service template will provision databsae with or without seed data. To capture an ideal configuration, the easist thing to do is to point at an existing database and fetch information of interest from that database. The information from the database can be captured using a profile.
To create a profile click on the “Create” button under “Profiles”
Figure 26. Specify a reference target for a profile.
Click the magnifying glass to search for a source database.
Figure 27. Search for a reference target database
Pick a refrence target by clicking on it, the click the “Select” button in bottom right.
Figure 28. Creating a Database Provisioning Profile
To pick a database for use in thin cloning, choose the check box “Data Content” with suboption selected fro “Structured Data” with sub-option selected for “Create” with sub-option selected for “Storage Snapshot”. This option is only enabled only when the “enable snapshot” option is enabled on the storage registration page. Disable option capture oracle home.
Provide credentials for the host machines Oracle account and for the database login.
The “Content Option” step is not needed for the above selections.
Figure 29. Give the profile a meaniful name and description
Next provide credentials for Oracle home and Oracle databse, then provide a meaningful name for the profile as well as a location. The profile will be userful when creating a service template.
Figure 30. Review of create profile options.
Next review the summary and click subit which will connect to storage and take snapshots of the storage
Figure 31. Shows a zoom into the menus to choose
To create a new service template choose a profile and in this case use “thin provisioning for reference DB” profile. Now to create a new service template click “create” and choose “for database”. Service templates are part of the service catalogue and exposed to the self service users.
Figure 32. Provide a meaningful name and description for the service template.
Provide a meaningful name and description. For the rest of service template provide information about the databses that will be created from the snapshots such as providing database type, rac or single instance, for rac provide number of nodes. Provide the SID prefix to appended to the SIDs generated for the clones, provide the Domain Name and the port.
Figure 33. Provide a storage area for writes to use.
The cloning operation only creates a read only copy thus it is required to provide write space elsewhere in order to allow writing to the thin clone.
click on the edit button
click on volumne, then edit button
Figure 34. set diretory and maximum space usage for the write location
Provide the mount point prefix and amount of writeable space wish to allocate
Users of the thin clone databses can also be allowed to take further snapshots. These snapshots can be used as a mechinism to rollback changes, The number of thiese snapshtos can limited just below storage size section:
Figure 35. set the number of snapshots that can be taken of a a thin clone
Figure 36. set the initial passwords for database accounts on the thin clone.
next provide credentials for administrative accounts
for all other non-administartive accounts can choose to leave them as is or change them all to one password you can modify certain init.ora parameters for exmaple memory
Figure 37. Modify any specific init.ora parameters for the thin clone
Figure 38. Set any pre and post provision scripts to be run at the creation of a thin clone.
custom scripts can be provide as pre or post creation steps this can be very useful if you want to register databses with OID or certian actions that are specific to your organization
Figure 39. Set the zone and pool for the thin clone
Figure 40. set the roles that can use the service template.
you associate this srvice template with a zone and a template this insures that the service template can actualy work on the group of resources that you have identified and can limit the visibility of the service tempalte usein roles
Figure 41. review of the service template creation requites
finally we review the summary and click submit
Figure 42. 12c Cloud Control
Figure 43. 12c Cloud Control Self Service Portal
Contents of the Database Cloud Self Service Potal screen
Figure 44. To clone a database, choose the “Request” then “Database” menu.
Figure 45. From the list choose a Self Service Template. In this case “SOEDB Service Template”
Options are
Figure 46. Fill out the clone Service Request
request wizard asks for
Users do not get system access to the databases but instead get a slightly less privilege user who becomes the owner of the database
Hit Submit
Figure 47. Shows new clone database
http://www.youtube.com/watch?v=J7fnfLS5Dxg&feature=youtu.be - setup
http://www.youtube.com/watch?v=9VK1z6nU1PU – provisioning
I wrote a note a few years ago about SQL*Net compression (this will open in a new window so that you can read the posts concurrently), showing how the order of the data returned by a query could affect the amount of network traffic. An example in the note demonstrated, using autotrace statistics that the number of bytes transferred could change dramatically if you sorted your return data set. At the time I asked, and postponed answering, the question: “but how come the number of SQL*Net round trips has not changed ?”
A couple of weeks ago someone asked me if I had ever got around to answering this question – and I hadn’t. So I started by writing an answering comment, then decided it was getting a little long so transferred it to a separate blog note, and here it is.
We can start with the two sets of stats:
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
143 consistent gets
0 physical reads
0 redo size
425479 bytes sent via SQL*Net to client
494 bytes received via SQL*Net from client
11 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
133 consistent gets
0 physical reads
0 redo size
79287 bytes sent via SQL*Net to client
494 bytes received via SQL*Net from client
11 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
10000 rows processed
Basically, the answer comes from interpreting the names of the statistics correctly.
The “SQL*Net roundtrips to/from client” is the number of pairs of waits for “SQL*Net Message from client”/“SQL*Net Message to client” – which is controlled (in this example) very strongly by the array fetch size and the total number of rows (Note: we have arraysize = 1000 and rows returned = 10000 … the number of round trips is very close to 10,000 / 1,000.) This is exactly how the statistic is described in the SQL*Plus User’s Guide and Reference which says: “Total number of Oracle Net messages sent to and received from the client” – the trouble is, this description is easy to mis-understand.
If we enable extended SQL tracing (event 10046 / dbms_monitor) we would see that there is more activity going on than is reported by the autotrace statistics:
Here’s a sample of waits from the first query (retested on 11.2.0.2) with one blank line inserted for clarity:
WAIT #140199974133928: nam='SQL*Net message from client' ela= 319 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699605359 WAIT #140199974133928: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699605402 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 34 driver id=1650815232 #bytes=8155 p3=0 obj#=-40016366 tim=1370786699605478 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 4 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699605528 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699605565 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699605603 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699605640 FETCH #140199974133928:c=0,e=262,p=0,cr=7,cu=0,mis=0,r=1000,dep=0,og=1,plh=3617692013,tim=1370786699605654 WAIT #140199974133928: nam='SQL*Net message from client' ela= 319 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699605993 WAIT #140199974133928: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699606024 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 19 driver id=1650815232 #bytes=8155 p3=0 obj#=-40016366 tim=1370786699606082 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699606127 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699606163 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699606199 WAIT #140199974133928: nam='SQL*Net more data to client' ela= 3 driver id=1650815232 #bytes=8148 p3=0 obj#=-40016366 tim=1370786699606234 FETCH #140199974133928:c=0,e=230,p=0,cr=7,cu=0,mis=0,r=1000,dep=0,og=1,plh=3617692013,tim=1370786699606247
And here’s the equivalent on the 2nd query:
WAIT #140199974133928: nam='SQL*Net message from client' ela= 1187 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699689809 WAIT #140199974133928: nam='SQL*Net message to client' ela= 4 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699689893 FETCH #140199974133928:c=0,e=540,p=0,cr=0,cu=0,mis=0,r=1000,dep=0,og=1,plh=2148421099,tim=1370786699690421 WAIT #140199974133928: nam='SQL*Net message from client' ela= 1096 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699691585 WAIT #140199974133928: nam='SQL*Net message to client' ela= 3 driver id=1650815232 #bytes=1 p3=0 obj#=-40016366 tim=1370786699691666 FETCH #140199974133928:c=0,e=506,p=0,cr=0,cu=0,mis=0,r=999,dep=0,og=1,plh=2148421099,tim=1370786699692161
My session data unit (SDU) has defaulted to roughly 8KB and if you convert the driver id (1650815232) to hexadecimal you get 0×62657100, which you can recognise as BEQ, the local (bequeath) connection.
When there is little de-duplication available in the data order (first example) the 1,000 rows I fetch on each fetch call total about 48KB and I end up doing one “SQL*Net message to client” with five follow-up “SQL*Net more data to client”. (The bytes parameter for the “SQL*Net message to client” always shows 1 – Cary Millsap, I think, has pointed out that this is a long-standing error, so I’m assuming the call would be passing 8KB like all the other calls).
When I get maximum de-duplication in the data the entire 1,000 rows manages to compress down to less than the 8KB that’s available and all I see is the single “SQL*Net message to client”. with no calls for more data.
So – SQL*Net roundtrips isn’t measuring what you think – based on an informal reading of the description you and I might expect the first test to report far more round trips than the second, but that’s not the way that Oracle reports it, it’s literally counting the number of specific pairs of waits for “SQL*Net message to/from client” – the “more data” trips don’t get counted. As a result the same number of round-trips can report a hugely (and arbitrarily) different number of bytes.
I’ve only just noticed that I seem to be the only person that uses SQL*Plus. When I searched at tahiti.oracle.com to find the documented description of this statistics – I notice that the manual title was “SQL*Plus User’s Guide and Reference”; if there were more users it would be “SQL*Plus Users’ Guide and Reference” (check the apostrophe). Since I know that I use SQL*Plus that must mean no-one else does ;) I guess everyone else uses GUIs nowadays.
I thought I’d try to spend some of today catching up on old comments – first the easier ones, then the outstanding questions on Oracle Core.
The very first one I looked at was about pushing predicates, and the specific comment prompted me to jot down this little note about the 10053 trace file (the CBO trace).
In “real-life” I don’t often look at 10053 trace files because they tend to be very long and messy and usually you can see all you need from an execution plan – perhaps through SQL Monitoring, or possible with the rowsource execution stats enabled. However there are questions that can sometimes be resolved very quickly by a simple text search of a trace file; questions of the form: “why isn’t the optimizer using feature X”, for example: “why didn’t the optimizer use predicate push for this view?” If you have a question of this type, then search the trace file for the word “bypass” – you may find comments like the following:
JPPD: JPPD bypassed: Outer query references right side of outer join
JPPD: JPPD bypassed: View has a single or group set function.
JPPD: JPPD bypassed: OLD_PUSH_PRED hint specified
JPPD: JPPD bypassed: User-defined operator
JPPD: JPPD bypassed: Push-down not enabled
JPPD: JPPD bypassed: View not on right-side of outer join
JPPD: JPPD bypassed: View contains a group by.
JPPD: JPPD bypassed: View contains a window function.
JPPD: JPPD bypassed: View contains a MODEL clause.
JPPD: JPPD bypassed: View contains a DISTINCT.
JPPD: JPPD bypassed: View contains a rownum reference.
JPPD: JPPD bypassed: START WITH query block.
JPPD: JPPD bypassed: View is a set query block.
JPPD: JPPD bypassed: Negative hint found
JPPD: JPPD bypassed: Outline does not contain hint
JPPD: JPPD bypassed: Base table missing statistics
JPPD: JPPD bypassed: Remote table
JPPD: JPPD bypassed: Security violation
JPPD: JPPD bypassed: Possible security violation.
JPPD bypassed: Table level NO_PUSH_PRED hint.
JPPD bypassed: Table-level hint.
JPPD bypassed: Query block NO_PUSH_PRED hint.
JPPD bypassed: View semijoined to table
JPPD bypassed: Contained view has no table in fro list.
JPPD bypassed: View contains a START WITH query block.
JPPD bypassed: Does not contain a view.
JPPD bypassed: Found branches of different types
According to a note that I’ve got with this list, by the way, I used “strings -a” on the Oracle executable for 10.2.0.1 to generate it. It’s probably about time I did the same for newer versions of Oracle.
My plans for addressing comments have been disrupted somewhat. Just as I published this note, an email holding two 10053 trace files arrived. (The author had asked before sending them, and I was sufficiently curious that I had agreed to take a quick look). So I’ve spent most of the last hour doing what I’ve just said I hardly ever to – looking at 10053 trace files.
The question was “why does this query run serially if I have a particular scalar subquery in the select list, but run parallel if I replace it with a function”. The immediate answer, after I’d seen the query and thought about it for a bit, was: “because the manuals (10.2 – the relevant version) say that you don’t parallelize if you have scalar subqueries in the select list”; but this changed the question to: “why is it just this one scalar subquery that causes serialization when the other two scalar subqueries don’t”. Of the three scalar subqueries, only one of them cause the query to serialize.
The answer to that question is a little more subtle – and I’ll blog about it when I can find time to model the scenario.
Hyperion is the first in the Hyperion Cantos series by Dan Simmons.
What a great Sci-Fi book! A group of seven travellers are on a pilgrimage to Hyperion. Six of the seven tell the stories of how they came to be there, with the sixth story kind-of linking things together. There is no real conclusion to the story as the next book carries on the story from the point the first one ends. It was definitely written as a series!
The timeline jumps around quite a bit through the book, but in a good way. It’s not done in a confusing way.
Definitely worth a look for any Sci-Fi readers out there.
Cheers
Tim…
PS. I put the authors name in the title, for fear of confusing people into thinking this was about work.
I should stress that this is all well covered in the documentation, but it was something I stumbled upon just recently.
Its common knowledge that you can lock a table:
SQL> create table T 2 ( x int ) 3 partition by range ( x ) 4 ( 5 partition p1 values less than (10), 6 partition p2 values less than (20), 7 partition p3 values less than (30) 8 ); Table created. SQL> lock table T in exclusive mode; Table(s) Locked.
However, you can also selectively lock just partitions of a table as well, using the standard partition clause, for example:
Session 1
SQL> lock table T partition ( p1 ) in exclusive mode; Table(s) Locked.
whilst the following is still possible in session 2:
SQL> insert into T values (20); 1 row created.
Ever wonder about Big Data and what exactly it means, especially if you are already an Oracle Database professional? Or, do you get lost in the jargon warfare that spews out terms like Hadoop, Map/Reduce and HDFS? In this post I will attempt to explain these terms from the perspective of a traditional database practitioner but getting wet on the Big Data issues by being thrown in the water. I was inspired to write this from the recent incident involving NSA scooping up Verizon call data invading privacy of the citizens.
It's not a news anymore that many people are reacting in different ways to the news of National Security Agency accessing the phone records of Verizon - cellphone and land lines - to learn who called who. But one thing is common - all these reactions are pretty intense. While the debate lingers on whether the government overstepped its boundaries in accessing the records of private citizens or whether it was perfectly justified in the context of the threats our country is facing right now - and there will be a debate for some time to come - I had a different thought of my own. I was wondering about the technological aspects of the case. The phone records must be massive. What kind of tools and technologies the folks in the "black room", a.k.a Room 641A must have used to collate and synthesize the records into meaningful intelligence - the very task NSA is supposed to gather and act upon? Along with anything it piques the interest from an engineering point of view.
And that brings the attention to the aspect of computation involving massive amounts of data. It's one thing to slice and dice a finite, reasonable amount of dataset; but in the case of the phone records, and especially collated with other records to identify criminal or pseudo-criminal activities such as financial records, travel records, etc., the traditional databases such as Oracle and DB2 likely will not scale well. But the challenge is not just in the realm of romanticized espionage; it's very much a concern for purely un-romantic corporations who, among other things, want to track customer behavior to fine tune their service and product offerings. Well, it's espionage of a slightly different kind. Perhaps the sources of data are different - website logs, Facebook feeds as opposed to phone records; but the challenges are the same - how to synthesize enormous amounts of seeming unrelated data into meaningful intelligence. This is the challenge of the "Big Data".
Several years ago, Yahoo! faced the same issue - how to present the attractiveness the webpages it puts on its portal and analyze the pattern of clicking by users to attract advertisers to put relevant ads on their pages. Google had a similar challenge of indexing the entire World Wide Web in its servers so that it present search results very, very quickly. Both of these issues represent the issues that others probably didn't face earlier. Here are the relatively unique aspects of this data, which are known as the "Three V's of Big Data".
Both these companies realized they are not going to address the challenges using the traditional relational databases, at least not in the scale they wanted. So, they developed tools and technologies to address these very concerns. They took a page from the super-computing paradigm of divide and conquer. Instead of dissecting the dataset as a whole, they divided it into smaller chunks to be processed by hundreds, even thousands of small servers. This approach solved three basic, crippling problems:
This is the fundamental concept that have rise to Hadoop. But before we cover that, we need to learn about another important concept.
A typical relational database works by logically arranging the data into rows and columns. Here is an example. You decide on a table design to hold your customers, named simply CUSTOMERS. It has the columns CUST_ID, NAME, ADDRESS, PHONE. Later, your organization decides to provide some incentives to the spouses as well and so you added another column - SPOUSE.
Everything was well, until the time you discovered customer 1 and spouse are divorced and there is a new spouse now. However the company decides to keep the names of the ex-spouses as well, for marketing analytics. Like the right relational application, you decide to break SPOUSE away from the main table and create a new table - SPOUSES, which is a child of CUSTOMERS, joined by CUST_ID. This requires massive code and database changes; but you survive. Later you had the same issue with addresses (people have different addresses - work, home, vacation, etc.) and phone numbers (cell phone, home phone, work phone, assistant's phone, etc.). So you decide to break them into different tables as well. Again, code and database changes. But the changes did not stop there. You had to add various tables to record hobbies, associates, weights, dates of birth - the list is endless. Every thing you record requires a database change and a code change. But worse - not all the tables will be populated for every customer. In your company's quest to build a 360 degree view of the customer, you collect some information; but there is no guarantee that all the data points will be gathered. you are left with sparse tables. Now, suddenly, someone says there is yet another attribute required for the customer - professional associations. So, off you go - build yet another table, followed by change control, code changes to incorporate that.
If you look at the scenario above, you will find that the real issue is trying force a structure around a dataset that inherently unstructured - akin to a square peg in a round hole. The lack of structure of the data is what makes it agile and useful; but the lack of structure is also what makes it difficult in a relational database that demands structure. This is the primary issue if you wan tto capture social media data - Twitter feeds, Facebook updates, LinkedIn updates and Pinterest posts. It's impossible to predict in advance, at least accurately, the exact information you will expect to see in them. So, putting a structure around the data storage not only makes life difficult for everyone - the DBAs will constantly need to alter the structures and the developers/designers will constantly wait for the structure to be in the form they want - slowing down capture and analysis of data.
So, what is the solution. If you think about it, think about how we - human beings - process information. Do we parse information in form of rows in some table? Hardly. We process and store information by associations. For instance, let's say I have a friend John. I probably have nuggets of information like this:
#fff2cc;">Last Name = Smith#fff2cc;">Lives at = 13 Main St, Anytown, USA#fff2cc;">Age = 40#fff2cc;">Birth Day = June 7th#fff2cc;">Wife = Jane#fff2cc;">Child = Jill#fff2cc;">Jill goes to school = Top Notch Academy#fff2cc;">Jill is in Grade = 3... and so on. Suppose I meet another person - Martha- who tells me that her child also goes to Grade 3 in Top Notch Academy. My brain probably goes through a sequence like this:
#fce5cd;">Search for "Top Notch Academy"#fce5cd;"> Found it. It's Jill#fce5cd;"> Search for Jill.#fce5cd;"> Found it. She is child of John#fce5cd;"> Who is John's wife?#fce5cd;"> Found it. It's Jane.#fce5cd;"> Where do John and Jill live? ...And finally, after this processing is all over, I say to Martha as a part of the coversation - "What a coincidence! Jill - the daughter of my friends John and Jane Smith goes there are well. Do you know them?". "Yes, I do," replies Martha. "In fact they are in the same class, that of Mrs Gillen-Heller".
Immediately my brain processed this new piece of information and filed the data as:
#fff2cc;">Jill's Teacher = Mrs. Gillen-Heller#fff2cc;">Jane's Friend = Martha#fff2cc;">Martha's Child Goes to = ...Months later, I meet with Jane and mention to her that I met Martha whose child went to Mrs. Gillen-Heller's class - the same one as Jill. "Glad you met Martha,", Jane says. "Oh, Jill is no longer in that class. Now she is in Mr. Fallmeister's class."
Aha! My brain probably stored that information as:
#fff2cc;">Jill's former teacher = Mrs. Gillen-HellerAnd it updated the already stored information:
#fff2cc;">Jill's Teacher = Mr. FallmeisterThis is called storing by a name=value pair. You see I stored the information as a pair of property and it value. As information goes on, I keep adding more and more pairs. When I need to retrieve information, I just get the proper property and by associations, I get all the data I need. But the storing of data by name=value pairs gives me enormous flexibility in storing all kinds of information without modifying any data structures I may currently have.
This is also how the Big Data is tamed for processing. Since the data coming of Twitter, Facebook, LinkedIn, Pinterest, etc. is impossible to categorize in advance, it will be practically impossible to put it all in the relational format. Therefore, a name=value pair type storage is the logical step in compiling and collating the data. the name is also known as "key"; so the model is sometimes called key-value pair. The value doesn't have to have a datatype. In fact it's probably a BLOB; so anything can go in there - booking amount, birth dates, comments, XML documents, pictures, audio and even movies. It provides an immense flexibility in capturing the information that is inherently unstructured.
Now that you know about name value pairs, the next logical question you may have is - how do we store these? Our thoughts about databases are typically colored by our long-standing association with relational databases, making them almost synonymous with the concept of database. Before relational databases were there, even as a concept, big machines called mainframes ruled the earth. The databases inside them were stored in hierarchical format. One such database from IBM was IMS/DB, which was hierarchical. Later, when relational databases were up and coming, another type of database concept - called a network database - was developed to compete against it. An example of that category was IDMS (now owned by Computer Associates) developed for mainframes. The point is, relational databases were not the answer to all the questions then; and it is clear that they are not now either.
This leads to the development a different type of database technologies based on the the key-value model. Relational database systems are queried by SQL language, which I am sure is familiar to almost anyone reading this blog post. SQL is a set-oriented language - it operates on sets of data. In the key-value pair mode, however, that does not work anymore. Therefore these key-value databases are usually known as NoSQL, to separate them from the relational SQL-based counterparts. Since their introduction, some NoSQL databases actually support SQL, which is why "NoSQL" is not a correct term anymore. Therefore sometimes it is referred to as "Not only SQL" databases. But the point is that their structure is not dependent on relational. But how exactly the data is stored is usually left to the implementer. Some examples are MongoDB, Dynamo, Big Table (from Google) etc.
I would stress here that almost any type of non-relational database can be classified as NoSQL; not just the name-value pair models. For instance, Object Store, an object database is also NoSQL. But for this blog post, I am assuming only key-value pair database as the NoSQL one.
Let's summarize what we have learned so far:
When the system gets a large chunk of data, e.g. a Facebook feed, the first task is to break it down to these keys and their corresponding values. After that the values may be collated for a summary result. The process of dividing the raw data into meaningful key-value pairs is known as "mapping". Later combining the values to form summaries, or just eliminating the noise from the data to extract meaningful information is known as "reducing". For instance you see "Name" and "Customer Name" in the keys. They mean the same thing; so you reduce them to a single key value - "Name". These are almost always used together; hence the operation is known as Map/Reduce.
Here is a very rudimentary but practical example of Map/Reduce. Suppose you get Facebook feeds and you are expected to find out the total of likes for our company's recent post. Facebook feed comes in the form of a massive dataset. The first task is to divide that among many servers - a principle described earlier to make the process scale well. Once the dataset is divided, each machine run some code to extract and collate the information and then present the data to some central coordinator. to collate for the final time. Here is a pseudo-code for the process for each server doing the processing on a subset of data:
begin
get post
while (there_are_remaining_posts) loop
extract status of "like" for the specific post
if status = "like" then
like_count := like_count + 1
else
no_comment := no_comment + 1
end if
end loop
end
Let's name this program counter(). Counter runs on all the servers, which are called Nodes. As shown in the figure, there are three nodes. The raw dataset is divided into three sub-datasets which are then fed to each of the three Nodes. A copy of the subdataset is kept another server as well. That takes care of redundancy. Each node perform their computation, send their results to an intermediate result set where they are collated.
![]() |
| Map/Reduce Processing |
How does this help? It does; in many ways. Let's see:
(1) First, since the data is stored in chunks and the copy of a chunk is stored in a different node, there is built-in redundancy. There is no need to protect the data being fed since there is a copy available elsewhere.
(2) Second, since the data is available elsewhere, if a node fails, all it needs to done is that some other nodes will pick up the slack. There is no need to reshuffle or restart the job.
(3) Third, since the nodes all perform task independently, when the datasize becomes larger, all you have to do is to add a new node. Now the data will be divided four ways instead of three and so will be processing load.
This is very similar to parallel query processes in Oracle Databases, with PQ servers being analogous to nodes.
There are two very important points to note here:
(1) The subset of data each node gets is not needed to be viewed by all the nodes. Each node gets its own set of data to be processed. A copy of the subset is maintained in a different node - making the simultaneous access to the data unnecessary. This means you can have the data in a local storage; not in expensive SANs. This is not only brings cost significantly down but may performs better as well due to a local access. As cost of Solid State Devices and flash-based storage plummets, it could also mean the that the storage cost per performance will be even better.
(2) The nodes need not be super fast. A relatively simple commodity class server is enough for the processing as opposed to a large server. Typically servers are priced for their use, e.g. a Enterprise class server with 32 CPUs is probably roughly equivalent in performance to eight 4-CPU blades. But the cost of the former is way more than 8 times the cost of the blade server. This model takes advantage of the cheaper computers by scaling horizontally; not vertically.
Now that you know how the processing data in parallel and using a concept called Map/Reduce allows you to shove in several compute intensive applications to dissect large amounts of data, you will often wonder - there are a lot of moving parts to be taken care of just to empower this process. In a monolithic server environment you just have to kick off multiple copies of the program. The Operating System does the job of scheduling these programs on the available CPUs, taking them off the CPU (paging) to roll in another process, prevent processes from corrupting each others' memory, etc. Now that these processes are occurring on multiple computers, there has to be all these manual processes to make sure they work. For instance, in this model you have to ensure that the jobs are split between the nodes reasonably equally, the dataset is split equitably, the queue for feeding data and getting data back from the Map/Reduce jobs are properly maintained, the jobs fail over in case of node failure, so on. In short, you need an operating system of operating systems to manage all these nodes as a monolithic processor.
What would you do if this operating procedures were already defined for you? Well, that would make things really easy, won't it? you can then focus on what you are good at - developing the procedures to slice and dice the data and derive intelligence from it. This "procedure", or the framework is available, and it is called Hadoop. It's an open source offering, similar to Mozilla and Linux; no single company has exclusive ownership of it. However, many companies have adopted it and evolved it into their offerings, similar to Linux distributions such as Red Hat, SuSe and Oracle Enterprise Linux. Some of those companies are Cloudera, Hortonworks, IBM, etc. Oracle does not have a Hadoop distribution. Instead it licenses the one from Cloudera for its own Big Data Appliance. The Hadoop framework runs on all the nodes of the cluster of nodes and acts as the coordinator.
A very important point to note here that Hadoop is just a framework; not the actual program that performs Map/Reduce. Compare that to the operating system analogy; an OS like Windows does not offer a spreadsheet. You will need to either develop or buy an off the shelf product such as Excel to have that functionality. Similarly, Hadoop offers a platform to run the Map/Reduce programs that you develop and you put that logic in the code what you "map" and how you "reduce".
Remember another important advantage you had seen in this model earlier - the ability to replicate data between multiple nodes so that the failure of a single node does not cause the processing to be abandoned. This is offered through a new type of filesystem called Hadoop Distributed Filesystem (HDFS). HDFS, which is a distributed (and not a clustered) filesystem, by default has 3 copies of data on three different nodes - two on the same rack and the third on a different rack. The nodes communicate to each other using a HDFS- specific protocol that is built on TCP/IP. The nodes are aware of the data present on the other nodes, which is precisely what allows Hadoop job scheduler to divide the work among the nodes. Oh, by the way, HDFS is not absolutely required for Hadoop; but as you can see, HDFS is the only way for Hadoop to know which node has what data for smart job scheduling. Without it, the division of labor will not be as efficient.
Now that you learned how Hadoop fills a major void for computations on a massive dataset, you can't help but see the significance for datawarehouses where massive datasets are common. Also common are jobs that churn through this data. However, there is a little challenge. Remember, NoSQL databases mentioned earlier? That means they do not support SQL. To get the data you have to write a program using the APIs the vendor supplies. This may reek of COBOL programs of yesteryear where you had to write a program to get the data out, making it inefficient and highly programmer driven (although it did strut up job-security, especially during the Y2K transition times). The inefficacy of the system gave rise to 4th Generation Languages like SQL, which brought the power of queries to common users, ripping the power away from programers. In other words, it brought the data and its users closer, reducing the role the middleman significantly. In datawarehouses, it was especially true since the power users issued queries to after getting the result from the previous queries. It was like a conversation - ask a question, get the answer, formulate your next question - and so on. If the conversation were dependent on writing programs, it would have been impossible to be effective.
With that in mind, consider the implications of lack of SQL in these databases highly suitable for datawarehouses. This requirement to write a program to get the data everytime would have taken it straight to the COBOL-days. Well, not to worry, Hadoop has another product that allows a SQL-like language called HiveQL. Just as users could query relational databases with SQL very quickly, HiveQL allows users to get the data for analytical processing directly. It was initially developed at Facebook.
When we talk about clustering in Hadoop, you may not help wonder - shouldn't the same functionality be provided by Oracle Real Application Cluster? Well, the short answer is - a big resounding NO. Oracle RAC combines the power of multiple nodes in the cluster which communicate with one another and transfer data (cache fusion); but that's where the similarity ends. The biggest difference is the way the datasets are accessed. In RAC, the datasets are common, i.e. they must be visible to all the nodes of the cluster. In Hadoop, the datasets are specific to the individual nodes, which allows them to be local. The filesystems in RAC can't be local; they have to be clustered or available globally, either by a clustered filesystem, shared volumes, clustered volume managers (such as ASM) or by NFS mounting. In Hadoop the local files are replicated to other nodes, which means there is no reason to create a RAID lever protection at the storage level. ASM does provide a software level mirroring, which may sound similar to Hadoop's replication; but remember, ASM's mirrors are not node specific. The mirrored copies of ASM must be visible to all the nodes. There is a preferred node concept in ASM; but that simply means that data is read by a specific node from one mirror copy. The mirror copies can't be local; they must be be globally visible.
Besides, Oracle RAC if for a relational database. The Hadoop cluster is not one. There are traits such as transnational integrity, multiple concurrent writes that are innate features of any modern database system. In Hadoop, these things are not present and not really necessarily. So while both are technically databases, a comparison may not be fair to either Hadoop or RAC. They are like apples and tomatoes. (Tomato is technically a fruit, just in case you are wondering about this analogy).
So, who are the players for this new branch of data processing? The major players are show below with small description. This list is by no means exhaustive. It simply is a collection of companies and products I have studied.
If the buzzing of the buzz-words surrounding any new technology annoy you and all you get is tons of websites on the topic but not a small consolidated compilation of terms, you are just like me. I was frustrated by the lack of information in a digestible form on these buzzwords that are too important to ignore but would take too much time to understand fully. This is my small effort to bridge that gap and get you going on your quest for more information. If you have 100's or 1000's of questions after reading this, I would congratulate myself - that is precisely what my objective was. For instance how HiveQL differs from SQL or how Map/Reduce jobs are written - these are questions that should be flying in your mind now. My future posts will cover them and some more topics like HBase, Zookeeper, etc. that will unravel the mysteries around the technology that is going to be commonplace in the very near future.
Welcome aboard. I wish you luck in learning. As always, your feedback will be highly appreciated.
FastStart Failover is a good solution for automatic promoting standby database to be primary and allow application to keep running in case of primary database failure. I don't want to describe whole solution here but want to focus on Observer problems I had recently.
Observer is a part of DataGuard Broker running on 3rd server and it is used to prevent a brain split between primary and standby database. Observer is started by DGMGRL program and it required a sys user credentials. Customer doesn't want to keep sys password in scripts to start and stop observer so Oracle Wallet looks like a good solution to address this issue.
Configuration overview:
- database name - TESTDB
- unique database names - TESTDB_PRIMARY and TESTDB_STANDBY
Wallet has been created and sys user with alias for primary and standby database has been added to it.
I was able to connect to both servers:
[oracle@orasvr3 ~]$ mkstore -wrl /home/oracle/testwallet -listCredential Oracle Secret Store Tool : Version 11.2.0.3.0 - Production Copyright (c) 2004, 2011, Oracle and/or its affiliates. All rights reserved. Enter wallet password: List credential (index: connect_string username) 2: TESTDB_STANDBY sys 1: TESTDB_PRIMARY sys [oracle@orasvr3 ~]$ dgmgrl DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production Copyright (c) 2000, 2009, Oracle. All rights reserved. Welcome to DGMGRL, type "help" for information. DGMGRL> connect /@TESTDB_PRIMARY Connected. DGMGRL> connect /@TESTDB_STANDBY Connected. DGMGRL>
Observer has been started and I begun failover tests. Here is a first attempt:
12:09:25.21 Thursday, May 30, 2013
Initiating Fast-Start Failover to database "TESTDB_PRIMARY"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "TESTDB_PRIMARY"
12:09:31.78 Thursday, May 30, 2013
12:10:47.91 Thursday, May 30, 2013
Initiating reinstatement for database "TESTDB_STANDBY"...
Reinstating database "TESTDB_STANDBY", please wait...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
ORA-01031: insufficient privileges
Warning: You are no longer connected to ORACLE.
Please complete the following steps and reissue the REINSTATE command:
shut down instance "TESTDB" of database "TESTDB_STANDBY"
start up and mount instance "TESTDB" of database "TESTDB_STANDBY"
12:11:06.36 Thursday, May 30, 2013
Looks like Observer was unable to shutdown TESTDB_STANDBY database. This is typical error if there is no DGMGRL services registered in listeners but I was sure network configuration is fine as it was working perfectly fine when observer has been started with user and password in connection string. So this problem has to be related to Oracle Wallet only. There is a not well know switch for DGMGRL program to enable debug mode and have some more information about issue and I decided to use it. First of all I have started with switchover tests as there need this same interaction with starting and stopping instance like failover and are faster to perform.
Here is a output with debug option enabled:
[oracle@orasvr3 ~]$ dgmgrl -debug
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /@TESTDB_PRIMARY
[W000 05/30 12:25:02.04] Connecting to database using TESTDB_PRIMARY.
[W000 05/30 12:25:02.09] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 05/30 12:25:02.10] Broker version is '11.2.0.3.0'
Connected.
DGMGRL> switchover to TESTDB_PRIMARY;
Performing switchover NOW, please wait...
New primary database "TESTDB_PRIMARY" is opening...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
[W000 05/30 12:26:23.24] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))).
ORA-01031: insufficient privileges
Warning: You are no longer connected to ORACLE.
Please complete the following steps to finish switchover:
shut down instance "TESTDB" of database "TESTDB_STANDBY"
start up instance "TESTDB" of database "TESTDB_STANDBY"
It is trying to connect to TESTDB_STANDBY using following connection description
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated)))
but this one doesn't exist in wallet. Service
I have added above connection string to Oracle Wallet and completed all manual steps before next try. Here is a output for second switchover run:
[oracle@orasvr3 ~]$ dgmgrl -debug
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect /@TESTDB_PRIMARY
[W000 06/06 12:50:31.01] Connecting to database using TESTDB_PRIMARY.
[W000 06/06 12:50:31.05] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 06/06 12:50:31.06] Broker version is '11.2.0.3.0'
Connected.
DGMGRL> switchover to TESTDB_PRIMARY;
Performing switchover NOW, please wait...
New primary database "TESTDB_PRIMARY" is opening...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
[W000 06/06 12:50:45.95] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))).
ORACLE instance shut down.
Operation requires startup of instance "TESTDB" on database "TESTDB_STANDBY"
Starting instance "TESTDB"...
[W000 06/06 12:50:47.09] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
ORA-01031: insufficient privileges
Warning: You are no longer connected to ORACLE.
Please complete the following steps to finish switchover:
start up and mount instance "TESTDB" of database "TESTDB_STANDBY"
Still there is manual step to do but this time database was shut down automatically, and Observer was unable to connect to start up database. There is other connection string
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
This one can be found here
DGMGRL> show database verbose TESTDB_STANDBY;
Database - TESTDB_STANDBY
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Real Time Query: OFF
Instance(s):
TESTDB
Properties:
DGConnectIdentifier = 'TESTDB_STANDBY'
ObserverConnectIdentifier = ''
LogXptMode = 'SYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '10'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = 'a, a'
FastStartFailoverTarget = 'TESTDB_PRIMARY'
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
SidName = 'TESTDB'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED)))'
StandbyArchiveLocation = '/oralogs/arch/TESTDB'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = 'arch_%t_%s_%r.arc'
TopWaitEvents = '(monitor)'
Database Status:
SUCCESS
It has been added to Oracle Wallet as well and now my wallet looks like this
[oracle@orasvr3 ~]$ mkstore -wrl /home/oracle/testwallet -listCredential Oracle Secret Store Tool : Version 11.2.0.3.0 - Production Copyright (c) 2004, 2011, Oracle and/or its affiliates. All rights reserved. Enter wallet password: List credential (index: connect_string username) 6: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))) sys 5: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr1)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_PRIMARY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))) sys 4: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))) sys 3: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr1)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_PRIMARY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))) sys 2: TESTDB_STANDBY sys 1: TESTDB_PRIMARY sys
After manual completion of required steps I run switchover again and this time it completed without any errors
DGMGRL> switchover to TESTDB_PRIMARY;
Performing switchover NOW, please wait...
New primary database "TESTDB_PRIMARY" is opening...
Operation requires shutdown of instance "TESTDB" on database "TESTDB_STANDBY"
Shutting down instance "TESTDB"...
[W000 05/30 12:36:51.39] Connecting to database using (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGB)(INSTANCE_NAME=TESTDB)(SERVER=dedicated))).
ORACLE instance shut down.
Operation requires startup of instance "TESTDB" on database "TESTDB_STANDBY"
Starting instance "TESTDB"...
[W000 05/30 12:36:52.54] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
[W000 05/30 12:36:52.58] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 363 Serial number: 5
ORACLE instance started.
[W000 05/30 12:36:54.81] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
[W000 05/30 12:36:54.84] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 05/30 12:36:54.84] Broker version is '11.2.0.3.0'
alter database mount
Database mounted.
[W000 05/30 12:37:02.74] Connecting to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=orasvr2)(PORT=1522))(CONNECT_DATA=(SERVICE_NAME=TESTDB_STANDBY_DGMGRL)(INSTANCE_NAME=TESTDB)(SERVER=DEDICATED))).
[W000 05/30 12:37:02.77] Checking broker version [BEGIN :version := dbms_drs.dg_broker_info('VERSION'); END;].
[W000 05/30 12:37:02.78] Broker version is '11.2.0.3.0'
Switchover succeeded, new primary is "TESTDB_PRIMARY"
DGMGRL>
Lesson learned:
Hope it will help you with DataGuard Observer configuration.
regards,
Marcin
An interesting little problem appeared on the Oracle-L mailing list earlier on this week – a query ran fairly quickly when statistics hadn’t been collected on the tables, but then ran rather slowly after stats collection even though the plan hadn’t changed, and the tkprof results were there to prove the point. Here are the two outputs (edited slightly for width – the original showed three sets of row stats, the 1st, avg and max, but since the query had only been run once the three columns showed the same results in each case):
Rows (max) Row Source Operation
---------- ---------------------------------------------------
0 UPDATE CXT_FAKT_PROVISIONSBUCHUNG (cr=2039813 pr=3010 pw=0 time=47745718 us)
15456 TABLE ACCESS FULL CXT_FAKT_PROVISIONSBUCHUNG (cr=1328 pr=1325 pw=0 time=40734 us cost=370 size=880992 card=15456)
11225 VIEW CXV_HAUPT_VU_SPARTE (cr=2038477 pr=1684 pw=0 time=47297497 us cost=10 size=4293 card=1)
11225 SORT UNIQUE (cr=2038477 pr=1684 pw=0 time=47284436 us cost=10 size=824 card=1)
126457 VIEW (cr=2038167 pr=1669 pw=0 time=27753402 us cost=9 size=824 card=1)
126457 WINDOW SORT (cr=2038167 pr=1669 pw=0 time=27667835 us cost=9 size=853 card=1)
126457 WINDOW SORT (cr=2038167 pr=1669 pw=0 time=26884699 us cost=9 size=853 card=1)
126457 NESTED LOOPS (cr=2038167 pr=1669 pw=0 time=26342292 us)
126457 NESTED LOOPS (cr=1995241 pr=1615 pw=0 time=26192173 us cost=7 size=853 card=1)
141581 NESTED LOOPS (cr=642683 pr=1066 pw=0 time=3039331 us cost=5 size=499 card=1)
11225 TABLE ACCESS BY INDEX ROWID POP_INFO (cr=22473 pr=32 pw=0 time=109767 us cost=2 size=16 card=1)
11225 INDEX UNIQUE SCAN PK_POP_INFO (cr=11248 pr=4 pw=0 time=51796 us cost=1 size=0 card=1)(object id 1790009)
141581 TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=620210 pr=1034 pw=0 time=2889850 us cost=3 size=483 card=1)
1732978 INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=140952 pr=204 pw=0 time=2094982 us cost=2 size=0 card=1)(object id 1795724)
126457 INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=1352558 pr=549 pw=0 time=23078816 us cost=1 size=0 card=1)(object id 1795724)
126457 TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=42926 pr=54 pw=0 time=94791 us cost=2 size=354 card=1)
Rows (max) Row Source Operation
---------- ---------------------------------------------------
0 UPDATE CXT_FAKT_PROVISIONSBUCHUNG (cr=89894995 pr=1701 pw=0 time=318766975 us)
15456 TABLE ACCESS FULL CXT_FAKT_PROVISIONSBUCHUNG (cr=1328 pr=1031 pw=0 time=46975 us cost=370 size=880992 card=15456)
11225 VIEW CXV_HAUPT_VU_SPARTE (cr=89893656 pr=670 pw=0 time=1553653734 us cost=11 size=4293 card=1)
11225 SORT UNIQUE (cr=89893656 pr=670 pw=0 time=1553640071 us cost=11 size=419 card=1)
126457 VIEW (cr=89893656 pr=670 pw=0 time=1533733864 us cost=10 size=419 card=1)
126457 WINDOW SORT (cr=89893656 pr=670 pw=0 time=1533646166 us cost=10 size=155 card=1)
126457 WINDOW SORT (cr=89893656 pr=670 pw=0 time=1532847028 us cost=10 size=155 card=1)
126457 NESTED LOOPS (cr=89893656 pr=670 pw=0 time=1532238656 us)
3501658 NESTED LOOPS (cr=89167767 pr=665 pw=0 time=1529652013 us cost=8 size=155 card=1)
5300707 NESTED LOOPS (cr=1339312 pr=480 pw=0 time=9657987 us cost=5 size=81 card=1)
11225 TABLE ACCESS BY INDEX ROWID POP_INFO (cr=22473 pr=32 pw=0 time=119070 us cost=2 size=16 card=1)
11225 INDEX UNIQUE SCAN PK_POP_INFO (cr=11248 pr=3 pw=0 time=54707 us cost=1 size=0 card=1)(object id 1790009)
5300707 TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=1316839 pr=448 pw=0 time=8359987 us cost=3 size=65 card=1)
5300707 INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=140971 pr=87 pw=0 time=3603882 us cost=2 size=0 card=1)(object id 1795724)
3501658 INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=87828455 pr=185 pw=0 time=1518016475 us cost=2 size=0 card=1)(object id 1795724)
126457 TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=725889 pr=5 pw=0 time=1829196 us cost=3 size=74 card=1)
As you can see, the first run took 48 seconds (time=47,745,718 us in the first line) while the second execution took 319 seconds (time=318766975 us). If you check the execution plans carefully they appear to be the same plan and, in fact, the trace file showed that the two plans had the same plan hash value. Clearly, though, they do vastly different amounts of work – the most eye-catching detail, perhaps, is the way the bad plan blows the row count up to 5 million before collapsing it back to 126,000). What do you have to do to get the change in performance (and it’s a totally reproducible change) – create or drop stats: if you have stats on the tables you get a slow execution, if you delete the stats you get the fast execution.
So what’s the problem ? Look carefully at the plan(s) – they’re not actually the same plan, but you can’t see the difference you can only see clues that they must be different. Notice in the last four lines that you access the same table (TMP_VU_SPARTE) twice using the same index (IDX_TMP_VU_SPARTE) – when you collect stats you access the two tables in the opposite order, and it makes a difference to the work you do.
To demonstrate the point I’ve created a simplified model of the problem, based on some extra information supplied in the mail thread. The model requires a correlated update, based on a view which joins a table to itself, and range-based predicates. Here’s the data generation:
execute dbms_random.seed(0) create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select trunc(dbms_random.value(0,1e4)) contract, sysdate - 300 + trunc(dbms_random.value(0,300)) date_from, sysdate - 300 + trunc(dbms_random.value(0,300)) date_to, sysdate - 300 + trunc(dbms_random.value(0,300)) created, sysdate - 300 + trunc(dbms_random.value(0,300)) replaced from generator v1, generator v2 where rownum <= 1e5 ; create index t1_i1 on t1(contract, date_from, date_to); create or replace view v1 as select t1a.* from t1 t1a, t1 t1b where t1b.contract = t1a.contract and t1b.date_from <= t1a.date_from and t1b.date_to > t1a.date_from and t1b.created < t1a.replaced and t1b.replaced > t1a.created ; rem rem We are going to update t2 from v1 using rem equality on all columns in the index rem create table t2 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select trunc(dbms_random.value(0,1e4)) contract, sysdate - 300 + trunc(dbms_random.value(0,300)) date_from, sysdate - 300 + trunc(dbms_random.value(0,300)) date_to, sysdate - 300 + trunc(dbms_random.value(0,300)) replaced from generator v1, generator v2 where rownum <= 1e5 ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt =>'for all columns size 1' ); dbms_stats.gather_table_stats( ownname => user, tabname =>'T2', method_opt =>'for all columns size 1' ); end; /
I haven’t crafted my data particularly carefully and from the point of view of realism the content is a little bizarre (I’ve got “to” dates earlier than “from” dates, for example) – but all I’m interested in is getting the right sort of shape to demonstrate a point about the plan, I’m not trying to model the actual variation in activity.
I’ve created a view with a self-join that starts with equality on first column of the index I’ve created, but uses range-based predicates on the other columns in the table – that’s probably quite important as far as the real-world performance was concerned – partly because of the nature of the data (interesting skews when comparing “valid to/from dates”) and partly because Oracle is going to have to use its 5% guess for join selectivities for range based selectivities.
And now the update – two versions with their execution plans; starting with the plan where the correlated subquery visits t1 aliased as t1a first:
explain plan for
update t2 set
replaced = (
select
/*+ leading(v1.t1a v1.t1b) */
max(replaced)
from v1
where
v1.contract = t2.contract
and v1.date_from = t2.date_from
and v1.date_to = t2.date_to
)
;
select * from table(dbms_xplan.display(null,null,'alias'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 2328412027
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 100K| 2734K| 500K (20)| 00:41:41 |
| 1 | UPDATE | T2 | | | | |
| 2 | TABLE ACCESS FULL | T2 | 100K| 2734K| 64 (8)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 72 | | |
| 4 | NESTED LOOPS | | | | | |
| 5 | NESTED LOOPS | | 1 | 72 | 4 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 36 | 2 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | T1_I1 | 1 | | 1 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | T1_I1 | 1 | | 1 (0)| 00:00:01 |
|* 9 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 36 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - UPD$1
2 - UPD$1 / T2@UPD$1
3 - SEL$F5BB74E1
6 - SEL$F5BB74E1 / T1A@SEL$2
7 - SEL$F5BB74E1 / T1A@SEL$2
8 - SEL$F5BB74E1 / T1B@SEL$2
9 - SEL$F5BB74E1 / T1B@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("T1A"."CONTRACT"=:B1 AND "T1A"."DATE_FROM"=:B2 AND
"T1A"."DATE_TO"=:B3)
8 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND
"T1B"."DATE_FROM"<=:B3)
filter("T1B"."DATE_TO">:B1 AND "T1B"."CONTRACT"="T1A"."CONTRACT" AND
"T1B"."DATE_FROM"<="T1A"."DATE_FROM" AND "T1B"."DATE_TO">"T1A"."DATE_FROM")
9 - filter("T1B"."CREATED"<"T1A"."REPLACED" AND
"T1B"."REPLACED">"T1A"."CREATED")
Notice that there’s nothing in the body of the plan that tells you which copy of t1 is visited first – you can’t tell unless you look carefully at the predicate information, or unless you’ve requested the alias section that lets you see very easily that the t1 at line 6 (the first one accessed) is aliased as t1a and the t1 at line 9 is aliased as t1b. While you’re at it, check the plan hash value from the top of the plan output.
Now the plan when we hint the two tables into the reverse order:
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 2328412027
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 100K| 2734K| 600K (17)| 00:50:01 |
| 1 | UPDATE | T2 | | | | |
| 2 | TABLE ACCESS FULL | T2 | 100K| 2734K| 64 (8)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 72 | | |
| 4 | NESTED LOOPS | | | | | |
| 5 | NESTED LOOPS | | 1 | 72 | 5 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 36 | 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | T1_I1 | 1 | | 2 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | T1_I1 | 1 | | 1 (0)| 00:00:01 |
|* 9 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 36 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - UPD$1
2 - UPD$1 / T2@UPD$1
3 - SEL$F5BB74E1
6 - SEL$F5BB74E1 / T1B@SEL$2
7 - SEL$F5BB74E1 / T1B@SEL$2
8 - SEL$F5BB74E1 / T1A@SEL$2
9 - SEL$F5BB74E1 / T1A@SEL$2
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND
"T1B"."DATE_FROM"<=:B3)
filter("T1B"."DATE_TO">:B1)
8 - access("T1B"."CONTRACT"="T1A"."CONTRACT" AND "T1A"."DATE_FROM"=:B1 AND
"T1A"."DATE_TO"=:B2)
filter("T1A"."CONTRACT"=:B1 AND "T1B"."DATE_FROM"<="T1A"."DATE_FROM" AND
"T1B"."DATE_TO">"T1A"."DATE_FROM")
9 - filter("T1B"."CREATED"<"T1A"."REPLACED" AND
"T1B"."REPLACED">"T1A"."CREATED")
The plan hash value is the same – but if you look at the alias section you can see that the t1 we access first is the one with alias t1b.
Now compare the predicate sections, just for line 7 (the initial index access line):
First Plan
7 - access("T1A"."CONTRACT"=:B1 AND "T1A"."DATE_FROM"=:B2 AND "T1A"."DATE_TO"=:B3)
Second Plan
7 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND "T1B"."DATE_FROM"<=:B3)
filter("T1B"."DATE_TO">:B1)
Although my model data is random garbage, it’s easy to imagine that with a large data set the selectivities of these two predicates could be dramatically different, and there is clearly some “real-world” meaning to the date_to/date_from columns for a given row that the optimizer is unlikely to recognise when looking at the individual column stats for the table. It’s not surprising that a plan with no stats (which results in dynamic sampling) could find a better plan than a table with stats that leaves the optimizer using its default “call it 5% and hope for the best” strategy for range-based joins.
When you reference a table more than once in an execution plan, make sure you look very carefully at the predicate section – or even call for the alias section – so that you know exactly which copy of the table appears at which point in the plan.
Although the results don’t mean much for my example, here’s my output from tracing my queries – rather than report the whole query in each case, I’ve just given the hint to show the table order:
/*+ leading(v1.t1a v1.t1b) */
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 6.01 0.00 0 200722 204119 100000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 6.01 0.00 0 200722 204119 100000
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 UPDATE T2 (cr=200773 pr=0 pw=0 time=0 us)
100000 100000 100000 TABLE ACCESS FULL T2 (cr=459 pr=0 pw=0 time=0 us cost=64 size=2800000 card=100000)
100000 100000 100000 SORT AGGREGATE (cr=200248 pr=0 pw=0 time=0 us)
0 0 0 NESTED LOOPS (cr=200248 pr=0 pw=0 time=0 us)
0 0 0 NESTED LOOPS (cr=200248 pr=0 pw=0 time=0 us cost=4 size=72 card=1)
0 0 0 TABLE ACCESS BY INDEX ROWID T1 (cr=200248 pr=0 pw=0 time=0 us cost=2 size=36 card=1)
0 0 0 INDEX RANGE SCAN T1_I1 (cr=200248 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
0 0 0 INDEX RANGE SCAN T1_I1 (cr=0 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
0 0 0 TABLE ACCESS BY INDEX ROWID T1 (cr=0 pr=0 pw=0 time=0 us cost=2 size=36 card=1)
/*+ leading(v1.t1b v1.t1a) */
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 6.09 0.00 0 613372 200000 100000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 6.09 0.00 0 613372 200000 100000
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 UPDATE T2 (cr=613372 pr=0 pw=0 time=1 us)
100000 100000 100000 TABLE ACCESS FULL T2 (cr=459 pr=0 pw=0 time=0 us cost=64 size=2800000 card=100000)
100000 100000 100000 SORT AGGREGATE (cr=612913 pr=0 pw=0 time=1 us)
0 0 0 NESTED LOOPS (cr=612913 pr=0 pw=0 time=1 us)
0 0 0 NESTED LOOPS (cr=612913 pr=0 pw=0 time=1 us cost=5 size=72 card=1)
166078 166078 166078 TABLE ACCESS BY INDEX ROWID T1 (cr=368051 pr=0 pw=0 time=0 us cost=3 size=36 card=1)
166078 166078 166078 INDEX RANGE SCAN T1_I1 (cr=202108 pr=0 pw=0 time=0 us cost=2 size=0 card=1)(object id 81082)
0 0 0 INDEX RANGE SCAN T1_I1 (cr=244862 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
0 0 0 TABLE ACCESS BY INDEX ROWID T1 (cr=0 pr=0 pw=0 time=0 us cost=2 size=36 card=1)
There isn’t a lot of difference in the run-time (which is mostly due to the tablescan) – but there’s an obvious difference in the number of buffer visits and the amount of data found for the first t1 access.
Just like the OP my plan varied with stats – though in my case I got the better plan when I had stats, and the worse plan when I deleted stats and the optimizer used dynamic sampling.
This post is part of an ongoing series:
These blog posts on thin cloning have covered a number of ways to create database thin clones, yet there are still some challenges:
Database-managed Cloning with Oracle Clonedb
Copy on Write with EMC COFW
Redirect on Write with EMC VNX ROW
Write Anywhere File System with NetApp
Allocate on Write with ZFS
Each of these technologies faces different challenges. One obvious constraint is that any solution that depends on a vendor such as EMC, NetApp, or Oracle will tie the solution only to that storage solution. Some concepts such as Clonedb and Illumos ZFS can be run on any storage but have other constraints.
One of the consistent challenges across all of these technologies is the expert knowledge and extensive manual configuration that can impede implementation of these technologies to provide database thin clones. If database thin clones reduce storage so dramatically as well as reduce cloning times drastically then the thin clone technology should have taken off across multiple industries. This technology involving filesystem snapshots has existed since the mid 1990s—almost a decade and a half ago. In 1994, StorageTek introduced the virtual disk in their Iceberg release. In 1995 Iceberg started supporting filesystem snapshots. However after 15 years, database thin cloning is still rarely used.
The answer to this riddle lies in an analogy. The analogy is the Internet: it was around years before the browser was ever used. Before browsers one could do many things that they can do today on the Internet: use email, transfer files (via FTP), go to chat rooms, and use bulletin boards. But until the browser was created most activity on the Internet was from academics. It wasn’t until the introduction of the browser that usage of the Internet exploded. In a similar way it wasn’t until the introduction of Database Virtualization that usage of database thin cloning began to explode.
Recent comments
21 weeks 1 day ago
30 weeks 6 days ago
32 weeks 4 days ago
35 weeks 6 days ago
38 weeks 1 day ago
47 weeks 4 days ago
49 weeks 1 day ago
50 weeks 1 day ago
50 weeks 2 days ago
1 year 6 days ago