Oakies Blog Aggregator

Group By Bug

This just in from OTN Database Forum – a surprising little bug with “group by elimination” exclusive to 12c.


alter session set nls_date_format='dd-Mon-yyyy hh24:mi:ss';

select
       /* optimizer_features_enable('12.1.0.1')*/
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from (
  select
        ts, max(fieldb) fieldb
  from (
  select trunc(sysdate) - 1/24 ts, 1 fieldb from dual
  union all
  select trunc(sysdate) - 2/24 ts, 2 fieldb from dual
  union all
  select trunc(sysdate) - 3/24 ts, 3 fieldb from dual
  union all
  select trunc(sysdate) - 4/24 ts, 4 fieldb from dual
  union all
  select trunc(sysdate) - 5/24 ts, 5 fieldb from dual
  )
  group by ts
)
group by trunc (ts,'DD')
/

You might expect to get one row as the answer – but this is the result I got, with the execution plan pulled from memory:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00          1
03-Sep-2014 00:00:00          5
03-Sep-2014 00:00:00          4
03-Sep-2014 00:00:00          2
03-Sep-2014 00:00:00          3

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    11 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    11  (10)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    10   (0)| 00:00:01 |
|   3 |    UNION-ALL     |      |       |       |            |          |
|   4 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

You’ll notice that I’ve got an “optimizer_features_enable()” comment in the code: if I change it into a hint I get the following (correct) result and plan:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00         15

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    12 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    12  (17)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    11  (10)| 00:00:01 |
|   3 |    HASH GROUP BY |      |     5 |    60 |    11  (10)| 00:00:01 |
|   4 |     VIEW         |      |     5 |    60 |    10   (0)| 00:00:01 |
|   5 |      UNION-ALL   |      |       |       |            |          |
|   6 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Somehow 12.1.0.2 has managed to get confused by the combination of “group by ts” and “group by trunc(ts,’DD’)” and has performed “group-by elimination” when it shouldn’t have. If you use the ‘outline’ option for dbms_xplan.display_cursor() you’ll find that the bad result reports the hint elim_groupby(@sel$1), which leads to an alternative solution to hinting the optimizer_features level. Start the code like this:


select
       /*+ qb_name(main) no_elim_groupby(@main) */
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from  ...

The (no_)elim_groupby hint appeared in v$sql_hints only in 12.1.0.2.
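
As a quick check (a sketch of my own, not from the original post), you can pull the plan of the last statement executed in your session from the cursor cache with the 'ADVANCED' (or 'OUTLINE') format and look for elim_groupby(@sel$1) in the Outline Data section:

select * from table(dbms_xplan.display_cursor(null, null, 'ADVANCED'));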

EM12c and Windows OS Failover Cluster- Keep it Simple

This is going to be a "KISS", (Keep it Simple, Silly) post.  I was offered a second Windows cluster to test on when my original one was required for QA.  It was a gracious offer, but I also found out why a cluster should be kept simple, and this cluster was anything but simple.  Many times issues are created that we never see coming until we go through the pain, and EM12c on this cluster has been both "educational" and painful.

The failover cluster is a Windows 2008 server with three Central Access Points, (CAP).  Each CAP has a drive assigned to it.  The drives are 20GB each, so they aren't TOO large when we are talking about an EM12c environment build with an OMR, (Oracle Management Repository) also residing on the same host.

It may get confusing for some folks fast, so I’ll try to build a legend to help everyone keep it all straight.

The cluster is cluster-vip.us.oracle.com.

The nodes in the cluster are node1.us.oracle.com and node2.us.oracle.com.

The CAP's are cluster-vip1.us.oracle.com, cluster-vip2.us.oracle.com and cluster-vip3.us.oracle.com.  They each have one shared drive allocated to them that can fail over:

  • F:\ = cluster-vip1.us.oracle.com
  • G:\ = cluster-vip2.us.oracle.com
  • H:\ = cluster-vip3.us.oracle.com

This should start sounding like an Abbott and Costello skit pretty quick, so I’ll try to explain as much as I can and hope that each reader has some knowledge of Windows clustering… :)

Each failover cluster CAP will appear like this in the cluster manager:

[Image: cluster_f1]

Due to the size of each shared volume, we're going to install the database on cluster-vip3, (H:\ volume) and the EM12c console and repository on cluster-vip2, (G:\).  Cluster-vip1, (F:\) will be left available for database growth.

As our first step for installing on a Windows cluster, we need to set the %ORACLE_HOSTNAME%, (or for Unix/Linux, $ORACLE_HOSTNAME) environment variable.  For the database, it's going to be cluster-vip3.us.oracle.com, as that is the hostname associated with the volume, (H:\) we wish to install on, (anyone already seeing the dark road we are starting down? :))

After the installation of the Oracle binaries, (software) and the database is complete, you need to verify that the listener is configured correctly and that the database can be connected to via TNS, ("sqlplus / as sysdba" won't cut it; EM12c must be able to connect with "sys/system/sysman@<tns_alias>").

This will most likely require a manual edit of the tnsnames.ora file to replace the hostname in the connection string with the IP Address:

EMREP =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = <ip-address>)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = EMREP)
    )
  )

LISTENER_EMREP =
  (ADDRESS = (PROTOCOL = TCP)(HOST = <ip-address>)(PORT = 1521))

Now you might say, “but they are all on the same host!”, but the cluster doesn’t see it this way due to the way the CAP’s have all been configured.  Each drive is a different CAP and must be identified by their hostname or IP address.

Now here's where the fun starts-  You now need to configure the %ORACLE_HOSTNAME% environment variable to cluster-vip2.us.oracle.com for the EM12c installation.  Remember, our EM12c is going to reside on the G:\ volume of shared storage.

So we now set our environment variable, restart the cluster, bring the cluster services all back online on node1, bring up the database and listener, and then start the EM12c installation.  The install will auto-populate the node1 name, but you need to change it to the %ORACLE_HOSTNAME% environment variable, which will now be set to cluster-vip2.us.oracle.com:

[Image: cluster_f5]

It will verify the new ORACLE_HOSTNAME, (unless this matches the ORACLE_HOSTNAME environment variable, it will not succeed!) and the installation then asks for information about the database you want to use for your repository that is actually identified as being on a different VIP:

[Image: cluster_f6]

There is a common step that will ask you to uninstall the dbcontrol schema so that the EM12c SYSMAN schema can be installed, even if you created your repository database, (OMR) without dbcontrol.

The command looks like this:

/bin/emca -deconfig dbcontrol db -repos drop -SYS_PWD -SYSMAN_PWD

If you attempt to run this with the ORACLE_HOSTNAME set to the correct address for the G:\ drive, (cluster-vip2.us.oracle.com), but the database is installed on the H:\ drive, (cluster-vip3.us.oracle.com), the following error will be seen:

[Image: cluster_f2]

It will act as if it’s run the removal, but if you log back in, the SYSMAN schema is still intact and no removal is performed.  The trick here is that the command should be copied or saved off to run again from the command line and the installation for the EM12c should be cancelled.

Why?  This step has to be performed with the %ORACLE_HOSTNAME% set to the database VIP again, (cluster-vip3.us.oracle.com) instead of the EM12c VIP, (cluster-vip2.us.oracle.com).

[Image: cluster_f3]

There are a number of steps to complete this:

  1. cycle node1 of the cluster to set the environment variable.
  2. bring the cluster services back to node1 that would have failed over to node2.
  3. bring the database and listener services back up.
  4. start a command prompt and run the command to remove dbcontrol.
  5. set the environment variable back to cluster-vip2 for the EM12c installation.
  6. cycle node1 to put the environment variable back in place.
  7. bring the cluster services back over to node1, (there are three services, one for each drive, not just one with this configuration.)
  8. bring the database and listener back up, (again!)
  9. restart the EM12c installation.

Yeah, I wasn't too thrilled, either!  I was now able to run through the installation without being hindered by the dbcontrol step that is automatically hit, even if I chose NOT to install dbcontrol when I created the database… :)  The rest of the installation to node2 can be completed and failover testing can begin!

So, the moral of the story-  Don’t create three VIP’s for three shared storage drives on one failover cluster.  It’s just going to annoy you, your installation and even more so, me, if you come and ask me for assistance… :)

[Image: cluster_f4]

Copyright © DBA Kevlar [EM12c and Windows OS Failover Cluster- Keep it Simple], All Right Reserved. 2014.

5 secrets 10% of Fortune 500 use to double application development output

This article liberally uses sections from The Threat That Lies Within by Yaniv Yehuda.

Databases receive much attention so that they do not become THE risk factor of IT: budgeting for DRPs, backups, top-of-the-line DBAs – and yet they still pose a major threat. Why?

Agile Development

In a fast-moving competitive market, if your competitor delivers relevant products, faster and with better quality, you’re eventually going to lose market share. This is exactly why companies need to be agile.  They need to have better control over information, make quicker decisions, accelerate software delivery, and optimize quality control.

To answer revenue-driving business questions, companies are constantly improving existing applications and creating new ones. Application development depends upon provisioning environments for developers and QA. Once developers and QA teams are up and running on these environments, code development requires a means of managing changes and deploying them.  The slower and more costly it is to provision and manage the development environments, the more delays and bugs there will be in the applications, and the less revenue businesses will generate.

The Database Bottleneck

One of the slowest, most difficult and costliest steps in application development is provisioning copies of and managing changes in databases for development and QA.

Code management has become straightforward with the use of source control tools such as Git, SVN and Perforce. Provisioning development environments has become efficient with Chef, Puppet and Jenkins.

But what about the database? How does one provision environments with multi-terabyte databases and version control these databases as changes are made by multiple different developers and as multiple versions of code and database states need to be maintained? For all the advances in application development such as machine virtualization, agile development and DevOps practices, the database still stands as a huge barrier to development agility.

Database branching, development, merges and deployments are tricky; unlike code, a database is not a collection of files. Following best practices using proven file based tools often fails when dealing with the database – the container of our most valued asset – the business data. Too often the database is left behind and becomes the weakest link in the chain.

The difference between database development and application code development is so great, that in time, silos were created.

While application developers are moving forward, adopting great new tools and practicing Agile, automation, and continuous delivery, database development is done using a less controlled process. More craft than engineering, database development is often disconnected from the entire SDLC process. Tools, processes, and best practices are not shared.

The database, being such a valued asset, can very easily become that wobbly wheel that destabilizes the whole car.

Database Agility: Key to Fast Development

Databases can be managed just as efficiently as code.

In order to minimize the risks databases pose during the development cycle, the following should be addressed:

  1. Agile – implement database changes as part of short-cycle, task-based development. Deploy code changes and database changes hand-in-hand based on change requests. To accomplish this, use tools such as Jira (http://www.atlassian.com/software/jira).
  2. Virtual – virtualize your application stack and databases. Data virtualization can be accomplished with Delphix (http://kylehailey.com/delphix). Virtualized data and databases can be created in minutes no matter the size, can be branched, and can be made via a self-service interface or by automated systems using APIs. A virtualized copy of a production environment will allow you to test production deployments, true to production's latest data, and allow data to be developed and tested in parallel in multiple environments. With this change, companies have doubled their application development output. Delphix also includes data masking in its solution to ensure data security in each virtual database.
  3. Change Management – implement database tools to make sure database changes are properly managed as part of the SDLC. By using DBmaestro (http://www.dbmaestro.com/) to enforce version control and to track and merge all changes, you'll always have full control over who is doing what, where, and why. Once a development cycle has concluded, relevant changes can be automatically deployed into integration environments, identifying change conflicts and merging them, or pushing them to production.
  4. Testing automation – following change-focused testing and making sure nothing breaks as a result of changes will provide a solid safety net. Running unit tests with Oracle SQL Developer (http://www.thatjeffsmith.com/archive/2014/04/unit-testing-your-plsql-with-oracle-sql-developer/) or the Microsoft-focused tSQLt (http://tsqlt.org/) is a cost-free way to introduce unit testing for the database. Testing your database along with the application code as one unit, using TestComplete (http://smartbear.com/) to validate your common – and later less-common – scenarios, will provide great confidence in your ability to deliver releases.
  5. Release Automation – the more you automate, the more accurately you can reproduce the changes introduced in development, test them in testing environments, and repeat safely when going to production. A healthy SDLC with minimum risk and maximum efficiency can be achieved once you automate its processes and address its challenges on a daily basis. Using IBM UrbanCode Deploy (http://www-03.ibm.com/software/products/en/ucdep) can help manage your release activities and your configurations, and orchestrate the overall process.

Best of all,  these types of solutions play very nicely with each other. So creating a virtual development branch with Delphix and masking its confidential content, managing development tasks with Jira, version controlling changes and packaging a release with DBmaestro, automatically testing this release with tSQLt or SQL Developer, and orchestrating the release process with uDeploy can all be combined into one slick, automated, and safe process.

Summary

The database presents a real challenge when it comes to following best practices. This is exactly how silos are created. A development department might follow different processes when dealing with code development vs. database development, or follow best practices for only some of its change efforts.

The best way to eliminate major risks and bottlenecks is to virtualize and automate. Using the above virtualization and automation tools, companies have doubled their application development output while at the same time eliminating bugs and increasing quality.

Teach Your Children Well


Today's the first day of school in my city, and the plethora of "OMG 1st day of school!" posts on Facebook and my own kids heading back in to slog through another year of fundamentals got me thinking about education. My own kids (12/daughter and 10/son) came home and went swimming since it was about 104 degrees today…the hottest day of the summer, as it were.

Anyways, after that activity my son decided to get down to business. And by business I mean: play Minecraft. Some of my friends and family have banned Minecraft during the school week or even as a whole, because of the incredibly addictive nature of the game. I elected instead to make my son play it on the computer instead of the Xbox or iDevice. See, on those systems it’s slightly more mindless (yet still very creative); you just pick the things you want to make and it makes them, and the game is what the game is because it’s on a console. On the computer, you have to memorize the ingredients and patterns necessary to make objects (from making andesite to baking cakes). You can mod the game with a variety of amazing mods out there in the community. You can play on servers that have goals ranging from “kill everyone you see” to “let’s build something incredible together.” I like that. It’s like legos, except with social implications, unlimited blocks of every kind, and electronics lessons all rolled into one.

What’s more, in the last few months my son has learned to install Java, use basic DOS, modify heap parameters, create a .BAT file, and many other cool things. Add that to his foray into the world of cryptocurrency, and he’s growing a set of very valuable skills. My daughter’s no slouch either, she spent a couple years on Ubuntu Linux and actually came to like it quite a bit!

Okay, so enough bragging on my kids. They’re goofballs anyways.

Teach Kids Cool Things

A while back I posted about how I became a DBA while in High School. When I was 18, I offered to help someone at a community college with a presentation to teach college students about programming. I remember them telling me that it was a ridiculous concept, because kids can’t learn those kinds of things anyways. It struck a nerve as you might imagine. Kids can learn highly technical skills if someone is willing to teach them. And with the proliferation of technical gadgets and abilities in the world, they want to learn. You can even start them young…REALLY young.

There are some great resources out there beyond build-a-world-with-blocks games like Minecraft. There are simple learning sites like phpforkids.com, highly interactive and informative classes like Code Academy, and even specially made storyboarding and design programs like Scratch by MIT.

For books with associated exercises, we have 3D Game Programming for Kids, Learn to Program with Minecraft Plugins (Java and Minecraft!), and even Python with Python for Kids: A Playful Introduction to Programming.

Babies don’t get a pass from learning! Code Babies has a set of excellent introductions to programming including Web Design, CSS, and HTML. A is NOT for “Aardvark,” it’s for “Anchor Tag”!

There are even online courses like Learn JAVA with Minecraft (my son is doing this one and enjoys it), and summer camps as well, to teach kids online and in person.

Even Oracle has a part to play in the education of youngsters with Oracle Academy. Oracle Academy has resources for all levels of learning, contests, and even a self-study programming environment and course called Greenfoot that teaches Java from zero experience with 2D gaming design. I’ve had the privilege of judging some of their contests in the past and was astounded by the skills these young people had in highly advanced technology.

This is so awesome. I really wish these sorts of things were around when I was a kid. If you have children and want them to learn how to tech, the resources are plentiful and amazing.

The post Teach Your Children Well appeared first on Oracle Alchemist.

Who is using this index?

Or, to put it another way, I want to change or drop this index, who and what will I impact?

The Challenge 

The problem that I am going to outline is certainly not exclusive to PeopleSoft, but I am going to illustrate it with examples from PeopleSoft. I often find tables with far more indexes than are good for them.
  • The Application Designer tool makes it very easy for developers to add indexes to tables. Sometimes, too easy!
  • Sometimes, DBAs are too quick to unquestioningly follow the advice of the Oracle tuning advisor to add indexes.

Recently, I have been working on 3 different PeopleSoft Financials systems where I have found major tables with a host of indexes.

There are several concerns:

  • Indexes are maintained during data modification. The more indexes you have, the greater the overhead. 
  • The more indexes you have, particularly if they lead on the same columns, the more likely Oracle is to use the wrong one, resulting in poorer performance.
  • There is of course also a space overhead for each index, but this is often of less concern. 

If you can get rid of an index, Oracle doesn't store, maintain or use it. 

In some cases, I have wanted to remove unnecessary indexes, and in others to adjust indexes. However, this immediately raises the question of where are these indexes used, and who will be impacted by the change. Naturally, I turn to the Active Session History (ASH) to help me find the answers. 

Index Maintenance Overhead during DDL 

ASH reports the object number, file number, block number and (from 11g) row number within the block being accessed by physical file operations. However, the values reported in v$active_session_history (and later other views) are not reliable for other events because they are merely left over from the previous file event that reported them. So, we can profile the amount of time spent on physical I/O on different tables and indexes, but not for other forms of DB Time, such as CPU time, spent accessing the blocks in the buffer cache.
Let me take an extreme example from PeopleSoft Global Payroll. The table PS_GP_RSLT_ACUM is one of the principal result tables. It has only a single unique index (with the same name). The table is populated with the simplest of insert statements.
INSERT /*GPPRDMGR_I_ACUM*/ INTO PS_GP_RSLT_ACUM
(CAL_RUN_ID ,EMPLID ,EMPL_RCD ,GP_PAYGROUP ,CAL_ID ,RSLT_SEG_NUM ,PIN_NUM ,EMPL_RCD_ACUM ,ACM_FROM_DT ,ACM_THRU_DT ,USER_KEY1 ,USER_KEY2 ,USER_KEY3 ,USER_KEY4 ,USER_KEY5 ,USER_KEY6 ,SLICE_BGN_DT ,SLICE_END_DT ,COUNTRY ,ACM_TYPE ,ACM_PRD_OPTN ,CALC_RSLT_VAL ,CALC_VAL ,USER_ADJ_VAL ,PIN_PARENT_NUM ,CORR_RTO_IND ,ORIG_CAL_RUN_ID ,SEQ_NUM8 ,VALID_IN_SEG_IND ,CALLED_IN_SEG_IND )
VALUES
(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,:26,:27,:28,:29,:30)

I can profile the ASH data for just this statement over the last week on a production system. Note that DBA_OBJECTS and DBA_DATA_FILES are outer joined to the ASH data and only matched for events like 'db file%'.

SELECT o.object_type, o.object_name
, f.tablespace_name, NVL(h.event,'CPU+CPU Wait') event
, SUM(10) ash_Secs
FROM dba_hist_Active_sess_history h
LEFT OUTER JOIN dba_objects o
ON o.object_id = h.current_obj#
AND h.event like 'db file%'
LEFT OUTER JOIN dba_data_files f
ON f.file_id = h.current_file#
AND h.event like 'db file%'
WHERE h.sql_id = '4ru0618dswz3y'
AND h.sample_time >= sysdate-7
GROUP BY o.object_type, o.object_name, h.event, f.tablespace_name
ORDER BY ash_secs DESC
/

A full payroll calculation inserts over 3 million rows on this particular system. The calculation is run incrementally several times per week during which old rows are deleted and newly recalculated rows inserted.  Looking at just this insert statement:

  • 30% of the time is spent on CPU operations; we cannot profile that time further with ASH.
  • 38% of the time is spent reading from the table and index, yet this is a simple INSERT … VALUES statement.
OBJECT_TYPE         OBJECT_NAME        TABLESPACE_NAME EVENT                      ASH_SECS
------------------- ------------------ --------------- ------------------------ ----------
                                                        CPU+CPU Wait                   1040
                                       UNDOTBS1         db file sequential read         900
INDEX SUBPARTITION  PS_GP_RSLT_ACUM    GP201408IDX      db file sequential read         750
TABLE SUBPARTITION  PS_GP_RSLT_ACUM    GP201408TAB      db file sequential read         550
                                                        gc current grant 2-way           70
                                                        cursor: pin S wait on X          60
                                                        db file sequential read          10
                                                        buffer exterminate               10
                                                        row cache lock                   10
                                                                                 ----------
                                                                                       3400

More time is spent reading the index than the table.  That is not a surprise.  When you insert a row into a table, you also insert it into the index. Rows in index leaf blocks are ordered by the key columns, and the new entry must go into the right place, so you have to read down the index from the root block, through the branch blocks, to find the correct leaf block for the new entry.
[Digression: Counter-intuitively index compression can improve DML performance. It does for this index.  The overhead of the compression processing can be outweighed by the saving in physical I/O.  It depends.]
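
By way of illustration (a sketch of my own, not from the original post), key compression is a single DDL operation; the prefix length of 2 here is only an assumption and would need validating (for example with ANALYZE INDEX ... VALIDATE STRUCTURE) before use:

ALTER INDEX ps_gp_rslt_acum REBUILD COMPRESS 2;   -- prefix length 2 is illustrative only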

Profile Physical I/O by Object 

I can twist this query around and profile DB_TIME by object for 'db file%' events.

SELECT o.object_type, o.object_name, sum(10) ash_secs
FROM dba_hist_active_sess_history h
, dba_objects o
WHERE o.object_id = h.current_obj#
AND h.event LIKE 'db file%'
AND h.sample_time > sysdate-7
GROUP BY o.object_type, o.object_name
ORDER BY ash_Secs DESC

Now I can see upon which objects the most time is spent on physical I/O.

OBJECT_TYP OBJECT_NAME          ASH_SECS
---------- ------------------ ----------
TABLE      PS_ITEM                101130
INDEX      PS_WS_ITEM              98750
TABLE      PS_PROJ_RESOURCE        97410
TABLE      PS_BI_LINE              85040
INDEX      PSAPSAPMSGSUBCON        75070
TABLE      PS_BI_HDR               37230
TABLE      PS_RS_ASSIGNMENT        29460
INDEX      PS_PSAPMSGPUBHDR        23230
INDEX      PS_BI_ACCT_ENTRY        21490
TABLE      PS_VOUCHER              21330
TABLE      PS_VCHR_ACCTG_LINE      21250
TABLE      PS_BI_ACCT_ENTRY        18860

                              ----------
sum                              1382680

This is a worthwhile exercise in its own right; it shows the sources of physical I/O in an application.

However, if you want to find where an index is used, then this query will also identify SQL_IDs where the index is either used in the query or maintained by DML. If I am interested in looking for places where changing or deleting an index could have an impact then I am only interested in SQL query activity. ASH samples which relate to index maintenance are a false positive. Yet, I cannot simply eliminate ASH samples where the SQL_OPNAME is not SELECT because the index may be used in a query within the DML statement.

Another problem with this method is that it matches SQL to ASH by object ID. If someone has rebuilt an index, then its object number changes.
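
As an aside (an illustrative query of my own, not from the original post), you can get a feel for how much ASH data points at objects that no longer exist, typically because they have been dropped or rebuilt since the samples were taken:

SELECT DISTINCT h.current_obj#
FROM   dba_hist_active_sess_history h
       LEFT OUTER JOIN dba_objects o
       ON     o.object_id = h.current_obj#
WHERE  h.sample_time > SYSDATE - 7
AND    h.current_obj# > 0
AND    o.object_id IS NULL
/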

A different approach is required.

Index Use from SQL Plans Captured by AWR 

During an AWR snapshot the top-n SQL statements by each SQL criterion in the AWR report (Elapsed Time, CPU Time, Parse Calls, Shareable Memory, Version Count) are captured; see dbms_workload_repository. The SQL plans are exposed by the view DBA_HIST_SQL_PLAN.

On PeopleSoft systems, I generally recommend decreasing the snapshot interval from the default of 60 minutes to 15. The main reason is that SQL gets aged out of the library cache very quickly in PeopleSoft systems. They generate lots of dynamic code, often with literal values rather than bind variables. Cursor sharing is not recommended for PeopleSoft, so different bind variables result in different SQL_IDs. The dynamic code also results in different SQL_IDs even with cursor sharing. Therefore, increasing the snapshot frequency means that AWR will capture more SQL statements. This will increase the total volume of the AWR repository simply because there are more snapshots. However, the overall volume of ASH data captured does not change; it just gets copied to the repository earlier.
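
For reference (a sketch of my own, not from the original post), the snapshot interval, and the longer retention discussed towards the end of this post, are set with dbms_workload_repository; the 400-day figure is only an example:

BEGIN
  dbms_workload_repository.modify_snapshot_settings(
    interval  => 15,             -- minutes between snapshots
    retention => 400 * 24 * 60   -- retention in minutes (400 days; see the SYSAUX caveat later)
  );
END;
/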

On DBA_HIST_SQL_PLAN the object ID, owner, type and name are recorded, so I can find the plans which referenced a particular object. I am going to carry on with the example from a PeopleSoft Financials system, and look at indexes on the PS_PROJ_RESOURCE table.

These are some of the indexes on PS_PROJ_RESOURCE. We have 4 indexes that all lead on PROCESS_INSTANCE. I suspect that not all are essential, but I need to work out what is using them, and which one I should retain.

                          Col
INDEX_NAME                Pos COLUMN_NAME          COLUMN_EXPRESSION
------------------ ---------- -------------------- ----------------------------------

PSJPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT_GL
                            3 BUSINESS_UNIT
                            4 PROJECT_ID
                            5 ACTIVITY_ID
                            6 CUST_ID

PSLPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 EMPLID
                            3 EMPL_RCD
                            4 TRANS_DT

PSMPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT
                            3 PROJECT_ID
                            4 ACTIVITY_ID
                            5 RESOURCE_ID

PSNPROJ_RESOURCE            1 PROCESS_INSTANCE
                            2 BUSINESS_UNIT
                            3 TIME_RPTG_CD
I find it easier to extract the ASH data to my own working storage table. For each index on PS_PROJ_RESOURCE, I am going to extract a distinct list of plan hash values. I will then extract all ASH data for those plans. Note, that I have not joined the SQL_ID on DBA_HIST_SQL_PLAN. That is because different SQL_IDs can produce the same execution plan. The plan is equally valid for all SQL_IDs that produce the plan, not just the one where the SQL_ID also matches.

DROP TABLE my_ash purge
/
CREATE TABLE my_ash COMPRESS AS
WITH p AS (
SELECT DISTINCT p.plan_hash_value, p.object#, p.object_owner, p.object_type, p.object_name
FROM dba_hist_sql_plan p
WHERE p.object_name like 'PS_PROJ_RESOURCE'
AND p.object_type LIKE 'INDEX%'
AND p.object_owner = 'SYSADM'
)
SELECT p.object# object_id, p.object_owner, p.object_type, p.object_name
, h.*
FROM dba_hist_active_sess_history h
, p
WHERE h.sql_plan_hash_value = p.plan_hash_value
/

I am fortunate that PeopleSoft is a well instrumented application. Module and Action are set to fairly sensible values that will tell me whereabouts in the application the ASH sample relates. In the following query, I have omitted any ASH data generated by SQL*Plus, Toad, or SQL Developer, and also any generated by Oracle processes to prevent statistics collection jobs being included.

Set pages 999 lines 150 trimspool on
break on object_name skip 1
compute sum of ash_secs on object_name
column ash_secs heading 'ASH|Secs' format 9999999
column module format a20
column action format a32
column object_name format a18
column max_sample_time format a19 heading 'Last|Sample'
column sql_plans heading 'SQL|Plans' format 9999
column sql_execs heading 'SQL|Execs' format 99999
WITH h AS (
SELECT object_name
, CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
END as module
, CASE WHEN h.action LIKE 'PI=%' THEN NULL
ELSE h.action
END as action
, CAST(sample_time AS DATE) sample_time
, sql_id, sql_plan_hash_value, sql_exec_id
FROM my_ash h
)
SELECT object_name, module, action
, sum(10) ash_secs
, COUNT(DISTINCT sql_plan_hash_value) sql_plans
, COUNT(DISTINCT sql_id||sql_plan_hash_value||sql_exec_id) sql_execs
, MAX(sample_time) max_sample_time
FROM h
WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND NOT LOWER(module) LIKE 'sql%'
GROUP BY object_name, module, action
ORDER BY SUBSTR(object_name,4), object_name, ash_Secs desc
/
Spool off

I now have a profile of how much each index is used. In this particular case I found something using every index.  It is possible that you will not find anything that uses some indexes.

                                                                            ASH   SQL    SQL Last
OBJECT_NAME        MODULE               ACTION                             Secs Plans  Execs Sample
------------------ -------------------- -------------------------------- ------- ----- ------ -------------------
PSJPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step24.S         7300     1     66 06:32:57 27/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step24.S           40     1      2 08:38:57 22/08/2014
******************                                                        -------
sum                                                                           7340

PSLPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step28.S         1220     1     53 06:33:17 27/08/2014
******************                                                        -------
sum                                                                           1220

PSMPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.XxBiEDM.Step07.S           60     2      6 18:35:18 20/08/2014
******************                                                        -------
sum                                                                             60

PSNPROJ_RESOURCE   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step26.S         6720     1     49 18:53:58 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step30.S         3460     1     60 06:33:27 27/08/2014
                   GFCOA_CMSN           GFCOA_CMSN.01INIT.Step01.S           2660     1     47 19:19:40 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step06.S         1800     1     52 18:53:28 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeG.Step01.S         1740     1     61 06:34:17 27/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step02.S         1680     1     24 18:53:18 26/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step10.S         1460     1     33 17:26:26 22/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step08.S          920     1     26 17:26:16 22/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step36.S          460     1     18 18:26:38 20/08/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Step09.S          420     1     16 06:33:07 27/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeG.Step01.S          200     1     10 08:09:55 22/08/2014
                   PC_AP_TO_PC          GFCPBINT_AE.CallmeH.Step00A.S         170     1     17 21:53:26 21/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step36.S           20     1      1 08:02:46 05/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step30.S           20     1      1 13:42:48 04/08/2014
                   PC_PRICING           GFCPBINT_AE.CallmeA.Step06.S           20     1      1 15:58:35 28/07/2014
                   PC_TL_TO_PC          GFCPBINT_AE.CallmeA.Pseudo.S           20     1      1 19:45:11 06/08/2014
******************                                                        -------
sum                                                                          21770

The next stage is to look at individual SQL statements. This query looks for which SQL statement is using a particular index on PROJ_RESOURCE. If I can't find the SQL which cost the most time, then I just choose another SQL with the same plan.

  • I have found that sometimes a plan is captured by AWR, but the SQL statement is not. Personally, I think that is a bug. Working around it has made the following query quite complicated.
Break on object_name skip 1
column ash_secs heading 'ASH|Secs' format 9999999
Set long 50000
Column cmd Format a200
Spool dmk

WITH h AS (
SELECT h.object_name
, CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
END as module
, CASE WHEN h.action LIKE 'PI=%' THEN NULL
ELSE h.action
END as action
, h.sql_id, h.sql_plan_hash_value
, t.command_type --not null if plan and statement captured
FROM my_ash h
LEFT OUTER JOIN (
SELECT t1.*
FROM dba_hist_sqltext t1
, dba_hist_sql_plan p1
WHERE t1.sql_id = p1.sql_id
AND p1.id = 1
) t
ON t.sql_id = h.sql_id
AND t.dbid = h.dbid
WHERE h.object_name IN('PSMPROJ_RESOURCE')
AND h.object_Type = 'INDEX'
AND h.object_owner = 'SYSADM'
And NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND NOT LOWER(module) LIKE 'sql%'
), x AS ( --aggregate DB time by object and statement
SELECT object_name, sql_id, sql_plan_hash_value
, sum(10) ash_secs
, 10*COUNT(command_type) sql_secs --DB time for captured statements only
FROM h
WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
AND NOT LOWER(module) LIKE 'sql%'
GROUP BY object_name, sql_id, sql_plan_hash_value
), y AS ( --rank DB time per object and plan
SELECT object_name, sql_id, sql_plan_hash_value
, ash_secs
, SUM(ash_secs) OVER (PARTITION BY object_name, sql_plan_hash_value) plan_ash_secs
, row_number() OVER (PARTITION BY object_name, sql_plan_hash_value ORDER BY sql_Secs DESC) ranking
FROM x
), z AS (
SELECT object_name
, CASE WHEN t.sql_text IS NOT NULL THEN y.sql_id
ELSE (SELECT t1.sql_id
FROM dba_hist_sqltext t1
, dba_hist_sql_plan p1
WHERE t1.sql_id = p1.sql_id
AND p1.plan_hash_value = y.sql_plan_hash_value
AND rownum = 1) --if still cannot find statement just pick any one
END AS sql_id
, y.sql_plan_hash_value, y.plan_ash_secs
, CASE WHEN t.sql_text IS NOT NULL THEN t.sql_text
ELSE (SELECT t1.sql_Text
FROM dba_hist_sqltext t1
, dba_hist_sql_plan p1
WHERE t1.sql_id = p1.sql_id
AND p1.plan_hash_value = y.sql_plan_hash_value
AND rownum = 1) --if still cannot find statement just pick any one
END AS sql_text
from y
left outer join dba_hist_sqltext t
on t.sql_id = y.sql_id
WHERE ranking = 1 --captured statement with most time
)
SELECT *
--'SELECT * FROM table(dbms_xplan.display_awr('''||sql_id||''','||sql_plan_hash_value||',NULL,''ADVANCED''))/*'||object_name||':'||plan_ash_Secs||'*/;' cmd
FROM z
ORDER BY object_name, plan_ash_secs DESC
/
Spool off

So now I can see the individual SQL statements.

PSJPROJ_RESOURCE   f02k23bqj0xc4  3393167302  7340 UPDATE PS_PROJ_RESOURCE C SET (C.Operating_Unit, C.CHARTFIELD1, C.PRODUCT, C.CLA
                                                   SS_FLD, C.CHARTFIELD2, C.VENDOR_ID, C.contract_num, C.contract_line_num, …

PSLPROJ_RESOURCE   2fz0gcb2774y0   821236869  1220 UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM ps_job j WH
                                                   ERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND j.effdt = ( SELECT MAX (…

PSMPROJ_RESOURCE   96cdkb7jyq863   338292674    50 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
                                                   PS_PROJ_RESOURCE x WHERE x.process_instance = …

                   1kq9rfy8sb8d4  4135884683    10 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
                                                   PS_PROJ_RESOURCE x WHERE x.process_instance = …

PSNPROJ_RESOURCE   ga2x2u4jw9p0x  2282068749  6760 UPDATE PS_PROJ_RESOURCE P SET (P.RESOURCE_TYPE, P.RESOURCE_SUB_CAT) = …

                   9z5qsq6wrr7zp  3665912247  3500 UPDATE PS_PROJ_RESOURCE P SET P.TIME_SHEET_ID = …

If I replace the last select clause with the commented line, then I can generate the commands to extract the statement and plan from the AWR repository.

SELECT * FROM table(dbms_xplan.display_awr('45ggt0yfrh5qp',3393167302,NULL,'ADVANCED'))/*PSJPROJ_RESOURCE:7340*/;

SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;

SELECT * FROM table(dbms_xplan.display_awr('96cdkb7jyq863',338292674,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:50*/;

SELECT * FROM table(dbms_xplan.display_awr('1kq9rfy8sb8d4',4135884683,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:10*/;

SELECT * FROM table(dbms_xplan.display_awr('ga2x2u4jw9p0x',2282068749,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:6760*/;
SELECT * FROM table(dbms_xplan.display_awr('9z5qsq6wrr7zp',3665912247,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3500*/;
SELECT * FROM table(dbms_xplan.display_awr('b28btd6k3x8jt',1288409804,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3060*/;
SELECT * FROM table(dbms_xplan.display_awr('avs70c19khxmw',2276811880,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:2660*/;
SELECT * FROM table(dbms_xplan.display_awr('b78qhsch85g4a',1019599680,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1960*/;
SELECT * FROM table(dbms_xplan.display_awr('65kq2v1ubybps',3138703971,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1820*/;
SELECT * FROM table(dbms_xplan.display_awr('1dj17ra70c801',1175874548,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1460*/;
SELECT * FROM table(dbms_xplan.display_awr('3w71v896s7m5d',3207074729,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:500*/;
SELECT * FROM table(dbms_xplan.display_awr('35mz5bw7p5ubw',2447377432,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:170*/;

Ultimately, I have needed to look through the SQL plans that use an index to decide whether I need to keep that index or decide whether the statement would perform adequately using another index. In this case, on this particular system, I think the index PSMPROJ_RESOURCE would be adequate for this statement, and I would consider dropping PSLPROJ_RESOURCE.

SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;
--------------------
UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM
ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND
j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE j1.emplid =
j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <= p.trans_dt) AND
j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2 WHERE j2.emplid =
j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt = j.effdt)),
p.deptid )
WHERE p.process_instance = …
AND EXISTS ( SELECT
j.deptid FROM ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd =
p.empl_rcd AND j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE
j1.emplid = j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <=
p.trans_dt) AND j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2
WHERE j2.emplid = j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt =
j.effdt))

Plan hash value: 821236869

-----------------------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT       |                  |       |       | 63104 (100)|          |
|   1 |  UPDATE                | PS_PROJ_RESOURCE |       |       |            |          |
|   2 |   INDEX RANGE SCAN     | PSLPROJ_RESOURCE |   365 | 11315 |    22   (0)| 00:00:01 |
|   3 |    INDEX RANGE SCAN    | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
|   4 |     SORT AGGREGATE     |                  |     1 |    20 |            |          |
|   5 |      INDEX RANGE SCAN  | PSAJOB           |     1 |    20 |     3   (0)| 00:00:01 |
|   6 |     SORT AGGREGATE     |                  |     1 |    23 |            |          |
|   7 |      INDEX RANGE SCAN  | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
|   8 |    INDEX RANGE SCAN    | PSAJOB           |     1 |    29 |     3   (0)| 00:00:01 |
|   9 |     SORT AGGREGATE     |                  |     1 |    20 |            |          |
|  10 |      INDEX RANGE SCAN  | PSAJOB           |     1 |    20 |     3   (0)| 00:00:01 |
|  11 |     SORT AGGREGATE     |                  |     1 |    23 |            |          |
|  12 |      INDEX RANGE SCAN  | PSAJOB           |     1 |    23 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

I carried on with examination of SQL statements and execution plans to determine whether each index is really needed or another index (or even no index at all) would do as well.  This decision also requires some background knowledge about the application. Eventually, I decided that I want to drop the J, L and N indexes on PROJ_RESOURCE and just keep M.

Limitations of Method

  • AWR does not capture all SQLs, nor all SQL plans. First the SQL has to be in the library cache, and then it must be one of the top-n. A SQL that is efficient because it uses an appropriate index may not be captured, and so will not be detected by this approach. This might lead you to erroneously believe that the index could be dropped.

  • ASH data is purged after a period of time, by default 31 days. If an index is only used by a process that has not run within the retention period, then it will not be detected by this approach. This is another reason to retain ASH and AWR in a repository for a longer period. I have heard 400 days suggested, so that you have ASH for a year and a month.

      • However, this also causes the SYSAUX tablespace to become very large, so I would suggest regularly moving the data to a separate database. I know one customer who has built a central AWR repository for all their production and test databases and automated regular transfer of data. That repository has been of immense diagnostic value.

  • [Update] This analysis will not detect index use in support of constraint validation (PeopleSoft doesn't use database referential integrity constraints).  As Mark Farnham points out below, that may be a reason for retaining a particular index.

Getting Rid of Indexes 

Obviously any index changes need to be tested carefully in all the places that reference the index, but on the other hand it is not viable to do a full regression test every time you want to change an index.

Therefore, if all the testing is successful and you decide to go ahead and drop the index in production, you might prefer to make it invisible first for a while before actually dropping it. It is likely that the indexes you choose to examine are large and will take time to rebuild. An invisible index will not be used by the optimizer, but it will continue to be maintained during DML. If there are any unfortunate consequences, you can immediately make the index visible without having to rebuild it.
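
For completeness (a sketch of my own, not from the original post), the switch is a single DDL statement in each direction, shown here against one of the candidate indexes:

ALTER INDEX sysadm.pslproj_resource INVISIBLE;  -- the optimizer ignores it, but DML still maintains it
ALTER INDEX sysadm.pslproj_resource VISIBLE;    -- instant reinstatement if anything regresses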

            Who is using this index?

            Or, to put it another way, I want to change or drop this index, who and what will I impact?

            The Challenge 

            The problem that I am going to outline is certainly not exclusive to PeopleSoft, but I am going to illustrate it with examples from PeopleSoft. I often find tables with far more indexes than are good for them.
            • The Application Designer tool makes it very easy for developers to add indexes to tables. Sometimes, too easy!
            • Sometimes, DBAs are too quick to unquestioningly follow the advice of the Oracle tuning advisor to add indexes.

            Recently, I have been working on 3 different PeopleSoft Financials systems where I have found major tables with a host of indexes.

            There are several concerns:

            • Indexes are maintained during data modification. The more indexes you have, the greater the overhead. 
            • The more indexes you have, particularly if they lead on the same columns, the more likely Oracle is to use the wrong one, resulting in poorer performance.
            • There is of course also a space overhead for each index, but this is often of less concern. 

            If you can get rid of an index, Oracle doesn't store, maintain or use it. 

            In some cases, I have wanted to remove unnecessary indexes, and in others to adjust indexes. However, this immediately raises the question of where are these indexes used, and who will be impacted by the change. Naturally, I turn to the Active Session History (ASH) to help me find the answers. 

            Index Maintenance Overhead during DDL 

            ASH reports the object number, file number, block number and (from 11g) row number within the block being accessed by physical file operations. However, the values reported in v$active_session_history (and later other views) are not reliable for other events because they are merely left over from the previous file event that reported them. So, we can profile the amount of time spent on physical I/O on different tables and indexes, but not for other forms of DB Time, such as CPU time, spent accessing the blocks in the buffer cache.
            Let me take an extreme example from PeopleSoft Global Payroll. The table PS_GP_RSLT_ACUM is one of the principal result tables. It has only a single unique index (with the same name). The table is populated with the simplest of insert statements.
            #eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 85%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">INSERT /*GPPRDMGR_I_ACUM*/ INTO PS_GP_RSLT_ACUM
            (CAL_RUN_ID ,EMPLID ,EMPL_RCD ,GP_PAYGROUP ,CAL_ID ,RSLT_SEG_NUM ,PIN_NUM ,EMPL_RCD_ACUM ,ACM_FROM_DT ,ACM_THRU_DT ,USER_KEY1 ,USER_KEY2 ,USER_KEY3 ,USER_KEY4 ,USER_KEY5 ,USER_KEY6 ,SLICE_BGN_DT ,SLICE_END_DT ,COUNTRY ,ACM_TYPE ,ACM_PRD_OPTN ,CALC_RSLT_VAL ,CALC_VAL ,USER_ADJ_VAL ,PIN_PARENT_NUM ,CORR_RTO_IND ,ORIG_CAL_RUN_ID ,SEQ_NUM8 ,VALID_IN_SEG_IND ,CALLED_IN_SEG_IND )
            VALUES
            (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,:20,:21,:22,:23,:24,:25,:26,:27,:28,:29,:30)

            I can profile the ASH data for just this statement over the last week on a production system. Note that DBA_OBJECTS and DBA_DATA_FILES are outer joined to the ASH data and only matched for events like 'db file%'

            #eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 85%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT o.object_type, o.object_name
            , f.tablespace_name, NVL(h.event,'CPU+CPU Wait') event
            , SUM(10) ash_Secs
            FROM dba_hist_Active_sess_history h
            LEFT OUTER JOIN dba_objects o
            ON o.object_id = h.current_obj#
            AND h.event like 'db file%'
            LEFT OUTER JOIN dba_data_files f
            ON f.file_id = h.current_file#
            AND h.event like 'db file%'
            WHERE h.sql_id = '4ru0618dswz3y'
            AND h.sample_time >= sysdate-7
            GROUP BY o.object_type, o.object_name, h.event, f.tablespace_name
            ORDER BY ash_secs DESC
            /

            A full payroll calculation inserts over 3 million rows on this particular system. The calculation is run incrementally several times per week during which old rows are deleted and newly recalculated rows inserted.  Looking at just this insert statement:

            • 30% of the time is spent on CPU operations, we cannot profile that time further with ASH.
            • 38% of the time is spent reading from the table and index, yet this is a simple INSERT … VALUES statement.
            #eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 85%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">OBJECT_TYPE         OBJECT_NAME        TABLESPACE_NAME EVENT                      ASH_SECS
            ------------------- ------------------ --------------- ------------------------ ----------
            CPU+CPU Wait 1040
            UNDOTBS1 db file sequential read 900
            INDEX SUBPARTITION PS_GP_RSLT_ACUM GP201408IDX db file sequential read 750
            TABLE SUBPARTITION PS_GP_RSLT_ACUM GP201408TAB db file sequential read 550
            gc current grant 2-way 70
            cursor: pin S wait on X 60
            db file sequential read 10
            buffer exterminate 10
            row cache lock 10
            ----------
            3400

            More time is spent reading the index than the table.  That is not a surprise.  When you insert a row into a table, you also insert it into the index. Rows in index leaf blocks are ordered by the key columns, and the new entry must go into the right place, so you have to read down the index from the root block, through the branch blocks, to find the correct leaf block for the new entry.
            [Digression: Counter-intuitively index compression can improve DML performance. It does for this index.  The overhead of the compression processing can be outweighed by the saving in physical I/O.  It depends.]

            Profile Physical I/O by Object 

            I can twist this query around and profile DB_TIME by object for 'db file%' events

            #eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 85%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">SELECT o.object_type, o.object_name, sum(10) ash_secs
            FROM dba_hist_active_sess_history h
            , dba_objects o
            WHERE o.object_id = h.current_obj#
            AND h.event LIKE 'db file%'
            AND h.sample_time > sysdate-7
            GROUP BY o.object_type, o.object_name
            ORDER BY ash_Secs DESC

            Now I can see upon which objects the most time is spent on physical I/O.

            #eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 85%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">OBJECT_TYP OBJECT_NAME          ASH_SECS
            ---------- ------------------ ----------
            TABLE PS_ITEM 101130
            INDEX PS_WS_ITEM 98750
            TABLE PS_PROJ_RESOURCE 97410
            TABLE PS_BI_LINE 85040
            INDEX PSAPSAPMSGSUBCON 75070
            TABLE PS_BI_HDR 37230
            TABLE PS_RS_ASSIGNMENT 29460
            INDEX PS_PSAPMSGPUBHDR 23230
            INDEX PS_BI_ACCT_ENTRY 21490
            TABLE PS_VOUCHER 21330
            TABLE PS_VCHR_ACCTG_LINE 21250
            TABLE PS_BI_ACCT_ENTRY 18860

            ----------
            sum 1382680

            This is a worthwhile exercise, it shows the sources of physical I/O in an application.

            However, if you want to find where an index is used, then this query will also identify SQL_IDs where the index is either used in the query or maintained by DML. If I am interested in looking for places where changing or deleting an index could have an impact then I am only interested in SQL query activity. ASH samples which relate to index maintenance are a false positive. Yet, I cannot simply eliminate ASH samples where the SQL_OPNAME is not SELECT because the index may be used in a query within the DML statement.

            Another problem with this method is that it matches SQL to ASH by object ID. If someone has rebuilt an index, then its object number changes.

            A different approach is required.

            Index Use from SQL Plans Captured by AWR 

            During an AWR snapshot the top-n SQL statements by each SQL criteria in the AWR report (Elapsed Time, CPU Time, Parse Calls, Shareable Memory, Version Count) , see dbms_workload_repository. The SQL plans are exposed by the view DBA_HIST_SQL_PLAN.

On PeopleSoft systems, I generally recommend decreasing the snapshot interval from the default of 60 minutes to 15. The main reason is that SQL gets aged out of the library cache very quickly in PeopleSoft systems. They generate lots of dynamic code, often with literal values rather than bind variables. Cursor sharing is not recommended for PeopleSoft, so different bind variables result in different SQL_IDs. The dynamic code also results in different SQL_IDs even with cursor sharing. Therefore, increasing the snapshot frequency means that more SQL statements will be captured. This will increase the total volume of the AWR repository simply because there are more snapshots. However, the overall volume of ASH data captured does not change; it just gets copied to the repository earlier.
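For illustration, the snapshot interval (and retention) can be changed with dbms_workload_repository; this is a minimal sketch, and the retention shown is just an example value:

BEGIN
  dbms_workload_repository.modify_snapshot_settings(
    interval  => 15         --snapshot every 15 minutes instead of 60
  , retention => 35*24*60   --retention is specified in minutes (example value only)
  );
END;
/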

            On DBA_HIST_SQL_PLAN the object ID, owner, type and name are recorded, so I can find the plans which referenced a particular object. I am going to carry on with the example from a PeopleSoft Financials system, and look at indexes on the PS_PROJ_RESOURCE table.

            These are some of the indexes on PS_PROJ_RESOURCE. We have 4 indexes that all lead on PROCESS_INSTANCE. I suspect that not all are essential, but I need to work out what is using them, and which one I should retain.
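A listing like the one below can be generated from the data dictionary; this is a minimal sketch joining DBA_IND_COLUMNS to DBA_IND_EXPRESSIONS to pick up any function-based key columns:

column column_expression format a35
SELECT ic.index_name, ic.column_position, ic.column_name, ie.column_expression
FROM   dba_ind_columns ic
,      dba_ind_expressions ie
WHERE  ic.table_owner = 'SYSADM'
AND    ic.table_name  = 'PS_PROJ_RESOURCE'
AND    ie.index_owner     (+) = ic.index_owner
AND    ie.index_name      (+) = ic.index_name
AND    ie.column_position (+) = ic.column_position
ORDER BY ic.index_name, ic.column_position
/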

                          Col
            INDEX_NAME Pos COLUMN_NAME COLUMN_EXPRESSION
            ------------------ ---------- -------------------- ----------------------------------

            PSJPROJ_RESOURCE 1 PROCESS_INSTANCE
            2 BUSINESS_UNIT_GL
            3 BUSINESS_UNIT
            4 PROJECT_ID
            5 ACTIVITY_ID
            6 CUST_ID

            PSLPROJ_RESOURCE 1 PROCESS_INSTANCE
            2 EMPLID
            3 EMPL_RCD
            4 TRANS_DT

            PSMPROJ_RESOURCE 1 PROCESS_INSTANCE
            2 BUSINESS_UNIT
            3 PROJECT_ID
            4 ACTIVITY_ID
            5 RESOURCE_ID

            PSNPROJ_RESOURCE 1 PROCESS_INSTANCE
            2 BUSINESS_UNIT
            3 TIME_RPTG_CD

I find it easier to extract the ASH data to my own working storage table. For each index on PS_PROJ_RESOURCE, I am going to extract a distinct list of plan hash values. I will then extract all ASH data for those plans. Note that I have not joined on SQL_ID to DBA_HIST_SQL_PLAN. That is because different SQL_IDs can produce the same execution plan, and the plan is equally valid for all SQL_IDs that produce it, not just the one where the SQL_ID also matches.

DROP TABLE my_ash purge
            /
            CREATE TABLE my_ash COMPRESS AS
            WITH p AS (
            SELECT DISTINCT p.plan_hash_value, p.object#, p.object_owner, p.object_type, p.object_name
            FROM dba_hist_sql_plan p
            WHERE p.object_name like 'PS_PROJ_RESOURCE'
            AND p.object_type LIKE 'INDEX%'
            AND p.object_owner = 'SYSADM'
            )
            SELECT p.object# object_id, p.object_owner, p.object_type, p.object_name
            , h.*
            FROM dba_hist_active_sess_history h
            , p
            WHERE h.sql_plan_hash_value = p.plan_hash_value
            /

I am fortunate that PeopleSoft is a well instrumented application. Module and Action are set to fairly sensible values that tell me whereabouts in the application the ASH sample was generated. In the following query, I have omitted any ASH data generated by SQL*Plus, Toad, or SQL Developer, and also any generated by Oracle processes, to prevent statistics collection jobs being included.

Set pages 999 lines 150 trimspool on
            break on object_name skip 1
            compute sum of ash_secs on object_name
            column ash_secs heading 'ASH|Secs' format 9999999
            column module format a20
            column action format a32
            column object_name format a18
            column max_sample_time format a19 heading 'Last|Sample'
            column sql_plans heading 'SQL|Plans' format 9999
            column sql_execs heading 'SQL|Execs' format 99999
            WITH h AS (
            SELECT object_name
            , CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
            WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
            ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
            END as module
            , CASE WHEN h.action LIKE 'PI=%' THEN NULL
            ELSE h.action
            END as action
            , CAST(sample_time AS DATE) sample_time
            , sql_id, sql_plan_hash_value, sql_exec_id
            FROM my_ash h
            )
            SELECT object_name, module, action
            , sum(10) ash_secs
            , COUNT(DISTINCT sql_plan_hash_value) sql_plans
            , COUNT(DISTINCT sql_id||sql_plan_hash_value||sql_exec_id) sql_execs
            , MAX(sample_time) max_sample_time
            FROM h
            WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
            AND NOT LOWER(module) LIKE 'sql%'
            GROUP BY object_name, module, action
            ORDER BY SUBSTR(object_name,4), object_name, ash_Secs desc
            /
            Spool off

            I now have a profile of how much each index is used. In this particular case I found something using every index.  It is possible that you will not find anything that uses some indexes.

                                                                             ASH   SQL    SQL Last
            OBJECT_NAME MODULE ACTION Secs Plans Execs Sample
            ------------------ -------------------- -------------------------------- ------- ----- ------ -------------------

            PSJPROJ_RESOURCE PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step24.S 7300 1 66 06:32:57 27/08/2014
            PC_PRICING GFCPBINT_AE.CallmeA.Step24.S 40 1 2 08:38:57 22/08/2014
            ****************** -------
            sum 7340

            PSLPROJ_RESOURCE PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step28.S 1220 1 53 06:33:17 27/08/2014
            ****************** -------
            sum 1220

            PSMPROJ_RESOURCE PC_TL_TO_PC GFCPBINT_AE.XxBiEDM.Step07.S 60 2 6 18:35:18 20/08/2014
            ****************** -------
            sum 60

            PSNPROJ_RESOURCE PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step26.S 6720 1 49 18:53:58 26/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step30.S 3460 1 60 06:33:27 27/08/2014
            GFCOA_CMSN GFCOA_CMSN.01INIT.Step01.S 2660 1 47 19:19:40 26/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step06.S 1800 1 52 18:53:28 26/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeG.Step01.S 1740 1 61 06:34:17 27/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step02.S 1680 1 24 18:53:18 26/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step10.S 1460 1 33 17:26:26 22/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step08.S 920 1 26 17:26:16 22/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step36.S 460 1 18 18:26:38 20/08/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Step09.S 420 1 16 06:33:07 27/08/2014
            PC_PRICING GFCPBINT_AE.CallmeG.Step01.S 200 1 10 08:09:55 22/08/2014
            PC_AP_TO_PC GFCPBINT_AE.CallmeH.Step00A.S 170 1 17 21:53:26 21/08/2014
            PC_PRICING GFCPBINT_AE.CallmeA.Step36.S 20 1 1 08:02:46 05/08/2014
            PC_PRICING GFCPBINT_AE.CallmeA.Step30.S 20 1 1 13:42:48 04/08/2014
            PC_PRICING GFCPBINT_AE.CallmeA.Step06.S 20 1 1 15:58:35 28/07/2014
            PC_TL_TO_PC GFCPBINT_AE.CallmeA.Pseudo.S 20 1 1 19:45:11 06/08/2014
            ****************** -------
            sum 21770

The next stage is to look at individual SQL statements. This query looks for the SQL statements that are using a particular index on PROJ_RESOURCE. If I can't find the SQL statement that cost the most time, then I just choose another SQL statement with the same plan.

            • I have found that sometimes a plan is captured by AWR, but the SQL statement is not. Personally, I think that is a bug. Working around it has made the following query quite complicated.
Break on object_name skip 1
            column ash_secs heading 'ASH|Secs' format 9999999
            Set long 50000
            Column cmd Format a200
            Spool dmk

            WITH h AS (
            SELECT h.object_name
            , CASE WHEN h.module IS NULL THEN REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
            WHEN h.module LIKE 'PSAE.%' THEN REGEXP_SUBSTR(h.module,'[^.]+',1,2)
            ELSE REGEXP_SUBSTR(h.program,'[^.@]+',1,1)
            END as module
            , CASE WHEN h.action LIKE 'PI=%' THEN NULL
            ELSE h.action
            END as action
            , h.sql_id, h.sql_plan_hash_value
, t.command_type --not null if plan and statement captured
            FROM my_ash h
            LEFT OUTER JOIN (
            SELECT t1.*
            FROM dba_hist_sqltext t1
            , dba_hist_sql_plan p1
            WHERE t1.sql_id = p1.sql_id
            AND p1.id = 1
            ) t
            ON t.sql_id = h.sql_id
            AND t.dbid = h.dbid
            WHERE h.object_name IN('PSMPROJ_RESOURCE')
            AND h.object_Type = 'INDEX'
            AND h.object_owner = 'SYSADM'
            And NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
            AND NOT LOWER(module) LIKE 'sql%'
            ), x AS ( --aggregate DB time by object and statement
            SELECT object_name, sql_id, sql_plan_hash_value
            , sum(10) ash_secs
            , 10*COUNT(command_type) sql_secs --DB time for captured statements only
            FROM h
            WHERE NOT LOWER(module) IN('oracle','toad','sqlplus','sqlplusw')
            AND NOT LOWER(module) LIKE 'sql%'
            GROUP BY object_name, sql_id, sql_plan_hash_value
            ), y AS ( --rank DB time per object and plan
            SELECT object_name, sql_id, sql_plan_hash_value
            , ash_secs
            , SUM(ash_secs) OVER (PARTITION BY object_name, sql_plan_hash_value) plan_ash_secs
            , row_number() OVER (PARTITION BY object_name, sql_plan_hash_value ORDER BY sql_Secs DESC) ranking
            FROM x
            ), z AS (
            SELECT object_name
            , CASE WHEN t.sql_text IS NOT NULL THEN y.sql_id
            ELSE (SELECT t1.sql_id
            FROM dba_hist_sqltext t1
            , dba_hist_sql_plan p1
            WHERE t1.sql_id = p1.sql_id
            AND p1.plan_hash_value = y.sql_plan_hash_value
            AND rownum = 1) --if still cannot find statement just pick any one
            END AS sql_id
            , y.sql_plan_hash_value, y.plan_ash_secs
            , CASE WHEN t.sql_text IS NOT NULL THEN t.sql_text
            ELSE (SELECT t1.sql_Text
            FROM dba_hist_sqltext t1
            , dba_hist_sql_plan p1
            WHERE t1.sql_id = p1.sql_id
            AND p1.plan_hash_value = y.sql_plan_hash_value
            AND rownum = 1) --if still cannot find statement just pick any one
            END AS sql_text
            from y
            left outer join dba_hist_sqltext t
            on t.sql_id = y.sql_id
            WHERE ranking = 1 --captured statement with most time
            )
            SELECT *
            --'SELECT * FROM table(dbms_xplan.display_awr('''||sql_id||''','||sql_plan_hash_value||',NULL,''ADVANCED''))/*'||object_name||':'||plan_ash_Secs||'*/;' cmd
            FROM z
            ORDER BY object_name, plan_ash_secs DESC
            /
            Spool off

            So now I can see the individual SQL statements.

PSJPROJ_RESOURCE   f02k23bqj0xc4          3393167302          7340 UPDATE PS_PROJ_RESOURCE C SET (C.Operating_Unit, C.CHARTFIELD1, C.PRODUCT, C.CLA
            SS_FLD, C.CHARTFIELD2, C.VENDOR_ID, C.contract_num, C.contract_line_num, …

            PSLPROJ_RESOURCE 2fz0gcb2774y0 821236869 1220 UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM ps_job j WH
            ERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND j.effdt = ( SELECT MAX (…

            PSMPROJ_RESOURCE 96cdkb7jyq863 338292674 50 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
            PS_PROJ_RESOURCE x WHERE x.process_instance = …

            1kq9rfy8sb8d4 4135884683 10 UPDATE PS_GFCBI_EDM_TA04 a SET a.GFCni_amount = ( SELECT x.resource_amount FROM
            PS_PROJ_RESOURCE x WHERE x.process_instance = …

            PSNPROJ_RESOURCE ga2x2u4jw9p0x 2282068749 6760 UPDATE PS_PROJ_RESOURCE P SET (P.RESOURCE_TYPE, P.RESOURCE_SUB_CAT) = …

            9z5qsq6wrr7zp 3665912247 3500 UPDATE PS_PROJ_RESOURCE P SET P.TIME_SHEET_ID = …

            If I replace the last select clause with the commented line, then I can generate the commands to extract the statement and plan from the AWR repository.

SELECT * FROM table(dbms_xplan.display_awr('45ggt0yfrh5qp',3393167302,NULL,'ADVANCED'))/*PSJPROJ_RESOURCE:7340*/;

            SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;

            SELECT * FROM table(dbms_xplan.display_awr('96cdkb7jyq863',338292674,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:50*/;

            SELECT * FROM table(dbms_xplan.display_awr('1kq9rfy8sb8d4',4135884683,NULL,'ADVANCED'))/*PSMPROJ_RESOURCE:10*/;

            SELECT * FROM table(dbms_xplan.display_awr('ga2x2u4jw9p0x',2282068749,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:6760*/;
            SELECT * FROM table(dbms_xplan.display_awr('9z5qsq6wrr7zp',3665912247,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3500*/;
            SELECT * FROM table(dbms_xplan.display_awr('b28btd6k3x8jt',1288409804,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:3060*/;
            SELECT * FROM table(dbms_xplan.display_awr('avs70c19khxmw',2276811880,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:2660*/;
            SELECT * FROM table(dbms_xplan.display_awr('b78qhsch85g4a',1019599680,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1960*/;
            SELECT * FROM table(dbms_xplan.display_awr('65kq2v1ubybps',3138703971,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1820*/;
            SELECT * FROM table(dbms_xplan.display_awr('1dj17ra70c801',1175874548,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:1460*/;
            SELECT * FROM table(dbms_xplan.display_awr('3w71v896s7m5d',3207074729,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:500*/;
            SELECT * FROM table(dbms_xplan.display_awr('35mz5bw7p5ubw',2447377432,NULL,'ADVANCED'))/*PSNPROJ_RESOURCE:170*/;

            Ultimately, I have needed to look through the SQL plans that use an index to decide whether I need to keep that index or decide whether the statement would perform adequately using another index. In this case, on this particular system, I think the index PSMPROJ_RESOURCE would be adequate for this statement, and I would consider dropping PSLPROJ_RESOURCE.

SELECT * FROM table(dbms_xplan.display_awr('8ntxj3694r6kg',821236869,NULL,'ADVANCED'))/*PSLPROJ_RESOURCE:1220*/;
            --------------------
            UPDATE ps_proj_resource p SET p.deptid = NVL (( SELECT j.deptid FROM
            ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd = p.empl_rcd AND
            j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE j1.emplid =
            j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <= p.trans_dt) AND
            j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2 WHERE j2.emplid =
            j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt = j.effdt)),
            p.deptid )
            WHERE p.process_instance = …
            AND EXISTS ( SELECT
            j.deptid FROM ps_job j WHERE j.emplid = p.emplid AND j.empl_rcd =
            p.empl_rcd AND j.effdt = ( SELECT MAX (j1.effdt) FROM ps_job j1 WHERE
            j1.emplid = j.emplid AND j1.empl_rcd = j.empl_rcd AND j1.effdt <=
            p.trans_dt) AND j.effseq = ( SELECT MAX (j2.effseq) FROM ps_job j2
            WHERE j2.emplid = j.emplid AND j2.empl_rcd = j.empl_rcd AND j2.effdt =
            j.effdt))

            Plan hash value: 821236869

            -----------------------------------------------------------------------------------------
            | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
            -----------------------------------------------------------------------------------------
            | 0 | UPDATE STATEMENT | | | | 63104 (100)| |
            | 1 | UPDATE | PS_PROJ_RESOURCE | | | | |
            | 2 | INDEX RANGE SCAN | PSLPROJ_RESOURCE | 365 | 11315 | 22 (0)| 00:00:01 |
            | 3 | INDEX RANGE SCAN | PSAJOB | 1 | 23 | 3 (0)| 00:00:01 |
            | 4 | SORT AGGREGATE | | 1 | 20 | | |
            | 5 | INDEX RANGE SCAN| PSAJOB | 1 | 20 | 3 (0)| 00:00:01 |
            | 6 | SORT AGGREGATE | | 1 | 23 | | |
            | 7 | INDEX RANGE SCAN| PSAJOB | 1 | 23 | 3 (0)| 00:00:01 |
            | 8 | INDEX RANGE SCAN | PSAJOB | 1 | 29 | 3 (0)| 00:00:01 |
            | 9 | SORT AGGREGATE | | 1 | 20 | | |
            | 10 | INDEX RANGE SCAN | PSAJOB | 1 | 20 | 3 (0)| 00:00:01 |
            | 11 | SORT AGGREGATE | | 1 | 23 | | |
            | 12 | INDEX RANGE SCAN | PSAJOB | 1 | 23 | 3 (0)| 00:00:01 |
            -----------------------------------------------------------------------------------------

I carried on with the examination of SQL statements and execution plans to determine whether each index is really needed, or whether another index (or even no index at all) would do as well. This decision also requires some background knowledge about the application. Eventually, I decided that I want to drop the J, L and N indexes on PROJ_RESOURCE and just keep M.

            Limitations of Method

• AWR does not capture all SQL statements, nor all SQL plans. First the SQL has to be in the library cache, and then it must be one of the top-n. A SQL statement that is efficient because it uses an appropriate index may not be captured, and so will not be detected by this approach. This might lead you to erroneously believe that the index could be dropped. (A complementary check against the current cursor cache is sketched after this list.)

• ASH data is purged after a period of time, by default 31 days. If an index is only used by a process that has not run within the retention period, then it will not be detected by this approach. This is another reason to retain ASH and AWR in a repository for a longer period. I have heard 400 days suggested, so that you have ASH for a year and a month.

• However, this also causes the SYSAUX tablespace to become very large, so I would suggest regularly moving the data to a separate database. I know one customer who has built a central AWR repository for all their production and test databases and automated regular transfer of data. That repository has been of immense diagnostic value.

[Update] This analysis will not detect index use in support of constraint validation (PeopleSoft doesn't use database referential integrity constraints). As Mark Farnham points out below, that may be a reason for retaining a particular index.
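To partially mitigate the first limitation, the current cursor cache can also be checked; a plan that was never captured by an AWR snapshot may still be present in V$SQL_PLAN. This is a minimal sketch, using one of the indexes above as an example:

SELECT DISTINCT p.sql_id, p.plan_hash_value
FROM   v$sql_plan p
WHERE  p.object_owner = 'SYSADM'
AND    p.object_name  = 'PSLPROJ_RESOURCE'
/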

                  Getting Rid of Indexes 

                  Obviously any index changes need to be tested carefully in all the places that reference the index, but on the other hand it is not viable to do a full regression test every time you want to change an index.

                    Therefore, if all the testing is successful and you decide to go ahead and drop the index in production, you might prefer to make it invisible first for a while before actually dropping it. It is likely that the indexes you choose to examine are large and will take time to rebuild. An invisible index will not be used by the optimizer, but it will continue to be maintained during DML. If there are any unfortunate consequences, you can immediately make the index visible without having to rebuild it.
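For example, using one of the indexes discussed above:

ALTER INDEX sysadm.psjproj_resource INVISIBLE;

--if there are unexpected consequences, it can be brought back immediately without a rebuild
ALTER INDEX sysadm.psjproj_resource VISIBLE;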

                      Order of Operation

                      One response to my series on reading execution plans was an email request asking me to clarify what I meant by the “order of operation” of the lines of an execution plan. Looking through the set of articles I’d written I realised that I hadn’t made any sort of formal declaration of what I meant, all I had was a passing reference in the introduction to part 4; so here’s the explanation.

                       

                      By “order of operation” I mean the order in which the lines of an execution plan start to produce a rowsource. It’s worth stating this a little formally as any other interpretation could lead to confusion; consider the following simple hash join:

                      
                      -------------------------------------
                      | Id  | Operation           | Name  |
                      -------------------------------------
                      |   0 | SELECT STATEMENT    |       |
                      |*  1 |  HASH JOIN          |       |
                      |   2 |   TABLE ACCESS FULL | T1    |
                      |   3 |   TABLE ACCESS FULL | T2    |
                      -------------------------------------
                      
                      

The tablescan at line 2 is the first operation to start producing a rowsource; the hash join at line 1 consumes the output and builds a hash table – so it definitely has to do some work before line 3 starts to run – but it doesn’t start generating a rowsource at this point. It’s only after line 3 starts its tablescan and starts to generate its rowsource that line 1 can produce a rowsource by consuming the rows coming from line 3 and probing the in-memory hash table. So line 1 starts to produce its rowsource only after line 3 starts producing its rowsource. The “order of operation” is 2, 3, 1, 0. Perhaps, for the purposes of avoiding confusion, it would be better in future if I remembered to say: “the order of rowsource generation”.
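If you want to reproduce the shape of that plan, here is a minimal sketch using two hypothetical tables and hints to fix the join order and method:

create table t1 as select rownum id, rpad('x',100) padding from dual connect by level <= 1000;
create table t2 as select rownum id, rpad('x',100) padding from dual connect by level <= 10000;

select  /*+ leading(t1) use_hash(t2) full(t1) full(t2) */
        t1.id, t2.padding
from    t1, t2
where   t2.id = t1.id
/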

                      12.1.0.2 Introduction to Zone Maps Part I (Map Of The Problematique)

Zone Maps are new index-like structures that enable the “pruning” of disk blocks during accesses of a table by storing the min and max values of selected columns for each “zone” of the table. A zone is simply a range of contiguous blocks within a table. Zone Maps are similar in concept to Exadata storage […]
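As a taster, the basic syntax looks something like this; a minimal sketch assuming a suitably licensed 12.1.0.2 environment and a hypothetical SALES table:

CREATE MATERIALIZED ZONEMAP sales_zmap
ON sales (cust_id, prod_id);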

                      Eight days to go – Who are you voting for in the NH Republican Senatorial Primary?

                      SLOB Patch. AWR Post-Processing Script (awr_info.sh) Fix.

                      BLOG UPDATE 2014.09.11: Please note: the following is a link to a more recent update of the awr_info.sh script. This version adds DB Time, DB CPU and Logical I/O: click here. The MD5 sum for this version of awr_info.sh is:  a28a38b11040bb94f08a8f817792c75c

                      The SLOB kit comes with a little script that extracts interesting information from the awr.txt file produced at the end of a SLOB test. This is just a quick blog entry to point folks to a patched version of awr_info.sh that works properly with all Oracle Database 11g releases as well as Oracle Database 12c.

                      Oracle changed AWR format in the 11.2.0.4 and 12c releases so the old awr_info.sh script (in the publicly available SLOB kit) has been faulty for some time now.

                      I have a release of SLOB in the works that will include this awr_info.sh as well as improved data loader and improvements to the driver script (runit.sh) that includes optional, tunable think time between iterations of the SLOB work loop in slob.sql. For the time being please get a copy of the patched version of awr_info.sh.

                      New awr_info.sh Output

                      This version of awr_info.sh also gleans and outputs logical read (SGA buffer pool cached block accesses) data.

                      The following screen shot shows the patched awr_info.sh generating proper output for awr.txt files collected by SLOB databases running out of the 11.2.0.3, 11.2.0.4 and 12c releases.

                      2014.09.11-awr_info-example

                      The following picture is what Microsoft Excel looks like when I cut and paste the output of awr_info.sh. I’ve highlighted the new column for logical reads.

                      2014.09.11-awr_info-xls-example

                      Exadata Cell Single Block Physical Reads?

                      Yes, the above picture does show AWR output from a run where the top wait event was cell single block physical read. Exadata? Yes! That’s because SLOB users often share their testing results from the Exadata platform.  However, I do not get enough Exadata AWR reports to work through all of the awr_info.sh issues related to Exadata. To that end, latency information is not calculated and presented as is the case with db file sequential read. For what it’s worth this particular AWR report shows Exadata single block reads serviced with average latencies of 507 microseconds ( 7233/14256602).

                      Where To Get The Patch?

                      The following is a link to my syncplicity share with the README for this patch: Click here for the patch README.
                      Finally, the patched version of awr_info.sh is available here:  Click here for the latest awr_info.sh script.  Please verify the MD5 sum after you download this script to match a28a38b11040bb94f08a8f817792c75c.

                      Filed under: oracle