Oakies Blog Aggregator

Is this really a patch set?

The title of the post is a question taken from MOS Doc ID 1189783.1, named “Important Changes to Oracle Database Patch Sets Starting With”. Yes, the first patch set for 11gR2 is out for Linux x86 and x86-64 (patch number 10098816). It’s 5 GB in size, BTW. There are no patch set release notes, [...]

PeopleSoft Run Control Purge Utility

Run Control records are used to pass parameters into scheduled processes. These tables tend to grow and are rarely purged; generally, once created, a run control is not deleted.  When operator accounts are deleted, their Run Controls remain, but are no longer accessible to anyone else.

I have worked on systems where new Run Controls, whose IDs contain either a date or a sequence number, are generated for each process. The result is that the Run Control tables, especially the child tables, grow quickly and, if not regularly managed, become very large. On one system, I found 18 million rows in one table!

--------------- ------------------ ---------- ----------
TL_RUN_CTRL_GRP RUN_CNTL_ID 18424536 126377

I have written a simple Application Engine process, GFC_RC_ARCH, that purges old Run Controls from these tables.  It can be downloaded from my website.

Run Control records are easily identified. They are characterised by:

  • the first column of these tables is always OPRID, and the second is either RUNCNTLID or RUN_CNTL_ID,
  • these two columns are also the first two columns of the unique key,
  • the Run Control tables appear on pages of components that are declared as the process security component for that process.

I have decided that if the combination of OPRID and RUN_CNTL_ID does not appear in the process scheduler request table, PSPRCSRQST, then the Run Control record should be deleted. Thus, as the delivered Process Scheduler Purge process, PRCSPURGE, deletes rows from the Process Scheduler tables, so my purge process will delete rows from the Run Control tables.
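The core of that rule can be sketched as a correlated delete. This is only an illustration, not the actual GFC_RC_ARCH code: the table name comes from the example above, and in practice one such statement is needed for each Run Control record and its children.

```sql
-- Sketch only: delete Run Controls with no matching Process Scheduler
-- request.  The column names (OPRID, RUN_CNTL_ID) follow the Run Control
-- convention described above; PSPRCSRQST uses RUNCNTLID.
DELETE FROM ps_tl_run_ctrl_grp rc
WHERE NOT EXISTS (
  SELECT 'x'
  FROM   psprcsrqst rq
  WHERE  rq.oprid     = rc.oprid
  AND    rq.runcntlid = rc.run_cntl_id);
```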

I have chosen to make these two Application Engine processes mutually exclusive, so the Process Scheduler will not run both at the same time, but that configuration cannot be delivered in an Application Designer project.

Oracle Security Presentation Available

I was at the UKOUG UNIX SIG at Thames Valley Park - Oracle's UK headquarters - a couple of days ago, on Wednesday 8th September, to give a two-part talk for the SIG. Of course my two talks were on....[Read More]

Posted by Pete On 10/09/10 At 12:59 PM

For those attending Oracle Open World 2010


Wallis Pereira, Kevin Closson:2010-09-20 (Monday) Moscone South Rm 302

The presentation title, "Optimizing Servers for Oracle Database Performance", is so plain that it is easy to overlook. Kevin Closson is an opinion leader on Exadata.

There is also an informal meetup called Oracle Closed World.
It was founded by Mogens Nørgaard, head of the Miracle consulting group that I belong to.
We promise everything and demand nothing, and we are experts in the art of buying things expensively and selling them cheaply.

Alex Gorbachev, an OakTable member and Oracle ACE, will be there too.



Oracle Closed World 2010

Get ready for Oracle Closed World 2010 !

Mon, Tue, Wed 1pm-4pm at the Thirsty Bear during Oracle Open World.
OCW is bigger and better this year. That means more beer, more technical goodness, chicken wings and 3+ keynotes.

Data visualization weak in the BI industry

From Stephen Few's blog

The industry that has claimed responsibility for helping organizations get real value from information goes by the name “business intelligence.” This term was originally coined by Hans Peter Luhn, an IBM researcher, way back in 1958. Luhn defined business intelligence as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.” The term didn’t catch on, however, until sometime after Howard Dresner, best known for his work at Gartner, used it again to breathe new life into the data warehousing industry. Dresner defined the term as concepts and methods to improve business decision making using fact-based support systems.

Contained in these early definitions was the seed of an inspiring vision that caused people like me to imagine a better world, but the business intelligence industry has done little to help us achieve the vision of the people who coined the term. When Thornton May was interviewing people for his book “The New Know”, he asked a prominent venture capitalist known for his 360-degree view of the technology industry what he thought of when he heard the phrase business intelligence. His response was “big software, little analysis.” Sadly, his response rings true.

Oracle Data Mining Primer (1 of 3): Training Models

This post is the first of three intended to provide a quick demonstration of the Oracle Data Mining API.  Deeper concepts will be glossed over in the interest of keeping this primer material short and to the point.

Training a data mining model involves the process of learning the patterns and relationships in the underlying data.  Identifying a business problem, gathering data, and cleaning data are critical steps that must be performed prior to training a model, and these steps tend to consume large amounts of time and effort, but they are outside the scope of this API-focused post.

This example will use the Census Income dataset (Ref: Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository []. Irvine, CA: University of California, School of Information and Computer Science).  The census income data has two tables: ADULT_DATA (32,561 rows) for training a model and ADULT_TEST (16,281 rows) for testing the accuracy of the trained model.  The data contains a number of attributes, including a field that separates the records into individuals earning >50K per year and those earning <=50K per year.

When training the model, it is important for the algorithm to interpret the data correctly.  For example, an attribute called "AGE" may contain numerical values (e.g., 18, 45, etc.), categorical values (e.g., 'TEEN', 'ELDERLY', etc.), or even numbers that simply represent categorical values (e.g., 1 represents 'ELDERLY', 2 represents 'TEEN', etc.).  Machine learning algorithms often process numerical data differently from categorical data, so it is important that the algorithms interpret the input correctly.  The Oracle Data Mining API uses the following heuristic: treat columns with a database datatype of NUMBER or FLOAT as numeric in nature, and treat columns with a database datatype of CHAR or VARCHAR2 as categorical in nature.  If the database datatype does not match the intended interpretation, then the type should be coerced before training a model.
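Under that heuristic, the coercion can be done in a view over the training table. This is a sketch: REGION_CODE is a hypothetical NUMBER column (not part of the census dataset) whose values are really category labels.

```sql
-- Sketch: REGION_CODE is a hypothetical NUMBER column whose values are
-- category labels; TO_CHAR makes ODM treat it as categorical.
CREATE VIEW adult_data_prepared AS
SELECT age,                                   -- stays numeric
       TO_CHAR(region_code) AS region_code,   -- coerced to categorical
       income
FROM   adult_data;
```

The model would then be trained against the view rather than the base table.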

For this example, the following census attributes are stored as NUMBER: AGE, FNLWGT, EDUCATION_NUM, CAPITAL_GAIN, CAPITAL_LOSS, and HOURS_PER_WEEK.  The remaining attributes, including INCOME level, are all stored as VARCHAR2.

To guide the machine learning process, an optional settings table may be provided as input to the training stage.  The settings table can identify a specific algorithm to use as well as various algorithm and data preparation choices.  For this example, we will use the Support Vector Machine algorithm with automatic data preparation.

create table odm_settings
(setting_name varchar2(30), setting_value varchar2(4000));
insert into odm_settings values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into odm_settings values ('PREP_AUTO','ON');
commit;  -- commit needed for releases prior to 11gR2

To train the model, we need to provide a few pieces of information (including the input data and the function that is to be performed).  The following API invocation trains a model to learn the patterns in the data that differentiate individuals based on their income level.

BEGIN
  dbms_data_mining.create_model(
    model_name          => 'PREDICT_INCOME',     -- name of resulting object
    mining_function     => 'CLASSIFICATION',     -- mining function of interest
    data_table_name     => 'ADULT_DATA',         -- input data for training
    case_id_column_name => NULL,                 -- unique case/record identifier
    target_column_name  => 'INCOME',             -- attribute of interest
    settings_table_name => 'ODM_SETTINGS');      -- optional settings
END;
/

Great, now what?
The create_model procedure persists a new schema object in the database.  The next two posts will demonstrate the value of this mining model object.

SQL Tuning Best Practice – Visualizing the Query

SQL tuning can be a challenge to even the most experienced database professional. So, how can you give yourself a leg up in optimizing a difficult query? As you will discover in this best practice, visualizing the query can help you quickly understand the query, as well as define its optimal execution path.

A best practice for visualizing the query is to draw the query out as a Visual SQL Tuning (VST) diagram.

A VST diagram consists of the following elements:

  • tables as nodes
  • joins as connectors
  • filter percentages
  • relationships
  • join sizes

Historically, SQL tuning has been limited to the following two approaches:

  1. Explain plan for the query; and/or
  2. Trace the query

For example, a trace output might look something like this:

PARSING IN CURSOR #2 len=53 dep=0 uid=61 oct=3 lid=61 tim=1151519905950403 hv=2296704914 ad='4e50010c'
SELECT 'Hello, world; today is ' || SYSDATE FROM dual
PARSE #2:c=4000,e=1540,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1151519905950397
EXEC #2:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1151519906034782
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1151519906034809
FETCH #2:c=0,e=29,p=0,cr=0,cu=0,mis=0,r=1,dep=0,og=1,tim=1151519906034864
WAIT #2: nam='SQL*Net message from client' ela= 215 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1151519906035133
FETCH #2:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1151519906035165
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1151519906035188
WAIT #2: nam='SQL*Net message from client' ela= 192 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1151519906035400
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='FAST DUAL (cr=0 pr=0 pw=0 time=3 us)'

Not exactly user-friendly. I think I might need a couple of double espressos before I plunge into this one.

How about an explain plan? An explain plan is better, but it still lacks insightful information:


-------------------------------------------------------------------------------------
| Id | Operation                    | Name               | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------------
|  1 | HASH GROUP BY                |                    |      1 |      1 |      1 |
|* 2 | FILTER                       |                    |      1 |        |   1909 |
|  4 | NESTED LOOPS                 |                    |      1 |    165 |   6827 |
|* 5 | HASH JOIN                    |                    |      1 |    165 |   3413 |
|* 6 | HASH JOIN                    |                    |      1 |    165 |   3624 |
|  7 | TABLE ACCESS BY INDEX ROWID  | WB_JOB             |      1 |    242 |   2895 |
|  8 | NESTED LOOPS                 |                    |      1 |    233 |   2897 |
|*10 | INDEX RANGE SCAN             | PS0PAY_CALENDAR    |      1 |      1 |      1 |
|*11 | INDEX RANGE SCAN             | WBBJOB_B           |      1 |    286 |   2895 |
|*12 | TABLE ACCESS FULL            | WB_RETROPAY_EARNS  |      1 |  27456 |   122K |
| 13 | TABLE ACCESS FULL            | PS_RETROPAY_RQST   |      1 |  13679 |  13679 |
|*14 | INDEX RANGE SCAN             | PS#RETROPAYPGM_TBL |   3413 |      1 |   3413 |
| 15 | SORT AGGREGATE               |                    |   1791 |      1 |   1791 |
| 16 | FIRST ROW                    |                    |   1791 |      1 |   1579 |
|*17 | INDEX RANGE SCAN (MIN/MAX)   | WB_JOB_F           |   1791 |      1 |   1579 |
| 18 | SORT AGGREGATE               |                    |   1539 |      1 |   1539 |
| 19 | FIRST ROW                    |                    |   1539 |      1 |   1539 |
|*20 | INDEX RANGE SCAN (MIN/MAX)   | WB_JOB_G           |   1539 |      1 |   1539 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

 2 - filter(("B"."EFFDT"= AND "B"."EFFSEQ"=))
 3 - filter("E"."OFF_CYCLE"="A"."PAY_OFF_CYCLE_CAL")
 6 - access("C"."EMPLID"="B"."EMPLID" AND "C"."EMPL_RCD#"="B"."EMPL_RCD#")
10 - access("A"."RUN_ID"='PD2' AND "A"."PAY_CONFIRM_RUN"='N')
11 - access("B"."COMPANY"="A"."COMPANY" AND "B"."PAYGROUP"="A"."PAYGROUP")
12 - filter(("C"."RETROPAY_PRCS_FLAG"='C' AND "C"."RETROPAY_LOAD_SW"='Y'))
14 - access("E"."RETROPAY_PGM_ID"="D"."RETROPAY_PGM_ID")
17 - access("F"."EMPLID"=:B1 AND "F"."EMPL_RCD#"=:B2 AND "F"."EFFDT"<=:B3)
20 - access("G"."EMPLID"=:B1 AND "G"."EMPL_RCD#"=:B2 AND "G"."EFFDT"=:B3)

If I was greeted with this first thing in the morning, I'd definitely get a coffee before attempting to operate on it.

The explain plan is just the set of directions that the database takes to retrieve the data needed to satisfy an SQL query. By analogy, imagine the SQL query as a Google Maps request for directions from my home in the suburbs to my office downtown. The explain plan is like the turn-by-turn route list. But as any of us who have gotten lost well know, it’s also nice to be able to see the map as well as the turn-by-turn directions.

If Google Maps only gave me the directions without the map, I'd be pretty upset, probably lost, and totally anxious about how long it's going to take me to get to my destination.

Are these a good set of directions? (from where I live to where I work, in San Francisco):

It might help if I had a map:

On the map, the directions look pretty sensible, but what if there was a traffic jam on some streets and not others? How about a map that included traffic jams (i.e. bottlenecks)?

Whether the turn-by-turn directions are good or bad requires a map (i.e. visual information). So how do we get a map for an SQL query? Let's start with the terrain. We can describe the terrain by laying out the tables in the query graphically and drawing connectors for every join - routes through the diagram. A diagram could look like this:

Now we have a map of the terrain. Seeing the terrain allows us to pick out an optimal explain plan. To pick an optimal explain plan, find tables that have the most selective filter and start the execution there. In the above case there is only one table with a filter. This filter is denoted by the F icon on the table BROKER. Start at BROKER and then look for joins that keep the running row set size to a minimum. The running row set size is the number of rows we carry from the previous node to the next node in the VST. We have to visit all the nodes, but the fewer rows we have to carry with us from one node to the next, the faster we’re going to arrive at our destination.

Whether you draw out the VST by hand or use a tool, a best practice is to draw the child table above the parent table. This layout encourages us to navigate the VST by joining down, which keeps our running row set the same size, rather than joining up, which will generally increase the row set size:

Thus, the idea is to start at the most selective filter and join down before joining up. In our example there is only one filter, on BROKER, so we start there, join down to OFFICE_LOCATION, then finish off with CLIENT_TRANSACTION, and end at CLIENT.

But what if we have more than one filter and we have to choose? In that case we have to calculate the filter percentage. The filter percentage is simply the number of filter rows divided by total rows in the table. Filter rows are the number of rows returned after applying the filter criteria on that table from the query.
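A filter percentage can be computed straight from the table. This is a hedged sketch against the BROKER table from the diagram; the predicate inside the CASE is a stand-in for whatever filter criteria the query actually applies.

```sql
-- Filter percentage = filter rows / total rows, expressed as a percent.
-- The predicate on BROKER_LAST_NAME is illustrative only.
SELECT 100 * COUNT(CASE WHEN broker_last_name = 'SMITH' THEN 1 END)
           / COUNT(*) AS filter_pct
FROM   broker;
```

Computing this for each filtered table in the query tells us where to start the execution.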

In the diagram below, BROKER is the most selective filter at .05% so we start at BROKER and follow the same path as we just discussed:

But what if we don’t have access to any information identifying the parent-child relationships (such as PK/FK constraints, unique indexes or unique constraints)? In those cases, we may be looking at many-to-many relationships. With many-to-many relationships, it's unclear how to navigate the VST. We no longer know whether to join up or down. To solve this impasse, we can add the two table join row sizes to help us navigate. Below, we’ve drawn the two table join sizes on the join lines:

To navigate the VST above, we again start at BROKER with the most selective filter, then look for the smallest two table join size, which is OFFICE_LOCATION with a join size of 3. Then we join to CLIENT_TRANSACTION and finally CLIENT.

As you can see, VST diagrams act like the map for the terrain that an SQL query has to navigate. When a simple map is not enough, we can add statistics such as filter percentages and two table join sizes to complete the map. VST diagrams can be drawn on paper, with tools such as Visio or with database products that draw them automatically using the SQL query, schema definitions and table statistics.

For more perspective on the visual approach to tuning SQL and drawing VST diagrams see
SQL Tuning by Dan Tow
Refactoring SQL Applications by Stéphane Faroult, chapter 5
"Designing Efficient SQL: A Visual Approach" by Jonathan Lewis

DB Optimizer 2.5.1 Released

I’m excited to announce the release of DB Optimizer 2.5.1 which has some big new features for a point release along with the regular minor bug fixes. Many thanks to the DBO team for another awesome release.
Download available at:
And DBO 2.5.1 is included in Embarcadero's All Access 1.8.
The new features are

  • Visual SQL Tuning (VST) diagram can be saved to file – previously it had to be regenerated every time a saved TUN session was reopened.
  • VST has new notations

      ◦ Sub-queries are encapsulated, as are nested sub-queries
      ◦ Exists/In and Not Exists/Not In are indicated on the diagram
      ◦ Outer joins are denoted on the diagram

I've made 3 “.tun” files available here
that showcase some of these new features. For example, just drag and drop “bigstmt2.tun” into DB Optimizer 2.5.1 from Windows Explorer and then click on the analysis tab. The analysis tab immediately shows the VST diagram, where you can then double-click on the “BIG_STATEMENT2” table on the right and, without being connected to a database, drill into all the sub-queries and views. Impressive and great for “oohs and aahs” in demos:
Now expanded
The only “gotcha” is that the 14-day trial version comes with a limited-functionality trial license. With the limited trial license you can only tune 5 SQL statements, and when you profile, you see the load but not the SQL text for the queries it finds. You can get around this by getting a temporary retail license from a sales person, or I can try to get one for you. Let me know if you download DB Optimizer, how it goes, and whether I can help.
Other examples:
Query from Jonathan Lewis webinar on visual approach to SQL tuning
Example of outer joins from q2_251outer.tun

Vincent Barbarino and the Book of Database Refactoring

Welcome back.  I realize this blog has been dead for a long time, but like Gabe Kotter, I’ve decided to return to my roots a bit and fire this blog back up.

Those of you who have spoken to me lately or have been following me on Twitter know that I’ve been pretty passionate about database design and development processes.  I’ve almost never seen a database whose physical infrastructure is perfect, so ongoing changes to database structures and schemas are just a fact of life.

It’s managing that process of changing the database layout that’s been getting me all worked up lately – even when folks know what to change and how they’d like to change it, they don’t have tools or processes to introduce change into the database system in a traceable manner.

In one of my prior jobs, we were adamant about change-tracking all of our database changes – we used a rigorous change control and CM process based on object-change by object-change individual script files.  Little did I know at the time that what we were practicing was a form of database refactoring…

I had thought that almost everyone understood the importance of maintaining traceability for database changes, but I’ve recently encountered situations where the lack of standards and tools means that changes are applied to databases in a haphazard fashion.  While searching for some good arguments to use when introducing this concept, I came across a book by Scott Ambler and Pramod Sadalage entitled “Refactoring Databases: Evolutionary Database Design”.

Immediately I was happy with the concept: a whole “text-book” of how and why you need to manage the process of database structural change management.  In the remainder of this post, I’ll be reviewing and commenting on this book.

Before I begin, I think it’s interesting to look at the review quotes in the front of the book.  In some ways I wonder if folks know who this book is for - most of the quotes seem to patronize “data professionals”, saying it’s high time that they joined the modern world in embracing agile development techniques.  References to “strong-armed DBAs” holding back projects seem a little harsh.

And yet.

I continue to believe that the jack-of-all-trades DBA moniker is mostly to blame for the sad state of database physical design today.  Ask folks what the primary task of a DBA group is, and chances are you’ll be told that it’s backup and recovery, not physical database design and construction.  I even have a hard time with the term database development as I don’t really feel like I’m writing code when I’m doing physical database layout.  I’ve been advocating the split of the DBA term into Database Operator (traditional DBA), Database Engineer (physical database designer and constructor) and Database Developer (stored procedure and SQL writer).

Using my terms, this book is best targeted at the Database Engineer and Developer.

What’s funny to me about the opprobrium heaped upon DBAs by agile developers is that I don’t think it’s a criticism of database technology in and of itself – but rather frustration with being forced to work with database professionals who lack the temperament, skills and experience to do database engineering and development.  Let’s face it, a conservative operations DBA is (rightly) concerned primarily with system availability and reliability through ensuring proper backups and minimizing potential workload on the server.  These are the DBAs who prefer to have hundreds of small independent databases in production all running at 2% utilization because it plays to their strengths.

It’s far harder to manage a large, multi-application tenant database running at 30-40% utilization experiencing continual structure changes – and that’s where this book starts to help.

The Preface has a nice section on “Why Evolutionary Database Development?” which starts us off into understanding why it’s necessary to resist the desire to have a full and complete set of logical and physical models before performing database development.  Early in my career I participated in efforts to create so-called Enterprise Data Models - which, being constructed by ivory-tower oversight and governance groups, lacked any sort of applicability to business and mission requirements.  And sadly, they were out of date even when they were eventually completed.  The book authors do a nice job of highlighting the benefits of the incremental approach, and also caution folks about the potential barriers to its adoption.  In particular they point out the severe lack of tools supporting database SCM (this was written in 2006).

They also mention the need for database sandbox environments – they suggest individual developers get their own databases to experiment with.  I’m not a big fan of this approach – I prefer a single development database that allows me to host a lot of data, with each developer getting their own schema to play around in.  I also ALWAYS enable DDL auditing in ALL of my databases – that way I can track potential changes that might need to be promoted to the next environment (I also get to validate that my changes were applied to the higher environment – and, as icing on the cake, I can trap dumb ideas like embedding DDL statements inside transactional operations).
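For what it's worth, one minimal way to capture that DDL trail in Oracle is a database-level DDL trigger. This is a sketch with invented object names, and Oracle's standard AUDIT facility is a perfectly good alternative.

```sql
-- Sketch: record every DDL statement issued in the database.
CREATE TABLE ddl_audit_log (
  event_time  DATE,
  username    VARCHAR2(30),
  event_type  VARCHAR2(30),   -- e.g. CREATE, ALTER, DROP
  object_type VARCHAR2(30),
  object_name VARCHAR2(30)
);

CREATE OR REPLACE TRIGGER ddl_audit_trg
AFTER DDL ON DATABASE
BEGIN
  INSERT INTO ddl_audit_log
  VALUES (SYSDATE, ora_login_user, ora_sysevent,
          ora_dict_obj_type, ora_dict_obj_name);
END;
/
```

Querying the log table in each environment shows which changes have and have not been promoted.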

Chapter 2 introduces the concept of Database Refactoring, with a quick introduction on refactoring in general (“a disciplined way to restructure code in small steps”).  The authors do a nice job of pointing out that database refactoring is conceptually more difficult than code refactoring – that code refactoring only needs to maintain behavioral semantics, while database refactorings must also maintain informational semantics (pg. 15).  The emphasis here includes the ability to introduce change in a transitional way that allows for multiple applications and multiple versions of applications to continue to run against the same database.  A simple example of moving a column from a parent table to a child table is also included.

In section 2.3, the authors categorize database refactorings into 6 broad categories: Structural (modifying table definitions), Data Quality (think check constraints), Referential Integrity (capturing rules that might currently be maintained by application code), Architectural (transferring common logic from applications into database procedures to increase their usefulness), Method (stored procedure refactorings), and Non-Refactoring Transformations (evolving the schema to handle new data concepts).

They also introduce the idea of indicators that your database may require refactoring – they call them “Database Smells” :-)

These include common problems like multipurpose columns, multipurpose tables, redundant storage of data items, overloaded columns, and fear of altering the database because it is too complex.

In section 2.6, the authors explain how it is easier to refactor your database schema when you decrease the coupling between applications and the database – through concepts like persistence layers.

Chapter 3 walks you through the basics of a database refactoring process – including giving you a list of process steps.  It also includes some good checks on determining whether or not the change is necessary and worth the effort.  Finally, they talk about version control and visibility.

Chapter 4 is pretty short, and deals with deploying or migrating changes from environment to environment.  This includes the concept of bundling changes together, scheduling and documenting deployments.  Finally, they discuss the actual deployment process, including defining and possibly testing your backout procedures.

In my environments, we’d break up these deployment items into 3 main groups: items that are pre-deployable (i.e., can be deployed ahead of time without affecting current applications), items that require application outages, and items that can be deployed “post-deployment” (perhaps cleanup activities that require the structure change, but aren’t required by the applications).

Chapter 5 discusses strategies (actually lessons learned) for successfully moving database refactorings through your development process, including implementing traceability for database changes, simplifying database change review processes, and hunting down and eliminating duplicate SQL.

The rest of the book, Chapters 6 through 11, goes through specific kinds of refactorings (e.g., Introduce Calculated Column) along with basic pros/cons of each one and example SQL scripts (using the Oracle dialect).  It serves as a reference catalog of database change concepts and is useful from a delineation perspective.  I wish there was more meat in the pro and con section for each transformation, but in the end it’s a useful list.

Overall I thoroughly enjoyed the book and would recommend it for many development teams – project managers and developers should read at least the first 50 pages so as to understand how to integrate database development into the overall project plan.  Traditional DBAs supporting development teams absolutely must read this – if only to enhance their ability to interact and fully support development activities.

That’s all I have for now – look for shorter, more incisive posts in the future!

- Dom.