Search

OakieTags

Who's online

There are currently 0 users and 32 guests online.

Recent comments

Oakies Blog Aggregator

Data Cloning and Refresh

For some time now, I’ve been creating blog posts that walk you through using some of the Enterprise Manager technology to perform specific tasks. For example, there’s one here on setting up Chargeback in EM 12.1.0.4. I’ve never been really satisfied with the way these blog posts turn out, as to document a step by step process like this takes lots of screenshots and you end up with a post that’s a mile long. It also gives the impression that doing something can be quite complex, when in fact it might only take a few minutes to perform all those steps.

In this blog post, I thought I’d take a different approach. Recently, I recorded a couple of screenwatches for use on OTN. One was on creating a test master database with continuous flow using a Data Guard standby database (available here) and the other on how to use Snap Clone outside the cloud environment (available here).

Both those URL’s stream off YouTube. If you want to download them and play them back off your own computer, here are the relevant links:

Continuous Flow – the relevant checksums if you want to make sure it has downloaded properly are:

CRC-32: f4fb51db
MD4: 8c3fbefff8cb8c464bf946739d30121b
MD5: a40a2a8232e53265b04ce789f8fa5738
SHA-1: ceafcb9e2d395a7756f25799fb58e4a7c11dbd10

Snap Clone outside the cloud environment – the relevant checksums if you want to make sure it has downloaded properly are:

CRC-32: 9e0cea20
MD4: dd9d87ac4fc78a65d1ef79cb93c3b690
MD5: 9101f17ed289714aeee6b42974abc085
SHA-1: 63c2e2228b4e33a7a0552484903b51883fc80da5

The screenwatches both start with some explanation of what you can achieve using EM to clone data in the way referred to in the video, and then walk you through the process of actually doing it. Of course, the only problem with loading videos (apart from the size) instead of screenshots is you will need to manually pause them if you want to stop and read the details of the screen. In any case, try them out and let me know (post a comment below if you like) which way you prefer – long step by step screenshot posts, or these shorter video versions.

Tuning Database XQuery Statements (2)

So we concluded the post “Tuning Database XQuery Statements (1)” with the following SQL statement…

Oaktable World Las Vegas April 15, 2015 at Collaborate

oaktableworld_lasvegas1

 

Wednesday April 15 at Collaborate 2015 Las Vegas room Mandalay K
For more information see Pythian’s Blog post.

Screen Shot 2015-04-08 at 10.03.53 AM

 

Screen Shot 2015-04-08 at 10.06.48 AM

 

 

What is Oaktable World ? Oaktable World is a day organized by members of the Oaktable network. The Oaktable network is a network of Oracle database tuning geeks. Among the members are Tanel Poder, Jonathan Lewis, Cary Millsap etc.  Oaktable network was created by Mogens Nørgaard back in 2001 or so and Mogens started organizing underground get togethers, known as Oracle Closed World, during Oracle Open World in 2007, usually at Mogens’ “office” at Chevys. In 2010  Oracle Closed World became a bit more organized with T-shirts as shown below. In 2012 Oracle kindly informed us we were not allowed to use Oracle in the groups meeting name, so it was changed to Oaktable World. Since 2012 Oaktable Worlds have been hosted at Oracle Open World, UKOUG, Collaborate and DOAG.

 

OTW_2010

 

 

OTW_sponsors

Hey dude, where’s my memory? part 2

In my previous article I started exploring the memory usage of a process on a recent linux kernel (2.6.39-400.243.1 (UEK2)), recent means “recent for the Enterprise Linux distributions” in this context, linux kernel developers would point out that the kernel itself is at version 3.19 (“stable version” at the time of writing of this blogpost).

The previous article showed that every process has its own address space, and that different allocations exists for execution. These allocations can be seen in the proc pseudo filesystem, in a process specific file called ‘maps’. The process itself needs some administration area’s which are anonymous allocations which are marked with [heap] and [stack], and some a few others called [vdso] and [vsyscall]. Every process executes something (not all people realise that: a process can wait for something, but essentially is always executing). So there always will be an executable in a (regular) process’ address space. In a lot of cases, the executable uses shared libraries. In that case, the libraries are loaded in the address space of the process too, alongside the executable.

The executable contains (at least) two sections; the code segment, which is readonly and potentially shared, and the data segment, which is read write and gets truly allocated instead of used shared with the parent if the process needs to write. In a lot of cases, the executable uses shared libraries, which means it uses functions which are stored in that library. A library also needs to be loaded, and also contains multiple sections, which can be read only or read write, and are shared unless the process needs to write to that segment.

For completeness, here’s the complete maps output of a process executing the ‘cat’ executable again:

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fc:00 2605084                            /bin/cat
0060a000-0060b000 rw-p 0000a000 fc:00 2605084                            /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0139d000-013be000 rw-p 00000000 00:00 0                                  [heap]
7f444468d000-7f444a51e000 r--p 00000000 fc:00 821535                     /usr/lib/locale/locale-archive
7f444a51e000-7f444a6a8000 r-xp 00000000 fc:00 3801096                    /lib64/libc-2.12.so
7f444a6a8000-7f444a8a8000 ---p 0018a000 fc:00 3801096                    /lib64/libc-2.12.so
7f444a8a8000-7f444a8ac000 r--p 0018a000 fc:00 3801096                    /lib64/libc-2.12.so
7f444a8ac000-7f444a8ad000 rw-p 0018e000 fc:00 3801096                    /lib64/libc-2.12.so
7f444a8ad000-7f444a8b2000 rw-p 00000000 00:00 0
7f444a8b2000-7f444a8d2000 r-xp 00000000 fc:00 3801089                    /lib64/ld-2.12.so
7f444aacd000-7f444aad1000 rw-p 00000000 00:00 0
7f444aad1000-7f444aad2000 r--p 0001f000 fc:00 3801089                    /lib64/ld-2.12.so
7f444aad2000-7f444aad3000 rw-p 00020000 fc:00 3801089                    /lib64/ld-2.12.so
7f444aad3000-7f444aad4000 rw-p 00000000 00:00 0
7fff51980000-7fff519a1000 rw-p 00000000 00:00 0                          [stack]
7fff519ff000-7fff51a00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

If you look closely, you will see that I didn’t explain one type of allocation yet: the anonymous allocations. The anonymous allocations are visible in lines 4, 11, 13 and 16. The anonymous mappings directly following the data segment of either an executable or a library mapped into a process address space is called the BSS. The data segment and the BSS store static variables, however the data segment stores initialised variables, the BSS stores uninitialised variables. The anonymous mapping for the BSS section might exist for a library, as seen above, or might not exist; not all Oracle database executable libraries use an anonymous memory mapping for example. Actually, there is one other memory allocation visible, /usr/lib/locale/locale-archive, which is a file for locale (multi-language support) functions in the C library, which is out of scope for this article.

When a process requests memory to store something, the system call malloc() (memory allocation) can be called. This system call inspects the size of the allocation, and will allocate memory from either the process’ heap (the memory mapping with [heap], using the system call brk()) or it will allocate space using a new anonymous memory segment, using the system call mmap(). If you follow the link with malloc(), you can read the source code of the malloc() call. There are different malloc()’s, which fulfil different purposes (embedded devices have different requirements than huge servers), the implementation that Enterprise Linuxes use is one called ptmalloc2, which is based on a version written bij Doug Lea. If you read the comments in the source code, specifically at ‘Why use this malloc?’, you will see that it tries to be smart with requests up to 128KB (for memory re-usability, to avoid fragmentation and memory wastage), which are allocated from the heap. If an allocation is larger than 128KB, it will use the system memory mapping facilities.

Okay, this brings us back at the original question: how much memory does this process take? I hope you recall from the first blogpost that Linux tries to share as much memory as possible, and when a new process is created, the allocations in the address space of this new process are pointers to the memory areas of the parent process. Let’s first use a utility a lot of people are using: top.

top - 10:53:40 up 9 days, 14:11,  2 users,  load average: 1.34, 1.36, 1.37
Tasks: 1124 total,   1 running, 1123 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.8%sy,  0.1%ni, 98.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  98807316k total, 97411444k used,  1395872k free,   400288k buffers
Swap: 25165820k total,  3560852k used, 21604968k free, 27573200k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 16391 oracle    -2   0 2369m 4960 4812 S  0.7  0.0  91:41.37 asm_vktm_+asm1
 16407 oracle    20   0 2388m  20m 8580 S  0.7  0.0  46:16.40 asm_dia0_+asm1

I edited the output a bit to show only two process from an ASM instance.
The columns that show information on memory are VIRT, RES, SHR, %MEM.

VIRT is described in the man-page of top as ‘The total amount of virtual memory used by the task’. This means it’s ALL the memory visible in the addressing space of the process. A useful utility to get the contents of the virtual memory allocations for a process is pmap, let’s use it for process 16391, which is asm_vktm_+asm1:

$ pmap -x 16391
16391:   asm_vktm_+ASM1
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000  246644    2520       0 r-x--  oracle
000000000f6dd000    1952      44      20 rw---  oracle
000000000f8c5000     140       0       0 rw---    [ anon ]
000000001042c000     356       0       0 rw---    [ anon ]
0000000060000000    4096       0       0 rw-s-  SYSV00000000 (deleted)
0000000061000000 2080768       0       0 rw-s-  SYSV00000000 (deleted)
00000000e0000000      12       4       4 rw-s-    [ shmid=0x4e78003 ]
00007fa275589000      72      12       0 r-x--  libnfsodm12.so
00007fa27559b000    2044       0       0 -----  libnfsodm12.so
00007fa27579a000       8       0       0 rw---  libnfsodm12.so
00007fa27579c000    1604       0       0 r-x--  libshpksse4212.so
00007fa27592d000    2044       0       0 -----  libshpksse4212.so
00007fa275b2c000      72       0       0 rw---  libshpksse4212.so
00007fa275b3e000      20       4       0 r-x--  libcxgb3-rdmav2.so
00007fa275b43000    2044       0       0 -----  libcxgb3-rdmav2.so
00007fa275d42000       4       0       0 rw---  libcxgb3-rdmav2.so
...snip...
00007fa27abd5000       8       0       0 r-x--  libodmd12.so
00007fa27abd7000    2044       0       0 -----  libodmd12.so
00007fa27add6000       4       0       0 rw---  libodmd12.so
00007fa27add7000     128     116       0 r-x--  ld-2.12.so
00007fa27adf8000     512      12      12 rw---    [ anon ]
00007fa27ae78000     212       4       4 r--s-  passwd
00007fa27aead000    1260      68      68 rw---    [ anon ]
00007fa27aff3000       4       4       0 rw-s-  hc_+ASM1.dat
00007fa27aff4000       8       0       0 rw---    [ anon ]
00007fa27aff6000       4       0       0 r----  ld-2.12.so
00007fa27aff7000       4       0       0 rw---  ld-2.12.so
00007fa27aff8000       4       0       0 rw---    [ anon ]
00007fff7b116000     132       8       8 rw---    [ stack ]
00007fff7b1ff000       4       4       0 r-x--    [ anon ]
ffffffffff600000       4       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB         2426668    5056     156

The column ‘Kbytes’ represents the full size of the executable, libraries, shared memory, anonymous mappings and other mappings of this process. For completeness sake: 2426668/1024=2369.79, which matches the 2369 in the top output. This is all the memory this process can see, and could use. Does this tell us anything on what memory the process 16391 actually takes? No. (parts of) the Oracle executable’s allocations are potentially shared, the shared memory (SYSV00000000 (deleted) and [ shmid=0x4e78003 ], which represent the Oracle SGA) is shared, the memory allocations for the libraries are potentially shared. The anonymous memory mappings have been defined, but the actual allocation is not visible in the Kbytes column. What this column in top does for me, is tell the approximate SGA size, especially if the SGA is larger (meaning multiple gigabytes).

The second memory column in top is RES. RES is described as: ‘The non-swapped physical memory a task is using’. RES is sometimes referred to as RSS, and called ‘resident set size’. As we can see from the total, the RSS is way lesser than the virtual memory size. One important thing in the RES description of top is that it described that swapped memory pages are not counted for the RES/RSS value. RES/RSS corresponds to the actual used (“touched”) memory by a process, and is directly usable. If you look back to the RSS column of pmap above, you see the oracle executable’s two mappings, one has a RSS size of 2520, and one has a RSS size of 44. But…if you remember that the code/readonly segment is potentially shared with other process, and then look at the 2520 value (which is of the oracle memory segment with the rights r-x–, which means the code segment), I hope you understand this just means this process (vktm) read a subset of the entire executable, and more importantly: the RSS size does not reflect physical memory uniquely allocated by this process.

If we look at the shared memory segments, it’s interesting to see what happens during normal life of a database session. I think it is needless to say that you should calculate shared memory outside of process memory usage, since it’s a distinct memory set that is truly shared by all the processes that are created for the instance.

This is a session which has just started:

$ pmap -x 43853
Address           Kbytes     RSS   Dirty Mode   Mapping
...
0000000060000000       4       0       0 r--s-    [ shmid=0xb87801d ]
0000000060001000    2860     348     348 rw-s-    [ shmid=0xb87801d ]
0000000061000000 4046848    2316    2316 rw-s-    [ shmid=0xb88001e ]
0000000158000000  144592       0       0 rw-s-    [ shmid=0xb88801f ]
0000000161000000      20       4       4 rw-s-    [ shmid=0xb890020 ]
...

This instance has a SGA set to 4G. Because the session just started, it only touched 2316(KB) of the SGA. Next, I do a big (buffered!) full table scan, requiring the session to put a lot of blocks into the buffercache. After the scan, look at the shared memory segment using pmap again:

$ pmap -x 43853
Address           Kbytes     RSS   Dirty Mode   Mapping
...
0000000060000000       4       0       0 r--s-    [ shmid=0xb87801d ]
0000000060001000    2860     384     384 rw-s-    [ shmid=0xb87801d ]
0000000061000000 4046848 2279040 2279040 rw-s-    [ shmid=0xb88001e ]
0000000158000000  144592   66564   66564 rw-s-    [ shmid=0xb88801f ]
0000000161000000      20       4       4 rw-s-    [ shmid=0xb890020 ]
...

The session has touched half of the SGA shared memory segment (visible in the RSS column of the 6th line). This is logical if you understand what is going on: the process does a buffered table scan, which means the blocks read from disk need to be stored in the buffer cache, which is one of the memory structures in the Linux shared memory segments. However, if you look strictly at the top utility output of a database that has just started up, you see the RSS size of the all the processes that work with the buffercache growing. This phenomenon has lead to a repetitive question on the Oracle Linux forums if Oracle database processes are leaking memory. Of course the correct answer is that the RSS size just grows because the process just touches more of the shared memory (=SGA) that has been mapped into its address space. It will stop increasing once it touched all the memory it could touch.

%MEM is the RES/RSS size expressed as a percentage of the total physical memory.

SHR is the amount of shared memory. The manpage of top says ‘It simply reflects memory that could be potentially shared with other processes’. Do not confuse this with shared memory segments mapped into the process’ address space. Empirical tests show the SHR value always seems to be lower than the RSS size, which means it seems to track the RSS value of memory, and shows RSS (touched) memory that could be shared (which seems to contain both touched memory from the shared memory segments, as well as privately mapped executable and libraries). At least from the perspective of an Oracle performance and troubleshooting perspective I can’t see any benefit from using this value.

The conclusion of this part on memory usage is that both the virtual set size (VIRT/VSZ) and resident set size (RES/RSS) are no figures you can add up to indicate physical memory usage of a single process or a group of processes.

The virtual set size gives you the total amount of virtual memory available to a single process, which includes the executable, and potentially shared libraries, anonymous memory mappings and files mapped into the address space and shared memory. In a lot of cases, you get an indication of the total SGA size of the instance, because the shared memory segments which contain the SGA are entirely mapped into the address space of the process.

The resident set size shows you how much of the memory and files mapped into the address space are actually “touched”, and directly usable. Some of the memory usage result in memory pages private to the process, because of writing to the memory and the Copy On Write mechanism of the Linux memory manager. A lot of other memory mappings can be used, increasing the resident set size, while these are shared with other processes. A third potential component is the actual usage of memory as a result of anonymous memory mappings (versus the total allocation, which can be much more), which are private to the process.

Tuning Database XQuery Statements (1)

I had a question of a colleague to have a look at a statement that…

ORA-31145 – “Duplicate listener, %s, found in resource configuration”

Today I had an error on my test environment which surprised me…initially… While I was…

Jay Whaaaat?<br#x3E;The Magic of jQuery in Squarespace

jQuery is a powerful JavaScript framework by which to examine a page's HTML
and add new content or change the page's structure. Use it with site
generators such as Squarespace to make design...



Read the full post at www.gennick.com/database.

Jay Whaaaat?<br#x3E;The Magic of jQuery in Squarespace

jQuery is a framework allowing you to write easy snippets of JavaScript to change a website's content and structure. It's especially helpful with Squarespace because of its expressive power enabling you to change the underlying CSS and HTML that Squarespace generates on your behalf, that is otherwise untouchable without taking your entire site into Developer Mode. 

Note: Taking a site into Developer Mode makes you into the one responsible for all future bug fixes and enhancements, including to all the template code you didn't write to begin with and probably don't have the time nor interest to try and understand. Moving a site to Developer Mode is not a light decision, and most customers are best off in avoiding it. 

My First Time

I've recently begun learning jQuery, and Figures 1 and 2 show my first-ever jQuery solution. It was written in response to a plea for help on the Squarespace Answers forum. The Pacific template makes no provision for a tagline under the site logo (Figure 1), and the person asking for help wanted such a tagline (Figure 2). It is jQuery that took me from Figure 1 to Figure 2.

Figure 1. Pacific does not support a tagline

Figure 1. Pacific does not support a tagline

Figure 2. But we can add one using jQuery

Figure 2. But we can add one using jQuery

I can't stress enough the need to be comfortable with HTML and CSS before proceeding. jQuery is a JavaScript framework, and JavaScript programming builds upon your knowledge of HTML and CSS. Writing JavaScript without knowing something of HTML and CSS is like building a house without first learning how to hammer nails. You might get the job done, but you'll bang your thumb a few times in the process.

Caution! Knowledge of CSS ad HTML required. Proceed at own risk.

Loading the Library

jQuery is a fairly large library of JavaScript code written and maintained by the jQuery Foundation for you to freely employ in any website. Your first task is to load the library into the browser of anyone visiting your site. Do that by writing a script tag like the following:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js">
script>

Google hosts jQuery and numerous other libraries, and they are all listed on Google's Make the Web Faster page. Go there and scroll down to see the specific URLs for the different versions of jQuery that Google chooses to host and make available. You'll see that my tag above specifies the URL given for jQuery version 2.1.3.

Where do you place the script tag? Good question. Here's what to do:

  1. Start at your site's configuration page
  2. Go Settings -> Advanced -> Code Injection
  3. Find the field labeled "Header"
  4. Paste the script tag into that field as shown in Figure 3

jQuery is now available for use from any and all pages of your site. Don't worry about performance impacts from loading the library multiple times. Browsers are designed to keep track of source URLs and load libraries like jQuery just one time. Many sites load from Google, making it likely that any visitors to your own site already have the library loaded and cached and ready to execute. 

Figure 3. The Header field at Settings - /> Advanced -> Code Injection

Figure 3. The Header field at Settings -> Advanced -> Code Injection

Waiting for the Page to Load

Following is a good skeleton to use for jQuery solutions when you're just starting out. It ensures the current page loads before any code you've written executes, and you'll learn shortly why that matters. 

<script>
  $(document).ready(function(){
    /* Your code goes here */
  });
script>

Here's a quick rundown of what this code is doing for you:

  1. The $ at the beginning invokes jQuery to begin execution of your script
  2. The reference to $(document) makes the currently-loading page available for modification
  3. Invoking the ready(...) method causes jQuery to wait until the page loads completely
  4. Then function() {...} is invoked when the page is completely loaded

The syntax may be intimidating if you're unaccustomed to writing code. Consult a good book if that's the case, or work through a good tutorial on the web. 

The reason for Step 3 is that a browser's default behavior is to execute JavaScript (which includes jQuery) immediately upon encountering it. JavaScript in your page header will execute before the content portion of the page is loaded. You of course want to make changes to that content, and hence the invocation of the ready(...) method to pause while the content loads completely. Otherwise your code will execute ineffectively before the content is available to be changed.  

Adding the Tagline

Right-click the logo in a site built on the Pacific template. Select Inspect Element from the fly-out menu. You'll see results similar to those in Figure 4. Look closely and see that the logo image lies inside a hyperlink (the tag), which in turn lies within a Heading-1 element (the

tag), which finally lies within a div element (the
tag) named "logoWrapper". 

Figure 4. The site logo lies within an<br />
<h1> tag

Figure 4. The site logo lies within an

tag

Let's look at adding a tagline as an additional div element. It's a bit of a judgment call as to where and how to insert new content. The logo image lies in a div all its own, so it's reasonable to insert the tagline immediately following and also in a div of its own. jQuery makes this an easy task.

You can reference the div holding the logo by writing $("#logoWrapper"). For example:

<script>
  $(document).ready(function(){
    $("#logoWrapper")
   
});
script>

This code produces no visual effect, and merely references the element named "logoWrapper". Now invoke the after(...) method to insert some new HTML immediately following the"logoWrapper" element:

<script>
  $(document).ready(function(){
    $("#logoWrapper").after(
      '

A Washed-up Programmer Recapturing the Magic

'
); }); script>

Pass the HTML to add as a text parameter to the after(...) method. Take care about quote marks. You can enclose the string in either single- or double-quotes. I chose to enclose in single-quotes because doing so allows me to have double-quotes as part of the string that I'm passing.

Putting everything together so far and adding some comments to help keep track of things gives the following code:


<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js">
script>


<script>
  $(document).ready(function(){
    $("#logoWrapper").after(
      '

A Washed-up Programmer Recapturing the Magic

'
); }); script>

Compare Figure 5 with Figure 4. Do you see the new "tagLine" div in Figure 5? That and the paragraph inside are what have just been inserted into the page's HTML via jQuery.

Figure 5. The newly-inserted tagline

Figure 5. The newly-inserted tagline

The expression $("logoWrapper").after('...') finds the element named "logoWrapper" and inserts additional HTML immediately following that element. The additional HTML is a new div holding a tagline paragraph for the site. 

Formatting the Result

View the site now and the tagline will look awful. You won't see the easy to read, green tagline in Figure 5. Some formatting is needed first. Provide it via CSS like the following:

<style>
  #tagLine {
    color: #97c21a; 
    font-family: "Raleway";
    font-size: 22px;
    letter-spacing: 2px;
    font-weight: 600;
    text-align: center;
  }
style>

I made sure to specify id="tagLine" as an attribute in the div inserted earlier. That makes for easy targeting from CSS. Raleway is a font already in use elsewhere on the same page, which is the reason I'm able to reference it from custom CSS.

All the code goes into the Code Injection Header field shown earlier in Figure 3. Here's the complete solution:


<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js">
script>


<script>
  $(document).ready(function(){
    $("#logoWrapper").after(
      '

A Washed-up Programmer Recapturing the Magic

'
); }); script> <style> #tagLine { color: #97c21a; font-family: "Raleway"; font-size: 22px; letter-spacing: 2px; font-weight: 600; text-align: center; } style>

Figure 6 shows the final result -- a green tagline just under the site logo. The tagline sits in its own div that is easily targetable, and and you can format as you please.

Figure 6. A tagline on every page

Figure 6. A tagline on every page

Watch my blog for more on using jQuery with Squarespace. If you're interested in diving in and learning more right now, then have a look at the W3Schools jQuery Tutorial. And Jon Ducket's book JavaScript and jQuery looks to be a good book for absolute beginners. (Those with programming background will find themselves turning the pages too quickly). 

Note: The solution in this post is specific to the Pacific template. Like with CSS, anything you do involving jQuery will often be specific to whatever template you happen to be using. Solutions for one template should not be dropped into another without some upfront research to be sure they'll deliver the desired results in their new home.

Not In CTAS

Everyone gets caught out some of the time with NOT IN.

NOT IN is not the opposite of IN.

This came up in a (fairly typical) question on OTN recently where someone had the task of “deleting 6M rows from a table of 18M”. A common, and perfectly reasonable, suggestion for dealing with a delete on this scale is to consider creating a replacement table holding the data you do want rather than deleting the data you don’t want.  In this case, however, the query for deleting the data looked like this:


DELETE FROM EI.CASESTATUS
     WHERE CASEID NOT IN (SELECT CASEID FROM DO.STG_CASEHEADER);

The suggested code for creating the kept data was this:


CREATE TABLE newTable as
  SELECT * FROM EI.CASESTATUS
     WHERE CASEID IN (SELECT CASEID FROM DO.STG_CASEHEADER);

You might get the same result sets in the two tables after this process – but it depends on the CASEID in both tables never being NULL. (You might think that a column with ID in the name probably ought to be a primary key, or a foreign key with a NOT NULL declaration, but then again there’s that STG_ in the subquery table that might be indicative of a “staging” table, so who knows what might happen if the data’s allowed to start life as dirty data.) Here’s a quick demo to prove the point. First some simple data creation – with an optional insert so that you’ve got two tests in one script – followed by the two strategies for identifying data:


drop table t3;
drop table t2;
drop table t1;

create table t1 (n1 number);
create table t2 (n1 number);

insert into t1 values(null);
insert into t1 values(1);
insert into t1 values(2);

/* two tests available, with or without a null in t2 */

-- insert into t2 values(null);
insert into t2 values(1);

commit;

-- gather stats here

set null n/a
delete from t1
where  t1.n1 not in (select t2.n1 from t2);

prompt  Remainder after delete

select  *  from t1;

rollback;

prompt  Selected on create

create table t3 as
select  *  from t1
where   t1.n1 in (select t2.n1 from t2);

select * from t3;

Then the two sets of output from running the test, first with the the NULL insert into t2:


Remainder after delete

        N1
----------
n/a
         1
         2
Selected on create

        N1
----------
         1

We haven’t deleted ANY data from t1 when we were probably hoping that the 2 would disappear – after all, it’s not in t2; however since the equality comparison between a t1 row and every t2 row must evaluate to FALSE before a t1 row is deleted and the comparison of 2 and NULL evaluates to NULL the 2 row won’t be deleted (similarly the comparison for “t1.NULL = t2.anything” evaluates to NULL rather than FALSE, so the NULL isn’t deleted).

Still, perhaps the rewrite would have been okay for the data set where we don’t have a NULL in t2:


Remainder after delete

        N1
----------
n/a
         1
Selected on create

        N1
----------
         1

Oops – still doesn’t produce matching results . This time the row with the 2 has disappeared from t1 in both cases – which might have been closer to what the original OTN poster had hoped but we still have the difference in the survival of the NULLs from t1 – for the reason given for the previous data set

Footnote:

In passing, the execution plan subsequently supplied by the OP showed a “MERGE JOIN ANTI NA” with stg_caseheader (the subquery table) as the second table. The significance of the NA (Null-aware) is that it tells us that the join column in stg_caseheader definitely doesn’t have a NOT NULL constraint on it. (We can’t draw any conclusion about the join column in casestatus.)

(OT) an idea for Easter (and well…every day)

I don’t travel as frequently as some people do, but with OpenWorld and various Oracle conferences each year, I definitely see my fair share of the inside of a plane and a hotel room.  To pass the time on flights, I try to read, but when you’re reading on a work-based trip, it’s a case of reading a page or two here and there, rather than a true extended session of relaxed reading.

The net result ?  I generally buy what I call "airport crap" when it comes to books.  That is, cheap, easy to read, crime thrillers that are totally formulaic in nature.  Its pretty simple:

  • Our hero is a loner or non-conformist trying to live his own life,
  • Something bad happens to an innocent victim,
  • Our hero struggles with his conscience about whether to intervene,
  • He then has several near death struggles,
  • And amazingly…he wins out in the end !
  • (optional) teaser in last page suggesting the next book in the series…

These books suit the purpose of stop-start reading because (a) they are cheap, (b) if you lose it, it’s available in every book store around the world, and (c) it never takes more than two paragraphs to be fully abreast of the entire plot :-)

Of course, this strategy ultimately results in a book shelf full of crappy crime thriller books, that will never be read again, because it was a dull enough experience to read them the first time :-)

But there’s a solution to this, which I’ve used for some time now – but with Easter here, it dawned on me that I should share the concept.  It’s very simple.

Leave your completed book on a bus stop

Once the book has been read, I place a book mark inside it saying "A free book to read.  Once you are finished, please leave at a bus stop for the next reader".

This way, the book is not wasted on a book shelf – it continues to be read and enjoyed by others.  [ Admittedly, given the quality of the junk I’m reading on planes, one could argue that I’ve merely spreading the suffering and lowering the literary skills of the general public, but I try not to think about that :-) ]. 

So next time you finish a book, don’t just stow it on the shelf.  Give it forward at a bus stop.

image