It’s time for another phenomenal Collaborate conference next week, made even more fabulous by the fact that I’ll be there for the first time! For some reason, Collaborate is one of the few big user group conferences I haven’t made it to before, so when my boss asked me to go and present at it I leapt at the chance.
Originally, I had one presentation timeslot, but then Alex Gorbachev of Pythian / Oak Table fame noticed I was attending and asked me to do a couple of slots at Oak Table World on the Wednesday. Then the organizers had a drop-out and asked Seth Miller and me to fill in, so my dance card is starting to look a bit fuller! The dates and times for when I will be presenting currently stand at:
The first session is also part of the virtual conference. Seth will spend 15 minutes on a quick overview of setup required to support cloud in EM and then I’ll cover PDBaaS for the remaining time.
The Knowledge Sharing session is a different sort of talk for me. For a start, it’s non-technical. Part of it will cover the sorts of knowledge sharing I’ve been doing over the past 20+ years, and then I’ll branch into the pros and cons of a few different types of knowledge sharing. It should be an interesting experience!
The SnapClone++ session is a lightning talk. As Alex describes them, “Lightning Talks are 10-minute presentations done in a rapid-fire fashion. They are always a huge success—you’ve got to be there! They may be technical, motivational, or inspirational, but regardless they are always cool speeches. The sequence of the talks may change, but everything will be presented within the hour.”
The final one is back to the normal sorts of presentations I’ve been doing for many years. It will cover why you would want to use Snap Clone, and includes a demo of how the functionality actually works.
Apart from all that, I’ll be spending some time on the Demo Grounds, as well as venturing into some of the EM SIG meetings. Hopefully at some time you can catch up and have a chat, so come on by! As I’m still travelling in a wheelchair, it should be easy to spot me.
There’s a live example on OTN at the moment of an interesting class of problem that can require some imaginative thinking. It revolves around a design that uses a row in one table to hold the low and high values for a range of values in another table. The problem is then simply to count the number of rows in the second table that fall into the range given by the first table. There’s an obvious query you can write (a join with inequality) but if you have to join each row in the first table to several million rows in the second table, then aggregate to count them, that’s an expensive strategy. Here’s the query (with numbers of rows involved) that showed up on OTN; it’s an insert statement, and the problem is that it takes 7 hours to insert 37,600 rows:
INSERT INTO SA_REPORT_DATA
        (REPORT_ID, CUTOFF_DATE, COL_1, COL_2, COL_3)
(
SELECT
        'ISRP-734',
        to_date('&DateTo', 'YYYY-MM-DD'),
        SNE.ID AS HLR,
        SNR.FROM_NUMBER||' - '||SNR.TO_NUMBER AS NUMBER_RANGE,
        COUNT(M.MSISDN) AS AVAILABLE_MSISDNS
FROM
        SA_NUMBER_RANGES    SNR     -- 10,000 rows
,       SA_SERVICE_SYSTEMS  SSS     --  1,643 rows
,       SA_NETWORK_ELEMENTS SNE     --    200 rows
,       SA_MSISDNS          M       --    72M rows
WHERE
        SSS.SEQ = SNR.SRVSYS_SEQ
AND     SSS.SYSTYP_ID = 'OMC HLR'
AND     SNE.SEQ = SSS.NE_SEQ
AND     SNR.ID_TYPE = 'M'
AND     M.MSISDN >= SNR.FROM_NUMBER
AND     M.MSISDN <= SNR.TO_NUMBER
AND     M.STATE = 'AVL'
GROUP BY
        SNE.ID, SNR.FROM_NUMBER||' - '||SNR.TO_NUMBER
)
The feature here is that we are counting ranges of MSISDN: we take 10,000 number ranges (SNR) and join with inequality to a 72M row table. It’s perfectly conceivable that at some point the data set expands (not necessarily all at once) to literally tens of billions of rows that are then aggregated down to the 37,600 that are finally inserted.
The execution plan shows the optimizer joining the first three tables before doing a merge join between that result set and the relevant subset of the MSISDNs table – which means the MSISDNs have to be sorted and buffered (with a probable spill to disc) before they can be used. It would be interesting to see the rowsource execution stats for the query – partly to see how large the generated set became, but also to see if the ranges involved were so large that most of the time went in constantly re-reading the sorted MSISDNs from the temporary tablespace.
As far as optimisation is concerned, there are a couple of trivial things around the edges we can examine: we have 10,000 number ranges but insert 37,600 results, and the last stages of the plan generated those results, so we’ve scanned and aggregated the sorted MSISDNs 37,600 times. Clearly we could look for a better table ordering that eliminated any number ranges early, then did the minimal number of joins to MSISDN, aggregated, and then scaled up to 37,600: with the best join order we might reduce the run time by a factor of 3 or more. (But that’s still a couple of hours of run time.)
What we really need to do to make a difference is change the infrastructure in some way – preferably invisibly to the rest of the application. There are a number of specific details relating to workload, read-consistency, timing, concurrency, etc. that will need to be considered, but broadly speaking, we need to take advantage of a table that effectively holds the “interesting” MSISDNs in sorted order. I’ve kept the approach simple here; it needs a few modifications for a production system. The important bit of the reports is the bit that produces the count, so I’m only going to worry about a two-table join – number ranges and MSISDNs; here’s some model data:
execute dbms_random.seed(0)

create table msisdns
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(1e9,1e10))      msisdn
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table number_ranges
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(1e9,1e10))      from_number,
        trunc(dbms_random.value(1e9,1e10))      to_number
from
        generator       v1
where
        rownum <= 1000
;

update number_ranges
set     from_number = to_number,
        to_number   = from_number
where
        to_number < from_number
;

commit;
I’ve created a table of numbers with values between 1e9 and 1e10 to represent 1 million MSISDNs, and a list of 1,000 number ranges – making sure that the FROM number is not greater than the TO number. Now I need a “summary” table of the MSISDNs, which I’m going to create as an index-organized table:
create table tmp_msisdns (
        msisdn,
        counter,
        constraint tmp_pk primary key (msisdn, counter)
)
organization index
as
select
        msisdn,
        row_number() over(order by msisdn)      counter
from
        msisdns
;
This is only a demonstration, so I haven’t bothered with production-like code to check that the MSISDNs I generated were unique (they were); and I’ve casually included the row_number() as part of the primary key as a performance fiddle, even though it’s something that could, technically, allow some other program to introduce bad data if I made the table available for public use rather than keeping it task-specific.
Finally we get down to the report. To find out how many MSISDN values there are between the FROM and TO numbers in a range I just have to find the lowest and highest MSISDNs from tmp_msisdns in that range, find the difference between their counter values, and add 1. And there’s a very fast way to find the lowest or highest value when you have the appropriate index – the min/max range scan – but you have to access the table twice, once for the low value and once for the high. Here’s the necessary SQL, with its execution plan:
select
        nr.from_number, nr.to_number,
--      fr1.msisdn, fr1.counter,
--      to1.msisdn, to1.counter,
        1 + to1.counter - fr1.counter   range_count
from
        number_ranges   nr,
        tmp_msisdns     fr1,
        tmp_msisdns     to1
where
        fr1.msisdn = (
                select  min(msisdn)
                from    tmp_msisdns
                where   tmp_msisdns.msisdn >= nr.from_number
        )
and     to1.msisdn = (
                select  max(msisdn)
                from    tmp_msisdns
                where   tmp_msisdns.msisdn <= nr.to_number
        )
;

-------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |       |       |  4008 (100)|          |
|   1 |  NESTED LOOPS                   |               |  1000 | 38000 |  4008   (1)| 00:00:01 |
|   2 |   NESTED LOOPS                  |               |  1000 | 26000 |  2005   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL            | NUMBER_RANGES |  1000 | 14000 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN             | TMP_PK        |     1 |    12 |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE              |               |     1 |     7 |            |          |
|   6 |      FIRST ROW                  |               |     1 |     7 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| TMP_PK        |     1 |     7 |     3   (0)| 00:00:01 |
|*  8 |   INDEX RANGE SCAN              | TMP_PK        |     1 |    12 |     2   (0)| 00:00:01 |
|   9 |    SORT AGGREGATE               |               |     1 |     7 |            |          |
|  10 |     FIRST ROW                   |               |     1 |     7 |     3   (0)| 00:00:01 |
|* 11 |      INDEX RANGE SCAN (MIN/MAX) | TMP_PK        |     1 |     7 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("FR1"."MSISDN"=)
   7 - access("TMP_MSISDNS"."MSISDN">=:B1)
   8 - access("TO1"."MSISDN"=)
  11 - access("TMP_MSISDNS"."MSISDN"<=:B1)
Execution time – with 1 million MSISDNs and 1,000 ranges: 0.11 seconds.
For comparative purposes, and to check that the code is producing the right answers, here’s the basic inequality join method:
select
        nr.from_number, nr.to_number, count(*) range_count
from
        number_ranges   nr,
        msisdns         ms
where
        ms.msisdn >= nr.from_number
and     ms.msisdn <= nr.to_number
group by
        nr.from_number, nr.to_number
order by
        nr.from_number
;

-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               |       |       |       |  472K(100)|          |
|   1 |  HASH GROUP BY        |               |   707K|    14M|  6847M|  472K (17)| 00:00:19 |
|   2 |   MERGE JOIN          |               |   255M|  5107M|       | 13492 (77)| 00:00:01 |
|   3 |    SORT JOIN          |               |  1000 | 14000 |       |     3 (34)| 00:00:01 |
|   4 |     TABLE ACCESS FULL | NUMBER_RANGES |  1000 | 14000 |       |     2  (0)| 00:00:01 |
|*  5 |    FILTER             |               |       |       |       |           |          |
|*  6 |     SORT JOIN         |               |  1000K|  6835K|    30M|  3451  (7)| 00:00:01 |
|   7 |      TABLE ACCESS FULL| MSISDNS       |  1000K|  6835K|       |   245 (14)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("MS"."MSISDN"<="NR"."TO_NUMBER")
   6 - access("MS"."MSISDN">="NR"."FROM_NUMBER")
       filter("MS"."MSISDN">="NR"."FROM_NUMBER")
The two queries produced the same results (apart from ordering); but the second query took 2 minutes 19.4 seconds to complete.
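The counter-difference trick isn’t Oracle-specific, so as a sanity check here’s a minimal Python sketch of both strategies (a toy model with scaled-down, made-up data sizes – the names and sizes are illustrative, not from the OP’s system). Python’s bisect stands in for the two min/max index probes: the two binary-search positions differ by exactly `1 + to1.counter - fr1.counter`.

```python
import bisect
import random

# Model data, scaled down from the article's 1M msisdns / 1,000 ranges.
random.seed(0)
msisdns = sorted(random.sample(range(10**9, 10**10), 50_000))
ranges = []
for _ in range(200):
    a = random.randrange(10**9, 10**10)
    b = random.randrange(10**9, 10**10)
    ranges.append((min(a, b), max(a, b)))

def count_by_scan(lo, hi):
    # The inequality-join strategy: every msisdn is examined for every range.
    return sum(1 for m in msisdns if lo <= m <= hi)

def count_by_counter(lo, hi):
    # The tmp_msisdns strategy: two index probes instead of a scan.
    # bisect_left gives the (0-based) counter of the lowest msisdn >= lo,
    # bisect_right gives one past the counter of the highest msisdn <= hi,
    # so the difference equals "1 + to1.counter - fr1.counter".
    return bisect.bisect_right(msisdns, hi) - bisect.bisect_left(msisdns, lo)

# Both strategies agree on every range.
assert all(count_by_scan(lo, hi) == count_by_counter(lo, hi)
           for lo, hi in ranges)
```

The asymptotics mirror the timings above: the scan does O(ranges × msisdns) work, while the counter lookup does O(ranges × log msisdns) once the sorted structure exists.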
In a moment of idle curiosity I recreated the data with 40 million rows in the MSISDNs table to get some idea of how fast the entire report process could go when re-engineered (remember the OP has 72M rows, but we select only the subset flagged as ‘AVL’). It took 1 minute 46 seconds to create the IOT – after which the report for 1,000 number ranges still took less than 0.2 seconds.
I’m an Oracle dinosaur, so I like using SQL Plus. It’s simple, it’s fast, it comes with every version, installation, and platform, and I’m very familiar with it. (And who knows, it might still be at the forefront of the Oracle development teams! http://www.slideshare.net/hillbillyToad/sqlcl-overview-a-new-command-line-interface-for-oracle-database )
But there is one important thing I always take care of when I’m using SQL Plus, and it’s easiest to explain with an example.
You start off by wanting to delete a couple of rows from a critical table as part of a patching process. You type this:
Uh oh….. I forgot the WHERE clause. That’s not so good…
Now, if I keep my cool, I can let that command finish and then type rollback, and the damage is limited to potentially blocking other users for a little while.
But of course, what normally happens is that you see your command, your jaw drops to the floor, and you grab your mouse, race up to the top left or right corner, and click to close that window.
Phew! Crisis averted – your statement never finished, so Oracle will happily clean up after you and roll all the deletions back.
But that is a very, very risky thing to do… because what if, JUST WHEN YOU CLICK, the delete completes, and the screen looks like this:
Well… by default, when you exit SQL Plus, it will commit any outstanding changes. And since your delete has completed… you just wiped out your table, and you’re off to LinkedIn looking for a new place to work :-(
There is a better way. In your login.sql file for SQL Plus, make sure you always have the following:
set exitcommit off
You never, ever want to be committing (with any tool) unless you explicitly request it.
The next part of the video tutorial explaining the XPLAN_ASH Active Session History functionality, continuing the actual walk-through of the script output.

More parts to follow.
Additionally, there are some great opportunities in professional development at the IOUG Strategic Leadership Program. I’ll be speaking with James Lui on how to improve your value with personal brands and social media.
I’m the WIT luncheon speaker on Monday, so for those of you that signed up for this great yearly event for women in tech before it sold out, I’m looking forward to meeting as many of the powerful women in our industry (and hopefully a few men, too!) as I can before and after the event.
There are also a few SIGs that I’ll be attending that aren’t on the schedule:
Tue 12:30 p.m. – 1:00 p.m. – Cloud Computing and Virtualization SIG Meeting (ID: 943)
Wed 12:30 p.m. – 1:00 p.m. – Oracle Enterprise Manager SIG Meeting (ID: 949)
| Id Number | Date and Time | Session Title | Product Line(s) | Session Track(s) | Session Room |
| 976 | Sun. Apr. 12, 9:00 am – 1:00 pm | Everything I Needed to Know About Enterprise Manager I Learned at COLLABORATE – Hands-on Lab | Oracle Enterprise Manager | Manageability | Palm D |
| 0 | Mon. Apr. 13, 9:15 – 10:15 am | Zero to Manageability in One Hour: Build a Solid Foundation for Oracle Enterprise Manager 12c | Oracle Cloud – DBaaS/PaaS, Oracle Database, Oracle Enterprise Manager | Manageability | Banyan B |
| 112 | Mon. Apr. 13, 3:15 – 4:15 pm | The Power of the AWR Warehouse | Oracle Enterprise Manager | Manageability | Banyan B |
| 967 | Mon. Apr. 13, 4:30 – 5:30 pm | IOUG Strategic Leadership Program: Staying on Top of Key Trends and Growing Your Personal Brand with Social Media | Applicable to All | Professional Development | Banyan E |
| 1003 | Wed. Apr. 15, 2:00 – 3:00 pm | OakTable: TED-Style Lightning Talks | Applicable to All | Database | Mandalay K |
| 986 | Mon. Apr. 13, 12:45 – 1:45 pm | Twitter 101 for Oracle Professionals | Applicable to All | Professional Development | Banyan D |
Thanks to everyone at IOUG for the opportunity to be so involved with this great conference – see everyone soon!
For some time now, I’ve been creating blog posts that walk you through using some of the Enterprise Manager technology to perform specific tasks. For example, there’s one here on setting up Chargeback in EM. I’ve never been really satisfied with the way these blog posts turn out, as documenting a step-by-step process like this takes lots of screenshots and you end up with a post that’s a mile long. It also gives the impression that doing something can be quite complex, when in fact it might take only a few minutes to perform all those steps.
In this blog post, I thought I’d take a different approach. Recently, I recorded a couple of screenwatches for use on OTN. One was on creating a test master database with continuous flow using a Data Guard standby database (available here) and the other on how to use Snap Clone outside the cloud environment (available here).
Both those URLs stream from YouTube. If you want to download them and play them back on your own computer, here are the relevant links:
Continuous Flow – the relevant checksums if you want to make sure it has downloaded properly are:
Snap Clone outside the cloud environment – the relevant checksums if you want to make sure it has downloaded properly are:
The screenwatches both start with some explanation of what you can achieve using EM to clone data in the way referred to in the video, and then walk you through the process of actually doing it. Of course, the only problem with loading videos (apart from the size) instead of screenshots is that you will need to manually pause them if you want to stop and read the details on the screen. In any case, try them out and let me know (post a comment below if you like) which way you prefer – long step-by-step screenshot posts, or these shorter video versions.
So we concluded the post “Tuning Database XQuery Statements (1)” with the following SQL statement…
Wednesday April 15 at Collaborate 2015 Las Vegas room Mandalay K
For more information see Pythian’s Blog post.
What is Oaktable World? Oaktable World is a day organized by members of the Oaktable network, a network of Oracle database tuning geeks. Among the members are Tanel Poder, Jonathan Lewis, and Cary Millsap. The Oaktable network was created by Mogens Nørgaard back in 2001 or so, and Mogens started organizing underground get-togethers, known as Oracle Closed World, during Oracle Open World in 2007, usually at Mogens’ “office” at Chevys. In 2010 Oracle Closed World became a bit more organized, with T-shirts as shown below. In 2012 Oracle kindly informed us we were not allowed to use “Oracle” in the group’s meeting name, so it was changed to Oaktable World. Since 2012, Oaktable Worlds have been hosted at Oracle Open World, UKOUG, Collaborate and DOAG.
In my previous article I started exploring the memory usage of a process on a recent linux kernel (2.6.39-400.243.1, UEK2). “Recent” here means recent for the Enterprise Linux distributions; linux kernel developers would point out that the mainline kernel is at version 3.19 (the “stable version” at the time of writing of this blogpost).
The previous article showed that every process has its own address space, and that different allocations exist for execution. These allocations can be seen in the proc pseudo filesystem, in a process-specific file called ‘maps’. The process itself needs some administration areas, which are anonymous allocations marked with [heap] and [stack], plus a few others called [vdso] and [vsyscall]. Every process executes something (not all people realise that: a process can wait for something, but essentially it is always executing). So there will always be an executable in a (regular) process’ address space. In a lot of cases, the executable uses shared libraries. In that case, the libraries are loaded into the address space of the process too, alongside the executable.
The executable contains (at least) two sections: the code segment, which is read-only and potentially shared, and the data segment, which is read-write and becomes truly allocated (rather than shared with the parent) when the process writes to it. In a lot of cases, the executable uses shared libraries, which means it uses functions stored in those libraries. A library also needs to be loaded, and also contains multiple sections, which can be read-only or read-write, and are shared unless the process needs to write to that segment.
For completeness, here’s the complete maps output of a process executing the ‘cat’ executable again:
$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fc:00 2605084                   /bin/cat
0060a000-0060b000 rw-p 0000a000 fc:00 2605084                   /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0139d000-013be000 rw-p 00000000 00:00 0                         [heap]
7f444468d000-7f444a51e000 r--p 00000000 fc:00 821535            /usr/lib/locale/locale-archive
7f444a51e000-7f444a6a8000 r-xp 00000000 fc:00 3801096           /lib64/libc-2.12.so
7f444a6a8000-7f444a8a8000 ---p 0018a000 fc:00 3801096           /lib64/libc-2.12.so
7f444a8a8000-7f444a8ac000 r--p 0018a000 fc:00 3801096           /lib64/libc-2.12.so
7f444a8ac000-7f444a8ad000 rw-p 0018e000 fc:00 3801096           /lib64/libc-2.12.so
7f444a8ad000-7f444a8b2000 rw-p 00000000 00:00 0
7f444a8b2000-7f444a8d2000 r-xp 00000000 fc:00 3801089           /lib64/ld-2.12.so
7f444aacd000-7f444aad1000 rw-p 00000000 00:00 0
7f444aad1000-7f444aad2000 r--p 0001f000 fc:00 3801089           /lib64/ld-2.12.so
7f444aad2000-7f444aad3000 rw-p 00020000 fc:00 3801089           /lib64/ld-2.12.so
7f444aad3000-7f444aad4000 rw-p 00000000 00:00 0
7fff51980000-7fff519a1000 rw-p 00000000 00:00 0                 [stack]
7fff519ff000-7fff51a00000 r-xp 00000000 00:00 0                 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0         [vsyscall]
If you look closely, you will see that I didn’t explain one type of allocation yet: the anonymous allocations. The anonymous allocations are visible in lines 4, 11, 13 and 16. The anonymous mapping directly following the data segment of either an executable or a library mapped into a process address space is called the BSS. The data segment and the BSS both store static variables; the data segment stores initialised variables, while the BSS stores uninitialised variables. The anonymous mapping for the BSS section might exist for a library, as seen above, or might not; not all Oracle database executable libraries use an anonymous memory mapping, for example. Actually, there is one other memory allocation visible, /usr/lib/locale/locale-archive, which is a file for locale (multi-language support) functions in the C library, and which is out of scope for this article.
When a process requests memory to store something, the C library function malloc() (memory allocation) can be called. This function inspects the size of the allocation, and will either allocate memory from the process’ heap (the memory mapping marked [heap], grown using the system call brk()) or allocate space in a new anonymous memory segment, using the system call mmap(). If you follow the link for malloc(), you can read the source code of the malloc() implementation. There are different malloc() implementations, which fulfil different purposes (embedded devices have different requirements than huge servers); the implementation the Enterprise Linuxes use is called ptmalloc2, which is based on a version written by Doug Lea. If you read the comments in the source code, specifically at ‘Why use this malloc?’, you will see that it tries to be smart with requests up to 128KB (for memory re-usability, to avoid fragmentation and memory wastage), which are allocated from the heap. If an allocation is larger than 128KB, it will use the system memory mapping facilities.
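The size-based dispatch described above can be pictured with a toy model (a sketch only, not the real allocator – the function name is made up for illustration; 128KB is glibc’s default mmap threshold):

```python
MMAP_THRESHOLD = 128 * 1024  # ptmalloc2's default threshold, in bytes

def allocation_source(size_bytes):
    """Toy model of ptmalloc2's size-based dispatch: small requests are
    carved out of the heap (grown via brk() when needed), while large
    requests get a private anonymous mmap() region of their own."""
    if size_bytes < MMAP_THRESHOLD:
        return "heap (brk)"
    return "anonymous mmap"

print(allocation_source(64 * 1024))    # heap (brk)
print(allocation_source(1024 * 1024))  # anonymous mmap
```

This is why, in pmap output, many small allocations disappear inside the [heap] line, while a single large allocation shows up as its own anonymous mapping.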
Okay, this brings us back to the original question: how much memory does this process take? I hope you recall from the first blogpost that Linux tries to share as much memory as possible, and that when a new process is created, the allocations in the address space of this new process are pointers to the memory areas of the parent process. Let’s first use a utility a lot of people are using: top.
top - 10:53:40 up 9 days, 14:11,  2 users,  load average: 1.34, 1.36, 1.37
Tasks: 1124 total,   1 running, 1123 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.8%sy,  0.1%ni, 98.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  98807316k total, 97411444k used,  1395872k free,   400288k buffers
Swap: 25165820k total,  3560852k used, 21604968k free, 27573200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16391 oracle    -2   0 2369m 4960 4812 S  0.7  0.0  91:41.37 asm_vktm_+asm1
16407 oracle    20   0 2388m  20m 8580 S  0.7  0.0  46:16.40 asm_dia0_+asm1
I edited the output a bit to show only two processes from an ASM instance.
The columns that show information on memory are VIRT, RES, SHR, %MEM.
VIRT is described in the man-page of top as ‘The total amount of virtual memory used by the task’. This means it’s ALL the memory visible in the addressing space of the process. A useful utility to get the contents of the virtual memory allocations for a process is pmap, let’s use it for process 16391, which is asm_vktm_+asm1:
$ pmap -x 16391
16391:   asm_vktm_+ASM1
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000  246644    2520       0 r-x--  oracle
000000000f6dd000    1952      44      20 rw---  oracle
000000000f8c5000     140       0       0 rw---    [ anon ]
000000001042c000     356       0       0 rw---    [ anon ]
0000000060000000    4096       0       0 rw-s-  SYSV00000000 (deleted)
0000000061000000 2080768       0       0 rw-s-  SYSV00000000 (deleted)
00000000e0000000      12       4       4 rw-s-    [ shmid=0x4e78003 ]
00007fa275589000      72      12       0 r-x--  libnfsodm12.so
00007fa27559b000    2044       0       0 -----  libnfsodm12.so
00007fa27579a000       8       0       0 rw---  libnfsodm12.so
00007fa27579c000    1604       0       0 r-x--  libshpksse4212.so
00007fa27592d000    2044       0       0 -----  libshpksse4212.so
00007fa275b2c000      72       0       0 rw---  libshpksse4212.so
00007fa275b3e000      20       4       0 r-x--  libcxgb3-rdmav2.so
00007fa275b43000    2044       0       0 -----  libcxgb3-rdmav2.so
00007fa275d42000       4       0       0 rw---  libcxgb3-rdmav2.so
...snip...
00007fa27abd5000       8       0       0 r-x--  libodmd12.so
00007fa27abd7000    2044       0       0 -----  libodmd12.so
00007fa27add6000       4       0       0 rw---  libodmd12.so
00007fa27add7000     128     116       0 r-x--  ld-2.12.so
00007fa27adf8000     512      12      12 rw---    [ anon ]
00007fa27ae78000     212       4       4 r--s-  passwd
00007fa27aead000    1260      68      68 rw---    [ anon ]
00007fa27aff3000       4       4       0 rw-s-  hc_+ASM1.dat
00007fa27aff4000       8       0       0 rw---    [ anon ]
00007fa27aff6000       4       0       0 r----  ld-2.12.so
00007fa27aff7000       4       0       0 rw---  ld-2.12.so
00007fa27aff8000       4       0       0 rw---    [ anon ]
00007fff7b116000     132       8       8 rw---    [ stack ]
00007fff7b1ff000       4       4       0 r-x--    [ anon ]
ffffffffff600000       4       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB         2426668    5056     156
The column ‘Kbytes’ represents the full size of the executable, libraries, shared memory, anonymous mappings and other mappings of this process. For completeness’ sake: 2426668/1024 = 2369.79, which matches the 2369m in the top output. This is all the memory this process can see and could use. Does this tell us anything about what memory process 16391 actually takes? No. (Parts of) the Oracle executable’s allocations are potentially shared, the shared memory (SYSV00000000 (deleted) and [ shmid=0x4e78003 ], which represent the Oracle SGA) is shared, and the memory allocations for the libraries are potentially shared. The anonymous memory mappings have been defined, but the actual allocation is not visible in the Kbytes column. What this column does for me is give an idea of the approximate SGA size, especially if the SGA is large (meaning multiple gigabytes).
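To make the arithmetic concrete, here’s a small sketch that totals the Kbytes and RSS columns the way pmap’s own “total kB” line does, using a three-row subset copied from the output above (the subset and its sums are illustrative only):

```python
# Three rows taken from the pmap -x output above:
# (address, Kbytes, RSS, Dirty, mode, mapping).
rows = [
    ("0000000000400000", 246644, 2520,  0, "r-x--", "oracle"),
    ("000000000f6dd000",   1952,   44, 20, "rw---", "oracle"),
    ("00007fa27add7000",    128,  116,  0, "r-x--", "ld-2.12.so"),
]

# pmap's "total kB" line is simply the per-column sum over all mappings.
total_kbytes = sum(r[1] for r in rows)
total_rss = sum(r[2] for r in rows)
print(total_kbytes, total_rss)  # 248724 2680

# The full pmap total, converted to the megabytes top shows as VIRT:
print(2426668 // 1024)  # 2369  -- matches top's 2369m for this process
```

The point of the exercise: VIRT is nothing more than the sum of all mapped sizes, regardless of whether the pages are shared, untouched, or swapped.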
The second memory column in top is RES. RES is described as: ‘The non-swapped physical memory a task is using’. RES is sometimes referred to as RSS, or ‘resident set size’. As we can see from the total, the RSS is far smaller than the virtual memory size. One important point in the RES description of top is that swapped memory pages are not counted in the RES/RSS value. RES/RSS corresponds to the memory actually used (“touched”) by a process, and is directly usable. If you look back at the RSS column of the pmap output above, you see the oracle executable’s two mappings: one has an RSS size of 2520, and one has an RSS size of 44. But if you remember that the code/read-only segment is potentially shared with other processes, and then look at the 2520 value (which belongs to the oracle memory segment with the rights r-x--, i.e. the code segment), I hope you understand this just means this process (vktm) has read a subset of the entire executable, and more importantly: the RSS size does not reflect physical memory uniquely allocated by this process.
If we look at the shared memory segments, it’s interesting to see what happens during normal life of a database session. I think it is needless to say that you should calculate shared memory outside of process memory usage, since it’s a distinct memory set that is truly shared by all the processes that are created for the instance.
This is a session which has just started:
$ pmap -x 43853
Address           Kbytes     RSS   Dirty Mode   Mapping
...
0000000060000000       4       0       0 r--s-    [ shmid=0xb87801d ]
0000000060001000    2860     348     348 rw-s-    [ shmid=0xb87801d ]
0000000061000000 4046848    2316    2316 rw-s-    [ shmid=0xb88001e ]
0000000158000000  144592       0       0 rw-s-    [ shmid=0xb88801f ]
0000000161000000      20       4       4 rw-s-    [ shmid=0xb890020 ]
...
This instance has a SGA set to 4G. Because the session just started, it only touched 2316(KB) of the SGA. Next, I do a big (buffered!) full table scan, requiring the session to put a lot of blocks into the buffercache. After the scan, look at the shared memory segment using pmap again:
$ pmap -x 43853
Address           Kbytes     RSS   Dirty Mode   Mapping
...
0000000060000000       4       0       0 r--s-    [ shmid=0xb87801d ]
0000000060001000    2860     384     384 rw-s-    [ shmid=0xb87801d ]
0000000061000000 4046848 2279040 2279040 rw-s-    [ shmid=0xb88001e ]
0000000158000000  144592   66564   66564 rw-s-    [ shmid=0xb88801f ]
0000000161000000      20       4       4 rw-s-    [ shmid=0xb890020 ]
...
The session has touched half of the SGA shared memory segment (visible in the RSS column of the 6th line). This is logical if you understand what is going on: the process does a buffered table scan, which means the blocks read from disk need to be stored in the buffer cache, which is one of the memory structures in the Linux shared memory segments. However, if you look strictly at the top utility output of a database that has just started up, you see the RSS size of all the processes that work with the buffercache growing. This phenomenon has led to a repetitive question on the Oracle Linux forums about whether Oracle database processes are leaking memory. Of course the correct answer is that the RSS size just grows because the process touches more of the shared memory (=SGA) that has been mapped into its address space. It will stop increasing once it has touched all the memory it can touch.
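The “growing RSS is not a leak” behaviour can be modelled in a few lines (a toy model only: one page-granular shared segment with the sizes taken from the pmap output above, everything else made up for illustration):

```python
PAGE_KB = 4  # x86-64 base page size in KB

class SharedSegment:
    """Toy model of a shared memory segment mapped into a process: RSS
    counts only the pages this process has actually touched, so RSS grows
    with use and is capped at the segment size -- nothing is leaking."""
    def __init__(self, size_kb):
        self.pages = size_kb // PAGE_KB
        self.touched = set()

    def touch_kb(self, offset_kb, length_kb):
        first = offset_kb // PAGE_KB
        last = (offset_kb + length_kb - 1) // PAGE_KB
        for p in range(first, min(last, self.pages - 1) + 1):
            self.touched.add(p)

    def rss_kb(self):
        return len(self.touched) * PAGE_KB

sga = SharedSegment(4046848)   # the 4G segment from the pmap output
sga.touch_kb(0, 2316)          # a freshly started session
print(sga.rss_kb())            # 2316
sga.touch_kb(0, 2279040)       # after the buffered full table scan
print(sga.rss_kb())            # 2279040 -- grows, then plateaus
```

Re-touching pages that are already resident (a second scan of the same blocks) leaves RSS unchanged, which is exactly why the growth levels off on a warmed-up instance.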
%MEM is the RES/RSS size expressed as a percentage of the total physical memory.
SHR is the amount of shared memory. The manpage of top says ‘It simply reflects memory that could be potentially shared with other processes’. Do not confuse this with shared memory segments mapped into the process’ address space. Empirical tests show the SHR value always seems to be lower than the RSS size, which means it seems to track the RSS value of memory, and shows RSS (touched) memory that could be shared (which seems to contain both touched memory from the shared memory segments and the privately mapped executable and libraries). At least from an Oracle performance and troubleshooting perspective, I can’t see any benefit to using this value.
The conclusion of this part on memory usage is that neither the virtual set size (VIRT/VSZ) nor the resident set size (RES/RSS) is a figure you can simply add up to indicate the physical memory usage of a single process or a group of processes.
The virtual set size gives you the total amount of virtual memory available to a single process, which includes the executable, and potentially shared libraries, anonymous memory mappings and files mapped into the address space and shared memory. In a lot of cases, you get an indication of the total SGA size of the instance, because the shared memory segments which contain the SGA are entirely mapped into the address space of the process.
The resident set size shows you how much of the memory and files mapped into the address space is actually “touched”, and directly usable. Some of the memory usage results in memory pages private to the process, because of writing to the memory and the Copy On Write mechanism of the Linux memory manager. A lot of other memory mappings can be used, increasing the resident set size, while these are shared with other processes. A third potential component is the actual usage of memory as a result of anonymous memory mappings (versus the total allocation, which can be much more), which are private to the process.
I had a question of a colleague to have a look at a statement that…