This is just a short blog post about a simple DTrace script (dtrace_kghal_pga_code), that i recently wrote and published due to a PGA memory leak troubleshooting assignment. A client of mine noticed a major PGA memory increase after upgrading to Oracle 12c. The PL/SQL code did not change - just the database release. He already troubleshooted the issue with help of Tanel Poder's blog post "Oracle Memory Troubleshooting, Part 4: Drilling down into PGA memory usage with V$PROCESS_MEMORY_DETAIL" and identified the corresponding heap and allocation reason. However there is one issue by just looking at V$PROCESS_MEMORY_DETAIL or heap dumps - you can not trace back the causing PL/SQL or SQL code very easily.
The basic idea behind the DTrace script is to check the memory allocation reason on probe entry level (for KGHAL memory allocator functions) and stop the Oracle process at this point if the check is true. After the Oracle process has been stopped by DTrace you can request an error stack with oradebug and continue the Oracle process. As the process continues you get an errorstack trace which includes the responsible code for this particular memory allocation request (in PGA).
The following demo is run with Oracle 188.8.131.52 on Solaris 11.2 x86. I also used Frits Hoogland's PL/SQL code from here and put it into PL/SQL function called mem_alloc to allocate a lot of PGA memory.
SQL> select mem_alloc() from dual;
After a few seconds Tanel Poder's script smem_detail.sql can be used to determine the detailed PGA memory usage.
SID CATEGORY NAME HEAP_NAME BYTES ALLOCATION_COUNT
---------- --------------- -------------------------- --------------- ---------- ----------------
25 PL/SQL pmuccst: adt/record koh-kghu call 896860424 54901
25 PL/SQL pl/sql vc2 koh-kghu call 112961768 6915
25 PL/SQL pmucalm coll koh-kghu call 31622336 1936
25 Other permanent memory pga heap 263952 32
Now you know that most memory is allocated from private heap "koh-kghu call" due to reason "pmuccst: adt/record", but you still don't know which PL/SQL code line is responsible for requesting this memory. In this demo case it is pretty simple as Frit's code allocates such amount of memory at only one place, but this is just for demonstration purpose. In reality the PL/SQL code is usually much larger and much more complex - so no way to determine the corresponding code line easily.
Let's use the DTrace script "dtrace_kghal_pga_code" to intercept the process when it requests memory from heap "koh-kghu call" due to "pmuccst: adt/record".
shell> ./dtrace_kghal_pga_code 1502 kghalf "pmuccst: adt/record"
dtrace: description 'pid$target::kghalf:entry
' matched 1 probe
dtrace: allowing destructive actions
CPU ID FUNCTION:NAME
0 78530 kghalf:entry
Process is stopped due to memory allocation reason "pmuccst: adt/record" from heap "koh-kghu call ".
Please run the following commands in separate SQL*Plus to dump an errorstack:
SQL> ORADEBUG SETOSPID 1502
SQL> ORADEBUG DUMP ERRORSTACK 3 <<<< ORADEBUG will hang until process is continued by prun - works as designed
Please run the following command in separate shell after executing ORADEBUG:
shell> /usr/bin/prun 1502
I think the DTrace output is self-explanatory and you just have to follow the instructions.
Received ORADEBUG command (#1) 'DUMP ERRORSTACK 3' from process '1519'
*** 2016-01-30 09:04:30.651
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
----- Current SQL Statement for this session (sql_id=8y13z00p0sgc6) -----
SELECT * FROM T2
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
#ff0000; font-size: 8pt; font-family: courier new,courier;">199ddab20 14 function TEST.MEM_ALLOC
----- Call Stack Trace -----
calling call entry
location type point
-------------------- -------- --------------------
*** 2016-01-30 09:04:30.946
ksedst()+307 call skdstdst()
__sighndlr()+6 call ssprtmin()
call_user_handler()+718 call __sighndlr()
#ff0000;">sigacthandler#ff0000;">()+219 call call_user_handler()
#ff0000; font-size: 8pt; font-family: courier new,courier;">kghalf()+2 signal sigacthandler()
#ff0000;">kohalmc()+214 call kghualloc()
kohalc()+145 call kohalmc()
pmuocon2_con_recur()+145 call kohalc()
pmuocon()+127 call pmuocon2_con_recur()
pfrrun_no_tool()+298 call pfrinstr_EXECC()
pfrrun()+1216 call pfrrun_no_tool()
plsql_run()+648 call pfrrun()
peidxr_run()+317 call plsql_run()
ssthrdmain()+558 call opimai_real()
main()+164 call ssthrdmain()
_start()+123 call main()
Checking the call stack trace in the generated trace file reveals that the errostack trace is created at the right time.
The function kghalf() is the last executed function right before the signal handler (by oradebug) kicks in: "sigacthandler()+219<-kghalf()+2". You also can see the PL/SQL call stack and the currently running SQL statement (if a "plain" SQL statement runs at this time - otherwise you see the "entry" PL/SQL call). PL/SQL code line 14 in function TEST.MEM_ALLOC is exactly related to "select * bulk collect into c_tmp from t2;".
I hope you may find this DTrace script a little bit useful in case of troubleshooting unclear PGA memory allocation.
Just let me know if you have some issues with it. If you have any further questions - please feel free to ask or get in contact directly, if you need assistance by troubleshooting Oracle database (performance) issues. You can also follow or drop me a note on Twitter of course.