While presenting at Oaktable World 2014 in San Fransisco, I discussed the in-memory pre-population speed and indicated that it takes about 30 minutes to 1 hour to load ~300GB of tables. Someone asked me “Why?” and that was a fair question. So, I profiled the in-memory pre-population at startup.
I profiled all in-memory worker sessions using Tanel’s snapper script and also profiled the processes in OS using Linux perf tool with 99Hz sample rate. As there is no other activity in the database server, it is okay to sample everything in the server. Snapper output will indicate where the time is spent; if the time is spent executing in CPU, then the perf report output will tell us the function call stack executing at that CPU cycle. Data from these two profiling methods will help us to understand the root cause of slowness.
I have been testing the inmemory column store product extensively and the product is performing well for our workload. However, I learnt a bit more about inmemory column store and I will be blogging a few them here. BTW, I will be talking about internals of inmemory in Oaktable world presentation, if you are in the open world 2014, you can come and see my talk: http://www.oraclerealworld.com/oaktable-world/agenda/
I blogged earlier about heap dump shared pool heap duration and was curious to see how the inmemory – 22.214.171.124 new feature – is implemented. This is a short blog entry to discuss the inmemory area heap.
I have set the initialization parameters sga_target=32G and inmemory_size=16G, meaning, out of 32GB SGA, 16GB will be allocated to inmemory area and the remaining 16GB will be allocated to the traditional areas such as buffer_cache, shared_pool etc. I was expecting v$sgastat view to show the memory allocated for inmemory area, unfortunately, there are no rows marked for inmemory area (Command “show sga” shows the inmemory area though). However, dumping heapdump at level 2 shows that the inmemory area is defined as a sub-heap of the top level SGA heap. Following are the commands to take an heap dump.
Data visualization is a useful method to identify performance patterns. In most cases, I pull custom performance metrics from AWR repository and use tableau to visualize the data. Of course, you can do the visualization using excel spreadsheet too.
We had huge amount of PX qref waits in a database:
After collaborating with many performance engineers in a RAC database, I have come to realize that there are common pattern among the (mis)diagnosis. This blog about discussing those issues. I talked about this in Hotsos 2014 conference also.
Here are the golden rules of RAC performance diagnostics. These rules may not apply general RAC configuration issues though.
Looks like, this may be better read as a document. So, please use the pdf files of the presentation and a paper. Presentation slide #10 shows indepth coverage on gc buffer busy* wait events. I will try to blog about that slide later (hopefully).
I blogged about Dynamic Resource Mastering (DRM) in RAC here . DRM freezes the global resources during the reconfiguration event and no new resources can be allocated during the reconfiguration. This freeze has a dramatic effect of inducing huge amount of waits for gc buffer busy [acquire|release] events and other gcs drm freeze release, gcs remaster events. In database version 12c, DRM has been improved further.
A major improvement I see is that not all resources are frozen at any time. Essentially, resources are broken down in to partitions and only a resource partition is frozen. This improvement should decrease the impact of DRM related waits tremendously.
LMON Trace file
I will be presenting in HOTSOS symposium 2014 discussing correct methods to diagnose RAC performance issues. Very surprisingly, even very senior performance engineers make mistakes in their analysis while reviewing RAC issues. Come to my presentation and learn the golden rules of RAC performance diagnostics.
It is easier to create one or two AWR reports quickly using OEM. But, what if you have to create AWR reports for many snapshots? For example, your Oracle support analyst wants you to supply 10 1-hour AWR reports from 10AM to 8PM in a 8 node cluster? That’s about 80 AWR reports to create! Okay, okay, I may(!) be overselling it, but you get the point. It is useful to have a script to create AWR report for all instances for a given range of snapshot IDs. Following scripts are handy:
I blogged about DFS lock handle contention in an earlier blog entry. SV resources in Global Resource Directory (GRD) is used to maintain the cached sequence values. I will further probe the internal mechanics involved in the cached sequences. I will also discuss minor changes in the resource names to support pluggable databases (version 12c).
Let’s create an ordered sequence in rs schema and then query values from the sequence few times.
create sequence rs.test_seq order cache 100; select rs.test_seq.nextval from dual; -- repeated a few times. ... / 21
Sequence values are permanently stored in the seq$ dictionary table. Cached sequence values are maintained in SV resources in GRD and SV resource names follows the naming convention to include object_id of the sequence. I will generate a string using a small helper script and we will use that resource name to search in the GRD.
A quick note, Expert Oracle RAC book co-written by me is available now: Expert Oracle RAC 12c. I have written about 6 chapters covering the RAC internals that you may want to learn :) I even managed to discuss the network internals in deep, after all, network is one of the most important component of a RAC cluster.