The world's largest IT conference, Oracle OpenWorld in San Francisco, is over. Several Ordina people have…
I’ve spent the last couple of evenings playing with the new SQL pattern matching feature in Oracle 12c.
I’m doing some sessions on analytic functions at some upcoming conferences and I thought I should look at this stuff. I’m not really going to include much, if anything, about it as my sessions are focused on beginners and I don’t really want to scare people off. The idea is to ease people in gently, then let them scare themselves once they are hooked on analytics. I’m thinking about Hooked on Monkey Fonics now…
At first glance the pattern matching seems pretty scary. There are a lot of options and as soon as you throw regular expressions into the mix it does make your head swim a little. After a couple of half-baked attempts, where I found convenient excuses to give in when the going got tough, I finally sat down and plugged through the docs. If you actually RTFM it is a lot easier than hoping to wing it. Who’da thunk it?
I’ve tried to keep the article really light. The docs are pretty good for this stuff (if you read them) and they have a lot of examples. I started adding more and more detail to the article, then chopped most of it out. There is no point regurgitating all the options when it is in the docs. Most of the examples I’ve seen before just talk about basic patterns, like V and W shapes, but it’s quite simple to do complicated stuff once you start playing. In fact it takes more time to set up the example data than it does to figure out the queries to bring it back.
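For anyone who wants a flavour of it, here’s a minimal V-shape example along the lines of the one in the docs. The stock_prices table and its columns are made up for illustration:

```sql
-- Hypothetical table: stock_prices(symbol, trade_date, price).
-- Find "V" shapes: a strictly falling run followed by a strictly rising run.
SELECT *
FROM   stock_prices
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY trade_date
  MEASURES strt.trade_date        AS start_date,
           LAST(down.trade_date)  AS bottom_date,
           LAST(up.trade_date)    AS end_date
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST up
  PATTERN (strt down+ up+)
  DEFINE
    down AS down.price < PREV(down.price),
    up   AS up.price   > PREV(up.price)
);
```

Once you can read one of these, the scarier-looking patterns are mostly just more rows in the DEFINE clause.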
In the near future I will be copy/pasting examples and adjusting them or just sitting with my article and the docs when trying to use this stuff. I think it’s going to take a long time before I can type this stuff from memory. Partly that’s because I can’t see myself having lots of cause to use it. I can’t think of any scenarios I’ve experienced where this would have been a natural fit. Having said that, I’ve never worked in things like stock markets, betting and stuff like that where I can imagine this sort of pattern matching is needed all the time. I seem to remember one person at a conference, who shall remain nameless, saying this feature was one of their drivers for upgrading to 12c. I wonder if that was for real or an exaggeration?
Anyway, if you need this sort of analysis, I think it’s worth checking out, but try to remember it’s not as scary as it first looks.
The aim of this post is not to explain how the APPROX_COUNT_DISTINCT function works (you can find basic information in the documentation and in this post written by Luca Canali), but to show you the results of a test case I ran to assess how well it works.
Here’s what I did…
I created a table with several numerical columns (the name of the column shows how many distinct values it contains), loaded 100 million rows into it (the size of the segment is 12.7 GB), and gathered the object statistics.
SQL> CREATE TABLE t
AS
WITH t1000 AS (SELECT /*+ materialize */ rownum AS n
               FROM dual
               CONNECT BY level <= 1E3)
SELECT rownum AS id,
       mod(rownum,2) AS n_2,
       mod(rownum,4) AS n_4,
       mod(rownum,8) AS n_8,
       mod(rownum,16) AS n_16,
       mod(rownum,32) AS n_32,
       mod(rownum,64) AS n_64,
       mod(rownum,128) AS n_128,
       mod(rownum,256) AS n_256,
       mod(rownum,512) AS n_512,
       mod(rownum,1024) AS n_1024,
       mod(rownum,2048) AS n_2048,
       mod(rownum,4096) AS n_4096,
       mod(rownum,8192) AS n_8192,
       mod(rownum,16384) AS n_16384,
       mod(rownum,32768) AS n_32768,
       mod(rownum,65536) AS n_65536,
       mod(rownum,131072) AS n_131072,
       mod(rownum,262144) AS n_262144,
       mod(rownum,524288) AS n_524288,
       mod(rownum,1048576) AS n_1048576,
       mod(rownum,2097152) AS n_2097152,
       mod(rownum,4194304) AS n_4194304,
       mod(rownum,8388608) AS n_8388608,
       mod(rownum,16777216) AS n_16777216
FROM t1000, t1000, t1000
WHERE rownum <= 1E8;

SQL> execute dbms_stats.gather_table_stats(user,'T')
Then, for every column, I ran two queries and measured the elapsed time, the maximum amount of PGA used by the query, and the precision of the result. Note that the test case was designed to avoid the use of a temporary segment. In other words, all data required for the aggregation was stored in the PGA. As a result, the processing was CPU bound.
SELECT /*+ no_parallel */ count(DISTINCT n_2) FROM t
SELECT /*+ no_parallel */ approx_count_distinct(n_2) FROM t
Let’s have a look at three charts summarizing how well the APPROX_COUNT_DISTINCT function works:
According to this test case, in my opinion, the performance and accuracy of the APPROX_COUNT_DISTINCT function with numerical values (I still have to test other data types) are good. Hence, I see no reason for not using it when an estimate of the number of distinct values is enough.
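For readers curious why an approximate count can be both so fast and so accurate: APPROX_COUNT_DISTINCT is generally understood to be based on the HyperLogLog family of algorithms (see Luca Canali’s post). Here’s a toy Python sketch of that idea, emphatically not Oracle’s actual implementation, just an illustration of the technique:

```python
import hashlib
import math

def approx_count_distinct(values, p=14):
    """Toy HyperLogLog-style estimator: hash each value, use the low
    p bits to pick one of 2**p registers, and keep the maximum
    'rank' (position of the first 1-bit) seen in each register."""
    m = 1 << p                      # number of registers
    registers = [0] * m
    for v in values:
        # 64-bit hash of the value
        h = int.from_bytes(hashlib.md5(str(v).encode()).digest()[:8], 'big')
        idx = h & (m - 1)           # low p bits select a register
        w = h >> p                  # remaining (64 - p) bits
        rank = (64 - p) - w.bit_length() + 1   # leading zeros + 1
        registers[idx] = max(registers[idx], rank)
    # harmonic mean of the registers, with the standard bias correction
    alpha = 0.7213 / (1 + 1.079 / m)
    est = alpha * m * m / sum(2.0 ** -r for r in registers)
    # small-range correction: fall back to linear counting
    zeros = registers.count(0)
    if est <= 2.5 * m and zeros:
        est = m * math.log(m / zeros)
    return int(est)
```

The memory cost is fixed (one small register per bucket, regardless of how many rows you feed it), which is exactly why the exact COUNT(DISTINCT …) needs so much more PGA than the approximate version. The typical relative error is about 1.04/sqrt(2**p).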
I got a comment today on my recent Oracle fanboy post, which I thought was very interesting and worthy of a blog post in reply. The commenter started by criticising the Oracle license and support costs (we’ve all had that complaint) as well as the quality of support (yes, I’ve been there too), but that wasn’t the thing that stood out. The final paragraph was as follows…
“One addition. I know you, your past work and you are very brainy person but since last couple of years you became Oracle doctrine submissive person just like most of the rest of ACE Directors. When you were just ACEs, you were more trustworthy than now and you weren’t just Oracle interpreters… And unfortunately I’m not the only person with this opinion, but probably I’m only one who is not affraid to make it public.”
I think that’s a really interesting point and one that I feel compelled to write about…
Let me start by saying I don’t believe this comment was in any way directed at the main body of my website. The articles there have always been “how-to” style articles and typically don’t contain much in the way of opinions about the functionality. I’ve always tried to keep facts in the articles and opinions and random junk on the blog. With that distinction in place, let’s talk about my blog…
When I first joined the Oracle ACE Program in 2006 I was very conscious of what *I thought it meant* about what I could and couldn’t say. On the one hand I didn’t want to piss off Oracle as I was very proud of my little ACE badge, but I also didn’t want to be considered Oracle’s Bitch. I quickly learned a couple of things:
So have I become one of Oracle’s bitches over the last few years? Well, I’ve been an ACE since 1st April 2006 (yes, April fool’s day) and I’ve been an ACE Director since some time in 2007 or 2008. I can’t really remember to be honest, but let’s say for the sake of argument it’s been 6 years as an ACED. If it was becoming an ACED that made me an “Oracle doctrine submissive person” in the last couple of years, it must have taken Oracle four years of work to make me that way.
I don’t believe I alter my beliefs to fit any criteria, but I guess it is really difficult to be subjective about yourself and I would be very interested to know what other people think about this. If I think about some common topics of discussion over the last few years where I don’t fall “on message”, they would probably be:
I’ve just had a look through my posts over the last year and if anything, I would say I’m promoting KeePass and MobaXterm more than Oracle. I know I get a little gushy about the ACE Program during conference write ups, and maybe that annoys people a bit, but I just can’t see that I’ve become a total drone… (Denial is not just a river in Africa?)
Anyway, I have two things to say in closing:
PS. For those that feel the need to, please don’t wade in with comments in my defence as I don’t think this is either necessary or helpful. I think the person in question had a genuine concern and quite frankly that makes it a concern of mine also…
A recent question on the OTN forum asked about narrowing down the cause of deadlocks, and this prompted me to set up a little example. Here’s a deadlock graph of a not-quite-standard type:
Deadlock graph:
                                        ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                           process session holds waits  process session holds waits
TX-00040001-000008EC-00000000-00000000       50     249     X             48       9           X
TX-000A001F-000008BC-00000000-00000000       48       9     X             50     249           S
My session (the one that dumped the trace file) is 249, and I was blocked by session 9. The slight anomaly, of course, is that I was waiting on a TX lock in mode 4 (Share) rather than the more common mode 6 (eXclusive).
There are plenty of notes on the web these days to tell you that this wait relates in some way to a unique index (or some associated referential integrity) or an ITL wait. (Inevitably there are a couple of other less frequently occurring and less well documented reasons, such as waits for tablespaces to change state, but I’m going to ignore those for now.) The question is, how do I tell whether this example is related to uniqueness (indexing) or ITLs? For recent versions of Oracle the answer is in the rest of the trace file, which now holds the recent wait history for the session that dumped the trace file.
Reading down my trace file, past the line which says “Information for THIS session”, I eventually get to this:
Current Wait Stack:
 0: waiting for 'enq: TX - allocate ITL entry'
    name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
    wait_id=80 seq_num=81 snap_id=1
So it didn’t take me long to find out I had an ITL problem (which should be a pretty rare occurrence in newer versions of Oracle); but there’s more:
...
There is at least one session blocking this session.
  Dumping 1 direct blocker(s):
    inst: 1, sid: 9, ser: 40192
  Dumping final blocker:
    inst: 1, sid: 9, ser: 40192
There are 2 sessions blocked by this session.
  Dumping one waiter:
    inst: 1, sid: 357, ser: 7531
    wait event: 'enq: TX - allocate ITL entry'
...
Session Wait History:
    elapsed time of 0.000035 sec since current wait
 0: waited for 'enq: TX - allocate ITL entry'
    name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
    wait_id=79 seq_num=80 snap_id=1
    wait times: snap=5.002987 sec, exc=5.002987 sec, total=5.002987 sec
    wait times: max=5.000000 sec
    wait counts: calls=2 os=2
    occurred after 0.000047 sec of elapsed time
 1: waited for 'enq: TX - allocate ITL entry'
    name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
    wait_id=78 seq_num=79 snap_id=1
    wait times: snap=1 min 4 sec, exc=1 min 4 sec, total=1 min 4 sec
    wait times: max=1 min 4 sec
    wait counts: calls=22 os=22
    occurred after 0.000032 sec of elapsed time
...
 8: waited for 'enq: TX - allocate ITL entry'
    name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
    wait_id=71 seq_num=72 snap_id=1
    wait times: snap=5.001902 sec, exc=5.001902 sec, total=5.001902 sec
    wait times: max=5.000000 sec
    wait counts: calls=2 os=2
    occurred after 0.000042 sec of elapsed time
 9: waited for 'enq: TX - allocate ITL entry'
    name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
    wait_id=70 seq_num=71 snap_id=1
    wait times: snap=4.005342 sec, exc=4.005342 sec, total=4.005342 sec
    wait times: max=4.000000 sec
    wait counts: calls=2 os=2
    occurred after 0.000031 sec of elapsed time
...
Sampled Session History of session 249 serial 3931
---------------------------------------------------
The history is displayed in reverse chronological order.
sample interval: 1 sec, max history 120 sec
---------------------------------------------------
  [9 samples,                          11:14:50 - 11:14:58]
    waited for 'enq: TX - allocate ITL entry', seq_num: 81
      p1: 'name|mode'=0x54580004
      p2: 'usn<<16 | slot'=0xa001f
      p3: 'sequence'=0x8bc
    time_waited: >= 8 sec (still in wait)
  [5 samples,                          11:14:45 - 11:14:49]
    waited for 'enq: TX - allocate ITL entry', seq_num: 80
      p1: 'name|mode'=0x54580004
      p2: 'usn<<16 | slot'=0x5000c
      p3: 'sequence'=0xa39
    time_waited: 5.002987 sec (sample interval: 4 sec)
...
The little report that follows the initial wait state shows that the situation was a little messy – session 9 was my first and last blocker, but there was another session tangled up in the chain of waits, session 357.
Following this there’s a set of entries from my v$session_wait_history - and if you look carefully at the slot and sequence that appears on the second line of each wait you’ll notice that my waits have been alternating between TWO other sessions/transactions before I finally crashed.
Finally there’s a set of entries for my session extracted from v$active_session_history. (Question: I’m only allowed to query v$active_session_history if I’ve licensed the Diagnostic Pack, so should I shut my eyes when I get to this part of the trace file? ;) This breakdown also shows my session alternating between waits on the two different blockers, giving me a pretty good post-event breakdown of what was going on around the time of the deadlock.
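For anyone who wants to see this kind of trace for themselves, here’s a sketch (not the exact test behind the trace above) of the sort of setup that can produce ITL waits and, with enough sessions, ITL deadlocks. The table name and row choices are made up:

```sql
-- A block with a single ITL slot and no free space to grow the
-- ITL list is the classic recipe for 'enq: TX - allocate ITL entry'.
CREATE TABLE itl_demo (id NUMBER, v VARCHAR2(200))
  PCTFREE 0 INITRANS 1;

INSERT INTO itl_demo
  SELECT rownum, rpad('x', 200) FROM dual CONNECT BY level <= 1000;
COMMIT;

-- Session 1: UPDATE itl_demo SET v = 'a' WHERE id = 1;
--            (takes the only ITL slot in that block, no commit)
-- Session 2: UPDATE itl_demo SET v = 'b' WHERE id = 2;
--            (different row, same block: waits on TX in mode 4)
-- With three or more sessions working two such blocks in opposite
-- orders, the waits can close into a deadlock like the one above.
```

The fix is usually to rebuild the object with a sensible INITRANS (and enough PCTFREE for the ITL list to grow), rather than anything to do with the application’s locking logic.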
Empowering users! Giving users access to the information they need, when they need it! Allowing users to decide what they need! These are all great ideas and there are plenty of products out there that can be used to achieve this. The question must be, is it really necessary?
There will always be some users that need this functionality. They will need up-to-the-second ad hoc reporting and will invest their time into getting the most from the tools they are given. There is also a large portion of the user base that will quite happily use what they are given and will *never* invest in the tool set. They don’t see it as part of their job and basically just don’t care.
Back when I started in IT, most projects had some concept of a reporting function: a group of people that would discuss with the user base the type of reporting that was needed and identify what was *really needed* and what was just a never-ending wish list of things that would never really be used. They would build these reports and they would go through user acceptance and be signed off. It sounds like the bad old days, but what you were left with were a bunch of well defined reports, written by people who were “relatively speaking” skilled at reporting. What’s more, the reporting function could influence the application design. The quickest way to notice that “One True Lookup Table” is a bad design is to try and do some reporting queries. You will soon change your approach.
With the advent of ad hoc reporting, the skills base gradually eroded. We don’t need a reporting function any more! The users are in charge! All we need is this semantic layer and the users can do it all for themselves! Then the people building the semantic layers got lazy and just generated what amounts to a direct copy of the schema. Look at any database that sits behind one of these abominations and I can pretty much guarantee the most horrendous SQL in the system is generated by ad hoc reporting tools! You can blame the users for not investing more time in becoming an expert in the tool. You can blame the people who built the semantic layer for doing a poor job. You can blame the tools. What it really comes down to is the people who used ad hoc reporting as a “quick and easy” substitute for doing the right thing.
There will always be a concept of “standard reports” in any project. Stuff that is known from day one that the business relies on. These should be developed by experts who do it using efficient SQL. If they are not time-critical, they can be scheduled to spread out the load on the system, yet still be present when they are needed. This would relieve some of the sh*t-storm of badly formed queries hitting the database from ad hoc reporting.
A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago referencing 184.108.40.206; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the depth calculated by Oracle doesn’t match the level that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:
create table t1
as
select
        rownum           id,
        lpad(rownum,200) padding
from    all_objects
where   rownum <= 2500
;

create table t2 as select * from t1;

-- call dbms_stats to gather stats

explain plan for
select
        case mod(id,2)
                when 1 then (select max(t1.id) from t1 where t1.id <= t2.id)
                when 0 then (select max(t1.id) from t1 where t1.id >= t2.id)
        end     id
from    t2
;

select * from table(dbms_xplan.display);
It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE     |      |     1 |     4 |            |          |
|*  4 |    TABLE ACCESS FULL | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)
As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendents of line 0 – so they ought to have the same depth. The plan below is what you get if you run the 8i query from utlxpls.sql against the plan_table.
SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         0
         1          0
         2          1
         3          0
         4          3
         5          0

--------------------------------------------------------------------------------
| Operation                 | Name | Rows  | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |      |    2K |    9K|  28039 |       |       |
|  SORT AGGREGATE           |      |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1    |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |      |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1    |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2    |    2K |    9K|     11 |       |       |
--------------------------------------------------------------------------------
So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.
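One way to do that check, assuming the plan you care about is still the most recent one in your plan_table, is a connect-by that recomputes the depth and compares it with what Oracle stored:

```sql
-- Recompute the depth from the id/parent_id tree and compare it
-- with the depth column Oracle calculated for display purposes.
SELECT id, parent_id,
       depth                              AS stored_depth,
       level - 1                          AS computed_depth,
       lpad(' ', 2*(level-1)) || operation AS operation
FROM   plan_table
START WITH id = 0
CONNECT BY PRIOR id = parent_id
ORDER  BY id;
```

Any row where stored_depth and computed_depth disagree is a line that will be indented misleadingly by the standard display routines.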
A question via twitter: does the error also show up with dbms_xplan.display_cursor(), SQL tuning sets, AWR, etc., or is it just a defect of explain plan? Since the depth is (probably) a derived value for display purposes that Oracle doesn’t use internally for executing the plan, I would be inclined to assume that the defect is universal, but I’ve only checked it through explain plan/display, and through execution/display_cursor().
I spoke at a one day DOUG meeting yesterday. It was pretty cool. Very small intimate group of about 50. The speakers were Nitin Vengurlekar, Charles Kim, Cary Millsap and myself. All are Ace Directors and either work at Viscosity or Enkitec. As a bonus, Tanel Poder showed up to weigh in on some open discussion. Anyway, I thoroughly enjoyed it. I promised the group I would post my slides and a zip file with some of my scripts that I demoed. So here it is (click on the image to download a zip file with PDF and scripts):
This is a quick post on using git on a server. I use my Synology NAS as a fileserver, but also as a git repository server. The default git package for Synology enables git usage on the command line, which means via ssh, or via web-DAV. Both require a logon to do anything with the repository. That is not very handy if you want to clone and pull from the repository in an automated way. Of course there are ways around that (basically setting up password-less authentication, probably via certificates), but I wanted simple, read-only access without authentication. If you installed git on a linux or unix server you get the binaries, but no daemon, which means you can only use ssh if you want to use that server for central git repositories.
Running git via inetd
What I did was use the inetd daemon to launch the git daemon. On any linux or unix server with the inetd daemon (and on Synology too, because it uses linux under the covers), it’s easy to set up git as a server.
First, check /etc/services for the following lines:
git          9418/tcp    # git pack transfer service
git          9418/udp    # git pack transfer service
Next, add the following line in the inetd.conf (which is /etc/inetd.conf on my synology):
git stream tcp nowait gituser /usr/bin/git git daemon --inetd --verbose --export-all --base-path=/volume1/homes/gituser
What you should look for in your setup is:
– gituser: this is the user which is used to run the daemon. I created a user ‘gituser’ for this.
– /usr/bin/git: of course your git binary should be at that fully specified path, otherwise inetd can’t find it.
– git daemon:
— –inetd: notify the git executable that it is running under inetd
— –export-all: all git repositories underneath the base directory will be available
— –base-path: this sets the git root directory. In my case, I wanted to have all the repositories in the home directory of the gituser, which is /volume1/homes/gituser.
And make the inetd daemon reload its configuration with kill -HUP:
# killall -HUP inetd
Please mind this is a simple and limited setup, if you want to set it up in a way with more granular security, you should look into gitolite for example.
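As a usage sketch (the host name “nas” and repository name “myproject” are made up here), once the daemon is live, serving a repository looks like this:

```shell
# On the NAS: create a bare repository under the exported base path.
ssh gituser@nas "git init --bare /volume1/homes/gituser/myproject.git"

# From any client: read-only clone and pull over the git protocol,
# no authentication required. Because of --base-path, the path in
# the URL is relative to /volume1/homes/gituser.
git clone git://nas/myproject.git
cd myproject
git pull
```

Pushes still go over ssh with a logon, which is exactly the split I wanted: authenticated writes, anonymous reads.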
My old NAS went pop a little while ago and I’ve spent the last few weeks backing up to alternate servers while trying to decide what to get to replace it.
Reading the reviews on Amazon is a bit of a nightmare because there are always scare stories, regardless how much you pay. In the end I decided to go for the “cheap and cheerful” option and bought a ReadyNAS 104. I got the diskless one and bought a couple of 3TB WD Red disks, which were pretty cheap. It supports the 4TB disks, but they are half the price again and I’m mean. Having just two disks means I’ve got a single 3TB RAID 1 volume. I can add a third and fourth disk, which will give me approximately 6 or 9 TB. It switches to RAID 5 by default with more than 2 disks.
The setup was all web based, so I didn’t have any OS compatibility issues. Installation was really simple. Slipped in the disks. Plugged the ethernet cable to my router and turned on the power. I went to the website (readycloud.netgear.com), discovered my device and ran through the setup wizard. Job done. I left it building my RAID 1 volume overnight, but I was able to store files almost immediately, while the volume was building.
The web interface for the device is really simple to use. I can define SMB/AFP/NFS shares in a couple of clicks. Security is really obvious. I can define iSCSI LUNs for use with my Linux machines and it has TimeMachine integration if you want that.
The cloud-related functionality is optional, so if you are worried about opening up a potential security hole, you can easily avoid it. I chose not to configure it during the setup wizard.
I was originally going to spend a lot more on a NAS, but I thought I would chance this unit. So far I’m glad I did. It’s small, solid and silent. Fingers crossed it won’t go pear-shaped.
I’ve got all the vital stuff on it now. I’ll start backing up some of my more useful VMs to it and see if I need to buy some more disks. I’ve got about 10TB of storage on servers, but most of it is taken up with old VMs I don’t care about, so I will probably be a bit selective.
PS. I think NetGear might be doing a revamp of their NAS lineup soon, so you might get one of these extra cheap in the near future. They’re already priced at about 50% of RRP, but most RRPs are lies.