Why oh Why Do We Still Not Have a Fast Bulk “SQL*Unloader” Facility?

Way back in 2004 I was working at the UK side of the Human Genome Project. We were creating a massive store of DNA sequences in an Oracle database (this was one of two stores of this information available worldwide, for free & open use by anyone {* see note!}). The database was, for back then, enormous at 5-6TB. And we knew it would approximately double every 12 months (and it did: it was 28TB when I had to migrate it to Oracle 10 in 2006, over 40TB 6 months later, and it grew to half a petabyte before it was moved to another organisation). And we were contemplating storing similar massive volumes in Oracle – protein, RNA and other sequence stores, huge numbers of cytological images (sorry, microscope slides).

[Image: I did my little bit to work out how we all tick]

Significant chunks of this data were sometimes processed in specialist ways using Massively Parallel Processing farms (hundreds or thousands of compute nodes so we could do “2 years of compute” in a day or even a few hours). Oracle was never going to be able to support this within the database.

But Oracle had no fast, bulk data unloading offering, a “SQL*Unloader”. Other RDBMSs did. And that gave me an issue: I had to argue the case for holding this data in Oracle, where it was hard to yank chunks of it out, rather than keeping it in, say, a simple file store or MySQL.

That year I was invited to Oracle Open World and I was given a slot with a senior VP in the RDBMS/OCFS area (as that is what we were using – I know! But we were). I had been warned by someone that he could be a little “robust”. In fact my friend called him “Larry’s Rottweiler”. However my chat with the SVP went fine. Until he asked me for the one thing I would really want and I said “well, it’s been promised for the last 2 or 3 versions of Oracle, but it still does not exist in V10 – I’d like a SQL*Unloader so we can rapidly extract bulk data from Oracle into a structured file format”. He went very cold. I continued: “We need to do really intensive processing of the data with ‘C’ and if it is locked into the database we can’t do that. It’s hard for me to justify using Oracle as our primary data store if the data is now stuck there…”

[Image: I honestly thought he would bite!]

He went ballistic. He was furious: raising his voice, leaning forward – I honestly thought he might hit me. I can’t remember the whole rant but one line stuck: “We will NEVER help you get your data out of Oracle!”. I ended the discussion as soon as I could – which obviously suited him fine also.

And here we are over 10 years later and no SQL*Unloader has appeared. And, as far as I can see, there is no longer even a suggestion that “one day” one will exist. You can obviously get data out of Oracle using SQL*Plus, SQLcl or PL/SQL’s UTL_FILE. But you have to write your own code, and making it parallel or fast is not that simple. And there are some commercial tools out there to do it. But fundamentally we do not have a simple, robust & fast data unloader tool within the Oracle toolset.
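
To show what I mean by “write your own code”, this is the sort of thing you end up hand-rolling. A minimal sketch only, using the classic SQL*Plus spool trick – the GENE_SEQUENCE table and its columns are made up for illustration:

    -- unload_gene_sequence.sql : run from SQL*Plus
    -- (table and column names are hypothetical)
    set termout off
    set feedback off
    set heading off
    set pagesize 0
    set linesize 32767
    set trimspool on
    spool /tmp/gene_sequence.csv
    select seq_id||','||organism||','||seq_text
    from   gene_sequence;
    spool off
    exit

And that is the easy bit. It is single-threaded, so to get real speed you have to carve the table up yourself (say, one copy of the script per partition or ROWID range, each spooling to its own file) and stitch the output back together – exactly the plumbing a proper SQL*Unloader would do for you.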

But I remain mystified as to why Oracle Corp will not provide a tool to do this. I can see the argument that doing so would help you migrate your Oracle data set to a different platform and thus move away from Oracle, but I think that is a dumb argument. If you are willing to dump your Oracle environment for something else you are already not happy – and making life difficult is only going to make you less happy and almost certain never to come back! It’s incredibly inconvenient to extract all your data at present, but compared to the cost to your organisation of changing a central part of your infrastructure, it’s just a bloody annoyance. It’s like domestic service providers (telephone, internet, TV) that make it easy for you to sign up but a nightmare to leave – thus guaranteeing that once you leave (cos you will) you will certainly NOT be going back to them in a hurry!

So for the sake of what I see as a misplaced protectionist stance they close the door to rapidly extracting data from Oracle databases for processing in other ways. I’ve come across other situations like this, before and since, but the Human Genome issue threw it into sharp relief for me. The end result I’ve seen a few times (and effectively what we had at the Sanger) is that the data gets stored at least twice – once in Oracle and then somewhere else where it is easy to access. Now that’s a daft and bad place to be: multiple copies, and at least one of them lacking referential integrity. Thanks for forcing that one on us, Oracle.

Something that is changing is Cloud. Oracle wants us all to move our data and systems into the sky. Their whole marketing message at the moment is nothing but cloud-cloud-cloud and they give the impression that cloud is in fact everything, every solution. So how do you get 10TB of data into the cloud? It’s not so hard really. You can trickle it in over time – after all, networks are constantly getting faster – but for bulk data this is a poor solution. Or you can ship your data physically; you can’t beat the bandwidth of a transit van. I’ve heard Oracle people at OOW and other conferences saying this is easy-peasy to arrange (as it should be – I’ve done it for clients doing migrations a couple of times).
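
To put a rough number on the “trickle” option (back-of-envelope only, assuming a dedicated 100Mbit/s link and ignoring protocol overhead and retries):

    10TB ≈ 8 × 10^13 bits
    8 × 10^13 bits ÷ 10^8 bits/second = 8 × 10^5 seconds ≈ 9 days

Over a week of saturating a fast line for one load – before you even think about doing it regularly, or in the other direction.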

But how are you going to get it out again? How well do you think a company that has never provided a SQL*Unloader is going to help you get bulk data back out of your Oracle cloud-based databases into on-premises systems? And how well are they going to help you shift it from one of their systems to another cloud system – one they have not sold you? Sometimes people talk about business relationships being like a marriage. Well, once you have gone cloud, you might have ensured any divorce is going to be really messy!

Update – see the comment by Noons on the fact that VLDBs and Data Warehouses are often not as static as you think, so there can be a constant need to move a lot of data into and out of such very large data stores. Cloud is not currently going to support this well. And if you have a data life-cycle management policy, say archiving off data 3+ years old, where are you archiving it off to, and how will you do this once it is all in the cloud?
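
On-premises, that sort of age-based archiving is often done with partition exchange, which at least leaves the old data sitting in a plain table you can then move. A hedged sketch – the table SEQ_DATA, partition P2013 and archive table SEQ_DATA_ARCHIVE are all hypothetical:

    -- assumes SEQ_DATA is range-partitioned by load date, one partition
    -- per year, and SEQ_DATA_ARCHIVE is an empty table of matching structure
    alter table seq_data
      exchange partition p2013 with table seq_data_archive
      without validation;

The exchange itself is just a data-dictionary swap, so it is fast – but you still have to get the contents of SEQ_DATA_ARCHIVE out of the database and off to wherever the archive lives, which brings us straight back to the missing unloader.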

* Please note, this was not human DNA sequence information gathered for medical reasons! It was mostly sequence information for other organisms (chimps, flies, potatoes, rice, viruses, bacteria – all sorts) generated by the scientific and, in some cases, biopharma community. There was human DNA sequence in there, but by consent and anonymised – you can’t have a human genome sequence without involving some humans! We did not hold your Dodgy Uncle’s DNA test that was taken for that court case…