Data lakes, like legacy storage arrays, are passive. They only hold data. Hadoop is an active reservoir, not a passive data lake. HDFS is a computational file system that can digest, filter and analyze any data in the reservoir, not just store it. HDFS is the only economically sustainable, computational file system in existence. Some […]
I was recently invited to speak about big data at the Rocky Mountain Oracle User's Group. I presentedto Oracle professionals who are faced with an onslaught of hype and mythology regarding Big Data in generaland Hadoop in particular. Most of the audience was familiar with the difficulty of attempting to engineer even Modest Data on […]
In the 1960s, Bank of America and IBM built one of the first credit card processing systems. Although those early mainframes processed just a fraction of the data compared to that of eBay or Amazon, the engineering was complex for the day. Once credit cards became popular, processing systems had to be built to handle […]
#555555;">Have you used Oracle’s SQL Tuning Pack? What were your experiences like?
At my last consulting gig 2 out 3 statements I “tuned” with the SQL tuning pack actually ran significantly more slowly after applying the profiles recommended. I thought it was just my bad karma and luck of the draw until recently. Recently I’ve started asking other DBAs what their experiences were and generally finding less than enthusiastic responses. In the classes on Oracle 10g and OEM, I’ve always cautioned that profiles should not be blindly trusted and give an example using a Swingbench query where the SQL runs more slowly after the profile.If applying the profile, I recommend to jump immediately to the SQL performance page and to verify the effect of the profile on the load of the query.
I hear this from time to time:
“Can’t I create a database copy in seconds with a file system snapshot?”
First let’s take a step back. There is a huge need to clone database quickly, easily and with as little resource as possible for development, QA, UAT, debugging, reporting and backup yet hardly any of the industry uses file system snapshots for cloning. Cloning with file system snapshots has been a possibility for almost 20 years. Why, if there is a solution to a huge problem, is no one using it? Because it requires specialized hardware, storage system experts coordinating with DBAs and system admins and a lot of scripting. (Unless using Delphix)
#555555;">Three challenges specifically stand out when considering Copy on Write filesystem snapshots described in the previous section:
#555555;">These challenges highlight a specific need: to create thin provision clones of a source database from multiple points of time at the same time without using any additional space consumption. This requirement is important, as it allows one base image to serve as the foundation for all subsequent clones and imposes no unplanned storage or refresh requirements on users of the target (cloned) systems.
#555555;">With EMC, thin cloning can only be achieved by using backup technology; in essence, the process has to be architected manually in order to support databases. How can the same goals be achieved but with database thin cloning specifically in mind?
#555555;">Copy on write is a storage or filesystem mechanism that allows storage or filesystems to create snapshots at specific points in time. Whereas Clonedb is a little known and rarely used option, storage technologies are widely known and used in the industry. These snapshots maintain an image of a stroage a specific point in time. If the active storage makes a change to a block, the original block will be read from disk in its original form and written to a save location. Once the block save is completed, the snapshot will be updated to point to the new block location. After the snapshot has been updated, the active storage datablock can be written out and overwrite the original version.
#555555;">A production database is full of data that makes sense for its purpose, whether the database is 10GB or 10TB.
#555555;">Now if you take that database and clone it for QA or development suddenly the data is cumbersome, unnecessary. Terabytes of disk are provisioned simply to hold a replica for the purpose of testing a small subset of it. An entire architecture with its myriad support structures, historical data, indexes, and large objects cloned and ready just to be partially used, trashed, and rebuilt. This is waste, both of time and storage resources. To the business, having a duplicate environment makes absolute sense; however, from an IT point of view the repeated duplication of storage space and the time drain it cause just makes no sense at all.
I. “The Database is hanging!” AKA “the application has problems, good grief”