Recently I had the pleasure of corresponding with Hans-Peter Sloot. After looking at my simple tool in this post to gather cell IO data from cellcli, he took it a several steps further and created a nice python version that goes to the next level to pull IO statistics from the cells.
This script provides breaks down the IO by “Small” and “Large” as is commonly done by the Enterprise manager. It also provides a summary by cell. Here is a sample output from this script.
Recently I have been asked to investigate the following error on an Exadata system.
ORA-64307: hybrid columnar compression is not supported for tablespaces on this storage type
Well, that’s simple I thought! Must be (d)NFS mounted storage, right? Everyone knows that you can have HCC on Exadata (and a few other storage products). So I looked at the problem and soon found out that the data files in question all resided on the cells. Here is the sequence of events:
When the Smart Flash Cache was introduced in Exadata, it was caching reads only. So there were only read “optimization” statistics like cell flash cache read hits and physical read requests/bytes optimized in V$SESSTAT and V$SYSSTAT (the former accounted for the read IO requests that got its data from the flash cache and the latter ones accounted the disk IOs avoided both thanks to the flash cache and storage indexes). So if you wanted to measure the benefit of flash cache only, you’d have to use the cell flash cache read hits metric.
This post also applies to non-Exadata systems as hard drives work the same way in other storage arrays too – just the commands you would use for extracting the disk-level metrics would be different.
I just noticed that one of our Exadatas had a disk put into “predictive failure” mode and thought to show how to measure why the disk is in that mode (as opposed to just replacing it without really understanding the issue ;-)
In the previous post about in-memory parallel execution I described in which cases the in-mem PX can kick in for your parallel queries.
A few years ago (around Oracle 18.104.22.168 and Exadata X2 release time) I was helping a customer with their migration to Exadata X2. Many of the queries ran way slower on Exadata compared to their old HP Superdome. The Exadata system was configured according to the Oracle’s “best practices”, that included setting the parallel_degree_policy = AUTO.
This post applies both to non-Exadata and Exadata systems.
This is the fourth post on a serie of postings on how to get measurements out of the cell server, which is the storage layer of the Oracle Exadata database machine. Up until now, I have looked at the measurement of the kind of IOs Exadata receives, the latencies of the IOs as as done by the cell server, and the mechanism Exadata uses to overcome overloaded CPUs on the cell layer.
Exadata is about doing IO. I think if there’s one thing people know about Exadata, that’s it. Exadata brings (part of the) processing potentially closer to the storage media, which will be rotating disks for most (Exadata) users, and optionally can be flash.
When you are administering an Exadata or more Exadata’s, you probably have multiple databases running on different database or “computing” nodes. In order to understand what kind of IO you are doing, you can look inside the statistics of your database, and look in the data dictionary what that instance or instances (in case of RAC) have been doing. When using Exadata there is a near 100% chance you are using either normal redundancy or high redundancy, of which most people know the impact of the “write amplification” of both normal and high redundancy of ASM (the write statistics in the Oracle data dictionary do not reflect the additional writes needed to satisfy normal (#IO times 2) or high (#IO times 3) redundancy). This means there might be difference in IOs between what you measure or think for your database is doing, and actually is done at the storage level.
Exadata gets its performance by letting the storage (the exadata storage server) participate in query processing, which means part of the processing is done as close as possible to where the data is stored. The participation of the storage server in query processing means that a storage grid can massively parallel (depending on the amount of storage servers participating) process a smart scan request.