Expecting Sum-Of-Parts Performance From Shared Solid State Storage? I Didn’t Think So. Neither Should Exadata Customers. Here’s Why.

[Image: sum-of-parts.png]

Last month I had the privilege of delivering the keynote session at the quarterly gathering of the Northern California Oracle User Group. My session was a set of vignettes on a theme of modern storage advancements. I misjudged how much time I had for the session, so I skipped over a section about how we sometimes still expect system performance to add up to the sum of its parts. This blog post aims to dive into that topic.

To the best of my knowledge there is no marketing literature about the XtremIO Storage Array that suggests the array's performance is simply a product of the number of solid state disk (SSD) drives found in the device. Generally speaking, enterprise all-flash storage arrays are built to offer features and performance–otherwise they'd be more aptly named Just a Bunch of Flash (JBOF). The scope of this blog post is strictly enterprise storage.

Wild, And Crazy, Claims

Lately I’ve seen a particular slide–bearing Oracle’s logo and copyright notice–popping up to suggest that Exadata is vastly superior to EMC and Pure Storage arrays because of Exadata’s supposed unique ability to leverage aggregate flash bandwidth of all flash components in the Exadata X6 family. You might be able to guess by now that I aim to expose how invalid this claim is. To start things off I’ll show a screenshot of the slide as I’ve seen it. Throughout the post there will be references to materials I’m citing.

DISCLAIMER: The slide I am about to show was not obtained from oracle.com, and it therefore may not, in fact, represent the official position of Oracle on the matter. That said, the slide does bear Oracle's logo and copyright notice! So, then, the slide:

[Image: X6-sum-of-parts.png]

Figure 1

I’ll start by listing a few objections. My objections are based on science and fact, so objecting to specific content is certainly appropriate.

  1. The slide (Figure 1) suggests an EMC XtremIO 4 X-Brick array is limited to 60 megabytes per second per “flash drive.”
    1. Objection: An XtremIO 4 X-Brick array has 100 Solid State Disks (SSDs)–25 per X-Brick. I don’t know where the author got the data, but it is grossly mistaken. No, a 4 X-Brick array is not limited to 60 * 100 megabytes per second (6,000 MB/s). An XtremIO 4 X-Brick array is a 12 GB/s array: click here. In fact, even way back in 2014 I used Oracle Database 11g Real Application Clusters to scan at 10.5 GB/s with Parallel Query (click here). Remember, Parallel Query spends a non-trivial amount of IPC and work-brokering setup time at the beginning of a scan involving multiple Real Application Clusters nodes. That query startup time impacts total scan elapsed time, thus 10.5 GB/s reflects the average scan rate including this “dead air” query startup time. Everyone who uses the Parallel Query Option is familiar with this overhead.
  2. The slide (Figure 1) suggests that 60 MB/s is “spinning disk level throughput.”
    1. Objection: Any 15K RPM SAS (12Gb) or FC hard disk drive easily delivers sequential scan throughput of more than 200 MB/s.
  3. The slide (Figure 1) suggests XtremIO cannot scale out.
    1. Objection: XtremIO architecture is 100% scale-out, so this indictment is absurd. One can start with a single X-Brick and add up to 7 more. In the current generation, scaling out in this fashion adds 25 more SSDs, more storage controllers (CPU), and 4 more Fibre Channel ports per X-Brick.
  4. The slide (Figure 1) suggests “bottlenecks at server inputs” further retard throughput when using Fibre Channel.
    1. Objection: This is just silly. There are 4 x 8GFC host-side FC ports per XtremIO X-Brick. I routinely test Haswell-EP 2-socket hosts with 6 active 8GFC ports (3 cards) per host. Can a measly 2-socket host really drive 12 GB/s of Oracle scan bandwidth? Yes! No question. In fact, challenge me on that and I’ll show AWR proof of a single 2-socket host sustaining Oracle table scan bandwidth at 18 GB/s. No, actually, I won’t make anyone go to that much trouble. Instead, click the following link for AWR proof that a single host with two 6-core Haswell-EP processors (2s12c24t) can sustain Oracle Database 12c scan bandwidth of 18 GB/s: click here. I don’t say it frequently enough, but it’s true: you most likely do not know how powerful modern servers are!
  5. The slide (Figure 1) says Exadata achieves “full flash throughput.”
    1. Objection: I’m laughing, but that claim is, in fact, the perfect segue to the next section.
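The arithmetic behind the first objection is easy to check. Here is a minimal sketch using only the figures cited above: the slide's 60 MB/s-per-SSD claim and XtremIO's published 12 GB/s rating for a 4 X-Brick array.

```python
# The 60 MB/s-per-SSD number is the slide's claim; the 12 GB/s number is
# the published XtremIO 4 X-Brick specification cited above.
SSDS_PER_XBRICK = 25
XBRICKS = 4
CLAIMED_MBPS_PER_SSD = 60      # per the slide in Figure 1
PUBLISHED_ARRAY_GBPS = 12      # per the cited XtremIO 4 X-Brick specification

# What the slide's per-SSD figure would imply for the whole array
claimed_gbps = SSDS_PER_XBRICK * XBRICKS * CLAIMED_MBPS_PER_SSD / 1000

print(f"Slide's implied array throughput: {claimed_gbps:.1f} GB/s")        # 6.0 GB/s
print(f"Published array throughput: {PUBLISHED_ARRAY_GBPS} GB/s")
print(f"Understated by a factor of {PUBLISHED_ARRAY_GBPS / claimed_gbps:.1f}x")  # 2.0x
```

In other words, even if one (wrongly) treated the array as a sum of its parts at 60 MB/s per SSD, the slide's number is still off by half against the published specification.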

Full Flash Throughput

Scan Bandwidth

The slide in Figure 1 accurately states that the NVMe flash cards in the Exadata X6 model are rated at 5.5 GB/s. This can be seen in the F320 datasheet. Click the following link for a screenshot of the F320 datasheet: click here. So the question becomes: can Exadata really achieve full utilization of all of the NVMe flash cards configured in the Exadata X6? The answer is no, but sort of. Please allow me to explain.

The following graph (Figure 2) shows data cited in the Exadata datasheet and depicts the reality of how close a full-rack Exadata X6 comes to realizing full flash potential.

As we know, a full-rack Exadata has 14 storage servers. The High Capacity (HC) model has 4 NVMe cards per storage server purposed as a flash cache. The HC model also comes with 12 7,200 RPM hard drives per storage server as per the datasheet.

The following graph shows that, yes, Exadata X6 does indeed realize full flash potential when performing a fully-offloaded scan (Smart Scan). After all, 4 * 14 * 5.5 is 308 and the datasheet cites 301 GB/s scan performance for the HC model. This is fine and dandy, but it means you have to put up with 168 (12 * 14) howling 7,200 RPM hard disks if you are really intent on harnessing the magic power of full-flash potential!

Why the sarcasm? It’s simple, really: just take a look at the graph and notice that the all-flash EF model realizes just slightly more than 50% of its full (aggregate) flash performance potential. Indeed, the EF model has 14 * 8 * 5.5 == 616 GB/s of potential available–but not realizable.
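The sum-of-parts arithmetic above can be written out explicitly. A short sketch using only the card rating and counts from the datasheet discussion:

```python
# Card rating and counts as cited from the Exadata X6 datasheet discussion above.
CARD_GBPS = 5.5   # F320 NVMe flash card rating
SERVERS = 14      # storage servers in a full rack

hc_potential = SERVERS * 4 * CARD_GBPS   # HC: 4 NVMe cards per storage server
ef_potential = SERVERS * 8 * CARD_GBPS   # EF: 8 NVMe cards per storage server
hc_datasheet_scan = 301                  # GB/s, datasheet scan rate for HC

print(f"HC aggregate flash potential: {hc_potential:.0f} GB/s")         # 308 GB/s
print(f"EF aggregate flash potential: {ef_potential:.0f} GB/s")         # 616 GB/s
print(f"HC realized fraction: {hc_datasheet_scan / hc_potential:.0%}")  # 98%
# The all-flash EF model, per the graph, realizes only slightly more than
# half of its 616 GB/s aggregate potential.
```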

No, Exadata X6 does not–as the above slide (Figure 1) suggests–harness the full potential of flash. Well, not unless you’re willing to put up with 168 round, brown, spinning thingies in the configuration. Ironically, it’s the HDD-Flash hybrid HC model that enjoys the “full flash potential.” I doubt the presenter points this bit out when slinging the slide shown in Figure 1.

[Image: ScanBW.png]

Figure 2

IOPS

The slide in Figure 1 doesn’t actually suggest that Exadata X6 achieves full flash potential for IOPS, but since these people made me crack open the datasheets and use my brain for a moment or two, I took it upon myself to do the calculations. The following graph (Figure 3) shows the delta between full flash IOPS potential and delivered IOPS for the full-rack HC and EF Exadata X6 models, using data taken from the Exadata datasheet.

No…Exadata X6 doesn’t realize full flash potential in terms of IOPS either.

[Image: Exa-IOPS-2.png]

Figure 3
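The same sum-of-parts test works for IOPS exactly as it does for scan bandwidth. Here is a trivial helper of my own devising (the function name and interface are mine, not from any vendor datasheet) that makes the comparison mechanical for any metric, demonstrated with the HC scan figures already discussed above:

```python
def flash_utilization(achieved, num_cards, per_card_rating):
    """Fraction of aggregate ("sum of parts") flash potential actually realized.

    Works for any throughput metric (GB/s, IOPS) as long as 'achieved' and
    'per_card_rating' are expressed in the same unit.
    """
    return achieved / (num_cards * per_card_rating)

# Scan-bandwidth example using the full-rack HC figures discussed above:
# 56 cards (14 servers * 4), 5.5 GB/s per card, 301 GB/s datasheet scan rate.
print(f"{flash_utilization(301, 56, 5.5):.0%}")  # 98%
```

Plug in the datasheet IOPS ratings for the HC and EF models the same way and the resulting fractions are what Figure 3 depicts: well short of 100% for the all-flash configuration.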

References

Here is a link to the full slide deck containing the slide (Figure 1) I focused on in this post: http://konferenciak.advalorem.hu/uploads/files/INFR_Sarecz_Lajos.pdf.

Just in case that copy of the deck disappears, I pushed a copy up to the Wayback Machine: click here.

Summary

XtremIO Storage Array literature does not suggest that the performance characteristics of the array are a simple product of how many component SSDs the array is configured with. To the best of my knowledge neither does Pure Storage suggest such a thing.

Oracle shouldn’t either. I have now made that point crystal clear.
