Oakies Blog Aggregator

A Better Way to Discover Blogs and Get Inspired

WordPress is home to millions of sites across countless topics. It’s a big and beautiful world, and we want to make it easier for you to discover new voices. Over the past few months, the mobile team has been working hard to improve the experience of your WordPress Reader on the mobile app. In particular, we’ve been exploring different ways for you to discover new blogs and find inspiration.

The new Discover tab on your Reader will recommend blogs and posts based on topics you follow. These changes give you more control over what you see, making it easier for you to find interesting voices, while also giving you and your site the opportunity to find a bigger audience. 

[Image: https://en-blog.files.wordpress.com/2020/09/top-shot-4.png]

How it works

Add appropriate tags and categories when drafting your blog posts — this helps us recommend your posts to the right audience. 

The topics you now see in your improved Reader are a combination of tags and categories. If you want to find interesting blogs, follow topics you care about. The Discover tab will then show you recommended blogs and posts based on those topics.

[Image: https://en-blog.files.wordpress.com/2020/09/discover-feed-2.png]

Each post on the Discover tab has a list of topics on top. If you want to go deeper into a topic, tap on it to see a feed of blog posts from that specific topic.

[Image: https://en-blog.files.wordpress.com/2020/09/topic-chips.png]

If you’d like to see more posts from a particular topic on your Discover feed, tap the Follow button from that topic feed.

[Image: https://en-blog.files.wordpress.com/2020/09/topic-feed-1.png]

Soon we’ll be rolling out improvements to posts on the Reader as well. To give blog posts more room to shine, the featured image will be more prominent. 

[Image: https://en-blog.files.wordpress.com/2020/09/post-detail-view.png]

If you’d like to try the new Discover tab, make sure you update your WordPress app to the latest version. If you don’t have the app yet, you can download it for free on both Android and iOS. We’d love to hear your thoughts on the new experience. For specific feedback on the updates, reach out to us from within the app: go to My Site, tap your photo on the top right, tap Help & Support, and then select Contact Support.

Toward zero: Reducing and offsetting our data center power emissions

Following the massive Australian bushfires earlier this year, I was motivated to act within my role as a data scientist at Automattic to help fight anthropogenic climate change. Together with colleagues from across the company, we formed an employee resource group focused on sustainability. We are pleased to announce that as a result of our efforts, Automattic now offsets data center power emissions produced from non-renewable sources. This means that the servers running WordPress.com, WordPress VIP, Tumblr, and other Automattic services contribute net zero carbon emissions to our shared atmosphere.

Measuring and offsetting emissions is not a trivial task. In the interest of transparency, this post provides more details on the decisions we made and answers questions that readers may have on the topic. We hope that this will benefit other organizations that are in a similar position to Automattic. We welcome feedback and are happy to answer any other questions you may have.

The decision: For 2020, we decided to purchase offsets from Simoshi via the United Nations’ offset platform. These offsets are produced by improving the efficiency of cooking stoves in Ugandan schools. Emission reductions are achieved by using less wood to cook the same amount of food. This project also has third-party certification from the Gold Standard, and it contributes to nine of the United Nations’ Sustainable Development Goals, including No Poverty, Quality Education, and Gender Equality. See the project page and the following video for more details:


Why did we choose this project? Anyone who’s tried to purchase offsets knows that it can be complicated. We don’t have in-house sustainability experts, so we relied on publicly-available information to better understand the topic. Resources we found useful include: Carbon Offset Guide, atmosfair, and Greenhouse Gas Protocol. As the price of offsets varies widely, we chose to follow Microsoft’s approach and set our own internal price of $15 per metric tonne of CO2e. Simoshi’s project stood out because it matches our budget, has a clear emission reduction mechanism, is certified by the United Nations and the Gold Standard, and has many benefits beyond emission reductions, which align with our company’s values.

What emissions do our offsets cover? Automattic has servers in many data centers around the world, operated by different providers. As we don’t control the data center providers’ choice of energy utilities, we treat the emissions from data center power use as being in Scope 3, i.e., as indirect emissions from our value chain. For each data center, we used publicly available information from our providers to determine whether they’re powered by renewable resources. This led us to conclude that approximately half of our data center energy use is covered by renewables paid for by the data center providers. For the other data centers, we used our servers’ power consumption logs to get the estimated power used over a period of one year. We then multiplied these figures by 1.5 to obtain a conservative estimate that accounts for power usage effectiveness. Using a variety of resources on grid carbon intensity, such as those published by the U.S. Environmental Protection Agency and the European Environment Agency, we converted these power use estimates to emission estimates. This gave us an overall figure of 1,850 tonnes of CO2e for 2020.
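To make the arithmetic concrete, the per-data-center estimate has the following shape; the numbers here are illustrative placeholders, not our actual measurements:

E_{\mathrm{DC}} = P_{\mathrm{year}} \times 1.5 \times I_{\mathrm{grid}}, \quad \text{e.g. } 2{,}000{,}000\ \mathrm{kWh} \times 1.5 \times 0.4\ \mathrm{kg\,CO_2e/kWh} = 1{,}200\ \mathrm{t\,CO_2e}

where P_year is the annual server power taken from our consumption logs and I_grid is the grid carbon intensity; summing these estimates over the non-renewable data centers is what gave the 1,850-tonne total above.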

Why offset rather than reduce emissions? We are aware that offsetting is an imperfect solution. Ideally, we would source all our energy from renewables. In a perfect world, it wouldn’t even be possible to buy energy generated by burning fossil fuels. However, given the current reality, setting our own price on carbon and offsetting non-renewable data center emissions is a good temporary solution. This also gives us a financial incentive to work with providers and shift toward greener data centers. In fact, this sort of shift happened last year when we changed our main European data center to a provider that operates on 100% renewables. We hope to continue making such changes in coming years, i.e., reducing emissions where feasible and offsetting the rest.

Why aren’t we doing more? From watching the climate action space, it seems like every announcement is greeted with demands to do more. This is a positive thing — society should hold companies accountable for their actions. As a company, we believe that we can always do better: The opening sentence of our creed is “I will never stop learning”, and we know that we are “in a marathon, not a sprint.” It is our hope that as we learn more about the space and our impact, we will be able to take stronger climate action.

What are we planning to do next? Automattic is a fully-distributed company. This means that our employees aren’t required to commute to central offices, which leads to significant savings in carbon emissions. However, we historically relied on flying to in-person meetups a few times a year to foster collaboration and bonding. Since March 2020, all business travel has been suspended, and it is still unclear what travel will look like in the post-pandemic world. In any case, as an employee resource group, we are planning on quantifying our travel emissions, and advocating for reducing avoidable trips and offsetting emissions from trips that are deemed essential. One change that is already taking place is aligning more teams around fewer time zones. In addition to helping with synchronous collaboration and decreasing isolation, this will reduce the distance traveled per person once meetups resume. We will share more on other actions we take in the future — watch this space! We also welcome feedback from our customers, so please comment on this post or contact us to share your thoughts.

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II (Sleepwalk)

As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing process thinks will happen with newly created Automatic Indexes and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other […]

Video : Multitenant : Dynamic CPU Scaling – Resource Manager Control of CPU using CPU_COUNT and CPU_MIN_COUNT

In today’s video we’ll discuss how Resource Manager can control CPU usage in PDBs using the CPU_COUNT and CPU_MIN_COUNT parameters. Oracle calls this Dynamic CPU Scaling. It can be useful to stop a small number of PDBs from using all the CPU resources assigned to the instance.
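If you want to play with this yourself, the basic shape is something like the following (a rough sketch only; the PDB name and values are examples of mine, not taken from the video or article):

-- connect to the PDB whose CPU usage you want to control (hypothetical PDB name)
ALTER SESSION SET CONTAINER = pdb1;

-- hard limit: this PDB can never use more than 4 CPUs
ALTER SYSTEM SET cpu_count = 4 SCOPE = BOTH;

-- soft guarantee: Resource Manager keeps at least 2 CPUs for this PDB
-- when the instance is under CPU pressure
ALTER SYSTEM SET cpu_min_count = 2 SCOPE = BOTH;

Between CPU_MIN_COUNT and CPU_COUNT, the PDB can then scale up and down dynamically depending on the overall load on the instance.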

This video is based on the following article.

Most of this information was in my instance caging article, but I’ve moved it into this separate article now.

The star of today’s video is Bailey. He has a human called Connor McDonald. I suspect Bailey got his human to voice the video…

Cheers

Tim…

The post Video : Multitenant : Dynamic CPU Scaling - Resource Manager Control of CPU using CPU_COUNT and CPU_MIN_COUNT first appeared on The ORACLE-BASE Blog.



Amazon Aurora Serverless (PostgreSQL compatibility)

By Franck Pachot

I’ve written a blog post about serverless databases and here is an example of Amazon RDS Aurora PostgreSQL in serverless mode:
[Screenshot: CloudWatch capacity metrics]

When I created the instance (15:55 – CloudWatch is GMT+2 but the event log is UTC), it started with 0 capacity units (18:03), which means that it was paused (you pay for storage only). Then I connected and the instance resumed (18:19) to its maximum capacity units (4 here), and scaled down to the minimum capacity units (2 here) 5 minutes later (18:27). I ran some short activity (pgbench) and the instance scaled up to the maximum capacity units (18:45) and went down to zero (paused) after 5 minutes (18:50). Finally, I started some heavy load and the instance resumed to maximum capacity (18:57).

Here is how I’ve created this database:

  • Service: Amazon RDS
  • Engine: Amazon Aurora
  • Edition: Amazon Aurora with PostgreSQL compatibility
  • Capacity type: Serverless
  • Version: Aurora PostgreSQL (compatible with PostgreSQL 10.7)

You see the first limit here: PostgreSQL 10.7 is the only version available for serverless.
As for connectivity, you cannot have public access: the endpoint, which is actually a proxy, is in the VPC.

Then you define the minimum and maximum instance size the auto-scaling can play with. The values are in ACU – Aurora Capacity Units – for which the console displays the memory only. You can go from 2 ACU (4GB RAM) to 384 ACU (768GB RAM) for the PostgreSQL flavor. It goes from 1 ACU (2GB) to 64 ACU (122GB) for the MySQL flavor.

You can guess the instance behind it from the equivalent “provisioned” instance size: 4GB RAM is probably a db.t3.medium with 2 burstable vCPUs, and 768GB RAM a db.r5.24xlarge with 96 vCPUs.

Stopping the instance completely is an option here: “Pause compute capacity after consecutive minutes of inactivity”, where you define the inactive time that triggers the pause, with a minimum of 5 minutes (the GUI allows less, but that doesn’t work).

Here I’ve waited for 5 minutes before running the following:


# PGPASSWORD=postgres pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -i -s 8 postgres | ts


The “ts” pipe adds a timestamp to each output line:

Sep 20 16:56:31 + PGPASSWORD=postgres
Sep 20 16:56:31 + pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -i -s 8 postgres
Sep 20 16:56:59 creating tables...
Sep 20 16:56:59 10000 tuples done.
Sep 20 16:56:59 20000 tuples done.
Sep 20 16:56:59 30000 tuples done.
Sep 20 16:56:59 40000 tuples done.
Sep 20 16:56:59 50000 tuples done.
Sep 20 16:56:59 60000 tuples done.
Sep 20 16:56:59 70000 tuples done.
...
Sep 20 16:57:02 790000 tuples done.
Sep 20 16:57:02 800000 tuples done.
Sep 20 16:57:02 set primary key...
Sep 20 16:57:03 vacuum...done.

You can see the latency of starting a stopped instance here: 28 seconds from the connection to the endpoint (a proxy that is still listening when the database is paused) to the first command processed. Of course, your application and network timeouts must be able to wait if you are using auto-scaling with the pause mode (0 ACU). But the advantage is that you don’t pay for any compute instance while it is paused. That’s perfect for databases that are not used often and where you can accept a one-minute latency for the first connection.

The other important thing to know with serverless Aurora: you should be ready to accept some level of unpredictable performance. The auto-scaling algorithm is based on CPU usage, memory, and the number of connections, and decides to serve your database from compute instances of different sizes.

I ran the following, which initializes pgbench and then runs it for one minute at a time, with a one-minute pause in between, increasing the number of concurrent connections each time:


{
set -x
date
# initialize the pgbench schema (scale factor 8)
PGPASSWORD=postgres pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -i -s 8 postgres
# 10 one-minute runs, increasing the number of clients, with a one-minute pause before each
for i in 1 2 3 4 5 6 7 8 9 10
do
 sleep 60
 PGPASSWORD=postgres pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -c $i -T 60 postgres
done
} 2>&1 | ts


This is what you see in the CloudWatch ramp-up around 19:00 (which is 17:00 UTC):
[Screenshot: CloudWatch metrics during the ramp-up]
As you see, there was a point where the capacity was not at its maximum despite the homogeneous pattern of one minute of inactivity between each one-minute run. And that shows up as a drop in the number of connections. Here are my pgbench results around that time:


Sep 20 17:09:04 + sleep 60
Sep 20 17:10:04 + PGPASSWORD=postgres
Sep 20 17:10:04 + pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -c 7 -T 60 postgres
Sep 20 17:10:04 starting vacuum...end.
Sep 20 17:11:04 transaction type: TPC-B (sort of)
Sep 20 17:11:04 scaling factor: 8
Sep 20 17:11:04 query mode: simple
Sep 20 17:11:04 number of clients: 7
Sep 20 17:11:04 number of threads: 1
Sep 20 17:11:04 duration: 60 s
Sep 20 17:11:04 number of transactions actually processed: 32205
Sep 20 17:11:04 tps = 536.501877 (including connections establishing)
Sep 20 17:11:04 tps = 537.634705 (excluding connections establishing)
Sep 20 17:11:04 + for i in 1 2 3 4 5 6 7 8 9 10
Sep 20 17:11:04 + sleep 60
Sep 20 17:12:04 + PGPASSWORD=postgres
Sep 20 17:12:04 + pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -c 8 -T 60 postgres
Sep 20 17:12:05 starting vacuum...end.
Sep 20 17:13:05 transaction type: TPC-B (sort of)
Sep 20 17:13:05 scaling factor: 8
Sep 20 17:13:05 query mode: simple
Sep 20 17:13:05 number of clients: 8
Sep 20 17:13:05 number of threads: 1
Sep 20 17:13:05 duration: 60 s
Sep 20 17:13:05 number of transactions actually processed: 22349
Sep 20 17:13:05 tps = 372.281817 (including connections establishing)
Sep 20 17:13:05 tps = 373.361168 (excluding connections establishing)
Sep 20 17:13:05 + for i in 1 2 3 4 5 6 7 8 9 10
Sep 20 17:13:05 + sleep 60
Sep 20 17:14:05 + PGPASSWORD=postgres
Sep 20 17:14:05 + pgbench -h database-1.cluster-czdvjquf08hs.us-west-2.rds.amazonaws.com -p 5432 -U postgres -c 9 -T 60 postgres
Sep 20 17:14:05 starting vacuum...end.
Sep 20 17:15:05 transaction type: TPC-B (sort of)
Sep 20 17:15:05 scaling factor: 8
Sep 20 17:15:05 query mode: simple
Sep 20 17:15:05 number of clients: 9
Sep 20 17:15:05 number of threads: 1
Sep 20 17:15:05 duration: 60 s
Sep 20 17:15:05 number of transactions actually processed: 24391
Sep 20 17:15:05 tps = 406.361680 (including connections establishing)
Sep 20 17:15:05 tps = 407.628903 (excluding connections establishing)
Sep 20 17:15:05 + for i in 1 2 3 4 5 6 7 8 9 10
Sep 20 17:15:05 + sleep 60


Rather than increasing the number of transactions per second, as you would expect with more clients, it decreased for this run.
Here is a zoom on the capacity unit auto-scaling:
[Screenshot: zoom on the capacity unit auto-scaling]

How does it work? The endpoint you connect to, on port 5432, is actually a proxy which can redirect you to the read-write instance. If no instance is up (the database is paused) it will start one. Aurora storage is shared, even across AZs. If the load changes, it may switch to another instance with a different compute size. In Aurora Serverless, scaling means running on a compute instance with different vCPU and RAM.

In my blog post about serverless databases I compared Amazon with the Oracle cloud. This Aurora Serverless runs very differently from the Oracle Autonomous Database, for which the compute shape is a soft limit only: the database runs as a container (Pluggable Database) within a shared instance (Container Database). And it goes even further: auto-scaling pushes this soft limit higher, to the maximum capacity, and the monitoring feature measures the actual usage. You are billed for the minimum capacity plus this additional measured usage. Amazon Aurora Serverless scales on hard limits only (the instance size) but has the advantage of stopping all compute resources completely when you choose the “pause” option.

Of course, limitations, versions, and features will probably evolve. But the idea is simple. As with many services, when you want predictable performance you choose “provisioned” services, where the resources are always up and available. The non-serverless Aurora is the “provisioned” one here. When you want to reduce the cost, you must accept that the resources are used by others when you don’t need them. It is cheaper but may introduce some latency when you need them again. The cloud is elastic: you choose.

The article Amazon Aurora Serverless (PostgreSQL compatibility) appeared first on Blog dbi services.

Do you know where your installation media is?

This was inspired by a Twitter comment and subsequent DMs, but I’m not going to name names. You know who you are…

Amazon or AWS services?

By Franck Pachot

When I’m writing about a product I like to be precise about the name, the upper and lower case, and even more: do you know that I was taking special care to write Oracle 12cR2 with the italic “c” before the non-italic came with 18c? And that’s also the reason I’m not writing a lot about VMware, as it takes me 5 minutes to get the uppercase right…

What is Object Storage?

By Franck Pachot

I’ve always been working with databases. Before the cloud era, the most abstract term was “data”. A variable in memory is data. A file is data. A block of disk contains data. We often created a ‘/data’ directory to put everything that is not binaries or configuration files. I’ll always remember when I did that while working in Dakar. My colleagues were laughing for minutes – my Senegalese followers will understand why. “Data”, like “information”, is abstract (which is a reason why it is not plural). It makes sense only when you associate it with a container: data bank, database, datafile, datastore… In database infrastructure, we store data in files or block storage where we read and write by pages: read one or many contiguous blocks, bring them into memory, update them in memory, write those blocks back. And to export and import data outside of the database, we store it in files, within a filesystem that can be local or remote (like NFS). But it is basically the same: you open a file, you seek to the right offset, you read or write, you synchronize, you keep the file open until you don’t need to work on it anymore, and then you close it. This API is so convenient that in Linux, finally, everything is a file: you write to the network with file descriptors, you access the block devices with /dev ones, you output to the screen with stderr and stdout,…

And then came the cloud which maps most of the artifacts we have in the data center: virtual network, virtual machines, block storage, network file systems,… but then came a new invention: the Object Storage. What is an object? Well.. it is data… Then, what is new? It has some metadata associated with it… But that’s what a filesystem provides then? Well.. here we have no hierarchy, no directories, but more metadata, like tags. What? Everything in the cloud without folders and directories? No, we have buckets… But without hierarchy, how do you avoid name collision? No problem, each object has a UUID identifier.

My hope is that you find this blog post when you “google” to learn what Object Storage is. It is quite common that “experts” answer a quick “did you google it?” in forums without realizing that, actually, what makes you an expert is not what you know but how accurate you can be when “googling” for it.

If you already know what an Object Storage is, you will probably find more information in each cloud provider’s documentation. But what if you don’t know at all? I’m writing this blog post because a friend without cloud knowledge was trying to understand what Object Storage is.

He is an Oracle DBA and came to this page about Oracle Cloud Object Storage:
https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm

Overview of Object Storage
Oracle Cloud Infrastructure offers two distinct storage class tiers to address the need for both performant, frequently accessed “hot” storage, and less frequently accessed “cold” storage. Storage tiers help you maximize performance where appropriate and minimize costs where possible.
Use Object Storage for data to which you need fast, immediate, and frequent access. Data accessibility and performance justifies a higher price to store data in the Object Storage tier.

And you know what a DBA thinks when he reads “performant, frequently accessed hot storage” and “fast, immediate, and frequent access”? This is for a database. In the whole page, there are some use cases described. But no mention of “read” and “write”. No mention of “random” or “sequential” access. Nothing that explicitly tells you that the Object Store is not where you want to put your database datafiles. It mentions some use cases that seem very related to databases (Big Data Support, Backup, Repository, Large Datasets,…) and it mentions features that are very related to databases (consistency, durability, metadata, encryption).

Basically, if you already know what it is, you have a nice description. But if you are new to the cloud and try to match this service with something you know, then you are completely misled. Especially if you didn’t read about Block Storage before.

Are all cloud providers making the same mistake? Here is the AWS definition:
https://aws.amazon.com/what-is-cloud-object-storage/
Cloud object storage makes it possible to store practically limitless amounts of data in its native format

The Amazon Simple Storage Service (Amazon S3) page also has no mention of read/write workloads or random/sequential access. The features are Durability, Availability, & Scalability. Nothing tells you that it is not designed for database files.

Google Cloud may be clearer:
https://cloud.google.com/storage
Object storage for companies of all sizes. Store any amount of data. Retrieve it as often as you’d like. Good for “hot” data that’s accessed frequently, including websites, streaming videos, and mobile apps.
The “Store” and “Retrieve it as often” wording gives the idea of write once, read many, as a whole. This is not for databases. But again, “accessed frequently” should mention a “read” workload.

Microsoft is known to listen and talk to their users. Look at the Azure name and definition for this:
https://azure.microsoft.com/en-us/services/storage/
Blob storage: Massively scalable and secure object storage for cloud-native workloads, archives, data lakes, high-performance computing, and machine learning

Yes, that rings a bell or two for a database person. BLOB is exactly what we use in databases to store what a cloud practitioner stores in an Object Storage. Here is the “O” for “Object”, the same as in “Binary Large Object”. You do not store database files in an Object Storage. Databases need block volumes, and you have block storage services for that. You don’t store a hierarchy of folders and files in an Object Storage. File servers need protocols providing shared access and a structure of filesystem trees. You store everything else in an Object Storage. Think of it like writing to tape but reading as if the tape were transformed into SSD. In this store you put files and can get them efficiently. I’m talking about “store”, “put” and “get” like in a document database. But documents can be terabytes. And you can read those files with a database, as if they were BLOBs, like Oracle Autonomous Data Warehouse reading ORC, Parquet, or Avro. Or Amazon Athena running SQL queries on S3 files.
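For example, this is roughly what querying files sitting in an Object Storage looks like with Amazon Athena (the bucket, table, and columns below are made up for illustration):

-- declare the S3 location as an external table: no data is loaded, the objects stay in S3
CREATE EXTERNAL TABLE sales (
  sale_id   bigint,
  amount    double,
  sale_date date
)
STORED AS PARQUET
LOCATION 's3://my-example-bucket/sales/';

-- then read the objects with plain SQL
SELECT sale_date, sum(amount) FROM sales GROUP BY sale_date;

The point is the access pattern: you put whole objects and read them as a whole (or by large scans), you don’t update blocks in place like a database datafile.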

I hope it is now clearer what an Object Storage is, especially if you come from databases. Also remember that what is easy to google for you may be impossible to find for someone else. You need the concepts, and Cloud Practitioner certifications are really good for that.

The article What is Object Storage? appeared first on Blog dbi services.

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat)

Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality […]

Amazon DynamoDB: a r(el)ational Glossary

By Franck Pachot

.
There are many NoSQL databases. And, because SQL is an ISO standard, “No SQL” also means “No Standard”. Many have a similar API and similar objects, but with completely different names. Today, NoSQL databases are used as an additional datastore for some well-defined use cases for which a hashed key-value store fits better than a relational table. And it quickly became “Not Only SQL”, as it is complementary to RDBMS databases using SQL. But at the origin, the idea was to replace RDBMS databases, rejecting the SQL API and inventing a “No SQL” data store. When you want to replace something rather than propose something new, you often adopt the same language to make it look similar. And this is why, in my opinion, we find some relational database terms like “Table” and “Index”. But they have a different meaning. Here is a dictionary where I try to explain the DynamoDB artifacts and differentiate them from their relational and SQL meaning.

Attribute

This is taken from the relational semantic. An “Attribute” (a “Column” in SQL) is an element of a “Relation” (a “Table” in SQL) for which a “Tuple” (a “Row” in SQL) can have a value. Basically, when you store a list of your friends, they have a first name, a last name, a birthdate,… and many other properties. “First Name” and “Last Name” are attributes of the relation “Friend”. The combination of those properties is a fact, like a sentence saying “I have a friend whose first name is Donald, whose last name is Chamberlin, and who was born on December 21, 1944”. In DynamoDB an attribute is a JSON member. The previous tuple could be written as: {“first name”:”Donald”, “last name”: “Chamberlin”, “birthdate”: “1944-12-21”}.

But there are many differences. A relational attribute has a “Domain” (a “Data Type” in SQL) which defines the set of possible values. A “First Name” or “Last Name” is a combination of characters within a character set, with a specific maximum length (because the application that manipulates it has memory buffers, screen width, paper length,… physical limits). A “Birthdate” is a day within a specific calendar,… They are constrained so that the developer who reads an attribute knows that she will not encounter any edge case at runtime. If the domain says that the first name holds at most 30 characters from [A-Za-z-‘], then she knows that she will not overflow a buffer when she allocates 30 bytes (or 60 in UTF-16, or 120 in UTF-8), and that she doesn’t have to code additional assertions to be sure there are no unexpected characters there. In JSON you can encounter any character string in a value. And DynamoDB attributes have only a few data type constraints: Null, Number, String, Boolean, Binary. And they are defined per attribute value, as there is no schema. The format of a date is just a character string with a convention. Nothing will tell you that “1752-09-13” is not a valid date.

If you come from Oracle, you can see a DynamoDB attribute like the ANYDATA data type:

DEMO@atp> create table demo (att anydata);
Table DEMO created.
DEMO@atp> insert into demo values (anydata.convertNumber(42));
1 row inserted.
DEMO@atp> insert into demo values (anydata.convertVarchar2(q'{Don't panic}'));
1 row inserted.
DEMO@atp> select * from demo;
                                                                            ATT
_______________________________________________________________________________
OPAQUE(ANYDATA TypeCode: "TYPECODE_NUMBER" - ANYDATA Value: "42")
OPAQUE(ANYDATA TypeCode: "TYPECODE_VARCHAR2" - ANYDATA Value: "Don't panic")

Another difference is that a relational attribute should be scalar (hold only one value). DynamoDB can store a whole hierarchy in a JSON attribute. The 1st Normal Form says that the value must be atomic, even if Chris Date accepts nested tables in 1NF. And SQL allows this. Actually, it is not NoSQL vs. SQL here. Usually, in SQL you store hierarchies in another table, but you can store them as a hierarchy with nested tables, XML, or JSON. Usually, in a NoSQL document database you store a whole hierarchy as one value, but the “single-table” design splits it into multiple items of a collection.
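As an illustration of that nuance, a SQL table can keep a whole hierarchy in one JSON column rather than in a child table (a minimal Oracle sketch, with a hypothetical table and columns):

-- the nested data (phone numbers here) stays inside one JSON document per row
CREATE TABLE friend_doc (
  id   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  doc  CLOB CHECK (doc IS JSON)
);

INSERT INTO friend_doc (doc) VALUES (
 '{"first name":"Donald","last name":"Chamberlin","birthdate":"1944-12-21","phones":[{"type":"home","number":"555-0101"}]}'
);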

If you compare with MongoDB, a DynamoDB attribute is similar to a MongoDB field.

Item

Even if an Item in DynamoDB looks like a relational “Tuple” (or “Row” in SQL), there’s no ambiguity: we are in a key-value store and the item is the value. This would be stored in a SQL BLOB (Binary Large Object) in a relational database. As JSON, it could be the OSON datatype in Oracle or JSONB in PostgreSQL. It is a column, not a row.

The equivalent of a DynamoDB item would be a BLOB storing JSON in a table with a key and this BLOB only. This is, for example, how SODA, the NoSQL API for Oracle RDBMS, stores a collection:


DEMO@atp1_tp> soda create MyDocumentStore
Successfully created collection: MyDocumentStore

DEMO@atp1_tp> ddl "MyDocumentStore"

  CREATE TABLE "DEMO"."MyDocumentStore"
   (    "ID" VARCHAR2(255) COLLATE "USING_NLS_COMP" NOT NULL ENABLE,
        "CREATED_ON" TIMESTAMP (6) DEFAULT sys_extract_utc(SYSTIMESTAMP) NOT NULL ENABLE,
        "LAST_MODIFIED" TIMESTAMP (6) DEFAULT sys_extract_utc(SYSTIMESTAMP) NOT NULL ENABLE,
        "VERSION" VARCHAR2(255) COLLATE "USING_NLS_COMP" NOT NULL ENABLE,
        "JSON_DOCUMENT" BLOB,
         CHECK ("JSON_DOCUMENT" is json format oson (size limit 32m)) ENABLE,
         PRIMARY KEY ("ID") 
   ) ;

Here JSON_DOCUMENT is in format OSON, the Oracle binary implementation for JSON, stored in a BLOB which can be any size. The “soda” and “ddl” are standard SQLcl commands.

If you compare with MongoDB, a DynamoDB item is similar to a MongoDB document, except that it is a JSON string value whereas MongoDB can store larger documents (up to 16MB) in a binary format (BSON). But we will see that the idea is to split a document into multiple items in DynamoDB, with keys starting with the same prefix so they are hashed and range partitioned together.

Collection

This is a NoSQL term. A “Collection” is a set of “Items”. But in DynamoDB this has a specific meaning, and the SQL equivalent would be a table “Partition”. When you HASH partition an RDBMS table, you do the same as a DynamoDB collection: you apply a hash function on the partition key value, which determines the physical segment of the table where you want to store it. This is how you scale any database, NoSQL or RDBMS: partition by hash and distribute those partitions over the storage, processes, and maybe nodes.
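To make the analogy concrete, here is what hash partitioning looks like in Oracle SQL (a made-up orders table, used only for illustration):

-- rows are spread over 8 physical segments by hashing the partition key,
-- much like a DynamoDB table hashes the partition key to distribute item collections
CREATE TABLE orders (
  customer_id  NUMBER,
  order_id     NUMBER,
  order_date   DATE,
  status       VARCHAR2(20),
  payload      CLOB,
  CONSTRAINT orders_pk PRIMARY KEY (customer_id, order_id)
)
PARTITION BY HASH (customer_id) PARTITIONS 8;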

A DynamoDB collection is not the same as a MongoDB collection. It is more like a chunk. What is called a collection in MongoDB is called a table in DynamoDB.

Table

This is where using the same name in DynamoDB as in SQL databases can mislead you into a bad data model. A “Relation” (“Table” in SQL) is a set of tuples (“rows”) that belong to the same business entity. In DynamoDB you can store anything in the same table, as it is just a key-value store. For example, you store the customer information with their orders within the same table and within the same collection. And that’s the goal: the data model is optimized for one use case, and if this use case is to display one customer’s orders, then you want to retrieve all of that with one ‘get’. A SQL table has the opposite goal: centralize the data for many use cases. You have one table with your customers. And one table with your orders. The CRM can query the customers only. The Delivery team will query only the orders. Marketing wants to get some customer information for specific orders. Billing wants to get all orders for a customer…

Again, it is not NoSQL vs. SQL, and you can have some nuances of both. In an RDBMS, when you want to pre-build a hierarchy to optimize a specific use case, you get the RDBMS to transparently maintain a redundant physical store, like an index or a materialized view. And in DynamoDB the “single-table” design splits a document into multiple items to get it accessed from multiple use cases.

Actually, a Table in DynamoDB could be seen as a Materialized View in SQL rather than as a table. It is pre-built by the application and not by the RDBMS, but it is like a view (purpose-built for specific queries) and materialized (the physical colocation of data is there by design). The worst mistake a developer can make is to consider it a SQL table and store related items in multiple DynamoDB tables. This would be bad for performance and cost. What you store in a DynamoDB table is probably at the same scope as what you store in a SQL schema for a small application.
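A minimal sketch of that analogy in Oracle SQL (hypothetical customers and orders tables; refresh options omitted to keep it short):

-- a purpose-built, physically stored projection of the normalized tables,
-- comparable to how a single-table design pre-joins data for one access pattern
CREATE MATERIALIZED VIEW customer_orders_mv AS
SELECT c.customer_id, c.name, o.order_id, o.order_date, o.status
FROM   customers c
JOIN   orders    o ON o.customer_id = c.customer_id;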

Index

An index in an RDBMS is a physical structure that provides fast access to a small number of rows, because it is ordered on a value. That’s the same idea in DynamoDB: access by another attribute than the key. Another similarity is that the index is transparently maintained. The developer declares the index and doesn’t have to take care of the maintenance in his code. Of course, in NoSQL the maintenance of the index can be eventually consistent, but it is still automatically maintained. However, there is a huge difference for your coding effort. In an RDBMS you never query the index. The developer doesn’t have to know the physical design. An index is a physical structure created for performance reasons. SQL provides a logical view of the database: you query the table and the RDBMS optimizer decides the most efficient way to get the rows required for the result. In DynamoDB you see the indexes as tables: you need to “get” on them explicitly if you want to use them.

In NoSQL, you have no optimizer. You do it in the code. If you add a new index for fast access, you need to change the code to use it. In an RDBMS you have the agility to add an index (even online, without application interruption, in some RDBMS) and queries will use it when it is appropriate. It is all about the choice between a simple API, with calls to embed in your procedural code, and calling a database service that does the procedural access for you.

In DynamoDB you have local indexes and global indexes. That’s the same idea as in an RDBMS, where an index can be local, partitioned in the same way as the table, or global, partitioned on another value. And you find the same optimization ideas. I have a few blog posts related to covering indexes in DynamoDB.
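In Oracle SQL, the local/global distinction looks like this on a hash-partitioned table such as the made-up orders example above:

-- local index: equipartitioned with the table, like a DynamoDB Local Secondary Index
CREATE INDEX orders_date_li ON orders (customer_id, order_date) LOCAL;

-- global index: partitioned on its own key, independently of the table partitioning,
-- like a DynamoDB Global Secondary Index
CREATE INDEX orders_status_gi ON orders (status)
  GLOBAL PARTITION BY HASH (status) PARTITIONS 8;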

Primary Key

In DynamoDB as well as in SQL, a Primary Key is a unique identifier for an item or row. It is enforced to be unique: if you put an item that already exists, it will replace it (the default in DynamoDB, and you can do the same in SQL with the MERGE statement) or reject it (the default in SQL, and you can do the same in DynamoDB with an attribute_not_exists condition). An RDBMS usually uses the primary key for the physical organization as well. Oracle is very agile on that, and only specific table organizations (like IOT) require that the physical order follows the primary key. Other RDBMS (like MySQL InnoDB) will store the table clustered on the primary key. In DynamoDB, like in any key-value store, the primary key determines the physical access through physical structures. A DynamoDB table is hash partitioned on the first attribute of the primary key, and may be sorted on a second attribute. If you want to hash on multiple attributes, your code will concatenate the values into the first attribute. Again, NoSQL provides a simple API and the developer does the additional logic in the procedural code. In an RDBMS, a primary key is a declared constraint that may, or may not, be related to the physical organization. And it can be compound, with multiple columns. An RDBMS can also have additional keys for which uniqueness is enforced and for which referential integrity can be validated.
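As a sketch of the SQL side of that comparison (assuming a hypothetical friend table with first_name, last_name, and birthdate columns):

-- "put" semantics in SQL: insert the row, or replace it if the key already exists
MERGE INTO friend t
USING (SELECT 'Donald' AS first_name,
              'Chamberlin' AS last_name,
              DATE '1944-12-21' AS birthdate
       FROM dual) s
ON (t.first_name = s.first_name AND t.last_name = s.last_name)
WHEN MATCHED THEN
  UPDATE SET t.birthdate = s.birthdate
WHEN NOT MATCHED THEN
  INSERT (first_name, last_name, birthdate)
  VALUES (s.first_name, s.last_name, s.birthdate);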

In summary

This glossary is aimed at people working in SQL and NoSQL, to avoid confusion between the different meanings of the same terms used in RDBMS and DynamoDB. The most important thing to remember is that a Table in DynamoDB is different from a Table in SQL. You don’t need to split into different tables because of size: tables are automatically partitioned. You don’t need to split into different tables because of structure: all are key-value. And you will quickly have a generic key name where you put different entities, with the key value being built to control the clustering of data. Think of a DynamoDB table as a key-value store. You can have multiple stores to isolate multiple applications with different performance requirements (and cost). But an application should store its data in one store only. I’m not giving examples here because you can follow the two links on “single-table”, which will bring you to the excellent examples from Rick Houlihan and Alex DeBrie. I wrote this blog post as an expansion of my following tweet:

The article Amazon DynamoDB: a r(el)ational Glossary appeared first on Blog dbi services.