Oakies Blog Aggregator

WPBlockTalk: A Free Online Event Focused on the Block Editor

Ready to explore the possibilities with the block editor? WPBlockTalk is a free, live virtual event that will bring together designers, developers, and other WordPress enthusiasts from across the WordPress community.

Topics to expect:

  • Building the block editor: what it takes to develop the block editor, what features are on the roadmap, and how you can contribute
  • Developing blocks: inspiration and ideas for developing your own custom blocks
  • Designing with blocks: learn more about using blocks to make powerful and versatile layouts and templates

If you’re passionate and curious about the future of WordPress, then this April 2 event is for you!

If you’re busy that day, don’t worry — all the talks will also be published on WordPress.tv for you to watch (and re-watch) whenever you like.

In the meantime, join the WPBlockTalk email list for registration details, speaker and schedule updates, and more. We look forward to seeing you online!

USING bug

The Oracle Developer Community forum often sees SQL that is hard to read – sometimes because it’s a brutal tangle of subqueries, sometimes because the formatting is bad, sometimes because the use of table and column aliases is poorly done. One particular case of the last weakness is code where the same table alias (typically the letter A) is used a dozen times in the course of the query.

I’ve said that every table in a query should have a different alias and the alias should be used at every column usage in the query (the note at this URL includes a couple of refinements). I’ve just discovered another reason why this is a good idea and why you shouldn’t use the same alias twice in a query. Here’s a simplified demonstration of the threat – tested on 19.3.0.0:


rem     Script:         using_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t2
as
select  rownum id, object_id, object_name, object_type, rpad('x',100) padding 
from    all_objects 
where   rownum <= 5000 
and     mod(object_id,2) = 1
/

create table t1
as
select  rownum id, object_id, object_name, object_type, rpad('x',100) padding 
from    all_objects 
where   rownum <= 5000 
and     mod(object_id,2) = 0
/

I’ve created two tables from the view all_objects, one of the tables holds rows where the object_id is even, the other where it is odd, so if I join these two tables on object_id the result set will be empty. So here are three queries that join the two tables – with the little twist that I’ve (accidentally) given both tables the same alias X in all three cases:


prompt  =======================================
prompt  Here's a query that might "confuse" the
prompt  optimizer when we try to explain it
prompt  =======================================

explain plan for
select max(object_name) from t1 X join t2 X using (object_id);

prompt  ==================================
prompt  So might this one, but it doesn't.
prompt  ==================================

explain plan for
select max(object_id) from t1 X join t2 X using (object_id);
select * from table(dbms_xplan.display);

prompt  ===================================================
prompt  With this one A-rows matches E-rows: and it's NOT 0
prompt  ===================================================

alter session set statistics_level = all;

set serveroutput off
set linesize 156

select count(*) from t1 X join t2 X using (object_id);
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));


In the absence of the explicit aliases the first query should produce an execution plan; but when both tables are given the same alias the attempt to explain (or run) the query produced the error “ORA-00918: column ambiguously defined”.

The second query does better – or worse, depending on your point of view. Nominally the join is perfectly valid and the optimizer produces an execution plan for the query. But the plan predicts a Cartesian merge join with a result set of 25M rows – which doesn’t look like a good estimate – and the plan doesn’t have a Predicate Information section.

So we use a count(*) for the third query – just in case the result set is, somehow, 25M rows – and enable rowsource execution statistics, and acquire the plan from memory after running the query (which takes nearly 14 seconds of hammering the CPU to death). And here’s the output:


  COUNT(*)
----------
  25000000

1 row selected.


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  85ygrcg4n3ymz, child number 0
-------------------------------------
select count(*) from t1 X join t2 X using (object_id)

Plan hash value: 4259280259

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |      1 |        | 67358 (100)|      1 |00:00:13.38 |     200 |    198 |       |       |          |
|   1 |  SORT AGGREGATE       |      |      1 |      1 |            |      1 |00:00:13.38 |     200 |    198 |       |       |          |
|   2 |   MERGE JOIN CARTESIAN|      |      1 |     25M| 67358   (5)|     25M|00:00:10.55 |     200 |    198 |       |       |          |
|   3 |    TABLE ACCESS FULL  | T2   |      1 |   5000 |    15   (7)|   5000 |00:00:00.01 |     100 |     99 |       |       |          |
|   4 |    BUFFER SORT        |      |   5000 |   5000 | 67343   (5)|     25M|00:00:04.54 |     100 |     99 |   133K|   133K|  118K (0)|
|   5 |     TABLE ACCESS FULL | T1   |      1 |   5000 |    13   (0)|   5000 |00:00:00.01 |     100 |     99 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------

Yes, instead of zero rows Oracle managed to produce 25M rows. The execution plan is a Cartesian merge join, and that’s because the optimizer has lost the join predicate (I didn’t supply a Predicate Information section because there wasn’t one – note the absence of any asterisks against any operations in the plan).

Interestingly the Query Block / Alias section of the plan (when I called for it) reported the two aliases as X_0001 and X_0002, so internally Oracle did manage to find two different aliases – but too late, presumably.
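For reference, the Query Block / Alias section appears when the format string passed to dbms_xplan includes the “alias” option; a sketch of the sort of call that reports it (my example, not necessarily the exact call used here):

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));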

Conclusion

Give a little thought to using table aliases sensibly and safely. It’s trivial to fix this example, but some of the messy SQL that goes into production might end up running into the same issue without it being so easy to spot the source of the anomaly.
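For completeness, the trivial fix here is simply to give the two tables different aliases (and then use them consistently), for example:

select count(*) from t1 tab1 join t2 tab2 using (object_id);

With distinct aliases the optimizer should retain the join predicate and the query should return the expected zero rows.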

Footnote

This is Bug 25342699 : WRONG RESULTS WITH ANSI JOIN USING AND IDENTICAL TABLE ALIASES reported Jan 2017 against 12.1.0.2, not yet fixed.

 

Working from home… Welcome!

Let me start by saying that by no means am I a working from home expert; I do not have a degree in ergonomics or human resources or time management. I am just your normal IT worker who happens to have been working from home for a number of years now.

If you have just had that email from your manager or your Human Resources department that says “Guess what! Time to grab a laptop and set up shop from home”, I thought I would share a couple of my experiences to ease the transition. And who knows? Maybe this will be the launch pad for you to pursue a full-time remote working position with your current employer (or even a new one).

Firstly, when you start working from home, no matter how much preparation you put into it, expect that things will feel different and perhaps even isolating for a little while. Yes, there will probably be video conference meetings and chat channels such as Slack/Teams/etc, but mentally it’s very easy at first to get sucked into a place where you think “I am missing all the water cooler discussions”, or to be longing for the simple act of wandering around the office, bumping into people and giving them a cursory nod to acknowledge the fact that both you and they exist!

But be patient – this passes very quickly as you work out how to manage your day. Hence let me talk about what I find are the benefits of working from home.

1) Your time is now your own. Yes, there will be meetings at particular times that you need to attend, but by and large, what you now have is the ability to reshuffle your time on a day-by-day basis, and any commuting time you used to endure is now your time. If your commute used to be 45 minutes each way, you could choose to spend that extra 90 minutes working on a given day, knowing that you can then get those 90 minutes back on a different day and still be meeting the requirements of your workplace.

Similarly, you may opt to do small chunks of work on the weekend or at other times when you would normally not work in order to gain free time elsewhere. I stress I am not suggesting you work flat out 7 days a week, but conversely I often find it very beneficial to spread 5 days of work across 7 days if a particular week suits that need, and similarly sometimes put in a “bender” and smash out 5 days of work in 4 days in order to gain some more free time for family events and the like. The key thing is – you now have more flexibility.

2) When working from home it is easier to find opportunities to take those important mental breaks that we often skip in the workplace. In the office it is easy to get mentally bogged down at your desk because there is no obvious outlet besides just drinking more coffee! When you are working from home and you feel the need for a mental break, then yes, it is also easy to just grab a coffee, but you could also throw a load of washing on, or grab some food at the shops, or go for a walk outside. The opportunities for taking those five- or ten-minute breaks are far greater, which I think keeps you fresher than if you were at the office just guzzling down more and more caffeine because the kitchen area is often the only respite from your cubicle.

3) You get more flexibility with gear. One of the big benefits of a home office is that typically you are not limited to the equipment that your workplace would provide. In terms of physical comfort, don’t be afraid to treat yourself a little. My aging eyes are very pleased with the enormous monitor I have at home, something that would not be available in the workplace. Similarly, I recently bought a high-end mouse on a whim, thinking that… well… after all, a mouse is just a mouse. But I was pleasantly surprised with the productivity I have gained from it, especially with the fiddly precision needed when editing videos. I’m not on commission, but for anyone interested it is a Logitech MX Master 3 and I highly recommend it.

Having listed the benefits, let me delve into perhaps some “rebuttal” of stuff I’ve read on the internet. Don’t forget – this is what works for me – your mileage may vary.

I see a lot of blog posts claiming that working from home requires a dedicated space away from the day-to-day living space. I concede that this is probably the best option for most people, but don’t feel forced into it. The most important thing is that you have a space you feel very comfortable in. That might be physical comfort in terms of having a really good chair or a really good desk setup, as I mentioned, but it may also just be about your mental comfort. I work within an open space so that I can easily interact with my children even while I am working. They are at the age now where they can easily interpret from my mannerisms whether I am flat out busy and perhaps should not be interrupted, versus when I am easily interruptible and it will not cause a fuss. So I prefer being embedded in the norms of the household rather than being separated from them, because the reality is that when you are working from home, sometimes you can feel like you are missing out on time spent with family. To each their own.

In these times of virus outbreak, there has been a flurry of posts recently about the issue of interruptions from children and family at home. Obviously this is very dependent on the age of your children – mine have just become teenagers – but I believe it can be a beneficial thing, especially in the world of IT. Let’s face it, often the rest of your family really doesn’t have any idea what your job entails, so letting them into that part of your life, especially if you are passionate about it, is a good thing in my opinion. Surely it is always beneficial to let your family, children and partner see what you are passionate about, even if there will always be the occasional throwaway comment from them about why on earth you could be passionate about such things.

I sit on the fence in terms of the importance of communication media such as Slack or Teams. Don’t get me wrong – they are very useful and I utilise them regularly in my job, but the reason I say you should not put too much importance on them is that I work in a totally different time zone to most of my team, and most of Oracle. Yet I have not found that minimal chat interaction is a big problem in my day. However, I also know that should I need to reach out to someone for a work or personal communication, the onus is on me to either wake up a little earlier or stay up a little later at night in order to sync up the time zones. Your mileage may vary – if chat lines make you feel more comfortable about your remote location, then by all means dive in hard.

Anyway, that’s my brief thoughts on working from home. If you are new to this – Welcome to the club! As always if you have any questions, thoughts, ideas or are looking for guidance, then I can’t claim to have all the answers but I’ll happily help where I can.

DynamoDB: adding a Local covering index to reduce the cost

By Franck Pachot

This is a continuation of the previous post on DynamoDB: adding a Global Covering Index to reduce the cost. I have a DynamoDB table partitioned on "MyKeyPart","MyKeySort" and I have many queries that retrieve a small "MyInfo001" attribute, and less frequent ones needing the large "MyData001" attribute. I have created a Global Secondary Index (GSI) that covers the same key and this small attribute. Now, because the index key is prefixed by the table's partition key, I can create a Local Secondary Index (LSI) to do the same. But there are many limitations. The first one is that I cannot add a local index afterwards: I need to define it at table creation time.

Drop table

Here I am in a lab, so I can drop and re-create the table. In real life you may have to create a new one, copy the items, keep it synchronized (DynamoDB Streams), and so on.


aws dynamodb delete-table --table-name Demo
while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "DELETING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

-------------------------------------------------------------------------------
|                                 DeleteTable                                 |
+-----------------------------------------------------------------------------+
||                             TableDescription                              ||
|+----------------+----------------------------------------------------------+|
||  ItemCount     |  0                                                       ||
||  TableArn      |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo   ||
||  TableId       |  d8238050-556c-4158-85e8-bf90d483d9e2                    ||
||  TableName     |  Demo                                                    ||
||  TableSizeBytes|  0                                                       ||
||  TableStatus   |  DELETING                                                ||
|+----------------+----------------------------------------------------------+|
|||                          ProvisionedThroughput                          |||
||+-----------------------------------------------------------+-------------+||
|||  NumberOfDecreasesToday                                   |  0          |||
|||  ReadCapacityUnits                                        |  25         |||
|||  WriteCapacityUnits                                       |  25         |||
||+-----------------------------------------------------------+-------------+||
..        "TableStatus": "DELETING",

Re-create table

I use the same create-table command, adding the definition of a "DemoLSI" local index that includes "MyInfo001":


aws dynamodb create-table \
 --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --local-secondary-indexes ' [{
  "IndexName": "DemoLSI",
  "KeySchema":[
   {"AttributeName":"MyKeyPart","KeyType":"HASH"},
   {"AttributeName":"MyKeySort","KeyType":"RANGE"}
  ],
  "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["MyInfo001"]}
  }] ' \
 --table-name Demo

while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "CREATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

Here is the output:


-----------------------------------------------------------------------------------------------
|                                         CreateTable                                         |
+---------------------------------------------------------------------------------------------+
||                                     TableDescription                                      ||
|+----------------------+--------------------------------------------------------------------+|
||  CreationDateTime    |  1584347785.29                                                     ||
||  ItemCount           |  0                                                                 ||
||  TableArn            |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo             ||
||  TableId             |  ebc9488a-fc5f-4022-b947-72b733784a6a                              ||
||  TableName           |  Demo                                                              ||
||  TableSizeBytes      |  0                                                                 ||
||  TableStatus         |  CREATING                                                          ||
|+----------------------+--------------------------------------------------------------------+|
|||                                  AttributeDefinitions                                   |||
||+-------------------------------------------+---------------------------------------------+||
|||               AttributeName               |                AttributeType                |||
||+-------------------------------------------+---------------------------------------------+||
|||  MyKeyPart                                |  N                                          |||
|||  MyKeySort                                |  N                                          |||
||+-------------------------------------------+---------------------------------------------+||
|||                                        KeySchema                                        |||
||+-----------------------------------------------------+-----------------------------------+||
|||                    AttributeName                    |              KeyType              |||
||+-----------------------------------------------------+-----------------------------------+||
|||  MyKeyPart                                          |  HASH                             |||
|||  MyKeySort                                          |  RANGE                            |||
||+-----------------------------------------------------+-----------------------------------+||
|||                                  LocalSecondaryIndexes                                  |||
||+----------------+------------------------------------------------------------------------+||
|||  IndexArn      |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo/index/DemoLSI   |||
|||  IndexName     |  DemoLSI                                                               |||
|||  IndexSizeBytes|  0                                                                     |||
|||  ItemCount     |  0                                                                     |||
||+----------------+------------------------------------------------------------------------+||
||||                                       KeySchema                                       ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                    AttributeName                   |             KeyType              ||||
|||+----------------------------------------------------+----------------------------------+|||
||||  MyKeyPart                                         |  HASH                            ||||
||||  MyKeySort                                         |  RANGE                           ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                                      Projection                                       ||||
|||+-----------------------------------------------------+---------------------------------+|||
||||  ProjectionType                                     |  INCLUDE                        ||||
|||+-----------------------------------------------------+---------------------------------+|||
|||||                                  NonKeyAttributes                                   |||||
||||+-------------------------------------------------------------------------------------+||||
|||||  MyInfo001                                                                          |||||
||||+-------------------------------------------------------------------------------------+||||
|||                                  ProvisionedThroughput                                  |||
||+------------------------------------------------------------------------+----------------+||
|||  NumberOfDecreasesToday                                                |  0             |||
|||  ReadCapacityUnits                                                     |  25            |||
|||  WriteCapacityUnits                                                    |  25            |||
||+------------------------------------------------------------------------+----------------+||
.....        "TableStatus": "CREATING",

What is very different from the global index is that here I didn't specify read and write capacity for it. The RCU and WCU are counted within the table's provisioned capacity. That gives more agility in my case as I don't have to think about the ratio between my two use cases (read from the index only, or read the whole item from the table).

Put items

I’m using the same script as in the previous post to put 8 items here. What is different is the output as I can see one additional WCU for the index entry:


----------------------------------
|             PutItem            |
+--------------------------------+
||       ConsumedCapacity       ||
|+----------------+-------------+|
||  CapacityUnits |  TableName  ||
|+----------------+-------------+|
||  246.0         |  Demo       ||
|+----------------+-------------+|
|||    LocalSecondaryIndexes   |||
||+----------------------------+||
||||          DemoLSI         ||||
|||+------------------+-------+|||
||||  CapacityUnits   |  1.0  ||||
|||+------------------+-------+|||
|||            Table           |||
||+------------------+---------+||
|||  CapacityUnits   |  245.0  |||
||+------------------+---------+||
||     ItemCollectionMetrics    ||
|+------------------------------+|
|||      ItemCollectionKey     |||
||+----------------------------+||
||||         MyKeyPart        ||||
|||+------------+-------------+|||
||||  N         |  2          ||||
|||+------------+-------------+|||
|||     SizeEstimateRangeGB    |||
||+----------------------------+||
|||  0.0                       |||
|||  1.0                       |||
||+----------------------------+||

Query the local index

Again, as with the global one, the local index will be used only when explicitly mentioned. Reading only the small attributes, while mentioning only the table, still reads the whole item. Again, I read 4 items (all with the same partition key):


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }
}

4 items of 245KB each is 980KB which, at 8KB per RCU for eventually consistent reads, is 122.5 RCUs.
Here is the same query mentioning the local index:
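The original post shows this as a screenshot; presumably it is the previous query with the index named explicitly, along these lines (my reconstruction):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --index-name DemoLSI \
 --table-name Demo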
There, no RCU from the table and 0.5 from the index. This is exactly the same as what I had with the global index, and this is a similar way to optimize the queries that read only a small part of the item.

Consistent reads

One advantage of local indexes is that they support consistent reads (--consistent-read):


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001" \
 --consistent-read \
 --index-name DemoLSI \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 1.0,
        "TableName": "Demo",
        "LocalSecondaryIndexes": {
            "DemoLSI": {
                "CapacityUnits": 1.0
            }
        },
        "Table": {
            "CapacityUnits": 0.0
        }
    }
}

The cost doubles for consistent reads: 1 RCU for the 4 items (which together are smaller than 4KB) instead of 0.5 with eventual consistency.
Another advantage of local indexes is that they transparently fetch the full item when you ask for an additional projected attribute that is not in the index:


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001","MyData001" \
 --no-consistent-read \
 --index-name DemoLSI \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 124.5,
        "TableName": "Demo",
        "LocalSecondaryIndexes": {
            "DemoLSI": {
                "CapacityUnits": 0.5
            }
        },
        "Table": {
            "CapacityUnits": 124.0
        }
    }
}

There, I still have 0.5 RCU from the index and now 124 RCU from the table. So it is transparent, but a bit more expensive than querying the table directly in this case (which was 122.5 RCU).
Again, I am in a case where I read all the items for a partition key, so the index is not useful for filtering the items. This is a special case for this demo on covering indexes.

“Item Collection Size Limit”

In the previous blog post, as in the current one, I always mentioned --return-item-collection-metrics with the put-item calls. This returned nothing when the table had no local index, but you can see above that, with a local index, each put-item returned:


||+------------------+---------+||
||     ItemCollectionMetrics    ||
|+------------------------------+|
|||      ItemCollectionKey     |||
||+----------------------------+||
||||         MyKeyPart        ||||
|||+------------+-------------+|||
||||  N         |  2          ||||
|||+------------+-------------+|||
|||     SizeEstimateRangeGB    |||
||+----------------------------+||
|||  0.0                       |||
|||  1.0                       |||
||+----------------------------+||

This is the size of the collection of items for MyKeyPart=2, and there's a hard limit of 10GB per item collection. This limit, and the metric itself, applies only when the table has a local secondary index. So be certain to control the size of each item collection.

Because we chose DynamoDB for its scalability (the storage can grow) and availability (the structure can evolve without downtime), the limitations of Local Secondary Indexes (the collection size limit and having to re-create the table to add them) make them not very appealing, and Global Secondary Indexes may be preferred even when both are possible.


DynamoDB: adding a Global covering index to reduce the cost

By Franck Pachot

People often think of indexes as a way to optimize row filtering (“get item” faster and cheaper). But indexes are also about columns (“attribute projection”), like some kind of vertical partitioning. In relational (“SQL”) databases we often add more columns to the indexed key. These are called “covering” or “including” indexes, and they avoid reading the whole row. The same is true in NoSQL. I’ll show in this post how, even when an index is not required to filter the items because the primary key partitioning is sufficient, we may have to create a secondary index to reduce the cost of partial access to the item. Here is an example with AWS DynamoDB, where the cost depends directly on I/O throughput.

Create table

I create a Demo table with a generic name for the key columns: MyKeyPart is the partition key and MyKeySort is the sort key. This is a (HASH,RANGE) partitioning where rows (“items”) with same MyKeyPart and close MyKeySort are clustered together. Different MyKeyPart are scattered across the storage.
Those keys are numbers (AttributeType=N) and I have defined the provisioned read and write IOPS throughput (in RCU/WCU units) at their maximum for the free tier (25).


aws dynamodb create-table \
 --attribute-definitions \
  AttributeName=MyKeyPart,AttributeType=N \
  AttributeName=MyKeySort,AttributeType=N \
 --key-schema \
  AttributeName=MyKeyPart,KeyType=HASH \
  AttributeName=MyKeySort,KeyType=RANGE \
 --billing-mode PROVISIONED \
 --provisioned-throughput ReadCapacityUnits=25,WriteCapacityUnits=25 \
 --table-name Demo

---------------------------------------------------------------------------------
|                                  CreateTable                                  |
+-------------------------------------------------------------------------------+
||                              TableDescription                               ||
|+-------------------+---------------------------------------------------------+|
||  CreationDateTime |  1584271061.57                                          ||
||  ItemCount        |  0                                                      ||
||  TableArn         |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo  ||
||  TableId          |  557cd9b8-a739-4c4a-9aea-cdae4d8cc6c2                   ||
||  TableName        |  Demo                                                   ||
||  TableSizeBytes   |  0                                                      ||
||  TableStatus      |  CREATING                                               ||
|+-------------------+---------------------------------------------------------+|
|||                           AttributeDefinitions                            |||
||+------------------------------------+--------------------------------------+||
|||            AttributeName           |            AttributeType             |||
||+------------------------------------+--------------------------------------+||
|||  MyKeyPart                         |  N                                   |||
|||  MyKeySort                         |  N                                   |||
||+------------------------------------+--------------------------------------+||
|||                                 KeySchema                                 |||
||+---------------------------------------------+-----------------------------+||
|||                AttributeName                |           KeyType           |||
||+---------------------------------------------+-----------------------------+||
|||  MyKeyPart                                  |  HASH                       |||
|||  MyKeySort                                  |  RANGE                      |||
||+---------------------------------------------+-----------------------------+||
|||                           ProvisionedThroughput                           |||
||+-------------------------------------------------------------+-------------+||
|||  NumberOfDecreasesToday                                     |  0          |||
|||  ReadCapacityUnits                                          |  25         |||
|||  WriteCapacityUnits                                         |  25         |||
||+-------------------------------------------------------------+-------------+||

Here is how I wait for the creation to be completed because I like asynchronous operations but my mind is synchronous:


while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "CREATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

...        "TableStatus": "CREATING",

Put items

I’ll insert (“put”) 8 rows (“items”) which contain one small column (“attribute”): MyInfo001. And a large one (150000 random characters of text, which is 245KB): MyData001. I build an “item.json” file to load them through “aws dynamodb put-item --item file://item.json”.

I have 2 different MyKeyPart with 4 different MyKeySort each. As many examples for DynamoDB are taken from retail (because it was built by Amazon for their e-commerce business) you can think of it as customer transactions. You want to distribute customers to many nodes for scalability and you want to get the history of each customer grouped and ordered.

Here is my script:


for CusID in {1..2} ; do
for Xdate in {1..4} ; do
cat > item.json <
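The heredoc body is cut off above; a plausible sketch of what the loop does, based on the attribute names and sizes described earlier (the exact JSON layout, the random-text generation and the put-item options are my assumptions), is:

for CusID in {1..2} ; do
for Xdate in {1..4} ; do
# hypothetical reconstruction: the two keys, a small MyInfo001 and a large random MyData001
cat > item.json <<ITEM
{
 "MyKeyPart":{"N":"$CusID"},
 "MyKeySort":{"N":"$Xdate"},
 "MyInfo001":{"S":"Customer $CusID transaction $Xdate"},
 "MyData001":{"S":"$(LC_ALL=C tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 150000)"}
}
ITEM
aws dynamodb put-item \
 --return-consumed-capacity INDEXES \
 --return-item-collection-metrics SIZE \
 --item file://item.json \
 --table-name Demo
done ; done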

Each put-item execution returns:


----------------------------------
|             PutItem            |
+--------------------------------+
||       ConsumedCapacity       ||
|+----------------+-------------+|
||  CapacityUnits |  TableName  ||
|+----------------+-------------+|
||  245.0         |  Demo       ||
|+----------------+-------------+|
|||            Table           |||
||+------------------+---------+||
|||  CapacityUnits   |  245.0  |||
||+------------------+---------+||

Look at the consumed capacity for the table: 245 WCU. I “inserted” eight 245KB items. I’m in the default “standard” (aka non-transactional, non-ACID) write mode where 1KB costs 1 WCU, so each of those 245KB writes costs 245 WCU. This cost becomes money because you pay for the provisioned maximum throughput capacity. I didn’t hit that limit here, with only 8 items inserted over a few seconds, but my reserved capacity is 25 per second, which means that if I keep inserting I’ll wait on retries and eventually get ProvisionedThroughputExceededException once the retries are exhausted. So either I pay for more provisioned WCU (reserving them or through auto-scaling), or I choose on-demand, or I accept some write throttling affecting the response time, which also has a cost for the business.

So, basically, if you read and write fewer items and smaller items, you save money. Like in any IT system but here the performance metric is easily converted to dollars. There’s not a lot I can do here, as I have to insert those items, but let’s see the read scenario now.

Query without index

I have two read use-cases in my scenario. One is reading the customer transaction history with all the information, like the MyInfo001 attribute here. The second one is reading the detail for one transaction, like the large MyData001 attribute here. For both of them, I don’t need any index to access the specific items because I am partitioned on the filtering attributes. But, in NoSQL as in relational databases, indexes are needed not only to filter rows (row selection) but also to filter attributes (column projection), and this is what I’ll show here.

I’m reading all attributes here (--select ALL_ATTRIBUTES) for one customer (--key-condition-expression "MyKeyPart = :k") and evaluating the cost (--return-consumed-capacity):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select ALL_ATTRIBUTES \
 --table-name Demo | cut -c 1-80
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }


The read cost in capacity units is 122.5 RCU. Why a fraction? Because reads are cheaper than writes: reading 8KB costs only 1 RCU, whereas writing costs 1 WCU per kilobyte. These are all “eventually consistent” reads here; the cost would double (1 RCU per 4KB) with “strongly consistent” reads (because it needs to read 2 of the 3 copies to get a quorum), and can double again (1 RCU per 2KB) for “transactional reads”. So, in this case, I’ve consumed 122.5 RCU to read the 4 items (4 x 245KB = 980KB, at 8KB per RCU).

Now if I do not query for the large MyData001 attribute, including only the MyInfo001 in the projection:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeySort","MyInfo001" \
 --table-name Demo

...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }
}

The cost is still the same, because even though the large MyData001 was not in my query projection, the whole item had to be read before that attribute was discarded.

I mentioned earlier that if you want strong consistency (--consistent-read) it is more expensive:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeySort","MyInfo001" \
 --consistent-read \
 --table-name Demo

...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 245.0,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 245.0
        }
    }
}

The read capacity consumed has doubled to 245 RCU with strong consistency: 1 RCU per 4KB, and I had to read 4 items of 245KB each.

Global Secondary Index

I’ll show that creating an index can help with this. Technically, as I access by the partition key (MyKeyPart), I could create a Local Secondary Index. But that comes with some restrictions, like the need to re-create the table, which I’ll show later. Creating a global index is much easier, and it can be a solution even when the index key is prefixed by the partition key.


aws dynamodb update-table \
 --attribute-definitions \
    AttributeName=MyKeyPart,AttributeType=N \
    AttributeName=MyKeySort,AttributeType=N \
 --global-secondary-index-updates ' [{"Create":{
  "IndexName": "DemoGSI",
  "KeySchema":[
   {"AttributeName":"MyKeyPart","KeyType":"HASH"},
   {"AttributeName":"MyKeySort","KeyType":"RANGE"}
  ],
  "ProvisionedThroughput":{"ReadCapacityUnits":10,"WriteCapacityUnits":5},
  "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["MyInfo001"]}
  }}] ' \
 --table-name Demo

-----------------------------------------------------------------------------------------------
|                                         UpdateTable                                         |
+---------------------------------------------------------------------------------------------+
||                                     TableDescription                                      ||
|+----------------------+--------------------------------------------------------------------+|
||  CreationDateTime    |  1584277665.92                                                     ||
||  ItemCount           |  8                                                                 ||
||  TableArn            |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo             ||
||  TableId             |  d8238050-556c-4158-85e8-bf90d483d9e2                              ||
||  TableName           |  Demo                                                              ||
||  TableSizeBytes      |  2000331                                                           ||
||  TableStatus         |  UPDATING                                                          ||
|+----------------------+--------------------------------------------------------------------+|
|||                                  AttributeDefinitions                                   |||
||+-------------------------------------------+---------------------------------------------+||
|||               AttributeName               |                AttributeType                |||
||+-------------------------------------------+---------------------------------------------+||
|||  MyKeyPart                                |  N                                          |||
|||  MyKeySort                                |  N                                          |||
||+-------------------------------------------+---------------------------------------------+||
|||                                 GlobalSecondaryIndexes                                  |||
||+----------------+------------------------------------------------------------------------+||
|||  Backfilling   |  False                                                                 |||
|||  IndexArn      |  arn:aws:dynamodb:eu-central-1:802756008554:table/Demo/index/DemoGSI   |||
|||  IndexName     |  DemoGSI                                                               |||
|||  IndexSizeBytes|  0                                                                     |||
|||  IndexStatus   |  CREATING                                                              |||
|||  ItemCount     |  0                                                                     |||
||+----------------+------------------------------------------------------------------------+||
||||                                       KeySchema                                       ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                    AttributeName                   |             KeyType              ||||
|||+----------------------------------------------------+----------------------------------+|||
||||  MyKeyPart                                         |  HASH                            ||||
||||  MyKeySort                                         |  RANGE                           ||||
|||+----------------------------------------------------+----------------------------------+|||
||||                                      Projection                                       ||||
|||+-----------------------------------------------------+---------------------------------+|||
||||  ProjectionType                                     |  INCLUDE                        ||||
|||+-----------------------------------------------------+---------------------------------+|||
|||||                                  NonKeyAttributes                                   |||||
||||+-------------------------------------------------------------------------------------+||||
|||||  MyInfo001                                                                          |||||
||||+-------------------------------------------------------------------------------------+||||
||||                                 ProvisionedThroughput                                 ||||
|||+----------------------------------------------------------------------+----------------+|||
||||  NumberOfDecreasesToday                                              |  0             ||||
||||  ReadCapacityUnits                                                   |  10            ||||
||||  WriteCapacityUnits                                                  |  5             ||||
|||+----------------------------------------------------------------------+----------------+|||
|||                                        KeySchema                                        |||
||+-----------------------------------------------------+-----------------------------------+||
|||                    AttributeName                    |              KeyType              |||
||+-----------------------------------------------------+-----------------------------------+||
|||  MyKeyPart                                          |  HASH                             |||
|||  MyKeySort                                          |  RANGE                            |||
||+-----------------------------------------------------+-----------------------------------+||
|||                                  ProvisionedThroughput                                  |||
||+------------------------------------------------------------------------+----------------+||
|||  NumberOfDecreasesToday                                                |  0             |||
|||  ReadCapacityUnits                                                     |  25            |||
|||  WriteCapacityUnits                                                    |  25            |||
||+------------------------------------------------------------------------+----------------+||

Building the index takes some time with my free tier capacity units limitation:


while aws --output json dynamodb describe-table --table-name Demo | grep '"TableStatus": "UPDATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'
while aws --output json dynamodb describe-table --table-name Demo | grep '"IndexStatus": "CREATING"' ; do sleep 1 ; done 2>/dev/null | awk '{printf "."}END{print}'

....        "TableStatus": "UPDATING",
.................................................................................................................................                "IndexStatus": "CREATING",

This index has the same HASH/RANGE keys as the table, but it will be smaller because the only additional attribute is MyInfo001 ( "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["MyInfo001"]} )

Note that I mentioned a ProvisionedThroughput in the index creation: this is mandatory. Global indexes have their own provisioning, which means you need to think about the frequency of the queries that will use them, as the goal is to reduce the table’s provisioning in order to reduce the cost.

We are not in the SQL / relational nirvana where you create an index and all the queries that can benefit from it will transparently use it:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --table-name Demo
...
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 122.5,
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 122.5
        }
    }
}

Same cost as without the index: 4 items read from the table, that’s 4x245KB, and each 8KB takes 1 RCU with eventually consistent reads (--no-consistent-read), so that’s 122.5 RCU.

There’s no optimizer or query planner here; we need to explicitly mention the index (--index-name DemoGSI):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyInfo001" \
 --no-consistent-read \
 --index-name DemoGSI \
 --table-name Demo
...
   ],
    "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 0.5,
        "GlobalSecondaryIndexes": {
            "DemoGSI": {
                "CapacityUnits": 0.5
            }
        },
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 0.0
        }
    }
}

No RCU consumed from the table and only 0.5 RCU consumed from the index: 4 entries that together are smaller than 4KB, which is only 0.5 RCU. That is the huge benefit of this index, which does not contain the large MyData001 attribute.

The key columns were not displayed, so I can add them in the projection (--projection-expression="MyKeyPart","MyKeySort","MyInfo001"):

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001" \
 --no-consistent-read \
 --index-name DemoGSI \
 --table-name Demo
...
   "ScannedCount": 4,
    "ConsumedCapacity": {
        "CapacityUnits": 0.5,
        "GlobalSecondaryIndexes": {
            "DemoGSI": {
                "CapacityUnits": 0.5
            }
        },
        "TableName": "Demo",
        "Table": {
            "CapacityUnits": 0.0
        }
    }
}

This is the same cost because everything needed is already in the index.

One drawback with Global Secondary Indexes is that they support eventual consistency only. Trying a --consistent-read:

aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001" \
 --consistent-read \
 --index-name DemoGSI \
 --table-name Demo

An error occurred (ValidationException) when calling the Query operation: Consistent reads are not supported on global secondary indexes

This raises an error. Global Secondary Indexes are like another table and DynamoDB does not ensure multi-table consistency.

I mentioned that the index is like another table, and that I have to name it explicitly to read from it (--index-name). But it gets worse (when compared with relational SQL databases). An index is supposed to be a pointer to a table row, but that is not the case here. If I want to get more attributes from the table I have to do it with another query. I cannot just add those attributes to the projection (--projection-expression) if they are not present in the index:


aws --output json dynamodb query \
 --key-condition-expression "MyKeyPart = :k" \
 --expression-attribute-values  '{":k":{"N":"1"}}' \
 --return-consumed-capacity TABLE \
 --return-consumed-capacity INDEXES \
 --select SPECIFIC_ATTRIBUTES \
 --projection-expression="MyKeyPart","MyKeySort","MyInfo001","MyData001" \
 --no-consistent-read \
 --index-name DemoGSI \
 --table-name Demo

An error occurred (ValidationException) when calling the Query operation: One or more parameter values were invalid: Global secondary index DemoGSI does not project [MyData001]
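So the pattern, when an occasional query does need the large attribute, is a second call against the table itself; a hedged sketch (the key values below are just an example), using get-item for the attribute the index does not project:

aws --output json dynamodb get-item \
 --key '{"MyKeyPart":{"N":"1"},"MyKeySort":{"N":"2"}}' \
 --projection-expression "MyData001" \
 --return-consumed-capacity TOTAL \
 --table-name Demo

This keeps the frequent small reads on the cheap covering index and pays the full item cost only for the occasional detail lookup.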

 
Here, because the index is prefixed with the partition key, I can also choose a local index. This may have some advantages (like allowing strong consistency) but also many drawbacks which make it not so friendly. I’ll continue on that in the next post on DynamoDB: adding a Local covering index to reduce the cost.


Video : Kata Containers : Running Containers Inside Lightweight Virtual Machines on Oracle Linux 7 (OL7)

Today’s video demonstrates how to configure Kata Containers on Oracle Linux 7 (OL7), allowing you to run containers inside lightweight virtual machines (VMs).

This video is based on an article of the same name, but relates to a bunch of other articles and videos on the subject of containers.

The star of today’s video is Jake Kuramoto, originally of The AppsLab fame, and now at WorkDay.

Cheers

Tim…



COVID-19: Information And Outlook

Outlook for the months ahead >>

I decided to put together some information on COVID-19 purely for my own interest – but then decided I might as well put it on a blog post. I’m only going to link to what I feel are reputable sources, nothing from tabloid papers or people promoting conspiracy theories.

If you know of a good site I should include or there is an area I have not touched on that you would like more information on, please feel free to let me know.

Update. At long last, as of the evening of Monday 16th March, the UK government listened to the WHO and other epidemiologists and accepted that draconian measures to suppress COVID-19 (reduce the R rate, the number of people each infected person in turn infects, to below 1) rather than mitigate it (reduce the natural R value of 2.4 towards 1, but still above 1) will save thousands of lives.

This paper by Imperial College London, in conjunction with the Medical Research Council & WHO, is being cited as the root of this change in opinion. It’s a hard read as it is a scientific paper, but it is excellent. It helps make clear many things such as the local spread rate, the infection rate, how the disease transmits between countries, and the likely number of real cases as opposed to tested and verified cases. And the simulations match what we have seen to date.

In summary, suppression, such as has been managed in South Korea and China, virtually stops the disease for a while. It does not end it. When the measures to suppress it (very strong social control) are relaxed, it will burst out again. There is always a chance it will escape to areas where it is not suppressed and blow up again. But it buys time to work on a vaccine and develop better treatment regimes.

Mitigation slows the spread down. But it continues to spread. An argument was put forward that this will develop “herd immunity” by letting most people get the disease. It means it would be over sooner – but at the cost of hundreds of thousands of lives, just in the UK. The NHS would be utterly swamped during this time.

I’ll move this down into the body of this post later.

 

Firstly, for anyone who does not know me or just stumbles over this page via “Google”, I am not an expert in any of this – I am not a medic, I am not a scientist, and I am certainly not an epidemiologist (someone who studies the transmission of disease). I’m a computer professional with a really old degree in genetics & zoology who has at times worked on systems for the UK National Health Service (NHS), the Human Genome project, and some other scientific organisations.

Secondly, although this is a very serious disease and it is going to continue to have a huge impact,  most people who get it will not be seriously ill. We are not all going to die!

Most people with underlying medical conditions or who are elderly are also going to be fine

The press, at least in the UK, keeps making a huge point that anyone who dies had “Underlying medical conditions” and it is affecting “the old” more. This is true, but the message that comes across is that if you are old or have an underlying medical condition you will die. This is not true.

Even if you are 79 with diabetes and are diagnosed with COVID-19, you have over an 85% chance of being OK, even if you develop the symptoms.

However, the fact that this disease is eventually going to kill tens or hundreds of thousands of people is why saying “I’m stronger than this” or “I’m not letting it impact ME!” is, in my opinion, a highly arrogant or stupid approach. Just as wrong is making it the focus of your life. Most of us, around 90-95%, will be mildly ill at most, or not noticeably ill at all. {Caveat – by mildly ill, you may well feel terrible and spend a few days in bed, but that’s like a normal dose of ‘flu. Take it from someone who has spent a week on ventilators recently, a few days in bed is nothing.}

Sequence Costs

You’re probably aware of the “identity” construct that appeared in 12.1 and uses Oracle’s sequence mechanism to model the Autonumber or Identity columns that other databases have. A posting from Clay Jackson on the Oracle-L list server suggests that something about their code path has introduced a surprising overhead in 19c … when you’re not using them.

The following code is a slightly modified version of a test case that Clay Jackson posted to demonstrate a strange difference in performance between 12.2 and 19.3:

rem
rem     Script:         19c_sequence_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

drop table test_tab2;
drop table test_tab1;

drop sequence test_seq1;
drop sequence test_seq2;

create sequence test_seq1;
create sequence test_seq2;

create table test_tab1 (
   id number primary key,
   c1 varchar2(1000)
);

create table test_tab2 (
   id number,
   t1_id number,
   c2 varchar2(1000),
   constraint test_tab2_fk foreign key (t1_id) 
        references test_tab1 (id) on delete cascade
);

spool 19c_sequence_cost.lst

set feedback off
set pagesize 0
set serveroutput off

execute snap_my_stats.start_snap
alter session set events '10046 trace name context forever';

prompt  ==============================================================================
prompt  start_1000.sql cascades to start_1.sql
prompt  start_1.sql holds:
prompt
prompt  insert into test_tab1 values (test_seq1.nextval,'hello');
prompt  insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
prompt  commit;
prompt
prompt  ==============================================================================

start start_1000

set serveroutput on
set pagesize 60
alter session set events '10046 trace name context off';
execute snap_my_stats.end_snap

spool off

I’ve got a couple of tables with columns that I plan to generate from sequences, and I’ve got a referential integrity constraint between those tables. I’ll be using nextval from one sequence to populate the first table, then use currval from the same sequence for the foreign key and nextval from the other sequence as a unique identifier for the second table.

I’ve used my “repeater” code to run a short script from the SQL*Plus prompt 1,000 times (as suggested by Clay Jackson in his posting). I’ve also added lines to enable SQL_trace at the basic level, and taken a snapshot of the session activity stats.

Just as Clay Jackson had reported – 19.3.0.0 took significantly longer than 12.2.0.1 to complete the 1,000 calls to the script. (Though in absolute terms we’re only talking fractions of a second for a small single user test.) Examination of the stats – which prompted me to add the 10046 trace and repeat – made it very clear where the difference lay. Here are a few figures picked from the session activity stats:

Version    Statistic                        Value
========   ========================         ===== 
12.2.0.1   CPU used by this session            53  
19.3.0.0   CPU used by this session           114

12.2.0.1   recursive cpu usage                 18
19.3.0.0   recursive cpu usage                 44

12.2.0.1   recursive calls                  1,182
19.3.0.0   recursive calls                  7,215

12.2.0.1   execute count                    2,137                                     
19.3.0.0   execute count                    8,151

Clearly it’s worth finding out what was happening in those 6,000 extra recursive calls. And this is what the 19.3 trace file (after passing through tkprof … sort=execnt) showed:


select count(1)
from
 sys.idnseq$ where seqobj# = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   6000      0.07       0.09          0          0          0           0
Fetch     6000      0.03       0.04          0       6000          0        6000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    12001      0.10       0.14          0       6000          0        6000

If you can’t guess what the table idnseq$ is about, the following definition appears in $ORACLE_HOME/rdbms/admin/dcore.bsq:

create table idnseq$     /* stores table identity column to sequence mapping */
( obj#         number not null,                       /* table object number */
  intcol#      number not null,                    /* identity column number */
  seqobj#      number not null,                    /* sequence object number */
  startwith    number not null,                   /* sequence starting value */
  supplemental log group idnseq$_log_grp (obj#, intcol#, seqobj#) always
)
/

We’re not using identity columns in this test, but every time we run the script we do six checks against this data dictionary table to see (presumably) if we’re using a sequence that is internally associated with an identity column. The requirement doesn’t seem completely unreasonable – but it’s a bit surprising that we do every check twice. (Note – we have to check test_seq1 for both statements but test_seq2 only for the second statement, which is how we get 6,000 executions: 2 * (2 checks for test_seq1 + 1 check for test_seq2) * 1,000 executions of the script.)
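As a side note – and purely as an illustration of my own, not part of Clay Jackson’s test case – this is the sort of thing that does populate idnseq$: an identity column creates a system-generated sequence (named ISEQ$$_nnnnn) and a mapping row in the dictionary table. The table name below is hypothetical, and you need to be connected as a suitably privileged user to create the object and query sys.idnseq$ directly.

rem     Hypothetical illustration only - an identity column generates its
rem     own sequence and records the mapping in sys.idnseq$.

create table test_identity (
        id      number generated always as identity,
        v1      varchar2(10)
);

select  obj#, intcol#, seqobj#, startwith
from    sys.idnseq$
where   obj# = (
                select  object_id
                from    user_objects
                where   object_name = 'TEST_IDENTITY'
        )
;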

The doubling looks like a bug, and you have to wonder why a suitable piece of information isn’t kept in the dictionary cache anyway to allow Oracle to avoid executing the recursive statement. I should point out that the surprisingly large impact is visible because I’ve executed 2,000 separate statements from the client side; mechanisms like array inserts and pl/sql loops might only have to do this check once for each array or pl/sql call. (Tests of this hypothesis are left as an exercise for the interested reader – though a starting point is sketched below.)
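For anyone who wants to try it, here’s a minimal (untested) sketch of the pl/sql variant, assuming the same tables and sequences as the test above; the interesting comparison would be the “recursive calls” and “execute count” statistics against the 1,000-script run.

rem     Untested sketch: 1,000 iterations of the two inserts inside a
rem     single pl/sql block instead of 1,000 separate SQL*Plus scripts.

begin
        for i in 1..1000 loop
                insert into test_tab1 values (test_seq1.nextval, 'hello');
                insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
        end loop;
        commit;
end;
/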

 

dense_rank

I’ve just been prompted to complete and publish a draft I started a few years ago. It’s (ultimately) about a feature that appeared in 9i but doesn’t seem to show up very often at client sites or as a common solution to performance problems on the various Oracle forums – but maybe that’s not surprising given how slowly analytic functions have been taken up.

I want to work towards the feature by starting with a requirement, then examine several solutions. To supply a touch of realism I’ll create an orders table, which holds a customer id and an order date (including time), and then ask for a report of the most recent order for each customer. Here’s some starting data:

rem
rem     Script:         order_sample.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2006
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.0        Costs are consistent
rem             11.2.0.4        Costs become consistent by 11.2.0.3
rem             11.1.0.7
rem             10.2.0.3
rem              9.2.0.8
rem

create table orders
as
with generator as (
        select
                rownum id 
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid wordpress format issue
)
select
        rownum                                                                  id,
        mod(rownum-1,200)                                                       customer,
        sysdate - 20 + dbms_random.value(0,20)                                  date_ordered,
        rpad('x' || to_char(trunc(dbms_random.value(0,1000)),'FM009'),100)      padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid wordpress format issue
;

alter table orders modify customer not null;
alter table orders modify date_ordered not null;
alter table orders add constraint ord_pk primary key(id);

create index ord_cus on orders(customer);
-- create unique index ord_cus_date on orders(customer, date_ordered);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'orders',
                method_opt       => 'for all columns size 1',
                cascade          => true
        );
end;
/

I’ve got 200 customers, at 50 orders per customer dating back over the last 20 days. There’s a primary key on the table and (as it stands) an obvious “foreign key index” on the customer column, though I’ve allowed for changing this to a more useful (customer, date_ordered) combination which I’ve decided could be declared as unique.

With this data, how do I report “the most recent order for each customer”? The first question to ask in response to this request is: “do you literally mean ‘THE’ most recent; what if the customer has placed two or more orders on the same day or, in my initial case, at the same time?” There’s a special case to think about the moment you start to turn the natural language request into a formal language specification.

In this case I’m going to run with the “customer-only” index and allow for the fact that two or more orders could be placed at the same time by the same customer, and report both (all) of them if any such simultaneously placed orders appear.

Strategy number 1:

Start with a list showing the most recent order date for each customer and then report all orders that we can identify using that list of (customer, date_ordered). To do that I’ll start with a simple aggregate query and use the result it produced in an “IN” subquery:


prompt  =========================
prompt  Use IN subquery for max()
prompt  =========================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        orders  ord1
where
        (ord1.customer, ord1.date_ordered) in (
                select  /*+ qb_name(subq) */
                        ord2.customer, max(ord2.date_ordered)
                from
                        orders  ord2
                group by 
                        ord2.customer
        )
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

Plan hash value: 1500776991

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |          |      1 |      1 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN RIGHT SEMI|          |      1 |      1 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1568K (0)|
|   3 |    VIEW               | VW_NSO_1 |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |          |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1421K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS   |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS   |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."CUSTOMER"="CUSTOMER" AND "ORD1"."DATE_ORDERED"="MAX(ORD2.DATE_ORDERED)")

I’ve included the qb_name() hint in both query blocks here – it’s always a good idea as it gives you a little extra edge in interpreting the execution plan when the queries get more complicated.

The first thing you’ll notice about the resulting execution plan is that the optimizer has “unnested” the subquery to create an inline view (which it has named VW_NSO_1) and then used a simple join to get the final result. That’s an interesting observation, and it’s something that will often happen with an “IN” subquery – and that brings us to strategy 2.

Strategy number 2:

Some people will take as gospel the claim that the optimizer “cannot handle subqueries efficiently” and will prefer to write their own inline views (possibly using the “WITH subquery” a.k.a. “Common Table Expression (CTE)” mechanism). There will be occasions, even in the latest versions of Oracle, where you may need to do this but there will also be occasions where the optimizer hasn’t done it because it would produce the wrong results – and I have seen a couple of accidents go into production code where this variant has been written incorrectly.


prompt  ==============================
prompt  Introduce max() as inline view
prompt  ==============================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        (
                select  /*+ qb_name(in_line) */
                        ord2.customer, max(ord2.date_ordered) date_ordered
                from
                        orders  ord2
                group by 
                        ord2.customer
        )       ordv,
        orders  ord1
where
        ord1.customer     = ordv.customer
and     ord1.date_ordered = ordv.date_ordered
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));

Plan hash value: 2750501889

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |        |      1 |    200 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN           |        |      1 |    200 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1531K (0)|
|   3 |    VIEW               |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1413K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."CUSTOMER"="ORDV"."CUSTOMER" AND "ORD1"."DATE_ORDERED"="ORDV"."DATE_ORDERED")

You’ll notice, of course, the remarkable similarity between the previous plan and this one – the only significant difference being that the optimizer used a “plain” hash join here rather than the “hash join right semi” that appeared in the previous plan. The “right semi” is an indication that the optimizer has first transformed your “IN” subquery to an equivalent “EXISTS” (“=ANY”) subquery. Don’t be misled by the “right”, by the way; this isn’t indicating any sort of outer join, it’s just trying to let you know which table is the one where Oracle should stop its probe after finding the first row. It is, however, unfortunate that it gets a little awkward trying to sort out left from right when Oracle can do a “swap join inputs” on you.

It would have been nice if the VIEW operation had reported the name of my inline view (to correspond to the generated VW_NSO_1 view name from the previous plan) – but if you included the ‘alias’ formatting option in the call to display_cursor() it would have reported the alias ordv@main at operation 3.
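For example – a minor variation on the calls already used in this post, simply adding ‘alias’ to the format string:

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last alias'));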

Strategy Number 3:

We might have decided to check every row in the table to see if the date in that row was the most recent date for the customer in that row, which we could do by running a correlated subquery to do the check for every row in the table.

prompt  ========================================
prompt  Orders with correlated EQUALITY subquery
prompt  ========================================

select  
        /*+ qb_name(main) */
        ord1.* 
from
        orders  ord1
where
        ord1.date_ordered = (
                select  /*+ qb_name(subq) */
                        max(ord2.date_ordered)
                from
                        orders  ord2
                where
                        ord2.customer = ord1.customer
        )
order by
        ord1.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last'));


Plan hash value: 1152467146

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |    54 (100)|    200 |00:00:00.01 |     344 |       |       |          |
|   1 |  SORT ORDER BY        |         |      1 |    200 |    54  (15)|    200 |00:00:00.01 |     344 |   115K|   115K|  102K (0)|
|*  2 |   HASH JOIN           |         |      1 |    200 |    53  (14)|    200 |00:00:00.01 |     344 |  1695K|  1695K| 1622K (0)|
|   3 |    VIEW               | VW_SQ_1 |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |       |       |          |
|   4 |     HASH GROUP BY     |         |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |  1484K|  1484K| 1435K (0)|
|   5 |      TABLE ACCESS FULL| ORDERS  |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
|   6 |    TABLE ACCESS FULL  | ORDERS  |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD1"."DATE_ORDERED"="MAX(ORD2.DATE_ORDERED)" AND "ITEM_1"="ORD1"."CUSTOMER")

Yet again we end up with the same execution plan (barring the “right semi” issue) but with a different generated name for the unnested subquery. This is an interesting facet of Oracle (and SQL in general) – completely different ways of stating a requirement can end up doing the same work in the same way.

An important corollary to this observation is that the first thing you should do when you start writing an SQL statement is to write it in a way that clearly expresses the requirement and is easy for others to comprehend. Don’t (at the first stage) try to do anything clever because (a) you may do it wrong and (b) the optimizer might have taken your clear, simple, code and done the clever bit behind the scenes for you.

However, we may have to move on to doing something new (-ish) and exciting.

Strategy number 4:

An “obvious” defect in the three plans so far is that we have to visit the orders table twice. Is there a way we can avoid doing this? The answer is yes. Oracle 8.1.6 gave us the analytic functions:


prompt  =======================
prompt  Analytic max() function
prompt  =======================

column padding noprint
column date_ordered noprint

select
        /*+ qb_name(main) */
        ordv.* 
from    (
        select  /*+ qb_name(inline) */
                customer, id, date_ordered, padding,
                max(date_ordered) over (
                        partition by customer
                ) max_date
        from    orders  ord2
        )       ordv
where
        ordv.date_ordered = ordv.max_date
order by
        ordv.customer
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline allstats last'));

Plan hash value: 813391662

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   262 (100)|    200 |00:00:00.01 |     172 |       |       |          |
|*  1 |  VIEW               |        |      1 |  10000 |   262   (3)|    200 |00:00:00.01 |     172 |       |       |          |
|   2 |   WINDOW SORT       |        |      1 |  10000 |   262   (3)|  10000 |00:00:00.01 |     172 |  1612K|   624K| 1432K (0)|
|   3 |    TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ORDV"."DATE_ORDERED"="ORDV"."MAX_DATE")

By adding the analytic max() function I can acquire the necessary “raw” data once and post-process it to find the max(date_ordered) for each customer before discarding all the rows where the row’s date doesn’t match the maximum date. The expression “max(date_ordered) over (partition by customer)” is like a virtual column that tells Oracle to partition the data by customer and find the maximum date within customer. Imagine copying the original data into a spreadsheet, sorting it by customer, then using one of the spreadsheet functions to add an extra column that derives its value by looking at the rows that have the same customer as the current row, and you’ve got an exact replica of what Oracle is doing here.

So we’ve managed to produce the same result with a single tablescan of orders instead of the two tablescans we saw in every other plan. But there’s a drawback – to be able to partition by customer Oracle has had to fetch every row and column we’re interested in and sort the data before deriving values for the new column: the cost of this plan (262) is much higher than the cost of the plan (54) we got from the previous three queries.

In this case the variation in actual run-time for the two different plans was undetectable – and insignificant compared to the time spent getting the result set to the terminal and displaying. In general, though, you need to consider the trade off between the sorting that goes into the use of analytic functions and the “double visit” work of using subqueries.

Strategy number 5:

There is (at least) one more possibility that I’ve used in the past when the data structure has allowed it to produce the right answers; and it’s the one that is the ultimate target of this blog. Consider the following SQL:


select
        customer, 
        max(id)                 keep (dense_rank last order by date_ordered) as max_id,
        max(date_ordered)       keep (dense_rank last order by date_ordered) as max_date,
--      max(padding)            keep (dense_rank last order by date_ordered) as max_padding
        trim(
                max(padding)    keep (dense_rank last order by date_ordered)
        )       as max_padding
from
        orders
group by
        customer
;

(The trim() function on the padding column doesn’t change the fundamental behaviour of this query, it’s just there to avoid line-wrapping on my output.)

I’ve written a query that does an aggregate on customer, so “customer, max() group by customer”, but it’s a variant of the analytic max() function based on “keep(dense_rank last order by …)” rather than the more familiar “over(partition by … order by …)” form.

Because of the group by customer, the max() function is applied per customer (i.e. behaving like over(partition by customer)); but we’re not simply looking for the maximum value of the referenced column – we first order by date_ordered (within customer) applying the dense_rank mechanism, keep only the rows that have the highest (last) dense_rank, and then take the maximum of that subset of the data.

Here’s an example applying the combination of mechanisms to a tiny data set:

Raw data
=========
   N1           V1
-----           ---
   93           'C'
   54           'Q'
   43           'A'
   13           'X'
   93           'G'
   54           'W'

Ordered by N1 and dense_rank() appended
========================================
   N1           V1              dr()
-----           ---             ----
   13           'X'             1
   43           'A'             2
   54           'W'             3
   54           'Q'             3
   93           'G'             4
   93           'C'             4

Keep(dense rank last)
=====================
   N1           V1              dr()
-----           ---             ----
   93           'G'             4
   93           'C'             4


max(v1) keep(dense rank last order by n1)
V1
---
'G'

In this tiny example we had cases where there were multiple rows for some of the rankings, but if we go back to our orders table and guarantee (by adding a unique constraint) that a customer will never have more than one row for any one value of date_ordered, then the expression max(id) keep (dense_rank last order by date_ordered) for any one customer will be the id of the row that has the maximum order date for that customer and, similarly, max(date_ordered) keep(…) and max(padding) keep(…) will also be the values from that same row.

Given the (absolutely critical) uniqueness constraint, we can get the data for the most recent order for each customer using this dense_rank() strategy.
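For completeness, here’s a sketch of the uniqueness declaration I’m assuming – it’s simply the commented-out line from the setup script (or, equivalently, a declared unique constraint on the same column list, i.e. unique on ({group by column(s)},{order by column(s)})):

rem     The guarantee that makes strategy 5 safe.

create unique index ord_cus_date on orders(customer, date_ordered);

-- alternatively, as a declared constraint:
-- alter table orders add constraint ord_cus_date_uk unique (customer, date_ordered);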

The question, of course, is why would we do something that may not be entirely intuitive and looks as if it could make Oracle do a lot of extra work to get the result. Here’s the answer – which is just the execution plan for the query on my orders table – with the unique constraint added:


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |    28 (100)|    200 |00:00:00.01 |     172 |       |       |          |
|   1 |  SORT GROUP BY     |        |      1 |    200 |    28  (18)|    200 |00:00:00.01 |     172 |   142K|   142K|  126K (0)|
|   2 |   TABLE ACCESS FULL| ORDERS |      1 |  10000 |    24   (5)|  10000 |00:00:00.01 |     172 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

The path uses a basic SORT GROUP BY that “sorts” only 200 rows (A-rows) using only 126KB of memory. Compare that with the plan for the analytic max() over() in strategy 4, which takes 1.6MB of memory and sorts 10,000 rows, and you’ll appreciate that the keep(dense_rank last) mechanism is doing something much more efficient. For cases where the drop from “num_rows” to “num_distinct” for the aggregating column(s) is large, the benefit of using the somewhat non-intuitive dense_rank() approach may make a significant difference to memory, CPU, and even (if it avoids a spill to disk) I/O.

Footnotes

There are two major variations on how you can use the dense_rank() function, as well as this form in which dense_rank appears in the KEEP LAST (and FIRST) mechanism.

Remember the absolutely critical point that the “keep dense_rank last” strategy is only correct if there is a suitable unique constraint on the data set viz: unique on ({group by column(s)},{order by column(s)}).

There is another major option for getting the same “most recent” rows, which is to use the match_recognize() functionality, but I think I probably wrote this draft before the mechanism even existed – so it’s left as an exercise to the reader to work it out.  A key reason why I’m not going to do it myself is that (like the analytic over() in strategy 4) it will require all 10,000 rows to be sorted, and is therefore likely to be less efficient than strategy 5.
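If you do want a starting point for that exercise, here’s an untested sketch of one possible shape (my own guess, not a worked solution): order each customer’s rows by date descending and keep the leading row plus any rows that tie with it.

select  customer, id, date_ordered
from    orders
match_recognize (
        partition by customer
        order by date_ordered desc
        all rows per match
        pattern ( ^ latest tied* )
        define tied as tied.date_ordered = latest.date_ordered
)
order by
        customer
;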

Finally – I thought I’d written a note explaining why a “sort group by” can use far less memory and CPU than a basic “sort order by”, but if I have it wasn’t on this blog. I do have a note on how the mechanism to handle “rownum <= N” with a preceding “order by” minimises its use of memory, and that note should give you some idea of what the “sort group by” is doing to minimise memory usage. I’ll try to write a little note on the aggregate mechanism some time over the next few days.

 

 

Passing complex data types to Ansible on the command line

Earlier this year I wrote a post about passing JSON files as --extra-vars to ansible-playbook in order to simplify deployments and to make them more flexible. Passing more complex data types to Ansible playbooks requires JSON syntax, and that’s the topic of this post. Unlike last time, though, I’ll pass the arguments directly to the playbook rather than by means of a JSON file. This should cover both methods of passing extra variables.

A lot of what you are about to read depends on Ansible configuration settings. I have used Ansible on Debian 10. When I installed it earlier today I found it to be version 2.7.7. It’s the distribution’s (stock) Ansible version:

vagrant@debian10:~$ lsb_release -a
No LSB modules are available.
Distributor ID:    Debian
Description:       Debian GNU/Linux 10 (buster)
Release:           10
Codename:          buster
vagrant@debian10:~$ ansible-playbook --version
ansible-playbook 2.7.7
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/vagrant/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 3.7.3 (default, Dec 20 2019, 18:57:59) [GCC 8.3.0]
vagrant@debian10:~$  

The only change I consciously made was to set the output to debug:

vagrant@debian10:~$ export ANSIBLE_STDOUT_CALLBACK=debug

However, as with all information you find on the Internet – this post explicitly included – your mileage may vary. Don’t blindly copy/paste. Test everything you deem useful on an unimportant, disposable, lower-tier test system and make sure you understand any code before you even think about using it! Vagrant is a pretty good tool for this purpose by the way.

Having said that, let’s go over some examples.

Dictionaries

Let’s assume you’d like to use a dictionary in your playbook, like so:

---
- hosts: localhost
  connection: local
  vars:
    dictExample:
      propertyA: propertyA-key
      propertyB: propertyB-key

  tasks:
  - name: dump dictExample
    debug:
      var: dictExample 

Unsurprisingly, when invoking the playbook, the output matches the code exactly:

$ ansible-playbook -i localhost, dict-example.yml 

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [dump dictExample] ********************************************************
ok: [localhost] => {
    "dictExample": {
        "propertyA": "propertyA-key",
        "propertyB": "propertyB-key"
    }
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0 

Overriding dictExample on the command line requires the use of JSON, while paying attention to shell expansion at the same time. Here is an example:

$ ansible-playbook -i localhost, dict-example.yml --extra-vars "{
>     "dictExample": {
>       "propertyA": "'property A set on the command line'",
>       "propertyB": "'property B set on the command line'"
>     }
> }"
Using /etc/ansible/ansible.cfg as config file

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [dump dictExample] ********************************************************
ok: [localhost] => {
    "dictExample": {
        "propertyA": "property A set on the command line",
        "propertyB": "property B set on the command line"
    }
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0 

As per the Ansible documentation, variables passed as extra-vars take precedence over those defined in the playbook.
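As an aside – this is a variation of my own rather than part of the example above – you can avoid the nested-quote gymnastics by wrapping the whole JSON document in single quotes, in which case the values no longer need the extra single quotes inside them. Something along these lines should produce the same output:

$ ansible-playbook -i localhost, dict-example.yml --extra-vars '{
    "dictExample": {
      "propertyA": "property A set on the command line",
      "propertyB": "property B set on the command line"
    }
  }'

And, as described in the earlier post, the same JSON could live in a file and be passed as --extra-vars "@overrides.json" (the file name here is just an example).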

Lists

Similarly it is possible to pass lists to playbooks. Here is an example:

---
- hosts: localhost
  connection: local
  vars:
    listExample:
    - one
    - two
    - three

  tasks:
  - name: dump listExample
    debug:
      var: listExample 

Invoking it with the defaults yields the expected result:

$ ansible-playbook -i localhost, list-example.yml 

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [dump listExample] ********************************************************
ok: [localhost] => {
    "listExample": [
        "one",
        "two",
        "three"
    ]
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0 

You can override listExample as shown here:

$ ansible-playbook -i localhost, list-example.yml --extra-vars "{
>     "listExample": [
>         "'commandline one'",
>         "'commandline two'",
>         "'commandline three'"
>     ]
> }"

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [dump listExample] ********************************************************
ok: [localhost] => {
    "listExample": [
        "commandline one",
        "commandline two",
        "commandline three"
    ]
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0 

Combinations

If you have worked with Perl, you have probably used a Hash of Hashes (HoH), as it’s a very powerful data structure. Something similar is also possible in Ansible. Here is the example playbook:

---
- hosts: localhost
  connection: local
  vars:
    complexExample:
      propertyA:
      - a_one
      - a_two
      - a_three
      propertyB:
      - b_one
      - b_two
      - b_three

  tasks:
  - name: dump complexExample
    debug:
      var: complexExample 

By now you are probably tired of seeing the result of the call to the playbook, so I’ll skip that and move on to an example where I’m overriding the variable:

$ ansible-playbook -i localhost, complex-example.yml --extra-vars "{
    "complexExample": {
        "propertyA": [
            "a_one_changed",
            "a_two_changed",
            "a_three_changed"
        ],
        "propertyB": [
            "b_one_changed",
            "b_two_changed",
            "b_three_changed"
        ]
    }
}"

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [dump complexExample] *****************************************************
ok: [localhost] => {
    "complexExample": {
        "propertyA": [
            "a_one_changed",
            "a_two_changed",
            "a_three_changed"
        ],
        "propertyB": [
            "b_one_changed",
            "b_two_changed",
            "b_three_changed"
        ]
    }
}

PLAY RECAP *********************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0 

Summary

Passing variables to Ansible playbooks is a powerful way of working with automation. Apart from the simple variables you could read about in the previous post on the topic, you can pass dictionaries, lists and combinations thereof using JSON notation. Happy automating!