Oakies Blog Aggregator

YugaByteDB 2.1: the Open Source multi-region distributed database with PostgreSQL API is in GA with huge performance improvement

By Franck Pachot

Nine months ago I was looking at YugaByteDB, which was still in beta for its ‘YSQL’ API. I published my first test on Medium: https://medium.com/@FranckPachot/running-pgbench-on-yugabytedb-1-3-3a15450dfa42. I have been very enthusiastic about the idea, the architecture, the way they open-sourced it, and how everything was documented on their blog. I even met them in Sunnyvale when I traveled to California for Oracle Open World. Great people with a great vision of the future of databases. From that first test I was not impressed by the performance, but it was an early beta, and a multi-master/multi-index database has many challenges to solve before tuning the details of the implementation. This tuning has been done for the General Availability version 2.1, released in February. I was eager to test it, but this first month back to dbi-services consulting was very busy. So finally, here it is.

This post repeats the same test I did last July, and the result is impressive: the pgbench initialization issue is fully fixed and the pgbench run shows 9x higher throughput.

I’m on the same VM which, as funny as it might sound, is an Oracle Linux 7 running on the Oracle Cloud.

Install and start:

I install version 2.1 in the same way I installed 1.3 in the past. Everything is documented: https://docs.yugabyte.com/latest/quick-start/install/linux/


wget -O yugabyte-2.1.0.0-linux.tar.gz  https://downloads.yugabyte.com/yugabyte-2.1.0.0-linux.tar.gz
tar -xzvf yugabyte-2.1.0.0-linux.tar.gz
yugabyte-2.1.0.0/bin/post_install.sh
export PATH="$HOME/yugabyte-2.1.0.0/bin:$HOME/yugabyte-2.1.0.0/postgresql/bin:$PATH"

I create a 3-node cluster with replication factor 3, with all nodes on the same host for this test:


yb-ctl --rf 3 create
yb-ctl status


I create a database for this test:


ysqlsh
 \timing on
 drop database if exists franck;
 create database franck;
 \q


pgbench initialization

I use the pgbench binary that comes from a “normal” PostgreSQL installation on the same server.


[opc@db192 ~]$ type pgbench
pgbench is hashed (/usr/pgsql-11/bin/pgbench)
time pgbench --initialize --host localhost -p 5433 -U postgres franck


If you compare this with the previous post on version 1.3, you will see many differences.

  • No “ERROR: DROP multiple objects not supported yet” here. This issue has been fixed.
  • No “ERROR: VACUUM not supported yet”, just a warning, because there is no vacuum in the YugaByteDB storage layer.
  • And the best: 8 seconds instead of the 2 minutes we had before.

However, the “ALTER TABLE” that adds the constraints afterwards is still not supported, so I run the same manually, with the FOREIGN KEY declarations moved into the CREATE TABLE statements:


ysqlsh franck
 drop table if exists pgbench_history;
 drop table if exists pgbench_tellers;
 drop table if exists pgbench_accounts;
 drop table if exists pgbench_branches;
CREATE TABLE pgbench_branches (
    bid integer NOT NULL
   ,bbalance integer
   ,filler character(88)
   ,CONSTRAINT pgbench_branches_pkey PRIMARY KEY (bid)
 );
CREATE TABLE pgbench_accounts (
    aid integer NOT NULL
   ,bid integer references pgbench_branches
   ,abalance integer
   ,filler character(84)
   ,CONSTRAINT pgbench_accounts_pkey PRIMARY KEY (aid)
 );
CREATE TABLE pgbench_tellers (
    tid integer NOT NULL
   ,bid integer references pgbench_branches
   ,tbalance integer
   ,filler character(84)
   ,CONSTRAINT pgbench_tellers_pkey PRIMARY KEY (tid)
 );
CREATE TABLE pgbench_history (
    tid integer references pgbench_tellers
   ,bid integer references pgbench_branches
   ,aid integer references pgbench_accounts
   ,delta integer
   ,mtime timestamp without time zone
   ,filler character(22)
 );
\q

Now it remains to insert the rows. This was very long (about an hour) in 1.3:


time pgbench --initialize --init-steps=g -h localhost -p 5433 -U postgres franck


No foreign key error and 6 seconds only!

Do you remember that I had to switch to SERIALIZABLE isolation level? I don’t have to here:


This has been fixed with the support of SELECT locks, so there is no need to go to optimistic locking with SERIALIZABLE (which requires that the application implement a ‘retry’ logic).
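To illustrate (a minimal sketch in ysqlsh, reusing the pgbench_accounts table created above; the row and amount are arbitrary), an explicit row lock lets a read-modify-write stay in read committed: a concurrent session simply waits on the lock instead of the application having to catch serialization failures and retry.


begin;
-- pessimistic row lock: another session running the same statement blocks here
select abalance from pgbench_accounts where aid = 1 for update;
update pgbench_accounts set abalance = abalance + 10 where aid = 1;
commit;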

Simple Update in 1 thread

Then, as I did before, I run a Simple Update workload from one session for 30 seconds:


pgbench --no-vacuum --builtin=simple-update --protocol=prepared --time 30 -h localhost -p 5433 -U postgres franck


When compared with the previous run on 1.3, I have processed 9x more transactions. Yes, that’s exactly what was announced for this version: huge performance improvements.

Simple Update in 10 threads

Then, as I did before, I run with 10 concurrent threads:


pgbench --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck


Again, that’s 7x better than my test on 1.3, and still in read committed isolation level.

Note that I was lucky here: it can happen that we get a serialization error even in read committed, as happened in a second run of the same test.

The community implementation of pgbench has no retry logic. Once an error is encountered the thread finishes, and that’s unusable for a benchmark. The patch, proposed many times, was unfortunately always rejected: https://commitfest.postgresql.org/18/1645/.

But YugaByteDB has a solution for that, which I’ll show in the next post: https://blog.dbi-services.com/ysql_bench/

The article YugaByteDB 2.1: the Open Source multi-region distributed database with PostgreSQL API is in GA with huge performance improvement first appeared on the dbi services blog.

RAC on Azure- Link to Microsoft Tech Blog

Some folks stated they were having trouble finding this, so I’m posting a link here to the blog I wrote on the Microsoft Tech Community Blog with my opinion on building out RAC on Azure.

 




Unreal Load Testing in the Cloud

Load testing as part of a cloud migration project is expected and  should be built into the requirements.  The goal is to set yourself up for success.

Log Latency

Recently I had a Cloud Solution Architect (CSA) escalate an Oracle performance problem after migrating from on-prem to the cloud.

The information provided by the customer stated they were experiencing log sync latency in Oracle and that they hadn’t experienced this previously on-prem. They said there weren’t really any changes, just a simple lift and shift to Azure, so they were at a loss as to why the latency was occurring. I was provided an AWR report of the current environment, the user having taken manual snapshots for a 5-minute interval report. I was initially told they didn’t have an AWR from the previous environment for comparison, which was challenging, but sometimes that’s how it is.

Big Jump

Upon review, I quickly noted that the database was Oracle 19c. This is the newest version of Oracle, and when asked, they quickly admitted the on-prem database was an 11g RAC database that they had upgraded to 19c single instance for Azure. They didn’t realize this would make a huge difference, and hadn’t realized this was no longer a simple lift and shift.

Lift and Shift, Strike 1: Upgrading from a very old release to a very new release and moving from RAC to single instance. Yeah, that’s a bit of a change.

This is something I would highly recommend you DON’T do.  When doing a lift and shift to the cloud, you want the database version to stay as close to the same as possible to eliminate added challenges to the migration.

If a major upgrade is required, do it on-prem first and then lift and shift to the cloud, addressing any upgrade problems beforehand.

From 11g to 19c there were significant changes just to archive logging, including LGWR slaves and other features that were assuredly going to make a difference, on top of hundreds of other changes.

Code Blues

The next challenge that struck me about the workload from the AWR run in the 19c database was the amount of waits from commits.


I had nothing to compare it to, but the code logic built into the DML I was observing, with the sheer quantity of single-row inserts and updates that had to be committed, was going to create log sync issues. But… the customer said the workload was the same as on-prem, and they hadn’t had the problem there.


Now, I’ve highlighted that second count on executions for a reason. It will all make sense in a moment. Remember, this is a 5-minute workload capture in the database, so the number of executions is what transpired in 5 minutes.

AWR reports are pretty awesome, but they don’t have everything, and commonly you would take the report, find the red flags and then investigate deeper into the areas that cause you concern. After a review of the waits, and the customer assuring me that the log syncs were not an issue on-prem, I directed my attention to the redo logs and the disk they resided on, asking for:

  • The sizes of the logs, and were they located on a separate disk from the data?
  • Information about the disk the logs were residing on (type, IOPS, size, etc.)
  • Was read caching turned on for the redo log disk?
  • Had they also added Write Accelerator on the redo log disk?
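For reference, here is a hedged sketch of how you might sanity-check that configuration from inside the database; these are standard dynamic performance views, not the exact queries used during the escalation:


select group#, thread#, bytes/1024/1024 as size_mb, members, status
from   v$log;                                 -- redo log group sizes and members

select group#, member
from   v$logfile;                             -- confirm the logs sit on their own disk/path

select event, total_waits, round(time_waited_micro / 1e6) as seconds_waited
from   v$system_event
where  event in ('log file sync', 'log file parallel write');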

Lift and Shift, Strike 2: Didn’t follow guidelines for Oracle VMs on Azure around redo logs.

All of the recommendations either came back as yes or were able to be addressed, improving the overall performance of the 19c database. But the claim that they’d never experienced this type of log sync latency in the past, considering the code logic, really troubled me.

Why Comparison Data is Important

Even though they were satisfied with the results of the changes, someone finally located an older AWR report. Unlike the 5-minute report, the older report covered 45 minutes.

The first thing that it demonstrated is that log sync latency and commits were a problem before, just not as prevalent as they were in the 19c database:


I knew that, if the workload really were the same, the execution counts for whatever DML I was looking at in this older 11g report, summed across the 4 RAC nodes it was scaled out to, should be about 9 times higher than those displayed in my 19c report, since the older report covered 45 minutes instead of 5.

Except… it didn’t:


I had the reports from the other RAC nodes and the executions were similar to the first node (I wanted to verify, just in case the workload was skewed across the nodes).

So… if I take the executions for each node and scale them down to a 5-minute execution window for this main statement:

(22136 + 22714 + 23528 + 23363) / 9 = 10193 executions in 5 minutes

The 19c report showed 174198 executions in the same 5-minute window, which is about 17 times the workload thrown at the entire original 11g RAC environment on-prem.
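Just to make the arithmetic explicit (the numbers are the ones from the two AWR reports above), the same comparison as a query:


select round((22136 + 22714 + 23528 + 23363) / 9)           as exec_5min_11g_all_nodes,
       174198                                                as exec_5min_19c,
       round(174198 / ((22136 + 22714 + 23528 + 23363) / 9)) as workload_ratio
from   dual;

-- EXEC_5MIN_11G_ALL_NODES EXEC_5MIN_19C WORKLOAD_RATIO
-- ----------------------- ------------- --------------
--                   10193        174198             17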

Lift and Shift, Strike 3: The workload on the 19c environment was completely different from on-prem.

When I brought this to the CSA and he asked the customer, we discovered that they were just doing “load testing” against the environment and escalated when they experienced the latency. The problem is, their code logic was the real reason the system couldn’t scale, not Oracle or Azure. We just ended up tuning the latter two platforms to work around the software problem.

The reason they hadn’t experienced this in 11g or on-prem is that they’d never run the workload at this level before, but we have to ask, what was the purpose of this type of load testing?

Set Load Testing Up for Success

What do I mean by this?  Some might think the job of load testing is to find out how much you can throw at the cloud environment you’ve built before it falls over.  The truth is, this isn’t very helpful in the long run:

  1. It can be considerably time-consuming and provides very little valuable information on a globally distributed infrastructure, which contains numerous layers of redundancy and a resilient interconnected network.
  2. It can create fake issues to troubleshoot or escalate.
  3. It doesn’t provide you with real data about how your on-prem environment will perform in the cloud.
  4. It doesn’t prove that the cloud can’t handle it, it just means you didn’t add enough resources to handle the silly monstrosity you decided to run.

The goal of load testing should be to take a current snapshot of your workload, including peak requirements and run it against what you’ve architected to support your deployment.

Take the time to identify a real workload, or consider using Oracle Database Replay. It’s been around as part of the Real Application Testing pack since 11g and is loved by many. It will allow you to take the SAME workload you run in your production system and duplicate everything in a new one, without scrambling around trying to figure out what really changed.
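As a hedged sketch of what that can look like (the Real Application Testing option is required, and the directory, capture and replay names below are examples only), you capture on the source with DBMS_WORKLOAD_CAPTURE and replay on the target with DBMS_WORKLOAD_REPLAY:


-- On the source (on-prem) database: capture an hour of the real workload.
create or replace directory capture_dir as '/u01/app/oracle/capture';

begin
  dbms_workload_capture.start_capture(name     => 'PEAK_LOAD',
                                      dir      => 'CAPTURE_DIR',
                                      duration => 3600);
end;
/

-- On the target (cloud) database, after copying the capture files across:
begin
  dbms_workload_replay.process_capture(capture_dir => 'CAPTURE_DIR');
  dbms_workload_replay.initialize_replay(replay_name => 'PEAK_LOAD',
                                         replay_dir  => 'CAPTURE_DIR');
  dbms_workload_replay.prepare_replay;
  -- start the wrc (Workload Replay Client) processes before this call
  dbms_workload_replay.start_replay;
end;
/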

 




A Crash Course in Remote Management

Remote work is a prominent topic lately, as people around the world are doing their best to live their lives and keep themselves and their families safe and prepared during the COVID-19 outbreak. The impact of this outbreak is felt across societies and cultures as well as in the workplace.  

Automattic, the company behind WordPress.com, is a primarily distributed company with more than 1,000 employees across 76 countries. I’m an engineering lead, currently working on the Developer Experience team. As Automattic has grown, we’ve learned a lot about working remotely and across time zones, and have shared insights on what we see as the future of work on the Distributed podcast, hosted by our CEO, Matt Mullenweg. 

This week, Nicole Sanchez, the founder of Vaya Consulting and an expert on workplace culture, and I had an opportunity to co-present a Crash Course in Remote Management, a free one-hour webinar hosted on Zoom. Nicole has previously held social impact and leadership roles at GitHub and the Kapor Center for Social Impact.

Nicole and I walked an engaged audience through proven practices and what we’ve learned about leading, communicating with, and measuring the success of remote teams. Participants offered insightful questions, leading to lively discussions around:

  • Collaboration and relationship-building.
  • The cost, benefit, and ideal frequency of bringing teams together for face-to-face interaction (in general, if not as commonly right now).
  • Communicating and prioritizing messages across a variety of channels. 
  • Encouraging people to go outside, exercise, spend time with family, or otherwise step away from the computer (also known as being “AFK,” or “Away From Keyboard”) without the fear of being judged or anxiety over being less productive.

Some companies are encouraging employees to experiment with working from home, which can feel very different from in-person and office work. If you’re interested in learning more, please check out the full video recording of the course:


Matt’s latest blog post, “Coronavirus and the Remote Work Experiment No One Asked For,” is also worth a read. For more information and advice on COVID-19, please visit resources from the CDC, World Health Organization, and other health authorities.

Postponing Ireland Conference – & Maybe Myself?

As I tweeted a couple of days ago, I never thought I would write something announcing cancelling an event due to a worldwide pandemic. And yet that was what I was asked to do earlier this week (first week of March 2020). It will be interesting to look back at this in the future and judge if it was a wise decision or an over-reaction. At present, I am 100% for Wise Decision.


This week UKOUG decided, in light of the impacts & concerns around the COVID-19 coronavirus, to postpone this year’s annual Irish conference, which we hold in Dublin. I thought it would be interesting to some of you to know a little of how we came to that decision.

Firstly, this was a joint decision made by the event committee, the UKOUG board, and the UKOUG senior management. Discussions around the topic of COVID-19 had taken place between some of us over the prior 24 hours, and the event committee had decided that, in their opinion, there was a strong case for not holding the event at this time. They discussed this with the UKOUG senior management and our CEO decided this deserved an emergency board call. This board call would not just consider the event in Ireland but also our future events, our staff, and our members. (In this post I’m only talking about the Irish event, but suffice to say that we are taking steps to protect our staff and consider future events and how they might impact our members & the public, plus how we may replace physical events with remote ones.)

Secondly, as you can see above, this was a considered decision and not a knee-jerk reaction. We had people who live in Ireland involved, and we considered feedback we had received from partners/sponsors and also delegates. We talked with the venue. We looked at factual information about COVID-19, its communicability & mortality rate (how easily you can be infected and how likely you are to die, respectively). In the end the decision was easy as we were all in agreement: we needed to postpone the event.

Thirdly, there were several factors behind the decision to postpone OUG Ireland.

Public Concern

We had several presenters pull out of the event. For most, their employing company had banned non-essential (or even all) travel, and some had decided that they personally did not want to risk exposure. A couple of sponsors were in the same situation of being told they could not attend. Further, we had concerned delegates contacting us asking if the event was still on or what steps we were taking. Some cancelled coming, again a mixture of employer diktat and personal decisions not to attend.

Interestingly, we were getting as many new delegates signing up for the event as dropping out, so obviously some people felt COVID-19 was not an issue.

We knew we had enough speakers in reserve that we could call on to fill agenda holes, but we also could see that more and more events were being cancelled across Europe and more companies were announcing travel limitations, so the cancellations were likely to escalate in the run-up to the event. What happens months ahead, no one knows, but for now the public concern is very, very high.

I considered titling this section as FUD – Fear, Uncertainty, & Doubt. But FUD is usually a derogatory term indicating a baseless over-reaction. I think there is a lot of FUD going on in the general public, but people in IT tend to be smarter than average and more balanced. I think it is very reasonable to be concerned to some degree and, as you will see at the end of this piece, the concern will vary depending on your personal circumstances. For some people (e.g. those with asthma or similarly decreased lung capacity) there is a significantly increased personal risk from this specific illness; it is not always a case of a simple “I’m worried about a pandemic”.

Financial Considerations

With the best will in the world, user groups need money to put on events. There is a commercial aspect to this. Putting on an event that fails and loses money is a danger. We at UKOUG do insure our major events against force majeure, basically events beyond our control, but we are like all user groups in that we walk a financial tightrope.

Cancelling an event does not always save any money, as much of it has already been paid out. But if a sponsor gets a poor experience in return for their sponsorship £/€/$ they are not happy (and neither are we as the organisers). If delegates come and the event feels like an empty room, or the agenda is not what they want, they may not come again. As you can see, it is complex.

I have to say that for Ireland we benefit from an excellent relationship with our venue, we have held the event at the Gresham in Dublin for several years and our committee & office know them well. They reduced the potential financial impact on us by offering us flexibility in re-arranging this event.

I make this point as some user groups (and of course, other companies) putting on public events in the near future may find that they have no such flexibility. For them, cancelling a conference could actually kill the user group financially or result in individuals losing a lot of money. Did you know that sometimes it is individuals or a very small company that is bankrolling your user group events?

For some user groups the financial consideration will be far more acute than it is for UKOUG.

Public Health

This is not the same as public concern. Public concern is about the actions people take in response to a danger or threat. Public Health is about the actual, real threat.

At present you (yes, you reading this) are almost certainly in more danger of being murdered, killed in a road accident, or dying of normal ‘flu than of dying from COVID-19. And have been all year. And yet none of you stopped living your normal life because of those threats. Most people who will think they have COVID-19 over the next month will actually have either a standard cold or normal influenza. And in fact 90% or so of those who catch COVID-19 will not be that ill. Medical testing is the only sure way of knowing which disease you have had.

But COVID-19 spreads relatively easily via fluid contact – droplets in the air through coughs & sneezes but, more commonly, similar dampness on hard surfaces by people touching their mucous membranes (think eyes, nose, and mouth) and then door handles, surfaces, smart devices. You then touch these surfaces and then your face and you have transmitted the disease to yourself.  Prevention methods are all about constant washing of hands and avoiding touching things. Face masks do diddly squit except if you are in the situation where people might cough in your direction (so medical staff) or to help prevent you coughing the virus out and infecting others. I find it somewhat ironic that in some places so many people have rushed to wear face masks to protect themselves from others but actually it will be doing more to protect others from them.

COVID-19 also has a higher degree of mortality than ‘flu. It stands at about 3.4% at present, compared to 0.1% for standard influenza. I’ve seen arguments that “the real rate is lower as it kills mostly old people or those with underlying conditions”. Well, of COURSE it kills those groups more; that is true for all other diseases. Influenza mostly kills the old, the very young, and the at-risk. That 0.1% is measuring a similar spread of deadliness as the 3.4%. If you get COVID-19 you are something like 30-40 times more likely to die of it than if you get typical influenza. An oddity of COVID-19 is that it does not seem to affect babies and toddlers as much as influenza does. So, compared to ‘flu, this new disease is overall even more dangerous to adults, especially older adults, than the basic figures indicate…

The mortality rate has increased from around 2% to 3.4% over the last month. Why? Mostly as people are now aware of COVID-19 and deaths will be correctly attributed to it rather than wrongly to other, similar things (like ‘flu). It’s almost certainly not getting more potent. In fact, we might expect the mortality rate to drop as people with a mild version of the disease were probably not being recorded or were being wrongly diagnosed, so the total number of cases would be a lot higher. I expect this figure to drop below 2% for countries with a good health service and no unusually high elderly population.

So what are the chances of holding a user group event and someone infected with the disease coming to the event? Very, very low. The number of known cases outside China is, as a percentage of the population, sod all. But if someone infectious does come to the event? The chance of catching COVID-19 (and, in fact, of a lot of people catching it and it becoming a new source of widespread infection) is quite high.

For those of us who look at project risks, it is a very low likelihood/very high impact risk. Something like a hard disk overheating and setting fire to the server. I’ve had that, by the way.

So far the steps taken to keep this disease from spreading are proving effective at slowing it down. But it is spreading. I personally think it is going to get worse before it gets better. Maybe a lot worse, and I am pretty cynical about most “we are doomed” news stories.

Large Oracle user group events are more of a risk than, say, a big party. Why? A lot of speakers and exhibitors come from geographically distant places, so you are bringing people together from a large area. These people travel a lot and meet a lot of people. It increases the risk. At a party everyone is probably local, and if there is no one local with the disease, you are safe. Safer.

This is partly why I was very much in favour of postponing the Irish conference, it had an enhanced risk associated with it as we had an international contingent coming.

What makes me feel qualified to think this? I am not a medic and I am certainly not an epidemiologist (someone who studies disease spread), but I have the advantage of a degree in genetics & zoology and many years of working with the National Health Service and biological academia (some of it on disease and immunology). I am not an expert, but by accident of my history I am better informed than most.

These factors made Ireland too much of a risk, even if the likelihood of something bad happening was actually very low.

Smaller events are less risky and, at present, will go ahead. But all will be reviewed.

 

People want the event

The final factor is that people want the event. Either they do not think the risk is real, or they feel that they will be OK anyway as they are young(ish) and healthy, or “fate”, or whatever. So they will come to the event anyway and cancelling it is “giving in”. Lots of large sporting events are now being cancelled (such as some Six Nations rugby matches) and I am sure a lot of fans are not happy about this. But these are exactly the mass gatherings of disparate people that will really help to spread COVID-19 and create a true epidemic.

In some ways, cancelling a large event could be seen as protecting the ignorant…

Docker Birmingham March 2020


Last night was Docker Birmingham March 2020. It clashed with the Midlands Microsoft 365 and Azure User Group for the second time, so it was Docker Birmingham’s turn this time…

Birmingham Digital & DevOps Meetup – March 2020

On Tuesday evening I was at the Birmingham Digital & DevOps Meetup – March 2020 event, which had four speakers this time.


First up was Mike Bookham from Rancher Labs with “Rancher Labs – Kubernetes”. The session demonstrated how to set up a Kubernetes cluster using RKE (Rancher Kubernetes Engine). The tool looked pretty straightforward, and Rancher got referenced a few times during this event and the one the next day, so there seems to be some love for them as a company out there.


Next up was Dave Whyte from Auto Trader with “Under the bonnet at Auto Trader”. He did a talk about how Auto Trader use Google Kubernetes Engine (GKE) and Istio for a bunch of their microservices. They do the hundreds of production deployments a day that you come to expect from microservice folks, but the main “Wow!” moment for me was the diagnostics and observability they’ve got. It was amazing. I was just sitting there thinking, there is no way on earth we could do this! Very… Wow! Many of the points are covered in this video.


After the break it was Patricia McMahon from Generation with “AWS re/Start – Resourcing Cloud Teams”. The session was about the work they are doing re-skilling long term unemployed young people as AWS Cloud Engineers, and of course getting them into jobs. I love this sort of stuff. My background was a bit different, but I entered the technology industry via a retraining course after my PhD in cabbage sex. The course I did was available for all age groups, not just young people, but it was a similar thing. I hope they continue to do great work. If you are looking for fresh, enthusiastic and diverse talent, I’m sure Patricia and Generation would love to hear from you!


Last up was Toby Pettit from Capgemini with “Multilingual, Multi-Cloud Apps – A Reality Check”. His abstract said, “All I wanted to do is run any language on any cloud with state and with no servers to maintain. Of course it also needs to be highly available, observable, maintainable, recoverable and all of the other ‘ables’. How hard can it be?” Well, it turns out the answer is bloody hard! I don’t even know where to begin with this. It was Alpha this product and Beta that product. Of course Kubernetes and Istio were in there, along with OpenFaaS and loads of other stuff. He showed a demo of a workload being split between AWS, Azure and Google Public Cloud, so it “worked”, but by his own admission this was a POC, not something you could go to production with. Interesting, but crazy mind-blowing.

Nulls are not stored in indexes … most of the time

This question got posed on Twitter today: can you store multiple null rows when there is a unique index on the column, and does a descending index behave the same way?


I answered briefly on Twitter, but I thought I should give it some “meat” with a blog post.

Firstly, let’s explore the commonly understood mechanism whereby, in a conventional index, nulls are not stored, and hence you can have multiple rows (so to speak) where the value is null.

Here is my simple table T with a standard (ascending) unique index. We can see immediately that there is no limit to the number of null values you can have in the table, where by “null” I mean that all indexed columns are null.


SQL> create table t ( x varchar2(10) );

Table created.

SQL>
SQL> create unique index ix1 on t ( x );

Index created.

SQL>
SQL> begin
  2  for i in 1 .. 1000 loop
  3    insert into t values (null);
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

To see what has been stored in the index, we can get the OBJECT_ID for the index, perform a tree dump on that object and examine the trace file.


SQL> select object_id from user_objects where object_name = 'IX1';

 OBJECT_ID
----------
    194498

SQL> alter session set events 'immediate trace name treedump level 194498';

Session altered.

Trace file C:\ORACLE\diag\rdbms\db18\db18\trace\db18_ora_20504.trc
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production

...
...
...

----- begin tree dump
leaf: 0x301070b 50398987 (0: row:0.0 avs:8000)
----- end tree dump

The key piece of information here is that the number of rows in the index is zero. The null values had no effect on the index, which is why there is no limit on them.
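If you prefer not to dig through a trace file, a simpler cross-check (an illustrative alternative, not from the original post) is to validate the index structure and query INDEX_STATS:


analyze index ix1 validate structure;

select lf_rows from index_stats where name = 'IX1';
-- expect LF_ROWS = 0: none of the 1000 all-null rows produced an index entry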

Before looking at nulls in a descending index, I’ll first look at a non-null value in a descending index. I’ll recreate the table T with a varchar2, so I can put an easy-to-find piece of text in a row (my name).


SQL> create table t ( x varchar2(10) );

Table created.

SQL>
SQL> create unique index ix1 on t ( x desc );

Index created.

SQL>
SQL> insert into t values ('Connor');

1 row created.

SQL> commit;

Commit complete.

Now I’ll repeat the same tree dump, but explore a little further. The tree dump also contains the location of where this leaf block is stored. We call this the “block address”. There are supplied routines in DBMS_UTILITY to convert this into the file and block# where this leaf block is stored.


SQL> select object_id from user_objects where object_name = 'IX1';

 OBJECT_ID
----------
    194511

SQL> alter session set events 'immediate trace name treedump level 194511';

Session altered.

----- begin tree dump
leaf: 0x301070b 50398987 (0: row:1.1 avs:7982)
----- end tree dump



SQL> set serverout on
SQL> begin
  2    dbms_output.put_line(dbms_utility.data_block_address_file(50398987));
  3    dbms_output.put_line(dbms_utility.data_block_address_block(50398987));
  4  end;
  5  /
12
67339

PL/SQL procedure successfully completed.

Now I can dump the block to a trace file, and explore the contents.


SQL>
SQL>  alter system dump datafile 12 block 67339;

System altered.



Leaf block dump
===============
header address 132096100=0x7dfa064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 1
kdxcofbo 38=0x26
kdxcofeo 8020=0x1f54
kdxcoavs 7982
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 0=0x0
kdxledsz 6
kdxlebksz 8036
row#0[8020] flag: -------, lock: 2, len=16, data:(6):  03 00 09 d7 00 00
col 0; len 7; (7):  bc 90 91 91 90 8d ff
----- end of leaf block Logical dump -----

A descending index stores the key bytes inverted (one’s complement), so the string “Connor” (43 6f 6e 6e 6f 72 in hex) appears as “bc 90 91 91 90 8d”, which we can see in the dump toward the bottom. Notice, however, that there is also a trailing “ff” in the key. This is added to allow the index entries to be stored in descending order, as per the index specification (otherwise variable-length data would not be sorted correctly, e.g. “Conn” must come after “Connor”).
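A quick way to cross-check those bytes without a block dump (an illustrative aside, not from the original post) is to let UTL_RAW invert the raw bytes of the string; the result should match the “col 0” bytes above, minus the trailing “ff”:


select utl_raw.cast_to_raw('Connor')                         as ascending_bytes,
       utl_raw.bit_complement(utl_raw.cast_to_raw('Connor')) as descending_bytes
from   dual;

-- ASCENDING_BYTES  DESCENDING_BYTES
-- 436F6E6E6F72     BC909191908D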

Now I can return to the original question – I will try to store 2 rows, both null, in a descending index.


SQL> create table t ( x varchar2(10) );

Table created.

SQL>
SQL> create unique index ix1 on t ( x desc );

Index created.

SQL>
SQL> insert into t values (null);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into t values (null);
insert into t values (null)
*
ERROR at line 1:
ORA-00001: unique constraint (MCDONAC.IX1) violated


SQL> commit;

Commit complete.

The second row is rejected. I’ll do the same tree dump


SQL> select object_id from user_objects where object_name = 'IX1';

 OBJECT_ID
----------
    194542

SQL> alter session set events 'immediate trace name treedump level 194542';

Session altered.

----- begin tree dump
leaf: 0x3000b33 50334515 (0: row:1.1 avs:7988)
----- end tree dump

Notice that, unlike the first example, the index now contains a row. The null was not excluded from the index. To see why, we obtain the index leaf block data as before.


SQL> set serverout on
SQL> begin
  2    dbms_output.put_line(dbms_utility.data_block_address_file(50334515 ));
  3    dbms_output.put_line(dbms_utility.data_block_address_block(50334515 ));
  4  end;
  5  /
12
2867

PL/SQL procedure successfully completed.

SQL>
SQL> alter system dump datafile 12 block 2867;

System altered.

Leaf block dump
===============
header address 418168932=0x18ecc064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 1
kdxcofbo 38=0x26
kdxcofeo 8026=0x1f5a
kdxcoavs 7988
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 0=0x0
kdxledsz 6
kdxlebksz 8036
row#0[8026] flag: -------, lock: 0, len=10, data:(6):  03 00 09 d7 00 00
col 0; len 1; (1):  00
----- end of leaf block Logical dump -----

and as per the example with “Connor”, a trailing value was added to the key to ensure that nulls are “first” (as opposed to being last by default in a SELECT statement).  Hence the null value has resulted in a non-null value in the index. And for a unique index, that non-null cannot be duplicated.

Oracle Shared Pool Internals: List Chunk Position in the LRU List

There was a discussion at Oracle-L about shared pool chunk eviction and I thought I’d show how to see a shared pool memory chunk’s position in the LRU list yourself. This is a post mostly about Oracle internals and less about practical everyday troubleshooting.
There’s a view X$KGHLU (@kghlu.sql script) that shows you a summary of shared pool (sub)heap allocations and how many of these chunks are in the “transient list” (used once) vs.
