Search

Top 60 Oracle Blogs

Recent comments

Oakies Blog Aggregator

Supplemental Defect

Here’s an anomaly that appeared in a question on the ODC recently about tweaking the output of dbms_metadata.get_ddl(), As the title suggests, though, this posting isn’t about the dbms_metadata package it’s about supplemental logging and something that should not to be possible and may have unfortunate side effects.

We start with a little script that creates a table, inserts some data, adds a couple of constraints, and then introduces some supplemental logging requirements. As the script header indicates I’ve only tested this on 19.3.0.0:

rem
rem     Script:         supplemental_defect.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2021
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1 (
        n1      number,
        n2      number,
        n3      number,
        n4      number
);

insert into t1 (n1, n2, n3, n4) values (1,2,3,4);
commit;

alter table t1 add constraint t1_pk primary key(n1, n2)
/

alter table t1 add constraint t1_uk unique(n3) using index (
        create index t1_uk on t1(n3, n4)
)
/

alter table t1 add supplemental log data (primary key, unique) columns
/

alter table t1 add supplemental log group t1_g1 (n1, n2) always
/

alter table t1 add supplemental log group t1_g2 (n1, n2) always
/

There’s nothing particularly special or complex about this code, and every statement runs successfully. You might notice that I’ve created two identical supplemental log groups, of course, and that’s clearly a mistake and I’d say that Oracle should have raised an error when I tried to create group t1_g2.

In fact my script is slightly worse than you might think at first sight because (n1, n2) is the primary key of the table, and I’ve added supplemental logging on the primary key already, so even supplemental log group t1_g1 really ought to raise an error as well.

Side note: you’ll notice that I’ve got a single-column unique constraint protected by non-unique multi-column index, and part of my supplemental log data specifies unique columns. If you check the syntax this option can be expressed as “unique” or “unique indexes” – which leaves me wondering whether Oracle my code would treat this as a command relating to n3 and n4, or just to n3.

So far all I’ve done is create a table in a way that’s left me with a couple of questions – let’s see what we get when we use dbms_metadata.get_ddl() to generate code to recreate the table:

begin
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'PRETTY',true);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SQLTERMINATOR',true);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES',false);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'STORAGE', false);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'TABLESPACE',false);
        dbms_metadata.set_transform_param(dbms_metadata.session_transform,'CONSTRAINTS',true);
end;
/

set long 20000
set heading off

select dbms_metadata.get_ddl('TABLE','T1') from dual;

================================================================

  CREATE TABLE "TEST_USER"."T1"
   (	"N1" NUMBER,
	"N2" NUMBER,
	"N3" NUMBER,
	"N4" NUMBER,
	 CONSTRAINT "T1_PK" PRIMARY KEY ("N1", "N2")
  USING INDEX  ENABLE,
	 CONSTRAINT "T1_UK" UNIQUE ("N3")
  USING INDEX  ENABLE,
	 SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS,
	 SUPPLEMENTAL LOG DATA (UNIQUE INDEX) COLUMNS,
	 SUPPLEMENTAL LOG GROUP "T1_G1" ("N1", "N2") ALWAYS,
	 SUPPLEMENTAL LOG GROUP "T1_G2" ("N1", "N2") ALWAYS
   ) ;

The “create table” statement includes 4 supplemental log clauses – notice how my original “two-part” clause for the primary key and unique constraints has been split into two – with the key word index being added to the latter.

You might note that if you try to execute this statement to recreate the table you’re going to run into an immediate problem – Oracle will create a unique singe-column index to protect the unique constraint – it doesn’t know that the constraint should be protected by a two-column non-unique index.

But there’s another problem that will protect you from the indexing issue. Here’s what I get when I try to execute the statement:

	 SUPPLEMENTAL LOG GROUP "T1_G1" ("N1", "N2") ALWAYS,
	                                *
ERROR at line 12:
ORA-02261: such unique or primary key already exists in the table

The funniest thing about this error comes when you look at the error message file ($ORACLE_HOME/rdbms/mesg/oraus.msg in my case), where we can find the “Cause” and “Action” that Oracle supplies for the error:

02261, 00000, "such unique or primary key already exists in the table"
// *Cause: Self-evident.
// *Action: Remove the extra key.

It’s not too difficult to guess why the error has appeared but “Self-evident” seems a little optimistic. In fact your first guess about the error may be wrong. Obviously I could simply delete the lines that create the t1_g1 and t1_g2 logs since they appear to be redundant copies of the (primary key) supplemental log – but what if I just delete the line that creates the (primary key) supplemental log? Oracle still raises the ORA-02261 error. It’s only when I delete the declaration of the primary key (and I can leave the declaration of the (primary key) supplemental log in place) that the error stops appearing and I recreate the table … even if I’ve left the decarations of the two supplemental log groups on (n1, n2) in place.

On the plus side – although the export/import utilities expdp and impdp make use of the dbms_metadata package they don’t run into the same problem. After I’d exported, dropped and imported the t1 table with the sqlfile=xxx.sql option this is how impdp reported the code it had run to recreate the table (edited to remove the space management bits):

CREATE TABLE "TEST_USER"."T1" 
   (    "N1" NUMBER, 
        "N2" NUMBER, 
        "N3" NUMBER, 
        "N4" NUMBER
   ) 
;

ALTER TABLE "TEST_USER"."T1" ADD CONSTRAINT "T1_PK" PRIMARY KEY ("N1", "N2")
;
ALTER TABLE "TEST_USER"."T1" ADD CONSTRAINT "T1_UK" UNIQUE ("N3")
;
ALTER TABLE "TEST_USER"."T1" ADD SUPPLEMENTAL LOG DATA (UNIQUE INDEX) COLUMNS;
ALTER TABLE "TEST_USER"."T1" ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
ALTER TABLE "TEST_USER"."T1" ADD SUPPLEMENTAL LOG GROUP "T1_G1" ("N1", "N2") ALWAYS;
ALTER TABLE "TEST_USER"."T1" ADD SUPPLEMENTAL LOG GROUP "T1_G2" ("N1", "N2") ALWAYS;

The underlying code for expdp/impdp generates constraints separately from the table and then adds the constraints to the tables – and it views supplemental logs as a variant on constraints (you can stop the supplemental log clauses appearing by using the dbms_metadata.transform_param() procedure to disable the generation of constraints) and adds them later.

You might notice in passing that the index created by impdp to support the unique constraint is NOT the index originally specified. This is a very old problem – though the nature of the problem has varied with time – so watch out if you move data from a production system to a test system using export/import.

Summary

The internal code to handle supplemental logging allows you to create redundant supplemental log groups that will cause errors if you use dbms_metadata.get_ddl() in the most straightforward way to generate code to recreate the table.

The error message that appears if you haven’t allowed for the anomaly is a little counter-intuitive.

Video : JSON Data Type in Oracle Database 21c

In today’s video we demonstrate the JSON data type, introduced in Oracle database 21c.

The video is based on this article.

It assumes knowledge of existing JSON support in the Oracle database. If you are not familiar with the functionality Oracle have been including in the database since Oracle 12c, you might want to check out these.

The star of today’s video is my sister-in-law Maria Colgan. As if being related to me by marriage is not enough of a claim to fame, she is also the current reigning queen of the Oracle database. Long live the Queen!

Cheers

Tim…

The post Video : JSON Data Type in Oracle Database 21c first appeared on The ORACLE-BASE Blog.


Video : JSON Data Type in Oracle Database 21c was first posted on January 18, 2021 at 9:30 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Oracle Database 21c : It’s New to Everyone

https://oracle-base.com/blog/wp-content/uploads/2021/01/21c-218x300.png 218w" sizes="(max-width: 187px) 85vw, 187px" />

A few days ago Oracle made the official announcement about the release of Oracle database 21c (here). This was a rather late announcement, as the product has been available on Oracle Cloud for over a month, but that’s not the subject of this post.

With each new release I feel the need to write a post like this, so here goes…

Newbies

This is a new product, so there will be a lot of people in the community writing content about it, including me. The one thing common to all those people, including me, is none of us have any real experience of this version of the database. Very few, if any, will use this version in anger for a real production system for quite some time, if ever (see innovation release). Does that mean you should ignore the content being produced at the moment? No. You just have to understand that everyone is a newbie at the moment. It will take some time for people to not only understand the basics of the functionality, but also get a feel for its relevance in the real world. Just keep that in mind.

But what about the people that were using the new features in the 20c preview release on the cloud for a year? Yep. They are all beginners too. The preview release of 20c never became generally available, so nobody got something I would consider “production experience” on that release. Even the folks at Oracle don’t have significant production experience of 21c yet. It’s only been a month. Many of the 21c new features were initially showcased in the 20c preview, but things have changed since then, so it’s likely articles you are reading that were based on 20c are out of date and need revision.

I’m not saying all this to be negative. I guess part of it is a self-protection mechanism. People will be producing content now, and as they get more experience they will hopefully revise that content to reflect their current understanding. Don’t expect to find any 21c gurus for a long time… </p />
</p></div>

    	  	<div class=

Getting started with VR Table Tennis

Difficult to play table tennis in real life (IRL) these days. I found this simulation:

https://uhesse.files.wordpress.com/2021/01/image.png?w=2048 2048w, https://uhesse.files.wordpress.com/2021/01/image.png?w=150 150w, https://uhesse.files.wordpress.com/2021/01/image.png?w=300 300w, https://uhesse.files.wordpress.com/2021/01/image.png?w=768 768w" sizes="(max-width: 1024px) 100vw, 1024px" />

Eleven TT is really close to IRL table tennis – I’m a club player myself and I was surprised how close this comes to reality! Check this YouTube channel to get an idea about the quality of the simulation. The guy who uploads these videos might be a bit too competitive sometimes, but he’s clearly a good player and challenges the best available opponents regularly.

The game itself costs 20 Euro, which is a real bargain for the great quality you get. Many will have no VR headset, though, and that’s a bit more expensive. I recommend the Oculus Quest 2 that works autonomously without a workstation and without wires. That’s about 400 Euro.

As IRL player, you will want to transfer as much of your skills as possible, which is why a paddle adapter is helpful. I recommend the Sanlaki F8 adapter.

https://uhesse.files.wordpress.com/2021/01/image-1.png?w=724 724w, https://uhesse.files.wordpress.com/2021/01/image-1.png?w=135 135w, https://uhesse.files.wordpress.com/2021/01/image-1.png?w=269 269w" sizes="(max-width: 362px) 100vw, 362px" />

They sell them, but also provide the 3d printer files for free so you can print it yourselves or let someone else print it for you. I ordered mine for 15 Euro (shipping included) via Treatstock (Black PLA, 20% fill) and find it quite useful.

So overall, you have to spend about 500 Euro to get started, but it’s totally worth it in my opinion:

A TT table + ballmachine would cost you more (if you have the space to place it at home). That is included in the game, as well as an AI opponent with configurable levels. That alone is worth it already. Given the choice, I’d favor the simulation over the IRL ballmachine – and I had one before, so I can compare.

In multiplayer mode, you can play against other human opponents across the world. Whenever you want to play table tennis, you’ll find an opponent without having to leave the comfort of your home. And you will never have to pick up a ball from the floor again </p />
</p></div>

    	  	<div class=

PL/SQL in 21c gets amazing new loop iterators

“It’s a dead language” people said.

“There’s never anything new” people said.

Well, let me serve those people a slice of humble pie Smile as we take a look at huge suite of enhancements to iteration handling in PL/SQL in 21c. In particular, there is one thing that I really find nice about these new changes. Most of them are improvements to functionality that you could already do, but had to do it in a cumbersome way. Thus these changes are not about dramatic new features, but (in my opinion) something equally if not more important, namely, allowing developers to achieve that functionality with less code, cleaner code and more maintainable code.

That’s a big win in my book. Here’s a look at some “before” (19c and below) and “after” (21c) code in PL/SQL to illustrate the new language facilities in PL/SQL.

Multiple loop ranges

How we used to it …

We needed individual loops


SQL> set serverout on
SQL> begin
  2    for i in 1 ..10 loop
  3      dbms_output.put_line(i);
  4    end loop;
  5    for i in 100..110 loop
  6      dbms_output.put_line(i);
  7    end loop;
  8    for i in 200..210 loop
  9      dbms_output.put_line(i);
 10    end loop;
 11  end;
 12  /
1
2
3
4
5
6
7
8
9
10
100
101
102
103
104
105
106
107
108
109
110
200
201
202
203
204
205
206
207
208
209
210

PL/SQL procedure successfully completed.

How we can do it in 21c…


SQL> begin
  2    for i in 1 ..10, 100..110, 200..210 loop
  3      dbms_output.put_line(i);
  4    end loop;
  5  end;
  6  /
1
2
3
4
5
6
7
8
9
10
100
101
102
103
104
105
106
107
108
109
110
200
201
202
203
204
205
206
207
208
209
210

PL/SQL procedure successfully completed.

Arbitrary loop increment

How we used to it …

What if we want to cycle through every 3rd iteration? We had to go back to high school modulo shenanigans.


SQL> begin
  2    for i in 1 .. 20 loop
  3       if mod(i,3)=1 then
  4         dbms_output.put_line(i);
  5       end if;
  6    end loop;
  7  end;
  8  /
1
4
7
10
13
16
19

PL/SQL procedure successfully completed.

How we can do it in 21c…


SQL> begin
  2    for i in 1 .. 20 by 3 loop
  3       dbms_output.put_line(i);
  4    end loop;
  5  end;
  6  /
1
4
7
10
13
16
19

PL/SQL procedure successfully completed.

And yes, it does not have to be an integer! Just make sure you tell the PL/SQL compiler that your loop variable is no longer an integer, otherwise (like standard Oracle) we’ll round it to the nearest whole number. So whilst this looks odd:


SQL> begin
  2    for i in 1 .. 10 by 0.5 loop
  3      dbms_output.put_line(i);
  4    end loop;
  5  end;
  6  /
1
2
3
4  --
5  -- no fractional parts?
6  --
7
8
9
10

PL/SQL procedure successfully completed.

but all you need to do is nominate the datatype of your iteration variable. That’s right, now you can choose the datatype of the iterator!


SQL> begin
  2    for i number(3,1) in 1 .. 10 by 0.5 loop
  3      dbms_output.put_line(i);
  4    end loop;
  5  end;
  6  /
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
8
8.5
9
9.5
10

PL/SQL procedure successfully completed.

Assign collection data

How we used to it …

You had to assign each individual element if you were not fortunate enough to be sourcing the data from (say) a BULK COLLECT statement or an identical existing collection.


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4  begin
  5    for i in 1 .. 10 loop
  6      s1(i) := i*10;
  7    end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Or we could use the more compact but still quite verbose assignment syntax


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4  begin
  5    s1 := num_list(1=>10,2=>20,3=>30,4=>40,5=>50,
  6                   6=>60,7=>70,8=>80,9=>90,10=>100);
  7  end;
  8  /

PL/SQL procedure successfully completed.

How we can do it in 21c…

Now you can just assign them in the same way you would initialise a VARRAY or NESTED TABLE.


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4  begin
  5    s1 := num_list(10,20,30,40,50,60,70,80,90,100);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Copy collection subset

How we used to it …

If we wanted to grab a subset of a collection, then typically we loop through the elements and select what we need. For example, to grab every 2nd element, our code would look like this:


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4    s2   num_list;
  5  begin
  6    for i in 1 .. 10 loop
  7      s1(i) := i*10;
  8    end loop;
  9
 10    for i in 2 .. 10 by 2 loop
 11      s2(i) := s1(i);
 12    end loop;
 13    for i in 1 .. 10 loop
 14      if s2.exists(i) then
 15        dbms_output.put_line(i||'='||s2(i));
 16      else
 17        dbms_output.put_line(i||' not exists');
 18      end if;
 19    end loop;
 20  end;
 21  /
1 not exists
2=20
3 not exists
4=40
5 not exists
6=60
7 not exists
8=80
9 not exists
10=100

PL/SQL procedure successfully completed.

How we can do it in 21c…


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4    s2   num_list;
  5  begin
  6    s1 := num_list(1,2,3,4,5,6,7,8,9,10);
  7    s2 := num_list(for i in 2 .. 10 by 2 => s1(i));
  8
  9    for i in 1 .. 10 loop
 10      if s2.exists(i) then
 11        dbms_output.put_line(i||'='||s2(i));
 12      else
 13        dbms_output.put_line(i||' not exists');
 14      end if;
 15    end loop;
 16  end;
 17  /
1 not exists
2=2
3 not exists
4=4
5 not exists
6=6
7 not exists
8=8
9 not exists
10=10

PL/SQL procedure successfully completed.

Did you notice? That is a loop iteration right there in the assignment statement for a collection!

Iterate through a sparse collection

How we used to it …

If you wanted to make sure you avoided a “no data found” error, when a collection did not have contiguous indexes we have to ensure the index entry exists before we can use it


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4    s2   num_list;
  5    idx  int;
  6  begin
  7     
  8    s2 := num_list(2=>20,4=>40,6=>60,8=>80,10=>100);
  9
 10    idx := s2.first;
 11    loop
 12      if idx is not null then
 13        dbms_output.put_line(idx||'='||s2(idx));
 14      else
 15        exit;
 16      end if;
 17      idx := s2.next(idx);
 18    end loop;
 19  end;
 20  /
2=20
4=40
6=60
8=80
10=100

PL/SQL procedure successfully completed.

How we can do it in 21c…


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4    s2   num_list;
  5  begin
  6    s1 := num_list(10,20,30,40,50,60,70,80,90,1000);
  7    s2 := num_list(for i in 2 .. 10 by 2 => s1(i));
  8
  9    for idx in indices of s2 loop
 10        dbms_output.put_line(idx||'='||s2(idx));
 11    end loop;
 12  end;
 13  /
2=20
4=40
6=60
8=80
10=1000

PL/SQL procedure successfully completed.

But there’s more here as well. The INDICES clause gets us the index entries. But I don’t have to use that if I want the values of those entries because I can those directly now as well.


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4    s2   num_list;
  5  begin
  6    s1 := num_list(10,20,30,40,50,60,70,80,90,1000);
  7    s2 := num_list(for i in 2 .. 10 by 2 => s1(i));
  8
  9    for idx in values of s2 loop
 10        dbms_output.put_line(idx);
 11    end loop;
 12  end;
 13  /
20
40
60
80
1000

PL/SQL procedure successfully completed.

And I can even have both the indices and the values using the PAIRS clause


SQL> declare
  2    type num_list is table of int index by pls_integer;
  3    s1   num_list;
  4    s2   num_list;
  5  begin
  6    s1 := num_list(10,20,30,40,50,60,70,80,90,1000);
  7    s2 := num_list(for i in 2 .. 10 by 2 => s1(i));
  8
  9    for x,y in pairs of s2 loop
 10        dbms_output.put_line(x||','||y);
 11    end loop;
 12  end;
 13  /
2,20
4,40
6,60
8,80
10,1000

PL/SQL procedure successfully completed.

Manipulate the iteration variable – part 1

Like the MOD example earlier, if the iteration variable was not an ascending integer, we would need to perform additional work to manipulate it to satisfy our need. For example, what if I needed to loop through the powers of 2 in order to examine some bit masks?

How we used to it …

I need another expression to get the job done


SQL> declare
  2    bits int := 739;
  3  begin
  4    for i in 0 .. 10 loop
  5      if bitand(bits, power(2,i)) > 0 then
  6        dbms_output.put_line(power(2,i));
  7      end if;
  8    end loop;
  9  end;
 10  /
1
2
32
64
128
512

PL/SQL procedure successfully completed.

How we can do it in 21c…

Now we have full control over our iterator


SQL> declare
  2    bits int := 739;
  3  begin
  4    for power2 in 1, repeat power2*2 while power2 <= 1024 loop
  5      if bitand(bits, power2) > 0 then
  6        dbms_output.put_line(power2);
  7      end if;
  8    end loop;
  9  end;
 10  /
1
2
32
64
128
512

PL/SQL procedure successfully completed.

Manipulate the iteration variable – part 2

And we are not limited to simple REPEAT commands etc whilst iterating. We have total control over the iteration variable, including being able to change it on the fly and utilise its value in the loop definition itself.

How we used to it …


--
-- I don't know, but its a lot of code :-)
--

How we can do it in 21c…


SQL> begin
  2   for i in 1 .. 10,
  3            i+1 while i<5,
  4            6..15 by trunc(i/4),
  5            i .. i+10 when mod(i,3) = 0
  6   loop
  7     dbms_output.put_line(i);
  8   end loop;
  9  end;
 10  /
1
2
3
4
5
6
7
8
9
10
6
8
10
12
14
15
18
21
24

PL/SQL procedure successfully completed.

Loop through a ref cursor

How we used to it …


SQL> variable rc refcursor
SQL> exec open :rc for select empno from emp;

PL/SQL procedure successfully completed.

SQL> declare
  2    type rec is record ( r1 number );
  3    type rec_list is table of rec index by pls_integer;
  4    r rec_list;
  5  begin
  6    loop
  7      fetch :rc bulk collect into r limit 10;
  8      for i in 1 .. r.count loop
  9        dbms_output.put_line(r(i).r1);
 10      end loop;
 11      exit when :rc%notfound;
 12    end loop;
 13  end;
 14  /
7369
7499
7521
7566
7654
7698
7782
7788
7839
7844
7876
7900
7902
7934

PL/SQL procedure successfully completed.

How we can do it in 21c…


SQL> variable rc refcursor
SQL> exec open :rc for select empno from emp;

PL/SQL procedure successfully completed.

SQL> begin
  2    for r number in values of :rc loop
  3      dbms_output.put_line(r);
  4    end loop;
  5  end;
  6  /
7369
7499
7521
7566
7654
7698
7782
7788
7839
7844
7876
7900
7902
7934

PL/SQL procedure successfully completed.

And even with more complex statements we don’t need to worry about the collection definitions etc


SQL> variable rc refcursor
SQL> exec open :rc for select empno, ename from emp;

PL/SQL procedure successfully completed.

SQL> declare
  2    type rec is record ( r1 number, r2 varchar2(30));
  3  begin
  4    for r rec in values of :rc loop
  5      dbms_output.put_line(r.r1||','||r.r2);
  6    end loop;
  7  end;
  8  /
7369,SMITH
7499,ALLEN
7521,WARD
7566,JONES
7654,MARTIN
7698,BLAKE
7782,CLARK
7788,SCOTT
7839,KING
7844,TURNER
7876,ADAMS
7900,JAMES
7902,FORD
7934,MILLER

PL/SQL procedure successfully completed.

Rest assured, this blog post is just scratching the surface of the improvements to iteration code in PL/SQL in 21c. Check out the docs for more examples and more keywords that let you do even more. Dynamic SQL within a FOR-LOOP? No problem!

So here’s my challenge to you. Take a fresh look at PL/SQL because it’s power to manipulate sets of data in a procedural language is unmatched.

Enjoy!

New Theme: Twenty Twenty One

Twenty Twenty One is the latest WordPress default theme, which is now available to all WordPress.com sites. Designed by Mel Choyce-Dwan, the muted tones and timeless design will let your work shine.

https://en-blog.files.wordpress.com/2021/01/tt1-rainbow.png?w=2048 2048w, https://en-blog.files.wordpress.com/2021/01/tt1-rainbow.png?w=150 150w, https://en-blog.files.wordpress.com/2021/01/tt1-rainbow.png?w=300 300w, https://en-blog.files.wordpress.com/2021/01/tt1-rainbow.png?w=768 768w" sizes="(max-width: 1024px) 100vw, 1024px" />

Twenty Twenty One takes advantage of all the latest features of the Block Editor — the new block patterns allow you to create a beautiful layout in seconds.

https://en-blog.files.wordpress.com/2021/01/tt1-screenshots.png?w=1814 1814w, https://en-blog.files.wordpress.com/2021/01/tt1-screenshots.png?w=133 133w, https://en-blog.files.wordpress.com/2021/01/tt1-screenshots.png?w=266 266w, https://en-blog.files.wordpress.com/2021/01/tt1-screenshots.png?w=768 768w" sizes="(max-width: 907px) 100vw, 907px" />

Learn more about TwentyTwentyOne, or check out the demo site!

Between

Reading Richard Foote’s latest blog note about automatic indexing and “non-equality” predicates I was struck by a whimsical thought about how the optimizer handles “between” predicates. (And at the same time I had to worry about the whimsical way that WordPress treats “greater than” and “less than” symbols.)

It’s probably common knowledge that if your SQL has lines like this:

columnA between {constant1} and {constant2}

the optimizer will transform them into lines like these:

    columnA >= {constant1}
and columnA <= {constant2}

The question that crossed my mind – and it was about one of those little details that you might never look at until someone points it out – was this: “does the optimizer get clever about which constant to use first?”

The answer is yes (in the versions I tested). Here’s a little demonstration:

rem
rem     Script:         between.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2021
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
select
        rownum  rn,
        ao.*
from
        all_objects ao
where
        rownum <= 50000
;

set autotrace traceonly explain

select  object_name
from    t1
where
        rn between 45 and 55
;


select  object_name
from    t1
where
        rn between 49945 and 49955
;


select  object_name
from    t1
where
        rn between 24945 and 24955
;

select  object_name
from    t1
where
        rn between 25045 and 25055
;

set autotrace off

All I’ve done is create a table with 50,000 rows and a column that is basically a unique sequence number between 1 and 50,000. Then I’ve checked the execution plans for a simple query for 11 rows based on the sequence value – but for different ranges of values.

Two of the ranges are close to the low and high values for the sequence; two of the ranges are close to, but either side of, the mid-point value (25,000) of the sequence. The big question is: “does the execution plan change with choice of range?”. The answer is Yes, and No.

No … because the only possible execution path is a full tablescan

Yes … because when you examine the plan properly you’ll notice a change in the Predicate Information. Here are the first two execution plans produced by the calls to dbms_xplan.display():

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    12 |   528 |   140   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    12 |   528 |   140   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"<=55 AND "RN">=45)

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    12 |   528 |   140   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    12 |   528 |   140   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN">=49945 AND "RN"<=49955)

Notice how the order of the filter predicates has changed as we move from one end of the range to the other. The optimizer has decided do the test that is more likely to fail first, and the test that is more likely to succeed second (which means there won’t be many rows where it has to run both tests which will make a small difference in the CPU usage).

Picking out just the filter predicate line from the output for this script (host grep filter between.lst) you can see the same pattern appear when the values supplied are very close to the mid-point (25,000).

SQL> host grep filter between.lst
   1 - filter("RN"<=55 AND "RN">=45)
   1 - filter("RN">=49945 AND "RN"<=49955)
   1 - filter("RN"<=24955 AND "RN">=24945)
   1 - filter("RN">=25045 AND "RN"<=25055)

My code has used literal values to demonstrate an effect. It’s worth checking whether we would still see the same effect if we were using bind variables (and bind variable peeking were enabled). So here’s a little more of the script:

set serveroutput off

variable b1 number
variable b2 number

exec :b1 := 45
exec :b2 := 55

select
        /* low_test */
        object_name
from    t1
where
        rn between :b1 and :b2
/

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate'));

exec :b1 := 49945
exec :b2 := 49955

select
        /* high_test */
        object_name
from    t1
where
        rn between :b1 and :b2
/

select * from table(dbms_xplan.display_cursor(format=>'basic +predicate'));
set serveroutput on

Since autotrace simply calls “explain plan” and doesn’t know anything about bind variables (treating them as unpeekable character strings) I’ve used code that executes the statements and pulls the plans from memory. Here are the results (with some of the script’s output deleted):

EXPLAINED SQL STATEMENT:
------------------------
select  /* low_test */  object_name from t1 where  rn between :b1 and :b2

Plan hash value: 3332582666

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  FILTER            |      |
|*  2 |   TABLE ACCESS FULL| T1   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:B2>=:B1)
   2 - filter(("RN"<=:B2 AND "RN">=:B1))


EXPLAINED SQL STATEMENT:
------------------------
select  /* high_test */  object_name from t1 where  rn between :b1 and :b2

Plan hash value: 3332582666

-----------------------------------
| Id  | Operation          | Name |
-----------------------------------
|   0 | SELECT STATEMENT   |      |
|*  1 |  FILTER            |      |
|*  2 |   TABLE ACCESS FULL| T1   |
-----------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:B2>=:B1)
   2 - filter(("RN">=:B1 AND "RN"<=:B2))

As you can see, when we query the low value the first comparison is made against :b2, when we query the high range the first comparison is made against :b1.

It is actually worth knowing that this can happen. How many times have you heard the question: “the plan’s the same, why is the performance different?”. Maybe the body of the plan looks the same and has the same plan_hash_value, but today the first person to execute the query supplied bind values that made the optimizer choose to apply the filters in the opposite order to usual. This probably won’t make much difference to CPU usage in most cases there are bound to be a few cases where it matters.

You’ll notice, by the way, that the plan with bind variables includes a FILTER operation that doesn’t appear in the plans with literal values. This is an example of “conditional SQL” – if you check the predicate information for operation 1 you’ll see that it’s checking that :b2 is greater than :b1, if this test doesn’t evaluate to true then operation 1 will not make a call to operation 2, i.e. the tablescan is in the plan but won’t happen at run-time.

(I believe that there may be some RDBMS which will treat (e.g.) “X between 20 and 10” as being identical to “X between 10 and 20” – Oracle doesn’t.)

Left as an exercise

The test data was created as a completely evenly spaced (by value) and evenly distributed (by count) set of values. How would things change if the data were sufficiently skewed that the optimizer would default to creating a histogram when gathering stats.

Left as another exercise**

There are lots of little bits of arithmetic that go into the CPU_COST component of an execution plan – including a tiny factor to allow for the number of columns that Oracle has to “step over” (by counting bytes) as it projects the columns needed by the query; so if you had a second “between” predicate on another column in the table, could you manage to get all 24 possible orders for the 4 transformed predicates by adjusting the ranges of the between clauses and/or moving the two columns to different positions in the row.

** For those in lockdown who need something to do to fill the time.

Oracle 19c Automatic Indexing: Non-Equality Predicates Part I (Lucy Can’t Dance)

  I’ve been waiting a while before posting a series on the various limitations associated with Automatic Indexing, in order to see how the feature matures over time. The following have all been re-tested post 1 January 2021 on the Autonomous ATP Database Cloud service, using Oracle Database version 19.5.0.0.0. In the Oracle Documentation (including […]

Check Constraints

This is a note I drafted in 2018 but never got around to publishing. It’s an odd quirk of behaviour that I discovered in 12.2.0.1 but I’ve just checked and it’s still present in 19.3.0.0.

Here’s a funny little thing that I found while checking some notes I had on adding constraints with minimum service interruption – a topic I last wrote about a couple of years ago {ed. now nearly 5 years ago]. This time around I did something a little different, and here’s a cut-n-paste from the first couple of steps when I had previously deleted a row from another session without committing (table t1 is a table I created as select * from all_objects).

Note that the first SQL statement uses “disable” while the second uses “enable”:


SQL> alter table t1 add constraint c1 check(owner = upper(owner)) disable novalidate;
alter table t1 add constraint c1 check(owner = upper(owner)) disable novalidate
            *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

SQL> alter table t1 add constraint c1 check(owner = upper(owner)) enable novalidate;

At this point my session was hanging – and I find it a little surprising that the attempt to create the constraint disabled returns an immediate ORA-00054, while the attempt to create it enabled waits. A quick check of v$lock showed that my session was requesting a TX enqueue in mode 4 (transaction, share mode) waiting for the other session to commit or rollback .

In the following output from 12.1.0.2 my session is SID 16 and I’ve simply reported all the rows for the two sessions from v$lock:


       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK     CON_ID
---------- -- ---------- ---------- ---------- ---------- ---------- ---------- ----------
        16 TX     327704      12790          0          4        169          0          0
           TX      65550       9613          6          0        169          0          0
           TM     192791          0          2          0        169          0          0
           OD     192791          0          4          0        169          0          0
           AE        133          0          4          0        579          0          0

       237 TX     327704      12790          6          0        466          1          0
           TM     192791          0          3          0        466          0          0
           AE        133          0          4          0        582          0          0

You’ll notice my session is holding an OD enqieie in mode 4 and a TM lock in mode 2 – the value 192791 is the object_id of the table in question. The OD lock is described in v$lock_type as “Lock to prevent concurrent online DDLs”.

It would appear, therefore, that we are stuck until the other session commits – so I hit ctrl-C to interrupt the wait, and then tried to add the constraint again, stil without committing (or rolling back) the other session. Here’s the cut-n-paste from that sequence of events:


alter table t1 add constraint c1 check(owner = upper(owner)) enable novalidate
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

SQL> alter table t1 add constraint c1 check(owner = upper(owner)) enable novalidate;
alter table t1 add constraint c1 check(owner = upper(owner)) enable novalidate
                              *
ERROR at line 1:
ORA-02264: name already used by an existing constraint

I’ve interrupted the command and “cancelled” the current operation – but it seems that I have successfully added the constraint anyway!

SQL> select constraint_name, constraint_type, search_condition from user_constraints where table_name = 'T1';

CONSTRAINT_NAME      C SEARCH_CONDITION
-------------------- - --------------------------------------------------------------------------------
SYS_C0018396         C "OWNER" IS NOT NULL
SYS_C0018397         C "OBJECT_NAME" IS NOT NULL
SYS_C0018398         C "OBJECT_ID" IS NOT NULL
SYS_C0018399         C "CREATED" IS NOT NULL
SYS_C0018400         C "LAST_DDL_TIME" IS NOT NULL
SYS_C0018401         C "NAMESPACE" IS NOT NULL
C1                   C owner = upper(owner)

And this is what happened when I switched to the other session – where I had still not committed or rolled back – and tried to execute an update:


SQL> update t1 set owner = lower(owner) where owner = 'SYSTEM' and rownum = 1;
update t1 set owner = lower(owner) where owner = 'SYSTEM' and rownum = 1
*
ERROR at line 1:
ORA-02290: check constraint (TEST_USER.C1) violated

So the constraint really is present and is visible to other sessions – even though the attempt to add it hung and had to be interrupted!

I can’t think of any reason why this might cause a problem in the real world – but it is an oddity that might have echoes in other cases where it matters.

Update (next day)

When I posted a link to this post on twitter one of the replies referred me to a very similar post by Oren Nakdimon with a very similar date, which makes me wonder if I had not published because I’d done a Google search on the topic after I had written this note.

I’ve just done a search on ODC (formerly OTN) for possible references (date range around the end of 2018, with text “enable novalidate”)  and found this question  So maybe we had seen the same question and done the same tests at the same time.

 

 

Covid-19: A Primer On DNA, RNA, and SARS-CoV-2

<<—- Long term hopeful, short term worried

<<– The new Covid-19 B.1.1.7 variant & threat to the health services

I want to explain a few things about SARS-CoV-2 (the virus that causes Covid-19) and the vaccines that are being rolled out in both the UK and the world. To do so I first need to explain about DNA, RNA, proteins and what is called the central dogma of molecular biology, which is what this post is about. The central dogma is the core – the absolute fundamental key thing of life, of our biology. It is the biological equivalent of what quantum mechanics is to physics.

Thankfully, it is far simpler to understand the basics of the central dogma of molecular biology than the basics of quantum mechanics. It is also a well established concept, I was taught it last century at university and it has not changed much in the 32 years or so, though we better understand so much more of the details and ramifications now (6).

Central Dogma Of Molecular Biology

Basically the Central Dogma is that DNA makes RNA and RNA makes protein – and information does not flow backwards. I’ll try and explain that in steps, but before that I want to give a quick reminder about DNA, which most of you probably remember from school, and protein/polypeptides. Sorry, but it’s necessary. Skip to “The Core Of Biology” if you already know all about these.

DNA and Proteins

All living organism contain and are controlled by DNA – Deoxyribonucleic Acid. This is the helical, double-stranded molecule whose structure was worked out by Watson, Crick, and Rosalind Franklin. In all organisms (except bacteria & archaea – together known as prokaryotes) the DNA is held in the nucleus of the cell (1). The whole genome is in every normal cell in an organism (be it a plant, fungus, moss, animal, you. Everything alive that is not bacteria/archaea is a eukaryote – which means the cell has a nucleus). There are some exceptions – such as red blood cells that lack a nucleus, or sex cells that carry one half of the normal amount of DNA for a given species.

The DNA directs all of the biochemistry of an organism (2). Everything. It defines the structure of the proteins we are made of, how the proteins go together, the layers and parts of our organs, the overall plan of our bodies, and the hormones, chemicals, and free-moving cells that go around our bodies like red and white blood cells. We still don’t know how some of this control is done but as DNA changes (mutations) affect all of these things, we know the DNA is basically the instruction book to both make an organism and to keep it functioning.

The DNA of an organism contains (amongst other things) it’s genes. Genes are the instructions for making all of our proteins and we say the genes are “expressed” when the genes are activated and make the proteins. And they do this all the time, at great rates, churning out vast quantities of proteins in each and every cell.

I say proteins but that is not *quite* right. Proteins are made of polypeptides, and polypeptides are made up of amino acids. As an analogy, proteins are paragraphs, polypeptides are sentences, sentences are made of letters which are the amino acids.

In the vast majority of organisms there are 20 different amino acids available to “spell out” all polypeptides. Genes have instructions to make chains of amino acids, called polypeptides. A protein may be a polypeptide, but it might also be made of several polypeptides or polypeptides that have been chemically modified after being initially made. I make this point as below, and in some of the links, polypeptides and amino acids are referenced and general scientific literature can be a bit muddled between polypeptides and proteins. For now, just think of proteins as really complex polypeptides. (3)

DNA consists of four letters as you probably know. Adenine, Cytosine, Guanine, and Thymine. In the double helix the two strands are “inverse mirror images” of each other and the letters pair up – A with T and C with G. So a short strand of DNA might be something like the below and, as indicated, the two halves can be split and, via the pairing of the letters, perfectly copied:

 

 

The DNA is unzipped and split (by an enzyme called Helicase) and then DNA Polymerase can come along and add in the missing letters and you end up with two perfect copies. Usually. In my example, to the right, there is a mistake,  – a C has been added in where an A should be. This mistake is an example of a mutation.

However, that is not part of the Central Dogma. Although DNA duplication is vital (after all, every cell needs a full copy of the organism’s genome so the DNA needs to be replicated each time the cell divides). The main function of DNA is to pour out instructions for the creation of polypeptides

The Core of all Biology

All the complexity of what our cells do and how our biochemistry works is via genes being expressed.  I’m not going to even attempt to describe how gene expression is controlled. It’s incredibly complex, it’s an area of understanding that has advanced hugely since I was taught the basics in my degree 32 years ago, and science still does not really understand a lot of it. But it’s the expression of these genes that allow are cells to do what they do, from growing hair, muscles, and making white blood cells to producing the enzymes that digest our food and control our bodies. All cells express thousands of genes all the time, at different levels of expression, and they do this by producing Messenger RNA, known simply as mRNA.

RNA, or Ribonucleic Acid, is (as the name implies) structurally very similar to DNA. It is single stranded, not double stranded like our genomic DNA, and the Thymine is replaced with a very similar chemical called Uracil.

Transcription

When a gene is expressed the relevant piece of DNA is “unzipped” and an enzyme called RNA Polymerase walks along the DNA and creates a complementary (meaning A becomes U, T become A, C becomes G and G becomes C) RNA copy of the DNA, as is shown in the diagram to the right. This is called transcription and it produces something called pre-mRNA. This is itself then processed by by other enzymes which cut out parts of the RNA that represent “Introns” – bits of DNA in the gene that are not to be used. This bit is not shown on the diagram. It is one of the complexities of our DNA that was poorly understood before the Human Genome Project and is still rarely explained. In a complex organism like a mammal or plant or fish, a single gene can produce a range of proteins depending on how this pre-mRNA is processed.

The diagram below shows how the double-stranded DNA is “unzipped” and the RNA polymerase reads one strand of the DNA and produces an RNA strand based on it.

The final mRNA consists of three main parts. The first part, at what is called the 5 prime (or 5′) end is a cap that allows the mRNA to be recognised and grabbed by the Ribosomes (see later) and transcribed. Then comes the RNA equivalent of the DNA gene for the protein. At the other end, the 3 prime (or 3′) end, a string of As is added (100-200 of them normally), the poly-A tail. We will see why later.

This mature mRNA is then transported out of the nucleus of the cell and out into the body of the cell, into the cytoplasm. The mRNA may hold markers to say where specifically in the cell it is to go but that’s a detail I won’t go into. This seems to be very important to complex organisms like ourselves, that the DNA sits in the nucleus, mRNA is produced and this is quickly passed out of the nucleus to be processed elsewhere in the cell.

Translation

So we now have an mRNA molecule in the cell ready to be translated, i.e. used to make a polypeptide. In the cell there are thousands and thousands of very complex molecules called “Ribosomes”. A ribosome is actually made of two parts, both of which are themselves made out of RNA, not protein. These ribosomes clamp onto the cap of the mRNA, one part on one side, the other on the opposite side, the mRNA in the middle. The cap is a special starter molecule and the first part of the mRNA, which does not code for a polypeptide but controls how easily the Ribosome attaches to the mRNA. The attached ribosome will then “read” the mRNA, working down the string of letters and creating a polypeptide that is described by the RNA sequence.

Special intermediate molecules are used to translate the RNA to an amino acid. These are the tRNA molecules shown in the diagram below. As a ribosome walks along the mRNA is reads small chunks of the mRNA, finds a tRNA that complements the mRNA and temporarily binds it to the mRNA. At the other end of the tRNA is a specific amino acid and this bonds to the growing polypeptide/protein.

Many ribosomes can be walking along a single mRNA molecule  at a time, creating more than one copy of the polypeptide. When a Ribosome gets to the end of the mRNA, to the poly-a tail, it drops off (4) and it will (or at least may) snip off the very end of the poly-a tail. So it shortens the tail. This tail protects the mRNA molecule from being destroyed by the cell so, as it shortens then the mRNA becomes more likely to be destroyed. Why is this important? Well, the cell needs to produce different polypeptides at different concentrations at different times. The poly-a tail is a key part of how the cell controls how long an mRNA molecule last for, creating polypeptides. The longer the tail, the longer the mRNA lasts. The mRNA can’t be allowed to hang about in the cell producing polypeptide for ever. Once it’s tail is gone it is destroyed. Thus for a polypeptide to be constantly produced, the genes in the nucleus need to keep producing the mRNA for it.

As you can appreciate, changes to the 5′ cap can change how quickly (and often) a ribosome latches onto an mRNA molecule, the length of the poly-a tail can control how long the mRNA lasts and so how much polypeptide that one mRNA molecule creates, and the nucleus has overall control over when and how much mRNA is produced. These processes allow constant control and change to how a particular gene is expressed.

I’ve said about the RNA being read and converted into a polypeptide. How does this work? The diagram above and the tRNA give clues to this.

As I said earlier, there are four letters of the RNA alphabet – A,C,G, and U – and a polypeptide is like a sentence. There are 20 types of amino acids that are strung together to make our polypeptides. Our genes are written in sets of 3 letters, called codons. You can think of them as a gene “word”. You can see this with the tRNA molecules in the diagram, at one end they have three RNA letters, and the other end the specific amino acid that those three RNA letters translate to.

A codon, such as UCA, codes for an amino acid called serine (or Ser). GCU codes for alanine (Ala). With 4 letters and a “word” being 3 letters long, there are 64 combinations of A,C,G, & U possible. You can see all 64 of these combinations in the table to the left. So how do the 64 possible codons map to 20 amino acids? Well, some codons have a special meaning.

3 – UAA, UAG, & UGA – are stop codons. They mean “this polypeptide is finished, ribosome stop reading”. One codon is special, AUG. This either means “start reading here” (note, there have to be other sequences in the RNA near it to make it mean this) or amino acid methionine (Met).

As for the others, well several codons mean the same amino acid, as you can see in the table to the left (5). Generally the first two letter in the codon define which amino acid the codon is for (anything starting GU is for valine, Val) but for some the third letter is the deciding factor.

Ribosomes thus start at a special AUG codon on the mRNA and then read three letters at a time and for each one, for each codon, the specific amino acid is added to the growing chain – via the tRNA molecules. The chain can be thousands of amino acids long or just a handful. The chain grows until a stop codon is reached. The longest polypeptide is titin, which is between 27,000-35,000 amino acids long (due to those “introns” I mentioned, sometimes some are cut out, sometimes not) and makes muscle elastic.

That’s it. That is how our DNA, our genes, make all the things that build and control our bodies. Of course, there is an incredible amount of complexity that arises from that central process, like how do the proteins control the inclusion of calcium to make our bones, grab iron and put it in our blood, and stuff all that fat in our cells. But it is all controlled and mediated through polypeptides, through proteins and enzymes.

If you want to see a large version of the Central Dogma diagram you can click here:

->ci350poster-141201170705-conversion-gate02

 

mRNA vaccines.

I won’t go into too many details here, but you may know that a couple of the Covid-19 vaccines are mRNA based. These vaccines are millions of mRNA molecules packaged up into little balls of fat (also called lipid). These mRNA molecules are all the same and are the instructions to produce the spike protein of Covid-19, the bit the virus uses to attach our cells and that the immune system is good at identifying and attacks. This is the Moderna and Pfizer vaccines

The vaccines are ONLY for the spike protein, not the rest of the virus, so the vaccines cannot give you Covid-19. But what the mRNA does do is get into your cells and your cell ribosomes latch on to the mRNA and make the spike protein. The mRNA molecules is not exactly like the one produced by the virus, it is modified to be more stable and last in the cell longer, with for example a longer poly-A tail. The longer it last, the more spike protein is produced. The vaccine mRNA is engineered to produce as much spike protein as possible.

Your body sees this spike protein, it knows it is “foreign” to your body and learns to attack it. Then, if you are infected by the real virus, your body will already know to attack the spike protein and either you do not get ill or you get less ill. Fantastic, isn’t it? I love science and I especially love biology and medicine. Even just 50 years ago this virus would have had to run it’s natural course through us and kill maybe like Spanish ‘flu did, but now we understand so much better what is going on and we can now do something about it. I say “we”, I mean biomedical scientists.

Previous vaccines have relied on getting a modified or damaged version of the whole virus into your body, or modifying another virus that is harmless so that it produces mRNA for exact copies of parts of the dangerous virus. These are hard things to do. The Oxford vaccine is a modified virus (a chimpanzee adenovirus, chosen I believe because it can infect us but does no harm and does not spread in humans).

Advances in handling mRNA, creating it, and understanding how to make it work in our cells, have allowed scientists to create mRNA vaccines which are simpler, more efficient, easy to create or modify , and more targeted than traditional vaccines. Only part of the virus is made and nothing else, so there is no danger whatsoever of the vaccine causing the disease, or a modified version of the disease. Further, if the virus alters (something that is a current worry) then modifying an mRNA vaccine is theoretically very easy and quick. Testing the new version would be necessary and would take months (regulatory bodies allowing – they might let a modified version be fast-tracked, my wife is an expert in pharmacovigilance and she thinks it could be done) , but it means mutations to SARS-CoV-2 can be handled relatively quickly if the need arises. This could be vitally important.

This work has not all been done since Covid-19 appeared about a year ago, it is based on several years of work on MERS, SARS and other viruses. So on the one hand these mRNA vaccines are a new technology, but they are new since 4 or 5 years ago and great advances in how to create them have been made in the last year.

 

Implications of the Central Dogma

Some things follow on from the above that are fundamental to SARS-CoV-2 and vaccines. I’ll touch on them here and expand on them in further posts, as this is a lot in one go.

Mutations and Open Reading Frame

In my previous post on the new variant of SARS-CoV-2 I mention mutations. At the start of this post I mentioned the copying of DNA/RNA and how mistakes can be made, in particular a single letter changing to another. When a single letter of the genome gets changed, this is called a Single Nucleotide Polymorphism or SNP. If that letter is the first one in a codon in a gene, it is almost certainly going to have an impact. It will change the amino acid inserted at that point in the polypeptide. If you look at the codon table earlier it is possible to change the first letter A to C and still get Arginine. Other than that, altering that first letter in the codon alters the polypeptide. Other SNPs can have no effect – changing GUU to GUC still makes valine. Of course, any SNP that creates a stop codon is going to have a potentially massive effect.

SNPs that cause no change to the polypeptide are called synonymous. As they make no difference to the organism, they occur and get passed to the next generation of the organism and all their offspring. We can use these synonymous SNPs to track the lineage of organisms and they are used to track the lineage of SARS-CoV-2. This allows us to, for example, track how the virus has spread geographically. I say us, I mean phylogenetic scientists.

Those SNPs that change the polypeptide sequence are more likely to change something about the biology of the organism. If the change is negative (for example it reduces the efficiency of an enzyme) the organism and it’s descendants will be at a disadvantage and the change will be selected out. If it gives the organism an advantage, it and it’s descendants will do better than those without the change and will take over in the population. This is true of the new variant B.1.1.7 – is better at spreading, it is taking over. Many SNPs that change one amino acid have very little positive or negative effect on the virus. (I don’t know what it is like on modern genetics degree courses but in my day lecturers would almost come to blows over how much effect a single mutation would need to have to be significant, and how much evolution of DNA was just mathematical, accidental drift, and how much was through selection pressure.)

Other, rarer mutations can be deletions and insertions. Extra letters get added or removed. Now, if the number of letters added/removed is 1, 2, 4, 5 or any number that is not divisible by 3, the impact is huge. Why? Well, a gene has something called it’s Open Reading Frame. Codons are always 3 letter long, staring at the ALU that initiates the mRNA being read. That reading frame of 3 letters per word has to be preserved through the whole gene. If you shift all the letters along by anything other than a multiple of 3, everything after that change becomes very different – and usually garbage.

An insertion or deletion of a non-multiple of 3 letters will not be significant if it occurs in DNA/RNA that does not code for something, but if it is in a gene it is 99.9837% (2) of the time a disaster for that gene, destroying the function of that polypeptide it producers. SARS-Cov-2 is an RNA virus and such viruses are almost all functional gene. Thus deletions or insertions that do not preserve the reading frame are rare (but do occur) in SARS-Cov-2 and other viruses.

Variant B.1.1.7 has 3 deletion mutations in it but they all preserve the reading frame. They drop 1,2, or 3 amino acids out of the polypeptides they code for. But the rest of the polypeptide is preserved. One particular deletion, in the spike protein removing amino acids 69 and 70, stops one of the standard PCR tests from detecting RNA fragments of SARS-CoV-2. I’ll revisit this topic in another post, but because it stops one of the standard PCR tests from working, that can be used in many situations for tracking this variant. Don’t worry, PCR tests for SARS-CoV-2 use 2 or 3 RNA sites to identify the virus, so it is still detected. However, the failure of one of the “channels” has turned out to be a boon for tracking this nasty variant.

Single Direction Of Information.

Under the central dogma you will see that information flows from DNA in the cell nucleus, to mRNA that leaves the nucleus and goes into the cell cytoplasm, and this is translated into polypeptides. It does not go the other way.

Nothing I know of in biology can take a polypeptide, let alone a mature protein, and generate RNA from it. Nothing. Humans can make a stab at it, we can look at the amino acid sequence of a protein and design an mRNA strand that might sort-of work but it’s hellishly difficult as organisms like terrestrial plants and vertebrates have complex post-processing of many polypeptides. Biology cannot do it.

There is nothing that takes mRNA and pulls it back into the cell nucleus and shoves it into our DNA. Nothing natural can do this that I am aware of. A couple of people on a social media forum full of biology experts that I mentioned this post on have voiced possibilities, but nothing concrete yet has been forthcoming (and it would be fascinating to learn about if they do, I’m always looking to learn).

Some of you may have heard of retroviruses such as HIV (the virus that causes AIDS). They can do something that sounds similar – but it is not. They can reverse transcribe their own RNA, i.e. create a DNA copy of their RNA using a reverse transcriptase enzyme, and insert it into the host DNA using an integrase enzyme. The retrovirus has the RNA genes for these two proteins in it’s genome, it brings it’s tools with it. They get into the nucleus of the host cell and use their own tools to insert their own genome into the host, along with control DNA so that the viral DNA can be expressed. What it does NOT do (as far as I know and this is possible where I am wrong) is grab random mRNA from around it and insert it into the DNA of the host. Remember, mRNA is exported out of the nucleus. It’s not there in the nucleus, at least not for long. Also, even if a retrovirus was to insert mRNA into the host’s DNA, it would be doing so without promoter sequences and all the stuff needed to get a gene to be expressed.

I make this point as some people on social media have claimed the mRNA in Covid-19 vaccines could get into your DNA. No, it can’t. It won’t. Anyone claiming this does happen does not understand the central dogma of molecular biology. Either that or they could be in line for a Nobel Prize in biology.

Firstly, the vaccine does not get into the nucleus. Second, there is no biological process native to vertebrates to do the insertion of mRNA into DNA. Third, even if by some chance a virus like HIV was present, and by some miracle some of that mRNA for the vaccine got into the nucleus, HIV is inserting a copy it’s own DNA, not random mRNA hanging around. Finally, even if a miracle on a miracle occurred and the mRNA from the vaccine was inserted into your DNA – there would be nothing to cause it to be expressed. It would just sit there doing absolutely nothing.

What Retroviruses can do is insert into a DNA genome, then when it is expressed it can occasionally pick up DNA from around where it inserted into the genome, which is transcribed and included into the RNA for the virus. If this modified virus then infects another organism and takes that original host DNA (as an RNA copy) with it, it can then insert that picked-up DNA into the new host. It’s very rare, it can happen. But no reverse reading of mRNA was involved.

Basically, the idea of mRNA from a vaccine getting into your genome is damned close to impossible given the current understanding of molecular biology.

Viruses

I’ll finish with some information on viruses.

Viruses are weird. There is an ongoing debate (and has been for over 35 years, as it was a topic of discussion during my degree) whether viruses are alive. Viruses can’t do anything without a cell and it’s machinery to make proteins from DNA/RNA. They can’t move themselves, they get moved about by mechanical processes (in droplets of liquid, floating in water, blown around in the air, transferred via fluids in real living things…). They don’t grow, they do not respond to stimuli (all other life from bacteria up do). They do nothing. A virus consists of just a few things:

  • A string of genetic material, either double stranded DNA similar to what is in us, single stranded DNA or RNA (usually single stranded but occasionally double stranded). SARS-CoV-2 is a single stranded RNA virus.
  • A protein coat, called a capsid, encapsulating the genetic material, keeping it protected and whole. This might be a simple, uniform coat or something more complex made of many proteins. SARS-CoV-2 has a capsid made of several proteins including the famous “spike” protein, which sticks out of the capsid and is what latches onto the ACE2 proteins on our cell walls and allows the virus to get into the host cell.
  • Protein(s) within the capsid, binding to and protecting the genetic material. SARS-CoV-2 has this.
  • A virus may have an outer lipid (fat) layer, usually derived from the lipid layer of the host cell it infected. SARS-CoV-2 does not have this.

I think of viruses as very, very complex poisons and not alive. Others think of it as alive.

If you want to know more about the structure of SARS-CoV-2 this paper on the structure of the virus on the NCBI site is very good but quite technical.

The genetic material for a standard virus (like SARS-Cov-2) codes for a load or mRNAs that usurp the polypeptide making machinery of the host cells. i.e., they use the second half of the central dogma. Once the virus gets into a host cell, the mRNA is released and it hijacks our own cell’s ribosomes. It makes new proteins to make the virus shell and proteins to coat the RNA of the virus. It creates an RNA Polymerase enzyme to replicate it’s own genetic code and, in the case of coronaviruses like SARS-CoV-2 (and other types of virus) it produces a “checking enzyme” to make sure the RNA copies accurately.

This last point is very interesting. All organisms mutate but RNA viruses are the fastest mutating thing we know of. But SARS-CoV-2 mutates slowly for an RNA virus as it has a check enzyme. That’s one thing to be very thankful for. Influenza is an RNA virus that does not have a checking enzyme, which is part of why it changes so quickly and we need a new vaccine for it each year.

All these bits of the virus then self-assemble into thousands of new copies of the virus, burst the host cell and go and infect other cells in the organism. Some are ejected from the host organism in droplets coughed out or similar mechanical processes and infect other hosts.

That’s pretty much all that a virus does.

Notes

1) I said all our DNA is in the nucleus and controls everything. This is not quite true and I am sure some of you know that. We also have DNA in our mitochondria, the organelles in our cells providing us with energy at a biochemical level. Mitochondria look a little like bacterial cells living within our cells and some scientists think this is where they originally came from. It is suggested that a very early Eukaryotic cell absorbed and made a symbiotic relationship with a bacterial cell that was very good at making ATP (the unit of energy in most biology). This was so successful that the organism that did that out-competed all other Eukaryotic life and took over. And, over time the absorbed bacteria became simplified and specialised as the mitochondria. As a result, mitochondria have their own DNA. As do chloroplasts in plants.

2) There is really only one hard, absolute rule in biology. There is an exception to every absolute rule. See 1! Forgive me if I don’t cover all the exceptions in the rest of this post, but what I sat here is true 99.9837% of the time. And treat all percentages in documents with scepticism, many are made up.

3) The distinction between amino acids and peptides has always annoyed me. If “Amino acid” is the term for the building blocks of proteins should not a chain of amino acids be a “polyamino” or something? No, we have peptides/polypeptides.  A peptide has to be 2 or more amino acids as it is named after the bond between the two amino acids. A Peptide bond. Strictly speaking a peptide of between 2 and 20 amino acids is called an oligopeptide, and above that is a polypeptide. It’s just messy.

4) The ribosome may not drop off the mRNA. If the poly-A tail and the mRNA cap are intact, they my bind together to form a loop that allows most of the ribosomes to simply circle around the whole mRNA and make more polypeptide more efficiently. This might help curtail the activity of the mRNA more quickly as, when the poly-A tail or the cap are degraded (as there is some mechanism to degrade the cap too), then the loop is broken and Ribosomes can no longer cycle around. I don’t know the details.

5) The codon to amino acid mapping is very nearly universal. Almost all organism use the same mapping and it is one of the proofs that all life on this planet is related. However, there are some exceptions (as there always are in biology). If you want to nerd out on it look at this Wikipedia page on alternative codon translation tables.

6) The basics of the central dogma of molecular biology has been known for over 50 years. Here’s the start of the chapter on it from my 33 year old  “Genes 3” by Benjamin Lewin. Looking back at this book, which I pretty much knew cover to cover back then, I realise how much knowledge has leaked out my head.