Search

Top 60 Oracle Blogs

Recent comments

CQRS, Event Sourcing and the Oracle Database

By Franck Pachot

.
This blog post relates my thoughts when reading about Command Query Responsibility Separation and Event Sourcing, in the context of the Oracle Database (but it can probably apply to any database). We see those terms in the new software architecture diagrams, but they are actually quite old:

Command-Query separation

Command-Query separation was defined by Bertrand Meyer 15 years ago, not for the database but for the Eiffel language. See page 22-44 of Eiffel: a language for software engineering.

This basically states that a function that returns a result does not change anything (“asking a question should not change the answer”). In PL/SQL you can do whatever you want with FUNCTION and PROCEDURE: PROCEDURE can return values with OUT parameters and FUNCTION can modify some data. However, ‘You can’ does not mean that you should. It is easy to define some coding rules where a FUNCTION does not change anything and where on PROCEDURE implements what Bertrand Meyer calls ‘command’.

In SQL this is clearly enforced. A SELECT is for queries and cannot change any data. Even if you call a function that tries to change some data you will encounter:

ORA-14551: cannot perform a DML operation inside a query

Object Oriented data modeling

This concept came from Eiffel, an object-oriented programming language. Object Oriented approach was designed for transient data structures: stored in memory (the object identification is the address in memory) and usually not shared (the representation of the same data is a different object in different systems). But the modular and encapsulation aspects of OO approach being great, OO approach has been extended to data, the persistent data that we store in the database. Before, the functions and the data model were analyzed separately. This designed monolithic applications which were difficult to maintain and evolve. The Object Oriented approach helps to analyze the sub-subsystems separately.

Create Retrieve Update Delete

However, because the objects were the business objects and not the use-cases, or services, then the software architects came with this simplified idea that all interaction with the database is CRUD: you Create (aka INSERT), retrieve (aka SELECT), UPDATE or DELETE. And then you define an Object (mapped to a table) and define those 4 CRUD methods mapped to 4 SQL Statements. And you can do everything. And because some ORM frameworks give the possibility to ignore how the data is stored and shared in the database, a new generation of developers was working with data sets as if it were objects in memory.

And this is completely wrong because

  • one object maps to only one row, and the CRUD approach exposes no method for bulk updates – those where the database can be the most efficient
  • the CRUD methods read and update all table columns – ignoring all efficient access paths to a subset of columns
  • mapping a transient object identifier (address in memory) to a shared persistent identifier (primary key) is not easy
  • and finally, all those CRUD operations belong to an object when they should belong to use-cases and transactions

The last point is more obvious between the queries (the R in CRUD) and the insert (the C in CRUD). In an information system, you usually use (or query or retrieve) the data in a completely different way than it was entered (entered, inserted). Let’s take an example in an Order Entry system. When the customer starts with an order, it is probably the responsibility of the Orders object. But then, it will be processed by for the delivery, and then queried by the Customer object, and finally by the Products for sales analysis. Grouping all that in the same object is easier for the Object-Oriented Analysis, for the modelization of the domain object. But keeping this grouping for the software implementation is a big source of problems. And finally, the same object will be used by completely different use-cases, with different availability, consistency, performance,… requirements. That’s too much responsibility for one object.

Command-Query-Responsability-Separation (CQRS)

Because of this (CRUD row-by-row queries inefficiency and Object over-responsability) the applications started to cache locally the data used by queries. In order to avoid the row-by-row roundtrips and to store them in a model more suited to the query than the way it was inserted. Like dozens of data marts that try to keep in sync with the changes done in the database.

Hibernate, for example, can cache the whole data in a second level cache, using object stores like GigaSpaces or Oracle Coherence. But this doesn’t solve the main problem. The object model, which is not relational but hierarchical, cannot be used efficiently by all use-cases. And I’ve seen very few Hibernated developers accepting to have multiple Entity mappings for the different use-cases. Basically, the relational model can cope with all kind of queries (especially since Oracle In-Memory Columns Store as you can also do analytics on it). But when querying it with an object model, you need several models.

Then came the idea to split the data model between ‘command’ – the modifications, and ‘query’. This is the CQRS from Martin Fowler: https://martinfowler.com/bliki/CQRS.html

This is, again, not new. The relational model exposes data through views and you can, and should, have different views on your data. The data is stored in one normalized model, modified though views or procedures (Transactional API) and queried through views. Separation of responsibility has always been there. And I totally agree with all those diagrams showing multi-layers, separation, and even micro-services. There’s only one problem with them.

Logical vs. physical separation

I don’t really know when, but at a point, the architects looking at those diagrams forgot one step where the logical model should be implemented in a physical model. And what was logical layers became physical tiers without any reasons. Many architects uses ‘layer’, ‘level’, ‘tier’ without even knowing if they are at logical or physical level. It is good to have logical layers, but processing data across multiple physical tiers will exhaust all resources in all tiers just by doing nothing else than waiting on roundtrips latency and expensive context switches.

The CRUD methods had to be coded in an Object Oriented language, and then the idea was Java. Then, the rows have to be shipped between two different processes: one running a JVM, and one running the SQL queries in the RDBMS. Different processes means context switches for each calls. This is latency and overhead, and is not scalable. Oracle has the possibility to run the JVM but this still context switch and datatype conversion. Then, to try to scale, more application servers were needed and this data processing part moved to the application server.

And the bad effect is not only on performance and scalability. Because of this confusion, implementing logical layers into different servers, technologies, languages,… we lost the link between the storage data and the processing of data. Which is exactly the opposite of an Object Oriented approach. With the data logic (you can call it business logic as well, as all data processing is there to implement business rules) in the database you can manage dependencies. The database always keeps track of which function or procedure is using which table, and how.

Event Sourcing

https://martinfowler.com/eaaDev/EventSourcing.html

Oracle can even guarantee consistent reads without changing anything, thanks to MVCC. Of course, SELECT FOR UPDATE can write to the database, not to change data but to write lock information, but you can restrict this by granting only the READ privilege.

This means that, for example, in PL/SQL we use PROCEDURE for what Bertrand Meyer calls ‘command': changing data. And we use FUNCTION to query data

 

Cet article CQRS, Event Sourcing and the Oracle Database est apparu en premier sur Blog dbi services.