Oakies Blog Aggregator

Oracle Security Training by Pete Finnigan in 2018

Are you worried about the data in your databases being stolen? GDPR has just become law across the EU and the UK and affects businesses in other countries that process EU citizens' data. Maybe you store and process credit card....[Read More]

Posted by Pete On 19/07/18 At 02:04 PM

Google Cloud Spanner – inserting data

By Franck Pachot

.
In a previous post I've created a Google Cloud Spanner database and inserted a few rows from the GUI. That is definitely not a solution for many rows, so here is a post about using the command line.

If I start the Google Shell from the icon on the Spanner page for my project, everything is set. But if I run it from elsewhere, using https://console.cloud.google.com/cloudshell as I did in A free persistent Google Cloud service with Oracle XE, I have to set the project:

franck_pachot@cloudshell:~$ gcloud config set project superb-avatar-210409
Updated property [core/project].
franck_pachot@superb-avatar-210409:~$

Instance

I create my Spanner instance with 3 nodes across the world:
franck_pachot@superb-avatar-210409:~$ time gcloud spanner instances create franck --config nam-eur-asia1 --nodes=3 --description Franck
Creating instance...done.
 
real 0m3.940s
user 0m0.344s
sys 0m0.092s

Database

and the Spanner database – created in 6 seconds:

franck_pachot@superb-avatar-210409:~$ time gcloud spanner databases create test --instance=franck
Creating database...done.
real 0m6.832s
user 0m0.320s
sys 0m0.128s

Table

The DDL for table creation can also be run from there:

franck_pachot@superb-avatar-210409:~$ gcloud spanner databases ddl update test --instance=franck --ddl='create table DEMO1 ( ID1 int64, TEXT string(max) ) primary key (ID1)'
DDL updating...done.
'@type': type.googleapis.com/google.protobuf.Empty

I’m now ready to insert one million rows. Here is my table:

franck_pachot@superb-avatar-210409:~$ gcloud spanner databases ddl describe test --instance=franck
--- |-
CREATE TABLE DEMO1 (
ID1 INT64,
TEXT STRING(MAX),
) PRIMARY KEY(ID1)

Insert

The gcloud command line has a limited insert possibility:

franck_pachot@superb-avatar-210409:~$ time for i in $(seq 1 1000000) ; do gcloud beta spanner rows insert --table=DEMO1 --database=test --instance=franck --data=ID1=${i},TEXT=XXX${i} ; done
commitTimestamp: '2018-07-18T11:09:45.065684Z'
commitTimestamp: '2018-07-18T11:09:50.433133Z'
commitTimestamp: '2018-07-18T11:09:55.752857Z'
commitTimestamp: '2018-07-18T11:10:01.044531Z'
commitTimestamp: '2018-07-18T11:10:06.285764Z'
commitTimestamp: '2018-07-18T11:10:11.106936Z'
^C

Ok, let's stop there. Calling the service for each row is not efficient, with a latency of about 5 seconds per call.

API

I’ll use the API from Python. Basically, a connection is a Spanner Client:

franck_pachot@superb-avatar-210409:~$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import spanner
>>> spanner_client = spanner.Client()
>>> instance = spanner_client.instance('franck')
>>> database = instance.database('test')
>>>

Batch Insert

With this I can send a batch of rows to insert. Here is the full Python script I used to insert one million, by batch of 1000 rows:

from google.cloud import spanner
spanner_client = spanner.Client()
instance = spanner_client.instance('franck')
database = instance.database('test')
for j in range(1000):
    records = []
    for i in range(1000):
        records.append((1 + j*1000 + i, u'XXX' + str(i)))
    with database.batch() as batch:
        batch.insert(table='DEMO1', columns=('ID1', 'TEXT'), values=records)

This takes 2 minutes:

franck_pachot@superb-avatar-210409:~$ time python3 test.py
 
real 2m52.707s
user 0m21.776s
sys 0m0.668s
franck_pachot@superb-avatar-210409:~$

If you remember my list of blogs on Variations on 1M rows insert, that's not so fast. But keep in mind that the rows are distributed across 3 nodes on 3 continents, and that inserting constantly increasing key values sends all batched rows to the same node. The PRIMARY KEY in Google Spanner is not only there to declare a constraint; it also determines the physical organization of the data.
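One common way to avoid this hot-spotting – my addition, not part of the original test – is to scatter the key space, for example by bit-reversing the sequential ID before using it as the primary key. A minimal sketch against the same DEMO1 table with the google-cloud-spanner Python client:

from google.cloud import spanner

# Hypothetical helper: bit-reverse a sequential ID within the positive INT64
# range so that consecutive values map to widely scattered primary keys.
def scattered_id(i, bits=63):
    return int(format(i, '0{}b'.format(bits))[::-1], 2)

database = spanner.Client().instance('franck').database('test')
records = [(scattered_id(i), u'XXX' + str(i)) for i in range(1, 1001)]
with database.batch() as batch:
    batch.insert(table='DEMO1', columns=('ID1', 'TEXT'), values=records)

With keys spread like this, consecutive batches land on different parts of the key range, at the price of losing the natural ordering of the IDs.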

Query

The select can also be run from there, within a read-only transaction called a 'Snapshot' because it does MVCC consistent reads:

franck_pachot@superb-avatar-210409:~$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import spanner
>>> with spanner.Client().instance('franck').database('test').snapshot() as snapshot:
...     results = snapshot.execute_sql('SELECT COUNT(*) FROM DEMO1')
...     for row in results:
...         print(row)
...
[1000000]

The advantage of the read-only transaction is that it can do consistent reads without locking. The queries executed in a read-write transaction have to acquire some locks in order to guarantee consistency when reading across multiple nodes.
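For comparison, here is a minimal sketch – not from the original post – of a read-write transaction with the Python client: the mutations run in a function passed to run_in_transaction(), and Spanner holds the required locks until that function returns and the transaction commits.

from google.cloud import spanner

database = spanner.Client().instance('franck').database('test')

def update_one_row(transaction):
    # All mutations in this function are committed atomically when it returns.
    transaction.update(table='DEMO1', columns=('ID1', 'TEXT'), values=[(1, u'updated')])

database.run_in_transaction(update_one_row)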

Interleave

So, you can look at the PRIMARY KEY as a partition by range, and we have also reference partitioning with INTERLEAVE IN PARENT. This reminds me of the Oracle CLUSTER segment that is so rarely used because storing the tables separately is finally the better compromise on performance and flexibility for a multi-purpose database.

Here is my creation of DEMO2, where ID1 is a foreign key referencing the DEMO1 parent:

franck_pachot@superb-avatar-210409:~$ time gcloud spanner databases ddl update test --instance=franck --ddl='create table DEMO2 ( ID1 int64, ID2 int64, TEXT string(max) ) primary key (ID1,ID2), interleave in parent DEMO1 on delete cascade'
DDL updating...done.
'@type': type.googleapis.com/google.protobuf.Empty
 
real 0m24.418s
user 0m0.356s
sys 0m0.088s

I’m now inserting 5 detail rows per each parent row:

from google.cloud import spanner
database = spanner.Client().instance('franck').database('test')
for j in range(1000):
    records = []
    for i in range(1000):
        for k in range(5):
            records.append((1 + j*1000 + i, k, u'XXX' + str(i) + ' ' + str(k)))
    with database.batch() as batch:
        batch.insert(table='DEMO2', columns=('ID1', 'ID2', 'TEXT'), values=records)

This ran in 6 minutes.

Join (Cross Apply)

Here is the execution plan for

SELECT * FROM DEMO1 join DEMO2 using(ID1) where DEMO2.TEXT=DEMO1.TEXT

where I join the two tables and apply a filter on the join:
[Screenshot: CaptureSpannerCrossApply – execution plan]

Thanks to the INTERLEAVE, the join runs locally. Each row from DEMO1 (the Input of the Cross Apply) is joined with DEMO2 (the Map of the Cross Apply) locally. Only the result is serialized. On this small number of rows we do not see the benefit of having the rows on multiple nodes. There are only 2 nodes with rows here (2 local executions) and probably one node contains most of the rows. The average time per node is 10.72 seconds and the elapsed time is 20.9 seconds, so I guess that one node ran in 20.9 seconds and the other in 1.35 only.

The same without the tables interleaved (here as DEMO3) is faster to insert, but the join is more complex because DEMO1 must be distributed to all nodes.
[Screenshot: CaptureSpannerDistributedCrossApply – execution plan]
Without interleave, the input table of the local Cross Apply is a Batch Scan, which is actually like a temporary table distributed to all nodes (it seems to have 51 chunks here), created by the 'Create Batch' step. This is called a Distributed Cross Apply.

So what?

Google Spanner has only some aspects of SQL and relational databases. But it is still, like the NoSQL databases, a database where the data model is focused on one use case, because the data model and the data organization have to be designed for a specific data access pattern.

 

The article Google Cloud Spanner – inserting data appeared first on Blog dbi services.

Power BI 101 – Log Files and Tracing

Knowing where log files are and how to turn on debugging is an essential part of any technical job and this goes for Power BI, too.  Remember, as I learn, so does everyone else….Come on, pretty please?

Power BI Desktop

Log files and traces can be accessed in one of two ways:

  • Via the Power BI Application
  • Via File Explorer

In the Power BI application, go to File –> Options and Settings –> Options –> Diagnostics.

Crash and dump files are stored automatically, with an option to disable them from this screen, but I'm unsure why you'd ever want to do that. If Power BI does crash, you would lose valuable data on what the cause was.

[Screenshot: tracing1.png – Diagnostics options]

To debug with a trace, you’ll need to enable it from this screen as well, as it’s not turned on by default.  Remember that tracing can be both resource and storage intensive, so only enable it when you actually need to diagnose something.  You can also choose to bypass tracing the geo code cache, as this is used to help map coordinates and it can be very chatty.

To view the files, you can click on the open crash/dump file folder link, which opens File Explorer at the traces directory on your PC.

Directly From File Explorer:

Ensure that File Explorer has viewing set to display hidden items.

C:\Users\<username>\AppData\Local\Microsoft\Power BI Desktop\Traces

Log Files

These are all retained inside the Performance folder under the Traces directory.

The files are named with the following naming convention:

..

Locating the files you need for your current process is easiest if you sort by Date Modified. Verify that you're working with the file that is being written to and not the file used only to keep track of startup and shutdown logging (0 KB):

[Screenshot: tracing4.png – trace files sorted by Date Modified]

The third log file in the list above, and also the one started before the Power BI Desktop executable (PID 13396), is the Microsoft Mashup Container (Microsoft.Mashup.Container.NetFX40.exe), with its own PID of 16692. It contains valuable information about calculations, measures and other caching processes. Take care to ensure that the PID used by Power BI in the logs matches the one you're inspecting in Task Manager: Excel and other programs are also known to have a version of this executable, so there may be more than one listed for Power BI, as well as others for different Microsoft applications.
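If you prefer to check this from a script instead of Task Manager, the following sketch – my addition, assuming the third-party psutil package is installed – lists every Mashup Container process together with the parent that launched it, so the one belonging to PBIDesktop stands out:

import psutil

# List every Microsoft Mashup Container process and the parent that launched it,
# to tell the Power BI one apart from Excel's or another application's.
for proc in psutil.process_iter(['pid', 'name', 'ppid']):
    name = proc.info['name'] or ''
    if name.startswith('Microsoft.Mashup.Container'):
        try:
            parent = psutil.Process(proc.info['ppid'])
            print('{} (PID {}) started by {} (PID {})'.format(
                name, proc.info['pid'], parent.name(), parent.pid))
        except psutil.NoSuchProcess:
            print('{} (PID {}) - parent no longer running'.format(name, proc.info['pid']))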

Log File Breakdown

Each file contains entries providing information on high-level processing, including start time, total size of cache allocated for the task, process information, Process ID (PID), Transaction ID (TID) and duration.

An example of an entry can be seen below:

ObjectCacheSessions/CacheStats/Size {"Start":"2018-07-19T01:42:24.9707127Z","Action":"ObjectCacheSessions/CacheStats/Size","entryCount":"1","totalSize":"24","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"00000000-0000-0000-0000-000000000000","Process":"PBIDesktop","Pid":13396,"Tid":8,"Duration":"00:00:00.0046865"}

We can easily match up the Process name and the PID with what is displayed in our Task Manager detail view:

[Screenshot: tracing3.png – Task Manager detail view]

We've now identified the process, the amount of memory allocated to perform the task captured in the log, the start time and the duration. The information in these log files can assist when diagnosing a Power BI Desktop crash, but the data collected is quite rudimentary.
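Because each entry is simply an action name followed by a JSON payload, the fields above can also be pulled out with a few lines of Python. This is only a sketch of mine; the file name below is a placeholder to replace with one of your own PBIDesktop performance logs:

import json
from pathlib import Path

# Placeholder path - substitute one of your own PBIDesktop.* performance log files.
log_file = Path(r'C:\Users\<username>\AppData\Local\Microsoft\Power BI Desktop'
                r'\Traces\Performance\PBIDesktop.example.log')

for line in log_file.read_text(encoding='utf-8').splitlines():
    if ' {' not in line:
        continue  # skip lines that are not "<ActionName> <JSON payload>"
    action, payload = line.split(' ', 1)
    try:
        entry = json.loads(payload)
    except ValueError:
        continue
    print(action, entry.get('Pid'), entry.get('Start'), entry.get('Duration'))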

If you shut down Power BI Desktop, the PBIDesktop* log file writes to the startup file (which was empty until then), then empties itself and saves off the timestamp of the program's exit.

The Microsoft Mashup file has much of the same information, but includes deeper level processing work by Power BI, such as work done in the Query Editor or when we create a measure or new column/table.

In the three examples from the file below, you can see a compile, a save and then an evaluate task. Note that the Host Process ID is that of my Power BI Desktop seen earlier, and the interaction with the Microsoft Mashup Container is demonstrated as well:

SimpleDocumentEvaluator/GetResult/Compile {"Start":"2018-07-19T01:48:46.1441843Z","Action":"SimpleDocumentEvaluator/GetResult/Compile","HostProcessId":"13396","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"04248470-07e1-4862-b184-a32f186f26fd","Process":"Microsoft.Mashup.Container.NetFX40","Pid":16692,"Tid":1,"Duration":"00:00:00.4302569"}

ObjectCache/CacheStats/Size {"Start":"2018-07-19T01:48:47.3504705Z","Action":"ObjectCache/CacheStats/Size","HostProcessId":"13396","entryCount":"5","totalSize":"14564","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"04248470-07e1-4862-b184-a32f186f26fd","Process":"Microsoft.Mashup.Container.NetFX40","Pid":16692,"Tid":1,"Duration":"00:00:00.0000170"}

SimpleDocumentEvaluator/GetResult/Evaluate {"Start":"2018-07-19T01:48:46.5744678Z","Action":"SimpleDocumentEvaluator/GetResult/Evaluate","HostProcessId":"13396","ProductVersion":"2.59.5135.781 (PBIDesktop)","ActivityId":"04248470-07e1-4862-b184-a32f186f26fd","Process":"Microsoft.Mashup.Container.NetFX40","Pid":16692,"Tid":1,"Duration":"00:00:00.7780750"}

Another common file in the Performance directory follows the msmdsrv* naming convention and collects log information on the data source loader. Duration information and cache/memory allocation could offer valuable insight into poor performance during data loading processes. The first stop is always to check the desktop settings to see what has been set for memory allocation rather than assuming it's the default.

If I just start the program and don't open anything, only the high-level processing of starting, basic memory allocation and stopping will be tracked in the PBIDesktop* file until I open a PBIX file. Then anything that needs to be updated and refreshed for the visuals, etc. will begin to write log data to the Microsoft Mashup log file and, if a data refresh must be performed, to the msmdsrv file.

Trace files

When you do turn on debug tracing, as shown at the beginning of this post, a file is created in the parent directory, Traces.

When enabled, and after a restart of Power BI Desktop, you will receive not only similar information about PID, TID, the process and the duration, but also granular information about Power BI and what's going on behind the scenes:

  • Application graphics info
  • Settings
  • Parameters
  • Background processes
  • Caching
  • Extensions
  • Query edits
  • Changes applied

You’ll even see entries similar to the following:

SharedLocalStorageAccessor/AcquireMutex

A mutex (mutual exclusion object) is a small, efficient synchronization object. As mutexes have thread affinity, a mutex can only be released by the thread in Power BI that owns it. If another thread attempts to release it, an application exception is thrown and trapped in the trace file.

An interesting aspect of tracing in Power BI Desktop is that the options are put back to their defaults, with granular tracing disabled, when you restart the application.

Copyright © DBAKevlar [Power BI 101 - Log Files and Tracing], All Right Reserved. 2018.

Speaking At DOAG 2018 Conference And IT Tage 2018

I will be speaking at the yearly DOAG conference in December as well as at the IT Tage in November. My talk will be "Oracle Optimizer System Statistics Update 2018", where I summarize the history and current state of affairs regarding System Statistics and I/O calibration in recent Oracle versions like 12c and 18c.

Easy way to create large demo-tables in #Exasol and #Oracle


If you need a large set of data to test or demonstrate something, this does the trick:

Create demo table in Oracle

create table t as
select 
rownum as id, -- growing unique number
'Just some text' as textcol,
mod(rownum,5) as numcol1, -- numbers 0-4 round-robin
mod(rownum,1000) as numcol2 , -- numbers 0-999 round robin
5000 as numcol3, -- constant
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id -- dates for every month 
from dual connect by level<=1e4 -- 10000 rows 
; 

Create demo table in Exasol

create table t as
select 
rownum as id, -- growing unique number
'Just some text' as textcol,
mod(rownum,5) as numcol1, -- numbers 0-4 round-robin
mod(rownum,1000) as numcol2 , -- numbers 0-999 round robin
5000 as numcol3, -- constant
to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id -- dates for every month 
from (select 1 from dual connect by level<=1e4) -- 10000 rows 
; 

In Oracle you may get this error message for a high number of rows (like 10 million): ORA-30009: Not enough memory for CONNECT BY operation.
This way it works in spite of that:

SQL> alter session set workarea_size_policy=manual;

Session altered.

SQL> alter session set sort_area_size=1000000000;

Session altered.

SQL> create table t as
     select 
     rownum as id, 
     'Just some text' as textcol,
     mod(rownum,5) as numcol1, 
     mod(rownum,1000) as numcol2 , 
     5000 as numcol3, 
     to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id  
     from dual connect by level<=1e7 ; 
Table created.
 
SQL> select count(*) from t;

  COUNT(*)
----------
  10000000

SQL> select bytes/1024/1024 as mb from user_segments where segment_name='T';

        MB
----------
       480

In Exasol, this problem doesn’t surface:

SQL_EXA> create table t as
         select
         rownum as id,
         'Just some text' as textcol,
         mod(rownum,5) as numcol1,
         mod(rownum,1000) as numcol2 ,
         5000 as numcol3,
         to_date ('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2018' ,'dd.mm.yyyy') as time_id
         from (select 1 from dual connect by level<=1e7) ; 
EXA: create table t as... 
Rows affected: 10000000 
SQL_EXA> SELECT raw_object_size/1024/1024 as mb FROM SYS.EXA_ALL_OBJECT_SIZES where object_name='T';
EXA: SELECT raw_object_size/1024/1024 as mb FROM SYS.EXA_ALL_OBJECT_SIZES w...

MB
----------------------------------
          286.10229492187500000000

1 row in resultset.

And the resulting table is smaller because Exasol stores it in a compressed columnar format.

Announcement: Venue Confirmed For Upcoming Brussels “Oracle Indexing Internals and Best Practices” Seminar

I can finally confirm the venue for my upcoming “Oracle Indexing Internals and Best Practices” seminar in beautiful Brussels, Belgium running on 27-28 September 2018. The venue will be the Regus Brussels City Centre Training Rooms Facility, Avenue Louise / Louizalaan 65, Stephanie Square, 1050, Brussels. Note: This will be the last public seminar I’ll run […]

Searching in Oracle Database documentation

Just a quick heads up with something I see from time to time in Chrome (but not in Firefox or any other browser).

Occasionally when doing a search, the results are not limited as per my criteria.  For example, if I am searching for information about Spatial in the Licensing Guide:

image

then when I click the Search button, the results might come back with a far broader search range:

image

If you experience this, there is an easy workaround – simply re-run the search. It only seems to occur (for me at least) on the first execution of the search. Clicking the Search button again on the page yielded the correct result the second time around:

image

Let me know if you’ve seen similar behaviour – if it is not just me, I’ll pass it along to the Documentation team.

Happy searching!

Which Privileges Are Required to Use the ADWC Service Console?

The Autonomous Data Warehouse Cloud (ADWC) service provides a Service Console that can be used to monitor the service activity and to carry out a small number of administration tasks (e.g. changing some resource management rules).

The documentation specifically says to use the ADMIN user to log in. But, actually, any unlocked database user having the CREATE SESSION privilege can be used to log in. This is not only useful but, in my opinion, necessary. In fact, developers should be able to see the service activity without knowing the password of the ADMIN user.

The documentation provides no information about which privileges are needed to take advantage of the provided functionality. If you log in with a user having only the CREATE SESSION privilege, you get either empty charts or errors when trying to use some of the functionality, as the following figure illustrates:

The Service Console displays no data when a user without the necessary privileges logs in.

In other words, the UI is static and privileges are only checked when a specific functionality is used. I would expect something dynamic…

By looking at the roles owned by the ADMIN user, I noticed three roles with a name that could match what I was looking for: CONSOLE_MONITOR, CONSOLE_OPERATOR and CONSOLE_ADMIN. Then, with some trial and error, I came up with the following table that summarizes which role provides which functionality:

Functionality                                     | CONSOLE_MONITOR | CONSOLE_OPERATOR | CONSOLE_ADMIN
Overview                                          | No data         | Data visible     | Data visible
Activity                                          | No data         | Data visible     | Data visible
Activity – Cancel execution                       | Unauthorized    | Authorized       | Authorized
Administration – Download Client Credentials      | Unauthorized    | Authorized       | Authorized
Administration – Set Resource Management Rules    | Unauthorized    | Authorized       | Authorized
Administration – Set Administrator Password       | Unauthorized    | Unauthorized     | Authorized
Administration – Manage Oracle ML Users           | Unauthorized    | Unauthorized     | Authorized
Administration – Download Oracle Instance Client  | Authorized      | Authorized       | Authorized
Administration – Send Feedback to Oracle          | Authorized      | Authorized       | Authorized

Notes:

  • A user with the CONSOLE_MONITOR role has the same privileges as a user without any role. To me this state of affairs does not look correct. I would expect the CONSOLE_MONITOR role to make the Overview and Activity data visible.
  • With these roles it is not possible to grant only the privileges to see data without also being able to modify something or kill sessions.
  • The “Cancel execution” functionality is actually a “kill session”. My guess is that as soon as the service is upgraded to 18c, new functionality to cancel executions will be used instead.

Rebuilding Indexes: Danger With Clustering Factor Calculation (Chilly Down)

Let me start by saying that if you don't already follow Jonathan Lewis's excellent Oracle blog, do yourself a favour. In a recent article, Jonathan highlighted a danger with rebuilding indexes (or indeed creating an index) in relation to collecting index statistics with the TABLE_CACHED_BLOCKS preference. I've discussed the importance of the TABLE_CACHED_BLOCKS statistics […]

#Exasol Cluster Architecture

This article gives a more detailed view on the Exasol Cluster Architecture. A high level view is provided here.

Exasol Cluster Nodes: Hardware

An Exasol Cluster is built with commodity Intel servers without any particularly expensive components. SAS hard drives and Ethernet cards are sufficient. In particular, there is no need for an additional storage layer like a SAN.

See here for a list of Exasol Certified Servers.

[Image: dellpoweredger640.jpg]

Disk layout

As a best practice the hard drives of Exasol Cluster nodes are configured as RAID 1 pairs. Each cluster node holds four different areas on disk:

1. OS, 50 GB in size, containing CentOS Linux, EXAClusterOS and the Exasol database executables

2. Swap, 4 GB in size

3. Data, 50 GB in size, containing logfiles, coredumps and BucketFS

4. Storage, consuming the remaining capacity of the hard drives, used for the Data Volumes and Archive Volumes

The first three areas can be stored on dedicated disks, in which case these disks are also configured as RAID 1 pairs, usually with a smaller size than those that contain the volumes. More common than dedicated disks are servers with only one type of disk. These are configured as hardware RAID 1 pairs, and on top of that, software RAID 0 partitions are striped across all disks to hold the OS, Swap and Data areas.

[Image: disklayout.png]

Exasol 4+1 Cluster: Software Layers

This popular multi-node cluster serves as an example to illustrate the concepts explained here. It is called a 4+1 cluster because it has 4 Active nodes and 1 Reserve node. Active and Reserve nodes have the same layers of software available. The purpose of the Reserve node is explained here. Upon cluster installation, the License Server copies these layers as tarballs across the private network to the other nodes. The License Server is the only node in the cluster that boots from disk. Upon cluster startup, it provides the required software layers to the other cluster nodes.

[Image: swlayers.png]

Exasol License Essentials

There are three types of licenses available:

Database RAM License: This most commonly used model specifies the total amount of RAM that can be assigned to databases in the cluster.

Raw Data License: Specifies the maximum size of the raw data you can store across databases in the cluster.

Memory Data License: Specifies the maximum size of the compressed data you can store across all databases.

For licenses based on RAM, Exasol checks the RAM assignment at the start of the database. If the RAM in use exceeds the maximum RAM specified by the license, the database will not start.

For licenses based on data size (raw data license and memory data license), Exasol periodically checks the size of the data. If the data size exceeds the limit specified in the license, the database does not permit any further data insertion until the usage drops below the specified value.

Customers receive their license as a separate file. To activate the license, these license files are uploaded to the License Server using EXAoperation.

EXAStorage volumes

Storage Volumes are created with EXAoperation on specified nodes.

EXAStorage provides two kinds of volumes:

Data volumes:

Each database needs one volume for persistent data and one temporary volume for temporary data.

While the temporary volume is automatically created by a database process, the persistent data volume has to be created by an Exasol Administrator upon database creation.

Archive volumes:

Archive volumes are used to store backup files of an Exasol database.

Exasol 4+1 Cluster: Data & Archive Volume distribution

Data Volumes and Archive Volumes are hosted on the hard drives of the active nodes of a cluster.

They consume most of the capacity of these drives. The license server usually hosts EXAoperation.

[Image: volumedistribution.png]

EXAoperation Essentials

EXAoperation is the major management GUI for Exasol Clusters, consisting of an Application Server and a small Configuration Database, both located on the License Server under normal circumstances. EXAoperation can be accessed from all cluster nodes via HTTPS. Should the License Server go down, EXAoperation will fail over to another node, while the availability of the Exasol database is not affected at all.

[Image: exaoperation_version.png]

Shared-nothing architecture (MPP processing)

Exasol was developed as a parallel system and is constructed according to the shared-nothing principle. Data is distributed across all nodes in a cluster. When responding to queries, all nodes co-operate and special parallel algorithms ensure that most data is processed locally in each individual node’s main memory.

When a query is sent to the system, it is first accepted by the node the client is connected to. The query is then distributed to all nodes. Intelligent algorithms optimize the query, determine the best plan of action and generate needed indexes on the fly. The system then processes the partial results based on the local datasets. This processing paradigm is also known as SPMD (single program, multiple data). All cluster nodes operate on an equal basis; there is no Master Node. The global query result is delivered back to the user through the original connection.

[Image: shared_nothing.png]

The picture above shows a cluster with 4 data nodes and one reserve node. The license server is the only server that boots from disk. It provides the OS used by the other nodes over the network.

Exasol uses a shared nothing architecture. The data stored in this database is symbolized with A,B,C,D to indicate that each node contains a different part of the database data. The active nodes n11-n14 each host database instances that operate on their part of the database locally in an MPP way. These instances communicate and coordinate over the private network.

Exasol Network Essentials

Each Cluster node needs at least two network connections: one for the Public Network and one for the Private Network. The Public Network is used for client connections; 1 Gb Ethernet is usually sufficient. The Private Network is used for the Cluster Interconnect of the nodes; 10 Gb Ethernet or higher is recommended for it. Optionally, the Private Network can be separated into one Database Network (database instances communicate over it) and one Storage Network (mirrored segments are synchronized over this network).

Exasol Redundancy Essentials

Redundancy is an attribute that can be set upon EXAStorage Volume creation. It specifies the number of copies of the data that is hosted on Active Cluster nodes. In practice this is either Redundancy 1 or Redundancy 2. Redundancy 1 means there is no redundancy, so if one node fails, the volume with that redundancy is no longer available. Typically that is only seen with one-node Clusters. Redundancy 2 means that each node holds a copy of data that is operated on by a neighbor node, so the volume remains available if one node fails.

[Image: volume.png]

Exasol 4+1 Cluster: Redundancy 2

If volumes are configured with redundancy 2 – which is a best practice – then each node holds a mirror of the data that is operated on by a neighbor node. If, for example, n11 modifies A, the mirror A‘ on n12 is synchronized over the private network. Should an active node fail, the reserve node will step in and start an instance.

[Image: redundancy2.png]