
Exasol Cluster Architecture

This article gives a more detailed view of the Exasol cluster architecture. A high-level view is provided here.

Exasol Cluster Nodes: Hardware

An Exasol cluster is built from commodity Intel servers without any particularly expensive components; SAS hard drives and Ethernet cards are sufficient. In particular, there is no need for an additional storage layer such as a SAN.

See here for a list of Exasol Certified Servers.

[Image: Dell PowerEdge R640]

Disk layout

As a best practice, the hard drives of Exasol cluster nodes are configured as RAID 1 pairs. Each cluster node holds four different areas on disk:

1. OS (50 GB): CentOS Linux, EXAClusterOS and the Exasol database executables

2. Swap (4 GB)

3. Data (50 GB): log files, core dumps and BucketFS

4. Storage: the remaining capacity of the hard drives, used for the Data Volumes and Archive Volumes

The first three areas can be stored on dedicated disks, in which case those disks are also configured as RAID 1 pairs, usually with a smaller size than the disks that contain the volumes. More common than dedicated disks, however, are servers with only one type of disk. These disks are configured as hardware RAID 1 pairs, and on top of that, software RAID 0 partitions are striped across all disks to hold the OS, Swap and Data areas.
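To make the capacity arithmetic concrete, here is a small sketch (not Exasol tooling) of the usable volume capacity per node. The disk count and size are hypothetical; the fixed OS, Swap and Data sizes are the ones from the layout above:

```python
# Sketch: usable per-node capacity left for Data/Archive volumes,
# assuming 8 disks of 2 TB each, mirrored as RAID 1 pairs (illustrative numbers).

def usable_storage_gb(num_disks, disk_size_gb, os_gb=50, swap_gb=4, data_gb=50):
    assert num_disks % 2 == 0, "RAID 1 needs disks in pairs"
    mirrored_gb = (num_disks // 2) * disk_size_gb  # RAID 1 halves raw capacity
    return mirrored_gb - (os_gb + swap_gb + data_gb)

print(usable_storage_gb(8, 2000))  # 8 x 2 TB -> 7896 GB left for volumes
```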

[Image: disk layout]

Exasol 4+1 Cluster: Software Layers

This popular multi-node cluster serves as the example to illustrate the concepts explained here. It is called a 4+1 cluster because it has 4 active nodes and 1 reserve node. Active and reserve nodes have the same layers of software available. The purpose of the reserve node is explained here. Upon cluster installation, the License Server copies these layers as tarballs across the private network to the other nodes. The License Server is the only node in the cluster that boots from disk. Upon cluster startup, it provides the required software layers to the other cluster nodes.

[Image: software layers of a 4+1 cluster]

Exasol License Essentials

There are three types of licenses available:

Database RAM License: This most commonly used model specifies the total amount of RAM that can be assigned to databases in the cluster.

Raw Data License: Specifies the maximum size of the raw data you can store across databases in the cluster.

Memory Data License: Specifies the maximum size of the compressed data you can store across all databases.

For licenses based on RAM, Exasol checks the RAM assignment at the start of the database. If the RAM in use exceeds the maximum RAM specified by the license, the database will not start.

For licenses based on data size (Raw Data License and Memory Data License), Exasol periodically checks the size of the data. If the data size exceeds the limit specified in the license, the database does not permit any further data insertion until the usage drops below the specified value.
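The two check styles described above can be summarized in a short sketch. The function names and values are illustrative only, not Exasol's actual implementation:

```python
# Hedged sketch of the two license check styles (illustrative, not Exasol code).

def can_start_database(assigned_ram_gb, license_ram_gb):
    # RAM license: checked once at database start; the start is refused on excess.
    return assigned_ram_gb <= license_ram_gb

def inserts_allowed(current_data_size_tb, license_limit_tb):
    # Data-size license: checked periodically; inserts are blocked until
    # usage drops back below the licensed limit.
    return current_data_size_tb <= license_limit_tb

print(can_start_database(96, 128))   # True: database may start
print(inserts_allowed(11.2, 10.0))   # False: no further insertion permitted
```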

Customers receive their license as a separate file. To activate the license, these license files are uploaded to the License Server using EXAoperation.

EXAStorage volumes

Storage Volumes are created with EXAoperation on specified nodes.

EXAStorage provides two kinds of volumes:

Data volumes:

Each database needs one volume for persistent data and one temporary volume for temporary data.

While the temporary volume is automatically created by a database process, the persistent data volume has to be created by an Exasol Administrator upon database creation.

Archive volumes:

Archive volumes are used to store backup files of an Exasol database.
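The volume types above can be summarized in a small illustrative data model. This is not an Exasol API; the class, node names and redundancy values are assumptions for illustration (redundancy is covered in a later section):

```python
# Illustrative model of the EXAStorage volume types described above (not an Exasol API).
from dataclasses import dataclass, field

@dataclass
class Volume:
    name: str
    kind: str                    # "data", "temporary" or "archive"
    nodes: list = field(default_factory=list)  # active nodes hosting the volume
    redundancy: int = 1          # number of copies of the data

active = ["n11", "n12", "n13", "n14"]

# Per database: a persistent data volume, created by an administrator ...
persistent = Volume("DataVolume1", "data", active, redundancy=2)
# ... and a temporary volume, created automatically by a database process.
temporary = Volume("TempVolume1", "temporary", active)
# Archive volumes store backup files of the database.
archive = Volume("ArchiveVolume1", "archive", active, redundancy=2)
```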

Exasol 4+1 Cluster: Data & Archive Volume distribution

Data Volumes and Archive Volumes are hosted on the hard drives of the active nodes of a cluster and consume most of the capacity of these drives. The license server usually hosts EXAoperation.

[Image: Data and Archive Volume distribution in a 4+1 cluster]

EXAoperation Essentials

EXAoperation is the main management GUI for Exasol clusters, consisting of an application server and a small configuration database, both located on the License Server under normal circumstances. EXAoperation can be accessed from all cluster nodes via HTTPS. Should the License Server go down, EXAoperation fails over to another node, while the availability of the Exasol database is not affected at all.

[Image: EXAoperation version screen]

Shared-nothing architecture (MPP processing)

Exasol was developed as a parallel system and is constructed according to the shared-nothing principle. Data is distributed across all nodes in a cluster. When responding to queries, all nodes co-operate and special parallel algorithms ensure that most data is processed locally in each individual node’s main memory.

When a query is sent to the system, it is first accepted by the node the client is connected to. The query is then distributed to all nodes. Intelligent algorithms optimize the query, determine the best plan of action and generate needed indexes on the fly. The system then processes the partial results based on the local datasets. This processing paradigm is also known as SPMD (single program, multiple data). All cluster nodes operate on an equal basis; there is no master node. The global query result is delivered back to the user through the original connection.
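A toy sketch of this SPMD style of processing: rows are hash-distributed across nodes, every "node" runs the same aggregation on its local data, and the partial results are merged into the global result. This is a simplification for illustration, not Exasol's actual distribution or execution logic:

```python
# Toy SPMD aggregation over hash-distributed data (illustrative only).

NUM_NODES = 4

def node_for(key):
    # Distribution by hash of a column value: same key -> same node.
    return hash(key) % NUM_NODES

# Distribute rows (key, amount) across the nodes.
rows = [("a", 10), ("b", 5), ("a", 7), ("c", 3), ("b", 1)]
local_data = {n: [] for n in range(NUM_NODES)}
for key, amount in rows:
    local_data[node_for(key)].append((key, amount))

# Each node runs the same program on its own data: a local SUM per key.
partials = []
for n in range(NUM_NODES):
    local_sum = {}
    for key, amount in local_data[n]:
        local_sum[key] = local_sum.get(key, 0) + amount
    partials.append(local_sum)

# Merge the partial results into the global query result.
result = {}
for partial in partials:
    for key, s in partial.items():
        result[key] = result.get(key, 0) + s

print(result)  # sums per key: a=17, b=6, c=3
```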

[Image: shared-nothing architecture, 4+1 cluster]

The picture above shows a cluster with 4 data nodes and one reserve node. The license server is the only server that boots from disk; it provides the OS used by the other nodes over the network.

Exasol uses a shared-nothing architecture. The data stored in the database is symbolized with A, B, C, D to indicate that each node contains a different part of the database data. The active nodes n11-n14 each host database instances that operate on their part of the database locally in an MPP way. These instances communicate and coordinate over the private network.

Exasol Network Essentials

Each cluster node needs at least two network connections: one for the Public Network and one for the Private Network. The Public Network is used for client connections; 1 Gb Ethernet is usually sufficient. The Private Network is used for the cluster interconnect of the nodes; 10 Gb Ethernet or higher is recommended. Optionally, the Private Network can be separated into a Database Network (database instances communicate over it) and a Storage Network (mirrored segments are synchronized over this network).

Exasol Redundancy Essentials

Redundancy is an attribute that can be set upon EXAStorage Volume creation. It specifies the number of copies of the data that is hosted on Active Cluster nodes. In practice this is either Redundancy 1 or Redundancy 2. Redundancy 1 means there is no redundancy, so if one node fails, the volume with that redundancy is no longer available. Typically that is only seen with one-node Clusters. Redundancy 2 means that each node holds a copy of data that is operated on by a neighbor node, so the volume remains available if one node fails.

[Image: EXAStorage volume with redundancy setting]

Exasol 4+1 Cluster: Redundancy 2

If volumes are configured with redundancy 2 – which is a best practice – then each node holds a mirror of the data that is operated on by a neighbor node. If, for example, n11 modifies A, the mirror A‘ on n12 is synchronized over the private network. Should an active node fail, the reserve node steps in and starts an instance.
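The neighbor-mirroring scheme can be sketched as follows: with redundancy 2, each node holds its own segment plus a mirror of the previous node's segment, wrapping around at the ends. The node and segment names match the 4+1 example above; the layout logic itself is an illustration, not Exasol's internal placement algorithm:

```python
# Sketch of redundancy-2 neighbor mirroring in a 4-active-node cluster
# (illustrative layout logic, not Exasol's actual placement algorithm).

active_nodes = ["n11", "n12", "n13", "n14"]
segments = ["A", "B", "C", "D"]

layout = {}
for i, node in enumerate(active_nodes):
    layout[node] = {
        "primary": segments[i],           # this node's own part of the data
        "mirror": segments[i - 1] + "'",  # copy of the left neighbor's data (wraps)
    }

print(layout["n12"])  # n12 holds B and the mirror A' of its neighbor n11
```

With this layout, the failure of any single node leaves every segment available on at least one surviving node, which is why the volume stays online.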

[Image: redundancy 2 in a 4+1 cluster]