How to set up MariaDB Galera multi-master synchronous replication with Debian 10

MariaDB offers two different high availability (HA) and cluster solutions. The first is the standard MariaDB master / slave replication, which can be configured into various topologies, mainly for load balancing, HA and backup. The second is MariaDB Galera, a multi-master synchronous cluster solution. Its main features are as follows:

  • Multi-master: All nodes in the Galera cluster can perform read and write operations, which provides better scalability.
  • Nodes can automatically join the cluster and be evicted in the event of a failure.
  • Galera replication is synchronous, which means that changes committed on one node are guaranteed to be applied on the other nodes. In theory, this ensures that no data is lost in the event of a node failure.

This guide walks you through installing MariaDB and configuring it as a Galera cluster. We will use three Debian 10 nodes for the demonstration, although any number (≥3) of nodes can be used. Technically, it is possible to set up two nodes in a Galera cluster, but this does not provide fault tolerance, because a failed node will cause the other node to stop.

Prerequisites

  • Three or more Debian 10 instances.
  • Access to the root user or any user with sudo privileges.
  • The $EDITOR environment variable should be set.

Note: Galera clusters can work over a WAN or a LAN. If your nodes share a private network, use private IP addresses where applicable. Otherwise, WAN addresses should be used.

If you are using a sudo user, open a root shell for the remainder of this setup with the following command:

sudo -s

Step 1: Install MariaDB

This step should be performed on all nodes.

Use the following commands to install MariaDB, the Galera library, and rsync. The latter is used by Galera for state transfers between nodes.

apt update
apt install -y mariadb-server mariadb-client galera-3 rsync
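
Optionally, confirm that the packages were installed. The exact versions you see will vary with the Debian 10 repositories:

mysql --version
dpkg -l mariadb-server galera-3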

Make sure MariaDB service is enabled:

systemctl enable mariadb.service

Protect your MariaDB instance with the mysql_secure_installation script:

mysql_secure_installation

Answer the questions as shown below, and make sure to choose a strong password for the MariaDB root user.

Enter current password for root (enter for none): press Enter
Set root password? [Y/n] y
New password: your_password
Re-enter new password: your_password
Remove anonymous users? [Y/n] y
Disallow root login remotely? [Y/n] y
Remove test database and access to it? [Y/n] y
Reload privilege tables now? [Y/n] y
All done!  If you've completed all of the above steps, your MariaDB
installation should now be secure.

Step 2: Configure MariaDB

This step should be performed on all nodes.

Stop the MariaDB service on all nodes:

systemctl stop mariadb.service

By default, the MariaDB daemon only listens for connections on localhost. For the cluster to work, it should be changed to an externally accessible address. To do this, edit the options file /etc/mysql/mariadb.conf.d/50-server.cnf:

$EDITOR /etc/mysql/mariadb.conf.d/50-server.cnf

Find the following line:

bind-address = 127.0.0.1

If you are using a private network for your cluster and do not want to expose MariaDB to other networks (i.e. WAN), specify a local IPv4 address for each node. Otherwise, use 0.0.0.0, which instructs MariaDB to listen on all interfaces. E.g:

bind-address = 0.0.0.0

Save your changes and exit the text editor.
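
If you prefer to make the change non-interactively, a sed one-liner along the following lines should work. This is only a sketch: it keeps a .bak backup of the file and uses the 0.0.0.0 value from the example above.

sed -i.bak 's/^\s*bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/mariadb.conf.d/50-server.cnf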

Now we will configure the cluster-related options. Create a new options file:

$EDITOR /etc/mysql/mariadb.conf.d/99-cluster.cnf

Enter the following configuration into the file, replacing the IP addresses with those of your own nodes. The configuration should be identical on all nodes.

[galera]
wsrep_on = on 
wsrep_provider = /lib/galera/libgalera_smm.so 
wsrep_cluster_address = gcomm://192.0.2.1,192.0.2.2,192.0.2.3 
wsrep_cluster_name = galera_cluster_0 

default_storage_engine = InnoDB 
innodb_autoinc_lock_mode = 2 
innodb_doublewrite = 1 

binlog_format = ROW
  • wsrep_on = on enables write set replication, which is a basic feature used by Galera.
  • wsrep_provider specifies the path to the Galera library. It is provided by the galera-3 package at /lib/galera/libgalera_smm.so on Debian 10 (a quick way to confirm this path is shown after this list).
  • wsrep_cluster_address should contain at least one address of another cluster member. It is recommended to list all members of the cluster. No specific order is required.
  • wsrep_cluster_name should be unique to the cluster and should be the same on all nodes in the same galera cluster.
  • The remaining options are required for proper operation of Galera and should not be changed.
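
If you want to double-check that the provider path above matches your system, either of the following should confirm it (the file is shipped by the galera-3 package installed in step 1):

ls -l /lib/galera/libgalera_smm.so
dpkg -L galera-3 | grep libgalera_smm.so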

Step 3: Bootstrap the cluster

Before proceeding, make sure MariaDB is stopped / inactive on all nodes:

systemctl status mariadb.service

To start the cluster, you first need to bootstrap it from a single node. On Debian 10, this can be done with the galera_new_cluster script. The script should be executed on one node only, and the cluster should be bootstrapped only once.

galera_new_cluster

This will start MariaDB on the current node. Make sure it works with:

systemctl status mariadb.service

Then start MariaDB on the other nodes using the following command:

systemctl start mariadb.service

The cluster should now be operational.
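
As a quick sanity check before the fuller test in the next step, you can query the wsrep_cluster_size status variable on any node; with all three nodes joined it should report 3:

mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"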

Step 4: Testing

To ensure that the cluster works as expected, select any node and log in to MariaDB:

mysql -u root -p

Issue the following statement to create a database:

> CREATE DATABASE test0;
> \q

Then check this new database on all other nodes:

mysql -u root -p -e "SHOW DATABASES;"

The above command should return a list containing test0:

+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test0              |
+--------------------+
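
To exercise replication beyond database creation, you can also write a row on one node and read it back on another. The following is a minimal sketch; the table name t0 is purely illustrative, and the table is given an explicit primary key, which Galera expects on InnoDB tables for reliable replication:

# On the first node, create a table in test0 and insert a row:
mysql -u root -p -e "CREATE TABLE test0.t0 (id INT PRIMARY KEY, src VARCHAR(20)) ENGINE=InnoDB; INSERT INTO test0.t0 VALUES (1, 'node1');"

# On any other node, the row should be visible:
mysql -u root -p -e "SELECT * FROM test0.t0;"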

You might want to test more thoroughly by writing to the cluster from each node in the same way. When you are satisfied with the test, remove the test database from the cluster. Any node can be used:

mysql -u root -p -e "DROP DATABASE test0;"

Step 5: Troubleshooting tips

Use the following query to view information about the current status of the node / cluster:

mysql -u root -p -e "SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_REPL_LATENCY','WSREP_EVS_DELAYED','WSREP_READY');"

A healthy 3-node cluster should return the following:

+---------------------------+----------------+
| VARIABLE_NAME             | VARIABLE_VALUE |
+---------------------------+----------------+
| WSREP_CLUSTER_SIZE        | 3              |
| WSREP_CLUSTER_STATUS      | Primary        |
| WSREP_EVS_DELAYED         |                |
| WSREP_EVS_REPL_LATENCY    | 0/0/0/0/0      |
| WSREP_LOCAL_STATE_COMMENT | Synced         |
| WSREP_READY               | ON             |
+---------------------------+----------------+
  • WSREP_CLUSTER_SIZE indicates the current number of nodes in the cluster component.
  • WSREP_CLUSTER_STATUS shows the status of the cluster component the node belongs to, not of the entire cluster.
  • WSREP_EVS_DELAYED shows a list of delayed nodes. In a healthy cluster this value should be empty.
  • WSREP_EVS_REPL_LATENCY shows the replication latency in min / avg / max / stddev / sample size format. The values are given in seconds. High latency can cause performance degradation.
  • WSREP_LOCAL_STATE_COMMENT displays the current node status.
  • WSREP_READY indicates whether the node can accept queries.
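
If you need a broader view than the six variables above, you can dump all wsrep-related status variables with a query of the same form:

mysql -u root -p -e "SELECT * FROM information_schema.global_status WHERE variable_name LIKE 'WSREP_%';"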

When a node in a 3-node cluster loses connectivity, the cluster is split into a primary component consisting of 2 nodes and a non-primary component. The primary component is not affected by the failure and continues to operate normally. From the perspective of the non-primary component, the query shown above will return the following:

+---------------------------+--------------------------------------------------------------------------------------------------------------------------------+
| VARIABLE_NAME             | VARIABLE_VALUE                                                                                                                 |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 1                                                                                                                              |
| WSREP_CLUSTER_STATUS      | non-Primary                                                                                                                    |
| WSREP_EVS_DELAYED         | 6b7864f2-fe7d-11e9-84ab-93e58c0d2907:tcp://192.0.2.1:4567:3,a421be89-fe7d-11e9-a91e-7e62f7562e58:tcp://192.0.2.3:4567:2        |
| WSREP_EVS_REPL_LATENCY    | 0/0/0/0/0                                                                                                                      |
| WSREP_LOCAL_STATE_COMMENT | Initialized                                                                                                                    |
| WSREP_READY               | OFF                                                                                                                            |
+---------------------------+--------------------------------------------------------------------------------------------------------------------------------+

Note the WSREP_EVS_DELAYED value, which indicates a connection problem to other nodes.

On a node in the primary component, the same query returns:

+---------------------------+----------------------------------------------------------------+
| VARIABLE_NAME             | VARIABLE_VALUE                                                 |
+---------------------------+----------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 2                                                              |
| WSREP_CLUSTER_STATUS      | Primary                                                        |
| WSREP_EVS_DELAYED         | a2217526-fe7d-11e9-8692-1f2f0cdb403d:tcp://192.0.2.2:4567:2    |
| WSREP_EVS_REPL_LATENCY    | 0/0/0/0/0                                                      |
| WSREP_LOCAL_STATE_COMMENT | Synced                                                         |
| WSREP_READY               | ON                                                             |
+---------------------------+----------------------------------------------------------------+

To recover from a single node failure, no manual intervention is required. When the failed node reconnects, it automatically synchronizes with the cluster.
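
To confirm that a rejoined node has caught up, you can re-run a reduced version of the status query on that node; it should show Synced and ON once the state transfer has completed:

mysql -u root -p -e "SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_LOCAL_STATE_COMMENT','WSREP_READY');"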

More information

For advanced configuration options, see Galera Cluster System Variables.
