2008-09-15

MySQL Multimaster Replication in an Asynchronous Environment

By design, MySQL replication occurs asynchronously. That is to say that the replication on a slave doesn't necessarily occur at the same time as on the master. In a multi-master replicated environment (assuming two masters), each master is a slave to the other master. There are a few gotcha's to consider when creating or editing data asynchronously in a multi-master environment. You can get bitten by these issues if using AJAX, threads, or even loading images that are database protected or backed. It's even possible to run across these in an entirely synchronous environment if the replication lag time is high enough.

Let's assume for this discussion we have the following table:

CREATE TABLE users (
    user_id INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_name CHAR(64) DEFAULT '',
    user_password CHAR(64) DEFAULT '',
    user_last_login DATETIME DEFAULT '0000-00-00 00:00:00',
    UNIQUE KEY (user_name)
);
Let's also assume that we have two servers, master1 and master2.

Auto Increment Conflict

Assume that the following is submitted to master1:

INSERT INTO users VALUES (0, 'user1', 'a609316768619f154ef58db4d847b75e', '1979-09-23');
and let's label this as e1 (event1) and assume that master1 assigns user_id 1 to 'user1'.
The following is then submitted to master2:
INSERT INTO users VALUES (0, 'user2', 'f522d1d715970073a6413474ca0e0f63', '1984-01-02');
and let's label this as e2 (event2) and assume that master2 assigns user_id 1 to 'user2'.
Oops. Now when replication occurs on the other slave, we end up with 'user1' and 'user2' having different values for user_id on each server.
This topic has been covered in depth elsewhere so I won't go into details. For fixes and more information see Advanced MySQL Replication Techniques. Note that I recommend using a UUID/GUID instead of AUTO_INCREMENT to avoid this type of problem however the MySQL function UUID() doesn't work with statement based replication.

Uniqueness Conflict

Let's assume that you have some AJAX code which attempts to create a user, 'user1' which results in the following SQL statement:

INSERT INTO users VALUES (0, 'user1', 'a609316768619f154ef58db4d847b75e', '1979-09-23');
being submitted to master1 and let's assume that for whatever reason you attempt to create the user a second time (timeout occurred, side effect causes another create to happen, etc) against master2.
You would think that the UNIQUE constraint would prevent the creation from occurring the second time however whether or not the statement is executed on both servers and in what order depends on a variety of factors including server lag (the amount of time between a statement being executed on one server and replicated and executed on the second server). This problem means that you can end up with user1 being created on both master1 and master2 but having two different user_id's.
How can you avoid this pitfall? Avoid asynchronous, identical create statements. Make the call synchronous. Additionally you can configure your application to only write to one database for certain statements, essentially from an application level reverting to a typical master-slave replicated environment.
Yes, I have run into this in a production environment.

Update Conflict

Let's assume you submit the following update request to master1:

UPDATE users SET user_last_login = '2008-09-15 17:58:30' WHERE user_id=1;
which is executed on master1 at the time set in user_last_login plus 1 second (at 2008-09-15 17:58:31).
Now the following is submitted to master2:
UPDATE users SET user_last_login = '2008-09-15 17:58:32' WHERE user_id=1;
which is executed on master2 at the same time set in user_last_login plus 1 second (at 2008-09-15 17:58:33).
Lastly, the update on master2 is replicated to master1, and the update on master1 is replicated to master2.
Here is what was executed on master1:
UPDATE users SET user_last_login = '2008-09-15 17:58:30' WHERE user_id=1;
UPDATE users SET user_last_login = '2008-09-15 17:58:32' WHERE user_id=1;
and on master2:
UPDATE users SET user_last_login = '2008-09-15 17:58:32' WHERE user_id=1;
UPDATE users SET user_last_login = '2008-09-15 17:58:30' WHERE user_id=1;
Now we have different values on each server. Uh oh.

Delete Conflict

This can occur when a delete occurs on master1 and before that delete is replicated to master2, an update to the PK used for the delete in statement1 on master1 is executed on master2. Imagine the following is executed on master1:

DELETE FROM user WHERE user_name='user1';
and the following is executed on master2:
UPDATE user SET user_name='user3' WHERE user_name='user1';
Now we have inconsistent data on each server. Uh oh.

Summary

In short, there are a variety of challenges to overcome in a multi-master replication setup. These problems are exacerbated by asynchronous operations on your data set. A few bullets of advice:
  • Reduce complex transactions to be written to a single server.
  • Monitor server lag.
  • Be prepared for failure. The more you distribute your data set and scale your service, the more you will need to deal with failures.

References

No comments: