onsdag 26 februari 2014

Easy HA and automatic failover using MySQL Fabric - Part II


In Part I of this blog series we installed and configured our MySQL Fabric, now we are ready to start test our python application against MySQL Fabric.

Most of this is covered in first blog but lets do a short recap. We now have a HA-group consisting of 3 MySQL servers, these are running on ports 63307 - 63309. We have also configured one MySQL instance 63306 to act as backing store for MySQL Fabric, that is holding all fabric meta data for us.


Now it time to start focus on application side of things, as we see in picture a key component is the fabric aware connector, it is here where all the magic is happening, the connector also store the state stored in backing store, so we do not do any unnecessary lookup to Fabric node for every statement executed.




HA and failover


The python program we will be using is a slightly modified version of our example program in "quick start" from here. I have altered group name, changed connection parameters, error handling and also added a forever loop at end to make things a bit more interesting. The program will firstly add 10 rows into table test.employees and then each half second add one more row and select last 5 added rows. If you want to stop program just hit Ctrl-c. Also, as you might see I'm not a natural python coder, so any comments on the code will be silently ignored ;)

import mysql.connector
from mysql.connector import fabric

import time


def add_employee(conn, emp_no, first_name, last_name):
    try:
       conn.set_property(group="mygroup", mode=fabric.MODE_READWRITE)
       cur = conn.cursor()
       cur.execute(
          "INSERT INTO employees VALUES (%s, %s, %s)",
          (emp_no, first_name, last_name)
       )
    except mysql.connector.Error:
       print "Database connection error, trying to reconnect ..."
       conn=connect()

def find_employee(conn, emp_no):
    try:
       conn.set_property(group="mygroup", mode=fabric.MODE_READONLY)
       cur = conn.cursor()
       cur.execute(
           "SELECT first_name, last_name FROM employees "
           "WHERE emp_no = %s", (emp_no, )
        )
    except mysql.connector.Error:
       print "Database connection error, trying to reconnect ..."
       conn=connect()
    for row in cur:
        print row

# Address of the Fabric, not the host we are going to connect to.
def connect():
    try:
       conn=mysql.connector.connect(
          fabric={"host" : "localhost", "port" : 8080},
          user="root", database="test", password="root",
          autocommit=True
       )
    except mysql.connector.Error:
       print "Error trying to get a new database connection"
       quit()
    return conn

conn = connect()
conn.set_property(group="mygroup", mode=fabric.MODE_READWRITE)
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS employees")
cur.execute(
    "CREATE TABLE employees ("
    "   emp_no INT, "
    "   first_name CHAR(40), "
    "   last_name CHAR(40)"
    ")"
    )

high=10
for x in range(0, high):
   add_employee(conn, x, "John"+":"+str(x), "Doe")

high+=1
while True:

   add_employee(conn, high, "John"+":"+str(high), "Doe")
   time.sleep(0.5)
   for x in range(high-5, high):
      find_employee(conn, x)
   high+=1

So, lets start this application up and see what happens.

# Start demo program
bash$ python fabric-demo.py

(u'John:0', u'Doe')
(u'John:1', u'Doe')
(u'John:2', u'Doe')
(u'John:3', u'Doe')
(u'John:4', u'Doe')
(u'John:5', u'Doe')
 ......

It's working, lets see how the load is spread.

Uptime: 9176  Threads: 5  Questions: 200842  Slow queries: 0  Opens: 74  Flush tables: 1  Open tables: 64  Queries per second avg: 21.887
ted@ted-Latitude-E6410:/opt/MySQL-fabric/src$ mysqladmin -uroot -proot -h127.0.0.1 -P63308 status
Uptime: 9178  Threads: 6  Questions: 211802  Slow queries: 0  Opens: 76  Flush tables: 1  Open tables: 65  Queries per second avg: 23.077
ted@ted-Latitude-E6410:/opt/MySQL-fabric/src$ mysqladmin -uroot -proot -h127.0.0.1 -P63309 status
Uptime: 9181  Threads: 5  Questions: 212836  Slow queries: 0  Opens: 74  Flush tables: 1  Open tables: 64  Queries per second avg: 23.182

All MySQL servers in the group are sharing the work load, that's great!

Switchover for regular maintenance

This is for when you need to take your primary offline and do regular maintenance, this could be anything from OS update or adding more disk to machine. All we need to do in  this case if to tell MySQL Fabric to promote a new primary, since we like control we can also tell Fabric wich server we want to elect as primary.


bash$ mysqlfabric group lookup_servers mygroup
Command :
{ success     = True
  return      = [['fae19070-9ea5-11e3-8d61-5c260a4a26ad', '127.0.0.1:63307', False, 'SECONDARY'], ['fae19b5f-9ea5-11e3-8d61-5c260a4a26ad', '127.0.0.1:63308', True, 'PRIMARY'], ['fae19b62-9ea5-11e3-8d61-5c260a4a26ad', '127.0.0.1:63309', False, 'SECONDARY']]
  activities  =
}


bash$ mysqlfabric group promote mygroup fae19070-9ea5-11e3-8d61-5c260a4a26ad
Procedure :
{ uuid        = 37e80ebe-9a1f-4d26-a647-d62ce6c49fd1,
  finished    = True,
  success     = True,
  return      = True,
  activities  =
}

bash$ mysqlfabric group lookup_servers mygroup
Command :
{ success     = True
  return      = [['fae19070-9ea5-11e3-8d61-5c260a4a26ad', '127.0.0.1:63307', True, 'PRIMARY'], ['fae19b5f-9ea5-11e3-8d61-5c260a4a26ad', '127.0.0.1:63308', False, 'SECONDARY'], ['fae19b62-9ea5-11e3-8d61-5c260a4a26ad', '127.0.0.1:63309', False, 'SECONDARY']]
  activities  =
}

First we ran lookup_servers to see the current state of Fabric, next we promote MySQL instance in port 63307 to be new primary and finally we verify that our changes went through. Most of the time this operation is not affecting the application but somethime we see that the "Database connection error, trying to reconnect ..." is triggered and application reconnects.

Failover in case of Primary failure

Now, the interesting part begins, let's see what happens when we kill our primary server in the group. First we need to find our what server is primary.


bash$ mysqlfabric group lookup_servers mygroup
Command :
{ success     = True
  return      = [['ef4593df-9e2b-11e3-8a45-5c260a4a26ad', '127.0.0.1:63307', True, 'PRIMARY'], ['ef49ce23-9e2b-11e3-8a45-5c260a4a26ad', '127.0.0.1:63309', False, 'SECONDARY'], ['ef49d0b6-9e2b-11e3-8a45-5c260a4a26ad', '127.0.0.1:63308', False, 'SECONDARY']]
  activities  =
}

So, MySQL instance running on port 63307 is elected as primary server for my group. Get pid for this instance and lets kill it!

bash$ pgrep -fl mysql | grep 63307
ted@ted-Latitude-E6410:/opt/MySQL-fabric$ pgrep -fl mysql | grep 63307
24762 .......

bash$ kill -9 24762

In the fabric log you will now see:
[INFO] 1393349679.578709 - FailureDetector(mygroup) - Server (ef4593df-9e2b-11e3-8a45-5c260a4a26ad) in group (mygroup) has been lost.
[INFO] 1393349679.578908 - FailureDetector(mygroup) - Master (ef4593df-9e2b-11e3-8a45-5c260a4a26ad) in group (mygroup) has been lost.

Application is still working, no hickups and nothing stopped, let's see how MySQL Fabric has reorganized itself.

bash$ mysqlfabric group lookup_servers mygroup
Command :
{ success     = True
  return      = [['ef4593df-9e2b-11e3-8a45-5c260a4a26ad', '127.0.0.1:63307', False, 'FAULTY'], ['ef49ce23-9e2b-11e3-8a45-5c260a4a26ad', '127.0.0.1:63309', True, 'PRIMARY'], ['ef49d0b6-9e2b-11e3-8a45-5c260a4a26ad', '127.0.0.1:63308', False, 'SECONDARY']]
  activities  =
}

MySQL instance on port 63307 is now considered as Faulty by Fabric and MySQL instance running on port 63309 took over as primary. Everything worked as expected. First time I forgot to run command to set group as active "mysqlfabric group activate mygroup", this meant no automatic failure detection and no new primary was elected so my python program died ...
If you don't want automatic failure detection or need to take your HA group offline you can always  deactivate the group, if you do this you need to run the promote command manually to get new primary if your current one dies.

Conclusion

I'm very positive by this first encounter with MySQL Fabric, I like the fact that it is using the connectors to handle failover, no need to have VIP address or DNS manipulation solution in front of MySQL to achieve failover, of course you can always invest in a load balancer but this is normally very costly and adds one more product to take care of.

As a future wish I hope we will have support for Fabric in all our connectors so all of you can leverage the usefulness of MySQL Fabric!

6 kommentarer:

  1. What Fabric aware drivers are you talking about specifically? You only show one.

    SvaraRadera
    Svar
    1. Hi Ulf,

      I'm sorry if I mislead you to believe I would cover more connectors in this series of blogs, the purpose of these 2 blogs was to see how MySQL Fabric could be used as a HA option.

      If you are referring to my conclusion it is only my humble wish that everyone could use fabric from any programming language. Personally I would like to see this Fabric aware connectors for Perl and C.

      Cheers,
      Ted

      Radera
  2. Great posts Ted.

    Regarding this sample application, I'd tried something similar and stumbled over a race condition that I think will be in your code too.

    You add the employee rows and then imediately read them back - not a problem with a standalone database. However, in this case the rows are being written to the master and then will be *asynchronously* replicated to the slaves. You've set the connection type to read only when you read the rows back which means that the reads could be sent to the slaves. There's a chance (and more likely if running the myslds on different machines) that the rows may not have been written to the slave database at that point.

    I've raised a bug report against the documentation for this.

    SvaraRadera
    Svar
    1. Hi Andrew,

      Good catch, I also spotted this problem when starting to test the code, I worked around this by adding a well placed sleep(0.5) in between insert into master and select from slave, perhaps not my proudest moment as a developer but it works. I guess this "sleep" should be tuned to latency to ship statement to slave instance + time for SQL thread to apply statement.

      Cheers,
      Ted

      Radera
  3. What happens if the fabric node goes down? Does this break HA?

    SvaraRadera
  4. It is not a problem if fabric node stops, all information from fabric is cached in connector layer but sooner or later you will need the fabric node and only option to make fabric data HA today is by storing it's data inside MySQL Cluster or other HA solution. Remember it's still early days for fabric and I'm pretty sure that for upcoming versions (RC and GA) of MySQL Fabric we will hopefully see a better/easier solution to make fabric node HA.

    SvaraRadera