Wednesday, 20 July 2016

Setting up HA with HAProxy and Keepalived in AWS

By default keepalived uses multicast to exchange the VRRP advertisements it relies on to decide which host is available. Cloud platforms such as AWS and Google Cloud Platform do not currently support multicast, so we must instruct keepalived to use unicast instead.

For this exercise there will be two HAProxy instances (a master and a slave node) that share an Elastic IP, with keepalived performing the switchover where necessary.

These two load balancers will then interact with two backend application servers, each of which in turn interacts with its own backend database server. The database servers have SQL replication set up between them.

On the master we should first ensure the system is up to date and that a suitable version of haproxy is installed (anything > 1.2.13):

yum update && yum install haproxy keepalived
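To confirm the installed version meets that minimum, something like the following could be used. This is only a minimal sketch: `version_gt` is a hypothetical helper name, the version string `1.5.18` is just an example value, and it assumes GNU `sort` with its `-V` (version sort) flag is available.

```shell
# Hypothetical helper: succeeds when version $1 is strictly greater than $2.
# Assumes GNU sort, whose -V flag orders dotted version strings field by field.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: compare an example version string against the minimum given above.
if version_gt "1.5.18" "1.2.13"; then
    echo "haproxy version is new enough"
fi
```

In practice the first argument would be taken from the output of `haproxy -v` rather than typed in by hand.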

Ensure both of them start on boot:

systemctl enable keepalived
systemctl enable haproxy

In a normal environment keepalived does a great job of automatically assigning the shared IP to the necessary host. Due to (static IP configuration) limitations within AWS this is not possible, so instead we instruct keepalived to run a script when a failover occurs. The script simply uses the AWS API to re-associate the Elastic IP from the master to the slave (or vice versa).

On the master we should replace the keepalived.conf configuration as follows:

sudo mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.orig
sudo vi /etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 5
    fall 2 # must fail twice before the check is marked as failed
    rise 2 # must succeed twice before the check is marked as passing
}

vrrp_instance VI_1 {
   debug 2
   interface eth0              
   state MASTER
   virtual_router_id 51        
   priority 101                
   unicast_src_ip 10.11.12.201    
   unicast_peer {
       5.6.7.8                
   }
   track_script {
       chk_haproxy
   }
   notify_master /usr/libexec/keepalived/failover.sh
}

and add the following on the slave node:

sudo cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.orig
sudo vi /etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
}

vrrp_instance VI_1 {
   debug 2
   interface eth0              
   state BACKUP
   virtual_router_id 51        
   priority 100                
   unicast_src_ip 10.11.13.202    
   unicast_peer {
       1.2.3.4                
   }
   track_script {
       chk_haproxy
   }
   notify_master /usr/libexec/keepalived/failover.sh
   notify_fault  /usr/libexec/keepalived/failover_fault.sh
}
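Since the two files are edited by hand on separate machines, it is easy for them to drift apart. As a rough sanity check, the sketch below compares the two settings that matter most for the election: both nodes need the same virtual_router_id, and the master should have the higher priority. The helper names `kv_get` and `check_pair` are hypothetical, and the check assumes each keyword appears once per file, as in the configs above.

```shell
# Hypothetical helper: print the value of a keyword found in a keepalived.conf.
# Usage: kv_get <keyword> <file>
kv_get() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical check: both nodes must share virtual_router_id, and the
# master's priority must be higher than the backup's.
# Usage: check_pair <master conf> <backup conf>
check_pair() {
    master_conf="$1"
    backup_conf="$2"
    if [ "$(kv_get virtual_router_id "$master_conf")" != "$(kv_get virtual_router_id "$backup_conf")" ]; then
        echo "virtual_router_id mismatch"
        return 1
    fi
    if [ "$(kv_get priority "$master_conf")" -le "$(kv_get priority "$backup_conf")" ]; then
        echo "master priority is not higher than backup priority"
        return 1
    fi
    echo "ok"
}
```

To use it, copy the slave's keepalived.conf to the master (any temporary path will do) and pass both files to `check_pair`.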

Now we will create the script referenced by 'notify_master'. Before we do this we should use AWS IAM to create and configure the relevant role for our servers so they are able to use the AWS CLI to switch the Elastic IPs.

I created a policy along these lines:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:AssignPrivateIpAddresses",
                "ec2:AssociateAddress",
                "ec2:DescribeInstances"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

* I would recommend specifying the resource explicitly to tighten this up.

Now we will create the script (on both nodes):

sudo vi /usr/libexec/keepalived/failover.sh
sudo chmod 700 /usr/libexec/keepalived/failover.sh

#!/bin/bash

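# NOTE: INSTANCE_ID and SECONDARY_PRIVATE_IP must be set to the values of
# the node this script is installed on (they differ between master and slave).
ALLOCATION_ID=eipalloc-123456
INSTANCE_ID=i-123456789
SECONDARY_PRIVATE_IP=172.30.0.101

/usr/bin/aws ec2 associate-address --allocation-id "$ALLOCATION_ID" --instance-id "$INSTANCE_ID" --private-ip-address "$SECONDARY_PRIVATE_IP" --allow-reassociation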
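The script above gives no feedback if the API call fails. A slightly more defensive variant might look something like the sketch below. The IDs are the same placeholders as above (and need to be this node's own values), while `DRY_RUN` and `AWS_BIN` are hypothetical variables I have added purely so the command can be inspected without touching AWS; they are not AWS CLI features.

```shell
#!/bin/bash
# Sketch of a more defensive failover.sh. The IDs are placeholders and must
# be replaced with this node's own values. DRY_RUN and AWS_BIN are made-up
# knobs for testing only.
ALLOCATION_ID="${ALLOCATION_ID:-eipalloc-123456}"
INSTANCE_ID="${INSTANCE_ID:-i-123456789}"
SECONDARY_PRIVATE_IP="${SECONDARY_PRIVATE_IP:-172.30.0.101}"
AWS_BIN="${AWS_BIN:-/usr/bin/aws}"

associate_eip() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$AWS_BIN ec2 associate-address --allocation-id $ALLOCATION_ID --instance-id $INSTANCE_ID --private-ip-address $SECONDARY_PRIVATE_IP --allow-reassociation"
        return 0
    fi
    if "$AWS_BIN" ec2 associate-address \
        --allocation-id "$ALLOCATION_ID" \
        --instance-id "$INSTANCE_ID" \
        --private-ip-address "$SECONDARY_PRIVATE_IP" \
        --allow-reassociation; then
        # Record the outcome in syslog so it shows up next to keepalived's logs.
        logger -t failover "associated $ALLOCATION_ID with $INSTANCE_ID"
    else
        logger -t failover "FAILED to associate $ALLOCATION_ID with $INSTANCE_ID"
        return 1
    fi
}
```

Adding a call to `associate_eip` as the final line turns this into a drop-in script. Running it once by hand with `DRY_RUN=1` set prints the command that would be executed.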

and then (on each node) configure the AWS CLI:

aws configure

For the networking side we will have a single interface on each node, although each of them will have a secondary IP (which we will associate with our Elastic IP). The IPs of the two machines will also be in separate subnets since they are spread across two availability zones.
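Before starting anything it is worth checking that the secondary IP is actually present on the interface, since depending on the distribution an address assigned in AWS may not be brought up inside the OS automatically. The sketch below parses the one-line output of `ip -4 -o addr show`. The helper names `list_ipv4` and `has_ipv4` are hypothetical, and the address shown is the placeholder from the failover script.

```shell
# Hypothetical helper: read `ip -4 -o addr show` output on stdin and print
# one bare IPv4 address per line (the 4th field is "address/prefix").
list_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1] }'
}

# Hypothetical helper: succeed if address $1 appears in the list on stdin.
has_ipv4() {
    list_ipv4 | grep -Fxq "$1"
}

# On a node this would be used as:
#   ip -4 -o addr show dev eth0 | has_ipv4 172.30.0.101 && echo "secondary IP present"
```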

We should now start keepalived and haproxy on both hosts:

sudo service keepalived start
sudo service haproxy start

** You WILL almost certainly come across problems with SELinux (if it's enabled) - ensure you check your audit.log for any related messages and resolve those problems before continuing! **
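To narrow the audit log down to the relevant denials, a small filter along these lines may help. This is a sketch: `avc_denials_for` is a hypothetical helper, and the process name to search for is an assumption (the denial may be logged against keepalived itself, the failover script, or the aws command it runs).

```shell
# Hypothetical helper: read audit log lines on stdin and print only the AVC
# denials whose comm= field matches the given process name.
avc_denials_for() {
    grep -E 'avc: +denied' | grep -F "comm=\"$1\""
}

# On a node (as root) this would be used as:
#   avc_denials_for keepalived < /var/log/audit/audit.log
```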

We should now see the following on the master node:

tail -f /var/log/messages

Jul 18 15:34:40 localhost Keepalived_vrrp[27585]: VRRP_Script(chk_haproxy) succeeded
Jul 18 15:34:40 localhost Keepalived_vrrp[27585]: Kernel is reporting: interface eth0 UP
Jul 18 15:34:40 localhost Keepalived_vrrp[27585]: VRRP_Instance(VI_1) Transition to MASTER STATE
Jul 18 15:34:41 localhost Keepalived_vrrp[27585]: VRRP_Instance(VI_1) Entering MASTER STATE
Jul 18 15:34:59 localhost Keepalived_vrrp[27585]: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Jul 18 15:34:59 localhost Keepalived_vrrp[27585]: VRRP_Instance(VI_1) Received lower prio advert, forcing new election

and then on the slave node:

tail -f /var/log/messages

Jul 18 15:34:54 localhost Keepalived_vrrp[27641]: VRRP_Script(chk_haproxy) succeeded
Jul 18 15:34:55 localhost Keepalived_vrrp[27641]: Kernel is reporting: interface eth0 UP
Jul 18 15:34:59 localhost Keepalived_vrrp[27641]: VRRP_Instance(VI_1) Transition to MASTER STATE
Jul 18 15:34:59 localhost Keepalived_vrrp[27641]: VRRP_Instance(VI_1) Received higher prio advert
Jul 18 15:34:59 localhost Keepalived_vrrp[27641]: VRRP_Instance(VI_1) Entering BACKUP STATE
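Rather than reading through the log by eye, the current role of a node can be pulled out of it with a short filter. This is a sketch that relies on the "Entering ... STATE" message format shown in the output above; `last_vrrp_state` is a hypothetical helper name.

```shell
# Hypothetical helper: read syslog lines on stdin and print the most recent
# VRRP state (MASTER, BACKUP or FAULT) that keepalived reported entering.
last_vrrp_state() {
    grep -Eo 'Entering (MASTER|BACKUP|FAULT) STATE' | tail -n 1 | awk '{ print $2 }'
}

# On a node this would be used as:
#   last_vrrp_state < /var/log/messages
```

Fed the two log excerpts above, it should report MASTER on the master node and BACKUP on the slave.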

We will now configure the HAProxy portion by replacing the existing haproxy config:

mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig

vi /etc/haproxy/haproxy.cfg

global
    daemon
    maxconn 4000
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m
    user haproxy
    group haproxy

defaults
    log     global
    mode    tcp
    option  tcplog
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend www
    bind 172.30.0.241:80
    default_backend webserver_pool

backend webserver_pool
    balance roundrobin
    mode http
    option httplog
    option  httpchk    GET /someService/isAlive
    server  serverA 10.11.12.13:8080 check inter 5000 downinter 500    # active node
    server  serverB 10.12.13.14:8080 check inter 5000 backup           # passive node

listen admin
    bind 172.30.0.241:8777
    mode http                 # stats pages require HTTP mode (defaults sets mode tcp)
    stats enable
    stats realm   Haproxy\ Statistics
    stats auth    adminuser:secure_pa$$word!
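The `option httpchk` line in webserver_pool means a backend is only considered up while its isAlive URL answers with a 2xx or 3xx status. To see roughly what HAProxy will see, the same URL can be requested by hand. The sketch below is an approximation (curl will not send exactly the same request as HAProxy's checker), and `status_is_healthy` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when an HTTP status code would count as a
# passing check (2xx or 3xx, the default rule for "option httpchk").
status_is_healthy() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# On a node this would be used as:
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://10.11.12.13:8080/someService/isAlive)
#   status_is_healthy "$code" && echo "serverA would pass the check"
```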

Finally reload both haproxy instances to apply the new configuration.
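It is safer to have haproxy validate the file before reloading, so that a typo does not take the load balancer down. A minimal sketch, assuming the haproxy service is managed by systemd as set up earlier: `safe_reload` is a hypothetical wrapper, and `HAPROXY_BIN` and `RELOAD_CMD` are made-up override variables that exist only to make it easy to try out.

```shell
# Hypothetical wrapper: validate the configuration with "haproxy -c", then
# reload only if the check passes.
safe_reload() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"
    if "${HAPROXY_BIN:-haproxy}" -c -f "$cfg" >/dev/null; then
        # Left unquoted on purpose so the command string splits into words.
        ${RELOAD_CMD:-systemctl reload haproxy}
    else
        echo "haproxy config check failed, not reloading" >&2
        return 1
    fi
}

# On a node (as root) this would be used as:
#   safe_reload /etc/haproxy/haproxy.cfg
```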
