Asterisk / Linux Contingency
What will you do when things go wrong?
Options for disaster recovery
► Highly redundant server system
► Disk level backups
► High Availability Cluster
Each has its ups and downs:
► A redundant server system with RAID and redundant power costs $$$$, but still presents a single point of failure (SPOF).
► Backups are mandatory for any enterprise-class system, but still don't guarantee uptime.
► High Availability across 2 or more low-cost servers is much more cost effective and greatly reduces exposure to a single point of failure.
HA Alternatives
► Redundant server systems should have at least RAID1 disk mirroring and dual redundant power supplies; such systems start in the $2500 range.
► Disk level backups: simple file backups are OK, but require that you first reinstall the OS and patches before restoring files. An imaging server that captures a complete disk image is preferred. MondoArchive is the only disk level imaging system that does not require a dedicated imaging server and can run without shutting down the PBX (note that Mondo images are hardware specific).
Backup Design
Optimally, backups should be stored offsite and/or on multiple media, so that a single location-related disaster cannot destroy every copy.
On to HA
► High availability has 2 major components:
► The heartbeat system (notifies the servers of an outage)
► The data synchronization system (syncs data between the 2 servers)
► This presentation uses open source Linux-HA (http://www.linux-ha.org/) and DRBD (Distributed Replicated Block Device). DRBD is like RAID1 mirroring, except that one hard drive is in one server and the other is across the network in another server.
► HA starts and stops services in the event of a failure or a manual shutdown.
► DRBD mirrors all data across the network in real time. In this model we assume that only one copy of the mirror is live at any given time; in the event of a failure, the other copy comes live.
► Simple rsync+cron could be used instead of DRBD, but it is not as fast or efficient: it only copies data each time cron runs it, so changes made since the last run are lost on failover.
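For comparison, the rsync+cron alternative could be as small as the cron file below. This is a sketch under assumptions: a standby node reachable as node2 (a hypothetical hostname), passwordless root SSH keys from the live node to node2, and only the Asterisk directories being copied.

```bash
# Sketch: one-way copy from the live node to the standby node every 5 minutes.
# Assumptions: hostname "node2", root SSH keys already exchanged,
# example directory list only.
cat > /etc/cron.d/asterisk-rsync <<'EOF'
# min hour dom mon dow user command
*/5 * * * * root rsync -a --delete /etc/asterisk/ node2:/etc/asterisk/
*/5 * * * * root rsync -a --delete /var/lib/asterisk/ node2:/var/lib/asterisk/
EOF
```

Because of --delete, a file removed on the live node also disappears from node2 at the next run, and anything written between runs is lost if the live node dies, which is the gap DRBD closes.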
DRBD HA Diagram
HA can be set up in several ways; this document uses the following layout because of its simplicity and effectiveness:
Recommended HA physical layout
The previous slide describes the physical layout quite well.
► A 2 node cluster is best for simple failover needs.
► The nodes should be connected to each other on a separate subnet, using gigabit NICs and a crossover cable (no switch), for heartbeat and data sync.
► Each node should have its own dedicated UPS and, if possible, its own dedicated circuit breaker.
► It is even better if the nodes are in 2 separate rooms, or even separate buildings. It is best to keep both on the same local LAN, but a WAN link can work if its speed permits.
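On CentOS, the dedicated crossover link could be configured as below. The interface name eth1 and the 10.0.0.0/24 subnet are assumptions; pick a range that is not used anywhere else on your LAN, and give the second node 10.0.0.2.

```bash
# Sketch for node1: static address on the crossover NIC. No gateway is set,
# since this link only carries heartbeat and DRBD traffic between the 2 nodes.
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 <<'EOF'
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
EOF
ifup eth1
```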
What Can HA Clustering Do For You
HA Details
► HA sends heartbeat signals back and forth between 2 (or more) servers.
► If a failure occurs, you can detect it in milliseconds and the standby machine can take over in 5 seconds or less (as long as your network can provide the communication speed).
► Each server has its own IP address plus a floating IP that is controlled by the HA service; the floating IP is used only on the current live system.
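The heartbeat timers and node list live in /etc/ha.d/ha.cf. A minimal HA v1 sketch follows; the node names node1/node2 (which must match the output of uname -n on each server), the crossover interface eth1 and the peer address are assumptions, and the timers are example values chosen to fit the 5 second takeover above rather than tested recommendations.

```bash
# Sketch of /etc/ha.d/ha.cf for node1 (HA v1 style).
# On node2, point ucast at node1's crossover address (10.0.0.1) instead.
cat > /etc/ha.d/ha.cf <<'EOF'
logfile /var/log/ha-log
# seconds between heartbeats
keepalive 1
# log a warning after 3 seconds without a heartbeat
warntime 3
# declare the peer dead after 5 seconds without a heartbeat
deadtime 5
# allow extra time for the peer to appear right after boot
initdead 60
udpport 694
# send heartbeats over the dedicated crossover link
ucast eth1 10.0.0.2
# do not move resources back automatically when the old master returns
auto_failback off
node node1
node node2
EOF
```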
HA Installation
► It is recommended that you install all the HA and DRBD packages first, then use the sample config files here: http://www.astusers.org/ha
► On CentOS (Red Hat) based distros (e.g. trixbox) you can use yum to install the HA package:
yum install -y heartbeat
DRBD Config
► Unfortunately DRBD is not quite as simple to install, due to package availability* and partitioning. You will need to either:
► a: repartition your drive to make room for the DRBD partition, as shown on pages 33-36 of the partitioning notes listed in the appendix**, or
► b: install a 2nd drive dedicated to the DRBD partition (easiest).
► If your Linux distro maintains up-to-date packages*, you can use yum to install DRBD; unfortunately this is usually not the case.
The 3 components of a DRBD install:
► DRBD binary (uses /etc/drbd.conf)
► DRBD kernel module (version specific to your kernel)
► DRBD Links (builds links from your file system to the relocated files on the DRBD disk)
The following works on CentOS 5.1 with kernel 2.6.18-53.1.4.el5:
yum install drbd
rpm -ihv http://ubuntu.nad.go.id/repo/apt-centos5/bleeding/drbd-kmdl-2.6.18-53.1.4.el5-8.2.5-21.el5.i686.rpm
rpm -ihv ftp://ftp.tummy.com/pub/tummy/drbdlinks/drbdlinks-1.11-1.noarch.rpm
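Because the kernel module package must match the running kernel exactly, a quick check before installing can save a failed module load. The sketch below (kmdl_matches_kernel is a hypothetical helper name) only compares the kernel release embedded in the package file name with a kernel release string, by default the output of uname -r; it assumes the drbd-kmdl-<kernel>-<drbd version> naming used above.

```bash
# kmdl_matches_kernel PACKAGE_FILE_OR_URL [KERNEL_RELEASE]
# Returns 0 if the drbd-kmdl package name embeds KERNEL_RELEASE
# (defaults to the running kernel from uname -r), 1 otherwise.
kmdl_matches_kernel() {
    local pkg kernel
    pkg=$(basename "$1")              # strip any URL or directory prefix
    kernel=${2:-$(uname -r)}
    case "$pkg" in
        drbd-kmdl-"$kernel"-*) return 0 ;;   # quoted part matches literally
        *) return 1 ;;
    esac
}
```

For example, with the package above it succeeds only where uname -r prints 2.6.18-53.1.4.el5; a PAE or later kernel needs its own kmdl package.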
► Recommendation: install your OS/distro first, then install a 2nd hard drive for DRBD.
► Recommendation: once all the components are installed, download my HA/DRBD config files: http://www.astusers.org/ha
*Find a complete package or compile from source from http://oss.linbit.com/
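Before running drbdadm create-md share, /etc/drbd.conf must define the share resource identically on both nodes. A sketch in DRBD 8.2 syntax follows; the hostnames node1/node2 (which must match uname -n), the backing partition /dev/sdb1 on the 2nd drive, the crossover addresses and port 7788 are all assumptions.

```bash
# Sketch of /etc/drbd.conf, identical on both nodes (DRBD 8.2 syntax).
cat > /etc/drbd.conf <<'EOF'
global { usage-count no; }

resource share {
  # protocol C: a write completes only once it is on both disks
  protocol C;
  # limit the resync rate so it does not saturate the link
  syncer { rate 30M; }

  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
EOF
```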
DRBD Config Continued
# On both nodes (servers); "share" is the name of your resource in /etc/drbd.conf:
drbdadm create-md share
# Then, still on both nodes, bring the resource up (attach the disk and connect to the peer):
drbdadm up share
# Now on the primary node only:
drbdadm -- --overwrite-data-of-peer primary all
# This may take a LONG time to run (in the background); check progress by typing:
watch cat /proc/drbd
# Finally, on the primary node, format the drbd0 partition with a file system:
mkfs -t ext3 /dev/drbd0
# On the secondary node, the sync from the primary runs by itself once both nodes are up.
# If cat /proc/drbd there shows its disk as Diskless, attach it with:
drbdadm attach share
# This may take a LONG time to finish (in the background); when it is done,
cat /proc/drbd
# should tell you "ds:UpToDate/UpToDate"
# On the primary, mount the new file system under the new "share" folder to test:
mkdir /share
mount /dev/drbd0 /share
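Rather than watching /proc/drbd by hand, the wait can be scripted. The sketch below (drbd_wait_synced and its default timeout are hypothetical) polls the status file until the ds: field reads UpToDate/UpToDate; it assumes a single DRBD resource, as in this setup, and takes the status file as an optional argument so it can be tried against a saved copy.

```bash
# drbd_wait_synced [STATUS_FILE] [TIMEOUT_SECONDS]
# Polls STATUS_FILE (default /proc/drbd) every 5 seconds until it reports
# "ds:UpToDate/UpToDate". Returns 0 when synced, 1 on timeout.
# Assumes a single DRBD resource.
drbd_wait_synced() {
    local status=${1:-/proc/drbd}
    local timeout=${2:-86400}
    local waited=0
    while ! grep -q 'ds:UpToDate/UpToDate' "$status" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}
```

Typical use on the primary: drbd_wait_synced && echo "both disks are UpToDate".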
Final Heartbeat config
On both servers do the following:
► Stop all services that need failover and set them to manual start.
► Set the Heartbeat service to start automatically.
► Use tar to copy all service-specific config files to the DRBD partition (this only needs to be done on the current master server).
► Add those files and folders to /etc/drbdlinks.conf to automatically build links to the DRBD partition.
► Remove amportal from /etc/rc.local, and build a new amportal script that is HA compliant.
► Edit /etc/ha.d/ha.cf, haresources and authkeys, as well as /etc/drbd.conf, to meet your needs.
► Finally, edit /etc/my.cnf (in the [mysqld] section):
datadir=/share/var/lib/mysql
socket=/share/var/lib/mysql/mysql.sock
and set the same socket= line in a [client] section so that local MySQL clients can still find the server.
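The steps above might look roughly like the sketch below on a trixbox node. Everything specific is an assumption to adapt: the directory list, the floating IP 192.168.1.100 on eth0, the preferred node name node1, the shared secret, and the wrapper name amportal-ha (a hypothetical script, not part of trixbox). It also assumes amportal lives in /usr/sbin, that /dev/drbd0 is mounted on /share on the current master, and that the DRBD and drbdlinks packages installed their heartbeat resource scripts (drbddisk and drbdlinks).

```bash
# --- Both nodes: heartbeat, not the boot scripts, starts these services ---
chkconfig mysqld off
chkconfig heartbeat on
# (also remove the amportal line from /etc/rc.local)

# --- Current master only, services stopped, /dev/drbd0 mounted on /share ---
amportal stop
service mysqld stop
# tar keeps the original paths, e.g. /etc/asterisk -> /share/etc/asterisk
tar -C / -cf - etc/asterisk var/lib/asterisk var/spool/asterisk var/lib/mysql \
    | tar -C /share -xpf -

# --- Both nodes: paths drbdlinks replaces with links into /share ---
# (MySQL is pointed at /share directly via /etc/my.cnf instead)
cat > /etc/drbdlinks.conf <<'EOF'
mountpoint('/share')
link('/etc/asterisk')
link('/var/lib/asterisk')
link('/var/spool/asterisk')
EOF

# --- Both nodes: minimal HA-compliant wrapper around amportal ---
cat > /etc/ha.d/resource.d/amportal-ha <<'EOF'
#!/bin/bash
case "$1" in
  start)  /usr/sbin/amportal start ;;
  stop)   /usr/sbin/amportal stop ;;
  status) if pidof asterisk >/dev/null; then echo "running"; else echo "stopped"; exit 3; fi ;;
  *)      echo "Usage: $0 {start|stop|status}"; exit 1 ;;
esac
EOF
chmod 755 /etc/ha.d/resource.d/amportal-ha

# --- Both nodes: resources start left to right on node1, stop right to left ---
cat > /etc/ha.d/haresources <<'EOF'
node1 drbddisk::share Filesystem::/dev/drbd0::/share::ext3 drbdlinks IPaddr::192.168.1.100/24/eth0 mysqld amportal-ha
EOF

# --- Both nodes: shared secret; heartbeat expects this file to be root-only ---
cat > /etc/ha.d/authkeys <<'EOF'
auth 1
1 sha1 ChangeThisSecret
EOF
chmod 600 /etc/ha.d/authkeys
```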
Caveats of HA
► You want to avoid having 2 primary nodes: if both nodes are still up but fail to see each other, they will both become primaries and the data will fall out of sync (known as "Split Brain").
► Use ipfail and ping/ping_group (or pingd under HA v2) in your ha.cf to minimize this possibility.
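A sketch of the matching ha.cf additions, appended on both nodes. The ping target (a hypothetical 192.168.1.1, typically your LAN router) is an assumption, and on 64-bit installs ipfail may live under /usr/lib64/heartbeat rather than /usr/lib/heartbeat.

```bash
# Append to /etc/ha.d/ha.cf on both nodes.
cat >> /etc/ha.d/ha.cf <<'EOF'
# a "ping node" that both servers should always be able to reach
ping 192.168.1.1
# alternative: the group counts as up if any member answers
# ping_group lan 192.168.1.1 192.168.1.2
# ipfail moves resources to the node with better connectivity to the ping nodes
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
EOF
```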
► If you mess something up (delete a file, change a user), the mistake is instantly synchronized to both servers, so regular backups to offsite or removable media should still be used.
► Normal RAID1 in each server is still recommended to increase uptime, but not required; without it, a failed disk forces a failover, which disconnects all active calls.
► Supports automatic failover of SIP, T1/PRI trunks (using Redfone TDMoE hardware) or analog lines wired in parallel.
► So far IAX won't use a floating IP; it must use the real IP of the primary server (I haven't really investigated this yet).
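Because DRBD faithfully mirrors mistakes, a dated archive of the shared partition is a cheap extra safety net. The sketch below is one way to do it (backup_share, the /share source and the /mnt/backup destination are assumptions); point the destination at removable or offsite storage and run it from cron on the live node only, since /share is mounted only there.

```bash
# backup_share [SOURCE_DIR] [DEST_DIR]
# Writes DEST_DIR/share-YYYYmmdd-HHMMSS.tar.gz containing SOURCE_DIR
# (default /share) and prints the archive path. Returns 1 on failure.
backup_share() {
    local src=${1:-/share}
    local dest=${2:-/mnt/backup}
    local archive
    archive="$dest/share-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$dest" || return 1
    tar -C "$src" -czf "$archive" . || return 1
    echo "$archive"
}
```

MySQL data files copied while mysqld is running may not be consistent; dumping the databases first (for example with mysqldump) gives a safer copy.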
Resolving Failover Problems
Resolving "unclean" failovers in which your data becomes out of sync between 2 primary nodes (aka "Split Brain"):
# Check both nodes to confirm that both are in StandAlone status; run:
cat /proc/drbd
# Stop the Heartbeat services on both nodes:
service heartbeat stop
# One of the nodes must discard its data and allow the other to overwrite it.
# On the node to discard, run (assuming "share" is your DRBD resource name):
drbdadm secondary share
drbdadm -- --discard-my-data connect share
# On the other node (the split brain survivor, i.e. the data you wish to keep),
# if its connection state is also StandAlone, run:
drbdadm connect share
# Allow the 2 file systems to sync up; check progress with:
watch cat /proc/drbd
# Then it is recommended to reboot the 2 systems and let the HA services once again
# manage the cluster and file systems. If HA seems to be having further problems,
# restart the HA services and run:
tail -f /var/log/ha-log
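The first step, checking the connection state on each node, can be scripted with a small helper (drbd_cstate is a hypothetical name) that pulls the cs: field out of /proc/drbd. It assumes a single resource and the DRBD 8 status format shown by cat /proc/drbd; the status file is an optional argument so it can be checked against a saved copy.

```bash
# drbd_cstate [STATUS_FILE]
# Prints the connection state (e.g. Connected, StandAlone, WFConnection)
# of the first resource listed in STATUS_FILE (default /proc/drbd).
drbd_cstate() {
    sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p' "${1:-/proc/drbd}" | head -n 1
}
```

On a healthy pair it prints Connected; StandAlone on both nodes is the symptom described above.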
HA Advanced config (HA v2)
► This document cannot cover HA v2 in depth; it uses HA v1, which is far simpler.
► HA v1 uses 2 simple files (/etc/ha.d/ha.cf + haresources) to configure and manage the cluster.
► HA v2 uses a complex XML config database (/var/lib/heartbeat/crm/cib.xml) that offers many advanced options, most importantly resource monitoring rather than simple server heartbeat monitoring.
If your service (e.g. asterisk) provides proper "status" information, HA can monitor that status, and can do so for several different services. However, if the service cannot report failures through monitoring/status queries, you must build that capability yourself with a script (an OCF Resource Agent).
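For reference, an OCF Resource Agent is a script that answers a set of actions (start, stop, monitor, meta-data) with standard OCF exit codes. The sketch below is minimal and hedged: the agent name asterisk-ha, starting through /usr/sbin/amportal, and using asterisk -rx "core show version" (Asterisk 1.4 CLI syntax) as the health check are all assumptions. A real agent would normally source the ocf-shellfuncs helpers instead of defining the exit codes itself.

```bash
#!/bin/bash
# Minimal OCF-style resource agent sketch for Asterisk (HA v2).
# Exit codes defined by the OCF resource agent API:
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

asterisk_monitor() {
    # No process at all: report "cleanly stopped".
    pidof asterisk >/dev/null 2>&1 || return $OCF_NOT_RUNNING
    # Process exists: make sure it still answers on its CLI.
    asterisk -rx "core show version" >/dev/null 2>&1 || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

asterisk_start() {
    asterisk_monitor && return $OCF_SUCCESS
    /usr/sbin/amportal start >/dev/null 2>&1
    # give Asterisk up to 10 seconds to start answering
    local i
    for i in 1 2 3 4 5 6 7 8 9 10; do
        asterisk_monitor && return $OCF_SUCCESS
        sleep 1
    done
    return $OCF_ERR_GENERIC
}

asterisk_stop() {
    /usr/sbin/amportal stop >/dev/null 2>&1
    # stop must also succeed when the resource was already stopped
    asterisk_monitor
    [ $? -eq $OCF_NOT_RUNNING ] && return $OCF_SUCCESS
    return $OCF_ERR_GENERIC
}

asterisk_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="asterisk-ha">
  <version>0.1</version>
  <shortdesc lang="en">Asterisk via amportal (sketch)</shortdesc>
  <longdesc lang="en">Starts, stops and monitors Asterisk using amportal.</longdesc>
  <parameters/>
  <actions>
    <action name="start" timeout="60"/>
    <action name="stop" timeout="60"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
}

# Dispatch only when called with an action (the cluster always passes one).
if [ $# -gt 0 ]; then
    case "$1" in
        start)     asterisk_start ;;
        stop)      asterisk_stop ;;
        monitor)   asterisk_monitor ;;
        meta-data) asterisk_meta_data; exit $OCF_SUCCESS ;;
        *)         exit $OCF_ERR_UNIMPLEMENTED ;;
    esac
    exit $?
fi
```

Separating "not running" (7) from "running but not answering" (1) is what lets the cluster tell a clean stop from a hung Asterisk, which plain HA v1 heartbeats cannot see.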
Some Linux-HA Terminology
Key Linux-HA Processes (r2)
Linux-HA Architecture (r2)
Appendix & Notes
Good references:
► http://www.trixbox.org/forums/trixbox-forums/open-discussion/ha-cluster << decent guide
► http://www.danielaliaman.com/blog/files/phonecube/cluster/AsteriskCluster.pdf << notes on partitions**
► http://www.linux-ha.org/DRBD
► http://www.linux-ha.org/DRBD/HowTov2
► http://www.drbd.org/users-guide/
► http://wiki.centos.org/HowTos/Ha-Drbd
► http://www.voip-info.org/wiki/view/Asterisk+High+Availability+Solutions
► http://www.tummy.com/Community/software/drbdlinks/
► ftp://ftp.tummy.com/pub/tummy/drbdlinks/
► http://www.linux-ha.org/_cache/HeartbeatTutorials__LCA2007-tutorial.pdf
► http://www.red-fone.com/assets/documents/Trixbox_FB2_Heartbeat_Tutorial.pdf << trixbox + Redfone
► http://support.red-fone.com/downloads/tools/HA_Whitepaper.pdf
Credits
► Major credit goes to Alan Robertson for material taken from his extensive HA guide.
► Asterisk, Digium and the Asterisk logo are registered trademarks of Digium Corporation. DRBD is a registered trademark of Linbit. Linux is a registered trademark of Linus Torvalds. Redfone and Fonebridge are registered trademarks of Redfone Communications LLC. trixbox is a registered trademark of Fonality Inc. All other trademarks are property of their respective owners.
► Finally, me: this presentation was organized by John Hyde. This document is copyright Simple Technologies under GPLv3; if you modify it, please contribute your modifications back.
Please check back at www.astusers.org/ha for future additions to this document.