Thursday, February 24, 2011

Force Replication Between All Active Directory Servers

Occasionally, I have to troubleshoot Active Directory issues between branch offices, and I can never remember all of the resync arguments for the repadmin.exe command, so I'm posting them here.

repadmin /syncall /A /e /P

This forces the DC you run it on to synchronize all naming contexts (NCs) it holds with its replication partners.
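For reference, here's how the switches break down, plus repadmin /replsummary (another standard repadmin option) as a quick follow-up health check:

rem /A - sync all naming contexts (partitions) held by the home DC
rem /e - include replication partners in all sites (enterprise), not just the local site
rem /P - push changes outward from the home DC
repadmin /syncall /A /e /P

rem summarize replication health across DCs afterwards
repadmin /replsummary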

You should see output like this, repeated for each NC in your domain:

Syncing all NC's held on ATLAS.
Syncing partition: DC=ForestDnsZones,DC=my,DC=corp,DC=com
CALLBACK MESSAGE: The following replication is in progress:
    From: c2fa9a13-bc15-419c-b416-21e6da3d5760._msdcs.my.corp.com
    To  : cee785b6-01fe-490c-8e50-5199841a1b58._msdcs.my.corp.com
CALLBACK MESSAGE: The following replication is in progress:
    From: c2fa9a13-bc15-419c-b416-21e6da3d5760._msdcs.my.corp.com
    To  : 62aa2e39-9c52-4eef-a789-f201350c0b02._msdcs.my.corp.com
CALLBACK MESSAGE: The following replication completed successfully:
    From: c2fa9a13-bc15-419c-b416-21e6da3d5760._msdcs.my.corp.com
    To  : cee785b6-01fe-490c-8e50-5199841a1b58._msdcs.my.corp.com
CALLBACK MESSAGE: The following replication completed successfully:
    From: c2fa9a13-bc15-419c-b416-21e6da3d5760._msdcs.my.corp.com
    To  : 62aa2e39-9c52-4eef-a789-f201350c0b02._msdcs.my.corp.com
CALLBACK MESSAGE: SyncAll Finished.
SyncAll terminated with no errors.

How to automatically connect Windows 7 or 2008 R2 VPN on startup

Do you have a Windows 7 or 2008 R2 machine that needs to automatically connect to a VPN? Here are some instructions on configuring the Task Scheduler to do this for you.

My thanks to RpCahoon for providing his helpful post on Microsoft's Social Answers site. I'm also giving Microsoft a nod for doing such a thorough job with the modern Task Scheduler.

Instructions
  1. Open Task Scheduler
    Start > All Programs > Accessories > System Tools > Task Scheduler
  2. Click "Create Task" in the Actions pane on the right
  3. General Tab
    1. Provide a logical name for the task like "Auto VPN"
    2. Switch the security option to "Run whether user is logged on or not"
    3. Enable the Run with highest privileges option
    4. Change the "Configure for:" drop-down to Windows 7, Windows Server 2008 R2
  4. Triggers Tab
    1. Click the "New..." button
    2. Change "Begin the task:" to At start up
    3. (Optional) Enable "Delay task for" and set it to 5 minutes. This gives the machine a chance to settle down before launching the VPN.
  5. Actions Tab
    1. Click the "New..." button
    2. Enter c:\windows\system32\rasdial.exe in the "Program/script:" field. You can also browse to it if you don't want to type it or if your Windows install directory is different.
    3. Type the connection name in the "Add arguments" field. rasdial.exe requires you to wrap the connection name in quotes if it contains spaces, and you may also need to append the connection's username and password if they are required (see the example after these steps).
  6. Conditions Tab
    1. Un-check all of the options on the conditions tab.
  7. Settings Tab
    1. (Optional) Enable "If the task fails, restart every:" and set it to an appropriate value. I set mine to 1 hour in case there is a problem on the VPN server's end.
    2. (Optional) Set the "Attempt to restart up to:" value to an acceptable number. My default is 72 times at a 1-hour interval, which covers a long weekend.
  8. Save the new task
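For example, if the VPN connection is named "Office VPN" (the connection name and credentials below are placeholders), the action's fields would look like this:

Program/script:  c:\windows\system32\rasdial.exe
Add arguments:   "Office VPN" vpnuser MyVpnPassw0rd

When testing, rasdial.exe with no arguments lists the currently connected entries, and rasdial "Office VPN" /disconnect hangs up the connection.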

Friday, February 18, 2011

Get those FSYNC numbers up on your ZFS pool

For the last week, I've been trying to figure out why our 10-drive ZFS zpool has been delivering such lousy NFS performance to our Proxmox KVM cluster.

Here's what pveperf was returning:
pveperf /mnt/pve/kvm-images/
CPU BOGOMIPS:      76608.87
REGEX/SECOND:      896132
HD SIZE:           7977.14 GB (xxx.xxx.xxx.xxx:/volumes/vol0/kvm-images)
FSYNCS/SECOND:     23.15
DNS EXT:           58.84 ms
DNS INT:           1.50 ms (my.company.com)

The zpool looked like this:
zpool status vol0
  pool: vol0
 state: ONLINE
 scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        vol0                       ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c0t5000C50010377B5Bd0  ONLINE       0     0     0
            c0t5000C5001037C317d0  ONLINE       0     0     0
            c0t5000C5001037EED7d0  ONLINE       0     0     0
            c0t5000C50010381737d0  ONLINE       0     0     0
            c0t5000C50010381BBBd0  ONLINE       0     0     0
            c0t5000C50010382777d0  ONLINE       0     0     0
            c0t5000C5001038291Fd0  ONLINE       0     0     0
            c0t5000C500103870A3d0  ONLINE       0     0     0
            c0t5000C500103871C3d0  ONLINE       0     0     0
            c0t5000C500103924E3d0  ONLINE       0     0     0
            c0t5000C500103941F7d0  ONLINE       0     0     0
        cache
          c0t50015179591D9AEFd0    ONLINE       0     0     0
          c0t50015179591DACA1d0    ONLINE       0     0     0
          c1t2d0                   ONLINE       0     0     0
        spares
          c0t5000C50010395057d0    AVAIL   

errors: No known data errors

Raw write speed wasn't a problem. Tests copying DVD ISO files were super fast over the 10G network backbone. But the performance of creating new files and folders really hurt. This was very apparent when I started running bonnie++ on the NFS shares from the Proxmox nodes. Bonnie++ zipped along until it started its "Create files..." tests, at which point the Linux client would practically lock up.

A little Google searching on ZFS keywords later, I came across Joe Little's blog post, ZFS Log Devices: A Review of the DDRdrive X1. This got me thinking about my zpool setup. Looking at the configuration again, I realized I'd made a mistake and added the second Intel X25-M SSD to the cache devices instead of the log device. :)

Thanks to ZFS awesomeness, it was really easy to pull the SSD out of the cache and designate it as a log device. No downtime for the production system and no weird weekend hours wasted staring at a glowing terminal console.
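The change boils down to something like the following two commands, using the device name from my pool listing (adjust the pool and device names for your setup):

zpool remove vol0 c1t2d0
zpool add vol0 log c1t2d0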

Oh man, did that make a difference in performance.

Here's what the reconfigured vol0 zpool looks like:
zpool status vol0
  pool: vol0
 state: ONLINE
 scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        vol0                       ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c0t5000C50010377B5Bd0  ONLINE       0     0     0
            c0t5000C5001037C317d0  ONLINE       0     0     0
            c0t5000C5001037EED7d0  ONLINE       0     0     0
            c0t5000C50010381737d0  ONLINE       0     0     0
            c0t5000C50010381BBBd0  ONLINE       0     0     0
            c0t5000C50010382777d0  ONLINE       0     0     0
            c0t5000C5001038291Fd0  ONLINE       0     0     0
            c0t5000C500103870A3d0  ONLINE       0     0     0
            c0t5000C500103871C3d0  ONLINE       0     0     0
            c0t5000C500103924E3d0  ONLINE       0     0     0
            c0t5000C500103941F7d0  ONLINE       0     0     0
        logs
          c1t2d0                   ONLINE       0     0     0
        cache
          c0t50015179591D9AEFd0    ONLINE       0     0     0
          c0t50015179591DACA1d0    ONLINE       0     0     0
        spares
          c0t5000C50010395057d0    AVAIL   

errors: No known data errors

Now ZFS can properly service all of the fsync requests coming from the Linux clients. Check out the improvement in the Proxmox performance test:

pveperf /mnt/pve/kvm-images/
CPU BOGOMIPS:      76608.87
REGEX/SECOND:      896132
HD SIZE:           7977.14 GB (xxx.xxx.xxx.xxx:/volumes/vol0/kvm-images)
FSYNCS/SECOND:     1403.21
DNS EXT:           58.84 ms
DNS INT:           1.50 ms (my.company.com)

Get your CentOS 5.5 mouse to behave as a Linux KVM guest

Just spent 30 minutes trying to figure out why CentOS 5.5 wasn't playing nice with QEMU/KVM's USB tablet emulator.

All you need to do is edit xorg.conf, old-school style. My thanks to dyasny for posting his xorg.conf snippet.

Here's a copy of my working configuration.
# Xorg configuration created by pyxf86config

Section "ServerLayout"
        Identifier     "Default Layout"
        Screen      0  "Screen0" 0 0
        InputDevice    "Keyboard0" "CoreKeyboard"
        InputDevice "Tablet" "SendCoreEvents"
        InputDevice "Mouse" "CorePointer"
EndSection

Section "InputDevice"
        Identifier  "Keyboard0"
        Driver      "kbd"
        Option      "XkbModel" "pc105"
        Option      "XkbLayout" "us"
EndSection

Section "InputDevice"
        Identifier "Mouse"
        Driver "void"
        #Option "Device" "/dev/input/mice"
        #Option "Emulate3Buttons" "yes"
EndSection

Section "InputDevice"
        Identifier "Tablet"
        Driver "evdev"
        Option "Device" "/dev/input/event2"
        Option "CorePointer" "true"
        Option "Name" "Adomax QEMU USB Tablet"
EndSection

Section "Device"
        Identifier  "Videocard0"
        Driver      "cirrus"
EndSection

Section "Screen"
        Identifier "Screen0"
        Device     "Videocard0"
        DefaultDepth     24
        SubSection "Display"
                Viewport   0 0
                Depth     24
        EndSubSection
EndSection
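
One thing worth checking before restarting X: the tablet isn't always /dev/input/event2. A quick look at /proc/bus/input/devices shows which event node the emulated tablet landed on (the Handlers line a few lines below the device name):

grep -A 5 "USB Tablet" /proc/bus/input/devices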

Hope this helps.

Tuesday, February 1, 2011

Use Clonezilla to image KVM VirtIO disks

I'm always seeking to squeeze more speed out of common administrator tasks like disk imaging and P2V conversions. Today I tried using my favorite FOSS cloning software, Clonezilla, to restore an image to a KVM guest running VirtIO disks. What I found was that the current stable release (20110113-maverick) doesn't recognize VirtIO's /dev/vd[a,b,c...] disk naming convention. You get used to it working with KVM, though I'm still on the fence about VirtIO's naming convention versus the more common /dev/sd[a,b,c...] scheme.

Luckily, another Clonezilla user already submitted a patch for VirtIO drives back in December. It should make it into a future stable release in a few months.

http://sourceforge.net/tracker/index.php?func=detail&aid=3112544&group_id=115473&atid=671650

I was in a rush to get a P2V complete, so I used a quick sed one-liner to modify the stable Clonezilla's scripts to recognize the /dev/vda disk. You'll need to drop into shell mode to execute this.

sudo sed -i 's/\[hs\]/\[vhs\]/g' /opt/drbl/sbin/ocs-*
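
A quick sanity check (just grep, nothing Clonezilla-specific) confirms which scripts now carry the updated pattern:

grep -l '\[vhs\]' /opt/drbl/sbin/ocs-*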

Keep in mind that these changes will be lost if you're booting from a live CD.

Using the VirtIO disk drivers improved disk imaging throughput on my machine by about 15 percent. Also, don't forget to preload the VirtIO drivers on a Windows machine before imaging and restoring it. Otherwise you'll get a BSOD on boot.

Wednesday, January 19, 2011

Puppet Module For Centrify Express [Reloaded]

I've expanded on my previous simple Puppet module for Centrify Express based on the helpful advice I received from David McNeely at Centrify. This latest version of my module does not expose domain usernames or passwords. Instead, it requires you to pre-create the computer accounts from a machine already joined to the domain with Centrify Express.

You can pre-create the computer account just before you sign the Puppet client's certificate.
sudo adjoin -w -P -u <adminuser> -n <new-hostname> your.domain.net
sudo puppetca -s new-hostname.your.domain.net

Download the latest code, puppet-centrifydc, from GitHub:

git clone git://github.com/ninjix/puppet-centrifydc.git

The new version of the module has the following features:
  • Installs the Centrify Express Ubuntu package
  • Automatically attempts to join the machine to the domain after installing the apt package
  • Registers the machine name in Active Directory DNS
  • Restricts logins on Ubuntu servers to the "Domain Admins" user group
  • Allows additional users and groups to be granted login access
Note: Make sure you enable the Canonical partner repository.
deb http://archive.canonical.com/ubuntu lucid partner

Here are some examples of how you can configure your nodes using this module.
node    'deimos',
        'phobos' inherits default {

        $domain = "my.lab.net"
        include centrifydc
}
This is the basic method, which only provides the domain. The "Domain Admins" group will be granted access by default. You can set other defaults by editing the templates.

node    'callisto' inherits default {

        $domain = "my.lab.net"
        $groups_allow = ["Astro Group","Physics Team"]

        include centrifydc
}
Example two allows members of the "Astro Group" and "Physics Team" domain groups to log in, in addition to members of the "Domain Admins" group.

node    'ganymede' inherits default {

        $domain = "my.lab.net"
        $users_allow = ["carl.sagan"]
        $groups_allow = ["Astro Group","Physics Team"]

        include centrifydc
}
The third example is similar to the second but also allows the user "carl.sagan" to log in.

Monday, January 17, 2011

Use Unison to Synchronize your remote shares

At the office and around the house, I often like to keep directories synchronized with network shares. Microsoft has provided two-way remote folder sync for quite a while now, and the same is possible on Linux with a nifty utility named Unison.

Unison allows you to synchronize in both directions and builds on the tried-and-true rsync algorithm. It's built to play well with file exchanges between Unix and Windows hosts. It also has a number of options that let you fine-tune your sync or script the whole operation. There is a GUI version as well.

You can install it on Debian/Ubuntu with apt-get:
sudo apt-get install unison unison-gtk

In my daily use, I typically have several Nautilus .gvfs mounts to various Windows SMB/CIFS shares and SFTP hosts. Unison isn't directly aware of these Nautilus-style mounts, so I cobbled together this Nautilus script based on some examples I found at http://g-scripts.sourceforge.net.

Instructions

Copy the script to your ~/.gnome2/nautilus-scripts/ directory with the name unison-sync.sh.

Set the execute bit on the script.
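
From a terminal, that's:

chmod +x ~/.gnome2/nautilus-scripts/unison-sync.sh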

Make sure zenity is installed.
sudo apt-get install zenity

With Nautilus, connect to a server resource using SMB or SFTP.

Right-click a remote directory and choose Scripts > unison-sync.sh.

A directory selection dialog will appear, allowing you to choose the local folder you want to synchronize with the server.

Accept or edit the suggested name for the Unison preference file when prompted.

Now run Unison from the terminal or the GUI.
unison pref_name 

Note

My script enables auto-approval of non-conflicting changes to save time; you might want to change that. It also disables permission syncing, since Windows mounts don't support the same permission model as standard Linux file systems.
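
For reference, the preference file the script writes ends up looking like this (the two root paths are just example locations):

# Unison preferences file
root = /home/user/Projects
root = /home/user/.gvfs/share on fileserver/Projects
perms = 0
dontchmod = true
auto = true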

The unison-sync.sh script:
#!/bin/bash 
#
#       This program is free software; you can redistribute it and/or modify
#       it under the terms of the GNU General Public License as published by
#       the Free Software Foundation; either version 2 of the License, or
#       (at your option) any later version.
#       
#       This program is distributed in the hope that it will be useful,
#       but WITHOUT ANY WARRANTY; without even the implied warranty of
#       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#       GNU General Public License for more details.
#       
#       You should have received a copy of the GNU General Public License
#       along with this program; if not, write to the Free Software
#       Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#       MA 02110-1301, USA.
#
#  author :
#    clayton.kramer  gmail.com 
#
#  description :
#    Provides a quick way of making Unison preference files from
#    Nautilus.
#
#  informations :
#    - a script for use (only) with Nautilus. 
#    - to use, copy to your ${HOME}/.gnome2/nautilus-scripts/ directory.
#
#  WARNINGS :
#    - this script must be executable.
#    - package "zenity" must be installed
#
#  THANKS :
#    This script was heavily sourced from the work of SLK. Having
#    Perl regex to parse .gvfs paths was a huge time saver.
#    

# CONSTANTS

# some labels used for zenity [en]
z_title="Synchronize Folder"
z_err_gvfs="cannot acces to directory - check gvfs\nEXIT"
z_err_uri="cannot acces to directory - uri not known\nEXIT"

# INIT VARIABLES

# may depends of your system : (current settings for debian, ubuntu)

GVFSMOUNT='/usr/bin/gvfs-mount'
GREP='/bin/grep'
IFCONFIG='/sbin/ifconfig'
KILL='/bin/kill'
LSOF='/usr/bin/lsof'
PERL='/usr/bin/perl'
PYTHON='/usr/bin/python2.5'
SLEEP='/bin/sleep'
ZENITY='/usr/bin/zenity'

# MAIN

export LANG=C

# retrieve the first object selected or the current uri
if [ "$NAUTILUS_SCRIPT_SELECTED_URIS" == "" ] ; then
    uri_first_object=`echo -e "$NAUTILUS_SCRIPT_CURRENT_URI" \
      | $PERL -ne 'print;exit'`
else
    uri_first_object=`echo -e "$NAUTILUS_SCRIPT_SELECTED_URIS" \
      | $PERL -ne 'print;exit'`
fi

type_uri=`echo "$uri_first_object" \
  | $PERL -pe 's~^(.+?)://.+$~$1~'`

# try to get the full path of the uri (local path or gvfs mount ?)
if [ $type_uri == "file" ] ; then
    
    filepath_object=`echo "$uri_first_object" \
      | $PERL -pe '
        s~^file://~~;
        s~%([0-9A-Fa-f]{2})~chr(hex($1))~eg'`
    
elif [ $type_uri == "smb" -o $type_uri == "sftp" ] ; then
    if [ -x $GVFSMOUNT ] ; then
        
        # host (and share for smb) are matching a directory in ~/.gvfs/
        
        host_share_uri=`echo "$uri_first_object" \
          | $PERL -pe '
            s~^(smb://.+?/.+?/).*$~$1~;
            s~^(sftp://.+?/).*$~$1~;
            '`
        
        path_gvfs=`${GVFSMOUNT} -l  \
          | $GREP "$host_share_uri" \
          | $PERL -ne 'print/^.+?:\s(.+?)\s->.+$/'`
        
        # now let's create the local path
        path_uri=`echo "$uri_first_object" \
          | $PERL -pe '
            s~^smb://.+?/.+?/~~;
            s~^sftp://.+?/~~;
            s~%([0-9A-Fa-f]{2})~chr(hex($1))~eg'`
        
        filepath_object="${HOME}/.gvfs/${path_gvfs}/${path_uri}"
        
    else
        $ZENITY --error --title "$z_title" --width "320" \
          --text="$z_err_gvfs"
        
        exit 1
    fi
else
    $ZENITY --error --title "$z_title" --width "320" \
      --text="$z_err_uri"
    
    exit 1
fi


if [ -d "${HOME}/.unison" ]; then
    # create the Unison user directory if it doesn't exist
    mkdir -p "${HOME}/.unison"
fi

# Select a local directory to sync with
local_dir=`$ZENITY --title "$z_title" --file-selection --directory`

# Provide an alias for the sync
mount_name=`echo "$filepath_object" |  perl -ne 'print/main on (\w*)\//'`

base_name=`echo "$filepath_object" | perl -ne 'print/.*\/(.*)$/;'`
alias="$mount_name-$base_name"
alias=`$ZENITY --title "$z_title" --entry --text="Enter a name for this Unison preferences file." --entry-text="$alias"`
alias="$alias.prf"

# Write the Unison file
echo "# Unison preferences file" > ${HOME}/.unison/$alias
echo "root = $local_dir" >> ${HOME}/.unison/$alias
echo "root = $filepath_object" >> ${HOME}/.unison/$alias
echo "perms = 0" >> ${HOME}/.unison/$alias
echo "dontchmod = true" >> ${HOME}/.unison/$alias
echo "auto = true" >> ${HOME}/.unison/$alias

exit 0


### EOF

Sunday, January 16, 2011

Puppet manifest for Centrify Express on Ubuntu

I've been really pleased with Canonical's new partnership with Centrify, one of the big names in Unix/Linux/Mac Active Directory integration. Over the last month, I've started replacing Likewise Open on all of our machines at work.

Tonight, I took a moment to write a quick Puppet manifest for installing centrifydc and automatically joining the machine to our AD infrastructure.

Requirements
  • Have an AD user account with privileges to add more than 10 computers to your domain.
  • Enable the Canonical partner repository (I manage my /etc/apt/sources.list with Puppet)
This manifest exposes a user account password in a text file, so make sure you lock the file down at the same time you delegate the computer object permissions. (If anyone has a better way, I'd appreciate a comment.)

class centrify {

        package { centrifydc :
                ensure => latest ,
                notify => Exec["adjoin"]
        }

        exec { "adjoin" :
                path => "/usr/bin:/usr/sbin:/bin",
                returns => 15,
                command => "adjoin -w -u domainjoiner -p passwordF00 my.company.net",
                refreshonly => true,
        }

        service { centrifydc:
                ensure  => running
        }

}

The domain join action is only executed when Puppet detects that the package has to be installed or updated. Successful AD joins return exit code 15 instead of the usual 0, which is why the exec resource sets returns => 15.
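
Once a Puppet run completes, adinfo (installed with the Centrify package) is a quick way to confirm the machine actually joined; it reports the joined domain and zone status:

sudo adinfo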

Wednesday, January 12, 2011

Highly Available Zabbix Monitoring Server Using Corosync + Pacemaker + DRBD

I recently built a highly available Zabbix monitoring server for a client. It uses the Linux HA tools Corosync and Pacemaker to cluster the services. Linbit's DRBD is used as the cluster storage.

This configuration uses Ubuntu Server 10.04 LTS (Lucid) as the Linux distribution for the two cluster nodes. These instructions should also work on Ubuntu 10.10 and Debian 6.0 (Squeeze) with minor changes.

Server Network Configuration
virt ip  192.168.0.20
zbx-01   192.168.0.21
zbx-02   192.168.0.22

I built this configuration on Linux KVM machines using VirtIO disks. These disks show up as /dev/vd* instead of the typical /dev/sd* convention. Make sure you make changes as necessary for your environment.

Each server has a second virtual disk that will be used by DRBD.

Setup DRBD

Begin with DRBD. It provides the replicated block device on which the file system holding MySQL's data files will live. It is available in the official Ubuntu repositories.

sudo apt-get install linux-headers-server psmisc drbd8-utils

Create a DRBD resource configuration file at /etc/drbd.d/mysql_r0.res.

resource mysql_r0 {
    syncer {
        rate  110M;
    }
    on zbx-01 {
        device    /dev/drbd1;
        disk      /dev/vdb;
        address   192.168.0.21:7789;
        meta-disk internal;
    }
    on zbx-02 {
        device    /dev/drbd1;
        disk      /dev/vdb;
        address   192.168.0.22:7789;
        meta-disk internal;
    }
}

Some important things to know:

  • The DRBD daemon expects the file to end with ".res"
  • Make sure to change the device names and IP addresses for your environment.
  • The syncer rate of 110M is sized for a 1Gb network link.
  • The host names in the "on" sections must match each machine's actual hostname (uname -n).

Create the DRBD meta data on the resource device.

sudo drbdadm create-md mysql_r0

Now repeat the previous steps on the second server, zbx-02.

Start the DRBD service on both servers.

sudo /etc/init.d/drbd start

Use zbx-01 as the primary server to start. You'll use it to create the filesystem and force the DRBD peer on zbx-02 to sync from it.

On zbx-01:
sudo drbdadm -- --overwrite-data-of-peer primary mysql_r0
sudo drbdadm primary mysql_r0
sudo mkfs.ext4 /dev/drbd1

Depending on the size of your DRBD disk, it may take a minute or so to synchronize the two resources. I like to monitor the progress of this initial sync using the following command.

watch cat /proc/drbd

Now mount the DRBD resource.

sudo mount /dev/drbd1 /mnt

Remove the DRBD LSB init links since the service start and stop will be controlled by Pacemaker.

sudo update-rc.d -f drbd remove

MySQL Server Installation and Configuration

Install the MySQL Server packages.

sudo apt-get install mysql-server

Stop the MySQL Server daemon.

sudo /etc/init.d/mysql stop

Copy the MySQL data directory to the DRBD supported mount.

sudo cp -av /var/lib/mysql/ /mnt/

Edit the /etc/mysql/my.cnf file. Change the bind address to the virtual IP. Set the datadir property to point to the DRBD mount you specified earlier. Note that this example uses the /mnt folder for simplicity; you will most likely want to change this to something like /mnt/drbd1 for production use.

/etc/mysql/my.cnf
[mysqld]

user            = mysql
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /mnt/mysql
tmpdir          = /tmp
skip-external-locking


I like to add the following InnoDB properties to the MySQL my.cnf file. These settings are tuned for a machine with 4 CPUs and 4 GB of memory. MySQL and DRBD pros recommend the InnoDB engine because it has much better crash-recovery characteristics than the older MyISAM engine, so I set my server to default to InnoDB.

/etc/mysql/my.cnf
...

#
# * Make InnoDB the default engine
#
default-storage-engine    = innodb

#
# * Innodb Performance Settings
#
innodb_buffer_pool_size         = 1600M
innodb_log_file_size            = 256M
innodb_log_buffer_size          = 4M
innodb_flush_log_at_trx_commit  = 2
innodb_thread_concurrency       = 8
innodb_flush_method             = O_DIRECT
innodb_file_per_table

...

Repeat the previous MySQL /etc/mysql/my.cnf changes on zbx-02.

You may need to delete the InnoDB data files if you have changed the default settings to the performance ones I used. DO NOT DO THIS ON A SYSTEM IN PRODUCTION!

cd /mnt/mysql
sudo rm ib*

On zbx-01 try starting the MySQL Server.

sudo /etc/init.d/mysql start

Watch the /var/log/mysql/mysql.err for any problems. Logging in with a mysql client is also a good idea.

Stop MySQL once you've confirmed it's running properly on the DRBD resource.

Remove the MySQL LSB daemon start links so they do not conflict with Pacemaker.

sudo update-rc.d -f mysql remove

There is also an Upstart script included with the Ubuntu MySQL Server package. You'll need to edit it so that it doesn't try to start the service on boot up.

Comment out the start, stop and respawn commands in /etc/init/mysql.conf. It should look like this example snippet.

# MySQL Service

description     "MySQL Server"
author          "Mario Limonciello "

#start on (net-device-up
#          and local-filesystems
#         and runlevel [2345])
#stop on runlevel [016]

#respawn

env HOME=/etc/mysql
umask 007

...

Repeat this step on zbx-02.

Install and Configure Corosync and Pacemaker

Pacemaker with Corosync is included in the Ubuntu 10.04 LTS repositories.

sudo apt-get install pacemaker

Edit the /etc/default/corosync file using your favorite text editor and enable corosync (START=yes).

Pacemaker uses encrypted connections between the cluster nodes so you need to generate a corosync authkey file.

sudo corosync-keygen

*Note!* This can take a while if there's not enough entropy.

Copy the /etc/corosync/authkey to all servers that will form this cluster. Make sure it is owned by root:root and has 400 permissions.

In /etc/corosync/corosync.conf, replace bindnetaddr (by default it's 127.0.0.1) with the network address of your server's subnet. For example, if your IP is 192.168.0.21 on a /24 network, you would put 192.168.0.0.
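
With the node addresses used here, the interface stanza in corosync.conf ends up looking roughly like this (the multicast address and port are the Ubuntu package defaults; yours may differ):

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }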

Start the Corosync daemon.

sudo /etc/init.d/corosync start

Now your cluster is configured and ready to monitor, stop, and start services across the cluster nodes.

You can check the status with the crm status command.

crm status
============
Last updated: Wed Sep 15 11:33:09 2010
Stack: openais
Current DC: zbx-01 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ zbx-01 zbx-02 ]

Now update the Corosync CRM configuration to include DRBD and MySQL.

sudo crm configure edit

Here's a working example but be sure to edit for your environment.

node zbx-01 \
        attributes standby="off"
node zbx-02 \
        attributes standby="off"
primitive drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql_r0" \
        op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/mysql_r0" directory="/mnt/" fstype="ext4" options="acl"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.20" nic="eth0"
primitive mysqld lsb:mysql \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s"
group zabbix_group fs_mysql ip_mysql mysqld \
        meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
colocation mysql_on_drbd inf: _rsc_set_ zabbix_group ms_drbd_mysql:Master
order mysql_after_drbd inf: _rsc_set_ ms_drbd_mysql:promote zabbix_group:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1294782404"

Some notes about this configuration:

  • It monitors the DRBD resource every 15s
  • The takeover IP address is 192.168.0.20
  • MySQL Server is allowed 2 minutes to start up in case it needs to perform recovery operations on the Zabbix database
  • The STONITH property is disabled since this is only a two-node cluster.

You can check the status of the cluster with the crm_mon utility.

sudo crm_mon

Here's an example of what you want to see:

============
Last updated: Wed Mar 11 23:04:49 2011
Stack: openais
Current DC: zbx-01 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ zbx-01 zbx-02 ]

 Resource Group: zabbix_group
     fs_mysql (ocf::heartbeat:Filesystem): Started zbx-01
     ip_mysql (ocf::heartbeat:IPaddr2): Started zbx-01
     mysqld (lsb:mysql): Started zbx-01
 Master/Slave Set: ms_drbd_mysql
     Masters: [ zbx-01 ]
     Slaves: [ zbx-02 ]

Install Zabbix Server

How you install Zabbix is up to you. I like to recompile the latest upstream Debian packages, but the older Ubuntu Lucid repository version or the official tarball will also work. If you use the apt package, remember not to use the dbconfig-common option on zbx-02; you can copy over the config files from zbx-01.

sudo apt-get install zabbix-server-mysql 

Edit the /etc/zabbix/zabbix_server.conf file. Set SourceIP=192.168.0.20 so that Zabbix will use the virtual "takeover" IP address. This will make setting up client configurations and firewall rules much easier.
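
For reference, it's a one-line change in /etc/zabbix/zabbix_server.conf:

# use the cluster's takeover address for outgoing connections
SourceIP=192.168.0.20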

Check your newly installed Zabbix server for a clean start.

sudo tail /var/log/zabbix-server/zabbix-server.log

Remove the LSB init script links.

sudo update-rc.d -f zabbix-server remove

Install Apache and Zabbix PHP frontend.

sudo apt-get install apache2 php5 php5-mysql php5-ldap php5-gd zabbix-frontend-php

Remove Apache's auto start links.

sudo update-rc.d -f apache2 remove

Repeat on zbx-02.

Copy the configuration file from zbx-01's /etc/zabbix directory to zbx-02's /etc/zabbix folder.

Update Corosync Configuration With Zabbix and Apache

sudo crm configure edit

Working example:
node zbx-01 \
        attributes standby="off"
node zbx-02 \
        attributes standby="off"
primitive apache lsb:apache2
primitive drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql_r0" \
        op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/mysql_r0" directory="/mnt/" fstype="ext4" options="acl"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
        params ip="192.168.0.20" nic="eth0"
primitive mysqld lsb:mysql \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s"
primitive zabbix lsb:zabbix-server \
        op start interval="0" timeout="60" delay="5s" \
        op monitor interval="30s"
group zabbix_group fs_mysql ip_mysql mysqld zabbix apache \
        meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
colocation mysql_on_drbd inf: _rsc_set_ zabbix_group ms_drbd_mysql:Master
order mysql_after_drbd inf: _rsc_set_ ms_drbd_mysql:promote zabbix_group:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1294782404"
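
Once both nodes are configured, a simple failover test (using standard crm shell subcommands) is to put the active node into standby, watch the resource group move to the peer with crm_mon, and then bring the node back online:

sudo crm node standby zbx-01
sudo crm_mon
sudo crm node online zbx-01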