www.zeroshell.org Forum Index
Linux Distribution for server and embedded devices

[Janus / twin Alix2] #7 Auto-switch prime to backup and back

PatrickB
Joined: 03 Nov 2012
Posts: 46

Posted: Sun Aug 16, 2015 5:32 pm

Hello.

We are close to the end now...

Today I share my solution for making the backup server watch the prime server, take over the LAN Master role if the prime disappears, and hand the role back when the prime returns.
This role includes:
- the gateway,
- the DNS server,
- the WiFi access point,
- on the WAN side, the single IP set as DMZ host in the DSL box.

As explained in posts #3 and #4 of this series, switching is implicit for the DHCP server and the NetBIOS Browse Master.

Given how the local DNS zone is defined (post #4), the active DNS server is whichever machine has the IP .1 up, and the same goes for the gateway. So we "just" have to activate that IP on the backup server (not really, that would be too simple).


Problem #1 - Watching the prime server

Initially I wanted to rely on Samba (nmblookup Janus1), since it already performs automatic arbitration.
Very bad idea: it takes quite a long time to stabilize, and at boot in particular, since the name is not found yet, the backup server competes with the prime and the LAN becomes a mess.

For now I prefer to ping the administrative IP of the prime (.11). This works fine to detect that it is down; if it were in a zombie state, it would be harder to detect.
I am still searching for an idea there...

The job is done by a cron job run every 2 minutes, on the backup server only:
Code:
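#!/bin/bash
# Bash script: WatchJanus1-Cron
# Run from cron every 2 minutes, on the backup server (Janus2) only.

PATH=/opt/bin:$PATH

# "yes" = Janus2 must be LAN Master (the prime answered none of the pings)
DoIt="no"
if ping -c 2 192.168.xxx.11 | grep -q '100% packet loss'; then
  DoIt="yes"
fi

echo "Action = $DoIt"

/opt/bin/be-lan-master.sh "$DoIt"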

The script be-lan-master.sh then decides whether switching is actually required.

Checking whether a given IP is up on an interface that carries several of them is not so simple:
Code:
# when 192.168.xxx.1 is up:
root@janus2> ifconfig -a -v BRIDGE01:00
BRIDGE01: Link encap:Ethernet  HWaddr 00:0D:B9:XX:YY:ZZ
          inet addr:192.168.xxx.1  Bcast:192.168.xxx.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

# when 192.168.xxx.1 is down:
root@janus2> ifconfig -a -v BRIDGE01:00
BRIDGE01: Link encap:Ethernet  HWaddr 00:0D:B9:XX:YY:ZZ
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

So be-lan-master.sh compares the requested state with the current one this way:
Code:
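ZtMode="$1"          # requested state ("yes" or "no"), assumed to be the argument passed by WatchJanus1-Cron
ZtIsLanMaster="no"   # current state, deduced from the interface below

# -F: fixed string, so the dots are not wildcards; the trailing space stops
# 192.168.xxx.1 from also matching 192.168.xxx.11, .12, .100...
if ifconfig -a -v 'BRIDGE01:00' | grep -qF 'inet addr:192.168.xxx.1 '; then
   ZtIsLanMaster="yes"
fi
if [ "$ZtMode" == "$ZtIsLanMaster" ]; then
   # The requested state is already the current one: nothing to do
   exit 0
fi

# Here we have to switch
...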
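The comparison above can also be written as an explicit decision table. This is a minimal sketch: the function name switch_action and the action words none/up/down/error are hypothetical, not taken from the original script. A side benefit is that an unexpected directive is rejected before any interface is touched.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original be-lan-master.sh:
# $1 = requested state, $2 = current state (both "yes" or "no").
# Prints the action to take; anything else than yes/no gives "error".
switch_action() {
  case "$1/$2" in
    yes/yes|no/no) echo "none" ;;   # already in the requested state
    yes/no)        echo "up" ;;     # become LAN Master: bring .1 up
    no/yes)        echo "down" ;;   # hand the role back: bring .1 down
    *)             echo "error" ;;  # unexpected directive
  esac
}
```

For example, `ZtAction=$(switch_action "$ZtMode" "$ZtIsLanMaster")` could then exit on none or error and branch on up or down.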


Problem #2 - Switching

The administrative IP (.12) must stay up whatever happens; otherwise, if a reboot does not save us, the next step is to restore the factory settings and start again from the very beginning.

I recommend adding something like this to the PostBoot script, just in case:
Code:
# Safety feature: restore the LAN interface in case it was set down !
ifconfig BRIDGE01:02 192.168.xxx.12/24 up
ifconfig ETH01 up


Before experimenting, I explored the device to locate all the items involved in the switch and give them safe names:
Code:
# The interfaces and IP addresses
ZtWiredLanInterfaceName="ETH01"
ZtAdminInterfaceName="BRIDGE01:02"
ZtAdminIP="192.168.xxx.12"
ZtMasterInterfaceName="BRIDGE01:00"
ZtMasterIP="192.168.xxx.1"
ZtWifiLanInterfaceName="WLAN00"

# WAN side: the single IP set as DMZ host in the DSL box
ZtWiredWanInterfaceName="ETH00"
ZtBoxDmzInterfaceName="BRIDGE00:01"
ZtBoxDmzIP="192.168.yyy.103"
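Since a typo in one of these names could take down the wrong interface (including the administrative one), a pre-flight check may be worth running before any switch. A minimal sketch, assuming the usual Linux sysfs layout where each network device appears under /sys/class/net. Alias labels such as BRIDGE01:02 are not devices there, so the hypothetical helper base_iface strips the :NN suffix first; check_ifaces is also a hypothetical name.

```shell
#!/bin/bash
# Hypothetical pre-flight check, not part of the original scripts.

# Alias label -> underlying device: BRIDGE01:02 -> BRIDGE01, ETH01 -> ETH01
base_iface() {
  echo "${1%%:*}"
}

# Return 0 if every named interface exists, 1 otherwise (listing the missing ones on stderr)
check_ifaces() {
  local name missing=0
  for name in "$@"; do
    if [ ! -e "/sys/class/net/$(base_iface "$name")" ]; then
      echo "Missing interface: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `check_ifaces "$ZtWiredLanInterfaceName" "$ZtAdminInterfaceName" "$ZtMasterInterfaceName" "$ZtWifiLanInterfaceName" || exit 1` at the top of be-lan-master.sh.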


In principle it should be very simple:
Code:
if [ "$ZtMode" == "yes" ]; then
   echo -e "$(DateForLog) - Janus2 becomes LAN Master\n"      > $ZtReportEmailBodyFile ;
   echo -e "Activating IP = $ZtMasterIP"                     >> $ZtReportEmailBodyFile ;
   ifconfig $ZtMasterInterfaceName $ZtMasterIP/24 up ;
elif [ "$ZtMode" == "no" ]; then
   echo -e "$(DateForLog) - Janus2 stops being LAN Master\n"  > $ZtReportEmailBodyFile ;
   echo -e "Deactivating IP = $ZtMasterIP"                   >> $ZtReportEmailBodyFile ;
   ifconfig $ZtMasterInterfaceName $ZtMasterIP/24 down ;
else
   echo "$ZtThisScript: Unexpected directive '$ZtMode' !" ;
   exit 1 ;
fi
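For reference, the same switch can be sketched with iproute2 instead of ifconfig. This assumes the ip tool is available on the box (not verified here); its label option keeps the BRIDGE01:00 name visible to ifconfig. The run wrapper and the ZT_DRY_RUN variable are hypothetical, added so the commands can be printed instead of executed. One practical gain is that `ip -4 addr show` prints addresses as `inet ADDR/PREFIX`, which makes an exact match straightforward.

```shell
#!/bin/bash
# Hypothetical iproute2 variant of the switch; the post itself uses ifconfig.
ZtMasterIP="192.168.xxx.1"
ZtMasterBase="BRIDGE01"       # device that carries the BRIDGE01:00 alias
ZtMasterLabel="BRIDGE01:00"   # label: keeps the alias name visible to ifconfig

# Print the command when ZT_DRY_RUN=1, execute it otherwise
run() {
  if [ "${ZT_DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = "yes" (become LAN Master) or "no" (release the role)
set_master_ip() {
  case "$1" in
    yes) run ip addr add "$ZtMasterIP/24" dev "$ZtMasterBase" label "$ZtMasterLabel" ;;
    no)  run ip addr del "$ZtMasterIP/24" dev "$ZtMasterBase" ;;
    *)   echo "Unexpected directive '$1' !" >&2; return 1 ;;
  esac
}

# Exact match on iproute2 output: "inet 192.168.xxx.1/24 ..."
master_ip_is_up() {
  ip -4 addr show dev "$ZtMasterBase" | grep -qF "inet $ZtMasterIP/"
}
```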


Then I ran into an issue with the switch my servers are plugged into: it does not notice the change, so it won't forward the traffic for IP .1 to its new owner! Symptom: the ping fails.
So I turn the physical interface down and up to force the switch to update:
Code:
# In both cases we may have to force the switch to refresh its MAC address table.
# - turn the physical interface off for a second:

echo -e "Switching down and up the wired interface $ZtWiredLanInterfaceName" >> $ZtReportEmailBodyFile
ifconfig $ZtWiredLanInterfaceName down
sleep 1
# <!> If the next command failed, we would have a big problem!
#     It is prudent to also force it up in the PostBoot script, just in case.
ifconfig $ZtWiredLanInterfaceName up
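A likely explanation for this symptom is that the switch's MAC address table, and the ARP caches of the LAN clients, still associate .1 with the prime's MAC address; bouncing the link makes the switch forget the entries it learned on that port. A common alternative in failover setups is to announce the move with unsolicited (gratuitous) ARP. A minimal sketch, assuming an arping with the -U option is installed (both iputils and BusyBox arping provide it); run and ZT_DRY_RUN are hypothetical helpers so the command can be printed instead of sent.

```shell
#!/bin/bash
# Hypothetical alternative to bouncing ETH01: gratuitous ARP for the master IP.
ZtMasterIP="192.168.xxx.1"
ZtMasterBase="BRIDGE01"   # device that carries the BRIDGE01:00 alias

# Print the command when ZT_DRY_RUN=1, execute it otherwise
run() {
  if [ "${ZT_DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# -U: unsolicited ARP, asks neighbours to update their ARP cache for ZtMasterIP
# -c 3: send three announcements in case one is lost
announce_master_ip() {
  run arping -U -c 3 -I "$ZtMasterBase" "$ZtMasterIP"
}
```

This only makes sense when Janus2 takes the role, right after the IP .1 has been brought up.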


Then another issue: the DNS server did not notice the change either, presumably because it only listens on the addresses that were present when it started. Symptom: on the server itself, in the ZeroShell GUI DNS page, 'DNS Lookup' works using localhost but not using '192.168.xxx.1'.
Simply restarting the service fixes it:
Code:
# Need to restart the DNS so that it integrates the change
echo -e "Restarting the DNS service"                               >> $ZtReportEmailBodyFile
/etc/rc.d/init.d/dns restart | sed -e 's/.\[[0-9;-]*[A-Za-z]//g'   >> $ZtReportEmailBodyFile

The sed filter removes the escape sequences that display a right-aligned green [ OK ], because some mail clients (like Thunderbird) get confused and display an empty email.
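The filter can be packaged as a function. This sketch keeps the post's pattern but replaces the leading `.` with the ESC character itself: `.` accepts any character before `[`, so plain text such as `a[1m` in a log line would also be removed. The function name strip_ansi is hypothetical.

```shell
#!/bin/bash
# Remove terminal color/cursor escape sequences (ESC [ params letter) from stdin.
# Only the real ESC byte starts a match, unlike '.' in the original pattern.
strip_ansi() {
  local esc=$'\033'
  sed -e "s/${esc}\[[0-9;-]*[A-Za-z]//g"
}
```

Usage would be `/etc/rc.d/init.d/dns restart | strip_ansi >> $ZtReportEmailBodyFile`.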

Thing to know: an IP switched up or down this way is not reflected in the GUI, and after a reboot the manually configured settings would be restored. I don't need the change to be persistent, so for this purpose the solution works fine.


The result

After that, the gateway, the DNS and all the other features have switched. Janus2 replaces Janus1 within 2 minutes, only when Janus1 disappears, and hands the role back within 2 minutes after it reappears. This delay could be reduced, but I don't want to flood the LAN with pings.

In the same way, it is possible to switch the WiFi and the DMZ IP on the WAN side (a DSL box usually accepts only one DMZ host and filters nothing for it, so performance may be better).
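A sketch of that extension, reusing the variable names defined for the interfaces. Assumptions: taking WLAN00 down is how the backup releases the WiFi role, and the DMZ address uses a /24 mask like the LAN side (to be checked against the DSL box's subnet). The run wrapper, the ZT_DRY_RUN variable and the function name switch_extras are hypothetical; with ZT_DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/bash
# Hypothetical extension: switch the WiFi and the WAN-side DMZ IP too.
ZtWifiLanInterfaceName="WLAN00"
ZtBoxDmzInterfaceName="BRIDGE00:01"
ZtBoxDmzIP="192.168.yyy.103"   # the /24 mask used below is an assumption

# Print the command when ZT_DRY_RUN=1, execute it otherwise
run() {
  if [ "${ZT_DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = "yes" (take the roles) or "no" (release them)
switch_extras() {
  case "$1" in
    yes)
      run ifconfig "$ZtWifiLanInterfaceName" up
      run ifconfig "$ZtBoxDmzInterfaceName" "$ZtBoxDmzIP/24" up
      ;;
    no)
      run ifconfig "$ZtBoxDmzInterfaceName" "$ZtBoxDmzIP/24" down
      run ifconfig "$ZtWifiLanInterfaceName" down
      ;;
    *)
      echo "Unexpected directive '$1' !" >&2
      return 1
      ;;
  esac
}
```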

Finally, my script checks the state of the interfaces and active IPs again and reports the operation by email, but this is off topic.

It works very well and achieves the purpose of twin servers.


I hope these tips can help someone.

Ideas for improvements are welcome.

Best regards.

All times are GMT