Hi,
I have been using FHEM and pilight for years. Time and again I ran into the problem that one of the two services, FHEM or PILIGHT, quit, apparently for no reason or because of bugs in FHEM modules. The symptoms were that FHEM stopped running or ran incorrectly (WebGUI unreachable) and/or the error message "no pilight ssdp connections found" appeared.
I wrote a script that CRON runs every x minutes.
It checks whether FHEM and PILIGHT are working.
If an error is detected, the affected services are restarted.
If that still fails after x attempts, the whole system can optionally be rebooted.
Optionally, an error message can also be written to your FHEM log.
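The escalation described above (restart the services, and reboot only after several failed restarts) boils down to two persisted counters. The following is a hypothetical sketch, not code from the script: the function name `decide_action` and its argument order are made up for illustration, and it assumes the same meaning of the counters that the script stores in its `.status` and `.reboots` files.

```shell
#!/bin/bash
# Hypothetical sketch of the restart/reboot escalation (not part of the original script).
# Arguments: restarts already performed, reboots already performed,
#            TRIES_BEFORE_REBOOT, REBOOTS_MAX, REBOOT (YES/NO)
# Prints one of: restart, reboot, give_up, reboot_disabled
decide_action() {
    local status=$1 reboots=$2 tries_before_reboot=$3 reboots_max=$4 reboot_allowed=$5

    # Below the restart limit: restart the services once more
    if [ "$status" -lt "$tries_before_reboot" ]; then
        echo restart
        return
    fi

    # Restart limit reached: a reboot is only an option if the user allowed it
    if [ "$reboot_allowed" != "YES" ]; then
        echo reboot_disabled
        return
    fi

    # Reboot only while the next reboot would not exceed the reboot limit
    if [ $((reboots + 1)) -gt "$reboots_max" ]; then
        echo give_up
    else
        echo reboot
    fi
}

# Example: three restarts already done, no reboot yet, default settings
decide_action 3 0 3 2 YES   # prints "reboot"
```

Keeping the decision in a pure function like this makes it easy to check the thresholds without touching any running service.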
Here is the script; perhaps some of you will find it useful.
Download: http://www.emjau.de/downloads/selfhealer_fhem_piligth.sh
The complete description of how to use and customize the script is in the comments at the top of the script.
I have deployed it on my Pi 3B running Raspbian.
If anything doesn't work or you have questions, please contact me with details about your system and a precise description of the error.
I want to keep improving the script so that it is as easy and trouble-free as possible for everyone to use.
For that I need input from those who have tried it.
Thanks!
UPDATE 2018-11-24: Determining the script's absolute path when it is run by CRON is now fixed!
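The fix mentioned in the update matters because cron starts jobs in the user's home directory, so the script has to locate its own directory before it can find its state and log files. On systems with GNU coreutils (as on Raspbian), the symlink-resolving loop the script uses can be shortened with `readlink -f`. A minimal sketch, assuming GNU `readlink`; the helper name `resolve_dir` is hypothetical:

```shell
#!/bin/bash
# Print the absolute directory containing the file given as $1,
# following all symlinks. Assumes GNU coreutils' readlink (-f option).
resolve_dir() {
    dirname "$(readlink -f "$1")"
}

# Usage inside the self-healer, in place of the symlink-resolving loop:
#   DIR=$(resolve_dir "${BASH_SOURCE[0]}")
#   cd "$DIR" || exit 1
```

The loop in the script works without GNU extensions, so it stays the safer choice on non-GNU systems.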
#!/bin/bash
# =====================================================================================
# Self-Healer for the services FHEM and PILIGHT
# (c)2018 by emjau
# contact fhem@emjau.de for help (put the keyword FHEM in the subject!)
# =====================================================================================
#
# ! ! ! !
# Please read carefully to understand how to get this script working correctly ! ! ! !
# ! ! ! !
#
# =====================================================================================
#
# This script checks whether the services FHEM and PILIGHT are running properly.
# If a malfunction is detected, these services are restarted.
# After repeated unsuccessful attempts, the whole system can be rebooted (if you allow it).
#
# Put this script in [YourDirectory].
# You may rename the script, but the name must end with .sh
# The names of the state and log files are adapted automatically.
#
# The script must be executable. Since it runs as root (see the crontab entry below),
# do NOT make it world-writable (777); 755 is enough:
# >> chmod 755 [YourDirectory]/[scriptname].sh
#
# Run the script periodically, e.g. every 5 minutes,
# by adding this entry to root's crontab (>> sudo crontab -e):
# */5 * * * * [YourDirectory]/[scriptname].sh
#
# Customize the section YOUR SETTINGS !!!
#
# The following files will be created by this script:
# - [scriptname].log
# - [scriptname].status
# - [scriptname].reboots
#
# ...you may reset all counters by deleting the .status and .reboots files.
#
# To check whether the script works,
# run one of these commands:
# >> sudo service fhem stop    or
# >> sudo service pilight stop
# ...then wait for the next run by CRON (or run the script manually), read the logfile and test whether FHEM/PILIGHT work again.
#
#=============================================================================================#
# YOUR SETTINGS (customize this!) #
#=============================================================================================#
FHEM_USER="YOUR_USERNAME_FOR_FHEM_WEBGUI" # Your FHEM WebGUI-User
FHEM_PASS="YOUR_PASSWORD_FOR_FHEM_WEBGUI" # Your FHEM WebGUI-Password
FHEM_URL="http://127.0.0.1:8083/fhem" # Your URL + :Port for the FHEM-WebGUI [default http://127.0.0.1:8083/fhem]
REBOOT=YES # [YES / NO] Should the whole system be rebooted if restarting the services was not successful?
TRIES_BEFORE_REBOOT=3 # [default = 3] how many service restarts before rebooting the whole system?
REBOOTS_MAX=2 # [default = 2] maximum number of reboots
LOG_SUCCESSFUL_CHECKS=NO # [YES / NO] set to YES if you want to log successful checks too [default = NO]
# Warning: setting this to YES may produce a large logfile!
WRITE_TO_FHEM_LOGFILE=YES # [YES / NO] do you want an entry in your FHEM logfile if an error was detected?
FHEM_LOGFILE=/opt/fhem/log/fhem-$(date +%Y-%m).log # your FHEM logfile. Make sure to give the absolute path to the FHEM logfile!
#=============================================================================================#
# PROGRAM (do not touch!) #
#=============================================================================================#
# determine the absolute path where this script is run from
# ---
SOURCE="${BASH_SOURCE[0]}"
while [ -h "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink
    DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
    SOURCE="$(readlink "$SOURCE")"
    [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE" # if $SOURCE was a relative symlink, resolve it relative to the directory containing the symlink
done
DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
# ---
cd "$DIR" || exit 1 # cron starts in $HOME; the state and log files live next to the script
SCRIPTNAME_FULL=${0##*/} # determine the name of this script
SCRIPTNAME=${SCRIPTNAME_FULL%.*} # cut .sh from scriptname
logfile=$SCRIPTNAME.log
statusfile=$SCRIPTNAME.status
rebootfile=$SCRIPTNAME.reboots
timestamp=`date +%Y-%m-%d_%H:%M:%S`
if [ ! -e "$statusfile" ]; then echo "0" > $statusfile; fi
if [ ! -e "$rebootfile" ]; then echo "0" > $rebootfile; fi
if [ ! -e "$logfile" ]; then echo "Logfile created on first run: $timestamp" | tee -a $logfile; echo "--------------------------------------------------------" | tee -a $logfile; fi
status=$(sed -n '1p;2q' $statusfile) # read first (and only) row of statusfile
reboot_counter=$(sed -n '1p;2q' $rebootfile) # read first (and only) row of rebootfile
#echo "$timestamp ... testing reachability of FHEM-WebGui and testing login:" | tee -a $logfile
# Check if login to FHEM-WebGUI is successful and if the button 'Save config' is visible:
testvar=$(/usr/bin/wget "$FHEM_URL" --timeout=10 --tries=2 --http-user="$FHEM_USER" --http-password="$FHEM_PASS" -O - 2>/dev/null | grep 'Save config') # try to log in; wget's messages (stderr) are discarded
#testvar=$(/usr/bin/wget "$FHEM_URL" --timeout=10 --tries=2 --http-user="$FHEM_USER" --http-password="$FHEM_PASS" -O - | grep 'Save config') # debugging variant: shows wget's messages on the terminal
testvar_len=${#testvar}
failure_fhem=0
failure_pilight=0
if [ $testvar_len -lt 1 ] # if login to FHEM-WebGUI was NOT successful
then
    failure_fhem=1
else
    TEST=$(/usr/local/bin/pilight-send -p raw --code="999 999 999 999" 2>&1)
    if [[ "$TEST" =~ "no pilight ssdp connections found" ]] # if pilight ssdp-connection is faulty
    then
        failure_pilight=1
    fi
fi
#,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
# for testing purposes only !
#,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
#failure_fhem="1"
#failure_pilight="1"
#,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
#,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
if [ "$failure_fhem" -gt 0 ] || [ "$failure_pilight" -gt 0 ] # if login to FHEM-WebGUI was NOT successful OR pilight ssdp-connection is faulty
then
    timestamp=$(date +%Y-%m-%d_%H:%M:%S)
    echo "----------------------------------------------------------------------------------------" | tee -a "$logfile"
    if [ "$failure_fhem" -gt 0 ]; then echo "$timestamp !!! The FHEM WebGUI is down !!!" | tee -a "$logfile"; fi
    if [ "$failure_pilight" -gt 0 ]; then echo "$timestamp !!! PiLight SSDP-Connection is broken !!!" | tee -a "$logfile"; fi
    if [ "$WRITE_TO_FHEM_LOGFILE" == "YES" ]
    then
        if [ ! -e "$FHEM_LOGFILE" ]
        then
            echo "!!! Your stated FHEM_LOGFILE does not exist: $FHEM_LOGFILE" | tee -a "$logfile"
        else
            echo $'\n\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n'$timestamp$'\n'$SCRIPTNAME_FULL$' detected an ERROR\nSee '$SCRIPTNAME$'.log for details\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n\n' >> "$FHEM_LOGFILE"
        fi
    fi
    if [[ "$status" =~ ^[0-9]+$ ]] # if $status is an integer
    then
        #echo "number of already performed restarts of services (fhem + pilight): $status (max. before reboot: $TRIES_BEFORE_REBOOT)" | tee -a "$logfile"
        if [ "$status" -lt "$TRIES_BEFORE_REBOOT" ]
        then
            if [ "$status" -eq 0 ] && [ "$reboot_counter" -gt 0 ]; then echo "System has just been rebooted (reboot no. $reboot_counter, REBOOTS_MAX = $REBOOTS_MAX)" | tee -a "$logfile"; fi
            status_new=$((status+1))
            if [ "$reboot_counter" -gt 0 ]; then msg_addon="-> after $reboot_counter reboot(s)"; else msg_addon=""; fi
            if [ "$failure_fhem" -gt 0 ]
            then
                echo "restarting services, attempt no. $status_new $msg_addon... (FHEM + PILIGHT)" | tee -a "$logfile"
                echo " --> this will take about 15 seconds..."
                echo "$status_new" > "$statusfile"
                sudo service fhem stop && sleep 3
                sudo service pilight stop && sleep 3
                sudo service pilight start && sleep 3
                sudo service fhem start && sleep 3
            elif [ "$failure_pilight" -gt 0 ]
            then
                echo "restarting services, attempt no. $status_new $msg_addon... (PILIGHT only)" | tee -a "$logfile"
                echo " --> this will take about 10 seconds..."
                echo "$status_new" > "$statusfile"
                sudo service pilight stop && sleep 3
                sudo service pilight start && sleep 3
            else
                echo " !!! UNDEFINED ERROR 465_BC !!! " | tee -a "$logfile"
            fi
        else
            echo "Maximum number of attempts to restart the service(s) is now reached!" | tee -a "$logfile"
            if [ "$REBOOT" == "YES" ]
            then
                reboot_counter_new=$((reboot_counter+1))
                if [ "$reboot_counter_new" -gt "$REBOOTS_MAX" ]
                then
                    echo "Maximum number of reboots is reached. Nothing left to do... sorry..." | tee -a "$logfile"
                else
                    echo " -> Now: rebooting the system (reboot no. $reboot_counter_new)!" | tee -a "$logfile"
                    echo "$reboot_counter_new" > "$rebootfile"
                    echo "0" > "$statusfile"
                    sudo shutdown -r now # reboot the whole system
                fi
            else
                echo " Reboot not allowed by your settings! (YOUR SETTINGS: REBOOT=$REBOOT) ...no action!" | tee -a "$logfile"
                echo " -> Please set REBOOT=YES in section YOUR SETTINGS to allow reboots." | tee -a "$logfile"
            fi
        fi
    else # if $status is NOT an integer
        echo "$timestamp !!! Undefined last known state! Variable status is not numeric: $status" | tee -a "$logfile"
    fi
else
    #echo "$timestamp The FHEM WebGUI is running and login was successful." | tee -a "$logfile"
    if [ "$status" != "0" ] || [ "$reboot_counter" != "0" ] # if the last state was faulty (or unreadable)
    then
        echo "----------------------------------------------------------------------------------------" | tee -a "$logfile"
        echo "$timestamp Finally everything is fine again !!! -> Resetting state and reboot counter to 0 (OK)..." | tee -a "$logfile"
        echo "0" > "$statusfile"
        echo "0" > "$rebootfile"
    else
        echo "[OK] services FHEM and PILIGHT are fine!"
        if [ "$LOG_SUCCESSFUL_CHECKS" == "YES" ]
        then
            echo "$timestamp [OK] Check by $SCRIPTNAME_FULL was OK..." | tee -a "$logfile"
        fi
    fi
fi
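To test the two health checks without stopping real services, the decision can be separated from the commands that produce its input. A minimal sketch: `fhem_page_ok` and `pilight_output_ok` are hypothetical helper names, not part of the script; they apply the same tests the script uses (the 'Save config' button in the WebGUI page, and the "no pilight ssdp connections found" message from pilight-send).

```shell
#!/bin/bash
# Hypothetical helpers (not in the original script) that apply the script's
# two health checks to captured text, so they can be tested offline.

# Succeeds if the FHEM WebGUI page contains the 'Save config' button,
# i.e. the page was served and the login worked.
fhem_page_ok() {
    grep -q 'Save config' <<< "$1"
}

# Succeeds unless pilight-send reported the SSDP error.
pilight_output_ok() {
    [[ "$1" != *"no pilight ssdp connections found"* ]]
}

# Usage with live data, using the same commands as the script:
#   page=$(wget -q -O - --timeout=10 --tries=2 \
#          --http-user="$FHEM_USER" --http-password="$FHEM_PASS" "$FHEM_URL")
#   fhem_page_ok "$page" || echo "FHEM WebGUI is down"
#   out=$(/usr/local/bin/pilight-send -p raw --code="999 999 999 999" 2>&1)
#   pilight_output_ok "$out" || echo "pilight SSDP connection is broken"
```

Feeding saved HTML or saved pilight-send output into these functions lets you confirm the detection strings before relying on them in cron.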