Python Server Monitoring Using Bash Scripts

This article is a continuation of our series on developing Python websites using bottle.py. In our first article, we talked about securing shared hosting for Python applications. The development of the actual application is left as an exercise for the reader (for ideas, take a look at this excellent example from the bottle.py documentation). Once you have an application and a server, you’re ready for the next challenge: keeping it running!

So… how do you keep it running (i.e., how do you monitor a Python server)?

This article approaches this issue for those of you using Webfaction or a Linux VPS. People using Heroku, GAE, and other hosting options should check alternate sources.

Starting the application is fairly straightforward:

python server.py &

At which point you should see the appropriate content displayed on your URL. Great, let’s go ahead and hang up our connection.

Go look back at your site. Oops… first lesson. When starting a server on Linux, use the “nohup” command. It makes the process ignore the hangup signal sent when your session closes, so the server keeps running after you log out.

nohup python server.py &
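If you want to find the process again later, it helps to keep its output and its process ID somewhere predictable. Here is a minimal sketch of that idea; the function name start_detached and the file names server.log and server.pid are my own placeholders, not anything required by nohup.

```shell
# Sketch: start a command so it survives logout, keep its output
# in a log file, and remember its PID for later checks.
# start_detached, server.log and server.pid are hypothetical names.
start_detached() {
    log_file="$1"
    pid_file="$2"
    shift 2
    # nohup makes the command ignore the hangup signal sent at logout
    nohup "$@" >> "$log_file" 2>&1 &
    # $! holds the PID of the most recent background job
    echo $! > "$pid_file"
}

# Usage (assumes server.py is in the current directory):
# start_detached server.log server.pid python server.py
```

With the PID on disk, a later script can test whether the server is still alive with kill -0 "$(cat server.pid)".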

Much better… At this point you have a choice. There are server monitoring packages out there that you can plug and play. Or you can go old school and learn a little bit about Linux in the process. I decided to write my own bash script to watch the server. It helped me learn more about Linux and gave me the freedom to add custom features.

I’m going to share some code and examples I used to help build our monitoring script. Here are some of the highlights from what we learned writing our script:

  • Functions are supported in bash and you should use them. Your monitoring process is basically executing a series of tests; when a test fails, your script will need to “do something”. This could range from logging the failure to emailing you or, if required, restarting the server. Your script will be more maintainable if you organize these activities into functions. Here is a limited example of this.

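server_restart() {

# start the server in the background, immune to hangups
nohup python server.py &

# mail reads the message body from stdin, so pipe one in
echo "Server restarted at $(date)" | mail -s "The Server restart - MYSITE" $recips

}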

 

  • Set things up so you will get an email after any significant event occurs. This will help you figure out what broke and when. It also provides an easy way to track the frequency of errors. Here is an example of a Linux email command.

echo "event description" | mail -s "Server restart" $recips
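One way to wrap that command into a reusable “email me” function is sketched below. The names notify, log_file and monitor.log are hypothetical, the address is a placeholder, and the sketch assumes the mail command is installed and configured on your host.

```shell
# Sketch of an "email me" helper. notify, log_file and monitor.log
# are hypothetical names; the address below is a placeholder.
recips="${recips:-you@example.com}"
log_file="${log_file:-monitor.log}"

notify() {
    subject="$1"
    body="$2"
    # Always keep a local record, even if mail delivery fails
    echo "$(date '+%Y-%m-%d %H:%M:%S') $subject: $body" >> "$log_file"
    # Only try to send mail when a mail command is available
    if command -v mail >/dev/null 2>&1; then
        # $recips is left unquoted so it can hold several addresses
        echo "$body" | mail -s "$subject" $recips
    fi
}

# Usage:
# notify "Server restart" "wget returned a non-200 status"
```

Writing to a log file as well as sending mail means you still have a timeline of events if the mail system itself is what broke.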

 

  • You should use ps to monitor the amount of memory your application is using. If you are using a lot of memory, the script should kill the existing process and restart it.
  • You should consider periodically using wget to see if your server is still returning a result. If you receive an error code, restart the server. This is an example of why you should write a “restart function” (and an “email me function”), since this is the most common first-line response to failing an automated test. The example below checks your site every 10 minutes (consider server load when setting this up).

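# http response check
# runs only in minutes ending in 1 (01, 11, 21, ...), so cron must invoke the script during those minutes

if [ $(expr $(date '+%M') % 10) -eq 1 ]; then

# --spider checks the page without downloading it; -S prints the response headers
for result in $(wget --spider -S "http://yoursite" 2>&1 | grep "HTTP/" | awk '{print $2}'); do

echo $result

if [ $result -ne 200 ]; then

#Release the Kraken! Reboot the Server!
server_restart

fi

done

fi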
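The memory check mentioned in the list above can be sketched in a similar way. The function names mem_kb_of and over_limit and the 200 MB limit are my own assumptions; pick a limit that suits your application. The sketch relies on ps accepting -o rss= and -p, which the standard Linux ps does.

```shell
# Sketch of a memory check. mem_kb_of, over_limit and the limit
# below are hypothetical; tune the limit for your application.
max_kb=204800

# Print the resident memory (RSS, in kilobytes) of one process
mem_kb_of() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Succeed when the process exists and uses more than the limit
over_limit() {
    rss=$(mem_kb_of "$1")
    [ -n "$rss" ] && [ "$rss" -gt "$2" ]
}

# Usage, assuming you saved the PID (for example with
# echo $! > server.pid right after starting the server):
# pid=$(cat server.pid)
# if over_limit "$pid" "$max_kb"; then
#     kill "$pid"
#     server_restart
# fi
```

RSS is only one of several memory figures ps can report, so treat the threshold as a rough tripwire rather than an exact measurement.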

 

  • Finally, if the server requires another process to function (MongoDB, etc.), you should make a point of checking *that* application to ensure it is still working.
  • Your script should be designed to immediately handle the issue (usually – restart the server) and email you a report so you can see trends and/or search for deeper root causes when you get a chance. The final product should be set up as a cron job that runs every few minutes.
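As a rough illustration of the last two points, here is one way to check a dependency by process name, plus a sample crontab line. The helper name is_running, the script name monitor.sh and the paths are assumptions; mongod is the usual process name for MongoDB, but verify it on your own host.

```shell
# Sketch of a dependency check. is_running is a hypothetical helper.
is_running() {
    # pgrep -x matches the exact process name and exits 0 if found
    pgrep -x "$1" > /dev/null 2>&1
}

# Usage inside the monitoring script:
# if ! is_running mongod; then
#     echo "mongod is not running" | mail -s "MongoDB down" $recips
# fi
#
# Example crontab entry (edit with crontab -e) that runs a script
# named monitor.sh every 5 minutes; the paths are assumptions:
# */5 * * * * /home/youruser/monitor.sh >> /home/youruser/monitor.log 2>&1
```

A process can be running and still be unresponsive, so for a database you may eventually want a real query as the test instead of a process lookup.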

The writing of a complete script is left as an exercise for the reader, since your specific needs will vary. Check out this tutorial and this other excellent collection of examples if you get stuck. Done properly, this script will keep the app in production while you work on other things.

Final comment – if you have an Android smartphone, look at ConnectBot – it gives you an easy way to get in contact with your server when you’re not near a computer.

Incidentally, Murphy’s law tends to control the timing of when servers misbehave. I had a site which was rock-solid reliable, until I set foot in a foreign country without a personal laptop. Naturally, the server promptly crashed. Twice. And stayed down for the duration of an all-day business meeting followed by a team dinner. Fortunately I had ConnectBot on my phone and was able to make some appropriate tweaks… And we upgraded the script when I got back home that weekend 🙂
