New blog post about using healthchecks to monitor your website

2020-02-11 20:58:47 +01:00 · 2020-02-11 20:58:47 +01:00 · b4b22ab5d7
commit b4b22ab5d7
parent 48de646c64
2 changed files with 130 additions and 1 deletions
--- a/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst
+++ b/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst
@ -0,0 +1,129 @@
 .. title: Building up simple monitoring on Healthchecks
 .. date: 2020-02-11
 .. slug: building-up-simple-monitoring-on-healthchecks
 .. updated: 2020-02-11
 .. status: published
 .. tags: monitoring, healthchecks, cron, curl
 .. category: monitoring
 .. authors: Elia el Lazkani
 .. description: 
 .. type: text

 I talked :doc:`previously <simple-cron-monitoring-with-healthchecks>` about deploying my own simple monitoring system.
 Now that it's up, I'm only using it for my backups. That's a good use, for sure, but I know I can do better.
 So I went digging.

 .. TEASER_END

 Introduction
 ============
 I host a number of services; some are public, like my blog, while others are private.
 These services are not critical; some can be down for short periods of time.
 Some might even be down for longer periods without causing any loss in functionality.

 That being said, I'm a *DevOps engineer*, which means I need to know.
 Yeah, that doesn't mean I'll do something about it right away, but I'd like to be in the know.
 Which got me thinking...

 Healthchecks Endpoints
 ======================
 Watching **borg** use its *healthchecks* hook opened my eyes to another feature of **Healthchecks**.
 It turns out that if you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/start``,
 it starts a timer that measures the time until you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219``.
 This way, you can find out how long it takes to check on the status of a service, or how long a service takes to back up.

 *Healthchecks* also offers a third endpoint to ping.
 You can report a failure straight away by pinging ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/fail``.
 This way, you do not have to wait for the check's time to run out before you get notified of a failure.

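
 The three endpoints can be wrapped in a couple of shell functions. This is a minimal sketch assuming bash and curl;
 ``hc_url`` and ``hc_ping`` are hypothetical helper names, and ``HEALTHCHECKS_HOST`` defaults to the example instance used in this post.

 .. code:: bash

    #!/bin/bash
    # Hypothetical helpers around the three ping endpoints described above.
    # HEALTHCHECKS_HOST defaults to this post's example instance (an assumption).
    HEALTHCHECKS_HOST="${HEALTHCHECKS_HOST:-https://healthchecks.example.com/ping}"

    # hc_url CHECK_ID [start|fail]
    # Prints the ping URL; without a second argument it is the success endpoint.
    hc_url() {
        local check_id=$1 signal=${2:-}
        if [[ -n $signal ]]; then
            printf '%s/%s/%s\n' "$HEALTHCHECKS_HOST" "$check_id" "$signal"
        else
            printf '%s/%s\n' "$HEALTHCHECKS_HOST" "$check_id"
        fi
    }

    # hc_ping CHECK_ID [start|fail]
    # Sends the ping with a 10 second limit and up to 3 retries, discarding the response.
    hc_ping() {
        curl -fsS -m 10 --retry 3 -o /dev/null "$(hc_url "$@")"
    }

 Calling ``hc_ping 84b2a834-02f5-524f-4c27-a2f24562b219 start``, then either ``hc_ping 84b2a834-02f5-524f-4c27-a2f24562b219``
 or ``hc_ping 84b2a834-02f5-524f-4c27-a2f24562b219 fail``, sends the start, success or failure ping respectively.
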
 With those pieces of knowledge, we can do a lot.

 A lot?
 ======
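 Yes, a lot...
 Let's put what we have learned so far into action.

 .. code:: bash

   #!/bin/bash
   WEB_HOST=$1
   CHECK_ID=$2
   HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

   if [[ -z $WEB_HOST || -z $CHECK_ID ]]; then
       echo "Usage: $0 WEB_HOST CHECK_ID" >&2
       exit 2
   fi

   # Start the healthchecks timer
   curl -fsS -m 10 --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/start" > /dev/null

   # Check the website; -f makes HTTP errors (4xx/5xx) return a non-zero status
   curl -fsS -m 30 -o /dev/null "${WEB_HOST}"
   STATUS=$?

   # Report success, or report the failure right away
   if [[ $STATUS -eq 0 ]]; then
       curl -fsS -m 10 --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}" > /dev/null
   else
       curl -fsS -m 10 --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/fail" > /dev/null
   fi

 We start by defining a few variables: the website URL to monitor, the check ID provided by *healthchecks* and
 the *healthchecks* base URL for pings. If either argument is missing, the script prints its usage and exits.
 Once those are set, we use ``curl`` with a few flags to make sure it fails properly when something goes wrong:
 ``-f`` turns HTTP error responses into a non-zero exit status, ``-sS`` hides the progress meter but still prints errors,
 ``-m`` caps how long each request may take and ``--retry 3`` retries the pings on transient errors.
 We start the *healthchecks* timer, run the website check and call either the passing or the failing *healthchecks* endpoint depending on the outcome.
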

 .. code:: text

   $ chmod +x https_healthchecks_monitor.sh
   $ ./https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

 Test it out: after a run, the check's log on *healthchecks* should show the start and success pings.

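
 The start, check and report steps do not depend on the website check itself.
 As a sketch, assuming bash, they can be pulled into a reusable function that wraps any command;
 ``hc_run`` is a hypothetical name and ``HEALTHCHECKS_HOST`` again defaults to the example instance.

 .. code:: bash

    #!/bin/bash
    # Hypothetical wrapper: ping "start", run a command, then ping success
    # or failure based on its exit status. The default host is an assumption.
    HEALTHCHECKS_HOST="${HEALTHCHECKS_HOST:-https://healthchecks.example.com/ping}"

    # hc_run CHECK_ID COMMAND [ARGS...]
    hc_run() {
        local check_id=$1
        shift
        local base="${HEALTHCHECKS_HOST}/${check_id}"

        # Start the healthchecks timer; a failed ping does not stop the check
        curl -fsS -m 10 --retry 3 -o /dev/null "${base}/start"

        # Run the wrapped command and keep its exit status
        "$@"
        local status=$?

        if [[ $status -eq 0 ]]; then
            curl -fsS -m 10 --retry 3 -o /dev/null "${base}"
        else
            curl -fsS -m 10 --retry 3 -o /dev/null "${base}/fail"
        fi
        # Pass the command's exit status on to the caller
        return "$status"
    }

 With it, the website check becomes a one-liner, for example
 ``hc_run 84b2a834-02f5-524f-4c27-a2f24562b219 curl -fsS -m 30 -o /dev/null https://healthchecks.example.com``,
 and the same function could wrap a backup job or any other command.
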

 Okay, that's nice, but now what?
 ================================
 Now, let's hook it up to cron.
 Run ``crontab -e``, which should open your favorite text editor, then add a new cron entry (a new line) like the following:

 .. code:: text

   */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

 This will run the script every 15 minutes. Make sure the check's period (its timeout) is set to 15 minutes, with a grace time of 5 minutes.
 With that configuration, you should get notified at most 20 minutes after any failure.
 Be aware, I said *any* failure.
 A notification does not necessarily mean that your website is down.
 It only means that *healthchecks* received a failure ping or wasn't pinged on time.

 A notification covers a bunch of cases. Some of them are:

  * The server running the cron job is down
  * The cron service is not running
  * The server running the cron job lost internet access
  * Your website's certificate expired
  * Your website is down

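
 Only the last two cases can be seen by the script itself, through the exit status of ``curl``; the first three only show up as missing pings.
 A minimal sketch for telling the visible ones apart, assuming bash and the ``-f`` flag (so HTTP errors give exit code 22);
 ``describe_curl_exit`` and ``site_status`` are hypothetical helpers, and only a few common exit codes are mapped.

 .. code:: bash

    #!/bin/bash
    # Hypothetical helper: map a few common curl exit codes to a short reason.
    describe_curl_exit() {
        case $1 in
            0)  echo "ok" ;;
            6)  echo "could not resolve host" ;;
            7)  echo "could not connect" ;;
            22) echo "HTTP error status (4xx/5xx)" ;;
            28) echo "timed out" ;;
            35) echo "TLS handshake failed" ;;
            60) echo "TLS certificate problem (e.g. expired)" ;;
            *)  echo "curl exit code $1" ;;
        esac
    }

    # site_status URL
    # Checks the URL, prints the reason and returns curl's exit status.
    site_status() {
        curl -fsS -m 30 -o /dev/null "$1" 2> /dev/null
        local status=$?
        describe_curl_exit "$status"
        return "$status"
    }

 *Healthchecks* can keep the body of a ``POST`` ping in the check's log, so the reason could be sent along with the failure,
 for example by passing it to ``curl --data-raw`` when pinging the ``/fail`` endpoint.
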
 You could create more checks to cover most of these cases if you want to turn this into a full monitoring system.
 But if you want to go that far, you should probably invest in a monitoring system with more features.

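
 Short of that, one extra check already helps separate the cases above. As a sketch, assuming a second check exists
 (``HEARTBEAT_CHECK_ID`` is a placeholder for its ID), the cron server can ping it on its own, without touching any website:

 .. code:: text

    */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219
    */15 * * * * curl -fsS -m 10 --retry 3 -o /dev/null https://healthchecks.example.com/ping/HEARTBEAT_CHECK_ID

 If both checks go down, the server, the cron service or the server's network is the likely culprit;
 if only the website check reports a failure, look at the website or its certificate.
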
 Conclusion
 ==========
 Don't judge something by its simplicity. Sometimes, simple components tied together make something interesting and useful.
 With a little scripting, a couple of commands and the power of cron, we were able to make *healthchecks* monitor our websites.
--- a/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
+++ b/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
@ -1,6 +1,6 @@
 .. title: Simple cron monitoring with HealthChecks
 .. date: 2020-02-09
-.. slug: simple_cron_monitoring_with_healthchecks
+.. slug: simple-cron-monitoring-with-healthchecks
 .. updated: 2020-02-09
 .. status: published
 .. tags: monitoring, healthchecks, cron