New blog post about using healthchecks to monitor your website

2020-02-11 20:58:47 +01:00
commit b4b22ab5d7
parent 48de646c64
2 changed files with 130 additions and 1 deletions
--- a/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst
+++ b/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst
@ -0,0 +1,129 @@
+.. title: Building up simple monitoring on Healthchecks
+.. date: 2020-02-11
+.. slug: building-up-simple-monitoring-on-healthchecks
+.. updated: 2020-02-11
+.. status: published
+.. tags: monitoring, healthchecks, cron, curl
+.. category: monitoring
+.. authors: Elia el Lazkani
+.. description: 
+.. type: text
+
+I talked :doc:`previously <simple-cron-monitoring-with-healthchecks>` about deploying my own simple monitoring system.
+
+Now that it's up, I'm only using it for my backups. That's a good use, for sure, but I know I can do better.
+
+So I went digging.
+
+.. TEASER_END
+
+
+Introduction
+============
+
+I host a number of services; some are public, like my blog, while others are private.
+These services are not critical; some can be down for short periods of time.
+Some services might even be down for longer periods without causing any loss in functionality.
+
+That being said, I'm a *DevOps engineer*. That means I need to know.
+
+Yeah, it doesn't mean I'll do something about it right away, but I'd like to be in the know.
+
+Which got me thinking...
+
+
+Healthchecks Endpoints
+======================
+
+Watching **borg** use its *healthchecks* hook opened my eyes to another feature of **Healthchecks**.
+
+It seems that if you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/start``,
+it starts a timer that measures the time until you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219``.
+This way, you can find out how long a check on a service takes, or how long a backup takes to run.
+
+It turns out that *healthchecks* also offers a different endpoint to ping. 
+You can report a failure straight away by pinging ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/fail``.
+This way, you do not have to wait until the time expires before you get notified of a failure.
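+
+The three endpoints differ only in their suffix, which a small helper can make explicit.
+This is a minimal sketch: ``hc_url`` and ``HC_BASE`` are hypothetical names chosen for illustration, not something *healthchecks* provides.
+

```shell
#!/bin/sh
# Hypothetical helper, not part of healthchecks itself: build the ping URL
# for a check. HC_BASE is an assumed variable holding the ping base URL.
HC_BASE="https://healthchecks.example.com/ping"

# hc_url CHECK_ID [EVENT] prints the URL; EVENT is "start", "fail" or empty.
hc_url() {
    check_id="$1"
    event="${2:-}"
    if [ -n "$event" ]; then
        printf '%s/%s/%s\n' "$HC_BASE" "$check_id" "$event"
    else
        printf '%s/%s\n' "$HC_BASE" "$check_id"
    fi
}

# Print the start, success and failure URLs for the example check.
hc_url 84b2a834-02f5-524f-4c27-a2f24562b219 start
hc_url 84b2a834-02f5-524f-4c27-a2f24562b219
hc_url 84b2a834-02f5-524f-4c27-a2f24562b219 fail
```

+
+Nothing is sent over the network here; the helper only assembles the URLs that ``curl`` would later be pointed at.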
+
+With those pieces of knowledge, we can do a lot.
+
+
+A lot?
+======
+
+Yes, a lot...
+
+Let's put what we have learned so far into action.
+
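+.. code:: bash
+
+    #!/bin/bash
+
+    # Usage: https_healthchecks_monitor.sh <website-url> <check-id>
+    WEB_HOST="$1"
+    CHECK_ID="$2"
+
+    HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"
+
+    # Start the healthchecks timer.
+    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/start" > /dev/null
+
+    # Check the website; -f makes curl exit non-zero on HTTP errors (4xx/5xx).
+    curl -fsS "${WEB_HOST}" > /dev/null
+    STATUS=$?
+
+    # Report success or failure depending on curl's exit status.
+    if [[ $STATUS -eq 0 ]]; then
+        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}" > /dev/null
+    else
+        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/fail" > /dev/null
+    fi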
+
+
+We start by defining a few variables for the website URL to monitor, the check ID provided by *healthchecks*, and finally
+the *healthchecks* base URL for the pings.
+
+Once those are set, we simply use ``curl`` with a few flags so that it fails properly if something goes wrong:
+``-f`` returns a non-zero exit code on HTTP errors, ``-sS`` hides the progress meter while still printing error messages, and ``--retry 3`` retries transient failures.
+
+We start the *healthchecks* timer, run the website check and call either the passing or the failing *healthchecks* endpoint depending on the outcome.
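+
+The decision between the passing and the failing endpoint depends only on an exit status, so it can be tried out without touching the network.
+The sketch below is an illustration under that assumption; ``ping_suffix_for`` is a hypothetical helper name, and ``true`` and ``false`` stand in for a passing and a failing website check.
+

```shell
#!/bin/sh
# Hypothetical helper: run any check command and print the ping suffix
# that matches its exit status ("" for success, "/fail" for failure).
ping_suffix_for() {
    if "$@" > /dev/null 2>&1; then
        printf '%s\n' ""
    else
        printf '%s\n' "/fail"
    fi
}

# "true" and "false" stand in for a passing and a failing website check.
echo "success suffix: '$(ping_suffix_for true)'"
echo "failure suffix: '$(ping_suffix_for false)'"
```

+
+In the monitoring script, the stand-in command would be the ``curl`` call against the website, and the printed suffix would be appended to the check URL.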
+
+.. code:: text
+
+    $ chmod +x https_healthchecks_monitor.sh
+    $ ./https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219 
+
+Test it out.
+
+
+Okay, that's nice, but now what?
+================================
+
+Now, let's hook it up to our cron.
+
+Start with ``crontab -e``, which should open your favorite text editor.
+
+Then create a cron entry (a new line) like the following:
+
+.. code:: text
+
+    */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219 
+
+This will run the script every 15 minutes. Make sure that the timeout for this check is set to 15 minutes, with a grace period of 5 minutes.
+That configuration guarantees that, at worst, you get notified within 20 minutes of any failure.
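+
+The 20 minutes is simply the timeout plus the grace period. As a quick sketch, with variable names that are assumptions made for illustration:
+

```shell
#!/bin/sh
# Assumed values matching the cron entry and check settings described above.
period_minutes=15   # how often cron runs the script
grace_minutes=5     # extra time healthchecks waits before alerting

# Worst case: the last ping succeeded, then pings stopped arriving entirely.
worst_case_minutes=$((period_minutes + grace_minutes))
echo "$worst_case_minutes"
```

+
+If you change the cron schedule, adjust the timeout on the check to match so the sum stays meaningful.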
+
+Be aware, I said any failure.
+Getting notified does not guarantee that your website is down.
+It only guarantees that *healthchecks* either received a failure ping or wasn't pinged on time.
+
+Getting notified covers a bunch of cases. Some of them are:
+
+* The server running the cron is down
+* The cron service is not running
+* The server running the cron lost internet access
+* Your certificate expired
+* Your website is down
+
+You can create checks to cover most of these if you care to make it a full monitoring system.
+If you want to go that far, maybe you should invest in a monitoring system with more features.
+
+
+Conclusion
+==========
+
+Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
+With a little scripting, a couple of commands, and the power of cron, we were able to make *healthchecks* monitor our websites.
--- a/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
+++ b/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
@ -1,6 +1,6 @@
 .. title: Simple cron monitoring with HealthChecks
 .. date: 2020-02-09
-.. slug: simple_cron_monitoring_with_healthchecks
+.. slug: simple-cron-monitoring-with-healthchecks
 .. updated: 2020-02-09
 .. status: published
 .. tags: monitoring, healthchecks, cron