diff --git a/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst b/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst
new file mode 100644
index 0000000..d4ea3bf
--- /dev/null
+++ b/posts/monitoring/building_up_simple_monitoring_on_healthchecks.rst
@@ -0,0 +1,129 @@
+.. title: Building up simple monitoring on Healthchecks
+.. date: 2020-02-11
+.. slug: building-up-simple-monitoring-on-healthchecks
+.. updated: 2020-02-11
+.. status: published
+.. tags: monitoring, healthchecks, cron, curl
+.. category: monitoring
+.. authors: Elia el Lazkani
+.. description:
+.. type: text
+
+I talked :doc:`previously ` about deploying my own simple monitoring system.
+
+Now that it's up, I'm only using it for my backups. That's a good use, for sure, but I know I can do better.
+
+So I went digging.
+
+.. TEASER_END
+
+
+Introduction
+============
+
+I host a number of services; some are public, like my blog, while others are private.
+These services are not critical; some can be down for short periods of time.
+Some services might even be down for longer periods without causing any loss in functionality.
+
+That being said, I'm a *DevOps engineer*. That means I need to know.
+
+Yeah, it doesn't mean I'll do something about it right away, but I'd like to be in the know.
+
+Which got me thinking...
+
+
+Healthchecks Endpoints
+======================
+
+Watching **borg** use its *healthchecks* hook opened my eyes to another feature of **Healthchecks**.
+
+It seems that if you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/start``,
+it will start a timer that measures the time until you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219``.
+This way, you can find out how long it takes to check on the status of a service, or how long a service takes to back up.
+
+It turns out that *healthchecks* also offers a different endpoint to ping.
+You can report a failure straight away by pinging ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/fail``.
+This way, you do not have to wait until the time expires before you get notified of a failure.
+
+With those pieces of knowledge, we can do a lot.
+
+
+A lot?
+======
+
+Yes, a lot...
+
+Let's put what we have learned so far into action.
+
+.. code:: bash
+
+    #!/bin/bash
+
+    WEB_HOST="$1"
+    CHECK_ID="$2"
+
+    HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"
+
+    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/start" > /dev/null
+
+    OUTPUT=$(curl -fsS "${WEB_HOST}")
+    STATUS=$?
+
+    if [[ $STATUS -eq 0 ]]; then
+        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}" > /dev/null
+    else
+        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/fail" > /dev/null
+    fi
+
+
+We start by defining a few variables: the URL of the website to monitor, the check ID provided by *healthchecks* and, finally,
+the *healthchecks* base link for the monitors.
+
+Once those are set, we simply use ``curl`` with a couple of special flags to make sure that it fails properly if something goes wrong: ``-f`` makes it exit with a non-zero status on HTTP errors, while ``-sS`` hides the progress meter but still prints errors.
+
+We start the *healthchecks* timer, run the website check and call either the passing or the failing *healthchecks* endpoint depending on the outcome.
+
+.. code:: text
+
+    $ chmod +x https_healthchecks_monitor.sh
+    $ ./https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219
+
+Test it out.
+
+
+Okay, that's nice but now what!
+===============================
+
+Now, let's hook it up to our cron.
+
+Start with ``crontab -e``, which should open your favorite text editor.
+
+Then create a cron entry (a new line) like the following:
+
+.. code:: text
+
+    */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219
+
+This will run the script every 15 minutes. Make sure that your timeout is 15 minutes for this check, with a grace period of 5 minutes.
+With that configuration, you will get notified within 20 minutes of any failure, at worst.
+
+Be aware, I said any failure.
+Getting notified does not necessarily mean that your website is down.
+It only means that *healthchecks* wasn't pinged on time, or that the script reported a failure.
+
+Getting notified covers a bunch of cases. Some of them are:
+    * The server running the cron is down
+    * The cron service is not running
+    * The server running the cron lost internet access
+    * Your certificate expired
+    * Your website is down
+
+You can create checks to cover most of these if you care to make it a full monitoring system.
+If you want to go that far, maybe you should invest in a monitoring system with more features.
+
+
+Conclusion
+==========
+
+Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
+With a little scripting, a couple of commands and the power of cron, we were able to make *healthchecks* monitor our websites.
diff --git a/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst b/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
index 8a368a0..0b278d9 100644
--- a/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
+++ b/posts/monitoring/simple_cron_monitoring_with_healthchecks.rst
@@ -1,6 +1,6 @@
 .. title: Simple cron monitoring with HealthChecks
 .. date: 2020-02-09
-.. slug: simple_cron_monitoring_with_healthchecks
+.. slug: simple-cron-monitoring-with-healthchecks
 .. updated: 2020-02-09
 .. status: published
 .. tags: monitoring, healthchecks, cron