New blog post about using healthchecks to monitor your website

Elia el Lazkani 2020-02-11 20:58:47 +01:00 committed by Elia El Lazkani
parent 48de646c64
commit b4b22ab5d7
2 changed files with 130 additions and 1 deletions


@ -0,0 +1,129 @@
.. title: Building up simple monitoring on Healthchecks
.. date: 2020-02-11
.. slug: building-up-simple-monitoring-on-healthchecks
.. updated: 2020-02-11
.. status: published
.. tags: monitoring, healthchecks, cron, curl
.. category: monitoring
.. authors: Elia el Lazkani
.. description:
.. type: text
I talked :doc:`previously <simple-cron-monitoring-with-healthchecks>` about deploying my own simple monitoring system.
Now that it's up, I'm only using it for my backups. That's a good use, for sure, but I know I can do better.
So I went digging.
.. TEASER_END
Introduction
============
I host a number of services; some are public, like my blog, while others are private.
These services are not critical, and some can be down for short periods of time.
Some might even be down for longer periods without causing any loss in functionality.
That being said, I'm a *DevOps engineer*. That means I need to know.
Yeah, it doesn't mean I'll do something about it right away, but I'd like to be in the know.
Which got me thinking...
Healthchecks Endpoints
======================
Watching **borg** use its *healthchecks* hook opened my eyes to another feature of **Healthchecks**.
It seems that if you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/start``,
it will start a counter that will measure the time until you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219``.
This way, you can find out how long it takes to check on the status of a service, or how long a service takes to back up.
It turns out that *healthchecks* also offers a different endpoint to ping.
You can report a failure straight away by pinging ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/fail``.
This way, you do not have to wait until the time expires before you get notified of a failure.
With those pieces of knowledge, we can do a lot.
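
To make the three endpoints concrete, here is a minimal sketch of a reusable wrapper around them.
The function names ``hc_url``, ``hc_ping`` and ``hc_wrap`` are my own and purely illustrative; they are not part of *healthchecks*.
The host is the same placeholder used above, and I'm assuming ``curl`` is installed.

```shell
#!/bin/bash

# Assumed placeholder: replace with the ping URL of your own instance.
HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

# Build the ping URL for a check. The optional second argument selects
# the event: "start", "fail", or nothing for a plain success ping.
hc_url() {
    local check_id=$1
    local event=${2:-}
    if [ -n "$event" ]; then
        printf '%s/%s/%s\n' "$HEALTHCHECKS_HOST" "$check_id" "$event"
    else
        printf '%s/%s\n' "$HEALTHCHECKS_HOST" "$check_id"
    fi
}

# Send one ping. A failed ping is ignored on purpose so that a
# monitoring outage never breaks the job being monitored.
hc_ping() {
    curl -fsS --retry 3 "$(hc_url "$@")" > /dev/null || true
}

# Run any command between a start ping and a success or fail ping,
# and hand the command's exit status back to the caller.
hc_wrap() {
    local check_id=$1
    shift
    hc_ping "$check_id" start
    if "$@"; then
        hc_ping "$check_id"
    else
        local rc=$?
        hc_ping "$check_id" fail
        return "$rc"
    fi
}
```

With this in place, something like ``hc_wrap 84b2a834-02f5-524f-4c27-a2f24562b219 /path/to/backup.sh`` would time the job and report its outcome; the backup script path is hypothetical.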
A lot ?
=======
Yes, a lot...
Let's put what we have learned so far into action.
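.. code:: bash

    #!/bin/bash

    # URL of the website to monitor and the check ID from healthchecks.
    WEB_HOST=$1
    CHECK_ID=$2
    HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

    # Signal the start of the check so healthchecks can time it.
    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/start" > /dev/null

    # -f makes curl exit non-zero on HTTP errors (status 400 and above),
    # not only on connection failures. The page body itself is not needed.
    if curl -fsS "${WEB_HOST}" > /dev/null; then
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}" > /dev/null
    else
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/fail" > /dev/null
    fi
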
We start by defining a few variables: the URL of the website to monitor, the check ID provided by *healthchecks* and finally
the *healthchecks* base link for the monitors.
Once those are set, we simply use ``curl`` with a couple of special flags to make sure that it fails properly if something goes wrong.
We start the *healthchecks* timer, run the website check, and call either the passing or the failing *healthchecks* endpoint depending on the outcome.
.. code:: text

    $ chmod +x https_healthchecks_monitor.sh
    $ ./https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

Test it out.
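
If you'd rather decide for yourself which responses count as "up", one option is to ask ``curl`` for the HTTP status code and inspect it.
The following is only a sketch: the helper names ``is_healthy_status`` and ``fetch_status`` are made up for this example, treating every 2xx and 3xx code as healthy is my assumption, and so is the 30 second cap.

```shell
#!/bin/bash

# Decide whether an HTTP status code counts as "up".
# Assumption: any 2xx or 3xx response means the site is healthy.
is_healthy_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch a URL and print only its HTTP status code.
# curl prints 000 when it did not receive any response at all.
# --max-time 30 is an assumed cap so a hanging site cannot stall the check.
fetch_status() {
    curl -s -o /dev/null --max-time 30 -w '%{http_code}' "$1"
}
```

In the monitoring script, the website check could then become ``if is_healthy_status "$(fetch_status "${WEB_HOST}")"; then``, with the two *healthchecks* pings left unchanged.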
Okay, that's nice but now what !
================================
Now, let's hook it up to our cron.
Start with ``crontab -e`` which should open your favorite text editor.
Then create a cron entry (a new line) like the following:
.. code:: text

    */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

This will run the script every 15 minutes. Make sure that the timeout for this check is set to 15 minutes, with a grace period of 5 minutes.
With that configuration, you will get notified at most 20 minutes (15 + 5) after any failure.
Be aware, I said any failure.
Getting notified does not guarantee that your website is down.
It can only guarantee that *healthchecks* wasn't pinged on time.
Getting notified covers a bunch of cases. Some of them are:
* The server running the cron is down
* The cron service is not running
* The server running the cron lost internet access
* Your certificate expired
* Your website is down
You can create checks to cover most of these if you care to make it a full monitoring system.
If you want to go that far, maybe you should invest in a monitoring system with more features.
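
As one example of such an extra check, here is a rough sketch for the expired certificate case.
It assumes the ``openssl`` command line tool is installed and that the site answers on port 443; the function names are hypothetical.

```shell
#!/bin/bash

# Convert a number of days into seconds for openssl's -checkend option.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Succeed only if the certificate served by host:443 stays valid for at
# least the given number of days. A failed connection counts as a failure
# too, because openssl x509 then has no certificate to read.
cert_valid_for() {
    local host=$1
    local days=$2
    openssl s_client -connect "${host}:443" -servername "${host}" < /dev/null 2> /dev/null \
        | openssl x509 -noout -checkend "$(days_to_seconds "$days")" > /dev/null
}
```

Give it its own check in *healthchecks* and run it from cron once a day: if ``cert_valid_for blog.example.com 14`` succeeds, ping the check, otherwise ping its ``/fail`` endpoint. The hostname and the 14 day threshold are placeholders.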
Conclusion
==========
Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
With a little scripting, a couple of commands and the power of cron, we were able to make *healthchecks* monitor our websites.


@ -1,6 +1,6 @@
.. title: Simple cron monitoring with HealthChecks
.. date: 2020-02-09
.. slug: simple_cron_monitoring_with_healthchecks
.. slug: simple-cron-monitoring-with-healthchecks
.. updated: 2020-02-09
.. status: published
.. tags: monitoring, healthchecks, cron