New blog post about using healthchecks to monitor your website
parent 48de646c64
commit b4b22ab5d7
2 changed files with 130 additions and 1 deletions
@@ -0,0 +1,129 @@
.. title: Building up simple monitoring on Healthchecks
.. date: 2020-02-11
.. slug: building-up-simple-monitoring-on-healthchecks
.. updated: 2020-02-11
.. status: published
.. tags: monitoring, healthchecks, cron, curl
.. category: monitoring
.. authors: Elia el Lazkani
.. description:
.. type: text

I talked :doc:`previously <simple-cron-monitoring-with-healthchecks>` about deploying my own simple monitoring system.

Now that it's up, I'm only using it for my backups. That's a good use, for sure, but I know I can do better.

So I went digging.

.. TEASER_END


Introduction
============

I host a number of services; some are public, like my blog, while others are private.
These services are not critical; some can be down for short periods of time.
Some services might even be down for longer periods without causing any loss in functionality.

That being said, I'm a *DevOps engineer*. That means I need to know.

Yeah, it doesn't mean I'll do something about it right away, but I'd like to be in the know.

Which got me thinking...


Healthchecks Endpoints
======================

Watching **borg** use its *healthchecks* hook opened my eyes to another feature of **Healthchecks**.

It seems that if you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/start``,
it will start a timer that measures the time until you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219``.
This way, you can find out how long it takes to check on the status of a service, or how long a service takes to back up.

It turns out that *healthchecks* also offers a different endpoint to ping.
You can report a failure straight away by pinging ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/fail``.
This way, you do not have to wait until the time expires before you get notified of a failure.

With those pieces of knowledge, we can do a lot.
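
To make the three endpoints concrete, here is a minimal sketch that only builds and prints the URLs described above. The helper names ``ping_url`` and ``send_ping`` are my own, not part of *healthchecks*, and the host and check ID are the placeholder values used throughout this post.

```bash
#!/bin/bash
# Sketch: the three ping endpoints described above.
# ping_url and send_ping are hypothetical helper names.

HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

# ping_url CHECK_ID [SIGNAL]
# SIGNAL is "start", "fail", or empty for a plain success ping.
ping_url() {
    local check_id="$1" signal="${2:-}"
    if [ -n "$signal" ]; then
        echo "${HEALTHCHECKS_HOST}/${check_id}/${signal}"
    else
        echo "${HEALTHCHECKS_HOST}/${check_id}"
    fi
}

# send_ping CHECK_ID [SIGNAL]: actually hits the endpoint (needs network access).
send_ping() {
    curl -fsS --retry 3 "$(ping_url "$@")" > /dev/null
}

CHECK_ID="84b2a834-02f5-524f-4c27-a2f24562b219"
ping_url "$CHECK_ID" start   # starts the timer
ping_url "$CHECK_ID"         # reports success and stops the timer
ping_url "$CHECK_ID" fail    # reports a failure right away
```

Running it as is only prints the three URLs; swapping ``ping_url`` for ``send_ping`` in the last three lines would send the real pings.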


A lot?
======

Yes, a lot...

Let's put what we have learned so far into action.

.. code:: bash

    #!/bin/bash

    # Usage: https_healthchecks_monitor.sh WEB_HOST CHECK_ID
    WEB_HOST=$1
    CHECK_ID=$2

    HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

    # Tell healthchecks that the check has started (this starts the timer).
    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/start" > /dev/null

    # -f makes curl exit non-zero on HTTP errors (400 and above), not only on connection errors.
    OUTPUT=$(curl -fsS "${WEB_HOST}")
    STATUS=$?

    if [[ $STATUS -eq 0 ]]; then
        # Success: stop the timer and mark the check as passing.
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}" > /dev/null
    else
        # Failure: report it right away.
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/fail" > /dev/null
    fi

We start by defining a few variables: the URL of the website to monitor, the check ID provided by *healthchecks* and, finally,
the *healthchecks* base URL for the pings.

Once those are set, we simply use ``curl`` with a couple of special flags to make sure that it fails properly if something goes wrong:
``-f`` makes it exit with an error on HTTP failures, ``-sS`` hides the progress output while still showing errors, and ``--retry 3`` retries transient failures.

We start the *healthchecks* timer, run the website check and call either the passing or the failing *healthchecks* endpoint depending on the outcome.
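
If you want finer control over what counts as a failure, one possible variation is to look at the HTTP status code instead of curl's exit code. The sketch below is only that, a sketch: ``signal_for_status`` and ``check_site`` are hypothetical helper names, and treating every 2xx or 3xx response as healthy is an assumption you may want to tighten.

```bash
#!/bin/bash
# Sketch: pick the healthchecks endpoint from the HTTP status of the site.
# signal_for_status and check_site are hypothetical helper names.

HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

# signal_for_status HTTP_CODE
# Prints "fail" unless the status code is 2xx or 3xx (an assumption).
# curl reports 000 when no HTTP response was received at all.
signal_for_status() {
    case "$1" in
        [23][0-9][0-9]) echo "" ;;
        *) echo "fail" ;;
    esac
}

# check_site WEB_HOST CHECK_ID
check_site() {
    local web_host="$1" check_id="$2" code signal
    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${check_id}/start" > /dev/null
    # -o /dev/null discards the body; -w prints only the status code.
    code=$(curl -sS -o /dev/null -w '%{http_code}' "$web_host")
    signal=$(signal_for_status "$code")
    # ${signal:+/$signal} expands to "/fail" or to nothing.
    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${check_id}${signal:+/$signal}" > /dev/null
}

# Only run when both arguments are given, so the file can also be sourced.
if [ $# -ge 2 ]; then
    check_site "$1" "$2"
fi
```

It takes the same two arguments as the script above, so it could be dropped into the same place.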

.. code:: text

    $ chmod +x https_healthchecks_monitor.sh
    $ ./https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

Test it out.


Okay, that's nice, but now what?
================================

Now, let's hook it up to cron.

Start with ``crontab -e``, which should open your favorite text editor.

Then create a cron entry (a new line) like the following:

.. code:: text

    */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

This will run the script every 15 minutes. Make sure that the timeout for this check is set to 15 minutes, with a grace period of 5 minutes.
That configuration guarantees that you get notified of any failure within 20 minutes at worst: 15 minutes of timeout plus 5 minutes of grace.

Be aware, I said any failure.
Getting notified does not necessarily mean that your website is down.
It only tells you that the check failed or that *healthchecks* wasn't pinged on time.
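
The 20 minute figure is simply the timeout plus the grace period. As a tiny sketch (``worst_case_minutes`` is a hypothetical helper name), you can redo the arithmetic for other schedules:

```bash
#!/bin/bash
# Sketch: worst-case delay before an alert for a missed ping.
# worst_case_minutes is a hypothetical helper name.

# worst_case_minutes TIMEOUT_MINUTES GRACE_MINUTES
worst_case_minutes() {
    echo $(( $1 + $2 ))
}

echo "Alert at most $(worst_case_minutes 15 5) minutes after the last successful ping"
```

An explicit ping to the ``/fail`` endpoint does not wait for any of this; it is reported right away.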

Getting notified covers a bunch of cases. Some of them are:

* The server running the cron is down
* The cron service is not running
* The server running the cron lost internet access
* Your certificate expired
* Your website is down

You can create checks to cover most of these if you care to make it a full monitoring system.
If you want to go that far, maybe you should invest in a monitoring system with more features.
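
Short of that, here is a rough sketch of what one such extra check could look like, taking the certificate case from the list above. It assumes ``openssl`` is installed, that the site serves TLS on port 443, and that you created a separate check in *healthchecks* for it. ``days_to_seconds`` and ``cert_valid_for`` are hypothetical helper names, and the 14 day default is an arbitrary choice.

```bash
#!/bin/bash
# Sketch: report a certificate that is about to expire to healthchecks.
# days_to_seconds and cert_valid_for are hypothetical helper names.

HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

# days_to_seconds DAYS
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# cert_valid_for HOSTNAME DAYS
# Succeeds if the certificate served on port 443 is still valid DAYS days from now.
cert_valid_for() {
    local host="$1" days="$2"
    openssl s_client -connect "${host}:443" -servername "$host" < /dev/null 2> /dev/null \
        | openssl x509 -noout -checkend "$(days_to_seconds "$days")" > /dev/null
}

# Usage: script HOSTNAME CHECK_ID [DAYS]
# Only runs when the arguments are given, so the file can also be sourced.
if [ $# -ge 2 ]; then
    if cert_valid_for "$1" "${3:-14}"; then
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/$2" > /dev/null
    else
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/$2/fail" > /dev/null
    fi
fi
```

Note that it takes a bare hostname rather than a URL, and that a daily cron entry is probably enough for this kind of check.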


Conclusion
==========

Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
With a little scripting, a couple of commands and the power of cron, we were able to make *healthchecks* monitor our websites.

@@ -1,6 +1,6 @@
 .. title: Simple cron monitoring with HealthChecks
 .. date: 2020-02-09
-.. slug: simple_cron_monitoring_with_healthchecks
+.. slug: simple-cron-monitoring-with-healthchecks
 .. updated: 2020-02-09
 .. status: published
 .. tags: monitoring, healthchecks, cron