New blog post about using healthchecks to monitor your website
parent 48de646c64
commit b4b22ab5d7
2 changed files with 130 additions and 1 deletions
@@ -0,0 +1,129 @@
.. title: Building up simple monitoring on Healthchecks
.. date: 2020-02-11
.. slug: building-up-simple-monitoring-on-healthchecks
.. updated: 2020-02-11
.. status: published
.. tags: monitoring, healthchecks, cron, curl
.. category: monitoring
.. authors: Elia el Lazkani
.. description:
.. type: text

I talked :doc:`previously <simple-cron-monitoring-with-healthchecks>` about deploying my own simple monitoring system.

Now that it's up, I'm only using it for my backups. That's a good use, for sure, but I know I can do better.

So I went digging.

.. TEASER_END


Introduction
============

I host a number of services; some are public, like my blog, while others are private.
These services are not critical; some can be down for short periods of time.
Some services might even be down for longer periods without causing any loss in functionality.

That being said, I'm a *DevOps engineer*. That means I need to know.

Yeah, it doesn't mean I'll do something about it right away, but I'd like to be in the know.

Which got me thinking...


Healthchecks Endpoints
======================

Watching **borg** use its *healthchecks* hook opened my eyes to another feature of **Healthchecks**.

It seems that if you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/start``,
it will start a timer that measures the time until you ping ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219``.
This way, you can find out how long it takes to check on the status of a service. Or maybe, how long a service takes to back up.

It turns out that *healthchecks* also offers a different endpoint to ping.
You can report a failure straight away by pinging ``https://healthchecks.example.com/ping/84b2a834-02f5-524f-4c27-a2f24562b219/fail``.
This way, you do not have to wait until the time expires before you get notified of a failure.
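
To keep the three endpoints straight, here is a minimal sketch of a helper that builds them from a check ID. The name ``ping_url`` is made up for illustration and is not part of *healthchecks*; the base URL is the example one used throughout this post.

```bash
#!/bin/bash

# Base URL of the healthchecks ping API (the example host from this post).
HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

# ping_url CHECK_ID [SIGNAL]
# Hypothetical helper: prints the URL to ping for a check.
# SIGNAL is empty for a success ping, or "start" / "fail" for the
# other two endpoints described above.
ping_url() {
    local check_id=$1 signal=${2:-}
    # ${signal:+/${signal}} expands to "/start" or "/fail" only when set
    echo "${HEALTHCHECKS_HOST}/${check_id}${signal:+/${signal}}"
}
```

Calling ``ping_url "$CHECK_ID" start`` gives the timer URL, ``ping_url "$CHECK_ID"`` the success URL, and ``ping_url "$CHECK_ID" fail`` the failure URL.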

With those pieces of knowledge, we can do a lot.


A lot?
======

Yes, a lot...

Let's put what we have learned so far into action.

.. code:: bash

    #!/bin/bash

    # URL of the website to check and the healthchecks check ID
    WEB_HOST=$1
    CHECK_ID=$2

    HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

    # Start the healthchecks timer for this check
    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/start" > /dev/null

    # -f makes curl exit non-zero on HTTP errors (4xx/5xx) too,
    # not only on connection or TLS failures
    OUTPUT=$(curl -fsS "${WEB_HOST}")
    STATUS=$?

    if [[ $STATUS -eq 0 ]]; then
        # The website answered: send the success ping
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}" > /dev/null
    else
        # The check failed: report it straight away
        curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/${CHECK_ID}/fail" > /dev/null
    fi


We start by defining a few variables: the URL of the website to monitor, the check ID provided by *healthchecks* and finally
the *healthchecks* base link for the monitors.

Once those are set, we use ``curl`` with a few flags to make sure that it fails properly if something goes wrong: ``-f`` makes it exit with a non-zero status on HTTP errors, ``-sS`` hides the progress output while still printing errors, and ``--retry 3`` retries transient failures.

We start the *healthchecks* timer, run the website check and call either the passing or the failing *healthchecks* endpoint depending on the outcome.

.. code:: text

    $ chmod +x https_healthchecks_monitor.sh
    $ ./https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

Test it out.
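
The start, check, report flow described above is not tied to websites. As a rough sketch, the same three steps can wrap any command. The function names ``hc_ping`` and ``monitor`` are hypothetical and only for illustration; the ``curl`` flags are the ones used in the script.

```bash
#!/bin/bash

HEALTHCHECKS_HOST="https://healthchecks.example.com/ping"

# Send one ping to healthchecks. Kept in its own function so it can be
# swapped out (for example, when trying the flow without a network).
hc_ping() {
    curl -fsS --retry 3 "${HEALTHCHECKS_HOST}/$1" > /dev/null
}

# monitor CHECK_ID COMMAND [ARGS...]
# Hypothetical wrapper: start the timer, run the command, then ping the
# success or the failure endpoint depending on its exit status.
monitor() {
    local check_id=$1
    shift
    hc_ping "${check_id}/start"
    if "$@" > /dev/null 2>&1; then
        hc_ping "${check_id}"
    else
        hc_ping "${check_id}/fail"
    fi
}
```

With this in place, the website check becomes ``monitor "$CHECK_ID" curl -fsS "$WEB_HOST"``, and a backup job or any other command could be wrapped the same way.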


Okay, that's nice, but now what?
================================

Now, let's hook it up to our cron.

Start with ``crontab -e``, which should open your favorite text editor.

Then create a cron entry (a new line) like the following:

.. code:: text

    */15 * * * * /path/to/https_healthchecks_monitor.sh https://healthchecks.example.com 84b2a834-02f5-524f-4c27-a2f24562b219

This will run the script every 15 minutes. In *healthchecks*, make sure that the timeout for this check is 15 minutes, with a grace period of 5 minutes.
With that configuration, you should be notified at most about 20 minutes after any failure.
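
The 20 minute figure is simply the timeout plus the grace period. Here is a tiny sketch of that arithmetic, assuming *healthchecks* raises the alert once both have elapsed since the last ping (``worst_case_delay`` is a made-up name):

```bash
#!/bin/bash

# worst_case_delay TIMEOUT GRACE   (both in minutes)
# Hypothetical helper: prints the longest time, in minutes, that can pass
# after the last ping before an alert is expected. This assumes the alert
# fires once the timeout and the grace period have both run out.
worst_case_delay() {
    local timeout=$1 grace=$2
    echo $(( timeout + grace ))
}
```

For the values above, ``worst_case_delay 15 5`` prints ``20``. If the script itself runs and detects a problem, the ``/fail`` ping should shorten that to the next cron run.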

Be aware, I said any failure.
Getting notified does not guarantee that your website is down.
It only guarantees that *healthchecks* wasn't pinged on time, or that a failure was reported.

Getting notified covers a bunch of cases. Some of them are:

* The server running the cron is down
* The cron service is not running
* The server running the cron lost internet access
* Your certificate expired
* Your website is down

You can create checks to cover most of these if you care to make it a full monitoring system.
If you want to go that far, maybe you should invest in a monitoring system with more features.
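
As one example of such an extra check, here is a rough sketch that separates "the certificate is about to expire" from the other cases in the list. It assumes ``openssl`` is installed on the machine running the cron and that the site listens on port 443; both function names are hypothetical.

```bash
#!/bin/bash

# host_from_url URL
# Hypothetical helper: strips the scheme, path and port from a URL.
host_from_url() {
    local url=$1
    url=${url#*://}     # drop the scheme, e.g. "https://"
    url=${url%%/*}      # drop everything from the first "/"
    echo "${url%%:*}"   # drop an explicit port, if any
}

# cert_valid_for HOST DAYS
# Succeeds if the certificate served by HOST on port 443 is still valid
# DAYS days from now. "openssl x509 -checkend" exits non-zero when the
# certificate expires within the given number of seconds.
cert_valid_for() {
    local host=$1 days=$2
    openssl s_client -connect "${host}:443" -servername "${host}" \
        < /dev/null 2> /dev/null \
        | openssl x509 -noout -checkend $(( days * 86400 )) > /dev/null
}
```

Something like ``cert_valid_for "$(host_from_url "$WEB_HOST")" 14`` could then drive its own *healthchecks* check, pinging the success or the ``/fail`` endpoint exactly like the website check does.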


Conclusion
==========

Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
With a little scripting, a couple of commands and the power of cron, we were able to make *healthchecks* monitor our websites.
@@ -1,6 +1,6 @@
.. title: Simple cron monitoring with HealthChecks
.. date: 2020-02-09
.. slug: simple_cron_monitoring_with_healthchecks
.. slug: simple-cron-monitoring-with-healthchecks
.. updated: 2020-02-09
.. status: published
.. tags: monitoring, healthchecks, cron