chore(): New blog post talking about running Prometheus

This commit is contained in:
Elia el Lazkani 2021-09-17 20:45:34 +02:00
parent 648843bca2
commit 68761158b6
6 changed files with 516 additions and 0 deletions


@ -5041,6 +5041,246 @@ If you want to go that far, maybe you should invest in a monitoring system with
**** Conclusion
Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
With a little scripting, a couple of commands and the power of cron, we were able to make /healthchecks/ monitor our websites.
*** TODO Upgrade your monitoring setup with Prometheus :prometheus:metrics:container:
:PROPERTIES:
:EXPORT_HUGO_LASTMOD: 2021-09-17
:EXPORT_DATE: 2021-09-17
:EXPORT_FILE_NAME: upgrade-your-monitoring-setup-with-prometheus
:CUSTOM_ID: upgrade-your-monitoring-setup-with-prometheus
:END:
After running simple monitoring for quite a while, I decided to upgrade my
setup. It is about time to get some real metric gathering to see what's going
on. It's also time to get some proper monitoring setup.
There are a lot of options in this field and I should, probably, write a blog
post on my views on the topic. For this experiment, on the other hand, the
solution is already pre-chosen. We'll be running Prometheus.
#+hugo: more
**** Prometheus
To answer the question, /what is Prometheus?/, we'll rip a page out of the
Prometheus [[https://prometheus.io/docs/introduction/overview/][docs]].
#+begin_quote
Prometheus is an open-source systems monitoring and alerting toolkit originally
built at SoundCloud. Since its inception in 2012, many companies and
organizations have adopted Prometheus, and the project has a very active
developer and user community. It is now a standalone open source project and
maintained independently of any company. To emphasize this, and to clarify the
project's governance structure, Prometheus joined the Cloud Native Computing
Foundation in 2016 as the second hosted project, after Kubernetes.
Prometheus collects and stores its metrics as time series data, i.e. metrics
information is stored with the timestamp at which it was recorded, alongside
optional key-value pairs called labels.
#+end_quote
Let's decipher all this jargon into plain English. In simple terms,
Prometheus is a system that scrapes metrics from your services and applications
and stores them in a time series database, ready to be served back when
queried.
Prometheus also offers a way to create rules on those metrics to alert you when
something goes wrong. Combined with [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]], you've got yourself a full
monitoring system.
**** Configuration
Now that we briefly touched on a /few/ features of *Prometheus* and before we
can deploy, we need to write our configuration.
This is an example of a bare configuration.
#+NAME: prometheus-scraping-config
#+begin_src yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets:
          - prometheus:9090
#+end_src
This will make Prometheus scrape itself every 30 seconds for metrics. At least
you get /some/ metrics to query later. If you want the full experience, I would
suggest you enable /Prometheus metrics/ for your services. Consult the docs of
the project to see if and how it can expose metrics for /Prometheus/ to scrape,
then add the scrape endpoint to your configuration as shown above.
Here's an example for a couple more /well-known/ projects: [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]] and
[[https://github.com/prometheus/node_exporter][/node exporter/]].
#+NAME: prometheus-example-scraping-config
#+begin_src yaml
  - job_name: alertmanager
    scrape_interval: 30s
    static_configs:
      - targets:
          - alertmanager:9093
  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
          - node-exporter:9100
#+end_src
A wider [[https://prometheus.io/docs/instrumenting/exporters/][list of exporters]] can be found in the Prometheus docs.
**** Deployment
Now that we've got ourselves a configuration, let's deploy *Prometheus*.
Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
using =docker-compose= in this example to make it easier to translate later to
other types of deployments.
#+BEGIN_EXPORT html
<div class="admonition note">
<p class="admonition-title">Note</p>
#+END_EXPORT
I'm still running the =2.x= version of the Compose file format. I know I need
to upgrade to a newer version, but that requires a bit of networking work. It's
a work in progress.
#+BEGIN_EXPORT html
</div>
#+END_EXPORT
The =docker-compose= file should look like the following.
#+begin_src yaml
---
version: '2.3'
services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
    ports:
      - "80:9090"
#+end_src
A few things to *note*, especially for the new container crowd. The container
image *version* is explicitly specified; do *not* use =latest= in production.
To make sure I don't overload my host, I set memory limits. I don't mind if it
goes down; this is a PoC (Proof of Concept) for the time being. In your case,
you might want to choose higher limits to give it more room to breathe. When the
memory limit is reached, the container will be killed with an /Out Of Memory/
error.
In the *command* section, I specify the /external url/ for Prometheus to
redirect me correctly. This is what Prometheus thinks its own hostname is. I
also specify the configuration file, previously written, which I mount as
/read-only/ in the *volumes* section.
Finally, we need to publish the container's port =9090= on our host's port =80=,
if possible, to access *Prometheus*. Otherwise, figure out a way to route it
properly. This is a local installation, which is suggested by the Prometheus
/hostname/.
If you've made it this far, you should be able to run this with no issues.
#+begin_src bash
docker-compose up -d
#+end_src
**** Prometheus Rules
*Prometheus* supports *two* types of rules: recording and alerting. Let's expand
a little bit on those two concepts.
***** Recording Rules
First, let's start off with [[https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/][recording rules]]. I don't think I can explain them
better than the *Prometheus* documentation, which says:
#+begin_quote
Recording rules allow you to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
Querying the precomputed result will then often be much faster than executing
the original expression every time it is needed. This is especially useful for
dashboards, which need to query the same expression repeatedly every time they
refresh.
#+end_quote
Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
create recording rules yet for my setup, so I'll skip this step.
***** Alerting Rules
As the name suggests, [[https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules][alerting rules]] allow you to define conditional expressions
based on metrics which will trigger notifications to alert you.
This is a very simple example of an /alert rule/ that monitors all the endpoints
scraped by /Prometheus/ to see if any of them is down. If this expression returns
a result for longer than the =for= duration, an alert will fire from /Prometheus/.
#+begin_src yaml
groups:
  - name: Instance down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
#+end_src
To be able to add this alert to *Prometheus*, we need to save it in a
=rules.yml= file and then include it in the *Prometheus* configuration as follows.
#+NAME: prometheus-rule-files-config
#+begin_src yaml
rule_files:
- "rules.yml"
#+end_src
The configuration, in its entirety, then looks as follows.
#+begin_src yaml :noweb yes
<<prometheus-rule-files-config>>
<<prometheus-scraping-config>>
<<prometheus-example-scraping-config>>
#+end_src
At this point, make sure everything is mounted into the container properly and
restart *Prometheus*.
**** Prometheus UI
Congratulations if you've made it this far. If you visit http://localhost/ at
this stage, you should get to Prometheus where you can query your metrics.
#+caption: Prometheus overview
#+attr_html: :target _blank
[[file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png][file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png]]
You can get all sorts of information under the /status/ drop-down menu.
#+caption: Prometheus Status drop-down menu
#+attr_html: :target _blank
[[file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png][file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png]]
**** Conclusion
As you can see, deploying *Prometheus* is not too hard. If you're running
/Kubernetes/, make sure you use the operator. It will make your life a lot
easier in all sorts of ways.
Take your time to familiarise yourself with *Prometheus* and consult the
documentation as much as possible. It is well written and in most cases your
best friend. Figure out different ways to create rules for recording and
alerting. Most people at this stage deploy *Grafana* to start visualizing their
metrics. Well... Not in this blog post we ain't!
I hope you enjoy playing around with *Prometheus*. Until the next post!
** Nikola :@nikola:
*** DONE Welcome back to the old world :blog:org_mode:emacs:rst:
:PROPERTIES:


@ -0,0 +1,264 @@
+++
title = "Upgrade your monitoring setup with Prometheus"
author = ["Elia el Lazkani"]
date = 2021-09-17
lastmod = 2021-09-17
tags = ["prometheus", "metrics", "container"]
categories = ["monitoring"]
draft = true
+++
After running simple monitoring for quite a while, I decided to upgrade my
setup. It is about time to get some real metric gathering to see what's going
on. It's also time to get some proper monitoring setup.
There are a lot of options in this field and I should, probably, write a blog
post on my views on the topic. For this experiment, on the other hand, the
solution is already pre-chosen. We'll be running Prometheus.
<!--more-->
## Prometheus {#prometheus}
To answer the question, _what is Prometheus?_, we'll rip a page out of the
Prometheus [docs](https://prometheus.io/docs/introduction/overview/).
> Prometheus is an open-source systems monitoring and alerting toolkit originally
> built at SoundCloud. Since its inception in 2012, many companies and
> organizations have adopted Prometheus, and the project has a very active
> developer and user community. It is now a standalone open source project and
> maintained independently of any company. To emphasize this, and to clarify the
> project's governance structure, Prometheus joined the Cloud Native Computing
> Foundation in 2016 as the second hosted project, after Kubernetes.
>
> Prometheus collects and stores its metrics as time series data, i.e. metrics
> information is stored with the timestamp at which it was recorded, alongside
> optional key-value pairs called labels.
Let's decipher all this jargon into plain English. In simple terms,
Prometheus is a system that scrapes metrics from your services and applications
and stores them in a time series database, ready to be served back when
queried.
Prometheus also offers a way to create rules on those metrics to alert you when
something goes wrong. Combined with [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/), you've got yourself a full
monitoring system.
## Configuration {#configuration}
Now that we briefly touched on a _few_ features of **Prometheus** and before we
can deploy, we need to write our configuration.
This is an example of a bare configuration.
<a id="code-snippet--prometheus-scraping-config"></a>
```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets:
          - prometheus:9090
```
This will make Prometheus scrape itself every 30 seconds for metrics. At least
you get _some_ metrics to query later. If you want the full experience, I would
suggest you enable _Prometheus metrics_ for your services. Consult the docs of
the project to see if and how it can expose metrics for _Prometheus_ to scrape,
then add the scrape endpoint to your configuration as shown above.
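As a rough sketch of what that could look like, assume a hypothetical service
reachable as `my-app` on port `8080` that exposes its metrics on the default
`/metrics` path. Neither the name nor the port comes from my setup, they are
placeholders. The extra job might then look something like this.

```yaml
scrape_configs:
  # Hypothetical job: `my-app:8080` is a placeholder, not a real service.
  - job_name: my-app
    scrape_interval: 30s
    # /metrics is already the Prometheus default; it is spelled out here
    # only to show where a different path would go.
    metrics_path: /metrics
    static_configs:
      - targets:
          - my-app:8080
```

If the project serves its metrics somewhere else, `metrics_path` is the knob to
change.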
Here's an example for a couple more _well-known_ projects: [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/) and
[_node exporter_](https://github.com/prometheus/node%5Fexporter).
<a id="code-snippet--prometheus-example-scraping-config"></a>
```yaml
  - job_name: alertmanager
    scrape_interval: 30s
    static_configs:
      - targets:
          - alertmanager:9093
  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
          - node-exporter:9100
```
A wider [list of exporters](https://prometheus.io/docs/instrumenting/exporters/) can be found in the Prometheus docs.
## Deployment {#deployment}
Now that we've got ourselves a configuration, let's deploy **Prometheus**.
Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
using `docker-compose` in this example to make it easier to translate later to
other types of deployments.
<div class="admonition note">
<p class="admonition-title">Note</p>
I'm still running the `2.x` version of the Compose file format. I know I need
to upgrade to a newer version, but that requires a bit of networking work. It's
a work in progress.
</div>
The `docker-compose` file should look like the following.
```yaml
---
version: '2.3'
services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
    ports:
      - "80:9090"
```
A few things to **note**, especially for the new container crowd. The container
image **version** is explicitly specified; do **not** use `latest` in production.
To make sure I don't overload my host, I set memory limits. I don't mind if it
goes down; this is a PoC (Proof of Concept) for the time being. In your case,
you might want to choose higher limits to give it more room to breathe. When the
memory limit is reached, the container will be killed with an _Out Of Memory_
error.
In the **command** section, I specify the _external url_ for Prometheus to
redirect me correctly. This is what Prometheus thinks its own hostname is. I
also specify the configuration file, previously written, which I mount as
_read-only_ in the **volumes** section.
Finally, we need to publish the container's port `9090` on our host's port `80`,
if possible, to access **Prometheus**. Otherwise, figure out a way to route it
properly. This is a local installation, which is suggested by the Prometheus
_hostname_.
If you've made it this far, you should be able to run this with no issues.
```bash
docker-compose up -d
```
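One thing worth keeping in mind: the `alertmanager:9093` and
`node-exporter:9100` targets from the configuration only resolve if containers
with those names are reachable from the Prometheus container. A minimal sketch,
assuming you add the exporter to the same `docker-compose` file (so it shares
the default network) and assuming the image tag below, which is my pick for
illustration and not something from my setup, could look like this.

```yaml
services:
  # Sketch only: the tag is an assumption, pin whichever release you tested.
  node-exporter:
    image: quay.io/prometheus/node-exporter:v1.2.2
    container_name: node-exporter
    restart: unless-stopped
    # No published ports: Prometheus reaches it over the Compose network
    # as node-exporter:9100.
```

Run this way, node exporter mostly sees the container's own view of the system.
Its README describes the extra mounts and flags needed to report on the host.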
## Prometheus Rules {#prometheus-rules}
**Prometheus** supports **two** types of rules: recording and alerting. Let's expand
a little bit on those two concepts.
### Recording Rules {#recording-rules}
First, let's start off with [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording%5Frules/). I don't think I can explain them
better than the **Prometheus** documentation, which says:
> Recording rules allow you to precompute frequently needed or computationally
> expensive expressions and save their result as a new set of time series.
> Querying the precomputed result will then often be much faster than executing
> the original expression every time it is needed. This is especially useful for
> dashboards, which need to query the same expression repeatedly every time they
> refresh.
Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
create recording rules yet for my setup, so I'll skip this step.
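For the curious, here is a minimal sketch of what one could look like. I am not
running this rule; the group name and the rule name `job:up:avg` are made up for
illustration, loosely following the `level:metric:operations` naming convention
from the docs.

```yaml
groups:
  - name: example-recording-rules
    rules:
      # Hypothetical rule: stores, per job, the fraction of targets
      # that are currently up as a new time series.
      - record: job:up:avg
        expr: avg by (job) (up)
```

Once loaded, the precomputed series can be queried by its name, `job:up:avg`,
like any other metric.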
### Alerting Rules {#alerting-rules}
As the name suggests, [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting%5Frules/#alerting-rules) allow you to define conditional expressions
based on metrics which will trigger notifications to alert you.
This is a very simple example of an _alert rule_ that monitors all the endpoints
scraped by _Prometheus_ to see if any of them is down. If this expression returns
a result for longer than the `for` duration, an alert will fire from _Prometheus_.
```yaml
groups:
  - name: Instance down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
```
To be able to add this alert to **Prometheus**, we need to save it in a
`rules.yml` file and then include it in the **Prometheus** configuration as follows.
<a id="code-snippet--prometheus-rule-files-config"></a>
```yaml
rule_files:
- "rules.yml"
```
The configuration, in its entirety, then looks as follows.
```yaml
rule_files:
- "rules.yml"
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets:
          - prometheus:9090
  - job_name: alertmanager
    scrape_interval: 30s
    static_configs:
      - targets:
          - alertmanager:9093
  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
          - node-exporter:9100
```
At this point, make sure everything is mounted into the container properly and
restart **Prometheus**.
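Keep in mind that a firing alert only turns into a notification once Prometheus
knows where to send it. A minimal sketch, assuming an Alertmanager instance
reachable as `alertmanager:9093` (the same address used in the scrape
configuration, not something I have shown deployed here), would add an
`alerting` section to `prometheus.yml`.

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Assumed address; adjust to wherever your Alertmanager runs.
            - alertmanager:9093
```

Routing and receivers (email, chat and so on) are then configured on the
Alertmanager side.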
## Prometheus UI {#prometheus-ui}
Congratulations if you've made it this far. If you visit <http://localhost/> at
this stage, you should get to Prometheus where you can query your metrics.
{{< figure src="/ox-hugo/01-prometheus-overview.png" caption="Figure 1: Prometheus overview" target="_blank" link="/ox-hugo/01-prometheus-overview.png" >}}
You can get all sorts of information under the _status_ drop-down menu.
{{< figure src="/ox-hugo/02-prometheus-status-drop-down-menu.png" caption="Figure 2: Prometheus Status drop-down menu" target="_blank" link="/ox-hugo/02-prometheus-status-drop-down-menu.png" >}}
## Conclusion {#conclusion}
As you can see, deploying **Prometheus** is not too hard. If you're running
_Kubernetes_, make sure you use the operator. It will make your life a lot
easier in all sorts of ways.
Take your time to familiarise yourself with **Prometheus** and consult the
documentation as much as possible. It is well written and in most cases your
best friend. Figure out different ways to create rules for recording and
alerting. Most people at this stage deploy **Grafana** to start visualizing their
metrics. Well... Not in this blog post we ain't!
I hope you enjoy playing around with **Prometheus**. Until the next post!

BIN
static/ox-hugo/01-prometheus-overview.png (Stored with Git LFS) Normal file

Binary file not shown.

BIN
static/ox-hugo/02-prometheus-status-drop-down-menu.png (Stored with Git LFS) Normal file

Binary file not shown.