chore(): New blog post talking about running Prometheus
parent
648843bca2
commit
68761158b6
6 changed files with 516 additions and 0 deletions
@ -5041,6 +5041,246 @@ If you want to go that far, maybe you should invest in a monitoring system with
**** Conclusion
Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
With a little scripting, a couple of commands and the power of cron, we were able to make /healthchecks/ monitor our websites.
*** TODO Upgrade your monitoring setup with Prometheus :prometheus:metrics:container:
:PROPERTIES:
:EXPORT_HUGO_LASTMOD: 2021-09-17
:EXPORT_DATE: 2021-09-17
:EXPORT_FILE_NAME: upgrade-your-monitoring-setup-with-prometheus
:CUSTOM_ID: upgrade-your-monitoring-setup-with-prometheus
:END:

After running simple monitoring for quite a while, I decided to upgrade my
setup. It is about time to get some real metric gathering to see what's going
on. It's also time for a proper monitoring setup.

There are a lot of options in this field and I should probably write a blog
post on my views on the topic. For this experiment, on the other hand, the
solution is already pre-chosen. We'll be running Prometheus.

#+hugo: more

**** Prometheus
To answer the question, /what is Prometheus?/, we'll rip a page out of the
Prometheus [[https://prometheus.io/docs/introduction/overview/][docs]].

#+begin_quote
Prometheus is an open-source systems monitoring and alerting toolkit originally
built at SoundCloud. Since its inception in 2012, many companies and
organizations have adopted Prometheus, and the project has a very active
developer and user community. It is now a standalone open source project and
maintained independently of any company. To emphasize this, and to clarify the
project's governance structure, Prometheus joined the Cloud Native Computing
Foundation in 2016 as the second hosted project, after Kubernetes.

Prometheus collects and stores its metrics as time series data, i.e. metrics
information is stored with the timestamp at which it was recorded, alongside
optional key-value pairs called labels.
#+end_quote

Let's decipher all this jargon into plain English. In simple terms,
Prometheus is a system that scrapes metrics from your services and applications
and stores those metrics in a time series database, ready to serve them back
when queried.

Prometheus also offers a way to create rules on those metrics to alert you when
something goes wrong. Combined with [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]], you've got yourself a full
monitoring system.

**** Configuration
Now that we have briefly touched on a /few/ features of *Prometheus*, and before
we can deploy, we need to write our configuration.

This is an example of a bare-bones configuration.

#+NAME: prometheus-scraping-config
#+begin_src yaml
scrape_configs:
- job_name: prometheus
  scrape_interval: 30s
  static_configs:
  - targets:
    - prometheus:9090
#+end_src

This will make Prometheus scrape itself every 30 seconds for metrics. That way
you at least get /some/ metrics to query later. If you want the full experience,
I would suggest you enable /Prometheus metrics/ for your services. Consult the
docs of the project to see if and how it can expose metrics for /Prometheus/ to
scrape, then add the scrape endpoint to your configuration as shown above.

Here's an example of a couple more /well-known/ projects: [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]] and
[[https://github.com/prometheus/node_exporter][/node exporter/]].

#+NAME: prometheus-example-scraping-config
#+begin_src yaml
- job_name: alertmanager
  scrape_interval: 30s
  static_configs:
  - targets:
    - alertmanager:9093

- job_name: node-exporter
  scrape_interval: 30s
  static_configs:
  - targets:
    - node-exporter:9100
#+end_src

A wider [[https://prometheus.io/docs/instrumenting/exporters/][list of exporters]] can be found in the Prometheus docs.

**** Deployment
Now that we've got ourselves a configuration, let's deploy *Prometheus*.

Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
using =docker-compose= in this example to make it easier to translate later to
other types of deployments.

#+BEGIN_EXPORT html
<div class="admonition note">
<p class="admonition-title">Note</p>
#+END_EXPORT
I'm still running on version =2.x= of the =docker-compose= file format. I know
I need to upgrade to a newer version, but that's a bit of networking work. It's
a work in progress.
#+BEGIN_EXPORT html
</div>
#+END_EXPORT

The =docker-compose= file should look like the following.

#+begin_src yaml
---
version: '2.3'

services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
    ports:
      - "80:9090"
#+end_src

A few things to *note*, especially for those new to containers. The container
image *version* is explicitly specified; do *not* use =latest= in production.

To make sure I don't overload my host, I set memory limits. I don't mind if it
goes down; this is a PoC (Proof of Concept) for the time being. In your case,
you might want to choose higher limits to give it more room to breathe. When the
memory limit is reached, the container will be killed with an /Out Of Memory/
error.

In the *command* section, I specify the /external URL/ so that Prometheus
redirects me correctly; this is the URL Prometheus believes it is reachable at.
I also specify the configuration file, written previously, which I mount as
/read-only/ in the *volumes* section.

Finally, we need to map port =80= on the host to the container's port =9090=,
if possible, to access *Prometheus*. Otherwise, figure out a way to route to it
properly. This is a local installation, as suggested by the Prometheus
/hostname/.

If you've made it this far, you should be able to run this with no issues.

#+begin_src bash
docker-compose up -d
#+end_src

**** Prometheus Rules
*Prometheus* supports *two* types of rules: recording and alerting. Let's
expand a little on those two concepts.

***** Recording Rules
First, let's start off with [[https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/][recording rules]]. I don't think I can explain them
better than the *Prometheus* documentation, which says:

#+begin_quote
Recording rules allow you to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
Querying the precomputed result will then often be much faster than executing
the original expression every time it is needed. This is especially useful for
dashboards, which need to query the same expression repeatedly every time they
refresh.
#+end_quote

Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
create recording rules for my setup yet, so I'll skip this step.

***** Alerting Rules
As the name suggests, [[https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules][alerting rules]] allow you to define conditional expressions
based on metrics that will trigger notifications to alert you.

This is a very simple example of an /alert rule/ that monitors all the endpoints
scraped by /Prometheus/ to see if any of them is down. If this expression keeps
returning a result for 5 minutes (the =for= clause), an alert will fire from
/Prometheus/.

#+begin_src yaml
groups:
- name: Instance down
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
#+end_src

To add this alert to *Prometheus*, we need to save it in a =rules.yml= file and
then include it in the *Prometheus* configuration as follows.

#+NAME: prometheus-rule-files-config
#+begin_src yaml
rule_files:
- "rules.yml"
#+end_src

The entire configuration then looks as follows.

#+begin_src yaml :noweb yes
<<prometheus-rule-files-config>>

<<prometheus-scraping-config>>

<<prometheus-example-scraping-config>>
#+end_src

At this point, make sure everything is mounted into the container properly and
restart your *Prometheus*. Relative paths under =rule_files= are resolved
relative to the configuration file. With the =docker-compose= file above, that
means =rules.yml= can sit next to =prometheus.yml= in =./prometheus/=.

**** Prometheus UI
Congratulations if you've made it this far. If you visit http://localhost/ at
this stage, you should get to Prometheus, where you can query your metrics.

#+caption: Prometheus overview
#+attr_html: :target _blank
[[file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png][file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png]]

You can get all sorts of information under the /Status/ drop-down menu.

#+caption: Prometheus Status drop-down menu
#+attr_html: :target _blank
[[file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png][file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png]]

**** Conclusion
As you can see, deploying *Prometheus* is not too hard. If you're running
/Kubernetes/, make sure you use the operator. It will make your life a lot
easier in all sorts of ways.

Take your time to familiarise yourself with *Prometheus* and consult the
documentation as much as possible. It is well written and, in most cases, your
best friend. Figure out different ways to create rules for recording and
alerting. Most people at this stage deploy *Grafana* to start visualizing their
metrics. Well... not in this blog post we ain't!

I hope you enjoy playing around with *Prometheus*. Until the next post!
** Nikola :@nikola:
*** DONE Welcome back to the old world :blog:org_mode:emacs:rst:
:PROPERTIES:
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
(Stored with Git LFS)
Normal file
Binary file not shown.
264
content/posts/upgrade-your-monitoring-setup-with-prometheus.md
Normal file
@ -0,0 +1,264 @@
+++
title = "Upgrade your monitoring setup with Prometheus"
author = ["Elia el Lazkani"]
date = 2021-09-17
lastmod = 2021-09-17
tags = ["prometheus", "metrics", "container"]
categories = ["monitoring"]
draft = true
+++

After running simple monitoring for quite a while, I decided to upgrade my
setup. It is about time to get some real metric gathering to see what's going
on. It's also time for a proper monitoring setup.

There are a lot of options in this field and I should probably write a blog
post on my views on the topic. For this experiment, on the other hand, the
solution is already pre-chosen. We'll be running Prometheus.

<!--more-->

## Prometheus {#prometheus}

To answer the question, _what is Prometheus?_, we'll rip a page out of the
Prometheus [docs](https://prometheus.io/docs/introduction/overview/).

> Prometheus is an open-source systems monitoring and alerting toolkit originally
> built at SoundCloud. Since its inception in 2012, many companies and
> organizations have adopted Prometheus, and the project has a very active
> developer and user community. It is now a standalone open source project and
> maintained independently of any company. To emphasize this, and to clarify the
> project's governance structure, Prometheus joined the Cloud Native Computing
> Foundation in 2016 as the second hosted project, after Kubernetes.
>
> Prometheus collects and stores its metrics as time series data, i.e. metrics
> information is stored with the timestamp at which it was recorded, alongside
> optional key-value pairs called labels.

Let's decipher all this jargon into plain English. In simple terms,
Prometheus is a system that scrapes metrics from your services and applications
and stores those metrics in a time series database, ready to serve them back
when queried.

Prometheus also offers a way to create rules on those metrics to alert you when
something goes wrong. Combined with [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/), you've got yourself a full
monitoring system.


## Configuration {#configuration}

Now that we have briefly touched on a _few_ features of **Prometheus**, and before
we can deploy, we need to write our configuration.

This is an example of a bare-bones configuration.

<a id="code-snippet--prometheus-scraping-config"></a>
```yaml
scrape_configs:
- job_name: prometheus
  scrape_interval: 30s
  static_configs:
  - targets:
    - prometheus:9090
```

This will make Prometheus scrape itself every 30 seconds for metrics. That way
you at least get _some_ metrics to query later. If you want the full experience,
I would suggest you enable _Prometheus metrics_ for your services. Consult the
docs of the project to see if and how it can expose metrics for _Prometheus_ to
scrape, then add the scrape endpoint to your configuration as shown above.

Here's an example of a couple more _well-known_ projects: [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/) and
[_node exporter_](https://github.com/prometheus/node%5Fexporter).

<a id="code-snippet--prometheus-example-scraping-config"></a>
```yaml
- job_name: alertmanager
  scrape_interval: 30s
  static_configs:
  - targets:
    - alertmanager:9093

- job_name: node-exporter
  scrape_interval: 30s
  static_configs:
  - targets:
    - node-exporter:9100
```

A wider [list of exporters](https://prometheus.io/docs/instrumenting/exporters/) can be found in the Prometheus docs.

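
All three jobs above repeat `scrape_interval: 30s`. A possible variation, sketched below and not the configuration I actually run, is to set the interval once in a `global` block. Every job inherits it unless the job overrides it. The `evaluation_interval` line is my own addition. It controls how often recording and alerting rules are evaluated, and both settings default to one minute when left unset.

```yaml
# Hypothetical variant: shared defaults set once in the "global" block.
global:
  scrape_interval: 30s      # default scrape interval for every job below
  evaluation_interval: 30s  # how often rule files are evaluated

scrape_configs:
- job_name: prometheus      # inherits the 30s interval from "global"
  static_configs:
  - targets:
    - prometheus:9090

- job_name: alertmanager
  static_configs:
  - targets:
    - alertmanager:9093

- job_name: node-exporter
  static_configs:
  - targets:
    - node-exporter:9100
```

A job that needs a different cadence can still set its own `scrape_interval`, which takes precedence over the global value.
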
## Deployment {#deployment}

Now that we've got ourselves a configuration, let's deploy **Prometheus**.

Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
using `docker-compose` in this example to make it easier to translate later to
other types of deployments.

<div class="admonition note">
<p class="admonition-title">Note</p>

I'm still running on version `2.x` of the `docker-compose` file format. I know
I need to upgrade to a newer version, but that's a bit of networking work. It's
a work in progress.

</div>

The `docker-compose` file should look like the following.

```yaml
---
version: '2.3'

services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
    ports:
      - "80:9090"
```

A few things to **note**, especially for those new to containers. The container
image **version** is explicitly specified; do **not** use `latest` in production.

To make sure I don't overload my host, I set memory limits. I don't mind if it
goes down; this is a PoC (Proof of Concept) for the time being. In your case,
you might want to choose higher limits to give it more room to breathe. When the
memory limit is reached, the container will be killed with an _Out Of Memory_
error.

In the **command** section, I specify the _external URL_ so that Prometheus
redirects me correctly; this is the URL Prometheus believes it is reachable at.
I also specify the configuration file, written previously, which I mount as
_read-only_ in the **volumes** section.

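
The compose file above doesn't give Prometheus a durable place for its time series data. Below is a sketch of one way to add one, using a named volume I've called `prometheus-data` (a made-up name). As far as I know, the official image keeps its data under `/prometheus` by default. However, the `command` key replaces the image's default arguments, so this sketch passes `--storage.tsdb.path` explicitly to be safe.

```yaml
---
version: '2.3'

services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
      # Explicit TSDB location, since "command" replaces the image defaults.
      - --storage.tsdb.path=/prometheus
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
      # Hypothetical named volume holding the time series data.
      - "prometheus-data:/prometheus"
    ports:
      - "80:9090"

# Named volumes are declared at the top level of the compose file.
volumes:
  prometheus-data: {}
```
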
Finally, we need to map port `80` on the host to the container's port `9090`,
if possible, to access **Prometheus**. Otherwise, figure out a way to route to it
properly. This is a local installation, as suggested by the Prometheus _hostname_.

If you've made it this far, you should be able to run this with no issues.

```bash
docker-compose up -d
```

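
The scrape configuration earlier expects a target at `node-exporter:9100`. Below is a minimal sketch of a service that could satisfy it, to be added under `services:` in the same compose file. The image tag is an assumption, so pin whichever release you actually want. The node exporter README recommends running it with host networking and the host PID namespace. I've left those out here so that the `node-exporter` name resolves on the compose network. The trade-off is that some metrics, network ones in particular, will describe the container rather than the host.

```yaml
services:
  # Hypothetical service; its name matches the "node-exporter:9100" scrape target.
  node-exporter:
    image: quay.io/prometheus/node-exporter:v1.2.2  # assumed tag, pin your own
    container_name: node-exporter
    restart: unless-stopped
    command:
      # Look for host filesystems under the host root mounted below.
      - --path.rootfs=/host
    volumes:
      - "/:/host:ro,rslave"
    # No "ports" section: Prometheus reaches port 9100 over the compose network.
```
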
## Prometheus Rules {#prometheus-rules}

**Prometheus** supports **two** types of rules: recording and alerting. Let's
expand a little on those two concepts.


### Recording Rules {#recording-rules}

First, let's start off with [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording%5Frules/). I don't think I can explain them
better than the **Prometheus** documentation, which says:

> Recording rules allow you to precompute frequently needed or computationally
> expensive expressions and save their result as a new set of time series.
> Querying the precomputed result will then often be much faster than executing
> the original expression every time it is needed. This is especially useful for
> dashboards, which need to query the same expression repeatedly every time they
> refresh.

Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
create recording rules for my setup yet, so I'll skip this step.

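
For reference, a recording rule lives in the same kind of `groups` file as the alerting rule shown further down. The sketch below uses a made-up group name and a made-up series name. It precomputes the share of healthy targets per job from the built-in `up` metric, and it follows the `level:metric:operations` naming convention suggested in the Prometheus docs.

```yaml
groups:
- name: example-recording-rules  # hypothetical group name
  rules:
  # Average of "up" per job (1 means every target of that job is up),
  # saved as the new series job:up:avg at each rule evaluation.
  - record: job:up:avg
    expr: avg by (job) (up)
```

Querying `job:up:avg` would then return one precomputed series per job instead of re-aggregating `up` on every request.
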
### Alerting Rules {#alerting-rules}

As the name suggests, [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting%5Frules/#alerting-rules) allow you to define conditional expressions
based on metrics that will trigger notifications to alert you.

This is a very simple example of an _alert rule_ that monitors all the endpoints
scraped by _Prometheus_ to see if any of them is down. If this expression keeps
returning a result for 5 minutes (the `for` clause), an alert will fire from
_Prometheus_.

```yaml
groups:
- name: Instance down
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
```

To add this alert to **Prometheus**, we need to save it in a `rules.yml` file
and then include it in the **Prometheus** configuration as follows.

<a id="code-snippet--prometheus-rule-files-config"></a>
```yaml
rule_files:
- "rules.yml"
```

The entire configuration then looks as follows.

```yaml
rule_files:
- "rules.yml"

scrape_configs:
- job_name: prometheus
  scrape_interval: 30s
  static_configs:
  - targets:
    - prometheus:9090

- job_name: alertmanager
  scrape_interval: 30s
  static_configs:
  - targets:
    - alertmanager:9093

- job_name: node-exporter
  scrape_interval: 30s
  static_configs:
  - targets:
    - node-exporter:9100
```

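
With only `rule_files` configured, Prometheus evaluates the rule and lists it on the _Alerts_ page of its UI, but it has nowhere to send notifications. Below is a sketch of the extra top-level block that points it at Alertmanager. It assumes Alertmanager is reachable as `alertmanager:9093`, the same address used as a scrape target above.

```yaml
# Top-level block in prometheus.yml, next to rule_files and scrape_configs.
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093
```

Deciding where the alert ends up (email, chat and so on) is then configured on the Alertmanager side.
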
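At this point, make sure everything is mounted into the container properly and
restart your **Prometheus**. Relative paths under `rule_files` are resolved
relative to the configuration file. With the `docker-compose` file above, that
means `rules.yml` can sit next to `prometheus.yml` in `./prometheus/`.
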
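
Restarting the container works, but Prometheus can also reload its configuration and rule files in place. One way to allow that is the `--web.enable-lifecycle` flag, sketched below as a change to the `command` section of the compose file. The flag exposes a reload endpoint over HTTP. It also enables a shutdown endpoint, so it is best kept to setups where the web interface isn't reachable by untrusted users.

```yaml
services:
  prometheus:
    # Other keys unchanged from the compose file above.
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
      # Enables POST /-/reload (and /-/quit) on the web interface.
      - --web.enable-lifecycle
```

After editing `prometheus.yml` or `rules.yml`, sending a `POST` to `http://localhost/-/reload` (for example with `curl -X POST`) should trigger the reload. Without the flag, sending `SIGHUP` to the process, e.g. with `docker kill --signal=HUP prometheus`, has the same effect.
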
## Prometheus UI {#prometheus-ui}

Congratulations if you've made it this far. If you visit <http://localhost/> at
this stage, you should get to Prometheus, where you can query your metrics.

{{< figure src="/ox-hugo/01-prometheus-overview.png" caption="Figure 1: Prometheus overview" target="_blank" link="/ox-hugo/01-prometheus-overview.png" >}}

You can get all sorts of information under the _Status_ drop-down menu.

{{< figure src="/ox-hugo/02-prometheus-status-drop-down-menu.png" caption="Figure 2: Prometheus Status drop-down menu" target="_blank" link="/ox-hugo/02-prometheus-status-drop-down-menu.png" >}}


## Conclusion {#conclusion}

As you can see, deploying **Prometheus** is not too hard. If you're running
_Kubernetes_, make sure you use the operator. It will make your life a lot
easier in all sorts of ways.

Take your time to familiarise yourself with **Prometheus** and consult the
documentation as much as possible. It is well written and, in most cases, your
best friend. Figure out different ways to create rules for recording and
alerting. Most people at this stage deploy **Grafana** to start visualizing their
metrics. Well... not in this blog post we ain't!

I hope you enjoy playing around with **Prometheus**. Until the next post!
BIN
static/ox-hugo/01-prometheus-overview.png
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
static/ox-hugo/02-prometheus-status-drop-down-menu.png
(Stored with Git LFS)
Normal file
Binary file not shown.