chore(): New blog post talking about running Prometheus
parent
648843bca2
commit
68761158b6
6 changed files with 516 additions and 0 deletions
|
@ -5041,6 +5041,246 @@ If you want to go that far, maybe you should invest in a monitoring system with
|
||||||
**** Conclusion
Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
With a little scripting, a couple of commands and the power of cron, we were able to make /healthchecks/ monitor our websites.
|
*** TODO Upgrade your monitoring setup with Prometheus :prometheus:metrics:container:
|
||||||
|
:PROPERTIES:
|
||||||
|
:EXPORT_HUGO_LASTMOD: 2021-09-17
|
||||||
|
:EXPORT_DATE: 2021-09-17
|
||||||
|
:EXPORT_FILE_NAME: upgrade-your-monitoring-setup-with-prometheus
|
||||||
|
:CUSTOM_ID: upgrade-your-monitoring-setup-with-prometheus
|
||||||
|
:END:
|
||||||
|
|
||||||
|
After running simple monitoring for quite a while, I decided to upgrade my
|
||||||
|
setup. It is about time to get some real metric gathering to see what's going
|
||||||
|
on. It's also time to get some proper monitoring setup.
|
||||||
|
|
||||||
|
There are a lot of options in this field and I should, probably, write a blog
|
||||||
|
post on my views on the topic. For this experiment, on the other hand, the
|
||||||
|
solution is already pre-chosen. We'll be running Prometheus.
|
||||||
|
|
||||||
|
#+hugo: more
|
||||||
|
|
||||||
|
**** Prometheus
|
||||||
|
To answer the question, /what is Prometheus?/, we'll rip a page out of the
|
||||||
|
Prometheus [[https://prometheus.io/docs/introduction/overview/][docs]].
|
||||||
|
|
||||||
|
#+begin_quote
|
||||||
|
Prometheus is an open-source systems monitoring and alerting toolkit originally
|
||||||
|
built at SoundCloud. Since its inception in 2012, many companies and
|
||||||
|
organizations have adopted Prometheus, and the project has a very active
|
||||||
|
developer and user community. It is now a standalone open source project and
|
||||||
|
maintained independently of any company. To emphasize this, and to clarify the
|
||||||
|
project's governance structure, Prometheus joined the Cloud Native Computing
|
||||||
|
Foundation in 2016 as the second hosted project, after Kubernetes.
|
||||||
|
|
||||||
|
Prometheus collects and stores its metrics as time series data, i.e. metrics
|
||||||
|
information is stored with the timestamp at which it was recorded, alongside
|
||||||
|
optional key-value pairs called labels.
|
||||||
|
#+end_quote
|
||||||
|
|
||||||
|
Let's decipher all this jargon into plain English. In simple terms,
Prometheus is a system that scrapes metrics from your services and applications
and stores those metrics in a time series database, ready to be served back
when queried.
|
||||||
|
|
||||||
|
Prometheus also offers a way to create rules on those metrics to alert you when
something goes wrong. Combined with [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]], you've got yourself a full
monitoring system.
|
||||||
|
|
||||||
|
**** Configuration
|
||||||
|
Now that we've briefly touched on a /few/ features of *Prometheus*, we need to
write our configuration before we can deploy.
|
||||||
|
|
||||||
|
This is an example of a bare configuration.
|
||||||
|
|
||||||
|
#+NAME: prometheus-scraping-config
#+begin_src yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets:
          - prometheus:9090
#+end_src
|
||||||
|
|
||||||
|
This will make Prometheus scrape itself every 30 seconds for metrics. At least
|
||||||
|
you get /some/ metrics to query later. If you want the full experience, I would
|
||||||
|
suggest you enable /Prometheus metrics/ for your services. Consult the docs of
|
||||||
|
the project to see if and how it can expose metrics for /Prometheus/ to scrape,
|
||||||
|
then add the scrape endpoint to your configuration as shown above.
|
||||||
|
|
||||||
|
Here's an example of a couple more /well known/ projects: [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]] and
[[https://github.com/prometheus/node_exporter][/node exporter/]].
|
||||||
|
|
||||||
|
#+NAME: prometheus-example-scraping-config
#+begin_src yaml
  - job_name: alertmanager
    scrape_interval: 30s
    static_configs:
      - targets:
          - alertmanager:9093

  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
          - node-exporter:9100
#+end_src
|
||||||
|
|
||||||
|
A wider [[https://prometheus.io/docs/instrumenting/exporters/][list of exporters]] can be found on the Prometheus docs.
|
||||||
|
|
||||||
|
**** Deployment
|
||||||
|
Now that we've got ourselves a configuration, let's deploy *Prometheus*.
|
||||||
|
|
||||||
|
Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
|
||||||
|
using =docker-compose= in this example to make it easier to translate later to
|
||||||
|
other types of deployments.
|
||||||
|
|
||||||
|
#+BEGIN_EXPORT html
|
||||||
|
<div class="admonition note">
|
||||||
|
<p class="admonition-title">Note</p>
|
||||||
|
#+END_EXPORT
|
||||||
|
I'm still running on the =2.x= version of the Compose file format. I know I need
to upgrade to a newer version but that's a bit of networking work. It's a work
in progress.
|
||||||
|
#+BEGIN_EXPORT html
|
||||||
|
</div>
|
||||||
|
#+END_EXPORT
|
||||||
|
|
||||||
|
The =docker-compose= file should look like the following.
|
||||||
|
|
||||||
|
#+begin_src yaml
---
version: '2.3'

services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
    ports:
      - "80:9090"
#+end_src
|
||||||
|
|
||||||
|
A few things to *note*, especially for the new container crowd. The container
image *version* is explicitly specified; do *not* use =latest= in production.
|
||||||
|
|
||||||
|
To make sure I don't overload my host, I set memory limits. I don't mind if it
goes down; this is a PoC (Proof of Concept) for the time being. In your case,
you might want to choose higher limits to give it more room to breathe. When the
memory limit is reached, the container will be killed with an /Out Of Memory/
error.
|
||||||
|
|
||||||
|
In the *command* section, I specify the /external url/ for Prometheus to
|
||||||
|
redirect me correctly. This is what Prometheus thinks its own hostname is. I
|
||||||
|
also specify the configuration file, previously written, which I mount as
|
||||||
|
/read-only/ in the *volumes* section.
|
||||||
|
|
||||||
|
Finally, we need to port-forward =9090= to our host's =80=, if possible, to access
|
||||||
|
*Prometheus*. Otherwise, figure out a way to route it properly. This is a local
|
||||||
|
installation, which is suggested by the Prometheus /hostname/.
|
||||||
|
|
||||||
|
If you've made it this far, you should be able to run this with no issues.
|
||||||
|
|
||||||
|
#+begin_src bash
docker-compose up -d
#+end_src
|
||||||
|
|
||||||
|
**** Prometheus Rules
|
||||||
|
*Prometheus* supports *two* types of rules: recording and alerting. Let's expand
|
||||||
|
a little bit on those two concepts.
|
||||||
|
|
||||||
|
***** Recording Rules
|
||||||
|
First, let's start off with [[https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/][recording rules]]. I don't think I can explain it
|
||||||
|
better than the *Prometheus* documentation, which says:
|
||||||
|
|
||||||
|
#+begin_quote
|
||||||
|
Recording rules allow you to precompute frequently needed or computationally
|
||||||
|
expensive expressions and save their result as a new set of time series.
|
||||||
|
Querying the precomputed result will then often be much faster than executing
|
||||||
|
the original expression every time it is needed. This is especially useful for
|
||||||
|
dashboards, which need to query the same expression repeatedly every time they
|
||||||
|
refresh.
|
||||||
|
#+end_quote
|
||||||
|
|
||||||
|
Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
|
||||||
|
create recording rules yet for my setup so I'll forgo this step.
|
||||||
|
|
||||||
|
***** Alerting Rules
|
||||||
|
As the name suggests, [[https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules][alerting rules]] allow you to define conditional expressions
|
||||||
|
based on metrics which will trigger notifications to alert you.
|
||||||
|
|
||||||
|
This is a very simple example of an /alert rule/ that monitors all the endpoints
|
||||||
|
scraped by /Prometheus/ to see if any of them is down. If this expression returns
|
||||||
|
a result, an alert will fire from /Prometheus/.
|
||||||
|
|
||||||
|
#+begin_src yaml
groups:
  - name: Instance down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
#+end_src
|
||||||
|
|
||||||
|
To be able to add this alert to *Prometheus*, we need to save it in a
|
||||||
|
=rules.yml= file and then include it in the *Prometheus* configuration as follows.
|
||||||
|
|
||||||
|
#+NAME: prometheus-rule-files-config
#+begin_src yaml
rule_files:
  - "rules.yml"
#+end_src
|
||||||
|
|
||||||
|
This makes the configuration, in its entirety, look as follows.
|
||||||
|
|
||||||
|
#+begin_src yaml :noweb yes
<<prometheus-rule-files-config>>

<<prometheus-scraping-config>>

<<prometheus-example-scraping-config>>
#+end_src
|
||||||
|
|
||||||
|
At this point, make sure everything is mounted into the container properly and
|
||||||
|
rerun your *Prometheus*.
|
||||||
|
|
||||||
|
**** Prometheus UI
|
||||||
|
Congratulations if you've made it this far. If you visit http://localhost/ at
this stage, you should get to Prometheus, where you can query your metrics.
|
||||||
|
|
||||||
|
#+caption: Prometheus overview
|
||||||
|
#+attr_html: :target _blank
|
||||||
|
[[file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png][file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png]]
|
||||||
|
|
||||||
|
You can get all sorts of information under the /status/ drop-down menu.
|
||||||
|
|
||||||
|
#+caption: Prometheus Status drop-down menu
|
||||||
|
#+attr_html: :target _blank
|
||||||
|
[[file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png][file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png]]
|
||||||
|
|
||||||
|
**** Conclusion
|
||||||
|
As you can see, deploying *Prometheus* is not too hard. If you're running
|
||||||
|
/Kubernetes/, make sure you use the operator. It will make your life a lot
|
||||||
|
easier in all sorts of things.
|
||||||
|
|
||||||
|
Take your time to familiarise yourself with *Prometheus* and consult the
|
||||||
|
documentation as much as possible. It is well written and in most cases your
|
||||||
|
best friend. Figure out different ways to create rules for recording and
|
||||||
|
alerting. Most people at this stage deploy *Grafana* to start visualizing their
|
||||||
|
metrics. Well... Not in this blog post we ain't !
|
||||||
|
|
||||||
|
I hope you enjoy playing around with *Prometheus* and until the next post.
|
||||||
** Nikola :@nikola:
*** DONE Welcome back to the old world :blog:org_mode:emacs:rst:
:PROPERTIES:
|
|
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
(Stored with Git LFS)
Normal file
Binary file not shown.
264
content/posts/upgrade-your-monitoring-setup-with-prometheus.md
Normal file
|
@ -0,0 +1,264 @@
|
||||||
|
+++
|
||||||
|
title = "Upgrade your monitoring setup with Prometheus"
|
||||||
|
author = ["Elia el Lazkani"]
|
||||||
|
date = 2021-09-17
|
||||||
|
lastmod = 2021-09-17
|
||||||
|
tags = ["prometheus", "metrics", "container"]
|
||||||
|
categories = ["monitoring"]
|
||||||
|
draft = true
|
||||||
|
+++
|
||||||
|
|
||||||
|
After running simple monitoring for quite a while, I decided to upgrade my
|
||||||
|
setup. It is about time to get some real metric gathering to see what's going
|
||||||
|
on. It's also time to get some proper monitoring setup.
|
||||||
|
|
||||||
|
There are a lot of options in this field and I should, probably, write a blog
|
||||||
|
post on my views on the topic. For this experiment, on the other hand, the
|
||||||
|
solution is already pre-chosen. We'll be running Prometheus.
|
||||||
|
|
||||||
|
<!--more-->
|
||||||
|
|
||||||
|
|
||||||
|
## Prometheus {#prometheus}
|
||||||
|
|
||||||
|
To answer the question, _what is Prometheus?_, we'll rip a page out of the
|
||||||
|
Prometheus [docs](https://prometheus.io/docs/introduction/overview/).
|
||||||
|
|
||||||
|
> Prometheus is an open-source systems monitoring and alerting toolkit originally
|
||||||
|
> built at SoundCloud. Since its inception in 2012, many companies and
|
||||||
|
> organizations have adopted Prometheus, and the project has a very active
|
||||||
|
> developer and user community. It is now a standalone open source project and
|
||||||
|
> maintained independently of any company. To emphasize this, and to clarify the
|
||||||
|
> project's governance structure, Prometheus joined the Cloud Native Computing
|
||||||
|
> Foundation in 2016 as the second hosted project, after Kubernetes.
|
||||||
|
>
|
||||||
|
> Prometheus collects and stores its metrics as time series data, i.e. metrics
|
||||||
|
> information is stored with the timestamp at which it was recorded, alongside
|
||||||
|
> optional key-value pairs called labels.
|
||||||
|
|
||||||
|
Let's decipher all this jargon into plain English. In simple terms,
Prometheus is a system that scrapes metrics from your services and applications
and stores those metrics in a time series database, ready to be served back
when queried.
|
||||||
|
|
||||||
|
Prometheus also offers a way to create rules on those metrics to alert you when
something goes wrong. Combined with [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/), you've got yourself a full
monitoring system.
|
||||||
|
|
||||||
|
|
||||||
|
## Configuration {#configuration}
|
||||||
|
|
||||||
|
Now that we've briefly touched on a _few_ features of **Prometheus**, we need to
write our configuration before we can deploy.
|
||||||
|
|
||||||
|
This is an example of a bare configuration.
|
||||||
|
|
||||||
|
<a id="code-snippet--prometheus-scraping-config"></a>
```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets:
          - prometheus:9090
```
|
||||||
|
|
||||||
|
This will make Prometheus scrape itself every 30 seconds for metrics. At least
|
||||||
|
you get _some_ metrics to query later. If you want the full experience, I would
|
||||||
|
suggest you enable _Prometheus metrics_ for your services. Consult the docs of
|
||||||
|
the project to see if and how it can expose metrics for _Prometheus_ to scrape,
|
||||||
|
then add the scrape endpoint to your configuration as shown above.
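
Before adding a new endpoint, it can help to confirm that it really serves metrics in the Prometheus text format. The following is only a rough sketch of one way to do that from a shell. The helper name `count_samples` is made up for this post, and the `node-exporter:9100` target in the usage comment assumes you run node exporter under that name.

```shell
# Hypothetical helper (not part of Prometheus): count the samples in a
# Prometheus text-format payload read from stdin.
# Lines starting with '#' are HELP/TYPE comments; blank lines are skipped.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage, assuming curl is installed and the target is reachable:
#   curl -fsS http://node-exporter:9100/metrics | count_samples
```

A count of zero, or a failing `curl`, is a hint that the endpoint is not ready to be scraped yet.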
|
||||||
|
|
||||||
|
Here's an example of a couple more _well known_ projects: [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/) and
[_node exporter_](https://github.com/prometheus/node%5Fexporter).
|
||||||
|
|
||||||
|
<a id="code-snippet--prometheus-example-scraping-config"></a>
```yaml
  - job_name: alertmanager
    scrape_interval: 30s
    static_configs:
      - targets:
          - alertmanager:9093

  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
          - node-exporter:9100
```
|
||||||
|
|
||||||
|
A wider [list of exporters](https://prometheus.io/docs/instrumenting/exporters/) can be found on the Prometheus docs.
|
||||||
|
|
||||||
|
|
||||||
|
## Deployment {#deployment}
|
||||||
|
|
||||||
|
Now that we've got ourselves a configuration, let's deploy **Prometheus**.
|
||||||
|
|
||||||
|
Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
|
||||||
|
using `docker-compose` in this example to make it easier to translate later to
|
||||||
|
other types of deployments.
|
||||||
|
|
||||||
|
<div class="admonition note">
|
||||||
|
<p class="admonition-title">Note</p>
|
||||||
|
|
||||||
|
I'm still running on the `2.x` version of the Compose file format. I know I need
to upgrade to a newer version but that's a bit of networking work. It's a work
in progress.
|
||||||
|
|
||||||
|
</div>
|
||||||
|
|
||||||
|
The `docker-compose` file should look like the following.
|
||||||
|
|
||||||
|
```yaml
---
version: '2.3'

services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.27.0
    container_name: prometheus
    mem_limit: 400m
    mem_reservation: 300m
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.external-url=http://prometheus.localhost/
    volumes:
      - "./prometheus/:/etc/prometheus/:ro"
    ports:
      - "80:9090"
```
|
||||||
|
|
||||||
|
A few things to **note**, especially for the new container crowd. The container
image **version** is explicitly specified; do **not** use `latest` in production.
|
||||||
|
|
||||||
|
To make sure I don't overload my host, I set memory limits. I don't mind if it
goes down; this is a PoC (Proof of Concept) for the time being. In your case,
you might want to choose higher limits to give it more room to breathe. When the
memory limit is reached, the container will be killed with an _Out Of Memory_
error.
|
||||||
|
|
||||||
|
In the **command** section, I specify the _external url_ for Prometheus to
|
||||||
|
redirect me correctly. This is what Prometheus thinks its own hostname is. I
|
||||||
|
also specify the configuration file, previously written, which I mount as
|
||||||
|
_read-only_ in the **volumes** section.
|
||||||
|
|
||||||
|
Finally, we need to port-forward `9090` to our host's `80`, if possible, to access
|
||||||
|
**Prometheus**. Otherwise, figure out a way to route it properly. This is a local
|
||||||
|
installation, which is suggested by the Prometheus _hostname_.
|
||||||
|
|
||||||
|
If you've made it this far, you should be able to run this with no issues.
|
||||||
|
|
||||||
|
```bash
docker-compose up -d
```
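
Once the container is up, you may want to wait until Prometheus reports that it is ready before poking at it. Prometheus exposes a readiness endpoint at `/-/ready`, and the sketch below polls it. The function name `wait_for_prometheus`, the retry count and the two-second pause are arbitrary choices of mine, and the default base URL assumes the `80:9090` port mapping shown above.

```shell
# Hypothetical helper: poll the Prometheus readiness endpoint until it
# answers successfully or we run out of attempts.
# $1: base URL (default http://localhost), $2: max attempts (default 10)
wait_for_prometheus() {
  base_url="${1:-http://localhost}"
  attempts="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, so a 503 counts as "not ready"
    if curl -fsS -o /dev/null "${base_url}/-/ready"; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}

# Usage:
#   wait_for_prometheus http://localhost 15
```

If it never becomes ready, `docker-compose logs prometheus` is usually the next place to look.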
|
||||||
|
|
||||||
|
|
||||||
|
## Prometheus Rules {#prometheus-rules}
|
||||||
|
|
||||||
|
**Prometheus** supports **two** types of rules: recording and alerting. Let's expand
|
||||||
|
a little bit on those two concepts.
|
||||||
|
|
||||||
|
|
||||||
|
### Recording Rules {#recording-rules}
|
||||||
|
|
||||||
|
First, let's start off with [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording%5Frules/). I don't think I can explain it
|
||||||
|
better than the **Prometheus** documentation, which says:
|
||||||
|
|
||||||
|
> Recording rules allow you to precompute frequently needed or computationally
|
||||||
|
> expensive expressions and save their result as a new set of time series.
|
||||||
|
> Querying the precomputed result will then often be much faster than executing
|
||||||
|
> the original expression every time it is needed. This is especially useful for
|
||||||
|
> dashboards, which need to query the same expression repeatedly every time they
|
||||||
|
> refresh.
|
||||||
|
|
||||||
|
Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
|
||||||
|
create recording rules yet for my setup so I'll forgo this step.
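
If you do want to experiment, a recording rule could look something like the sketch below. This is not part of my setup: the group name, the rule name `job:up:avg`, the helper name `write_recording_rules` and the file name `recording.yml` are all illustrative assumptions. The file would also need to be listed under `rule_files` in the Prometheus configuration before it is picked up.

```shell
# Hypothetical example: write a tiny recording rule file to the given path.
# The rule stores the per-job average of the automatically generated "up"
# metric under a new series name (level:metric:operations naming style).
write_recording_rules() {
  cat > "$1" <<'EOF'
groups:
  - name: example-recording
    rules:
      - record: job:up:avg
        expr: avg by (job) (up)
EOF
}

# Usage (the path is an assumption; it matches the ./prometheus/ mount above):
#   write_recording_rules ./prometheus/recording.yml
```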
|
||||||
|
|
||||||
|
|
||||||
|
### Alerting Rules {#alerting-rules}
|
||||||
|
|
||||||
|
As the name suggests, [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting%5Frules/#alerting-rules) allow you to define conditional expressions
|
||||||
|
based on metrics which will trigger notifications to alert you.
|
||||||
|
|
||||||
|
This is a very simple example of an _alert rule_ that monitors all the endpoints
|
||||||
|
scraped by _Prometheus_ to see if any of them is down. If this expression returns
|
||||||
|
a result, an alert will fire from _Prometheus_.
|
||||||
|
|
||||||
|
```yaml
groups:
  - name: Instance down
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
```
|
||||||
|
|
||||||
|
To be able to add this alert to **Prometheus**, we need to save it in a
|
||||||
|
`rules.yml` file and then include it in the **Prometheus** configuration as follows.
|
||||||
|
|
||||||
|
<a id="code-snippet--prometheus-rule-files-config"></a>
```yaml
rule_files:
  - "rules.yml"
```
|
||||||
|
|
||||||
|
This makes the configuration, in its entirety, look as follows.
|
||||||
|
|
||||||
|
```yaml
rule_files:
  - "rules.yml"

scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets:
          - prometheus:9090

  - job_name: alertmanager
    scrape_interval: 30s
    static_configs:
      - targets:
          - alertmanager:9093

  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
          - node-exporter:9100
```
|
||||||
|
|
||||||
|
At this point, make sure everything is mounted into the container properly and
|
||||||
|
rerun your **Prometheus**.
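
One way to do that without restarting into a broken configuration is to validate the files first. The Prometheus image ships with `promtool`, whose `check config` subcommand should also check the rule files referenced by the configuration. The sketch below assumes the container from the compose file above is already running; the function name `check_and_restart` and the `COMPOSE` variable are my own inventions.

```shell
# The compose command is configurable so the sketch is easy to dry-run,
# e.g. COMPOSE=echo prints the commands instead of executing them.
COMPOSE="${COMPOSE:-docker-compose}"

# Hypothetical helper: validate the mounted configuration inside the
# running container, then restart it only if the check passes.
check_and_restart() {
  $COMPOSE exec -T prometheus promtool check config /etc/prometheus/prometheus.yml &&
    $COMPOSE restart prometheus
}

# Usage:
#   check_and_restart
```

Because the two commands are chained with `&&`, a failing check leaves the currently running Prometheus untouched.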
|
||||||
|
|
||||||
|
|
||||||
|
## Prometheus UI {#prometheus-ui}
|
||||||
|
|
||||||
|
Congratulations if you've made it this far. If you visit <http://localhost/> at
this stage, you should get to Prometheus, where you can query your metrics.
|
||||||
|
|
||||||
|
{{< figure src="/ox-hugo/01-prometheus-overview.png" caption="Figure 1: Prometheus overview" target="_blank" link="/ox-hugo/01-prometheus-overview.png" >}}
|
||||||
|
|
||||||
|
You can get all sorts of information under the _status_ drop-down menu.
|
||||||
|
|
||||||
|
{{< figure src="/ox-hugo/02-prometheus-status-drop-down-menu.png" caption="Figure 2: Prometheus Status drop-down menu" target="_blank" link="/ox-hugo/02-prometheus-status-drop-down-menu.png" >}}
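
The queries you type into the UI can also be sent to the Prometheus HTTP API, which is handy for quick checks from a terminal. The following is a minimal sketch, assuming `curl` is installed and Prometheus is reachable on the host's port `80` as configured above. The helper name `prom_query` is hypothetical.

```shell
# Hypothetical helper: run an instant PromQL query against the HTTP API.
# $1: PromQL expression
# $2: base URL (the default assumes the 80:9090 port mapping)
prom_query() {
  curl -fsS --get \
    --data-urlencode "query=$1" \
    "${2:-http://localhost}/api/v1/query"
}

# Usage: list the targets that are currently down
#   prom_query 'up == 0'
```

The response is JSON, so piping it through a tool such as `jq` makes it easier to read.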
|
||||||
|
|
||||||
|
|
||||||
|
## Conclusion {#conclusion}
|
||||||
|
|
||||||
|
As you can see, deploying **Prometheus** is not too hard. If you're running
|
||||||
|
_Kubernetes_, make sure you use the operator. It will make your life a lot
|
||||||
|
easier in all sorts of things.
|
||||||
|
|
||||||
|
Take your time to familiarise yourself with **Prometheus** and consult the
|
||||||
|
documentation as much as possible. It is well written and in most cases your
|
||||||
|
best friend. Figure out different ways to create rules for recording and
|
||||||
|
alerting. Most people at this stage deploy **Grafana** to start visualizing their
|
||||||
|
metrics. Well... Not in this blog post we ain't !
|
||||||
|
|
||||||
|
I hope you enjoy playing around with **Prometheus** and until the next post.
|
BIN
static/ox-hugo/01-prometheus-overview.png
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
static/ox-hugo/02-prometheus-status-drop-down-menu.png
(Stored with Git LFS)
Normal file
Binary file not shown.