chore(): New blog post talking about running Prometheus

2021-09-17 20:45:34 +02:00 · 2021-09-17 20:45:34 +02:00 · 68761158b6
commit 68761158b6
parent 648843bca2
6 changed files with 516 additions and 0 deletions
--- a/content-org/blog.org
+++ b/content-org/blog.org
@@ -5041,6 +5041,246 @@ If you want to go that far, maybe you should invest in a monitoring system with
 **** Conclusion
 Don't judge something by its simplicity. Sometimes, out of simple components tied together, you can make something interesting and useful.
 With a little scripting, a couple of commands and the power of cron, we were able to make /healthchecks/ monitor our websites.
 *** TODO Upgrade your monitoring setup with Prometheus :prometheus:metrics:container:
 :PROPERTIES:
 :EXPORT_HUGO_LASTMOD: 2021-09-17
 :EXPORT_DATE: 2021-09-17
 :EXPORT_FILE_NAME: upgrade-your-monitoring-setup-with-prometheus
 :CUSTOM_ID: upgrade-your-monitoring-setup-with-prometheus
 :END:
 After running simple monitoring for quite a while, I decided to upgrade my
 setup. It is about time to gather some real metrics to see what's going on.
 It's also time to get a proper monitoring setup.
 There are a lot of options in this field and I should probably write a blog
 post on my views on the topic. For this experiment, however, the solution is
 already chosen. We'll be running Prometheus.
 #+hugo: more
 **** Prometheus
 To answer the question, /what is Prometheus?/, we'll rip a page out of the
 Prometheus [[https://prometheus.io/docs/introduction/overview/][docs]].
 #+begin_quote
 Prometheus is an open-source systems monitoring and alerting toolkit originally
 built at SoundCloud. Since its inception in 2012, many companies and
 organizations have adopted Prometheus, and the project has a very active
 developer and user community. It is now a standalone open source project and
 maintained independently of any company. To emphasize this, and to clarify the
 project's governance structure, Prometheus joined the Cloud Native Computing
 Foundation in 2016 as the second hosted project, after Kubernetes.
 Prometheus collects and stores its metrics as time series data, i.e. metrics
 information is stored with the timestamp at which it was recorded, alongside
 optional key-value pairs called labels.
 #+end_quote
 Let's decipher all this jargon into plain English. In simple terms,
 Prometheus is a system that scrapes metrics from your services and
 applications and stores them in a time series database, ready to serve them
 back when queried.
 Prometheus also offers a way to create rules on those metrics to alert you when
 something goes wrong. Combined with [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]], you get yourself a full
 monitoring system.
 **** Configuration
 Now that we've briefly touched on a /few/ features of *Prometheus*, and before
 we can deploy, we need to write our configuration.
 This is an example of a bare-bones configuration.
 #+NAME: prometheus-scraping-config
 #+begin_src yaml
 scrape_configs:
 - job_name: prometheus
   scrape_interval: 30s
   static_configs:
   - targets:
     - prometheus:9090
 #+end_src
 This will make Prometheus scrape itself every 30 seconds for metrics. At least
 you get /some/ metrics to query later. If you want the full experience, I would
 suggest you enable /Prometheus metrics/ for your services. Consult the docs of
 the project to see if and how it can expose metrics for /Prometheus/ to scrape,
 then add the scrape endpoint to your configuration as shown above.
 Here's an example for a couple more /well-known/ projects: [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]] and
 [[https://github.com/prometheus/node_exporter][/node exporter/]].
 #+NAME: prometheus-example-scraping-config
 #+begin_src yaml
 - job_name: alertmanager
   scrape_interval: 30s
   static_configs:
   - targets:
     - alertmanager:9093
 - job_name: node-exporter
   scrape_interval: 30s
   static_configs:
   - targets:
     - node-exporter:9100
 #+end_src
 A wider [[https://prometheus.io/docs/instrumenting/exporters/][list of exporters]] can be found in the Prometheus docs.
 **** Deployment
 Now that we have a configuration, let's deploy *Prometheus*.
 Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
 using =docker-compose= in this example to make it easier to translate later to
 other types of deployments.
 #+BEGIN_EXPORT html
 <div class="admonition note">
 <p class="admonition-title">Note</p>
 #+END_EXPORT
 I'm still running the =2.x= version of the =docker-compose= file format. I know
 I need to upgrade to a newer version, but that requires a bit of networking
 work. It's a work in progress.
 #+BEGIN_EXPORT html
 </div>
 #+END_EXPORT
 The =docker-compose= file should look like the following.
 #+begin_src yaml
 ---
 version: '2.3'
 services:
   prometheus:
     image: quay.io/prometheus/prometheus:v2.27.0
     container_name: prometheus
     mem_limit: 400m
     mem_reservation: 300m
     restart: unless-stopped
     command:
       - --config.file=/etc/prometheus/prometheus.yml
       - --web.external-url=http://prometheus.localhost/
     volumes:
       - "./prometheus/:/etc/prometheus/:ro"
     ports:
       - "80:9090"
 #+end_src
 A few things to *note*, especially for the new container crowd. The container
 image *version* is explicitly specified; do *not* use =latest= in production.
 To make sure I don't overload my host, I set memory limits. I don't mind if it
 goes down; this is a PoC (Proof of Concept) for the time being. In your case,
 you might want to choose higher limits to give it more room to breathe. When the
 memory limit is reached, the container will be killed with an /Out Of Memory/
 error.
 In the *command* section, I specify the /external URL/ so that Prometheus
 redirects me correctly. This is what Prometheus thinks its own hostname is. I
 also specify the configuration file, previously written, which I mount as
 /read-only/ in the *volumes* section.
 Finally, we need to forward the container's port =9090= to our host's port =80=,
 if possible, to access *Prometheus*. Otherwise, figure out a way to route it
 properly. This is a local installation, as the Prometheus /hostname/ suggests.
 If you've made it this far, you should be able to run this with no issues.
 #+begin_src bash
 docker-compose up -d
 #+end_src
 **** Prometheus Rules
 *Prometheus* supports *two* types of rules: recording and alerting. Let's expand
 a little bit on those two concepts.
 ***** Recording Rules
 First, let's start off with [[https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/][recording rules]]. I don't think I can explain them
 better than the *Prometheus* documentation, which says:
 #+begin_quote
 Recording rules allow you to precompute frequently needed or computationally
 expensive expressions and save their result as a new set of time series.
 Querying the precomputed result will then often be much faster than executing
 the original expression every time it is needed. This is especially useful for
 dashboards, which need to query the same expression repeatedly every time they
 refresh.
 #+end_quote
 Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
 create recording rules for my setup yet, so I'll forgo this step.
 ***** Alerting Rules
 As the name suggests, [[https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules][alerting rules]] allow you to define conditional expressions
 based on metrics which will trigger notifications to alert you.
 This is a very simple example of an /alert rule/ that monitors all the endpoints
 scraped by /Prometheus/ to see if any of them is down. If this expression keeps
 returning a result for 5 minutes, an alert will fire from /Prometheus/.
 #+begin_src yaml
 groups:
 - name: Instance down
   rules:
   - alert: InstanceDown
     expr: up == 0
     for: 5m
     labels:
       severity: page
     annotations:
       summary: "Instance {{ $labels.instance }} down"
       description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
 #+end_src
 To be able to add this alert to *Prometheus*, we need to save it in a
 =rules.yml= file and then include it in the *Prometheus* configuration as follows.
 #+NAME: prometheus-rule-files-config
 #+begin_src yaml
 rule_files:
 - "rules.yml"
 #+end_src
 This makes the entire configuration look as follows.
 #+begin_src yaml :noweb yes
 <<prometheus-rule-files-config>>
 <<prometheus-scraping-config>>
 <<prometheus-example-scraping-config>>
 #+end_src
 At this point, make sure everything is mounted into the container properly and
 rerun your *Prometheus*.
 **** Prometheus UI
 Congratulations if you've made it this far. If you visit http://localhost/ at
 this stage, you should get to Prometheus, where you can query your metrics.
 #+caption: Prometheus overview
 #+attr_html: :target _blank
 [[file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png][file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png]]
 You can get all sorts of information under the /status/ drop-down menu.
 #+caption: Prometheus Status drop-down menu
 #+attr_html: :target _blank
 [[file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png][file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png]]
 **** Conclusion
 As you can see, deploying *Prometheus* is not too hard. If you're running
 /Kubernetes/, make sure you use the operator. It will make your life a lot
 easier in all sorts of ways.
 Take your time to familiarise yourself with *Prometheus* and consult the
 documentation as much as possible. It is well written and, in most cases, your
 best friend. Figure out different ways to create rules for recording and
 alerting. Most people at this stage deploy *Grafana* to start visualizing their
 metrics. Well... Not in this blog post we ain't!
 I hope you enjoy playing around with *Prometheus*. Until the next post!
 ** Nikola :@nikola:
 *** DONE Welcome back to the old world :blog:org_mode:emacs:rst:
 :PROPERTIES:
--- a/content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
+++ b/content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
--- a/content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
+++ b/content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
--- a/content/posts/upgrade-your-monitoring-setup-with-prometheus.md
+++ b/content/posts/upgrade-your-monitoring-setup-with-prometheus.md
@@ -0,0 +1,264 @@
 +++
 title = "Upgrade your monitoring setup with Prometheus"
 author = ["Elia el Lazkani"]
 date = 2021-09-17
 lastmod = 2021-09-17
 tags = ["prometheus", "metrics", "container"]
 categories = ["monitoring"]
 draft = true
 +++
 After running simple monitoring for quite a while, I decided to upgrade my
 setup. It is about time to gather some real metrics to see what's going on.
 It's also time to get a proper monitoring setup.
 There are a lot of options in this field and I should probably write a blog
 post on my views on the topic. For this experiment, however, the solution is
 already chosen. We'll be running Prometheus.
 <!--more-->
 ## Prometheus {#prometheus}
 To answer the question, _what is Prometheus?_, we'll rip a page out of the
 Prometheus [docs](https://prometheus.io/docs/introduction/overview/).
 > Prometheus is an open-source systems monitoring and alerting toolkit originally
 > built at SoundCloud. Since its inception in 2012, many companies and
 > organizations have adopted Prometheus, and the project has a very active
 > developer and user community. It is now a standalone open source project and
 > maintained independently of any company. To emphasize this, and to clarify the
 > project's governance structure, Prometheus joined the Cloud Native Computing
 > Foundation in 2016 as the second hosted project, after Kubernetes.
 >
 > Prometheus collects and stores its metrics as time series data, i.e. metrics
 > information is stored with the timestamp at which it was recorded, alongside
 > optional key-value pairs called labels.
 Let's decipher all this jargon into plain English. In simple terms,
 Prometheus is a system that scrapes metrics from your services and
 applications and stores them in a time series database, ready to serve them
 back when queried.
 Prometheus also offers a way to create rules on those metrics to alert you when
 something goes wrong. Combined with [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/), you get yourself a full
 monitoring system.
 ## Configuration {#configuration}
 Now that we've briefly touched on a _few_ features of **Prometheus**, and before
 we can deploy, we need to write our configuration.
 This is an example of a bare-bones configuration.
 <a id="code-snippet--prometheus-scraping-config"></a>
 ```yaml
 scrape_configs:
 - job_name: prometheus
   scrape_interval: 30s
   static_configs:
   - targets:
     - prometheus:9090
 ```
 This will make Prometheus scrape itself every 30 seconds for metrics. At least
 you get _some_ metrics to query later. If you want the full experience, I would
 suggest you enable _Prometheus metrics_ for your services. Consult the docs of
 the project to see if and how it can expose metrics for _Prometheus_ to scrape,
 then add the scrape endpoint to your configuration as shown above.
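 As a sketch of what that could look like, assume a hypothetical service called
 `my-service` that listens on port `8080` and exposes its metrics on `/metrics`.
 The name, port and path are placeholders of mine, not part of this setup;
 `metrics_path` and `scheme` are spelled out with their default values only to
 show where they would go.
 ```yaml
 scrape_configs:
 # Hypothetical job: replace the name, port and path with what your service uses.
 - job_name: my-service
   scrape_interval: 30s
   metrics_path: /metrics  # Prometheus default, listed here for clarity
   scheme: http            # Prometheus default, switch to https for a TLS endpoint
   static_configs:
   - targets:
     - my-service:8080
 ```
 If the service uses the defaults, both extra keys can simply be dropped.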
 Here's an example for a couple more _well-known_ projects: [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/) and
 [_node exporter_](https://github.com/prometheus/node%5Fexporter).
 <a id="code-snippet--prometheus-example-scraping-config"></a>
 ```yaml
 - job_name: alertmanager
   scrape_interval: 30s
   static_configs:
   - targets:
     - alertmanager:9093
 - job_name: node-exporter
   scrape_interval: 30s
   static_configs:
   - targets:
     - node-exporter:9100
 ```
 A wider [list of exporters](https://prometheus.io/docs/instrumenting/exporters/) can be found in the Prometheus docs.
 ## Deployment {#deployment}
 Now that we have a configuration, let's deploy **Prometheus**.
 Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
 using `docker-compose` in this example to make it easier to translate later to
 other types of deployments.
 <div class="admonition note">
 <p class="admonition-title">Note</p>
 I'm still running the `2.x` version of the `docker-compose` file format. I know
 I need to upgrade to a newer version, but that requires a bit of networking
 work. It's a work in progress.
 </div>
 The `docker-compose` file should look like the following.
 ```yaml
 ---
 version: '2.3'
 services:
   prometheus:
     image: quay.io/prometheus/prometheus:v2.27.0
     container_name: prometheus
     mem_limit: 400m
     mem_reservation: 300m
     restart: unless-stopped
     command:
       - --config.file=/etc/prometheus/prometheus.yml
       - --web.external-url=http://prometheus.localhost/
     volumes:
       - "./prometheus/:/etc/prometheus/:ro"
     ports:
       - "80:9090"
 ```
 A few things to **note**, especially for the new container crowd. The container
 image **version** is explicitly specified; do **not** use `latest` in production.
 To make sure I don't overload my host, I set memory limits. I don't mind if it
 goes down; this is a PoC (Proof of Concept) for the time being. In your case,
 you might want to choose higher limits to give it more room to breathe. When the
 memory limit is reached, the container will be killed with an _Out Of Memory_
 error.
 In the **command** section, I specify the _external URL_ so that Prometheus
 redirects me correctly. This is what Prometheus thinks its own hostname is. I
 also specify the configuration file, previously written, which I mount as
 _read-only_ in the **volumes** section.
 Finally, we need to forward the container's port `9090` to our host's port `80`,
 if possible, to access **Prometheus**. Otherwise, figure out a way to route it
 properly. This is a local installation, as the Prometheus _hostname_ suggests.
 If you've made it this far, you should be able to run this with no issues.
 ```bash
 docker-compose up -d
 ```
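 The scrape configuration from earlier also lists `alertmanager:9093` and
 `node-exporter:9100` as targets, so something has to answer on those names. A
 minimal sketch, assuming both run as extra services in the same
 `docker-compose` file so they can be reached by service name; the image tags
 below are my assumption, pin whichever release you have actually tested.
 ```yaml
 # To be merged under the existing "services:" key of the docker-compose file.
 services:
   alertmanager:
     image: quay.io/prometheus/alertmanager:v0.23.0  # assumed tag
     container_name: alertmanager
     restart: unless-stopped
   node-exporter:
     image: quay.io/prometheus/node-exporter:v1.2.2  # assumed tag
     container_name: node-exporter
     restart: unless-stopped
 ```
 Keep in mind that a node exporter running like this mostly reports on its own
 container. Getting proper host metrics needs extra mounts and flags, which the
 node exporter documentation covers.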
 ## Prometheus Rules {#prometheus-rules}
 **Prometheus** supports **two** types of rules: recording and alerting. Let's expand
 a little bit on those two concepts.
 ### Recording Rules {#recording-rules}
 First, let's start off with [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording%5Frules/). I don't think I can explain them
 better than the **Prometheus** documentation, which says:
 > Recording rules allow you to precompute frequently needed or computationally
 > expensive expressions and save their result as a new set of time series.
 > Querying the precomputed result will then often be much faster than executing
 > the original expression every time it is needed. This is especially useful for
 > dashboards, which need to query the same expression repeatedly every time they
 > refresh.
 Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
 create recording rules for my setup yet, so I'll forgo this step.
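 For reference only, a recording rule goes into the same kind of rule file as
 an alerting rule. A minimal sketch, assuming we wanted to precompute the share
 of healthy targets per job; the group name and the rule name `job:up:avg` are
 made up for illustration and are not part of my setup.
 ```yaml
 groups:
 - name: example-recording-rules
   rules:
   # Stores the per-job average of "up" as a new time series named job:up:avg.
   # The name follows the level:metric:operations convention from the docs.
   - record: job:up:avg
     expr: avg by (job) (up)
 ```
 Once loaded, a dashboard could query `job:up:avg` directly instead of
 evaluating the aggregation on every refresh.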
 ### Alerting Rules {#alerting-rules}
 As the name suggests, [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting%5Frules/#alerting-rules) allow you to define conditional expressions
 based on metrics which will trigger notifications to alert you.
 This is a very simple example of an _alert rule_ that monitors all the endpoints
 scraped by _Prometheus_ to see if any of them is down. If this expression keeps
 returning a result for 5 minutes, an alert will fire from _Prometheus_.
 ```yaml
 groups:
 - name: Instance down
   rules:
   - alert: InstanceDown
     expr: up == 0
     for: 5m
     labels:
       severity: page
     annotations:
       summary: "Instance {{ $labels.instance }} down"
       description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
 ```
 To be able to add this alert to **Prometheus**, we need to save it in a
 `rules.yml` file and then include it in the **Prometheus** configuration as follows.
 <a id="code-snippet--prometheus-rule-files-config"></a>
 ```yaml
 rule_files:
 - "rules.yml"
 ```
 This makes the entire configuration look as follows.
 ```yaml
 rule_files:
 - "rules.yml"
 scrape_configs:
 - job_name: prometheus
   scrape_interval: 30s
   static_configs:
   - targets:
     - prometheus:9090
 - job_name: alertmanager
   scrape_interval: 30s
   static_configs:
   - targets:
     - alertmanager:9093
 - job_name: node-exporter
   scrape_interval: 30s
   static_configs:
   - targets:
     - node-exporter:9100
 ```
 At this point, make sure everything is mounted into the container properly and
 rerun your **Prometheus**.
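 One thing this configuration does not do yet is tell Prometheus where to send
 firing alerts. A minimal sketch, assuming an Alertmanager instance is reachable
 at `alertmanager:9093` (the same address used as a scrape target above), would
 add an `alerting` section to `prometheus.yml`.
 ```yaml
 alerting:
   alertmanagers:
   # Assumes Alertmanager answers on this address; adjust to your deployment.
   - static_configs:
     - targets:
       - alertmanager:9093
 ```
 Without this section the alert still shows up in the Prometheus UI, but no
 notification leaves Prometheus.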
 ## Prometheus UI {#prometheus-ui}
 Congratulations if you've made it this far. If you visit <http://localhost/> at
 this stage, you should get to Prometheus, where you can query your metrics.
 {{< figure src="/ox-hugo/01-prometheus-overview.png" caption="Figure 1: Prometheus overview" target="_blank" link="/ox-hugo/01-prometheus-overview.png" >}}
 You can get all sorts of information under the _status_ drop-down menu.
 {{< figure src="/ox-hugo/02-prometheus-status-drop-down-menu.png" caption="Figure 2: Prometheus Status drop-down menu" target="_blank" link="/ox-hugo/02-prometheus-status-drop-down-menu.png" >}}
 ## Conclusion {#conclusion}
 As you can see, deploying **Prometheus** is not too hard. If you're running
 _Kubernetes_, make sure you use the operator. It will make your life a lot
 easier in all sorts of ways.
 Take your time to familiarise yourself with **Prometheus** and consult the
 documentation as much as possible. It is well written and, in most cases, your
 best friend. Figure out different ways to create rules for recording and
 alerting. Most people at this stage deploy **Grafana** to start visualizing their
 metrics. Well... Not in this blog post we ain't!
 I hope you enjoy playing around with **Prometheus**. Until the next post!
--- a/static/ox-hugo/01-prometheus-overview.png
+++ b/static/ox-hugo/01-prometheus-overview.png
--- a/static/ox-hugo/02-prometheus-status-drop-down-menu.png
+++ b/static/ox-hugo/02-prometheus-status-drop-down-menu.png