From 68761158b6ef92c630ab2fe1b3319bd7ff999394 Mon Sep 17 00:00:00 2001
From: Elia el Lazkani
Date: Fri, 17 Sep 2021 20:45:34 +0200
Subject: [PATCH] chore(): New blog post talking about running Prometheus

---
 content-org/blog.org                          | 240 ++++++++++++++++
 .../01-prometheus-overview.png                |   3 +
 .../02-prometheus-status-drop-down-menu.png   |   3 +
 ...e-your-monitoring-setup-with-prometheus.md | 264 ++++++++++++++++++
 static/ox-hugo/01-prometheus-overview.png     |   3 +
 .../02-prometheus-status-drop-down-menu.png   |   3 +
 6 files changed, 516 insertions(+)
 create mode 100644 content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
 create mode 100644 content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
 create mode 100644 content/posts/upgrade-your-monitoring-setup-with-prometheus.md
 create mode 100644 static/ox-hugo/01-prometheus-overview.png
 create mode 100644 static/ox-hugo/02-prometheus-status-drop-down-menu.png

diff --git a/content-org/blog.org b/content-org/blog.org
index 2f14296..45f2c26 100644
--- a/content-org/blog.org
+++ b/content-org/blog.org
@@ -5041,6 +5041,246 @@ If you want to go that far, maybe you should invest in a monitoring system with
 **** Conclusion
 Don't judge something by its simplicity. Somethings, out of simple components tied together you can make something interesting and useful. With a little of scripting, couple of commands and the power of cron we were able to make /healthchecks/ monitor our websites.
 
+*** TODO Upgrade your monitoring setup with Prometheus :prometheus:metrics:container:
+:PROPERTIES:
+:EXPORT_HUGO_LASTMOD: 2021-09-17
+:EXPORT_DATE: 2021-09-17
+:EXPORT_FILE_NAME: upgrade-your-monitoring-setup-with-prometheus
+:CUSTOM_ID: upgrade-your-monitoring-setup-with-prometheus
+:END:
+
+After running simple monitoring for quite a while, I decided to upgrade my
+setup. It is about time to get some real metric gathering to see what's going
+on. It's also time to get some proper monitoring setup.
+
+There are a lot of options in this field and I should, probably, write a blog
+post on my views on the topic. For this experiment, on the other hand, the
+solution is already pre-chosen. We'll be running Prometheus.
+
+#+hugo: more
+
+**** Prometheus
+To answer the question, /what is Prometheus?/, we'll rip a page out of the
+Prometheus [[https://prometheus.io/docs/introduction/overview/][docs]].
+
+#+begin_quote
+Prometheus is an open-source systems monitoring and alerting toolkit originally
+built at SoundCloud. Since its inception in 2012, many companies and
+organizations have adopted Prometheus, and the project has a very active
+developer and user community. It is now a standalone open source project and
+maintained independently of any company. To emphasize this, and to clarify the
+project's governance structure, Prometheus joined the Cloud Native Computing
+Foundation in 2016 as the second hosted project, after Kubernetes.
+
+Prometheus collects and stores its metrics as time series data, i.e. metrics
+information is stored with the timestamp at which it was recorded, alongside
+optional key-value pairs called labels.
+#+end_quote
+
+Let's decipher all this jargon into plain English. In simple terms,
+Prometheus is a system that scrapes metrics from your services and applications
+and stores those metrics in a time series database, ready to serve back again
+when queried.
+
+Prometheus also offers a way to create rules on those metrics to alert you when
+something goes wrong. Combined with [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]], you've got yourself a full
+monitoring system.
+
+**** Configuration
+Now that we have briefly touched on a /few/ features of *Prometheus*, and before we
+can deploy, we need to write our configuration.
+
+This is an example of a bare configuration.
+
+#+NAME: prometheus-scraping-config
+#+begin_src yaml
+scrape_configs:
+- job_name: prometheus
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - prometheus:9090
+#+end_src
+
+This will make Prometheus scrape itself every 30 seconds for metrics. At least
+you get /some/ metrics to query later. If you want the full experience, I would
+suggest you enable /Prometheus metrics/ for your services. Consult the docs of
+the project to see if and how it can expose metrics for /Prometheus/ to scrape,
+then add the scrape endpoint to your configuration as shown above.
+
+Here's an example of a couple more /well known/ projects: [[https://prometheus.io/docs/alerting/latest/alertmanager/][/Alertmanager/]] and
+[[https://github.com/prometheus/node_exporter][/node exporter/]].
+
+#+NAME: prometheus-example-scraping-config
+#+begin_src yaml
+- job_name: alertmanager
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - alertmanager:9093
+
+- job_name: node-exporter
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - node-exporter:9100
+#+end_src
+
+A wider [[https://prometheus.io/docs/instrumenting/exporters/][list of exporters]] can be found in the Prometheus docs.
+
+**** Deployment
+Now that we've got ourselves a configuration, let's deploy *Prometheus*.
+
+Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
+using =docker-compose= in this example to make it easier to translate later to
+other types of deployments.
+
+#+BEGIN_EXPORT html
+
+

Note

+#+END_EXPORT
+I'm still running on the =2.x= API version. I know I need to upgrade to a newer
+version but that's a bit of networking work. It's ongoing work.
+#+BEGIN_EXPORT html
+
+#+END_EXPORT
+
+The =docker-compose= file should look like the following.
+
+#+begin_src yaml
+---
+version: '2.3'
+
+services:
+  prometheus:
+    image: quay.io/prometheus/prometheus:v2.27.0
+    container_name: prometheus
+    mem_limit: 400m
+    mem_reservation: 300m
+    restart: unless-stopped
+    command:
+      - --config.file=/etc/prometheus/prometheus.yml
+      - --web.external-url=http://prometheus.localhost/
+    volumes:
+      - "./prometheus/:/etc/prometheus/:ro"
+    ports:
+      - "80:9090"
+#+end_src
+
+A few things to *note*, especially for the new container crowd. The container
+image *version* is explicitly specified; do *not* use =latest= in production.
+
+To make sure I don't overload my host, I set memory limits. I don't mind if it
+goes down; this is a PoC (Proof of Concept) for the time being. In your case,
+you might want to choose higher limits to give it more room to breathe. When the
+memory limit is reached, the container will be killed with an /Out Of Memory/
+error.
+
+In the *command* section, I specify the /external url/ for Prometheus to
+redirect me correctly. This is what Prometheus thinks its own hostname is. I
+also specify the configuration file, previously written, which I mount as
+/read-only/ in the *volumes* section.
+
+Finally, we need to port-forward =9090= to our host's =80= if possible to access
+*Prometheus*. Otherwise, figure out a way to route it properly. This is a local
+installation, which is suggested by the Prometheus /hostname/.
+
+If you've made it this far, you should be able to run this with no issues.
+
+#+begin_src bash
+docker-compose up -d
+#+end_src
+
+**** Prometheus Rules
+*Prometheus* supports *two* types of rules: recording and alerting. Let's expand
+a little bit on those two concepts.
+
+***** Recording Rules
+First, let's start off with [[https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/][recording rules]]. I don't think I can explain it
+better than the *Prometheus* documentation, which says:
+
+#+begin_quote
+Recording rules allow you to precompute frequently needed or computationally
+expensive expressions and save their result as a new set of time series.
+Querying the precomputed result will then often be much faster than executing
+the original expression every time it is needed. This is especially useful for
+dashboards, which need to query the same expression repeatedly every time they
+refresh.
+#+end_quote
+
+Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
+create recording rules yet for my setup so I'll forgo this step.
+
+***** Alerting Rules
+As the name suggests, [[https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#alerting-rules][alerting rules]] allow you to define conditional expressions
+based on metrics which will trigger notifications to alert you.
+
+This is a very simple example of an /alert rule/ that monitors all the endpoints
+scraped by /Prometheus/ to see if any of them is down. If this expression keeps
+returning a result for 5 minutes, an alert will fire from /Prometheus/.
+
+#+begin_src yaml
+groups:
+- name: Instance down
+  rules:
+  - alert: InstanceDown
+    expr: up == 0
+    for: 5m
+    labels:
+      severity: page
+    annotations:
+      summary: "Instance {{ $labels.instance }} down"
+      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
+#+end_src
+
+To be able to add this alert to *Prometheus*, we need to save it in a
+=rules.yml= file and then include it in the *Prometheus* configuration as follows.
+
+#+NAME: prometheus-rule-files-config
+#+begin_src yaml
+rule_files:
+- "rules.yml"
+#+end_src
+
+This makes the configuration, in its entirety, as follows.
+
+#+begin_src yaml :noweb yes
+<<prometheus-rule-files-config>>
+
+<<prometheus-scraping-config>>
+
+<<prometheus-example-scraping-config>>
+#+end_src
+
+At this point, make sure everything is mounted into the container properly and
+rerun your *Prometheus*.
+
+**** Prometheus UI
+Congratulations if you've made it this far. If you visit http://localhost/ at
+this stage you should get to Prometheus where you can query your metrics.
+
+#+caption: Prometheus overview
+#+attr_html: :target _blank
+[[file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png][file:images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png]]
+
+You can get all sorts of information under the /status/ drop-down menu.
+
+#+caption: Prometheus Status drop-down menu
+#+attr_html: :target _blank
+[[file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png][file:images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png]]
+
+**** Conclusion
+As you can see, deploying *Prometheus* is not too hard. If you're running
+/Kubernetes/, make sure you use the operator. It will make your life a lot
+easier in all sorts of ways.
+
+Take your time to familiarise yourself with *Prometheus* and consult the
+documentation as much as possible. It is well written and in most cases your
+best friend. Figure out different ways to create rules for recording and
+alerting. Most people at this stage deploy *Grafana* to start visualizing their
+metrics. Well... Not in this blog post we ain't!
+
+I hope you enjoy playing around with *Prometheus*. Until the next post.
 ** Nikola :@nikola:
 *** DONE Welcome back to the old world :blog:org_mode:emacs:rst:
 :PROPERTIES:
diff --git a/content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png b/content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
new file mode 100644
index 0000000..4d50390
--- /dev/null
+++ b/content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:14ab6959e9fd5aee73e2da62d1cb7a2d58b41a4ff9d547052aea86b83135f21c
+size 103761
diff --git a/content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png b/content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
new file mode 100644
index 0000000..111d72f
--- /dev/null
+++ b/content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f3cc4b18c99b48b7159ff9606d731aadccc6faa0e700ad1d0d2365376bb9577
+size 25542
diff --git a/content/posts/upgrade-your-monitoring-setup-with-prometheus.md b/content/posts/upgrade-your-monitoring-setup-with-prometheus.md
new file mode 100644
index 0000000..fc1ed59
--- /dev/null
+++ b/content/posts/upgrade-your-monitoring-setup-with-prometheus.md
@@ -0,0 +1,264 @@
++++
+title = "Upgrade your monitoring setup with Prometheus"
+author = ["Elia el Lazkani"]
+date = 2021-09-17
+lastmod = 2021-09-17
+tags = ["prometheus", "metrics", "container"]
+categories = ["monitoring"]
+draft = true
++++
+
+After running simple monitoring for quite a while, I decided to upgrade my
+setup. It is about time to get some real metric gathering to see what's going
+on. It's also time to get some proper monitoring setup.
+
+There are a lot of options in this field and I should, probably, write a blog
+post on my views on the topic. For this experiment, on the other hand, the
+solution is already pre-chosen. We'll be running Prometheus.
+
+<!--more-->
+
+
+## Prometheus {#prometheus}
+
+To answer the question, _what is Prometheus?_, we'll rip a page out of the
+Prometheus [docs](https://prometheus.io/docs/introduction/overview/).
+
+> Prometheus is an open-source systems monitoring and alerting toolkit originally
+> built at SoundCloud. Since its inception in 2012, many companies and
+> organizations have adopted Prometheus, and the project has a very active
+> developer and user community. It is now a standalone open source project and
+> maintained independently of any company. To emphasize this, and to clarify the
+> project's governance structure, Prometheus joined the Cloud Native Computing
+> Foundation in 2016 as the second hosted project, after Kubernetes.
+>
+> Prometheus collects and stores its metrics as time series data, i.e. metrics
+> information is stored with the timestamp at which it was recorded, alongside
+> optional key-value pairs called labels.
+
+Let's decipher all this jargon into plain English. In simple terms,
+Prometheus is a system that scrapes metrics from your services and applications
+and stores those metrics in a time series database, ready to serve back again
+when queried.
+
+Prometheus also offers a way to create rules on those metrics to alert you when
+something goes wrong. Combined with [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/), you've got yourself a full
+monitoring system.
+
+
+## Configuration {#configuration}
+
+Now that we have briefly touched on a _few_ features of **Prometheus**, and before we
+can deploy, we need to write our configuration.
+
+This is an example of a bare configuration.
+
+
+```yaml
+scrape_configs:
+- job_name: prometheus
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - prometheus:9090
+```
+
+This will make Prometheus scrape itself every 30 seconds for metrics. At least
+you get _some_ metrics to query later. If you want the full experience, I would
+suggest you enable _Prometheus metrics_ for your services. Consult the docs of
+the project to see if and how it can expose metrics for _Prometheus_ to scrape,
+then add the scrape endpoint to your configuration as shown above.
+
+Here's an example of a couple more _well known_ projects: [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/) and
+[_node exporter_](https://github.com/prometheus/node%5Fexporter).
+
+
+```yaml
+- job_name: alertmanager
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - alertmanager:9093
+
+- job_name: node-exporter
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - node-exporter:9100
+```
+
+A wider [list of exporters](https://prometheus.io/docs/instrumenting/exporters/) can be found in the Prometheus docs.
+
+
+## Deployment {#deployment}
+
+Now that we've got ourselves a configuration, let's deploy **Prometheus**.
+
+Luckily for us, Prometheus comes containerized and ready to deploy. We'll be
+using `docker-compose` in this example to make it easier to translate later to
+other types of deployments.
+
+
+

Note

+
+I'm still running on the `2.x` API version. I know I need to upgrade to a newer
+version but that's a bit of networking work. It's ongoing work.
+
+
+
+The `docker-compose` file should look like the following.
+
+```yaml
+---
+version: '2.3'
+
+services:
+  prometheus:
+    image: quay.io/prometheus/prometheus:v2.27.0
+    container_name: prometheus
+    mem_limit: 400m
+    mem_reservation: 300m
+    restart: unless-stopped
+    command:
+      - --config.file=/etc/prometheus/prometheus.yml
+      - --web.external-url=http://prometheus.localhost/
+    volumes:
+      - "./prometheus/:/etc/prometheus/:ro"
+    ports:
+      - "80:9090"
+```
+
+A few things to **note**, especially for the new container crowd. The container
+image **version** is explicitly specified; do **not** use `latest` in production.
+
+To make sure I don't overload my host, I set memory limits. I don't mind if it
+goes down; this is a PoC (Proof of Concept) for the time being. In your case,
+you might want to choose higher limits to give it more room to breathe. When the
+memory limit is reached, the container will be killed with an _Out Of Memory_
+error.
+
+In the **command** section, I specify the _external url_ for Prometheus to
+redirect me correctly. This is what Prometheus thinks its own hostname is. I
+also specify the configuration file, previously written, which I mount as
+_read-only_ in the **volumes** section.
+
+Finally, we need to port-forward `9090` to our host's `80` if possible to access
+**Prometheus**. Otherwise, figure out a way to route it properly. This is a local
+installation, which is suggested by the Prometheus _hostname_.
+
+If you've made it this far, you should be able to run this with no issues.
+
+```bash
+docker-compose up -d
+```
+
+
+## Prometheus Rules {#prometheus-rules}
+
+**Prometheus** supports **two** types of rules: recording and alerting. Let's expand
+a little bit on those two concepts.
+
+
+### Recording Rules {#recording-rules}
+
+First, let's start off with [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording%5Frules/). I don't think I can explain it
+better than the **Prometheus** documentation, which says:
+
+> Recording rules allow you to precompute frequently needed or computationally
+> expensive expressions and save their result as a new set of time series.
+> Querying the precomputed result will then often be much faster than executing
+> the original expression every time it is needed. This is especially useful for
+> dashboards, which need to query the same expression repeatedly every time they
+> refresh.
+
+Sounds pretty simple, right? Well, it is. Unfortunately, I haven't needed to
+create recording rules yet for my setup so I'll forgo this step.
+
+
+### Alerting Rules {#alerting-rules}
+
+As the name suggests, [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting%5Frules/#alerting-rules) allow you to define conditional expressions
+based on metrics which will trigger notifications to alert you.
+
+This is a very simple example of an _alert rule_ that monitors all the endpoints
+scraped by _Prometheus_ to see if any of them is down. If this expression keeps
+returning a result for 5 minutes, an alert will fire from _Prometheus_.
+
+```yaml
+groups:
+- name: Instance down
+  rules:
+  - alert: InstanceDown
+    expr: up == 0
+    for: 5m
+    labels:
+      severity: page
+    annotations:
+      summary: "Instance {{ $labels.instance }} down"
+      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
+```
+
+To be able to add this alert to **Prometheus**, we need to save it in a
+`rules.yml` file and then include it in the **Prometheus** configuration as follows.
+
+
+```yaml
+rule_files:
+- "rules.yml"
+```
+
+This makes the configuration, in its entirety, as follows.
+
+```yaml
+rule_files:
+- "rules.yml"
+
+scrape_configs:
+- job_name: prometheus
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - prometheus:9090
+
+- job_name: alertmanager
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - alertmanager:9093
+
+- job_name: node-exporter
+  scrape_interval: 30s
+  static_configs:
+  - targets:
+    - node-exporter:9100
+```
+
+At this point, make sure everything is mounted into the container properly and
+rerun your **Prometheus**.
+
+
+## Prometheus UI {#prometheus-ui}
+
+Congratulations if you've made it this far. If you visit <http://localhost/> at
+this stage you should get to Prometheus where you can query your metrics.
+
+{{< figure src="/ox-hugo/01-prometheus-overview.png" caption="Figure 1: Prometheus overview" target="_blank" link="/ox-hugo/01-prometheus-overview.png" >}}
+
+You can get all sorts of information under the _status_ drop-down menu.
+
+{{< figure src="/ox-hugo/02-prometheus-status-drop-down-menu.png" caption="Figure 2: Prometheus Status drop-down menu" target="_blank" link="/ox-hugo/02-prometheus-status-drop-down-menu.png" >}}
+
+
+## Conclusion {#conclusion}
+
+As you can see, deploying **Prometheus** is not too hard. If you're running
+_Kubernetes_, make sure you use the operator. It will make your life a lot
+easier in all sorts of ways.
+
+Take your time to familiarise yourself with **Prometheus** and consult the
+documentation as much as possible. It is well written and in most cases your
+best friend. Figure out different ways to create rules for recording and
+alerting. Most people at this stage deploy **Grafana** to start visualizing their
+metrics. Well... Not in this blog post we ain't!
+
+I hope you enjoy playing around with **Prometheus**. Until the next post.
diff --git a/static/ox-hugo/01-prometheus-overview.png b/static/ox-hugo/01-prometheus-overview.png
new file mode 100644
index 0000000..4d50390
--- /dev/null
+++ b/static/ox-hugo/01-prometheus-overview.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:14ab6959e9fd5aee73e2da62d1cb7a2d58b41a4ff9d547052aea86b83135f21c
+size 103761
diff --git a/static/ox-hugo/02-prometheus-status-drop-down-menu.png b/static/ox-hugo/02-prometheus-status-drop-down-menu.png
new file mode 100644
index 0000000..111d72f
--- /dev/null
+++ b/static/ox-hugo/02-prometheus-status-drop-down-menu.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f3cc4b18c99b48b7159ff9606d731aadccc6faa0e700ad1d0d2365376bb9577
+size 25542