builder_builds_triggered_total 0
# HELP engine_daemon_container_actions_seconds The number of seconds it takes to process each container action
# TYPE engine_daemon_container_actions_seconds histogram
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="changes",le="0.1"} 1
For the Docker daemon to apply the parameters, it must be restarted. The restart stops all containers, and when the daemon starts again the containers are brought back up according to their restart policy:
essh@kubernetes-master:~$ sudo chmod a+w /etc/docker/daemon.json
essh@kubernetes-master:~$ echo '{"metrics-addr": "127.0.0.1:9323", "experimental": true}' | jq -M -f /dev/null > /etc/docker/daemon.json
essh@kubernetes-master:~$ cat /etc/docker/daemon.json
{
  "metrics-addr": "127.0.0.1:9323",
  "experimental": true
}
essh@kubernetes-master:~$ systemctl restart docker
So far Prometheus only collects metrics from different sources on the same server. To collect metrics from different nodes and see the aggregated result, we need to run a metrics-collecting agent on each node:
essh@kubernetes-master:~$ docker run -d \
  -v "/proc:/host/proc" \
  -v "/sys:/host/sys" \
  -v "/:/rootfs" \
  --net="host" \
  --name=explorer \
  quay.io/prometheus/node-exporter:v0.13.0 \
  -collector.procfs /host/proc \
  -collector.sysfs /host/sys \
  -collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
1faf800c878447e6110f26aa3c61718f5e7276f93023ab4ed5bc1e782bf39d56
and register the address of the node for Prometheus to poll; for now everything is local, at localhost:9100. Now let's tell Prometheus to poll the agent and Docker:
essh@kubernetes-master:~$ mkdir prometheus && cd $_
essh@kubernetes-master:~/prometheus$ cat << EOF > ./prometheus.yml
global:
  scrape_interval: 1s
  evaluation_interval: 1s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090', '127.0.0.1:9100', '127.0.0.1:9323']
        labels:
          group: 'prometheus'
EOF
essh@kubernetes-master:~/prometheus$ docker rm -f prometheus
prometheus
essh@kubernetes-master:~/prometheus$ docker run \
  -d \
  --net=host \
  --restart always \
  --name prometheus \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
7dd991397d43597ded6be388f73583386dab3d527f5278b7e16403e7ea633eef
essh@kubernetes-master:~/prometheus$ docker ps \
  -f name=prometheus
CONTAINER ID   IMAGE             COMMAND                  CREATED          STATUS          PORTS   NAMES
7dd991397d43   prom/prometheus   "/bin/prometheus --c…"   53 seconds ago   Up 53 seconds           prometheus
1702 host metrics are now available:
essh@kubernetes-master:~/prometheus$ curl http://localhost:9100/metrics | grep -v '#' | wc -l
1702
Among all this variety it is difficult to find the metrics needed for everyday tasks, for example the amount of memory in use, node_memory_Active. There are metric aggregators for this:
http://localhost:9090/consoles/node.html
http://localhost:9090/consoles/node-cpu.html
But it is better to use Grafana. Let's install it as well:
essh@kubernetes-master:~/prometheus$ docker run \
  -d \
  --name=grafana \
  --net=host \
  grafana/grafana
Unable to find image 'grafana/grafana:latest' locally
latest: Pulling from grafana/grafana
9d48c3bd43c5: Already exists
df58635243b1: Pull complete
09b2e1de003c: Pull complete
f21b6d64aaf0: Pull complete
719d3f6b4656: Pull complete
d18fca935678: Pull complete
7c7f1ccbce63: Pull complete
Digest: sha256:a10521576058f40427306fcb5be48138c77ea7c55ede24327381211e653f478a
Status: Downloaded newer image for grafana/grafana:latest
6f9ca05c7efb2f5cd8437ddcb4c708515707dbed12eaa417c2dca111d7cb17dc
essh@kubernetes-master:~/prometheus$ firefox localhost:3000
In Grafana, the initial login is admin and the password is also admin; after signing in we are prompted to change the password. Next comes the configuration.
First, we are prompted to select a data source. Select Prometheus, enter localhost:9090, set the access mode to Browser rather than Server (that is, over the network), and indicate that basic authentication is used. That is all: click Save and Test, and Prometheus is connected.
Clearly, the administrator login and password should not be handed out to everyone. Instead, create users or integrate Grafana with an external user directory such as Microsoft Active Directory.
On the Dashboards tab I activate all three preconfigured dashboards. From the New Dashboard list in the top menu, select the Prometheus 2.0 Stats dashboard. But there is no data:
I click the "+" menu item and select "Dashboard", and Grafana offers to create a dashboard. A dashboard can contain several widgets, for example charts, that can be positioned and customized, so click the add chart button and select the chart type. On the chart itself, choose a size and click edit; the most important thing here is the choice of the displayed metric. Choosing Prometheus
A complete setup is available:
essh@kubernetes-master:~/prometheus$ wget \
  https://raw.githubusercontent.com/grafana/grafana/master/devenv/docker/ha_test/docker-compose.yaml
--2019-10-30 07:29:52-- https://raw.githubusercontent.com/grafana/grafana/master/devenv/docker/ha_test/docker-compose.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.112.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.112.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2996 (2.9K) [text/plain]
Saving to: 'docker-compose.yaml'
docker-compose.yaml 100%[=========>] 2.93K --.-KB/s in 0s
2019-10-30 07:29:52 (23.4 MB/s) - 'docker-compose.yaml' saved [2996/2996]
Obtaining application metrics
Up to this point, we have looked at the case where Prometheus polls a standard metrics collector and receives standard metrics. Now let's create an application and publish our own metrics. We will take a Node.js server and write an application for it. To do this, let's create a Node.js project:
vagrant@ubuntu:~$ mkdir nodejs && cd $_
vagrant@ubuntu:~/nodejs$ npm init
This utility will walk you through creating a package.json file.
It only covers the most common items, and tries to guess sensible defaults.
See `npm help json` for definitive documentation on these fields
and exactly what they do.
Use `npm install <pkg> --save` afterwards to install a package and
save it as a dependency in the package.json file.
name: (nodejs)
version: (1.0.0)
description:
entry point: (index.js)
test command:
git repository:
keywords:
author: ESSch
license: (ISC)
About to write to /home/vagrant/nodejs/package.json:
{
  "name": "nodejs",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "ESSch",
  "license": "ISC"
}
Is this ok? (yes) yes
First, let's create a web server. I'll use the Express library to create it:
vagrant@ubuntu:~/nodejs$ npm install Express --save
npm WARN deprecated Express@3.0.1: Package unsupported. Please use the express package (all lowercase) instead.
nodejs@1.0.0 /home/vagrant/nodejs
└── Express@3.0.1
npm WARN nodejs@1.0.0 No description
npm WARN nodejs@1.0.0 No repository field.
vagrant@ubuntu:~/nodejs$ cat << EOF > index.js
const express = require('express');
const app = express();
app.get('/healt', function (req, res) {
  res.send({status: "Healt"});
});
app.listen(9999, () => {
  console.log({status: "start"});
});
EOF
vagrant@ubuntu:~/nodejs$ node index.js &
[1] 18963
vagrant@ubuntu:~/nodejs$ { status: 'start' }
vagrant@ubuntu:~/nodejs$ curl localhost:9999/healt
{"status":"Healt"}
Our server is ready to work with Prometheus. We need to configure Prometheus for it.
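For Prometheus to poll an application, the application has to return its metrics as plain text in the Prometheus exposition format. The following is only a minimal sketch of that format for a single counter, written without any client library. The names renderCounter, countRequest, metricsBody and the metric app_requests_total are hypothetical and do not come from the original project.

```javascript
// Hypothetical helper: renders one counter in the Prometheus text exposition format.
function renderCounter(name, help, value) {
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} counter`,
    `${name} ${value}`,
    '', // the exposition ends with a trailing newline
  ].join('\n');
}

// Hypothetical in-process counter of handled requests.
let requestsTotal = 0;

function countRequest() {
  requestsTotal += 1;
}

// Body that a /metrics endpoint would return.
function metricsBody() {
  return renderCounter(
    'app_requests_total',
    'Total number of handled requests',
    requestsTotal
  );
}

// Assumed wiring into an Express application (not part of the original code):
//   app.use((req, res, next) => { countRequest(); next(); });
//   app.get('/metrics', (req, res) => {
//     res.set('Content-Type', 'text/plain');
//     res.send(metricsBody());
//   });
```

With such an endpoint in place, the address of the application (here it would be 127.0.0.1:9999) could be added to the targets list in prometheus.yml in the same way as the other targets.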
The Prometheus scaling problem arises when the data does not fit on one server, or more precisely, when one server cannot keep up with writing the data or when processing the data on one server is too slow. Thanos solves this problem without requiring a federation setup: it gives the user an interface and an API and relays requests to the Prometheus instances. The user gets a web interface similar to that of Prometheus. Thanos itself interacts with agents installed next to the instances as sidecars, as Istio does. Both Thanos and the agents are available as containers and as a Helm chart. For example, an agent can be started as a container pointed at Prometheus, while Prometheus is set up with a config file and then restarted.
docker run --rm quay.io/thanos/thanos:v0.7.0 --help
docker run -d --net=host --rm \
  -v $(pwd)/prometheus0_eu1.yml:/etc/prometheus/prometheus.yml \
  --name prometheus-0-sidecar-eu1 \
  -u root \
  quay.io/thanos/thanos:v0.7.0 \
  sidecar \
  --http-address 0.0.0.0:19090 \
  --grpc-address 0.0.0.0:19190 \
  --reloader.config-file /etc/prometheus/prometheus.yml \
  --prometheus.url http://127.0.0.1:9090
Notifications are an important part of monitoring. They consist of triggers that fire and a provider that delivers the message. A trigger is written in PromQL, as a rule in Prometheus, as a condition on a metric. When the trigger fires (the metric condition holds), Prometheus signals the provider to send a notification. The standard provider is Alertmanager, which can send messages to various receivers such as email and Slack.
For example, the metric "up", which takes the value 0 or 1, can be used to send a message if the server has been down for more than 1 minute. A rule is written for this:
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
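The role of the "for" clause in the rule above can be illustrated with a small sketch: the condition first puts the alert into a pending state, and the alert only fires once the condition has held continuously for the given duration. This is a simplified model and not Prometheus code. The function alertState and the sample layout are hypothetical.

```javascript
// Returns 'inactive', 'pending' or 'firing' after processing all samples.
// samples: [{ t: secondsSinceStart, up: 0 | 1 }], ordered by time (assumed layout).
function alertState(samples, forSeconds) {
  let activeSince = null; // time at which `up == 0` first became true
  let state = 'inactive';
  for (const { t, up } of samples) {
    if (up === 0) {
      if (activeSince === null) activeSince = t;
      // Fires only after the condition has held for the whole `for` duration.
      state = t - activeSince >= forSeconds ? 'firing' : 'pending';
    } else {
      // Any healthy sample resets the timer.
      activeSince = null;
      state = 'inactive';
    }
  }
  return state;
}

// Example: the instance has been down for a full minute.
const state = alertState(
  [{ t: 0, up: 0 }, { t: 30, up: 0 }, { t: 60, up: 0 }],
  60
);
console.log(state);
```

A short outage that recovers before the minute has passed never leaves the pending state, which is what protects against notifications on brief glitches.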
When the metric has been equal to 0 for more than 1 minute, the trigger fires and Prometheus sends a request to Alertmanager. Alertmanager defines what to do with the event. We can specify that when the InstanceDown event is received, a message must be sent by email. Alertmanager is configured for this as follows:
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'youraddress@example.org'
route:
  receiver: example-email
receivers:
  - name: example-email
    email_configs:
      - to: 'youraddress@example.org'
Alertmanager delivers mail through a mail service available on this machine, so such a service must be installed. Take the Simple Mail Transfer Protocol (SMTP), for example. To test it, let's install a console mail server, sendmail, alongside Alertmanager.
Fast and clear analysis of system logs
The open-source full-text search engine Lucene is used for fast searching in logs. Two low-level products are built on it, Solr and Elasticsearch, which are quite similar in capabilities but differ in usability and license. Many popular bundles are built on them, for example delivery sets based on Elasticsearch: ELK (Elasticsearch (Apache Lucene), Logstash, Kibana) and EFK (Elasticsearch, Fluentd, Kibana), as well as products such as GrayLog2. Both GrayLog2 and the bundles (ELK/EFK) are widely used because they need little configuration, and not only on test benches. For example, EFK can be deployed in a Kubernetes cluster with almost a single command:
helm install efk-stack stable/elastic-stack --set logstash.enabled=false --set fluentd.enabled=true --set fluentd-elastics
An alternative that has not yet become widespread is systems built on the previously discussed Prometheus, for example PLG: Promtail (agent), Loki (Prometheus), Grafana.
Comparison of Elasticsearch and Solr (the systems are comparable):
Elastic:
** Commercial with open source and the ability to commit (via approval);
** Supports more complex queries, more analytics, out-of-the-box support for distributed queries, a more complete RESTful JSON-based API, chaining, machine learning, SQL (paid);
*** Full-text search;
*** Real-time index;
*** Monitoring (paid);
*** Monitoring via ElasticHQ;
*** Machine learning (paid);
*** Simple indexing;
*** More data types and structures;
** Lucene engine;
** Parent-child (JOIN);
** Scalable native;
** Documentation from 2010;
Solr:
** OpenSource;
** High speed with JOIN;
*** Full-text search;
*** Real-time index;
*** Monitoring in the admin panel;
*** Machine learning through modules;
*** Input data: Word, PDF and others;
*** Requires a schema for indexing;
*** Data: nested objects;
** Lucene engine;
** JSON join;
** Scalable: SolrCloud (setup) and ZooKeeper (setup);
** Documentation since 2004.
Microservice architecture is used more and more often. Thanks to the loose coupling between components and their simplicity, it makes development, testing, and debugging easier. But the system as a whole becomes harder to analyze because it is distributed. To analyze its overall state, logs are collected in a central place and converted into a readable form. There is also a need to analyze other data, for example the NGINX access_log to collect traffic metrics, or the mail server log to detect password-guessing attempts. Take ELK as an example of such a solution. ELK is a bundle of three products: Logstash, Elasticsearch, and Kibana, the first and last of which are built around the central one and make it convenient to use. More generally, ELK is called the Elastic Stack, since the log preparation tool Logstash can be replaced by analogs such as Fluentd or Rsyslog, and the Kibana visualizer can be replaced by Grafana. For example, although Kibana provides great analysis capabilities, Grafana provides notifications when events occur and can be used together with other products, for example cAdvisor, which analyzes the state of the system and of individual containers.
The ELK products can be installed separately, downloaded as individual containers between which you need to configure communication, or run as a single container.
For Elasticsearch to work properly, the data should arrive in JSON format. If the data arrives as text (each log entry on one line, separated from the previous one by a line break), Elasticsearch can provide only full-text search, because each entry is interpreted as a single string. There are two ways to deliver logs in JSON format. The first is to configure the product being monitored to write in this format; NGINX, for example, can do this. Often this is impossible, because there is already an accumulated base of logs, and traditionally they are written as text. For such cases there is the second way: post-processing of logs from text into JSON, which Logstash handles. It is important to note that if data can be sent in a structured form right away (JSON, XML, and others), it should be. With detailed parsing, any deviation from the expected format breaks processing, and with superficial parsing we lose valuable information. Either way, parsing is the bottleneck of this system, although it can be scaled to a limited extent, down to a service or a log file. Fortunately, more and more products are starting to support structured logging; the latest versions of NGINX, for example, support logs in JSON format.
For systems that do not support this format, conversion is available through programs such as Logstash, Filebeat, and Fluentd. The first one is part of the standard Elastic Stack delivery from the vendor and can be installed in one step together with ELK in a Docker container. It supports reading data from files, the network, and standard streams both at the input and at the output, and most importantly, the native Elasticsearch protocol. Logstash monitors log files by modification date, or receives data over the network (for example via telnet) from distributed systems such as containers, and after transformation sends it to the output, usually to Elasticsearch. It is simple and comes standard with the Elastic Stack, which makes it easy to configure. But because of the Java virtual machine inside, it is heavy, and it is not very feature-rich, although it supports plugins, for example synchronization with MySQL to send new data. Filebeat provides slightly more options. Fluentd can serve as an enterprise tool for all occasions thanks to its rich functionality (reading logs, system logs, and so on), its scalability, and the ability to roll it out across Kubernetes clusters with a Helm chart and to monitor an entire data center in the standard package; more on that in the relevant section.
To manage logs, you can use Curator, which can archive old logs from Elasticsearch or delete them, making Elasticsearch work more efficiently.
Logs are collected by dedicated collectors: Logstash, Fluentd, Filebeat, or others.
Fluentd is a less demanding and simpler analogue of Logstash. It is configured in /etc/td-agent/td-agent.conf, which contains several blocks, including:
** match – contains settings for transferring received data;
** include – contains information about file types;
** system – contains system settings.
Logstash provides a much more capable configuration language. The Logstash agent daemon monitors changes in files. If the logs are not local but spread over a distributed system, Logstash is installed on each server and run in agent mode: bin/logstash agent -f /env/conf/my.conf. Since running Logstash only as an agent for shipping logs is wasteful, you can use a product from the same developers, Logstash Forwarder (formerly Lumberjack), which forwards logs over the lumberjack protocol to Logstash on the server. You can use the Packetbeat agent to track and retrieve data from MySQL (https://www.8host.com/blog/sbor-metrik-infrastruktury-s-pomoshhyu-packetbeat-i-elk-v-ubuntu-14-04/).
Logstash can also transform data in different ways:
** grok – regular expressions that extract fields from a string, most often to turn text logs into JSON;
** date – for archived logs, take the creation date from the log entry itself instead of using the current date;
** kv – for logs in key=value form;
** mutate – keep only the required fields and change the data in the fields, for example replace the "/" character with "_";
** multiline – for multi-line logs with delimiters.
For example, a log line in the format "date type number", such as "01.01.2021 INFO 1", can be decomposed from the "message" field into separate fields:
filter {
  grok {
    type => "my_log"
    match => ["message", "%{MYDATE:date} %{WORD:loglevel} %{ID:id:int}"]
  }
}
The %{ID:id:int} template takes the ID pattern class, puts the matched value into the id field, and converts the string value to the int type.
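What the grok filter above does with the line "01.01.2021 INFO 1" can be approximated with an ordinary regular expression with named groups. This is only an illustration of the idea, not of how Logstash is implemented. The regular expressions standing in for the custom MYDATE and ID patterns are assumptions, and parseLine is a hypothetical name.

```javascript
// Rough regex equivalents of the custom MYDATE and ID patterns (assumed definitions).
const LINE = /^(?<date>\d{2}\.\d{2}\.\d{4}) (?<loglevel>\w+) (?<id>\d+)$/;

// Turns one text log line into a structured object, or null when it does not match.
function parseLine(message) {
  const m = LINE.exec(message);
  if (m === null) return null; // Logstash would tag such an event with _grokparsefailure
  const { date, loglevel, id } = m.groups;
  // The ":int" suffix of the grok template corresponds to this conversion.
  return { date, loglevel, id: Number.parseInt(id, 10) };
}

const parsed = parseLine('01.01.2021 INFO 1');
console.log(JSON.stringify(parsed));
```

The sketch also shows why detailed parsing is fragile: any line that deviates from the expected layout produces no fields at all.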
In the output block we can send data to the console with the stdout block, to a file with file, over HTTP through the JSON REST API with elasticsearch, or by mail with email. You can also set conditions on the fields obtained in the filter block. For example:
output {
  if [type] == "Info" {
    elasticsearch {
      host => "localhost"
      index => "log-%{+YYYY.MM.dd}"
    }
  }
}
Here the Elasticsearch index (a database, by analogy with SQL) changes every day. A new index does not have to be created explicitly; this is how NoSQL databases work, since there is no strict requirement to describe the structure (properties and types). It is still recommended to describe it, otherwise every field that is not declared as a number will hold string values. Elasticsearch data is displayed with Kibana, a web UI written in AngularJS. To show a timeline in its charts, at least one field must be described with the date type, and for aggregate functions a numeric one, integer or floating point. Also, when new fields are added, indexing and displaying them requires reindexing the whole index, so the most complete description of the structure helps avoid the very time-consuming reindexing operation.
The index is split by day to speed up Elasticsearch, and in Kibana you can select several indices by pattern, here log-*. This also removes the limitation of one million documents per index.
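How an event ends up in a per-day index can be sketched by building the index name from the event timestamp, which is what a date pattern such as YYYY.MM.dd in the index setting expresses. The function dailyIndex is hypothetical, and the sketch assumes the date is taken in UTC.

```javascript
// Builds a per-day index name such as "log-2021.01.01" from an event timestamp.
// Assumption: the day boundary is computed in UTC.
function dailyIndex(prefix, timestamp) {
  const d = new Date(timestamp);
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(d.getUTCDate()).padStart(2, '0');
  return `${prefix}-${yyyy}.${mm}.${dd}`;
}

// Two events one second apart land in different indices.
const before = dailyIndex('log', '2021-01-01T23:59:59Z');
const after = dailyIndex('log', '2021-01-02T00:00:00Z');
console.log(before, after);
```

Because every name shares the same prefix, a single pattern such as log-* selects all of the daily indices at once.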
Consider a more detailed configuration of the Logstash output plugin:
output {
  if [type] == "Info" {
    elasticsearch {
      cluster => "elasticsearch"
      action => "create"
      hosts => ["localhost:9200"]
      index => "log-%{+YYYY.MM.dd}"
      document_type => ....
      document_id => "%{id}"
    }
  }
}
Interaction with Elasticsearch goes through the JSON REST API, for which there are drivers for most modern languages. But to avoid writing code, we will use the Logstash utility, which can also convert text data to JSON based on regular expressions. There are also predefined templates, similar to character classes in regular expressions, such as %{IP:client} and others, which can be viewed at https://github.com/elastic/logstash/tree/v1.1.9/patterns. For standard services with standard settings there are many ready-made configs on the Internet, for example, for NGINX: https://github.com/zooniverse/static/blob/master/logstash-Nginx.conf. This is described in more detail in the article https://habr.com/post/165059/.
Elasticsearch is a NoSQL database, so you do not have to specify a format (a set of fields and their types). It still needs one for searching, so it infers the format itself, and every format change triggers reindexing, during which work is impossible. To maintain a unified structure, the Serilog logger (.NET) has an EventType field in which a set of fields and their types can be encoded; for other loggers you will have to implement this separately. To analyze the logs of a microservice application, it is important to assign an ID to each request being executed, that is, a request ID that stays unchanged and is passed from microservice to microservice, so that the entire path of the request can be traced.
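The idea of a request ID that travels with the request and appears in every log entry can be sketched as follows. The header name x-request-id, the field names, and the functions requestIdFrom and logLine are assumptions chosen for illustration; they are not prescribed by Elasticsearch or by the original text.

```javascript
const { randomUUID } = require('crypto');

// Reuse the incoming request ID, or create one at the edge of the system.
// Assumption: the ID is carried in the "x-request-id" HTTP header.
function requestIdFrom(headers) {
  return headers['x-request-id'] || randomUUID();
}

// One log event per line, already in JSON, so no text parsing is needed later.
function logLine(service, requestId, message, now = new Date()) {
  return JSON.stringify({
    '@timestamp': now.toISOString(),
    service,
    requestId,
    message,
  });
}

// Two services log the same request under the same ID.
const id = requestIdFrom({ 'x-request-id': 'abc' });
console.log(logLine('gateway', id, 'request received'));
console.log(logLine('orders', id, 'order created'));
```

Each service would forward the same header on its outgoing calls, so a search by requestId returns the whole path of one request.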
Install Elasticsearch (https://habr.com/post/280488/) and check that curl -X GET localhost:9200 works:
sudo sysctl -w vm.max_map_count=262144
$ curl 'localhost:9200/_cat/indices?v'
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   graylog_0 h2NICPMTQlqQRZhfkvsXRw   4   0          0            0        1kb            1kb
green  open   .kibana_1 iMJl7vyOTuu1eG8DlWl1OQ   1   0          3            0     11.9kb         11.9kb
yellow open   indexname le87KQZwT22lFll8LSRdjw   5   1          1            0      4.5kb          4.5kb
yellow open   db        i6I2DmplQ7O40AUzyA-a6A   5   1          0            0      1.2kb          1.2kb
Create an entry in the blog database and the post table: curl -X PUT "$ES_URL/blog/post/1?pretty" -d '
ElasticSearch search engine
In the previous section, we looked at the ELK stack, which consists of Elasticsearch, Logstash, and Kibana. The full set is often extended with Filebeat, an extension tailored to work with Logstash on text logs. Although Logstash does its job quickly, it is not used without need: logs that are already in JSON format are sent through the bulk upload API directly to Elasticsearch.