pod/liveness created
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness 1/1 Running 2 2m53s
controlplane $ kubectl describe pod/liveness | tail -n 15
SecretName: default-token-9v5mb
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m44s default-scheduler Successfully assigned default/liveness to node01
Normal Pulled 68s (x3 over 3m35s) kubelet, node01 Container image "alpine:3.5" already present on machine
Normal Created 68s (x3 over 3m35s) kubelet, node01 Created container healtcheck
Normal Started 68s (x3 over 3m34s) kubelet, node01 Started container healtcheck
Warning Unhealthy 23s (x9 over 3m3s) kubelet, node01 Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 23s (x3 over 2m53s) kubelet, node01 Container healtcheck failed liveness probe, will be restarted
The cluster events also show that when cat /tmp/healthy fails, the container is restarted:
controlplane $ kubectl get events
controlplane $ kubectl get events | grep pod/liveness
13m Normal Scheduled pod/liveness Successfully assigned default/liveness to node01
13m Normal Pulling pod/liveness Pulling image "alpine:3.5"
13m Normal Pulled pod/liveness Successfully pulled image "alpine:3.5"
10m Normal Created pod/liveness Created container healtcheck
10m Normal Started pod/liveness Started container healtcheck
10m Warning Unhealthy pod/liveness Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
10m Normal Killing pod/liveness Container healtcheck failed liveness probe, will be restarted
10m Normal Pulled pod/liveness Container image "alpine:3.5" already present on machine
8m32s Normal Scheduled pod/liveness Successfully assigned default/liveness to node01
4m41s Normal Pulled pod/liveness Container image "alpine:3.5" already present on machine
4m41s Normal Created pod/liveness Created container healtcheck
4m41s Normal Started pod/liveness Started container healtcheck
2m51s Warning Unhealthy pod/liveness Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
5m11s Normal Killing pod/liveness Container healtcheck failed liveness probe, will be restarted
Let's take a look at the readiness probe. Passing this probe indicates that the application is ready to accept requests and that the Service can route traffic to it:
controlplane $ cat << EOF > readiness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness
spec:
  replicas: 2
  selector:
    matchLabels:
      app: readiness
  template:
    metadata:
      labels:
        app: readiness
    spec:
      containers:
      - name: readiness
        image: python
        args:
        - /bin/sh
        - -c
        - sleep 15 && (hostname > health) && python -m http.server 9000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 1
          periodSeconds: 5
EOF
controlplane $ kubectl create -f readiness.yaml
deployment.apps/readiness created
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-fd8d996dd-cfsdb 0/1 ContainerCreating 0 7s
readiness-fd8d996dd-sj8pl 0/1 ContainerCreating 0 7s
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-fd8d996dd-cfsdb 0/1 Running 0 6m29s
readiness-fd8d996dd-sj8pl 0/1 Running 0 6m29s
controlplane $ kubectl exec -it readiness-fd8d996dd-cfsdb -- curl localhost:9000/health
readiness-fd8d996dd-cfsdb
The containers respond. Let's route traffic to them:
controlplane $ kubectl expose deploy readiness \
--type=LoadBalancer \
--name=readiness \
--port=9000 \
--target-port=9000
service/readiness exposed
controlplane $ kubectl get svc readiness
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
readiness LoadBalancer 10.98.36.51
controlplane $ curl localhost:9000
controlplane $ for i in {1..5}; do curl $IP:9000/health; done
1
2
3
4
5
Each container starts with a delay. Let's check what happens when one of the containers is restarted: will traffic be routed to it?
controlplane $ kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-5dd64c6c79-9vq62 0/1 CrashLoopBackOff 6 15m
readiness-5dd64c6c79-sblvl 0/1 CrashLoopBackOff 6 15m
kubectl exec -it .... -c .... -- bash -c "rm -f health"
controlplane $ for i in {1..5}; do echo $i; done
1
2
3
4
5
controlplane $ kubectl delete deploy readiness
deployment.apps "readiness" deleted
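The effect of readiness on routing can be sketched in a few lines of Python. This is a simplified model, not how kube-proxy is implemented: the pod names are hypothetical, and round-robin selection is an assumption made only to keep the output predictable. The point it illustrates is that a pod whose readiness probe fails is left out of the set of endpoints that receive traffic.

```python
from itertools import cycle


def ready_endpoints(pods):
    """Return the names of pods whose readiness probe currently passes."""
    return [name for name, ready in pods.items() if ready]


def route(pods, requests):
    """Distribute `requests` requests round-robin over ready pods only."""
    targets = ready_endpoints(pods)
    if not targets:
        return []  # no ready pod: there is nowhere to send traffic
    picker = cycle(targets)
    return [next(picker) for _ in range(requests)]


# Hypothetical pod names; True means the readiness probe passes.
pods = {"readiness-a": True, "readiness-b": False}
print(route(pods, 3))  # only readiness-a receives traffic

pods["readiness-b"] = True  # the second pod becomes ready
print(route(pods, 4))  # traffic is now shared between both pods
```

A pod that is running but not ready stays in the cluster and is not restarted; it simply receives no requests until its probe passes again.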
Consider a situation where a container becomes temporarily unavailable:
(hostname > health) && (python -m http.server 9000 &) && sleep 60 && rm health && sleep 60 && (hostname > health) && sleep 6000
/bin/sh -c 'sleep 60; python -m http.server 9000 & PID=$!; sleep 60; kill -9 $PID'
By default, a container enters the Running state once the scripts in the Dockerfile have completed and the process specified in the CMD instruction (or its override in the command section of the configuration) has been launched. In practice, however, a database still needs to come up (read its data, load it into RAM, and so on), and this can take a long time. During that time it does not respond to connections, so other applications, although already started and ready to connect, cannot do so. Likewise, a container moves to the Failed state only when its main process crashes. A database may endlessly try to execute an incorrect query and be unable to respond to incoming requests, yet the container will not be restarted, because the database daemon (server) has not formally crashed. For these cases two probes were introduced, readinessProbe and livenessProbe, which use a custom script or an HTTP request to check whether the container has reached a working state or has failed.
esschtolts@cloudshell:~/bitrix (essch)$ cat health_check.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: healtcheck
  name: healtcheck
spec:
  containers:
  - name: healtcheck
    image: alpine:3.5
    args:
    - /bin/sh
    - -c
    - sleep 12; touch /tmp/healthy; sleep 10; rm -rf /tmp/healthy; sleep 60
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 15
      periodSeconds: 5
The container starts after about 3 seconds, and from the 5-second mark the readiness check runs every 5 seconds. At about 15 seconds of container life, the readiness check cat /tmp/healthy succeeds for the first time. The livenessProbe check also begins at this point. After the file is removed (at 22 seconds), its checks start failing from about the 25-second mark, and once the failure threshold is reached (three consecutive failures by default) the container is recognized as not working and is restarted.
esschtolts@cloudshell:~/bitrix (essch)$ kubectl create -f health_check.yaml && sleep 4 && kubectl get pods && sleep 10 && kubectl get pods && sleep 10 && kubectl get pods
pod "liveness-exec" created
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 5s
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 15s
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 0 26s
esschtolts@cloudshell:~/bitrix (essch)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 53s
esschtolts@cloudshell:~/bitrix (essch)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 0/1 Running 0 1m
esschtolts@cloudshell:~/bitrix (essch)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec 1/1 Running 1 1m
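The timing of the probes in health_check.yaml can be approximated with a small simulation. It is only a sketch under stated assumptions: probes are assumed to fire exactly on schedule (the real kubelet adds jitter and container start-up time), and failure_threshold=3 is the Kubernetes default for failureThreshold, which the manifest does not set explicitly. The function names are illustrative.

```python
def file_exists(t, created_at=12, removed_at=22):
    """/tmp/healthy exists from the `touch` (12 s) until the `rm` (22 s)."""
    return created_at <= t < removed_at


def probe_times(initial_delay, period, until):
    """Moments at which a probe fires, assuming exact timing."""
    return list(range(initial_delay, until + 1, period))


def first_ready(initial_delay=5, period=5, until=60):
    """First moment at which the readiness probe succeeds, or None."""
    for t in probe_times(initial_delay, period, until):
        if file_exists(t):
            return t
    return None


def restart_time(initial_delay=15, period=5, failure_threshold=3, until=60):
    """Moment at which the liveness probe has failed `failure_threshold` times in a row."""
    failures = 0
    for t in probe_times(initial_delay, period, until):
        failures = 0 if file_exists(t) else failures + 1
        if failures >= failure_threshold:
            return t
    return None


print(first_ready())    # 15
print(restart_time())   # 35
```

Under these assumptions the pod becomes ready at about 15 seconds, the first liveness failure occurs at 25 seconds, and the restart is triggered at about 35 seconds, which is roughly in line with the RESTARTS column changing to 1 around the one-minute mark in the transcript.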
Kubernetes also provides a startupProbe, which defers the moment at which the readiness and liveness probes start working. This is useful if, for example, the application takes a long time to load. Let's consider it in more detail, using www.katacoda.com/courses/Kubernetes/playground and Python for the experiment. Probes can be of type TCP, EXEC, or HTTP. HTTP is preferable, because EXEC spawns processes and can leave them behind as "zombie processes". In addition, if the server communicates over HTTP, then that is the interface the check should go through (https://www.katacoda.com/courses/kubernetes/playground):
controlplane $ kubectl version --short
Client Version: v1.18.0
Server Version: v1.18.0
controlplane $ cat << EOF > job.yaml
apiVersion: v1
kind: Pod
metadata:
  name: healt
spec:
  containers:
  - name: python
    image: python
    command: ['sh', '-c', 'sleep 60 && (echo "work" > health) && sleep 60 && python -m http.server 9000']
    readinessProbe:
      httpGet:
        path: /health
        port: 9000
      initialDelaySeconds: 3
      periodSeconds: 3
    livenessProbe:
      httpGet:
        path: /health
        port: 9000
      initialDelaySeconds: 3
      periodSeconds: 3
    startupProbe:
      exec:
        command:
        - cat
        - /health
      initialDelaySeconds: 3
      periodSeconds: 3
  restartPolicy: OnFailure
EOF
controlplane $ kubectl create -f job.yaml
pod/healt
controlplane $ kubectl get pods # not loaded yet
NAME READY STATUS RESTARTS AGE
healt 0/1 Running 0 11s
controlplane $ sleep 30 && kubectl get pods # not loaded yet, but the image has already been pulled
NAME READY STATUS RESTARTS AGE
healt 0/1 Running 0 51s
controlplane $ sleep 60 && kubectl get pods
NAME READY STATUS RESTARTS AGE
healt 0/1 Running 1 116s
controlplane $ kubectl delete -f job.yaml
pod "healt" deleted
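How the three probes relate to each other can be summarized in a short Python sketch. It models two documented rules only: while a configured startupProbe has not yet succeeded, the liveness and readiness probes are not run, and an httpGet probe treats any status code from 200 up to, but not including, 400 as success. The function names are illustrative and not part of any Kubernetes API.

```python
def http_probe_ok(status_code):
    """An httpGet probe succeeds for any status in the range 200-399."""
    return 200 <= status_code < 400


def active_probes(startup_configured, startup_succeeded):
    """Return the probes that are being run at this moment."""
    if startup_configured and not startup_succeeded:
        # liveness and readiness are held back until startup succeeds
        return ["startupProbe"]
    return ["livenessProbe", "readinessProbe"]


print(active_probes(True, False))   # ['startupProbe']
print(active_probes(True, True))    # ['livenessProbe', 'readinessProbe']
print(http_probe_ok(200), http_probe_ok(404))  # True False
```

This is why a slow-starting application is not killed by its liveness probe while it is still loading: the liveness check simply does not run until the startup check has passed.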
Self-diagnosis of a microservice application
Let's consider how probes work using the bookinfo microservice application, which ships with Istio as a sample: https://github.com/istio/istio/tree/master/samples/bookinfo. The demo runs at www.katacoda.com/courses/istio/deploy-istio-on-kubernetes. After deployment, it will be available
Infrastructure management
Although Kubernetes has its own graphical interface, the Dashboard UI, it offers nothing beyond monitoring and simple actions. OpenShift offers more, combining graphical and text-based creation. Google does not provide a full-fledged product with a complete ecosystem around Kubernetes itself, but it does offer a cloud solution: Google Cloud Platform. There are, however, third-party solutions such as OpenShift and Rancher that let you use Kubernetes fully through a graphical interface on your own hardware. If desired, of course, you can synchronize with the cloud.
These products are often not API-compatible with one another; the only known exception is Mail.Cloud, which claims support for OpenShift. There is, however, a third-party solution that implements the infrastructure-as-code approach and supports the APIs of most well-known ecosystems: Terraform. Like Kubernetes, it applies the concept of infrastructure as code, though not to containers but to virtual machines (servers, networks, disks).
The infrastructure-as-code principle implies a declarative configuration, that is, a description of the desired result without explicitly specifying the actions that lead to it. When the configuration is applied (kubectl apply -f name_config.yml in Kubernetes, terraform apply in HashiCorp Terraform), the system is brought into line with the configuration files. When the configuration or the infrastructure changes, the parts of the infrastructure that conflict with the declaration are brought back into line with it, and the system itself decides how to achieve this. The behavior can differ: for example, when the metadata of a Pod changes, the Pod is modified in place, but when the image changes, the Pod is deleted and created anew.
Previously we created the server infrastructure for containers imperatively, using the gcloud command of the Google Cloud Platform (GCP) public cloud. Now we will consider how to create a similar configuration declaratively, following the infrastructure-as-code pattern, with the universal Terraform tool, which supports GCP.
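The declarative idea can be illustrated with a minimal reconciliation loop in Python. This is a conceptual sketch, not the algorithm used by Kubernetes or Terraform: the resource names and images are hypothetical, and state is modelled as plain dictionaries. The declared state is compared with the actual one, the required actions are derived from the difference, and applying them twice shows that nothing remains to be done the second time.

```python
def plan(desired, actual):
    """Compare declared resources with existing ones and list the actions needed."""
    actions = []
    for name, config in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired, actual):
    """Bring `actual` into line with `desired`; return the actions performed."""
    actions = plan(desired, actual)
    for action, name in actions:
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actions


# Hypothetical resources used only for illustration.
desired = {"web": {"image": "nginx:1.25"}, "db": {"image": "postgres:16"}}
actual = {"web": {"image": "nginx:1.24"}, "cache": {"image": "redis:7"}}

print(apply(desired, actual))  # [('update', 'web'), ('create', 'db'), ('delete', 'cache')]
print(apply(desired, actual))  # [] : the second run finds nothing to change
```

The configuration never says "create" or "delete"; those actions are computed from the gap between the declaration and reality.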
Terraform did not appear out of nowhere; it continues a long history of software products for configuring and managing server infrastructure. I will list them in order of appearance and succession:
** CFN;
** Puppet;
** Chef;
** Ansible;
** Cloud AWS API, Kubernetes API;
* IaC: Terraform does not depend on the type of infrastructure (it supports more than 120 providers, and not only clouds), in contrast to the vendor-specific counterparts that support only their own platform: CloudFormation for Amazon Web Services, Azure Resource Manager for Microsoft Azure, and Google Cloud Deployment Manager for Google Cloud.
CloudFormation is built by Amazon and intended only for AWS; it is also fully integrated into the CI/CD of its infrastructure and hosted on AWS S3, which makes Git versioning difficult. We will consider the platform-independent Terraform: the syntax of the basic functionality is the same everywhere, and provider-specific functionality is connected through Provider entities (https://www.terraform.io/docs/providers/index.html). It supports a huge number of providers, including, of course, AWS and GCE. Terraform, like most HashiCorp products, is written in Go and ships as a single executable binary that requires no installation; you just need to download it into a directory on Linux:
(agile-aleph-203917) $ wget https://releases.hashicorp.com/terraform/0.11.13/terraform_0.11.13_linux_amd64.zip
(agile-aleph-203917) $ unzip terraform_0.11.13_linux_amd64.zip -d .
(agile-aleph-203917) $ rm -f terraform_0.11.13_linux_amd64.zip
(agile-aleph-203917) $ ./terraform version
Terraform v0.11.13
Terraform supports splitting the configuration into modules, which you can write yourself or take ready-made (https://registry.terraform.io/browse?offset=27&provider=google). To orchestrate modules and track changes in their dependencies, you can use Terragrunt (https://davidbegin.github.io/terragrunt/), for example:
terragrunt = {
  terraform {
    source = "terraform-aws-modules/…"
  }
  dependencies {
    path = ["../network"]
  }
}
name = "…"
ami = "…"
instance_type = "t3.large"
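The idea behind declaring dependencies between modules can be shown with a topological sort: a module is applied only after everything it depends on. The sketch below uses Python's standard graphlib module (Python 3.9+). The module names are hypothetical, and this illustrates the ordering principle only, not Terragrunt's actual implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical modules: each key depends on the modules in its set.
dependencies = {
    "network": set(),
    "database": {"network"},
    "app": {"network", "database"},
}


def apply_order(deps):
    """Return an order in which every module comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


print(apply_order(dependencies))  # ['network', 'database', 'app']
```

Destroying the infrastructure would walk the same graph in the opposite direction, so that nothing is removed while another module still depends on it.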
The semantics of the configuration are uniform across providers (AWS, GCE, Yandex.Cloud, and many others), which makes it possible to build infrastructure that spans providers: for example, constantly loaded services are placed on your own hardware to save money, while variably loaded ones (for example, during a promotional period) run in public clouds. Because management is declarative and can be described in files (IaC, infrastructure as code), infrastructure creation can be added to the CI/CD pipeline (development, testing, delivery, all automated and under version control). Without CI/CD, locking of the configuration file is supported to prevent concurrent editing when working in a team. The infrastructure is not created by a script but is brought into conformity with the configuration, which is declarative and cannot contain logic, although it is possible to embed Bash scripts in it and to use conditions (the ternary operator) for different environments.
Terraform reads all files in the current directory that have a .tf extension, in HashiCorp Configuration Language (HCL) format, or a .tf.json extension, in JSON format. The configuration is often split across several files, at least two: the first contains the configuration itself, and the second holds private data in variables.
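Which files would be picked up from a directory can be checked with a short Python helper. It is a sketch of the selection rule described above, not Terraform's own loader: the helper name and the file names in the demo are hypothetical, and the demo works in a temporary directory so that it does not touch real configuration.

```python
import tempfile
from pathlib import Path


def config_files(directory):
    """List files in `directory` ending in .tf or .tf.json.

    Only the directory itself is scanned; subdirectories are not searched.
    """
    suffixes = (".tf", ".tf.json")
    return sorted(
        p.name for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )


# Hypothetical layout, created in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("main.tf", "variables.tf", "extra.tf.json", "notes.txt"):
        Path(tmp, name).write_text("")
    Path(tmp, "modules").mkdir()
    Path(tmp, "modules", "nested.tf").write_text("")
    print(config_files(tmp))  # ['extra.tf.json', 'main.tf', 'variables.tf']
```

Because every matching file in the directory is merged into one configuration, keeping secrets in a separate variables file makes it easy to exclude that file from version control.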
To demonstrate Terraform's capabilities, we will create a GitHub repository, since GitHub's authorization and API are simple. First, we obtain a token generated in the web interface: Settings -> Developer settings -> Personal access token -> Generate new token, and set its permissions. We will not create anything yet, just check the connection:
(agile-aleph-203917) $ ls *.tf
main.tf variables.tf
$ cat variables.tf
variable "github_token" {
  default = "630bc9696d0b2f4ce164b1cabb118eaaa1909838"
}
$ cat main.tf
provider "github" {
  token = "${var.github_token}"
}
(agile-aleph-203917) $ ./terraform init
(agile-aleph-203917) $ ./terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Now let's create a managing account (an organization): Settings -> Organizations -> New organization -> Create organization. … Using the Terraform repository resource documentation (www.terraform.io/docs/providers/github/r/repository.html), add a description of the repository to the config:
(agile-aleph-203917) $ cat main.tf
provider "github" {
  token = "${var.github_token}"
}
resource "github_repository" "terraform_repo" {
  name = "terraform-repo"
  description = "my terraform repo"
  auto_init = true
}
All that remains is to apply the configuration, review the plan for creating the repository, and agree to it:
(agile-aleph-203917) $ ./terraform apply
provider.github.organization
The GitHub organization name to manage.
Enter a value: essch2
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ github_repository.terraform_repo
id: <computed>
allow_merge_commit: "true"
allow_rebase_merge: "true"
allow_squash_merge: "true"
archived: "false"
auto_init: "true"
default_branch: <computed>
description: "my terraform repo"
etag: <computed>
full_name: <computed>
git_clone_url: <computed>
html_url: <computed>
http_clone_url: <computed>
name: "terraform-repo"
ssh_clone_url: <computed>
svn_url: <computed>
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
github_repository.terraform_repo: Creating...
allow_merge_commit: "" => "true"
allow_rebase_merge: "" => "true"
allow_squash_merge: "" => "true"
archived: "" => "false"
auto_init: "" => "true"
default_branch: "" => "<computed>"
description: "" => "my terraform repo"
etag: "" => "<computed>"
full_name: "" => "<computed>"
git_clone_url: "" => "<computed>"
html_url: "" => "<computed>"
http_clone_url: "" => "<computed>"
name: "" => "terraform-repo"
ssh_clone_url: "" => "<computed>"
svn_url: "" => "<computed>"
github_repository.terraform_repo: Creation complete after 4s (ID: terraform-repo)
Apply complete! Resources: 1 added, 0 changed, 0 destroyed
Now you can see an empty terraform-repo repository in the web interface. Applying again will not create another repository, because Terraform applies only the changes that have not yet been made:
(agile-aleph-203917) $ ./terraform apply
provider.github.organization
The GitHub organization name to manage.
Enter a value: essch2
github_repository.terraform_repo: Refreshing state … (ID: terraform-repo)
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
But if I change the name, Terraform will apply the change by deleting the repository and creating a new one with the new name. It is important to note that any data we had pushed into this repository would be deleted when the name changes. To check how the update will be performed, you can first request the list of planned actions with the command ./terraform plan. So, let's get started:
(agile-aleph-203917) $ cat main.tf
provider "github" {
  token = "${var.github_token}"
}
resource "github_repository" "terraform_repo" {
  name = "terraform-repo2"
  description = "my terraform repo"
  auto_init = true
}
(agile-aleph-203917) $ ./terraform plan
provider.github.organization
The GitHub organization name to manage.
Enter a value: essch
Refreshing Terraform state in-memory prior to plan …
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
github_repository.terraform_repo: Refreshing state … (ID: terraform-repo)
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
+ github_repository.terraform_repo
id: <computed>
allow_merge_commit: "true"
allow_rebase_merge: "true"
allow_squash_merge: "true"
archived: "false"
auto_init: "true"
default_branch: <computed>
description: "my terraform repo"
etag: <computed>
full_name: <computed>
git_clone_url: <computed>
html_url: <computed>
http_clone_url: <computed>
name: "terraform-repo2"
ssh_clone_url: <computed>
svn_url: <computed>
"terraform apply" is subsequently run.
esschtolts@cloudshell:~/terraform (agile-aleph-203917)$ ./terraform apply
provider.github.organization
The GitHub organization name to manage.
Enter a value: essch2
github_repository.terraform_repo: Refreshing state … (ID: terraform-repo)
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
-/+ github_repository.terraform_repo (new resource required)
id: "terraform-repo" => <computed>
allow_merge_commit: "true" => "true"
allow_rebase_merge: "true" => "true"
allow_squash_merge: "true" => "true"
archived: "false" => "false"
auto_init: "true" => "true"
default_branch: "master" => <computed>
description: "my terraform repo" => "my terraform repo"
etag: "W/\"a92e0b300d8c8d2c869e5f271da6c2ab\"" => <computed>
full_name: "essch2/terraform-repo" => <computed>
git_clone_url: "git://github.com/essch2/terraform-repo.git" => <computed>
html_url: "https://github.com/essch2/terraform-repo" => <computed>
http_clone_url: "https://github.com/essch2/terraform-repo.git" => <computed>
name: "terraform-repo" => "terraform-repo2" (forces new resource)
ssh_clone_url: "git@github.com:essch2/terraform-repo.git" => <computed>
svn_url: "https://github.com/essch2/terraform-repo" => <computed>
Plan: 1 to add, 0 to change, 1 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
github_repository.terraform_repo: Destroying … (ID: terraform-repo)
github_repository.terraform_repo: Destruction complete after 0s
github_repository.terraform_repo: Creating...
allow_merge_commit: "" => "true"
allow_rebase_merge: "" => "true"
allow_squash_merge: "" => "true"
archived: "" => "false"
auto_init: "" => "true"
default_branch: "" => "<computed>"
description: "" => "my terraform repo"
etag: "" => "<computed>"
full_name: "" => "<computed>"
git_clone_url: "" => "<computed>"
html_url: "" => "<computed>"
http_clone_url: "" => "<computed>"
name: "" => "terraform-repo2"
ssh_clone_url: "" => "<computed>"
svn_url: "" => "<computed>"
github_repository.terraform_repo: Creation complete after 5s (ID: terraform-repo2)
Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
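Why a rename leads to "1 to add, 1 to destroy" rather than "1 to change" can be modelled in a few lines of Python. The sketch assumes, for illustration, that only the name attribute forces replacement and that any other attribute can be updated in place; the FORCE_NEW set and both function names are hypothetical and do not reflect how the provider is implemented.

```python
# Assumption for this sketch: changing "name" forces replacement,
# while any other attribute can be updated in place.
FORCE_NEW = {"name"}


def classify(old, new):
    """Return 'no-op', 'update' or 'replace' for a single resource."""
    changed = {key for key in new if old.get(key) != new[key]}
    if not changed:
        return "no-op"
    return "replace" if changed & FORCE_NEW else "update"


def plan_summary(action):
    """Format the counters in the style of the plan summary line."""
    add = 1 if action == "replace" else 0
    change = 1 if action == "update" else 0
    destroy = 1 if action == "replace" else 0
    return f"Plan: {add} to add, {change} to change, {destroy} to destroy."


old = {"name": "terraform-repo", "description": "my terraform repo"}
renamed = dict(old, name="terraform-repo2")

action = classify(old, renamed)
print(action, "->", plan_summary(action))
# replace -> Plan: 1 to add, 0 to change, 1 to destroy.
```

Reviewing the plan before applying is the only safeguard here: a replacement destroys the old resource together with whatever it contained.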
For the sake of clarity, I created a big security hole: I put the token in the configuration file, and therefore in the repository, so anyone who can access it can delete all the repositories. Terraform provides several ways to set variables besides the one used here. I'll simply regenerate the token and override it with a value passed on the command line: