Configure liveness and readiness probes

In Astronomer Software, you can create liveness and readiness probes to assess whether your Kubernetes Pods or network are healthy and can process requests.

Some components in Astronomer Software include liveness and readiness probes by default, but all components support adding and configuring them. Astronomer Software allows you to use the Kubernetes liveness and readiness probe definitions so you can monitor the state of your Pods.

Liveness probes can be useful in all cases. Readiness probes, however, are likely most useful in the following scenarios:

  • Containers that have open network ports
  • Containers that do not have open ports but run multiple processes
  • Containers in which a process might never reach a healthy state because it is waiting for some other state to be achieved
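As a rough illustration of the last scenario, the sketch below shows a readiness probe that reports the container as ready only after a marker file exists. The file path `/tmp/ready` and the presence of `/bin/sh` in the image are assumptions made for this example; they are not part of any Astronomer component.

```yaml
# Sketch only: readiness gated on a hypothetical marker file.
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - test -f /tmp/ready    # hypothetical file written by the process once it is ready
  initialDelaySeconds: 5      # wait before the first check
  periodSeconds: 10           # check every 10 seconds
  failureThreshold: 3         # mark the container not ready after 3 consecutive failures
```

Until the command exits with status 0, Kubernetes keeps the Pod out of Service endpoints without restarting the container.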

Default probe behavior

You can use the following structure to define probes in your values.yaml file. For example, you might adjust a default value such as the amount of time allowed before a probe times out. For reference, see the existing definitions in Default Astronomer Helm probe configurations.

You can add any definition that is compatible with Kubernetes probes. However, because Kubernetes allows only one handler per probe, make sure that a single probe does not define both exec and httpGet. For consistency, the examples shown in the Default Astronomer Helm probe configurations use httpGet, but exec can be used when appropriate.
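The single-handler rule can be sketched as follows. Each probe picks exactly one handler, although different probes on the same container may pick different ones. The path, port, and command here are placeholders chosen for illustration.

```yaml
# Sketch only: one handler per probe.
livenessProbe:
  httpGet:                # the only handler in this probe
    path: /healthz        # illustrative path
    port: 8080            # illustrative port
  timeoutSeconds: 5
readinessProbe:
  exec:                   # a separate probe can use a different handler
    command:
      - /bin/true         # placeholder command that always succeeds
  timeoutSeconds: 5
```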

Liveness probe templates

livenessProbe:
  enabled: true
  httpGet:
    path: /index.html
    port: 443

Readiness probe templates

readinessProbe:
  enabled: true
  httpGet:
    path: /index.html
    port: 444

Retrieve existing probe definitions

You can retrieve the default probe definitions from the Kubernetes manifest. The following example shows how to retrieve the definitions for Houston.

$ kubectl -n "${NAMESPACE}" get deployment -l component=houston -o yaml

This command produces a large amount of YAML output describing your Houston configuration. Within this output is a section describing the livenessProbe, which looks like the following:

livenessProbe:
  failureThreshold: 10
  httpGet:
    path: /v1/healthz
    port: 8871
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

You can copy this output into the Houston configuration in your values.yaml file and adjust the values you want to customize. Then, apply a platform configuration change.
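As a minimal sketch of such an override, the fragment below places the retrieved Houston liveness probe under `astronomer.houston`, the same nesting used in the default configurations listed later on this page, and raises only `timeoutSeconds`. The new value of 5 is an arbitrary example, not a recommendation.

```yaml
# Sketch only: Houston liveness probe copied from the manifest, with one value changed.
astronomer:
  houston:
    livenessProbe:
      failureThreshold: 10      # consecutive failures before the container is restarted
      httpGet:
        path: /v1/healthz
        port: 8871
        scheme: HTTP
      initialDelaySeconds: 30   # wait after container start before the first probe
      periodSeconds: 10         # interval between probes
      successThreshold: 1
      timeoutSeconds: 5         # example override; the retrieved value was 1
```

With these settings, a container that stops responding is restarted after roughly failureThreshold × periodSeconds, or about 100 seconds, of consecutive failed probes.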

Reference Helm values within your probes

Because the values for the liveness and readiness probes are passed through the Helm template function, you can reference Helm values within the probes. Specifically, the livenessProbe and readinessProbe values are rendered to YAML and then passed through the Helm template function, which renders any Helm template syntax in the resulting YAML.

For example, instead of hardcoding values for your probes to match values defined by other configurations in your values.yaml file, you can use the configuration variable itself.

The following example, using the Alertmanager YAML configuration, shows how the path and port are defined by Values.prefixURL and Values.ports.http, which are set elsewhere in the values.yaml file.

readinessProbe:
  httpGet:
    path: {{ .Values.prefixURL }}/#/status
    port: {{ .Values.ports.http }}
  initialDelaySeconds: 30
  timeoutSeconds: 30
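To make the rendering step concrete, the sketch below assumes hypothetical settings for `prefixURL` and `ports.http` and shows, in comments, roughly what the readiness probe would look like after templating. The specific values are assumptions for illustration; check your own values.yaml for the real ones.

```yaml
# Assumed values for illustration only.
prefixURL: /alertmanager
ports:
  http: 9093

# With those values, the templated probe would render approximately as:
#
# readinessProbe:
#   httpGet:
#     path: /alertmanager/#/status
#     port: 9093
#   initialDelaySeconds: 30
#   timeoutSeconds: 30
```

Referencing the values this way means that changing the port or prefix in one place keeps the probe in sync.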

Default Astronomer Helm probe configurations

The following components have their default probe configuration defined in the Astronomer Helm chart.

For components that do not have probes defined by default, each section also lists the values where you can add custom probe configurations.

Alert manager

alertmanager_auth-proxy:
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8084
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8084
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10

The following can also be configured to include liveness and readiness probes:

alertmanager:
  livenessProbe: {}

Astronomer

astronomer:
  astroUI:
    livenessProbe:
      httpGet:
        path: /
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
  commander:
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8880
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 5
      successThreshold: 1
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8880
      initialDelaySeconds: 10
      periodSeconds: 10
  houston:
    livenessProbe:
      httpGet:
        path: /v1/healthz
        port: 8871
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 10
    readinessProbe:
      httpGet:
        path: /v1/healthz
        port: 8871
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 10
  registry:
    livenessProbe:
      httpGet:
        path: /
        port: 5000
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /
        port: 5000
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5

The following can also be configured to include liveness and readiness probes:

astronomer:
  configSyncer:
    livenessProbe: {}
    readinessProbe: {}
  houston:
    bootstrapper:
      livenessProbe: {}
      readinessProbe: {}
    cleanupAirflowDb:
      livenessProbe: {}
      readinessProbe: {}
    cleanupDeployRevisions:
      livenessProbe: {}
      readinessProbe: {}
    cleanupDeployments:
      livenessProbe: {}
      readinessProbe: {}
    dbMigration:
      livenessProbe: {}
      readinessProbe: {}
    taskUsageMetrics:
      livenessProbe: {}
      readinessProbe: {}
    updateCheck:
      livenessProbe: {}
      readinessProbe: {}
    updateResourceStrategy:
      livenessProbe: {}
      readinessProbe: {}
    updateRuntimeCheck:
      livenessProbe: {}
      readinessProbe: {}
    upgradeDeployments:
      livenessProbe: {}
      readinessProbe: {}
    waitForDB:
      livenessProbe: {}
      readinessProbe: {}
    worker:
      livenessProbe: {}
      readinessProbe: {}

Elasticsearch

elasticsearch:
  client:
    livenessProbe:
      httpGet:
        path: /_cluster/health?local=true
        port: 9200
      initialDelaySeconds: 90
    readinessProbe:
      httpGet:
        path: /_cluster/health?local=true
        port: 9200
      initialDelaySeconds: 5
  exporter:
    livenessProbe:
      httpGet:
        path: /healthz
        port: http
      initialDelaySeconds: 30
      timeoutSeconds: 10
    readinessProbe:
      httpGet:
        path: /healthz
        port: http
      initialDelaySeconds: 10
      timeoutSeconds: 10
  master:
    livenessProbe:
      tcpSocket:
        port: 9300
    readinessProbe:
      httpGet:
        path: /_cluster/health?local=true
        port: 9200
      initialDelaySeconds: 5

The following components do not have probes configured by default:

elasticsearch:
  curator:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true
  data:
    readinessProbe:
      exec:
        command:
          - /bin/true
  nginx:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true
  sysctlInitContainer:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true

External-es-proxy

The following components do not have probes configured by default:

external-es-proxy:
  awsproxy:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true
  livenessProbe:
    exec:
      command:
        - /bin/true
  readinessProbe:
    exec:
      command:
        - /bin/true

Fluentd

livenessProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - >-
        if (( $(ruby -e "require 'net/http';require 'uri';uri = URI.parse('http://127.0.0.1:24231/metrics');response = Net::HTTP.get_response(uri);puts response.body" | grep 'fluentd_output_status_buffer_queue_length{' | awk '{ print ($NF > 8) }') )); then exit 1; fi; exit 0
  failureThreshold: 3
  initialDelaySeconds: 30
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 5

The following components do not have probes configured by default:

fluentd:
  readinessProbe:
    exec:
      command:
        - /bin/true

Global

The following components do not have probes configured by default:

global:
  loggingSidecar:
    readinessProbe: {}
    livenessProbe: {}
  dagOnlyDeployment:
    server:
      readinessProbe: {}
      livenessProbe: {}
    client:
      readinessProbe: {}
      livenessProbe: {}
  authSidecar:
    readinessProbe: {}
    livenessProbe: {}

Grafana

grafana:
  livenessProbe:
    httpGet:
      path: /api/health
      port: 3000
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /api/health
      port: 3000
    initialDelaySeconds: 10
    periodSeconds: 10

The following components do not have probes configured by default:

grafana:
  bootstrapper:
    livenessProbe: {}
    readinessProbe: {}
  waitForDB:
    livenessProbe: {}
    readinessProbe: {}

Kibana

kibana:
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8084
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8084
      scheme: HTTP
    initialDelaySeconds: 10
    periodSeconds: 10

The following components do not have probes configured by default:

kibana:
  defaultIndexJob:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true
  livenessProbe:
    exec:
      command:
        - /bin/true
  readinessProbe:
    exec:
      command:
        - /bin/true

Kube-state

The following components do not have probes configured by default:

kube-state:
  readinessProbe: {}

NATS

nats:
  nats:
    livenessProbe:
      httpGet:
        path: /
        port: 8222
      initialDelaySeconds: 10
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /
        port: 8222
      initialDelaySeconds: 10
      timeoutSeconds: 5

The following components do not have probes configured by default:

nats:
  exporter:
    enabled: true
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true
  reloader:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true

nginx

The following components do not have probes configured by default:

defaultBackend:
  readinessProbe:
    exec:
      command:
        - /bin/true
readinessProbe:
  exec:
    command:
      - /bin/true

PgBouncer

pgbouncer:
  livenessProbe:
    tcpSocket:
      port: 5432
  readinessProbe:
    tcpSocket:
      port: 5432

PostgreSQL

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 6
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - -e
      - 'pg_isready -U "postgres" -h 127.0.0.1 -p 5432\n'
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 6

The following components do not have probes configured by default:

postgresql:
  metrics:
    livenessProbe: {}
    readinessProbe: {}

Prometheus

prometheus:
  livenessProbe:
    httpGet:
      path: /-/healthy
      port: 9090
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    timeoutSeconds: 1
  readinessProbe:
    httpGet:
      path: /-/ready
      port: 9090
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    timeoutSeconds: 1
  authproxy:
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8084
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8084
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
prometheus-blackbox-exporter:
  livenessProbe:
    httpGet:
      path: /health
      port: http
  readinessProbe:
    httpGet:
      path: /health
      port: http
prometheus-node-exporter:
  livenessProbe:
    httpGet:
      path: /
      port: 9100
  readinessProbe:
    httpGet:
      path: /
      port: 9100
prometheus-postgres-exporter:
  livenessProbe:
    tcpSocket:
      port: 9187
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:
    tcpSocket:
      port: 9187
    initialDelaySeconds: 5
    periodSeconds: 10

The following components do not have probes configured by default:

prometheus:
  configMapReloader:
    livenessProbe: {}
    readinessProbe: {}
  filesdReloader:
    livenessProbe: {}
    readinessProbe: {}

STAN

livenessProbe:
  httpGet:
    path: /streaming/serverz
    port: monitor
  initialDelaySeconds: 10
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /streaming/serverz
    port: monitor
  initialDelaySeconds: 10
  timeoutSeconds: 5

The following components do not have probes configured by default:

stan:
  exporter:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true
  waitForNatsServer:
    livenessProbe:
      exec:
        command:
          - /bin/true
    readinessProbe:
      exec:
        command:
          - /bin/true

Default Airflow chart probe configurations

You can also define liveness and readiness probes using the Astronomer Airflow chart.

Airflow

The Airflow chart includes probe settings for the following components:

  • dagProcessor
  • flower
  • pgbouncer
  • postgresql
  • scheduler
  • triggerer
  • webserver
  • workers

airflow:
  dagProcessor:
    livenessProbe:
      command: null
      failureThreshold: 5
      initialDelaySeconds: 10
      periodSeconds: 60
      timeoutSeconds: 20
    readinessProbe:
      initialDelaySeconds: 10
      timeoutSeconds: 20
      failureThreshold: 5
      periodSeconds: 60
    logGroomerSidecar:
      enabled: true
      livenessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 20
        failureThreshold: 5
        periodSeconds: 60
      readinessProbe:
        initialDelaySeconds: 60
        timeoutSeconds: 20
        failureThreshold: 5
        periodSeconds: 60
  flower:
    livenessProbe:
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 5
    readinessProbe:
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 5
  pgbouncer:
    metricsExporterSidecar:
      livenessProbe:
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 1
      readinessProbe:
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 1
  postgresql:
    metrics:
      customLivenessProbe: {}
      customReadinessProbe: {}
      livenessProbe:
        enabled: true
        failureThreshold: 6
        initialDelaySeconds: 5
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      readinessProbe:
        enabled: true
        failureThreshold: 6
        initialDelaySeconds: 5
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
    primary:
      customLivenessProbe: {}
      customReadinessProbe: {}
      livenessProbe:
        enabled: true
        failureThreshold: 6
        initialDelaySeconds: 30
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      readinessProbe:
        enabled: true
        failureThreshold: 6
        initialDelaySeconds: 5
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
    readReplicas:
      customLivenessProbe: {}
      customReadinessProbe: {}
      livenessProbe:
        enabled: true
        failureThreshold: 6
        initialDelaySeconds: 30
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      readinessProbe:
        enabled: true
        failureThreshold: 6
        initialDelaySeconds: 5
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
  scheduler:
    livenessProbe:
      command: null
      failureThreshold: 5
      initialDelaySeconds: 10
      periodSeconds: 60
      timeoutSeconds: 30
  triggerer:
    livenessProbe:
      command: null
      failureThreshold: 5
      initialDelaySeconds: 10
      periodSeconds: 60
      timeoutSeconds: 20
  webserver:
    livenessProbe:
      failureThreshold: 5
      initialDelaySeconds: 15
      periodSeconds: 10
      scheme: HTTP
      timeoutSeconds: 5
    readinessProbe:
      failureThreshold: 5
      initialDelaySeconds: 15
      periodSeconds: 10
      scheme: HTTP
      timeoutSeconds: 5
  workers:
    livenessProbe:
      command: null
      enabled: true
      failureThreshold: 5
      initialDelaySeconds: 10
      periodSeconds: 60
      timeoutSeconds: 20
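As a small sketch of customizing one of these defaults, the fragment below gives the scheduler liveness probe more time to respond. It assumes the same `airflow` nesting as the listing above and that unspecified keys keep their defaults when values are merged; where this fragment belongs depends on how your installation passes values to the Airflow chart. The value 60 is an arbitrary example.

```yaml
# Sketch only: lengthen the scheduler liveness probe timeout.
airflow:
  scheduler:
    livenessProbe:
      timeoutSeconds: 60    # example override; the listed default is 30
      failureThreshold: 5   # same as the listed default, repeated for clarity
```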

Auth sidecar

The following can also be configured to include liveness and readiness probes:

authSidecar:
  livenessProbe: {}
  readinessProbe: {}

DAG deploy server

The following can also be configured to include liveness and readiness probes:

dagDeploy:
  livenessProbe: {}
  readinessProbe: {}

Logging sidecar

The following can also be configured to include liveness and readiness probes:

loggingSidecar:
  livenessProbe: {}
  readinessProbe: {}