Astronomer Software system components
Astronomer Software utilizes a variety of tools to run Airflow securely and reliably in your private cloud. This guide contains names and descriptions of all system components required to run Astronomer Software.
Platform components
Astronomer Software brings together best-of-class components into a complete "Managed Airflow on Kubernetes" system:
- Astro CLI - Command line tool for pushing deployments from your local machine to your workspaces running on Kubernetes. The CLI also provides the ability to launch a local stack using docker for local development and testing of DAGs, hooks, and operators.
- Software UI (React) - A modern web based interface to create manage workspaces and deployments. Through the UI you can scale up or down your resources per deployment, invite new users and monitor Airflow logs
- Houston (GraphQL API) - The core GraphQL API layer to interact with your astronomer workspaces and deployments. Use GraphQL queries directly, or integrate with your CI/CD platform to automate Airflow deployments.
- Docker Registry - Each Airflow deployment on your cluster will have it’s own set of required libraries and environment settings. Every time you create/update a deployment, a new docker image is built and pushed to a private registry created for your Astronomer platform. Kubernetes will pull from this registry when creating new pods.
- Commander - the provisioning component of the Astronomer Platform. It is responsible for interacting with the underlying infrastructure layer. gRPC service to communicate between our API and Kubernetes
- Prometheus - A monitoring platform used to collect metrics from StatsD. Prometheus collects Airflow metrics and pushes them to Granfana for visualization. Email alerts can also be setup to help quickly identify issues.
- Grafana - A web dashboard to help visualize and monitor Airflow metrics flowing in from Prometheus. Astronomer has pre-built plenty of dashboards to monitor your cluster or you can create your own custom dashboards to meet your needs.
- Alert Manager - Email alerts from Prometheus metrics. Enter emails for anyone you want to be alerted in the Software UI. These alerts can help notify you of issues on your cluster such as the Airflow scheduler running slowly.
- NGINX - NGINX is used as an ingress controller to enforce authentication and direct traffic to the various services such as Airflow webserver, Grafana, Kibana etc. NGINX is also used to serve Airflow logs back up to the Airflow web UI from ElasticSearch.
- FluentD - FluentD is a data collector that is used to collect and push the Airflow log data into ElasticSearch.
- Elasticsearch - A powerful search engine used to centralize and index logs from Airflow deployments.
- Kibana - A web dashboard to help visualize all of your Airflow logs powered by ElasticSearch. Create your own dashboards to centralize your logs across all of your deployments.
- Prisma ORM - An interface between the HoustonGraphQL API and your Postgres database. This handles read/writes to the database as well as migrations for upgrades.
- Astronomer Helm - Helm charts for the Astronomer Platform
- Docker images - Docker images for deploying and running Astronomer on DockerHub.
Airflow components
When you create an Airflow deployment in Astronomer, the following components are installed:
- Scheduler - Determines dependencies and decides when DAGs should run and when tasks are ready to be scheduled.
- Webserver - Airflow’s web UI used to view DAGs, Connections, variables, logs etc.
- pgBouncer - Provides connection pooling for Postgres. This helps prevent the Airflow database from being overwhelmed by too many connections.
- StatsD - Provides DAG and task level metrics from Airflow. Astronomer collects these metrics and pushes to a centralized view in Grafana
- Celery components:
Customer-supplied components
To run Astronomer in your environment, you just need to bring a Kubernetes cluster and a Postgres database:
- Kubernetes - You bring your own Kubernetes environment (EKS, GKE, AKS, other). Coordinates communication between the services, and provides fault tolerance on failures.
- PostgreSQL - database used as the backend for the Houston service as well as each Airflow deployment.