Version: v0.10.0

Overview of Astronomer Enterprise


Astronomer makes it easy to run, monitor, and scale Apache Airflow deployments in our cloud or yours. Source code is made available for the benefit of customers.

If you'd like to see the platform in action, start a free trial on our SaaS service, Astronomer Cloud, and run through our getting started guide. This is a good first step, even if you're ultimately interested in running Astronomer Enterprise in your own Kubernetes cluster.

Architecture

[Figure: Astronomer Enterprise overview diagram]

Installation Guides

We have created guides for installing Astronomer on a number of Kubernetes environments:

Customizing Your Installation

Because the platform uses Helm throughout, it's easy to customize your Astronomer installation. Below are some guides for the most common customizations:

Administration

There are many tools at your disposal for administering Astronomer:

License

Usage of Astronomer requires an Astronomer Platform Enterprise Edition license.

Platform Components

Astronomer Enterprise brings together best-in-class components into a complete "Managed Airflow on Kubernetes" system:

  • Astro CLI - Command line tool for pushing deployments from your local machine to your workspaces running on Kubernetes. The CLI can also launch a local stack via Docker for local development and testing of DAGs, hooks, and operators.
  • Orbit (React UI) - A modern web-based interface to create and manage workspaces and deployments. Through the UI you can scale each deployment's resources up or down, invite new users, and monitor Airflow logs.
  • Houston (GraphQL API) - The core GraphQL API layer for interacting with your Astronomer workspaces and deployments. Use GraphQL queries directly, or integrate with your CI/CD platform to automate Airflow deployments.
  • Docker Registry - Each Airflow deployment on your cluster has its own set of required libraries and environment settings. Every time you create or update a deployment, a new Docker image is built and pushed to a private registry created for your Astronomer platform. Kubernetes pulls from this registry when creating new pods.
  • Commander - The provisioning component of the Astronomer Platform. It is a gRPC service that handles communication between our API and Kubernetes and is responsible for interacting with the underlying infrastructure layer.
  • Prometheus - A monitoring platform used to collect metrics from StatsD. Prometheus collects Airflow metrics and makes them available to Grafana for visualization. Email alerts can also be set up to help quickly identify issues.
  • Grafana - A web dashboard to help visualize and monitor Airflow metrics flowing in from Prometheus. Astronomer ships a number of pre-built dashboards for monitoring your cluster, or you can create your own custom dashboards to meet your needs.
  • Alert Manager - Sends email alerts based on Prometheus metrics. In the Orbit UI, enter the email address of anyone you want to be alerted. These alerts can notify you of issues on your cluster, such as the Airflow Scheduler running slowly.
  • NGINX - NGINX is used as an ingress controller to enforce authentication and direct traffic to the various services, such as the Airflow webserver, Grafana, and Kibana. NGINX is also used to serve Airflow logs from Elasticsearch back to the Airflow web UI.
  • FluentD - FluentD is a data collector that collects Airflow log data and pushes it into Elasticsearch.
  • Elasticsearch - A powerful search engine used to centralize and index logs from Airflow deployments.
  • Kibana - A web dashboard, powered by Elasticsearch, to help visualize all of your Airflow logs. Create your own dashboards to centralize your logs across all of your deployments.
  • Prisma ORM - An interface between the Houston GraphQL API and your Postgres database. It handles reads and writes to the database as well as migrations for upgrades.
  • Astronomer Helm - Helm charts for the Astronomer Platform.
  • db-bootstrapper - Init container for bootstrapping system databases.
  • Docker images - Docker images for deploying and running Astronomer, hosted on DockerHub.
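
As a rough illustration of the "use GraphQL queries directly" option described for Houston above, the sketch below builds a standard GraphQL-over-HTTP POST request using only Python's standard library. The endpoint URL, the token, the Authorization header format, and the query's field names are all placeholders chosen for illustration and are not Houston's actual schema; check your own installation for the real values. The request is constructed but not sent.

```python
import json
import urllib.request

# Hypothetical values: replace with your own Houston endpoint and API token.
HOUSTON_URL = "https://houston.example.com/v1"
API_TOKEN = "replace-me"

# Placeholder query: the field names are illustrative, not Houston's real schema.
QUERY = """
query Deployments($workspaceId: String!) {
  deployments(workspaceId: $workspaceId) {
    id
    label
  }
}
"""


def build_graphql_request(url, token, query, variables):
    """Build (but do not send) a GraphQL-over-HTTP POST request."""
    # GraphQL over HTTP conventionally sends a JSON body with
    # "query" and "variables" keys.
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the token is passed in the Authorization header.
            "Authorization": token,
        },
        method="POST",
    )


request = build_graphql_request(
    HOUSTON_URL, API_TOKEN, QUERY, {"workspaceId": "example-workspace"}
)
# A CI/CD job could send it with: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before a pipeline step actually calls the API.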

Airflow Components

When you create an Airflow deployment in Astronomer, the following components are installed:

  • Scheduler - Determines dependencies and decides when DAGs should run and when tasks are ready to be scheduled.
  • Webserver - Airflow’s web UI, used to view DAGs, connections, variables, logs, etc.
  • pgBouncer - Provides connection pooling for Postgres. This helps prevent the Airflow database from being overwhelmed by too many connections.
  • StatsD - Provides DAG- and task-level metrics from Airflow. Astronomer collects these metrics and pushes them to a centralized view in Grafana.
  • Celery components:

    • Worker - A service that processes Airflow tasks. Workers can be scaled up to increase throughput.
    • Flower - Web UI for the Celery distributed task queue, used to monitor your Airflow worker services.
    • Redis - In-memory data store used as the backend for the Celery task queue.
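
To make the Scheduler's role above more concrete, here is a toy model of "deciding when tasks are ready": a task is ready once all of its upstream tasks have finished. This is a simplified sketch for intuition only, not Airflow's scheduler implementation, and the task names and the `ready_tasks` helper are hypothetical.

```python
def ready_tasks(upstream, done):
    """Return tasks that have not run yet and whose upstream tasks are all done."""
    return sorted(
        task
        for task, deps in upstream.items()
        if task not in done and deps <= done  # deps is a subset of done
    )


# Hypothetical DAG: extract -> transform -> load, plus an independent notify task.
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": set(),
}

# Nothing has run yet: only tasks without upstream dependencies are ready.
first = ready_tasks(upstream, done=set())

# After extract and notify finish, transform becomes ready; load still waits.
second = ready_tasks(upstream, done={"extract", "notify"})
```

In a deployment using Celery, tasks found ready in this sense are the ones handed to the task queue for the workers to process.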

Customer-Supplied Resources

To run Astronomer in your environment, you just need to bring a Kubernetes cluster and a Postgres database:

  • Kubernetes - You bring your own Kubernetes environment (EKS, GKE, AKS, or another). Kubernetes coordinates communication between the services and provides fault tolerance when failures occur.
  • PostgreSQL - The database used as the backend for the Houston service as well as for each Airflow deployment.