If you’re a site reliability engineer (SRE) or administrator new to Astro, security and authentication deserve attention from the moment you start building your infrastructure. If you previously used open-source Airflow, you may have developed some bad habits that can be time-consuming and costly. So what’s the best way to proceed? Astro, the Airflow-powered orchestration platform, answers that question with a rich set of security and authentication options that you can choose from to suit your needs.
Below, we outline options for achieving secure connectivity and authentication with Astro.
How to Achieve Secure Data Service Connectivity in Astro
For Astro to work to the best of its ability, the Astro administrator must first set up infrastructure-level connectivity between Astro and data services. No two cloud environments are exactly alike, which means that different solutions are necessary for different use cases.
Connecting over public endpoints is the easiest way to reach your data services. But, in the case of cloud-native services, it can also be the most expensive, because connecting to a cloud-native service over its public endpoint can incur network transfer costs. For example, an Airflow DAG reading data from an Amazon S3 bucket in another account would normally rack up transfer charges. (Don’t worry, Astro has you covered in this scenario: all S3 requests go over a private endpoint, as explained below.)
Using private endpoints means your traffic never leaves the cloud network, which not only has security benefits but can also be more cost-effective than using the public version of the same service. Depending on your cloud, private endpoint functionality may be called PrivateLink (AWS), Private Link (Azure), or Private Service Connect (GCP).
Because AWS and Azure charge per endpoint, Astro provisions only the endpoints it needs to function. If you want to add another endpoint to your cluster, open a support case with Astronomer.
As an alternative to traditional point-to-point network connectivity (peering), you can also create your own private endpoint and add it to your Astro cluster.
Connect Networks Manually
Private endpoints greatly simplify network connectivity (at some extra cost), but they aren’t a magical solution.
You may want to connect to an on-premises network, run a bespoke IaaS service, or use a cloud service that doesn’t offer private endpoints (for example, most AWS RDS databases don’t have private endpoints). In these cases, you may need to resort to the oldest method of connecting cloud networks: peering.
Astro supports VPC peering/VNet peering on all supported public clouds. If you would like to initiate a peering request, get in touch with the Astro support team to get started.
You may need to start peering with many more networks as your service grows. In this case, consider peering Astro to a transit network that connects all your networks in a hub-and-spoke model. On AWS, you can also connect Astro to an AWS Transit Gateway, a much simpler way of setting up transit networks.
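Peered networks generally cannot have overlapping IP address ranges, so it can help to check your CIDR blocks before you request peering or attach another spoke to a transit network. The following is a minimal sketch of such a check using only the Python standard library. The network names and CIDR ranges are made-up placeholders, not values Astro assigns.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks):
    """Return pairs of network names whose CIDR ranges overlap.

    `networks` maps a label (hypothetical, for illustration) to a CIDR string.
    """
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    overlaps = []
    # Compare every pair of networks exactly once.
    for (name_a, net_a), (name_b, net_b) in combinations(parsed.items(), 2):
        if net_a.overlaps(net_b):
            overlaps.append((name_a, name_b))
    return overlaps


# Placeholder ranges: substitute the CIDRs of your own networks.
candidate_networks = {
    "astro_cluster": "172.20.0.0/19",
    "warehouse_vpc": "10.0.0.0/16",
    "on_prem": "10.0.128.0/20",
}

conflicts = find_overlaps(candidate_networks)
```

Any pair returned in `conflicts` would need to be re-addressed, or reached some other way, before those networks could be peered through the same hub.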
How to Achieve Authentication / Authorization in Astro
Once you’ve connected Astro to your external data services, you need to address authentication and authorization. Network connectivity is necessary but not sufficient for your DAG to access your data service: you must also authenticate to the data service before you can read from or write to it. What are your options?
Astro Keeps Your Secret
You can store your authentication credentials as secret environment variables in your Astro deployment. These variables are encrypted and stored in a secrets manager running in the Astro Control Plane.
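As one illustration of this approach: Airflow resolves a connection from any environment variable named `AIRFLOW_CONN_<CONN_ID>`, so a credential can be packaged as a connection URI and stored as a secret environment variable. The sketch below builds such a name and value pair. The connection ID, host, database, and credentials are placeholders, and whether you choose to define connections this way in your deployment is an assumption rather than a documented requirement.

```python
from urllib.parse import quote


def airflow_conn_env(conn_id, conn_type, login, password, host, port, schema):
    """Build the environment variable name and URI for an Airflow connection."""
    # Airflow looks for AIRFLOW_CONN_ followed by the upper-cased connection ID.
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    # Percent-encode the login and password so special characters survive URI parsing.
    uri = (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{schema}"
    )
    return name, uri


# Placeholder values for illustration only.
env_name, env_value = airflow_conn_env(
    conn_id="my_db",
    conn_type="postgres",
    login="user",
    password="p@ss",
    host="db.example.com",
    port=5432,
    schema="analytics",
)
```

The resulting value is what you would mark as secret, so that the password is never displayed in plain text after it is saved.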
Airflow Keeps Your Secret
With this option, you can continue to use Airflow to keep your secret variables; they’re stored and encrypted in the Data Plane.
You Keep Your Secrets
You can store your credentials as secrets in your own secrets manager, running in your cloud account. Astro provides several options for secrets manager integration: the cloud-native secrets managers are supported, as is running your own vault.
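In Airflow, an external secrets manager is typically wired in through the `[secrets]` configuration section, which can be set with the `AIRFLOW__SECRETS__BACKEND` and `AIRFLOW__SECRETS__BACKEND_KWARGS` environment variables. The sketch below assembles those two variables for the AWS Secrets Manager backend from the Amazon provider package. The helper name is hypothetical, and the prefixes shown are only examples, so check the Astro docs for the exact setup that applies to your cloud.

```python
import json


def secrets_backend_env(backend, backend_kwargs):
    """Return the Airflow environment variables that select a secrets backend."""
    return {
        "AIRFLOW__SECRETS__BACKEND": backend,
        # Airflow expects the backend keyword arguments as a JSON string.
        "AIRFLOW__SECRETS__BACKEND_KWARGS": json.dumps(backend_kwargs),
    }


# Example: AWS Secrets Manager, with illustrative prefixes for secret names.
backend_env = secrets_backend_env(
    "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
    {
        "connections_prefix": "airflow/connections",
        "variables_prefix": "airflow/variables",
    },
)
```

With a configuration like this, Airflow checks the secrets manager first when a DAG asks for a connection or variable, so the credentials themselves stay in your cloud account.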
There Are No Secrets
Astro also allows you to use cloud-native identity and access management (IAM) to give an Airflow deployment access to cloud-native resources. With this approach, there are no secrets to extract and manage.
The downside is that this approach only works for cloud-native services. For example, you can use an AWS IAM role to access RDS. But if you run your own database in an EC2 instance, you’d still need to know the user ID and password to access this database.
On AWS, Astro uses IAM Roles for Service Accounts (IRSA) to map deployments to AWS IAM roles; on GCP, Astro uses Workload Identity to map deployments to service accounts. Astro provides a mechanism for authorizing this IAM entity for your data service. Support for Azure Workload Identity will be added in the future.
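To make the IRSA mapping more concrete, the sketch below builds the kind of IAM trust policy that IRSA generally relies on: it lets a specific Kubernetes service account assume a role through the cluster's OIDC provider. The account ID, OIDC provider, namespace, and service account name are all hypothetical placeholders. How Astro exposes these values for a deployment is not covered here, so treat this as an illustration of the general pattern rather than Astro's exact procedure.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy allowing one service account to assume an IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The cluster's OIDC identity provider, registered in IAM.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to a single namespace and service account.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Placeholder identifiers for illustration only.
policy = irsa_trust_policy(
    account_id="123456789012",
    oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    namespace="example-namespace",
    service_account="example-service-account",
)
policy_json = json.dumps(policy, indent=2)
```

Because the role is assumed through the service account's identity, the DAG code itself carries no access keys, which is the sense in which there are no secrets to manage.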
Take the Next Step
Astro is constantly evolving and adding support for new connectivity and authorization options. Visit the Astro docs for the latest information and guidance on connecting Astro to external data sources and managing authorization.
And if you’re not already using Astro, sign up for a demo customized around your orchestration needs.