Configure git-sync code deploys
You can deploy DAGs to an Astronomer Deployment using git-sync. After setting up this feature, you can deploy DAGs from a Git repository without any additional CI/CD. DAGs deployed with git-sync automatically appear in the Airflow UI without requiring additional action or causing downtime. You can also roll back images with the Software UI and Houston API.
git-sync-relay RWX volume does not work with azurefile-csi.
This guide provides details about setup options and the steps for configuring git-sync as a DAG deploy option.
You can choose how your implementation uses git-sync to optimize the speed of your code deploys and how often your Deployment interacts with your GitHub repository. The two main choices for your implementation are how changes are fetched (Poll Mode or a Webhook) and the Repo Share Mode (git-daemon or shared-volume).
In Poll Mode, the git-sync relay downloads changes from the remote GitHub repository at regular intervals. This strategy optimizes performance for git-sync configurations that connect to a remote repository with frequent changes on any branch. The tradeoff is that frequent checks of a frequently changing repository can generate large volumes of network traffic between Deployments and the repository.
Alternatively, you can configure a Webhook instead of Poll Mode, so that changes are fetched whenever activity occurs in the GitHub repository. This strategy optimizes performance for git-sync configurations that connect to a remote repository without frequent activity, so that your Deployment does not perform unnecessary checks. If you configure a specific branch in a busy repository, the git-sync relay only downloads changes made to that branch. However, the webhook is still activated for every change in the GitHub repository, even though only changes to the configured branch trigger downloads.
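The tradeoff between the two fetch strategies can be sketched with some rough arithmetic. The following Python snippet is only an illustration, not part of Astronomer or git-sync: the function names and the input numbers are hypothetical, and it assumes that each poll and each webhook activation counts as one interaction with the remote repository.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def poll_checks_per_day(poll_interval_seconds: int) -> int:
    """Checks made per day when polling at a fixed interval, regardless of activity."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll_interval_seconds must be positive")
    return SECONDS_PER_DAY // poll_interval_seconds


def mode_with_fewer_interactions(poll_interval_seconds: int, repo_events_per_day: int) -> str:
    """Return "webhook" or "poll", whichever contacts the repository less often.

    Assumption: a webhook is activated once per repository event, on any branch.
    """
    checks = poll_checks_per_day(poll_interval_seconds)
    return "webhook" if repo_events_per_day < checks else "poll"


if __name__ == "__main__":
    # A quiet repository (10 events per day) polled every minute.
    print(mode_with_fewer_interactions(60, 10))
    # A busy repository (1000 events per day) polled every five minutes.
    print(mode_with_fewer_interactions(300, 1000))
```

Under these assumptions, a quiet repository favors a Webhook and a busy repository favors Poll Mode, which matches the guidance above. Real traffic also depends on the size of each download, which this sketch ignores.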
The Repo Share Mode is a choice between transmitting your DAGs over your network and storing them on a shared filesystem in a ReadWriteMany (RWX) volume.
If you use a git-daemon configuration:
A git-sync relay Pod, which contains both a git-sync container that stores the Git repo and a git-daemon container that serves the local repo to the Airflow Deployment namespace.
Alternatively, you can use the shared-volume configuration:
A git-sync relay Pod with a git-sync container, which pulls from the external Git repo. This Pod connects to an RWX volume, where the Git repo is stored.
To enable the git-sync deploy feature, you need:
To configure a git-sync deploy mechanism for a Deployment on Astronomer, you need Workspace Editor permissions.
To deploy DAGs to a Deployment using a git-sync deploy mechanism, you need permission to push code to a Git repository configured for git-sync deploys.
Git-sync deploys must be explicitly selected using the UI for each Airflow Deployment for both git-daemon and shared-volume modes.
However, for the shared-volume mode, an Astronomer Admin must configure the RWX shared volume storage class name, storageClassName, in the Houston configuration.
For example, update your values.yaml file with the following values, including the name of your RWX-compatible storage class:
Workspace editors can configure a new or existing Airflow Deployment to use a git-sync mechanism for DAG deploys. From there, any member of your organization with write permissions to the Git repository can deploy DAGs to the Deployment. To configure a Deployment for git-sync deploys:
In the Software UI, create a new Airflow Deployment or open an existing one.
Go to the DAG Deployment section of the Deployment’s Settings page.
For your Mechanism, select Git Sync.
Configure the following values:
./. Other changes outside the DAGs directory in your Git repository must be deployed using astro deploy.
ssh-keyscan -t rsa <provider-domain>. For an example of how to retrieve GitHub’s public key, refer to Apache Airflow documentation.
(Webhook Only) You can now open your GitHub repository and set up a Repository Webhook, or you can return to your Deployment details page to configure this later. Be sure to set the following configurations:
If you finish configuring git-sync and see an error on a new Deployment, you might need to force a restart of the Airflow Deployment at least once, several minutes after you initially create it. For example, you can add any new environment variable to your Deployment, like FOO=foo, to force the Deployment containers to restart.
After you see your DAGs update in the Airflow UI, you can remove the environment variable.
After you configure your Deployment, any code pushed to the DAG directory of the specified Git repo and branch appears in your Deployment with zero downtime.
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL in your Airflow deployment.
The Git repo you want to sync should contain a directory of DAGs that you want to deploy to Astronomer. You can include additional files in the repo, such as your other Astro project files, but note that this might affect performance when deploying new changes to DAGs.
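Before pointing a Deployment at a repository, it can help to check what the DAGs directory actually contains. The following Python sketch is a local helper, not an Astronomer tool: the function name and the default directory name dags are assumptions for illustration, and it only inspects a local checkout of the repository.

```python
from pathlib import Path
from typing import List


def list_dag_files(repo_root: str, dags_dir: str = "dags") -> List[str]:
    """Return the Python files under the DAGs directory, relative to the repo root.

    The default "dags" directory name is an assumption; pass the directory
    that you configured for your Deployment.
    """
    root = Path(repo_root)
    dag_root = root / dags_dir
    if not dag_root.is_dir():
        raise FileNotFoundError(f"DAGs directory not found: {dag_root}")
    # Recurse so that DAG files in subdirectories are included.
    return sorted(p.relative_to(root).as_posix() for p in dag_root.rglob("*.py"))


if __name__ == "__main__":
    # Lists DAG files in the current directory's checkout, if a dags folder exists.
    for name in list_dag_files("."):
        print(name)
```

Files outside the returned list, such as other Astro project files, are the additional files that the paragraph above says might affect deploy performance.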
If you want to deploy DAGs with a private Git repo, you additionally need to configure SSH so that your Astronomer Deployment can access the contents of the repo. This process varies slightly between Git repository management tools. For an example of this configuration, read GitLab’s SSH Key documentation.
You can add the Kubernetes scheduling configurations tolerations, nodeSelector, and affinity to your global git-sync relay configuration.
These settings allow you to meet security or compliance requirements for workload isolation, optimize resource utilization by co-locating related components, and handle tainted nodes in mixed-use Kubernetes clusters. They are not required for git-sync relay functionality, so you only need to add nodeSelector, affinity, or tolerations to your configuration if you need specific node placement for your git-sync-relay components.
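As a reference for the shape of these three fields, the following Python sketch builds them with the standard Kubernetes Pod scheduling schema and prints them as JSON, which is also valid YAML. The function name, label keys, and taint values are hypothetical. The sketch does not show where these keys belong in your platform configuration, so check the configuration reference for your Astronomer version before copying them.

```python
import json


def build_scheduling_config(label_key: str, label_value: str,
                            taint_key: str, taint_value: str) -> dict:
    """Build nodeSelector, tolerations, and affinity in standard Kubernetes form."""
    return {
        # Schedule only on nodes that carry this label.
        "nodeSelector": {label_key: label_value},
        # Allow scheduling on nodes tainted with taint_key=taint_value:NoSchedule.
        "tolerations": [
            {
                "key": taint_key,
                "operator": "Equal",
                "value": taint_value,
                "effect": "NoSchedule",
            }
        ],
        # Equivalent hard requirement expressed as node affinity.
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": label_key,
                                    "operator": "In",
                                    "values": [label_value],
                                }
                            ]
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical label and taint names, for illustration only.
    config = build_scheduling_config("workload", "git-sync", "dedicated", "git-sync")
    print(json.dumps(config, indent=2))
```

In practice you usually need either nodeSelector or node affinity, not both. The sketch includes both only to show the structure of each field.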