Configure git-sync code deploys

You can deploy Dags to an Astro Private Cloud Deployment using git-sync. After setting up this feature, you can deploy Dags from a Git repository without any additional CI/CD. Dags deployed with git-sync automatically appear in the Airflow UI without requiring additional action or causing downtime. You can also roll back images with the Astro Private Cloud UI and Houston API.

Note: The git-sync relay RWX volume does not work with azurefile-csi.

This guide describes the setup options and the steps for configuring git-sync as a Dag deploy option. Both polling and webhook strategies are supported.

Choose a git-sync strategy

When you configure git-sync, you must choose both a repo fetch mode and a repo share mode:

Repo fetch mode

Repo fetch mode determines how the git-sync relay retrieves changes from your Git repository. Choose one of the following options:

Poll mode

The git-sync relay checks the remote Git repo for changes at a fixed interval, which you set with the Sync Interval value. Use poll mode for repositories with frequent changes across branches. The relay checks for updates on every interval, whether or not the repository has changed.

Tradeoff: Frequent polling generates unnecessary network traffic between your Deployment and the repository when changes are infrequent.
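The poll cycle can be sketched as follows. This is a minimal illustration, not the relay's actual implementation: it assumes the relay compares the tip of the remote branch with the last commit it synced. The helper names remote_head and poll_once are hypothetical; git ls-remote is the standard Git command for reading a remote ref without cloning.

```python
import subprocess
from typing import Callable, Optional


def remote_head(repo_url: str, branch: str) -> Optional[str]:
    """Return the commit hash at the tip of `branch`, or None if the branch is missing."""
    result = subprocess.run(
        ["git", "ls-remote", repo_url, f"refs/heads/{branch}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Each output line has the form "<hash>\t<ref>".
    line = result.stdout.strip()
    return line.split()[0] if line else None


def poll_once(
    last_synced: Optional[str],
    fetch_head: Callable[[], Optional[str]],
    sync: Callable[[str], None],
) -> Optional[str]:
    """Run one poll cycle and return the commit that is now considered synced.

    `sync` is called only when the remote tip differs from `last_synced`,
    so an unchanged repository costs one remote lookup per interval.
    """
    head = fetch_head()
    if head is None or head == last_synced:
        return last_synced
    sync(head)
    return head
```

A caller would invoke poll_once in a loop, sleeping for the sync interval between calls and passing a function such as `lambda: remote_head(url, branch)` as fetch_head. Every cycle contacts the remote even when nothing changed, which is the network cost described in the tradeoff above.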

Webhook

The git-sync relay fetches changes only when a push event fires from the Git repository. Use Webhook mode for repositories that don’t change frequently, to avoid unnecessary network traffic between your Deployment and the repository.

Tradeoff: If configured for a specific branch, the git-sync relay only downloads changes for that branch regardless of fetch mode. However, in Webhook mode, the webhook fires for every push event in the repository, not just for pushes to the configured branch. A busy repository can therefore still generate frequent webhook calls even when branch filtering is in place.
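To illustrate why every push still produces a webhook call, here is a minimal sketch of how a receiver might handle GitHub push events. It is an illustration under stated assumptions, not the git-sync relay's implementation: the function names are hypothetical. It relies on two documented GitHub behaviors: each delivery is signed with an HMAC-SHA256 of the request body using the webhook secret and sent in the X-Hub-Signature-256 header, and push payloads carry the pushed ref in the ref field.

```python
import hashlib
import hmac
import json


def signature_is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Check a GitHub-style X-Hub-Signature-256 header against the raw request body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected.encode(), header_value.encode())


def should_sync(secret: str, body: bytes, header_value: str, branch: str) -> bool:
    """Return True only for authentic push events that target `branch`."""
    if not signature_is_valid(secret, body, header_value):
        return False
    payload = json.loads(body)
    # Pushes to other branches still reach the receiver; they are dropped here.
    return payload.get("ref") == f"refs/heads/{branch}"
```

In this sketch the branch check happens only after the request has been received and verified, so filtering by branch reduces sync work but not the number of incoming webhook calls.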

Repo share mode

Repo share mode determines how the git-sync relay distributes the synced repository to Airflow pods in the Deployment. Choose one of the following options:

git-daemon

A git-daemon container serves the repository within the namespace using the Git protocol on port 9418. The Airflow Deployment contains a git-sync relay Pod with both a git-sync container that stores the Git repo and a git-daemon container that serves the repo to the namespace.

Tradeoff: All Airflow containers must clone the repository at startup, which can cause significant network use with large repositories and increase startup time.

Shared volume

The Git repository contents are stored on a ReadWriteMany (RWX) storage volume mounted into each Airflow pod, which eliminates git clone activity between pods. The git-sync relay Pod pulls from the external Git repo and writes to the RWX volume.

Requirement: An RWX-compatible StorageClass volume. RWX-compatible StorageClasses aren’t included in standard Kubernetes. You must provision additional cloud infrastructure to support RWX volumes, and the configuration steps differ between cloud providers. See your cloud provider’s documentation for details.

Prerequisites

To enable the git-sync deploy feature, you need:

  • An Astro Private Cloud installation running an OSS Airflow Chart (this is the default for most installations).
  • Permission to push new configuration changes to your Astro Private Cloud installation.
  • (Shared volume mode) A ReadWriteMany (RWX) compatible StorageClass volume. RWX-compatible StorageClasses require additional cloud infrastructure that varies between providers. See your cloud provider’s documentation for configuration steps.

To configure a git-sync deploy mechanism for a Deployment on Astro Private Cloud (APC), you need Workspace Editor permissions.

To deploy Dags to a Deployment using a git-sync deploy mechanism, you need permission to push code to a Git repository configured for git-sync deploys.

Enable git-sync

In both git-daemon and shared-volume modes, you must explicitly select git-sync deploys in the UI for each Airflow Deployment.

For shared-volume mode, an APC Admin must also set the name of the RWX storage class, storageClassName, in the Houston configuration.

For example, update your values.yaml file with the following values, including the name of your RWX-compatible storage class:

astronomer:
  houston:
    config:
      deployments:
        configureDagDeployment: true
        gitSyncDagDeployment: true
        gitSyncRelay:
          storageClassName: <your-RWX-storage>
          repoShareMode: "shared-volume"

Configure your APC Deployment

Workspace editors can configure a new or existing Airflow Deployment to use a git-sync mechanism for Dag deploys. From there, any member of your organization with write permissions to the Git repository can deploy Dags to the Deployment. To configure a Deployment for git-sync deploys:

  1. In the Astro Private Cloud UI, create a new Airflow Deployment or open an existing one.

  2. Go to the Dag Deployment section of the Deployment’s Settings page.

  3. For your Mechanism, select Git Sync.

  4. Configure the following values:

    • Repository URL: The URL for the Git repository that hosts your Astro project
    • Branch Name: The name of the Git branch that you want to sync with your Deployment
    • Sync Interval: The time interval between checks for updates in your Git repository, in seconds. A sync is only performed when an update is detected. Astronomer recommends a minimum interval of 60 seconds.
    • Dags Directory: The directory in your Git repository that hosts your Dags. Specify the directory’s path relative to the repository’s root directory. To use your root directory as your Dags directory, set this value to ./ (a dot followed by a slash). Changes outside the Dags directory in your Git repository must be deployed using astro deploy.
    • Rev: The commit reference of the branch that you want to sync with your Deployment
    • Ssh Key: The SSH private key for your Git repository
    • Known Hosts: The public key for your Git provider, which you can retrieve using ssh-keyscan -t rsa <provider-domain>. For an example of how to retrieve GitHub’s public key, refer to the Apache Airflow documentation.
    • Repo Fetch Mode: Choose Poll or WebHook. If you select WebHook, you need the Webhook URL and Webhook Secret Key to configure the webhook in your GitHub repository.
    • Webhook URL: (Webhook mode only) The URL that you paste into your repository webhook as the Payload URL.
    • Webhook Secret Key: (Webhook mode only) The secret that you paste into your repository webhook as the Secret.
    • Ephemeral Storage Overwrite Gigabytes: The storage limit for your Git repository. If your Git repo is larger than 2 GB, Astronomer recommends setting this slider to your repo size + 1 Gi.
    • Sync Timeout: The maximum number of seconds allowed for a sync. Astronomer recommends increasing this value if your repo is larger than 1 GB.

  5. (Webhook only) You can now open your GitHub repository and set up a repository webhook, or you can return to your Deployment details page to configure this later. Be sure to set the following configurations:

    • Payload URL: Paste the Webhook URL from the Astro Private Cloud UI.
    • Content Type: Select JSON.
    • Secret: Paste the Webhook Secret Key from the Astro Private Cloud UI.
    • Enable SSL verification.
    • Choose Just the push event for the event trigger.

  6. Save your changes.

Repo Share Mode - First deploy error

If you complete your Deployment configuration for git-sync and encounter an error during the first Deployment, you might need to force restart the Airflow Deployment at least once, several minutes after you initially create it. For example, you can add any new environment variable to your Deployment, like FOO=foo, to force the Deployment containers to restart.

After you see your Dags update in the Airflow UI, you can remove the environment variable.

After you configure your Deployment, any code pushed to the Dags directory of the specified Git repo and branch appears in your Deployment with zero downtime.

With the default configuration, newly created Dag files can take up to five minutes after syncing to appear in the Airflow UI. To shorten this delay, Astronomer recommends tuning AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL in your Airflow Deployment.

Configure a Git repo for git-sync deploys

The Git repo you want to sync should contain a directory of Dags that you want to deploy to APC. You can include additional files in the repo, such as your other Astro project files, but note that this might affect performance when deploying new changes to Dags.

If you want to deploy Dags from a private Git repo, you also need to configure SSH so that your APC Deployment can access the contents of the repo. This process varies slightly between Git repository management tools. For an example of this configuration, read GitLab’s SSH Key documentation.

Add Kubernetes scheduling configurations for git-sync relay

You can add Kubernetes scheduling configurations (tolerations, nodeSelector, and affinity) to your global git-sync relay configuration. These configurations allow you to:

  • Specify node selection criteria with nodeSelector
  • Configure pod affinity and anti-affinity rules
  • Set tolerations for tainted nodes

These settings can help you meet security or compliance requirements for workload isolation, optimize resource utilization by co-locating related components, and handle tainted nodes in mixed-use Kubernetes clusters. They are not required for git-sync relay functionality, so you only need to add nodeSelector, affinity, or tolerations to your configuration if you need specific node placement for your git-sync relay components.

helm:
  gitSyncRelay:
    nodeSelector:
      # Your node selection criteria
    tolerations:
      # Your toleration configurations
    affinity:
      nodeAffinity:
        # Your affinity rules