
Migrate to Astro from Google Cloud Composer

This guide explains how to migrate an Airflow environment from Google Cloud Composer (GCC) to Astro.

To complete the migration process, you will:

  • Prepare your source Airflow environment and set up your Astro Airflow environment.
  • Migrate metadata from your source Airflow environment.
  • Migrate DAGs and additional Airflow components from your source Airflow environment.
  • Complete the cutover process to Astro.

Prerequisites

Before starting the migration, ensure that the following are true:

On your local machine, make sure you have:

On the cloud service from which you're migrating, ensure that you have:

  • A source Airflow environment on Airflow 2 or later.
  • Read access to the source Airflow environment.
  • Read access to any cloud storage buckets that store your DAGs.
  • Read access to any source control tool that hosts your current Airflow code, such as GitHub.
  • Permission to create new repositories on your source control tool.
  • (Optional) Access to your secrets backend.
  • (Optional) Permission to create new CI/CD pipelines.

If your source Airflow environment runs Airflow 1.x, you must upgrade it to Airflow 2.0 or later before you can migrate it. Astronomer professional services can help you with the upgrade process.

info

If you're migrating to Astro from OSS Airflow or another Astronomer product and you currently use an older version of Airflow, you can still create Deployments with the corresponding version of Astro Runtime, even if that version is deprecated according to the Astro Runtime maintenance policy. This lets you migrate your DAGs to Astro without making any code changes and then upgrade to a newer version of Airflow. After you migrate your DAGs, Astronomer recommends upgrading to a supported version of Astro Runtime as soon as you can.

See Run a deprecated Astro Runtime version.

You can additionally use the gcloud CLI to expedite some steps in this guide.

Step 1: Install Astronomer Starship

The Astronomer Starship migration utility connects your source Airflow environment to your Astro Deployment and migrates your Airflow connections, Airflow variables, environment variables, and DAGs.

The Starship migration utility works as a plugin with a user interface, or as an Airflow operator if you are migrating from a more restricted Airflow environment.

See the following table for information on which versions of Starship are available depending on your source Airflow environment:

| Source Airflow environment | Starship plugin | Starship operator |
| --- | --- | --- |
| Airflow 1.x | | |
| Cloud Composer 1 - Airflow 2.x | ✔️ | |
| Cloud Composer 2 - Airflow 2.x | ✔️ | |

To install the Starship plugin on your Cloud Composer 1 or Cloud Composer 2 instance, install the astronomer-starship package in your source Airflow environment. See Install packages from PyPI.

tip

You can alternatively complete this installation with the gcloud CLI by running the following command:

gcloud composer environments update [GCC_ENVIRONMENT_NAME] \
--location [LOCATION] \
--update-pypi-package=astronomer-starship
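To confirm that the package is installed, you could run a short check inside the source environment, for example from a temporary task. The following is a minimal Python sketch that uses only the standard library. `installed_version` is a hypothetical helper and is not part of Starship or Cloud Composer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("astronomer-starship")
    if found is None:
        print("astronomer-starship is not installed in this environment")
    else:
        print(f"astronomer-starship {found} is installed")
```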

Step 2: Create an Astro Workspace

In your Astro Organization, you can create Workspaces, which are collections of users that have access to the same Deployments. Workspaces are typically owned by a single team.

You can choose to use an existing Workspace, or create a new one. However, you must have at least one Workspace to complete your migration.

  1. Follow the steps in Manage Workspaces to create a Workspace in the Astro UI for your migrated Airflow environments. Astronomer recommends naming your first Workspace after your data team or initial business use case with Airflow. You can update these names in the Astro UI after you finish the migration.

  2. Follow the steps in Manage Astro users to add users from your team to the Workspace. See Astro user permissions for details about each available Workspace user role.

You can also add users to a Workspace or an Organization using the Astro CLI. See:

You can also automate adding batches of users to Astro with shell scripts. See Add a group of users to Astro using the Astro CLI.

Step 3: Create an Astro Deployment

A Deployment is an Astro Runtime environment that is powered by the core components of Apache Airflow. In a Deployment, you can deploy and run DAGs, configure worker resources, and view metrics.

You can choose to use an existing Deployment, or create a new one. However, you must have at least one Deployment to complete your migration.

Before you create your Deployment, copy the following information from your source Airflow environment:

  • Environment name
  • Airflow version
  • Environment class or size
  • Number of schedulers
  • Minimum number of workers
  • Maximum number of workers
  • Execution role permissions
  • Airflow configurations
  • Environment variables

Alternative setup for Astro Hybrid

This setup varies slightly for Astro Hybrid users. See Deployment settings for all configurations related to Astro Hybrid Deployments.

  1. In the Astro UI, select a Workspace.

  2. On the Deployments page, click + Deployment.

  3. Complete the following fields:

    • Name: Enter the name of your source Airflow environment.
    • Astro Runtime: Select the Runtime version that corresponds to the Airflow version in your source Airflow environment. Use the following table to determine which version of Runtime to use. Where an exact version match isn't available, the table lists the nearest Runtime version with its supported Airflow version in parentheses.

    | Airflow version | Runtime version |
    | --- | --- |
    | 2.0 | 3.0.4 (Airflow 2.1.1)¹ |
    | 2.2 | 4.2.9 (Airflow 2.2.5) |
    | 2.4 | 6.3.0 (Airflow 2.4.3) |
info

¹The earliest available Airflow version on Astro Runtime is 2.1.1. There are no known risks for upgrading directly from Airflow 2.0 to Airflow 2.1.1 during migration. For a complete list of supported Airflow versions, see Astro Runtime release and lifecycle schedule.

  • Description: (Optional) Enter a description for your Deployment.
  • Cluster: Choose whether you want to run your Deployment in a Standard cluster or Dedicated cluster. If you don't have specific networking or cloud requirements, Astronomer recommends using the default Standard cluster configurations.

To configure and use dedicated clusters, see Create a dedicated cluster. If you can't choose between standard and dedicated clusters, you're an Astro Hybrid user and must choose a cluster that has been configured for your Organization. See Manage Hybrid clusters.

  • Executor: Choose the same executor as in your source Airflow environment.

  • Scheduler: Use the following table to determine the Deployment size you need based on the size of your source Airflow environment.

    | Environment size | Scheduler size | vCPU | Memory | Ephemeral storage |
    | --- | --- | --- | --- | --- |
    | Small (up to ~50 DAGs) | Small | 1 | 2 Gi | 5 Gi |
    | Medium (up to ~250 DAGs) | Medium | Scheduler: 1, DAG Processor: 1 | Scheduler: 2 Gi, DAG Processor: 2 Gi² | 5 Gi |
    | Large (up to ~1000 DAGs) | Large | Scheduler: 1, DAG Processor: 3 | Scheduler: 2 Gi, DAG Processor: 6 Gi² | 5 Gi |
    | Extra Large (up to ~2000 DAGs) | Extra-large | Scheduler: 1, DAG Processor (x2): 3.5 | Scheduler: 2 Gi, DAG Processor (x2): 6 Gi² | 5 Gi |
info

²Some of these CPU and memory recommendations might be lower than what you currently allocate to Airflow components in your source environment. If you notice significant performance differences, or if your Deployment on Astro parses DAGs more slowly than your source Airflow environment, adjust your resource use on Astro. See Configure Deployment resources.

  • Worker Type: Select the worker type for your default worker queue. See Worker queues.
  • Min / Max # Workers: Set the same minimum and maximum worker count as in your source Airflow environment.
  • KPO Pods: (Optional) If you use the KubernetesPodOperator or Kubernetes Executor, set limits on how many resources your tasks can request.
  4. Click Create Deployment.
  5. Specify any system-level environment variables as Astro environment variables. See Environment variables.
  6. Set an email address to receive alerts from Astronomer support about your Deployments. See Configure Deployment contact emails.
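If you're migrating several Composer environments, you can encode the Runtime and scheduler tables in this step as a small lookup. The following is a hypothetical Python helper, not an Astro tool. The function names are illustrative, and the helper encodes only the rows listed in those tables. For anything outside them, such as Airflow 2.3, it returns None; in that case, check the Astro Runtime release and lifecycle schedule. The helper treats the approximate DAG counts in the scheduler table as upper limits.

```python
from typing import Optional

# Source Airflow minor version -> Astro Runtime version, per the Runtime table in this step.
RUNTIME_FOR_AIRFLOW = {
    "2.0": "3.0.4",  # ships Airflow 2.1.1 (see footnote 1)
    "2.2": "4.2.9",  # ships Airflow 2.2.5
    "2.4": "6.3.0",  # ships Airflow 2.4.3
}

# (approximate maximum DAG count, scheduler size), per the scheduler table in this step.
SCHEDULER_SIZES = [
    (50, "Small"),
    (250, "Medium"),
    (1000, "Large"),
    (2000, "Extra-large"),
]


def runtime_for(airflow_version: str) -> Optional[str]:
    """Return the listed Runtime version for a source version such as '2.4.3'."""
    minor = ".".join(airflow_version.split(".")[:2])
    return RUNTIME_FOR_AIRFLOW.get(minor)


def scheduler_size_for(dag_count: int) -> Optional[str]:
    """Return the smallest listed scheduler size whose DAG range covers dag_count."""
    for max_dags, size in SCHEDULER_SIZES:
        if dag_count <= max_dags:
            return size
    return None  # more DAGs than the table covers


def dockerfile_from_line(runtime_version: str) -> str:
    """Build the first line of the Astro project Dockerfile for a Runtime version."""
    return f"FROM quay.io/astronomer/astro-runtime:{runtime_version}"


if __name__ == "__main__":
    runtime = runtime_for("2.4.3")
    print(runtime)  # 6.3.0
    print(scheduler_size_for(120))  # Medium
    print(dockerfile_from_line(runtime))
```

The Dockerfile line uses the same image format as the example in Step 6.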

Step 4: Use Starship to migrate Airflow connections and variables

You might have defined Airflow connections and variables in the following places on your source Airflow environment:

  • The Airflow UI (stored in the Airflow metadata database).
  • Environment variables.
  • A secrets backend.

If you defined your Airflow variables and connections in the Airflow UI, you can migrate them to Astro with Starship. To check which resources Starship can migrate, open the Airflow UI for your source Airflow environment and go to Admin > Variables and Admin > Connections.

warning

Some environment variables or Airflow Settings, like global environment variable values, can't be migrated to Astro. See Global environment variables for a list of variables that you can't migrate to Astro.
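Connections and variables defined as environment variables follow Airflow's `AIRFLOW_CONN_<CONN_ID>` and `AIRFLOW_VAR_<KEY>` naming convention. They aren't stored in the metadata database, so they don't appear under Admin > Connections or Admin > Variables. The following is a minimal Python sketch for listing them from a mapping of environment variable names to values. `env_defined_objects` is a hypothetical helper, and the sketch assumes you can obtain your Composer environment's variables as a dictionary. When you run it directly, it inspects the current process's environment.

```python
import os
from typing import List, Mapping, Tuple

CONN_PREFIX = "AIRFLOW_CONN_"
VAR_PREFIX = "AIRFLOW_VAR_"


def env_defined_objects(environ: Mapping[str, str]) -> Tuple[List[str], List[str]]:
    """Return (connection IDs, variable keys) defined through environment variables.

    IDs are lower-cased: Airflow upper-cases an ID when it looks up the
    environment variable, so the original casing can't be recovered.
    """
    conn_ids = sorted(
        name[len(CONN_PREFIX):].lower() for name in environ if name.startswith(CONN_PREFIX)
    )
    var_keys = sorted(
        name[len(VAR_PREFIX):].lower() for name in environ if name.startswith(VAR_PREFIX)
    )
    return conn_ids, var_keys


if __name__ == "__main__":
    conns, variables = env_defined_objects(os.environ)
    print("Connections defined as environment variables:", conns)
    print("Variables defined as environment variables:", variables)
```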

  1. Log in to Astro. In the Astro UI, open the Deployment you're migrating to.

  2. Click Open Airflow to open the Airflow UI for the Deployment. Copy the URL for the home page. It should look similar to https://<your-organization>.astronomer.run/<id>/home.

  3. Create a Deployment API token for the Deployment. At minimum, the token needs permissions to update the Deployment and deploy code. Copy this token. See Create and manage Deployment API tokens for additional setup steps.

  4. Open the Airflow UI for your source Airflow environment, then go to Astronomer > Migration Tool 🚀.


  5. Ensure that the Astronomer Product toggle is set to Astro.

  6. In the Airflow URL section, fill in the fields so that the complete URL on the page matches the URL of the Airflow UI for the Deployment you're migrating to.

  7. Specify your API token in the Token field. Starship will confirm that it has access to your Deployment.

  8. Click Connections. In the table that appears, click Migrate for each connection that you want to migrate to Astro. After the migration is complete, the status Migrated ✅ appears.

  9. Click Pools. In the table that appears, click Migrate for each pool that you want to migrate to Astro. After the migration is complete, the status Migrated ✅ appears.

  10. Click Variables. In the table that appears, click Migrate for each variable that you want to migrate to Astro. After the migration is complete, the status Migrated ✅ appears.

  11. Click Environment variables. In the table that appears, check the box for each environment variable that you want to migrate to Astro, then click Migrate. After the migration is complete, the status Migrated ✅ appears.

  12. Click DAG History. In the table that appears, check the box for each DAG whose history you want to migrate to Astro, then click Migrate. After the migration is complete, the status Migrated ✅ appears.
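After Starship reports Migrated ✅, you might want to double-check the result from a script, especially if you migrated many objects. The following is a minimal Python sketch. It assumes that your Deployment exposes the Airflow 2 stable REST API under `<Deployment URL>/api/v1` and accepts the Deployment API token from step 3 as a bearer token. The environment variable names `ASTRO_HOME_URL` and `ASTRO_API_TOKEN` are hypothetical. The sketch reads only the first page of results, so if you have more than 100 objects of one type, add pagination with the API's `offset` parameter.

```python
import json
import os
import urllib.request
from typing import List


def api_base_from_home_url(home_url: str) -> str:
    """Turn the Deployment's Airflow home URL into its REST API base URL."""
    base = home_url.rstrip("/")
    if base.endswith("/home"):
        base = base[: -len("/home")]
    return base + "/api/v1"


def list_ids(api_base: str, token: str, collection: str, id_field: str) -> List[str]:
    """Fetch the first page (up to 100 items) of a collection and return its IDs."""
    request = urllib.request.Request(
        f"{api_base}/{collection}?limit=100",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item[id_field] for item in payload[collection]]


if __name__ == "__main__":
    # Hypothetical variable names: set them to your Deployment's home URL and API token.
    home_url = os.environ.get("ASTRO_HOME_URL")
    token = os.environ.get("ASTRO_API_TOKEN")
    if home_url and token:
        api_base = api_base_from_home_url(home_url)
        print("Connections:", list_ids(api_base, token, "connections", "connection_id"))
        print("Variables:", list_ids(api_base, token, "variables", "key"))
        print("Pools:", list_ids(api_base, token, "pools", "name"))
```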

Step 5: Create an Astro project

  1. Create a new directory for your Astro project:

    mkdir <your-astro-project-name>
  2. Open the directory:

    cd <your-astro-project-name>
  3. Run the following Astro CLI command to initialize an Astro project in the directory:

    astro dev init

    This command generates a set of files that will build into a Docker image that you can both run on your local machine and deploy to Astro.

  4. Add the following line to your Astro project requirements.txt file:

    astronomer-starship

    When you deploy your code, this line installs the Starship migration tool on your Deployment so that you can migrate Airflow resources from your source environment to Astro.

  5. (Optional) Run the following command to initialize a new git repository for your Astro project:

    git init

Step 6: Migrate project code and dependencies to your Astro project

  1. Open your Astro project Dockerfile. Update the Runtime version in the first line to the version you selected for your Deployment in Step 3. For example, if your Runtime version is 6.3.0, your Dockerfile would look like the following:

    FROM quay.io/astronomer/astro-runtime:6.3.0

    The Dockerfile defines the environment where all your Airflow components run. You can modify it to include build-time arguments for your Airflow environment, such as environment variables or credentials. For this migration, you only need to modify the Dockerfile to update your Astro Runtime version.

  2. Open your Astro project requirements.txt file and add all Python packages that you installed in your source Airflow environment. See Google documentation for how to view a list of Python packages in your source Airflow environment.

    warning

    To avoid breaking dependency upgrades, Astronomer recommends pinning your packages to the versions running in your source Airflow environment. For example, if you're running apache-airflow-providers-snowflake version 3.3.0 on Cloud Composer, add apache-airflow-providers-snowflake==3.3.0 to your Astro requirements.txt file.

  3. Open your Astro project dags folder. Copy your DAG files to dags from either your source control platform or your GCS bucket.

  4. If you used the plugins folder in your Cloud Composer storage bucket, copy the contents of this folder from your source control platform or GCS bucket to your Astro project plugins folder.

  5. If you used the data folder in Cloud Composer, copy the contents of that folder from your source control platform or GCS bucket to your Astro project include folder.
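If you've downloaded your Cloud Composer bucket locally (for example, with `gsutil -m cp -r gs://<your-bucket>/dags .`), you can script steps 3 to 5 and the pinning advice. The following is a minimal Python sketch. It assumes that the local copy keeps the bucket's `dags`, `plugins`, and `data` folders, and that you've saved `pip freeze` output from your source environment as text. The function names are illustrative only.

```python
import re
import shutil
from pathlib import Path
from typing import Iterable, List

# Cloud Composer bucket folder -> Astro project folder, as in steps 3 to 5.
FOLDER_MAP = {"dags": "dags", "plugins": "plugins", "data": "include"}


def normalize(name: str) -> str:
    """Normalize a distribution name (PEP 503) so 'Foo_Bar' matches 'foo-bar'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def copy_composer_folders(bucket_copy: Path, astro_project: Path) -> List[str]:
    """Copy each Composer folder that exists into its Astro project counterpart."""
    copied = []
    for source_name, target_name in FOLDER_MAP.items():
        source = bucket_copy / source_name
        if source.is_dir():
            shutil.copytree(source, astro_project / target_name, dirs_exist_ok=True)
            copied.append(source_name)
    return copied


def pinned_requirements(freeze_output: str, wanted: Iterable[str]) -> List[str]:
    """Keep 'name==version' lines from pip freeze output for the packages you installed."""
    wanted_names = {normalize(name) for name in wanted}
    pins = []
    for line in freeze_output.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep and normalize(name) in wanted_names:
            pins.append(f"{name}=={version}")
    return pins


def append_requirements(astro_project: Path, pins: List[str]) -> None:
    """Add pins to requirements.txt, skipping lines that are already present."""
    path = astro_project / "requirements.txt"
    lines = path.read_text().splitlines() if path.exists() else []
    lines += [pin for pin in pins if pin not in lines]
    path.write_text("\n".join(lines) + "\n")
```

The sketch keeps only exact `==` pins. Lines such as editable installs or `@ file://` references are skipped, so review the resulting requirements.txt before you deploy.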

After you confirm that your Astro project has all necessary dependencies, deploy the project to your Astro Deployment.

  1. Run the following command to authenticate to Astro:

    astro login
  2. Run the following command to deploy your project:

    astro deploy

    This command returns a list of Deployments available in your Workspace and prompts you to pick one.

Step 7: Configure additional data pipeline infrastructure

The core migration of your project is now complete. Read the following topics to see whether you need to set up any additional infrastructure on Astro before cutting over your DAGs.

Set up CI/CD

If you used CI/CD to deploy code to your source Airflow environment, read the following documentation to learn about setting up a similar CI/CD pipeline for your Astro project:

As with GCC, you can deploy DAGs to Astro directly from a Google Cloud Storage (GCS) bucket. See Deploy DAGs from Google Cloud Storage to Astro.

Set up a secrets backend

If you currently store Airflow variables or connections in a secrets backend, you need to integrate your secrets backend with Astro to access those objects from your migrated DAGs. See Configure a Secrets Backend for setup steps.

Instance permissions and trust policies

You can use Workload Identity or service account keys to grant your Astro Deployment the same level of access to Google Cloud services as your source Airflow environment. See Connect GCP - Authorization options.

Step 8: Test locally and check for import errors

Depending on how thoroughly you want to test your Airflow environment, you have a few options for testing your project locally before deploying to Astro.

  • In your Astro project directory, run astro dev parse to check for any parsing errors in your DAGs.
  • Run astro run <dag-id> to test a specific DAG. This command compiles your DAG and runs it in a single Airflow worker container based on your Astro project configurations.
  • Run astro dev start to start a complete Airflow environment on your local machine. After your project starts up, you can access the Airflow UI at localhost:8080. See Troubleshoot your local Airflow environment.

Note that your migrated Airflow variables and connections are not available locally. You must deploy your project to Astro to test these resources.
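Before you run `astro dev parse`, you can catch plain Python syntax errors in bulk with the standard library. The following minimal sketch is not a replacement for `astro dev parse`. It only parses files with `ast`, so it won't detect missing packages, failing imports, or Airflow-specific errors. The sketch assumes you run it from your Astro project directory.

```python
import ast
from pathlib import Path
from typing import Dict


def syntax_errors(dags_dir: Path) -> Dict[str, str]:
    """Return {relative path: message} for .py files under dags_dir that fail to parse."""
    errors: Dict[str, str] = {}
    for path in sorted(dags_dir.rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path.relative_to(dags_dir))] = f"line {exc.lineno}: {exc.msg}"
    return errors


if __name__ == "__main__":
    dags_dir = Path("dags")  # assumes the Astro project directory is the working directory
    if dags_dir.is_dir():
        problems = syntax_errors(dags_dir)
        for name, message in problems.items():
            print(f"{name}: {message}")
        print(f"{len(problems)} file(s) with syntax errors")
```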

Step 9: Deploy to Astro

  1. Run the following command to authenticate to Astro:

    astro login
  2. Run the following command to deploy your project:

    astro deploy

    This command returns a list of Deployments available in your Workspace and prompts you to pick one.

  3. In the Astro UI, open your Deployment and click Open Airflow. Confirm that you can see your deployed DAGs in the Airflow UI.

Step 10: Cut over from your source Airflow environment to Astro

After you successfully deploy your code to Astro, you need to migrate your workloads from your source Airflow environment to Astro on a DAG-by-DAG basis. Depending on how your workloads are set up, Astronomer recommends letting DAG owners determine the order to migrate and test DAGs.

You can complete the following steps in the few days or weeks following your migration setup. Keep your Astronomer Data Engineer updated as they continue to assist you through the process and solve any difficulties that arise.

Continue to validate and move your DAGs until you have fully cut over your source Airflow instance. After you finish migrating from your source Airflow environment, repeat the complete migration process for any other Airflow instances in your source Airflow environment.

Confirm connections and variables

In the Airflow UI for your Deployment, test all connections that you migrated from your source Airflow environment.

Additionally, check Airflow variable values in Admin > Variables.
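With many variables, comparing keys and values in a script can be faster than reading Admin > Variables. The following is a minimal Python sketch. It assumes you've exported each environment's variables to a JSON object of key/value pairs, for example with the Airflow CLI's `airflow variables export` command wherever you can run it. The file names in the usage comment are hypothetical. The sketch reports only keys so that secret values aren't printed.

```python
import json
import sys
from typing import Dict, List, Mapping


def compare_variables(source: Mapping[str, object], astro: Mapping[str, object]) -> Dict[str, List[str]]:
    """Compare two key/value mappings and report keys only, never values."""
    common = set(source) & set(astro)
    return {
        "missing_on_astro": sorted(set(source) - set(astro)),
        "extra_on_astro": sorted(set(astro) - set(source)),
        "different_value": sorted(key for key in common if source[key] != astro[key]),
    }


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare_variables.py source.json astro.json
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_source, open(sys.argv[2]) as f_astro:
            print(compare_variables(json.load(f_source), json.load(f_astro)))
```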

Test and validate DAGs in Astro

To create a strategy for testing DAGs, determine which DAGs need the most care when running and testing them.

If your DAG workflow is idempotent and can run twice or more without negative effects, you can run and test these DAGs with minimal risk. If your DAG workflow is non-idempotent and can become invalid when you rerun it, you should test the DAG with more caution and downtime.
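The following toy example compares a load that appends rows with one that overwrites the partition for its date. It is plain Python with in-memory stand-ins for a table, not Airflow code, and the function names are hypothetical. In a real DAG, the idempotent pattern usually means keying writes on the run's data interval, for example with an upsert or a partition overwrite.

```python
from datetime import date
from typing import Dict, List


def append_load(table: List[dict], day: date, rows: List[dict]) -> None:
    """Non-idempotent: every run appends, so a rerun duplicates that day's rows."""
    table.extend({**row, "day": day} for row in rows)


def overwrite_partition(table: Dict[date, List[dict]], day: date, rows: List[dict]) -> None:
    """Idempotent: each run replaces the partition for its day, so reruns are harmless."""
    table[day] = [dict(row) for row in rows]


if __name__ == "__main__":
    day = date(2024, 1, 1)
    rows = [{"id": 1}, {"id": 2}]
    appended: List[dict] = []
    partitioned: Dict[date, List[dict]] = {}
    for _ in range(2):  # simulate a rerun of the same DAG run
        append_load(appended, day, rows)
        overwrite_partition(partitioned, day, rows)
    print(len(appended))  # 4: duplicated by the rerun
    print(len(partitioned[day]))  # 2: unchanged by the rerun
```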

Cut over DAGs to Astro using Starship

Starship includes features for simultaneously pausing DAGs in your source Airflow environment and starting them on Astro. This allows you to cut over your production workflows without downtime.

For each DAG in your Astro Deployment:

  1. Confirm that the DAG ID in your Deployment is the same as the DAG ID in your source Airflow environment.

  2. In the Airflow UI for your source Airflow environment, go to Astronomer > Migration Tool 🚀.

  3. Click DAGs cutover. In the table that appears, click the Pause icon in the Local column for the DAG you're cutting over.

  4. Click the Start icon in the Remote column for the DAG you're cutting over.

  5. After completing this cutover, the Start and Pause icons switch. If there's an issue after cutting over, click the Remote pause button and then the Local start button to move your workflow back to your source Airflow environment.
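Starship's DAGs cutover page is the documented path. You might also want to script the Astro side, for example to pause many DAGs at once during a rollback. The Airflow 2 stable REST API has a `PATCH /api/v1/dags/{dag_id}` endpoint that sets `is_paused`. The following sketch is only an illustration and makes two assumptions: your Deployment accepts a Deployment API token as a bearer token, and `api_base` is your Deployment URL followed by `/api/v1` (for example, `https://<your-organization>.astronomer.run/<id>/api/v1`). Requests to the source Cloud Composer environment need Google Cloud authentication, which this sketch doesn't handle.

```python
import json
import urllib.parse
import urllib.request


def set_paused_request(api_base: str, dag_id: str, paused: bool, token: str) -> urllib.request.Request:
    """Build a PATCH request that pauses (True) or unpauses (False) one DAG."""
    url = f"{api_base}/dags/{urllib.parse.quote(dag_id, safe='')}?update_mask=is_paused"
    return urllib.request.Request(
        url,
        data=json.dumps({"is_paused": paused}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )


def set_paused(api_base: str, dag_id: str, paused: bool, token: str) -> bool:
    """Send the request and return the is_paused value that Airflow reports back."""
    with urllib.request.urlopen(set_paused_request(api_base, dag_id, paused, token)) as response:
        return json.load(response)["is_paused"]
```

Keep the order from the steps above: pause the DAG in the source environment before you unpause it on Astro, so that the two environments never schedule the same DAG at the same time.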

Optimize Deployment resource usage

Review DAG development features

Astro includes several features that enhance the Apache Airflow development experience, from DAG writing to testing. To make the most of these features, you might want to adjust your existing DAG development workflows.

As you get started on Astro, review the list of features and changes that Astro brings to the Airflow development experience and consider how you want to implement these details in your development experience. See Write and run DAGs on Astro.

Monitor analytics

As you cut over DAGs, view Deployment metrics to get a sense of how many resources your Deployment is using. Use this information to adjust your worker queues and resource usage accordingly, or to tell when a DAG isn't running as expected.

Modify instance types or use worker queues

If your current worker type doesn't have the right amount of resources for your workflows, see Deployment settings to learn about configuring worker types on your Deployments.

You can additionally configure worker queues to assign each of your tasks to different worker instance types. View your Deployment metrics to help you determine what changes are required.

Enable DAG-only deploys

Deploying to Astro with DAG-only deploys enabled can make deploys faster in cases where you've only modified your dags directory. To enable the DAG-only deploy feature, see Deploy DAGs only.
