Join us for upcoming online events!
Data-Aware Scheduling with the Astro Python SDK
Live with Astronomer dives into implementing data-aware scheduling with the Astro Python SDK. The new Airflow Datasets feature allows you to schedule DAGs based on updates to your data and easily view cross-DAG relationships. Dataset support is built into the Astro Python SDK, so it requires almost no effort from the DAG author to implement. We'll show you everything you need to do (and don't need to do) to take advantage of Datasets.
Running Airflow Tasks in Isolated Environments
Running tasks in a separate environment can help you avoid common data pipeline issues, like dependency conflicts or out-of-memory errors, and it can save resources. Airflow DAG authors have multiple options for running tasks in isolated environments. In this webinar, we'll cover everything you need to know.
How to Migrate from Oozie to Airflow: A Guided Walkthrough
Migrating between orchestrators can be a difficult process fraught with technical and organizational hurdles. However, the end result of applying Airflow’s orchestration capabilities is worth the effort, and working with the right partner can make this journey much easier. In this webinar, we’ll cover everything you need to know about migrating from Oozie to Airflow.
The New DAG Schedule Parameter
Live with Astronomer will discuss the new consolidated `schedule` parameter introduced in Airflow 2.4. We’ll provide a quick refresher of scheduling concepts and discuss how scheduling DAGs is easier and more powerful in newer versions of Airflow.
Batch Inference with Airflow and SageMaker
We’ll walk through using the new SageMaker Async Operators, as well as the new SageMaker OpenLineage integration, for end-to-end MLOps for batch inference use cases.
Dynamic Task Mapping on Multiple Parameters
On October 25, Live with Astronomer will dive into updates to the dynamic task mapping feature released in Airflow 2.4. We’ll show a couple of new methods for mapping over multiple parameters, and discuss how to choose the best mapping method for your use case.
Dynamic Tasks in Airflow
With the releases of Airflow 2.3 and 2.4, users can write DAGs that dynamically generate parallel tasks at runtime. In this webinar, we’ll cover everything you need to know to implement dynamic tasks in your DAGs.
Data Driven Scheduling
In this session, Live with Astronomer explores the new datasets feature introduced in Airflow 2.4. We’ll show how DAGs that access the same data now have explicit, visible relationships, and how DAGs can be scheduled based on updates to these datasets.
What’s New in Airflow 2.4
In this webinar, we’ll cover all the great 2.4 updates that make Airflow even more powerful and observable.
A Deep Dive into the Airflow UI
In this webinar, we’ll take an in-depth tour of the Airflow UI and cover the many features that users may not be aware of.
Data Transformations with the Astro Python SDK
On September 13, Live with Astronomer will dive into implementing data transformations with the Astro Python SDK. The Astro Python SDK is an open source Python package that allows for clean and rapid development of ELT workflows. We’ll show how you can use the transform and dataframe functions to easily transform your data using Python or SQL and seamlessly transition between the two.
Implementing Data Quality Checks in Airflow
Executing SQL queries — one of the most common use cases for data pipelines — is a simple way to implement data quality checks. In this webinar, we’ll cover everything you need to know about using SQL for data quality checks.
The Astro Python SDK Load File Function
The next Live with Astronomer will dive into the Astro Python SDK load_file function. The Astro Python SDK is an open source Python package that allows for clean and rapid development of ELT workflows. We’ll show how you can use load_file for the ‘Extract’ step of your pipeline to easily get data from your filesystems into your data warehouse, without any operator-specific knowledge.
The Astro Python SDK
Astronomer is excited to announce the release of the Astro Python SDK version 1.0. The Astro Python SDK is an open source tool powered by Airflow and maintained by Astronomer that allows for rapid and clean development of ELT workflows using Python.
The SQL Table Check Operator
In this session we’ll dive into the new Common SQL provider package and show how to use the SQLTableCheckOperator. We’ll show how you can easily use this operator to implement data quality checks in your DAGs, ensuring that errant data never makes it to production.
Airflow 101: Essential Tips For Beginners
What is Airflow? Apache Airflow is a platform used to programmatically author, schedule, and monitor data pipelines.
The SQL Column Check Operator
In this session we’ll show how you can easily use the SQLColumnCheckOperator to implement data quality checks in your DAGs, ensuring that errant data never makes it to production.
Using Airflow with TensorFlow and MLflow
We’ll delve further into how Airflow can be integrated with TensorFlow and MLflow specifically to manage ML pipelines in production, using a worked example to demonstrate.
Reusable DAG Patterns with TaskGroups
In this session we’ll show how Astronomer’s data and intelligence team uses TaskGroups to reduce the amount of code the team has to write while adhering to DAG authoring best practices.
Anatomy of an Operator
Operators are the building blocks of Apache Airflow. In this webinar we’ll look under the hood, covering everything you need to know about operators to tailor them for your use cases.
Using the Snowflake Deferrable Operator
Live with Astronomer will dive into using the Snowflake Deferrable Operator. We’ll show how with a very small update to your DAGs, you can start saving money when orchestrating your Snowflake queries with Airflow.
Writing Functional DAGs with Decorators
In this webinar, we’ll demystify decorators and show you everything you need to know to start using decorators in your DAGs.
The Python Task Decorator
Live with Astronomer will dive into the Python task decorator. We’ll show how to easily turn your Python functions into tasks in your DAG using functional programming, and how using the Python task decorator can limit the boilerplate code needed in your DAGs.
ML in Production with Airflow
Although often regarded as a data engineering and pipelining tool, Airflow is also wildly popular among machine learning teams. In this webinar, we’ll dive into how Airflow can consolidate various ML tools into dependable production systems.
Airflow Grid View
Live with Astronomer will dive into the new Airflow Grid View introduced in Airflow 2.3. The Grid View replaces the Tree View, and provides significant insight into DAG and task run history. We’ll cover all the Grid View features, and touch on the possibilities this view opens up for future Airflow feature development.
Intro: Getting Started with Airflow
What is Airflow? Apache Airflow is a platform used to programmatically author, schedule, and monitor data pipelines.
Dynamic Task Mapping
On May 24, Live with Astronomer will dive into the Dynamic Task Mapping feature introduced in Airflow 2.3. We’ll show how to easily add dynamic tasks to your DAGs, and discuss ways to make the best use of this feature.
Deferrable Operators in the Astronomer Providers Repository
This webinar will dive into the Astronomer Providers repository, which includes Airflow providers containing deferrable operators and sensors created by Astronomer. We’ll go beyond the basics to look at key implementation details and best practices.
What’s New in Airflow 2.3
The Airflow project is rapidly evolving, with frequent releases bringing advancements in DAG authoring, observability, and project stability. We’re super excited for the release of Airflow 2.3, which comes with big changes in the flexibility of DAG creation, improvements to the Airflow UI, and much more.
What is OpenLineage
Live with Astronomer will dive into the OpenLineage Airflow integration. We'll cover the basics for getting started with OpenLineage, how lineage helps with everything from recovering from failures to widespread data governance, and additional topics that will be essential as you begin your journey.
Using Airflow as a Data Analyst
Airflow is sometimes thought of as primarily a data engineering tool, but its use cases are much broader. A data analyst’s workflow typically involves ingesting and transforming data to extract insights, then presenting the insights in a manner that allows business stakeholders to easily interpret trends and take appropriate action. Airflow’s ease of use and extensive provider ecosystem make it an ideal tool for orchestrating such analytics workflows.
OpenLineage and Airflow: A Deeper Dive
Data lineage is the complex set of relationships between your jobs and datasets. Using OpenLineage with Apache Airflow, you can observe and analyze these relationships, allowing you to find and fix issues more quickly. This webinar will provide a deeper dive on OpenLineage, extending beyond the basics into key implementation details and best practices.
Improve Your DAGs with Hidden Airflow Features
Apache Airflow is flexible and powerful. It has a rich ecosystem and an incredibly active community. But are you sure you haven’t missed anything? A new feature or concept that could take your DAGs to another level? It can be challenging to keep up with the latest Airflow features, and sometimes we miss the most useful ones. In this webinar, I'd like to introduce you to a couple of lesser-known features of Apache Airflow that can dramatically improve your data pipelines.
Scaling Out Airflow
Airflow is purpose-built for high-scale workloads and high availability on a distributed platform. Since the advent of Airflow 2.0, there are even more tools and features to ensure that Airflow can be scaled to accommodate high-throughput, data-intensive workloads. In this webinar, Alex Kennedy will discuss the process of scaling out Airflow using the Celery and Kubernetes executors, including the parameters that need to be tuned when adding nodes and how to decide when it makes sense to scale Airflow horizontally or vertically. Consistent, aggregated logging is key when scaling Airflow, so we will also briefly discuss best practices for logging on a distributed Airflow platform, as well as the pitfalls many Airflow users encounter when designing and building their distributed Airflow platforms.
Data Quality Use Cases with Airflow and Great Expectations
In this webinar, Benji Lampel (Enterprise Platform Architect @ Astronomer) and Tal Gluck (Software Engineer @ Superconductive) will present several Airflow DAGs using Great Expectations that cover more advanced DAG patterns and data quality checking cases.
The Airflow API
Did you know that Airflow has a fully stable REST API? In this webinar, we’ll cover how to use the API, and why it’s a great tool in your Airflow toolbox for managing and monitoring your data pipelines.
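For example, triggering a DAG run programmatically through the stable REST API might look like this (the webserver URL, credentials, and DAG ID are illustrative; basic auth is assumed to be enabled):

```python
import base64
import json
import urllib.request

# Illustrative local webserver URL and basic-auth credentials.
AIRFLOW_URL = "http://localhost:8080/api/v1"
TOKEN = base64.b64encode(b"admin:admin").decode()

def trigger_dag_run(dag_id: str) -> urllib.request.Request:
    """Build a POST to the stable REST API's dagRuns endpoint for `dag_id`."""
    return urllib.request.Request(
        f"{AIRFLOW_URL}/dags/{dag_id}/dagRuns",
        data=json.dumps({"conf": {}}).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {TOKEN}",
        },
    )

# urllib.request.urlopen(trigger_dag_run("example_dag"))  # needs a running webserver
```

The same endpoint family also covers listing DAGs, pausing them, and inspecting task instance states, which makes the API handy for monitoring as well as management.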
Introducing Astro: Data Centric DAG Authoring
With Airflow 2.0, we introduced the concept of providers. We’re taking that to the next level with Astro, a new DAG writing experience, brought to you by Astronomer.
Data Lineage with OpenLineage and Airflow
If one out of your hundreds of DAGs fails, how do you know which downstream datasets have become out-of-date? The answer is data lineage. Data lineage is the complex set of relationships between your jobs and datasets. In this webinar, you'll learn how to use OpenLineage to collect lineage metadata from Airflow and assemble a lineage graph: a picture of your pipeline worth far more than a thousand words.
Best Practices for Writing DAGs in Airflow 2
Because Airflow is 100% code, knowing the basics of Python is all it takes to get started writing DAGs. However, writing DAGs that are efficient, secure, and scalable requires some Airflow-specific finesse. In this webinar, you’ll learn the best practices for writing DAGs that will ensure you get the most out of Airflow. We’ll include a reference repo with DAGs you can run yourself with the Astro CLI.
Iterative Data Quality in Airflow DAGs
Data quality is an often overlooked component of data pipelines. Learn why it is a valuable part of data systems and how to get started integrating data quality checks into existing pipelines with a variety of tools.
Intro To Data Orchestration With Airflow
What is Airflow? Apache Airflow is a platform used to programmatically author, schedule, and monitor data pipelines.
Scheduling In Airflow
The flexibility and freedom that Airflow offers you are incredible, but to really take advantage of them you need to master some concepts first, one of which has just been released in Airflow 2.2. By the end of the webinar, you will be able to define schedule intervals that you thought were impossible before.
Everything you Need to Know About Airflow 2.2
In this informative webinar, we will cover everything you need to know about Airflow 2.2. We'll go through all of the new features, large and small, and show you how to leverage them to get cleaner and more efficient DAGs as a result.
Testing Airflow to Bullet Proof Your Code
Airflow, by nature, is an orchestration framework, not a data processing framework. At first sight, it can be unclear how to test Airflow code. Are you triggering DAGs in the UI to validate your Airflow code? In this webinar, we'll demonstrate various examples of how to test Airflow code and integrate tests into a CI/CD pipeline, so that you're certain your code works before deploying to production.
Manage Dependencies Between Airflow Deployments, DAGs, and Tasks
More often than not, your Airflow components will have a desired order of execution, particularly if you are performing a traditional ETL process. For example, before the Transform step in ETL, Extraction must have happened in an upstream pipeline. In this webinar, we will discuss how to properly set up dependencies and define an order of execution for your pipelines.
Create Powerful Data Pipelines by Mastering Sensors
Do you use Sensors in your data pipelines? Do you need to wait for a file before executing the next step? Are you looking to execute your task after a task completes in another DAG? Would you like to wait for an import in your SQL table before executing the next task? The answer...
Using Airflow with Azure Data Factory
While Airflow and ADF (Azure Data Factory) have pros and cons, they can be used in tandem for data pipelines across your organization. In this webinar, we’ll cover how using the two together can really get you the best of both worlds!
Monitor Your DAGs with Airflow Notifications
Anytime you’re running business critical pipelines, you need to know when something goes wrong. Airflow has a built in notification system that can be used to throw alerts when your DAGs fail, succeed, or anything in between. In this webinar, we’ll do a deep dive into how you can customize your notifications in Airflow to meet your needs.
Intro to Airflow for ETL With Snowflake
ETL is one of the most common data engineering use cases, and it's one where Airflow really shines. In this webinar, we'll cover everything you need to get started as a new Airflow user, and dive into how to implement ETL pipelines as Airflow DAGs.
Getting Started With the Official Airflow Helm Chart
The official Helm chart for Apache Airflow is out! The days of wondering which Helm chart to use in production are over. Now, there is one chart maintained and tested by Airflow PMC members as well as the community. It’s time to get your hands on it and take it for a spin! By the end of the webinar, you will have a fully functional Airflow instance deployed with the official Helm chart, running in a local Kubernetes cluster.
Dynamically Generating DAGs in Airflow
In this webinar, we'll talk about when you might want to dynamically generate your DAGs, show a couple of methods for doing so, and discuss problems that can arise when implementing dynamic generation at scale.
Using Airflow with Multiple AWS Accounts
In AWS, it's common for organizations to use multiple AWS accounts for various reasons, from Dev, Stage, and Prod accounts to accounts dedicated to individual lines of business. What do you do when your data pipeline needs to span AWS accounts? This webinar will show how you can run a single DAG across multiple AWS accounts in a secure manner.
Intro to Airflow
Learn about the core concepts, components, and benefits of working with Airflow. Watch this Intro to Airflow webinar today!
Airflow 2.0 + Kubernetes
Learn more about using Airflow 2.0 with Kubernetes.
Airflow 2.0 Providers
Learn everything about Airflow 2.0 providers including what defines a provider, how to create your own provider, and customizing provider packages.
TaskFlow API in Airflow 2.0
Watch the webinar recap and learn how the TaskFlow API can help simplify DAGs that make heavy use of Python tasks and XComs.
Secrets Management in Airflow 2.0
Watch the webinar recording to learn the best practices for managing secrets with various backends in Apache Airflow 2.0.
DAG Writing Best Practices in Apache Airflow
Learn the best practices for writing DAGs in Apache Airflow with a repo of example DAGs that you can run with the Astro CLI.