Airflow in Action: Modernizing Legacy Data Systems for AI and Analytics With Procter & Gamble

In this recent episode of Astronomer’s Data Flowcast, Adonis Castillo Cordero, Senior Automation Manager at Procter & Gamble, shares how his team modernizes legacy data systems and powers AI workloads across a global enterprise. Listeners will learn how Apache Airflow® fits into P&G’s data architecture, how it works alongside technologies like Apache Spark® and Apache Kafka®, and the key practices his team follows to ensure scalable, reliable data pipelines.

From Carbon Layer to AI: P&G’s Data Pipeline With a Touch of Airflow Magic

Adonis leads one of several engineering teams at P&G focused on managing complex master data pipelines. His team plays a pivotal role in helping modernize legacy systems that still power many mission-critical functions—particularly supporting operational workflows where near real-time monitoring is required. P&G’s pipelines consolidate raw data from a variety of sources into a modern medallion architecture, starting from what Adonis calls a “carbon layer”—even earlier than bronze. From there, the team transforms, enriches, and stores data to support AI and analytics workloads across the business.

Apache Airflow serves as the orchestration layer for this architecture. It connects disparate systems—whether managing workflows that stream events via Kafka, or processing batch transformations using Spark—and ensures each stage of the pipeline runs reliably and in the right order.
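To make that pattern concrete, here is a minimal sketch of a DAG that hands a batch transformation off to Spark and then signals downstream consumers that curated data is ready. It assumes the apache-airflow-providers-apache-spark package and a configured spark_default connection; the DAG ID, job path, and readiness callback are illustrative placeholders, not P&G's actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def publish_readiness_event(**context):
    # Placeholder for signaling downstream consumers (for example, by producing
    # an event to a Kafka topic) that curated data is ready for this run.
    print(f"Curated data ready for logical date {context['ds']}")


with DAG(
    dag_id="master_data_refresh",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
):
    # Batch transformation handed off to Spark via the Spark provider.
    enrich_with_spark = SparkSubmitOperator(
        task_id="enrich_with_spark",
        conn_id="spark_default",
        application="/opt/jobs/enrich_master_data.py",  # hypothetical job path
    )

    announce_ready = PythonOperator(
        task_id="announce_ready",
        python_callable=publish_readiness_event,
    )

    # Airflow guarantees the ordering: enrichment finishes before the event fires.
    enrich_with_spark >> announce_ready
```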

Airflow’s flexibility allows P&G to work across heterogeneous cloud environments and on-premise data centers without imposing uniformity too early in the data lifecycle. Airflow makes it possible to standardize where it matters most: orchestration, visibility, and downstream data readiness.

In the Data Flowcast, Adonis shared how Procter & Gamble uses Apache Airflow to modernize data systems, power AI, and streamline legacy workflows at global scale.

From Legacy to Lift-Off: Four Pipeline Best Practices

Adonis outlined four best practices for teams looking to scale their Airflow usage to support legacy modernization.

  1. Start with Dependency Mapping. Define pipeline dependencies early, both upstream and downstream. Teams often discover critical architectural gaps too late—when something fails in production. Proactive dependency mapping simplifies debugging and prepares the groundwork for lineage tools like OpenLineage. When dependencies are clear, teams can automate even the most complex workflows and react faster to changes.
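One way to make those dependencies explicit in code is Airflow's data-aware scheduling (available since Airflow 2.4), sketched below. The DAG IDs and dataset URI are placeholders for illustration, not P&G's pipelines.

```python
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.empty import EmptyOperator

# Declaring the shared asset once makes the upstream/downstream relationship explicit.
curated_customers = Dataset("s3://lake/curated/customers/")  # illustrative URI

# Producer DAG: declares that it updates the curated customers asset.
with DAG(
    dag_id="load_customers",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
):
    EmptyOperator(task_id="transform_customers", outlets=[curated_customers])

# Consumer DAG: runs only after the asset it depends on has been updated,
# so the dependency shows up in the Airflow UI and in lineage metadata.
with DAG(
    dag_id="build_customer_marts",
    start_date=datetime(2025, 1, 1),
    schedule=[curated_customers],
    catchup=False,
):
    EmptyOperator(task_id="refresh_marts")
```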

  2. Use Anomaly Detection for Proactive Alerting. Integrate anomaly detection to anticipate performance degradation before it causes failures. Whether using cloud-native tools like Azure Monitor or Amazon Lookout for Metrics, or open-source ML libraries, Adonis recommends detecting outliers in pipeline performance and transformation complexity. For example, deeply nested JSON files may process fine at small volumes but cause DAGs to fail at scale. Predictive monitoring mitigates these risks.
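For teams rolling their own alerting, a lightweight version of this idea can live right in the DAG: compare each run's duration to a recent baseline and flag outliers. The sketch below is a simplified illustration; the baseline values, threshold, and DAG are assumptions, and a real setup would query historical metrics rather than hard-code them.

```python
import statistics
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# In practice, pull recent runtimes from your metadata database or metrics store;
# the hard-coded baseline here just keeps the example self-contained.
RECENT_DURATIONS_SECONDS = [42.0, 45.5, 40.8, 44.1, 43.3]


def alert_if_outlier(context):
    duration = context["task_instance"].duration or 0.0
    baseline = statistics.mean(RECENT_DURATIONS_SECONDS)
    spread = statistics.stdev(RECENT_DURATIONS_SECONDS)
    if spread and abs(duration - baseline) > 3 * spread:
        # Swap the print for your alerting channel (Slack, PagerDuty, email, ...).
        print(f"Anomalous runtime: {duration:.1f}s vs. baseline {baseline:.1f}s")


def flatten_payloads():
    """The transformation whose runtime you want to watch, e.g. nested JSON parsing."""


with DAG(
    dag_id="json_flattening",
    start_date=datetime(2025, 1, 1),
    schedule="@hourly",
    catchup=False,
):
    PythonOperator(
        task_id="flatten_payloads",
        python_callable=flatten_payloads,
        on_success_callback=alert_if_outlier,
    )
```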

Take a look at Astro Observe for a ready-made solution for anomaly detection. Observe brings pipeline-level visibility and SLA monitoring to your data products.

  3. Implement Tiered Data Quality Monitoring. Not all pipeline failures are equal. Use data quality checks to monitor for changes to schema and transformation logic. Set rules for which issues should block downstream execution and which can simply raise alerts. This avoids unnecessary disruption while ensuring critical failures are handled with urgency.
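As a rough illustration of that tiering, the sketch below pairs a blocking column check (via the common SQL provider) with an advisory check that only logs a warning. The connection ID, table name, and thresholds are made up for the example.

```python
import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator
from airflow.providers.common.sql.operators.sql import SQLColumnCheckOperator

log = logging.getLogger(__name__)


def warn_on_slow_growth():
    # Advisory tier: compute the metric and raise an alert, but never fail the run.
    row_growth_pct = 0.4  # placeholder; compare today's load against yesterday's in practice
    if row_growth_pct < 1.0:
        log.warning("Row growth below expected threshold; investigate the upstream feed.")


with DAG(
    dag_id="master_data_quality",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
):
    # Blocking tier: a schema-critical issue stops downstream execution.
    blocking_checks = SQLColumnCheckOperator(
        task_id="blocking_checks",
        conn_id="warehouse",          # hypothetical connection
        table="curated.customers",    # hypothetical table
        column_mapping={"customer_id": {"null_check": {"equal_to": 0}}},
    )

    advisory_checks = PythonOperator(
        task_id="advisory_checks",
        python_callable=warn_on_slow_growth,
    )

    publish = EmptyOperator(task_id="publish_downstream")

    # Only the blocking tier gates publishing; the advisory check runs alongside it
    # and merely raises alerts, so a warning never holds up downstream consumers.
    blocking_checks >> publish
```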

Here at Astronomer, we’ve just kicked off the preview program for data quality in Astro Observe. Applications to evaluate the upcoming functionality are open now.

  4. Keep BI Dashboards Simple and Focused. Monitoring dashboards often get overcomplicated. Adonis advises surfacing only the most important metrics—like pipeline freshness, latency, and system capacity—to enable fast diagnosis. Engineering leaders don’t need to see every row of detail; they need clarity on health, performance, and whether data is flowing as expected.

Learning More and Getting Started on Airflow 3.0

Procter & Gamble’s approach to data pipeline modernization blends careful architecture, real-time monitoring, and platform flexibility. By using Apache Airflow as the orchestration backbone, Adonis’s team enables faster innovation with AI and analytics while managing the complexity of legacy systems.

Want to hear more? Listen to the full Data Flowcast to dive deeper into P&G’s orchestration strategy and get actionable insights for your own data platform.

Adonis wrapped up his appearance on the Data Flowcast by sharing that his team will be kicking off testing of Airflow 3.0, the biggest release in Apache Airflow’s history. You can quickly and easily check out the game-changing Airflow 3.0 features on the fully managed Astro platform.

Build, run, and observe your data workflows, all in one place. Try Astro today and get up to $500 in free credits during your 14-day trial.