


Astro + Apache Spark
Run big data transformations in Spark with Astro, the modern data orchestration platform powered by Apache Airflow. Astro’s Spark provider package lets you submit Spark jobs and execute Spark SQL from within your data pipelines, and gives you observability into the Spark jobs you run.


About Apache Spark
Apache Spark is an open-source, multi-language, unified analytics engine for distributed data processing. Use Astro as your orchestration platform, and let the Apache Spark execution framework do the heavy lifting in your data engineering, data science, and machine learning pipelines.

Use Case
Transforming petabytes of data requires a framework that can distribute heavy data workloads across many machines. Apache Spark has become one of the core tools for working with large amounts of data quickly and reliably, and Astro is the ideal platform for orchestrating Spark jobs on complex schedules.