

Process Data at Scale Using Google Dataproc, a Service for Running Big Data Frameworks


Astro + Google Dataproc
Orchestrate big data operations on Google Dataproc clusters with Astro, the modern data orchestration platform powered by Apache Airflow. Astro offers ready-to-use Google Dataproc integration using the comprehensive Google provider package. Leverage a full set of specialized Airflow operators to create clusters and submit jobs to Dataproc.
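As a rough illustration of how those operators fit together, the sketch below creates a cluster, submits a PySpark job, and deletes the cluster afterwards. It assumes the `apache-airflow-providers-google` package is installed; the project ID, region, cluster name, bucket path, and the helper names `cluster_config`, `pyspark_job`, and `build_dag` are hypothetical placeholders chosen for this example, not values from Astro or Google.

```python
from datetime import datetime

# Hypothetical placeholders; replace with your own project, region, and bucket.
PROJECT_ID = "example-project"
REGION = "us-central1"
CLUSTER_NAME = "example-cluster"
PYSPARK_URI = "gs://example-bucket/jobs/transform.py"


def cluster_config(workers: int = 2, machine_type: str = "n1-standard-4") -> dict:
    """Return a minimal Dataproc cluster config: one master plus `workers` workers."""
    return {
        "master_config": {"num_instances": 1, "machine_type_uri": machine_type},
        "worker_config": {"num_instances": workers, "machine_type_uri": machine_type},
    }


def pyspark_job(main_uri: str) -> dict:
    """Return a Dataproc job payload that runs the PySpark file at `main_uri`."""
    return {
        "reference": {"project_id": PROJECT_ID},
        "placement": {"cluster_name": CLUSTER_NAME},
        "pyspark_job": {"main_python_file_uri": main_uri},
    }


def build_dag():
    """Build a DAG that runs create cluster -> submit job -> delete cluster."""
    # Imported lazily so the helpers above can be used without Airflow installed.
    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
        DataprocDeleteClusterOperator,
        DataprocSubmitJobOperator,
    )

    with DAG(
        dag_id="dataproc_example",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually; assumes Airflow 2.4 or later
        catchup=False,
    ) as dag:
        create = DataprocCreateClusterOperator(
            task_id="create_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            cluster_config=cluster_config(),
        )
        submit = DataprocSubmitJobOperator(
            task_id="submit_job",
            project_id=PROJECT_ID,
            region=REGION,
            job=pyspark_job(PYSPARK_URI),
        )
        delete = DataprocDeleteClusterOperator(
            task_id="delete_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            trigger_rule="all_done",  # tear the cluster down even if the job fails
        )
        create >> submit >> delete
    return dag
```

In a DAG file, assign the result at module level (`dag = build_dag()`) so the scheduler can discover it. Deleting the cluster with `trigger_rule="all_done"` is a design choice for this sketch: it avoids paying for an idle cluster after a failed job.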


About Google Dataproc
Google Dataproc is a highly scalable managed service for running Apache Spark, Apache Flink, Presto, and many other open source tools, fully integrated with Google Cloud. Use Google Dataproc to run compute-intensive Astro tasks that handle large amounts of data for data science and ETL processes.

Use Case
A common use case for orchestrating Google Dataproc jobs with Astro is gaining insights from large amounts of data using distributed machine learning. Astro offers specialized operators that interact with Google Dataproc asynchronously, so a task waiting on a long-running Dataproc job does not occupy a worker slot, which makes your pipeline more cost-effective.
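One way to read "async" here is Airflow's deferrable mode, which the Google provider's Dataproc operators expose through a `deferrable` flag; that reading is an assumption, not something this page states. The minimal sketch below builds the arguments for a deferrable job submission. The helper names, the task ID `train_model`, and the 30-second polling interval are hypothetical, and the flag requires a recent `apache-airflow-providers-google` release and a running triggerer.

```python
def deferrable_submit_kwargs(
    job: dict, project_id: str, region: str, poll_seconds: int = 30
) -> dict:
    """Return keyword arguments for a Dataproc job submission in deferrable mode."""
    return {
        "task_id": "train_model",  # hypothetical task name
        "project_id": project_id,
        "region": region,
        "job": job,
        # Hand the wait over to the triggerer instead of blocking a worker.
        "deferrable": True,
        # How often the trigger polls Dataproc for the job status.
        "polling_interval_seconds": poll_seconds,
    }


def build_submit_task(kwargs: dict):
    """Instantiate the operator; call this inside a `with DAG(...)` block."""
    # Imported lazily so the helper above can be used without Airflow installed.
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocSubmitJobOperator,
    )

    return DataprocSubmitJobOperator(**kwargs)
```

While the Dataproc job runs, a deferred task releases its worker slot, so long-running training jobs do not tie up Airflow workers for their full duration.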