Apache Airflow®, Airflow, and the Airflow logo are trademarks of the Apache Software Foundation. Copyright © Astronomer 2026. All rights reserved.


Processing User Feedback: an LLM-fine-tuning reference architecture with Ray on Anyscale


Info

This page has not yet been updated for Airflow 3. The concepts shown are relevant, but some code may need to be updated. If you run any examples, take care to update import statements and watch for any other breaking changes.

The Processing User Feedback GitHub repository is a free and open-source reference architecture that shows how to use Apache Airflow® with Anyscale, a distributed compute platform built on Ray. It implements an automated system that processes and categorizes user feedback about video games using a fine-tuned large language model (LLM). The repository includes full source code, documentation, and deployment instructions that you can adapt for your own projects.

Screenshot of the Airflow UI showing the graph view of the Finetune_llmm_and_deploy_challenger DAG from the reference architecture.

This reference architecture serves as a practical learning tool, illustrating how to use Apache Airflow to orchestrate fine-tuning of LLMs on the Anyscale platform. The Processing User Feedback application is designed to be adaptable, allowing you to tailor it to your specific use case. You can customize the workflow by:

  • Changing the data that is ingested for fine-tuning and inference.
  • Modifying the Anyscale jobs and services to align with your requirements.
  • Adjusting the data processing steps and model fine-tuning parameters.

By providing a flexible framework, this architecture enables developers and data scientists to implement and scale their own LLM-based feedback processing systems using distributed compute.

Note

This tutorial uses Anyscale with the Anyscale provider to run Ray jobs. If you want to run Ray jobs on other platforms, you can use the Ray provider instead. See also Orchestrate Ray jobs on Anyscale with Apache Airflow®.

Architecture

Processing User Feedback reference architecture diagram.

The Processing User Feedback use case consists of two main components:

  • Data ingestion: new user feedback about video games is collected from several APIs, preprocessed, and stored in an S3 bucket.
  • Fine-tuning and deploying Mistral-7B: once a threshold of 200 new feedback entries is reached, the data is used to fine-tune a pre-trained LLM, Mistral-7B, on Anyscale using distributed compute. The fine-tuned model is deployed using Anyscale Services.
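
The threshold check that gates fine-tuning can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the task IDs and the function name are hypothetical, and treating "reached" as "greater than or equal to 200" is an assumption. In an Airflow DAG, a function like this could be wrapped with the `@task.branch` decorator, which runs the task whose ID is returned and skips the other branch.

```python
# Hypothetical task IDs; the repository's DAGs may use different names.
FINE_TUNE_TASK_ID = "fine_tune_model"
SKIP_TASK_ID = "skip_fine_tuning"

# Threshold taken from the description above: 200 new feedback entries.
NEW_FEEDBACK_THRESHOLD = 200


def choose_next_task(
    new_feedback_count: int, threshold: int = NEW_FEEDBACK_THRESHOLD
) -> str:
    """Return the ID of the task that should run next.

    Assumption: the threshold counts as "reached" when the number of new
    entries is greater than or equal to it.
    """
    if new_feedback_count < 0:
        raise ValueError("new_feedback_count must be non-negative")
    if new_feedback_count >= threshold:
        return FINE_TUNE_TASK_ID
    return SKIP_TASK_ID


if __name__ == "__main__":
    # Show the decision for a few example counts.
    for count in (50, 200, 350):
        print(count, "->", choose_next_task(count))
```

Keeping the decision in a small pure function makes the threshold easy to unit test and to adjust without touching the rest of the DAG.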

Additionally, the architecture includes an advanced champion-challenger version of the fine-tuning process.

Airflow features

The DAGs in this reference architecture highlight several key Airflow features and best practices:

  • Branching: Using Airflow branching, DAGs can execute different paths based on runtime conditions or results from previous tasks. This allows for dynamic workflow adjustments depending on the data or processing requirements. In this reference architecture, branching is used to determine whether the fine-tuning process should be executed.
  • Airflow retries: To protect against transient API failures and rate limits, all tasks are configured to automatically retry after an adjustable delay.
  • Dynamic task mapping: Transforming data from multiple data sources is split into multiple parallelized tasks using dynamic task mapping. The number of parallelized tasks is determined at runtime based on the number of data sources that need to be processed.
  • Data-aware scheduling: The DAGs run on a data-driven schedule to regularly and automatically update the LLM when new data has been ingested. Aside from data-driven scheduling, Airflow offers options such as time-based scheduling or scheduling based on external events detected using sensors.
  • Task groups: In the champion-challenger DAG, related tasks are organized into logical groups within the DAG with Airflow task groups. This improves the overall structure of complex workflows and makes them easier to understand and maintain.
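
As an illustration of the retry behavior described in the list above, the sketch below shows one common way to give every task in a DAG the same retry settings. `retries` and `retry_delay` are standard Airflow task parameters, but the specific values and the `DEFAULT_ARGS` name are assumptions chosen for this example, not values taken from the repository.

```python
from datetime import timedelta

# Illustrative values only; the repository may use different settings.
DEFAULT_ARGS = {
    # Number of times a failed task is retried before it is marked failed.
    "retries": 3,
    # How long to wait between attempts.
    "retry_delay": timedelta(minutes=5),
}

# In a DAG file, this dictionary would typically be passed as the
# `default_args` argument of the DAG so that it applies to every task,
# for example: @dag(default_args=DEFAULT_ARGS, ...)

if __name__ == "__main__":
    delay_seconds = DEFAULT_ARGS["retry_delay"].total_seconds()
    print(f"Retry up to {DEFAULT_ARGS['retries']} times, {delay_seconds:.0f}s apart")
```

Individual tasks can still override these defaults by setting `retries` or `retry_delay` directly on the task.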

Next Steps

Get the Astronomer GenAI cookbook to view more examples of how to use Airflow to build generative AI applications.

If you’d like to build your own pipeline using Anyscale with Airflow, feel free to fork the repository and adapt it to your use case. We recommend deploying the Airflow pipelines using a free trial of Astro.