
The EduAgent: The AI Agent Behind Astronomer Academy's Support


The Problem

The Astronomer Academy certifies thousands of Airflow practitioners every year. With that volume comes a steady stream of support tickets: missing badges, exam extensions, name corrections on certificates, invoice requests, coupon validations, refund inquiries, and account merges between email addresses.

Most of these tickets follow predictable resolution paths. A "where's my badge?" ticket means: look up the user's exam status in Snowflake, check whether Credly already issued the badge, issue it if not, and write back confirming what happened. An exam extension request means: identify which certification, call the Skilljar API to extend the enrollment deadline by 30 days, and respond. An invoice request means: pull order data, generate a PDF, and send it.

While the resolution logic is deterministic, someone on the Education team still had to run every lookup, make every API call, and draft every response. Each ticket took 5-15 minutes of context switching: open Zendesk, read the ticket, open Snowflake, run a query, open Credly, check the badge, go back to Zendesk, write the response, set the status. Multiply that by dozens of tickets per day, and the Education team was spending hours on work that could be scripted, except that understanding what the user is actually asking for requires natural language comprehension.

That's where the AI agent comes in. We built a system orchestrated by Apache Airflow that handles the full ticket lifecycle end-to-end: classify the ticket, execute the deterministic business logic, draft a response, and present it to a human reviewer in Slack for a single-click approval.

An Example Ticket

Let's walk through a real scenario to make this concrete.

A user submits a Zendesk ticket: "Hi, I passed the Airflow Fundamentals exam last week but I never received my badge. Can you help?"

Here's what happens, start to finish:

  1. Ingestion. The fetch_and_process_tickets DAG runs daily. It fetches open tickets from Zendesk created within the data interval and fans out processing using task.expand(), creating one task instance per ticket.
  2. Email resolution. Before anything else, an email resolution agent analyzes the ticket to determine who submitted it and whose account the action should apply to. In this case, they're the same person. But when a manager submits a ticket on behalf of an employee, or a user changed jobs and is writing from a new email, the target email differs from the requestor. This agent catches that.
  3. Categorization. A Claude-powered categorization agent classifies the ticket into one of 11 categories. This one comes back as CERTIFICATE_ISSUES with a confidence score of 0.95 and structured reasoning.
  4. Deterministic service handler. The system dispatches to the certificate service, which:
    • Queries Snowflake to check if the user passed the exam → YES
    • Calls the Credly API to check if a badge was already issued → NO
    • Issues the badge via the Credly API → Done
    • No AI involved in this step. It is a deterministic workflow.
  5. Response generation. The response agent (Claude) receives the ticket context, the Academy Handbook, and the result of the automated action. It drafts a reply: "Hi! Great news, we've verified that you passed the Apache Airflow Fundamentals exam. Your badge was missing, so we've automatically issued it to your email. You should receive a notification from Credly shortly. Congratulations!"
  6. Status classification. A status classifier agent reads the response and determines the appropriate Zendesk ticket status: solved.
  7. Slack review. The response is posted to a Slack channel with the ticket category, processing result, draft response, suggested status, and three buttons: Approve, Regenerate, Skip.
  8. Human approval. A team member reads the response, clicks Approve, and the response is posted to Zendesk with the ticket status updated. Total human effort: reading and one click.

What used to take 10 minutes now takes 10 seconds of review.

Architecture Overview

The system has three layers, all orchestrated by Airflow:

Layer 1: Ingestion

The fetch_and_process_tickets DAG runs at midnight daily. It queries Zendesk for open tickets created within the data interval, extracts ticket IDs, and fans out processing:

@task(outlets=[process_ticket_asset], max_active_tis_per_dag=1)
def process_a_ticket(ticket_id):
    # Attach the ticket ID to the Asset event so the downstream
    # processing DAG knows which ticket to work on
    yield Metadata(
        process_ticket_asset,
        {
            "ticket_id": ticket_id,
            "user_input": None,
            "ticket_status": None
        }
    )


ticket_ids = fetch_tickets()
process_a_ticket.expand(ticket_id=ticket_ids)

Each ticket yields an Asset event, which triggers the processing layer.
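For context, the fetch side might query the Zendesk Search API along these lines. This is a hedged sketch: the subdomain, auth handling, and the query-building helper are hypothetical, and the real DAG's implementation isn't shown here.

```python
import requests


def build_search_query(start: str, end: str) -> str:
    # Zendesk search syntax: open tickets created inside the data interval
    return f"type:ticket status:open created>={start} created<{end}"


def fetch_tickets(subdomain: str, auth: tuple, start: str, end: str) -> list[int]:
    # Hypothetical sketch of the fetch task's core: one Search API call,
    # returning just the ticket IDs to fan out over
    resp = requests.get(
        f"https://{subdomain}.zendesk.com/api/v2/search.json",
        params={"query": build_search_query(start, end)},
        auth=auth,
        timeout=30,
    )
    resp.raise_for_status()
    return [result["id"] for result in resp.json()["results"]]
```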

Layer 2: Processing

The process_tickets DAG is scheduled on two Assets:

@dag(
    schedule=(regenerate_ticket_asset | process_ticket_asset),
)
def process_tickets_dag():
    ...

This means it fires when either a new ticket arrives (from the daily fetch) or a human requests regeneration (from Slack). The same processing pipeline handles both cases. The only difference is that regeneration passes additional context from the reviewer's feedback.

Inside, the pipeline runs sequentially:

  1. Email resolution: resolve requestor vs. target email
  2. Categorization: classify into one of 11 categories with confidence scoring
  3. Category dispatch: route to the appropriate deterministic service handler
  4. Response generation: draft the reply using the service result as context
  5. Status classification: determine the Zendesk status
  6. Slack posting: send for human review

Layer 3: Human Review

The handle_slack_events DAG is fully event-driven. It uses an AssetWatcher with a MessageQueueTrigger pointed at an SQS queue:

trigger = MessageQueueTrigger(
    aws_conn_id="aws_default",
    queue="https://sqs.us-east-1.amazonaws.com/.../slack-interactions-queue",
    waiter_delay=5,
)


sqs_asset_queue = Asset(
    "sqs_asset_queue",
    watchers=[AssetWatcher(name="sqs_watcher", trigger=trigger)]
)

When a reviewer clicks a button in Slack, the interaction payload flows through API Gateway → SQS → AssetWatcher → DAG. The DAG branches on the action:

  • Approve: posts the response to Zendesk, updates the ticket status, sends a confirmation to Slack
  • Regenerate: extracts the reviewer's feedback, yields a regenerate_ticket_asset event (which re-triggers the processing DAG)
  • Skip: no action
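The branch itself can be a small lookup on the interaction payload. A sketch, assuming Slack's standard interactive-message payload shape; the downstream task names are hypothetical.

```python
def route_slack_action(payload: dict) -> str:
    """Map the clicked button to the downstream task (hypothetical names)."""
    action_id = payload["actions"][0]["action_id"]
    return {
        "approve": "post_to_zendesk",
        "regenerate": "handle_regenerate",
        "skip": "no_op",
    }.get(action_id, "no_op")
```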

AI Where It Matters, Determinism Where It Counts

This is the architectural decision that makes the system reliable.

The AI handles two things: understanding what the user is asking (categorization) and writing a natural language response. Everything in between, the actual business logic, is deterministic Python code.

Categorization

The categorization agent uses Claude with Pydantic AI's structured output to return a typed result:

class CategoryClassification(BaseModel):
    category: TicketCategory
    confidence_score: float
    reasoning: str


categorization_agent = Agent(
    model=create_agent_model_no_settings(),
    output_type=CategoryClassification,
    system_prompt="""You are an expert ticket categorization system..."""
)

The output is a Python enum, a float, and a string. The LLM outputs structured data that code can act on directly.
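Because the output type is a plain Pydantic model, the LLM's JSON validates straight into typed Python. A minimal illustration (the enum is abbreviated here; the real taxonomy has 11 members):

```python
from enum import Enum

from pydantic import BaseModel


class TicketCategory(str, Enum):
    # Abbreviated for illustration; the real enum has 11 members
    CERTIFICATE_ISSUES = "CERTIFICATE_ISSUES"
    GENERAL_INQUIRIES = "GENERAL_INQUIRIES"


class CategoryClassification(BaseModel):
    category: TicketCategory
    confidence_score: float
    reasoning: str


# JSON from the model validates directly into the typed result
raw = {
    "category": "CERTIFICATE_ISSUES",
    "confidence_score": 0.95,
    "reasoning": "User passed the exam but has no badge.",
}
parsed = CategoryClassification.model_validate(raw)
```

Anything malformed (an unknown category, a non-numeric score) raises a validation error instead of silently flowing downstream.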

Dispatch

Once categorized, the ticket routes through a dispatch dictionary:

CATEGORY_HANDLERS = {
    TicketCategory.CERTIFICATE_ISSUES: handle_certificate_issues,
    TicketCategory.EXAM_EXTENSIONS: handle_exam_extensions,
    TicketCategory.NAME_CORRECTIONS: handle_name_corrections,
    TicketCategory.INVOICE_REQUESTS: handle_invoice_requests,
    TicketCategory.EXAM_RETAKE_REQUESTS: handle_exam_retake_requests,
    TicketCategory.COUPON_VALIDATION: handle_coupon_validation,
    TicketCategory.REFUND_REQUESTS: handle_refund_requests,
    TicketCategory.MERGE_REQUESTS: handle_merge_requests,
}


handler = CATEGORY_HANDLERS.get(categorized.category)
if handler:
    category_processing_result = await handler(ticket)

The remaining three categories (Exam Access Vouchers, Course Content Issues, General Inquiries) don't need a service handler. They skip straight to the response agent, which can handle them directly from the ticket context and the Academy Handbook.

Each handler is a deterministic function with no LLM calls inside; the one exception is a small sub-agent that classifies which certification the user is referring to. Here's what each handler does:

| Category | What the handler does | External Systems |
|---|---|---|
| Certificate Issues | Checks exam status, issues badge if missing | Snowflake, Credly |
| Name Corrections | Replaces badge with corrected name | Credly |
| Exam Extensions | Extends enrollment deadline by 30 days | Skilljar |
| Invoice Requests | Generates PDF invoice | weasyprint |
| Exam Retakes | Resets exam attempt | Skilljar |
| Coupon Validation | Validates coupon code against pool | Snowflake |
| Refund Requests | Looks up order, forwards to billing | Skilljar |
| Account Merges | Transfers progress between accounts | Skilljar (Selenium) |

Why this split matters. The handlers are testable: you can unit test that a passed exam + missing badge = badge issuance without involving an LLM. They're auditable: the processing result string tells the reviewer exactly what the system did. And they're predictable: same input, same output, every time.

The AI only handles what requires language understanding: reading the ticket, deciding which category it belongs to, and writing a human-friendly response that incorporates the deterministic result.

Email Resolution: The Edge Case That Required Its Own Agent

Early in development, we hit a pattern that broke simple email extraction: cross-account requests.

"Hi, my employee john.doe@company.com passed the Airflow exam but hasn't received his badge. Can you check?", sent from manager@company.com.

If we naively used the ticket submitter's email to look up the exam, we'd find nothing. The action needs to happen on john.doe@company.com, not the manager's address.

These cases are surprisingly common: managers acting on behalf of reports, users who changed jobs and are writing from a new email, people who used a personal email for the exam but submit tickets from their work email.

We solved this with a dedicated email resolution agent that runs before categorization:

email_resolution = await email_resolution_service.resolve_email_context(ticket)
ticket["target_email"] = email_resolution.target_email

The agent analyzes the full ticket context (description, comments, metadata) and outputs a structured resolution: requestor email, target email, action context, confidence score, and reasoning. A validation layer ensures the target email actually appears somewhere in the ticket content (the agent can't hallucinate an email that was never mentioned).
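That validation step can be as simple as extracting every address from the raw ticket text and checking membership. A minimal sketch of the idea, with a hypothetical helper name:

```python
import re

# Deliberately permissive email pattern for extraction, not validation
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def validate_target_email(target_email: str, ticket_text: str) -> bool:
    """Accept the agent's resolved target email only if that exact address
    literally appears in the ticket, so a hallucinated address is rejected."""
    mentioned = {m.lower() for m in EMAIL_RE.findall(ticket_text)}
    return target_email.lower() in mentioned
```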

The resolved target email then flows through the entire downstream pipeline. Every service handler operates on target_email, not the raw user_email from Zendesk metadata. When it's a cross-account request, the processing result notes it: "Cross-account request detected: Operating on john.doe@company.com (requested by manager@company.com)", giving the human reviewer full visibility.

The Human-in-the-Loop

Support responses go out under the Astronomer brand. Edge cases exist that no amount of prompt engineering can anticipate. And the cost of a single bad response (wrong badge issued, incorrect refund information, leaked internal details) outweighs the cost of a few seconds of human review. That's why we keep a human in the loop for every ticket response.

The Slack Review Interface

Each processed ticket arrives in Slack as a structured message:

  • Header: category label and ticket ID
  • Processing Result: what the system automated (e.g., "Badge issued to user@email.com")
  • Agent Response: the full draft reply
  • Suggested Status: solved, pending, or open
  • Action Buttons: Approve / Regenerate / Skip
  • Text Input: optional field for regeneration feedback

The reviewer reads the response in context, sees exactly what automated actions occurred, and makes a decision with one click.
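Assembling that message is plain Slack Block Kit. A sketch of what the builder might look like; the exact layout and labels are assumptions, not the production code.

```python
def build_review_message(category: str, ticket_id: int,
                         result: str, response: str, status: str) -> list[dict]:
    """Hypothetical Block Kit payload for one ticket review message."""
    return [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"{category} (#{ticket_id})"}},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": f"*Processing Result:* {result}"}},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": f"*Agent Response:*\n{response}"}},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": f"*Suggested Status:* {status}"}},
        # Three buttons; their action_id is what the SQS payload carries back
        {"type": "actions", "elements": [
            {"type": "button", "action_id": action,
             "text": {"type": "plain_text", "text": label}}
            for action, label in
            [("approve", "Approve"), ("regenerate", "Regenerate"), ("skip", "Skip")]
        ]},
    ]
```

A `plain_text_input` block for regeneration feedback would slot in alongside the buttons.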

The Regeneration Feedback Loop

When a response isn't quite right, the reviewer clicks Regenerate and optionally types feedback: "Don't mention the exam score, just confirm the badge was issued."

This triggers a two-stage feedback loop:

@task(outlets=[regenerate_ticket_asset])
def handle_regenerate(triggering_asset_events=None):
    for event in triggering_asset_events[sqs_asset_queue]:
        message_body = event.extra['payload']['message_batch'][0]['Body']
        metadata = process_regeneration(message_body)
        if metadata:
            yield metadata

The regeneration metadata (ticket ID + reviewer feedback) yields a regenerate_ticket_asset event, which triggers the process_tickets DAG. The feedback flows to both stages:

  1. The categorization agent re-evaluates, because the feedback might reveal the ticket was miscategorized
  2. The response agent incorporates the feedback when drafting the new reply

The new response appears in Slack for another review round. In practice, regeneration is needed on roughly 10-15% of tickets, and a single round of feedback almost always produces an acceptable response.

Beyond Tickets: The Slack-Native Data Layer

The same agent infrastructure extends beyond ticket processing into a conversational data layer in Slack.

Channel Listener

The handle_slack_channel_messages DAG uses a custom SlackChannelTrigger that polls configured Slack channels for new messages. When a message arrives, it routes based on content:

match = DOMAIN_PATTERN.match(text)
if match:
    domain = match.group(1)
    domains[domain] = {"ts": thread_ts, "channel": msg.get("channel", "")}
elif text:
    data_query_messages.append({
        "user": user, "text": text, "ts": thread_ts, "channel": channel
    })

Domain URLs (like astronomer.io) route to the group_summary DAG, which queries Snowflake for customer enrollment metrics: number of students, course enrollments, certifications earned, and Astro Onboarding progress.

Natural language questions (like "how many people certified this month?") route to the handle_data_query DAG, where a data query agent generates SQL against Snowflake and returns the answer in-thread.

Weekly Executive Summary

A weekly_academy_summary DAG runs every Monday morning. It queries Snowflake for the week's metrics (new signups, course completions, certifications issued by type, top courses, top organizations), feeds them to an AI agent that writes a 3-4 paragraph executive narrative, and posts structured blocks to Slack. The team starts each week with a data-driven snapshot of Academy health, without anyone having to build a dashboard or run a query.

What We Learned

Follow-Up Detection

A user submits a ticket, gets a response, then replies "thanks!" or asks a new question. Without special handling, the system would re-categorize the original issue and try to resolve it again.

We solved this by detecting when the Education Team has already responded. The system finds the last response with the Education Team signature and treats everything after it as a new conversation:

def truncate_after_education_team_response(answers):
    """Keep only the last Education Team response and everything after it,
    so follow-ups are read as a new conversation."""
    education_signature = "Astronomer Education Team"
    last_education_index = -1
    for i, answer in enumerate(answers):
        # An Education Team reply: @astronomer.io author plus the team signature
        if '@astronomer.io' in answer.get('author', '').lower() \
           and education_signature in answer.get('body', ''):
            last_education_index = i
    # Only truncate if something was posted after the team's last response
    if last_education_index >= 0 and last_education_index < len(answers) - 1:
        return answers[last_education_index:], True
    return answers, False

A "thank you" follow-up gets categorized as GENERAL_INQUIRIES and receives a warm acknowledgment. A new question gets treated on its own merits.

Guard Against Hallucination

The response agent has an explicit boundary: it reports on actions the system already took, never promises actions it hasn't performed. The system prompt says:

"Report on any automated actions that have already been completed. Guide users on next steps if issues remain unresolved."

Combined with guardrails that prohibit mentioning internal tools (Skilljar, Snowflake, Slack channels), sharing employee information, or initiating actions without consent, the agent stays within a well-defined lane. It's a communication layer, not a decision layer.

Confidence Scoring

The categorization agent returns a confidence score between 0.0 and 1.0 for every classification. Low-confidence tickets (<0.7) surface clearly in the Slack review, so the reviewer knows to pay extra attention. In practice, the categorizer runs above 0.9 confidence on ~85% of tickets, confirming that the 11-category taxonomy covers the problem space well.
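The thresholding itself is trivial deterministic code. A hypothetical helper (the 0.7 cutoff comes from the text above; the message wording is invented):

```python
LOW_CONFIDENCE_THRESHOLD = 0.7  # below this, the Slack message gets flagged


def confidence_flag(score: float) -> str:
    """Annotate the review message so low-confidence tickets stand out."""
    if score < LOW_CONFIDENCE_THRESHOLD:
        return f":warning: Low confidence ({score:.2f}), please double-check the category"
    return f"Confidence: {score:.2f}"
```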

Closing

This system started with a simple observation: most support tickets don't require judgment. They require lookup, API calls, and a well-written response. The judgment part is narrow (what is the user asking for?) and the execution part is mechanical (query, call, respond).

The architecture reflects that insight. AI handles the narrow judgment calls: reading a ticket, classifying intent, and drafting natural language. Deterministic code handles everything else: querying Snowflake, calling Credly, generating PDFs, extending deadlines. And a human reviews every outgoing response with a one-click approval.

This pattern transfers to any support workflow where resolution paths are predictable but the input is unstructured natural language. The ingredients are straightforward: an orchestration layer (Airflow), a structured-output LLM framework (Pydantic AI), service integrations for the actual business logic, and a human review loop.

The key insight: the AI is the interface layer, not the logic layer. Business rules live in code.

If you want to build something similar, the stack is: Apache Airflow for orchestration, Pydantic AI for structured LLM interactions, and the Airflow AI SDK for bringing them together.
