Response
A server-sent events (SSE) stream of diagnosis events. Progress events (`text_delta`) stream as the investigation runs, a `heartbeat` event keeps the connection alive, and the `rca_diagnosis` event carries the structured diagnosis described by this schema. An `end` event marks the end of the stream, and an `error` event reports a failure.
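As a rough illustration of consuming this stream, the sketch below parses SSE lines and keeps the `rca_diagnosis` payload. It assumes each event carries a JSON object in its `data` field, which this reference does not confirm. The helper names (`iter_sse_events`, `collect_diagnosis`) and the contents of `SAMPLE_STREAM` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def _field_value(line: str, prefix: str) -> str:
    """Return the value after 'prefix', dropping one leading space as SSE does."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of an SSE stream."""
    event_name = "message"  # SSE default when no "event:" field is sent
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, "\n".join(data_parts)
            event_name, data_parts = "message", []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data:"))


def collect_diagnosis(lines: Iterable[str]) -> Optional[dict]:
    """Return the rca_diagnosis payload, or None if the stream ends without one."""
    diagnosis = None
    for name, data in iter_sse_events(lines):
        if name == "rca_diagnosis":
            diagnosis = json.loads(data)  # assumption: payload is JSON
        elif name == "error":
            raise RuntimeError(f"diagnosis stream failed: {data}")
        elif name == "end":
            break
        # text_delta and heartbeat events are ignored in this sketch
    return diagnosis


# Hypothetical stream content, for illustration only.
SAMPLE_STREAM = [
    "event: text_delta\n",
    'data: {"text": "Checking task logs"}\n',
    "\n",
    "event: heartbeat\n",
    "data: {}\n",
    "\n",
    "event: rca_diagnosis\n",
    'data: {"title": "Example title", "confidence": 0.8}\n',
    "\n",
    "event: end\n",
    "data: {}\n",
    "\n",
]

result = collect_diagnosis(SAMPLE_STREAM)
```

In a real client the lines would come from a streaming HTTP response rather than a list, and progress events would likely be shown to the user instead of dropped.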
`title` string
Short descriptive title for the finding.
`summary` string
Concise incident-style summary of what happened and the root cause.
`root_cause` string
The root cause best supported by the retrieved evidence.
`root_cause_task` string
The task ID (dag_id.task_id) identified as the root cause. Empty for a Dag-level issue.
`root_cause_type` string
Root-cause category code, for example USER_CODE_PYTHON_TYPE_ERROR, NETWORK_READ_TIMEOUT, or OUT_OF_MEMORY_ERROR. Returns INSUFFICIENT_EVIDENCE when no retrieved evidence points to a cause, or OTHER when no listed code fits. Workspace or Deployment guidance can define custom values.
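Because `root_cause_type` mixes listed codes, two fallback values, and custom values, a consumer may want to bucket it before acting on it. The sketch below is one possible approach. The function name, the bucket labels, and `EXAMPLE_CODES` (which holds only the three codes named above, not the full list) are hypothetical.

```python
# Only the three example codes named in this reference; the full list is not shown here.
EXAMPLE_CODES = frozenset(
    {"USER_CODE_PYTHON_TYPE_ERROR", "NETWORK_READ_TIMEOUT", "OUT_OF_MEMORY_ERROR"}
)


def bucket_root_cause_type(code: str, known_codes: frozenset = EXAMPLE_CODES) -> str:
    """Map a root_cause_type value to a coarse handling bucket (labels are illustrative)."""
    if code == "INSUFFICIENT_EVIDENCE":
        return "no_evidence"    # no retrieved evidence points to a cause
    if code == "OTHER":
        return "uncategorized"  # a cause was found but no listed code fits
    if code in known_codes:
        return "listed"
    return "custom"             # possibly defined by Workspace or Deployment guidance
```

Treating unknown values as `custom` rather than raising keeps the consumer tolerant of codes added through Workspace or Deployment guidance.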
`transience` enum
Failure transience: PERMANENT fails every time, TRANSIENT is a one-off or self-healing failure, INTERMITTENT fails sometimes.
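One way a consumer might use `transience` is to choose a follow-up action. The mapping below is an illustrative policy, not behavior defined by this API. The `Transience` enum mirrors the three values above, and `suggested_action` with its return labels is hypothetical.

```python
from enum import Enum


class Transience(str, Enum):
    PERMANENT = "PERMANENT"
    TRANSIENT = "TRANSIENT"
    INTERMITTENT = "INTERMITTENT"


def suggested_action(transience: str) -> str:
    """Pick a follow-up action from the transience value (illustrative policy)."""
    value = Transience(transience)  # raises ValueError for an unrecognized value
    if value is Transience.TRANSIENT:
        return "retry"              # one-off or self-healing, a rerun may succeed
    if value is Transience.INTERMITTENT:
        return "retry_and_monitor"  # fails sometimes, so watch for recurrence
    return "fix_before_rerun"       # PERMANENT fails every time
```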
`severity` enum
Severity of the issue.
`priority` enum
Incident priority, from P1 (critical outage) to P4 (low).
`confidence` double
Confidence in the diagnosis, from 0.0 to 1.0.
`confidence_justification` string
Explanation of why the confidence score was assigned.
`evidence_status` enum
How well-evidenced the diagnosis is: confirmed (grounded in retrieved logs plus code or config), hypothesis (plausible but key evidence missing), or insufficient_evidence (no retrieved signal points to a cause).
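The `confidence` and `evidence_status` fields can be combined to decide how much to trust a diagnosis. The sketch below shows one possible gate. The 0.7 threshold, the function name, and the returned labels are assumptions chosen for illustration.

```python
def triage(diagnosis: dict, min_confidence: float = 0.7) -> str:
    """Decide how to treat a diagnosis (threshold and labels are illustrative)."""
    status = diagnosis.get("evidence_status")
    confidence = diagnosis.get("confidence", 0.0)
    if status == "insufficient_evidence":
        return "investigate_manually"  # no retrieved signal points to a cause
    if status == "confirmed" and confidence >= min_confidence:
        return "act_on_fix"            # grounded in logs plus code or config
    return "review_first"              # hypothesis, or confirmed with low confidence
```

For example, `triage({"evidence_status": "hypothesis", "confidence": 0.9})` returns `"review_first"`, because a high score does not replace missing key evidence under this policy.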
`evidence` list of strings
Key pieces of evidence supporting the diagnosis.
`symptoms` list of strings
Observable symptoms of the failure.
`contributing_factors` list of strings
Additional factors that contributed to or amplified the failure.
`evidence_gaps` list of objects
One entry per key field the agent could not populate, with the reason.
`log_snippet` string
The most relevant raw log excerpt showing the error.
`exception_class` string
The Python exception class name, for example DatabaseError or ConnectionError.
`exception_message` string
The exception message string.
`log_classification` string
Classification of the log error pattern, for example connection_timeout, auth_failure, resource_exhaustion, data_validation, parse_error, import_error, or permission_denied.
`log_signals` list of strings
Structured signal tags extracted from logs.
`dag_level_checks` list of objects
Dag-wide diagnostic checks performed before per-task analysis.
`tasks` list of objects
Per-task diagnostic breakdown, one entry per failed or upstream-failed task.
`cascade_chain` list of objects
Ordered chain of tasks or Dags in the failure cascade, from root to leaf.
`cascade_depth` integer
Number of hops from the root failure to the furthest affected task.
`is_cascade_amplifier` boolean
Whether a single task failure was amplified into many downstream failures.
`amplifier_description` string
Explanation of how the cascade amplification occurred.
`blast_radius` object
The impact scope of the failure.
`cofailure_pairs` list of objects
Pairs of tasks or Dags that fail together because of a shared resource.
`effective_availability` double
Actual availability percentage, accounting for retries and partial failures.
`reported_success_rate` double
The nominal success rate reported by Airflow.
`health_gap` string
Explanation of the gap between effective availability and reported success rate.
`failure_onset_date` string
ISO 8601 timestamp of when failures first started.
`last_success_date` string
ISO 8601 timestamp of the last successful run.
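The two timestamps above can be compared to estimate how long after the last successful run the failures began. The sketch below assumes both values are present and use the same timezone convention. The helper names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, normalizing a trailing 'Z' to '+00:00'."""
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def onset_gap(diagnosis: dict) -> Optional[timedelta]:
    """Time from the last successful run to the first failure, if both are set."""
    onset = diagnosis.get("failure_onset_date")
    last_ok = diagnosis.get("last_success_date")
    if not onset or not last_ok:
        return None
    return parse_iso8601(onset) - parse_iso8601(last_ok)


# Made-up timestamps, for illustration only.
example = {
    "last_success_date": "2024-01-01T00:00:00Z",
    "failure_onset_date": "2024-01-01T06:30:00Z",
}
gap = onset_gap(example)  # timedelta(hours=6, minutes=30)
```

Subtracting a timezone-aware value from a naive one raises `TypeError` in Python, so a production version would need to handle mixed inputs explicitly.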
`timeline_events` list of objects
Chronological sequence of events leading to and during the failure.
`change_signals` list of objects
Config or deploy changes correlated with the failure onset.
`suggested_fix` string
Actionable fix, with code snippets or config changes when applicable.
`remediation` string
Step-by-step remediation instructions.
`prevention_measures` list of objects
Forward-looking recommendations to prevent recurrence.
`vendor_remediation` string
Vendor-specific fix guidance, separate from the Airflow-side fix.
`vendor_name` string
External vendor or service involved, for example Snowflake, S3, or BigQuery.
`vendor_destination` string
Specific vendor resource target.
`vendor_dashboard_url` string
URL to a vendor monitoring dashboard for further investigation.
`rendered_conn_ids` list of strings
Airflow connection IDs involved in the failure.
`affected_dags` list of strings
All Dag IDs affected by the failure, including downstream.
`child_dag_findings` list of objects
Findings for downstream or child Dags affected by the failure cascade.
`match_count` integer
Number of matching failures with the same error signature in the observed window.
`coverage_quality` string
Depth of analysis performed: deep, moderate, shallow, or partial.
`duration_percentiles` object
Task duration distribution percentiles, in seconds.
`trend_direction` string
Trend direction of the failure rate or duration: improving, stable, degrading, or unknown.
`change_point_date` string
ISO 8601 date when a significant change in behavior was detected.
`seasonality_signal` string
Description of any seasonal or periodic pattern detected.
`fleet_health_score` double
Overall Deployment health score, from 0.0 (unhealthy) to 1.0 (healthy).
`from_dag_run_id` string
Original Dag run ID when the diagnosis was replayed from the cache. Absent on fresh diagnoses.
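Since `from_dag_run_id` is absent on fresh diagnoses, its presence can serve as a cache indicator. The sketch below treats a missing key and a null value the same way, which is an assumption. The function name and the sample run ID are hypothetical.

```python
def describe_origin(diagnosis: dict) -> str:
    """Report whether a diagnosis is fresh or replayed from the cache."""
    original_run = diagnosis.get("from_dag_run_id")
    if original_run is None:  # key absent (or null, by assumption): fresh diagnosis
        return "fresh"
    return f"replayed from {original_run}"
```

For example, `describe_origin({"from_dag_run_id": "example_run_id"})` returns `"replayed from example_run_id"`.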
`session_id` string
Durable ID of the investigation session. Include it when you contact Astronomer support.