Snowflake data quality
Private Preview
This feature is in Private Preview. Please reach out to your account team to enable this feature.
Astro Observe data quality helps you monitor Snowflake tables to ensure data accuracy, completeness, and integrity across your pipelines. It automatically tracks key metrics such as column null percentages, schema changes, and table row counts to detect anomalies or unexpected shifts in your data.
Ensure the necessary permissions in Snowflake
Before connecting to Astro Observe, create a Snowflake service user and role and ensure the necessary Snowflake permissions are configured.
All Snowflake integrations require that the Observe role has access to both ACCOUNT_USAGE and INFORMATION_SCHEMA system tables. The service user must have a default warehouse configured for all discovery and monitoring operations.
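The requirements above can be sketched as a small Python helper that builds the Snowflake SQL an administrator might run. This is only an illustration under stated assumptions: the role, user, warehouse, and database names are hypothetical, and the exact set of grants Observe needs is not listed in this section. The sketch relies on two Snowflake behaviors: `IMPORTED PRIVILEGES` on the shared `SNOWFLAKE` database exposes the ACCOUNT_USAGE views, and `USAGE` on a database makes its INFORMATION_SCHEMA reachable.

```python
from typing import List


def observe_grant_statements(
    role: str, user: str, warehouse: str, databases: List[str]
) -> List[str]:
    """Build SQL statements for a hypothetical Observe role and service user.

    All object names are caller-supplied assumptions; nothing is executed here.
    """
    statements = [
        f"CREATE ROLE IF NOT EXISTS {role}",
        # The service user needs a default warehouse for discovery and monitoring.
        f"CREATE USER IF NOT EXISTS {user} "
        f"DEFAULT_ROLE = {role} DEFAULT_WAREHOUSE = {warehouse}",
        f"GRANT ROLE {role} TO USER {user}",
        f"GRANT USAGE ON WAREHOUSE {warehouse} TO ROLE {role}",
        # Exposes the ACCOUNT_USAGE views in the shared SNOWFLAKE database.
        f"GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE {role}",
    ]
    # INFORMATION_SCHEMA lives inside each database, so grant database usage.
    for database in databases:
        statements.append(f"GRANT USAGE ON DATABASE {database} TO ROLE {role}")
    return statements
```

Calling `observe_grant_statements("ASTRO_OBSERVE_ROLE", "ASTRO_OBSERVE_USER", "OBSERVE_WH", ["ANALYTICS"])` returns the statements in order so they can be reviewed before anyone runs them. Further grants on schemas and tables are likely needed and are left out of this sketch.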
Set up key-pair authentication in Snowflake
Astronomer recommends key-pair authentication for Snowflake service users. Generate an RSA key pair, then assign the public key to the Observe service user to enable secure authentication.
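As a rough illustration of the assignment step, the Python sketch below turns a PEM-encoded public key (generated beforehand, for example with OpenSSL) into the `ALTER USER ... SET RSA_PUBLIC_KEY` statement that Snowflake accepts. Snowflake expects the key body without the `-----BEGIN/END-----` delimiter lines, so the helper strips them. The function names are hypothetical, and the user name is whatever service user you created.

```python
def public_key_body(pem_text: str) -> str:
    """Return the base64 body of a PEM public key as a single line.

    Header and footer delimiter lines (starting with '-----') are dropped.
    """
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    return "".join(line for line in lines if line and not line.startswith("-----"))


def set_public_key_statement(user: str, pem_text: str) -> str:
    """Build the ALTER USER statement that assigns the public key to a user."""
    return f"ALTER USER {user} SET RSA_PUBLIC_KEY='{public_key_body(pem_text)}'"
```

Only the public key goes into Snowflake. The matching private key stays with you and is what you later paste into the Observe connection form.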
Connect Snowflake to Astro Observe
After you configure Snowflake permissions and key-pair auth, create the Observe connection.
Fill in connection details
Complete the following fields:
- Name: A name for the connection.
- Description: Optional description.
- Connection Type: Snowflake.
- Polling Schedule: How frequently Observe polls Snowflake for metrics (examples: every 1 hour, 6 hours, 1 day). Polling frequency is the maximum rate at which Observe updates data quality metrics and monitors; more frequent polling may increase Snowflake compute costs.
- Account Identifier: Your Snowflake account identifier (for example, FY02423-GP2141). Observe maps assets to a connection by account identifier.
- Username: The Snowflake service user (ASTRO_OBSERVE_USER).
- Private Key: Paste your private key for key-pair authentication if using key-pair auth.
Only one Observe connection is allowed per Snowflake account identifier. If you have multiple Snowflake accounts, create a separate connection for each account identifier.
All connections require that the Observe role has access to both the ACCOUNT_USAGE and INFORMATION_SCHEMA system tables. The service user must have a default warehouse configured to support discovery and ongoing data quality monitoring.
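If you manage several Snowflake accounts, the one-connection-per-account-identifier rule above can be checked before you create connections. The Python sketch below models the form fields with a hypothetical `ObserveConnection` dataclass and reports any account identifier that appears more than once. It is not an Observe API, and comparing identifiers case-insensitively is an assumption made here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ObserveConnection:
    """Hypothetical local model of the connection form fields."""
    name: str
    account_identifier: str
    username: str
    polling_schedule: str


def duplicate_account_identifiers(
    connections: Iterable[ObserveConnection],
) -> List[str]:
    """Return account identifiers used by more than one planned connection."""
    # Assumption: identifiers are compared case-insensitively.
    counts = Counter(c.account_identifier.upper() for c in connections)
    return sorted(acct for acct, n in counts.items() if n > 1)
```

An empty result means every planned connection targets a distinct Snowflake account.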
Navigating Snowflake data quality in Astro Observe
Asset Catalog
Navigate to Asset Catalog, filter by Snowflake tables, and select the desired table.
You can sort tables by popularity to quickly identify frequently used tables. Popularity rankings are based on query frequency and the number of unique users accessing each table.
Schema
The Schema tab shows table structure details:
- Column names
- Data types
- Completeness status
- Nullability
- Default values
You can enable monitoring for specific columns to actively track completeness.
Event Timeline
The Event Timeline tab shows data quality events for a selected timeframe. Events are color-coded by severity: Success, Neutral, and Failure. Click an event to view details, historical patterns, and affected metrics.
Data quality
The data quality tab provides visualizations for monitored metrics:
- Table Volume: track changes in row counts and percent change over time to identify unexpected fluctuations.
- Completeness: visualize column null percentages against thresholds to surface completeness problems.
Monitors
The Monitors tab lists all configured data quality monitors (Column Null Percentage, Table Schema Change, Row Volume Change). Each monitor’s schedule and modification history are shown for management.
Set up a data quality monitor
Follow these steps to create a monitor for a Snowflake table or column.
Create a monitor
- If no monitors exist, click Monitor (or + Monitor) to create your first monitor.
- Select the monitor type: Column Null Percentage, Row Volume Change, or Schema Change.
Volume checks
- Specify thresholds based on percentage changes or absolute row-count changes.
- Monitors execute checks according to the schedule you define. For example, if you set the monitor to run every 6 hours, it evaluates whether the row count changed by more than the configured threshold within that interval.
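The threshold logic described above can be sketched in Python as follows. This is a minimal illustration, not Observe's implementation: the function and parameter names are hypothetical, and treating any change from an empty table as a breach of a percentage threshold is an assumption.

```python
from typing import Optional


def row_volume_breached(
    previous_rows: int,
    current_rows: int,
    max_percent_change: Optional[float] = None,
    max_absolute_change: Optional[int] = None,
) -> bool:
    """Return True if the row-count change exceeds either configured threshold."""
    delta = abs(current_rows - previous_rows)

    # Absolute threshold: compare the raw difference in row counts.
    if max_absolute_change is not None and delta > max_absolute_change:
        return True

    # Percentage threshold: compare the change relative to the previous count.
    if max_percent_change is not None:
        if previous_rows == 0:
            # Assumption: percent change is undefined from zero rows,
            # so any change at all counts as a breach.
            return delta > 0
        return delta / previous_rows * 100 > max_percent_change

    return False
```

For instance, a table that grows from 1,000 to 1,300 rows between two checks changes by 30 percent, which breaches a 25 percent threshold but not a 50 percent one.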
Column null percentage checks
- Select the column to monitor and define the null percentage threshold.
- The monitor evaluates the column at the interval you specify. If the null percentage exceeds the threshold, the monitor triggers.
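The null percentage check above amounts to a simple ratio compared against a threshold. The Python sketch below shows the arithmetic on an in-memory sample of column values. The function names are hypothetical, and reporting 0 percent for an empty column is an assumption. Observe computes this metric against Snowflake itself, so this only illustrates the rule.

```python
from typing import Iterable, Optional


def null_percentage(values: Iterable[Optional[object]]) -> float:
    """Return the share of None values in a column sample, as a percentage."""
    items = list(values)
    if not items:
        # Assumption: an empty column is reported as 0 percent null.
        return 0.0
    nulls = sum(1 for value in items if value is None)
    return nulls / len(items) * 100


def null_monitor_triggers(
    values: Iterable[Optional[object]], threshold_percent: float
) -> bool:
    """Return True when the null percentage strictly exceeds the threshold."""
    return null_percentage(values) > threshold_percent
```

With the sample `[1, None, 3, None]`, half of the values are null, so a 40 percent threshold triggers the monitor while a 50 percent threshold does not, because the value must exceed the threshold.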