Using Kerberos in Apache Airflow
An overview of the support for Kerberos in Airflow today and how you can use Kerberized hooks.
Kerberos is a network authentication protocol. Airflow's Kerberos support lets it access and submit jobs to Kerberos-enabled clusters. Typically this feature is used for something like a long-running Spark cluster in a large enterprise environment.
Kerberos support is available in Astronomer EE.
Support for Kerberos in Airflow
Airflow includes core support for Kerberos, including a ticket renewer service that you run from the CLI. Some of its limitations are described in Airflow - Security.
To get Kerberos support in Airflow, install it with the kerberos install option:
pip install apache-airflow[kerberos]
As far as hook support goes, each hook has to implement Kerberos ticket renewal itself in order to support it. Most hooks don't have this yet, but Kerberizing a hook isn't particularly difficult.
A few hooks for services that are commonly secured with Kerberos, such as Spark clusters, already have support built in. SparkSubmitHook is one example.
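To illustrate what built-in support amounts to: SparkSubmitHook accepts keytab and principal arguments and hands them to spark-submit as its --keytab and --principal options. The sketch below mimics that mapping in plain Python. The function spark_submit_args and the sample file names are hypothetical and are not part of Airflow; this is only a rough picture of the idea, not the hook's actual code.

```python
from typing import List, Optional


def spark_submit_args(application: str,
                      keytab: Optional[str] = None,
                      principal: Optional[str] = None) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper, not Airflow's code)."""
    args = ["spark-submit"]
    # --keytab and --principal are spark-submit's own Kerberos options.
    if keytab:
        args += ["--keytab", keytab]
    if principal:
        args += ["--principal", principal]
    args.append(application)
    return args


# Sample values only; substitute your own keytab, principal, and application.
cmd = spark_submit_args("job.py", keytab="airflow.keytab", principal="airflow")
print(" ".join(cmd))
```

Without a keytab and principal the helper returns a plain spark-submit call, which is why a non-Kerberized cluster needs no extra configuration.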
We recommend using built-in hooks with Kerberos support if they work for your use case. If that's not an option, we recommend contributing Kerberos support to the hook you need. You can also reach out to us for Airflow services work.
Settings for Kerberos can be configured via the [kerberos] group in airflow.cfg:
[kerberos]
ccache = /tmp/airflow_krb5_ccache
# gets augmented with fqdn
principal = airflow
reinit_frequency = 3600
kinit_path = kinit
keytab = airflow.keytab
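As a rough illustration of how these settings fit together, the sketch below parses a [kerberos] group with Python's configparser and assembles a kinit call from it. The helper kinit_command is hypothetical and is not Airflow's implementation. It also assumes that "augmented with fqdn" means the host's fully qualified domain name is appended to the principal after a slash; check the Airflow version you run before relying on that.

```python
import configparser
import socket
from typing import List

SAMPLE_CFG = """
[kerberos]
ccache = /tmp/airflow_krb5_ccache
# gets augmented with fqdn
principal = airflow
reinit_frequency = 3600
kinit_path = kinit
keytab = airflow.keytab
"""


def kinit_command(cfg_text: str, fqdn: str) -> List[str]:
    """Turn a [kerberos] config group into a kinit argument list (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    krb = parser["kerberos"]
    # Assumption: the principal is augmented as "<principal>/<fqdn>".
    principal = f"{krb['principal']}/{fqdn}"
    # kinit flags: -k use a keytab, -t keytab file, -c credential cache file.
    return [krb["kinit_path"], "-k", "-t", krb["keytab"], "-c", krb["ccache"], principal]


print(kinit_command(SAMPLE_CFG, socket.getfqdn()))
```

The reinit_frequency value is not part of the kinit call itself; it describes how often, in seconds, the ticket is reinitialized.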
Kerberos is also configured via the security setting in airflow.cfg.
If you want to use Kerberos authentication via Airflow's experimental REST API, you'll also want to set auth_backend under the [api] group. See the Authentication section in Airflow's Experimental REST API doc.
You can start the Kerberos ticket renewer service via the Airflow CLI:
airflow kerberos ...
See Airflow CLI - Kerberos for more info.
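Conceptually, a ticket renewer is a loop that obtains a fresh ticket and then waits for the configured interval before doing so again. The sketch below shows that pattern only; it is not the code behind the Airflow CLI. The function renew_loop and its run, sleep, and max_rounds parameters are hypothetical and exist so that the loop can be exercised without a real Kerberos setup.

```python
import time
from typing import Callable, List, Optional


def renew_loop(command: List[str],
               run: Callable[[List[str]], int],
               reinit_frequency: int = 3600,
               sleep: Callable[[float], None] = time.sleep,
               max_rounds: Optional[int] = None) -> int:
    """Run `command` every `reinit_frequency` seconds; return the number of renewals."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # A non-zero exit code means the ticket could not be obtained.
        if run(command) != 0:
            raise RuntimeError("ticket renewal failed")
        rounds += 1
        sleep(reinit_frequency)
    return rounds


# Dry run with a fake runner and a no-op sleep, so nothing is executed.
calls = []
renewals = renew_loop(["kinit", "-k", "-t", "airflow.keytab", "airflow"],
                      run=lambda cmd: calls.append(cmd) or 0,
                      sleep=lambda seconds: None,
                      max_rounds=2)
print(renewals)
```

In a real deployment run could be subprocess.call and max_rounds would stay None so the loop keeps going, but for Airflow itself the CLI service already covers this.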