Guide

Best Practices for Calling AWS Lambda and Cloud Functions from Airflow


You might want to trigger an AWS Lambda function directly from Airflow. Here are some tidbits of knowledge accumulated at Astronomer that might help you do that successfully.

AWS Lambda Basics

At a high level, AWS Lambda lets you run code without provisioning or managing servers. You can use AWS Lambda to execute code in response to triggers such as a change in data or a particular user action.

Lambda functions can be triggered directly by a variety of services, and they can be orchestrated into workflows. Naturally, Apache Airflow is a good fit for the latter.

In general, we'd recommend using a serverless framework such as Zappa or Serverless, as there is a surprising amount of boilerplate involved in writing, deploying, updating, and maintaining both serverless functions and an API Gateway.

Lambda & Airflow

To call an AWS Lambda function in Airflow, you have a few options.

1. Invoke Call in Boto3

The simplest way to call an AWS Lambda function from Airflow is to invoke it with Boto3 inside a PythonOperator.

Airflow also provides the aws_lambda_hook, which itself uses the AWS_hook, a wrapper around the boto3 library (the standard way to interact with AWS via Python). If you configure the AWS connection correctly, you can use the hook in one of your own operators to trigger a particular Lambda function.
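To make the first option concrete, here is a minimal sketch of a callable that invokes a function through boto3 and could be handed to a PythonOperator. The function name `my-function`, the payload, the task ID, and the helper names are hypothetical, and the import path shown is the Airflow 2 one (Airflow 1.10 uses `airflow.operators.python_operator`). It also assumes AWS credentials are already available to the worker.

```python
import json


def invoke_lambda(function_name, payload, client=None):
    """Invoke a Lambda function synchronously and return its decoded JSON result."""
    if client is None:
        # Imported lazily so this module can be loaded where boto3 is absent.
        import boto3
        client = boto3.client("lambda")

    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # block until the function returns
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # "FunctionError" is only present when the function itself raised an error.
    if response.get("FunctionError"):
        raise RuntimeError(f"{function_name} failed: {response['Payload'].read()!r}")
    return json.loads(response["Payload"].read())


def build_invoke_task(dag):
    """Wrap the callable in a PythonOperator (hypothetical task and function names)."""
    # Airflow 2 import path; adjust for the Airflow version you run.
    from airflow.operators.python import PythonOperator

    return PythonOperator(
        task_id="invoke_lambda",
        python_callable=invoke_lambda,
        op_kwargs={"function_name": "my-function", "payload": {"source": "airflow"}},
        dag=dag,
    )
```

Passing the client in as an optional argument keeps the callable easy to exercise with a stub, which helps given how hard Lambda calls can be to debug from inside a DAG run.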

2. Use the SimpleHttpOperator

Another option would be to use the SimpleHttpOperator in Airflow to hit the Lambda function, much like a REST API.
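As a rough sketch of this option, assume the function is reachable at an HTTP endpoint and that an Airflow HTTP connection pointing at its host already exists. The connection ID `lambda_http`, the task ID, and the helper names are hypothetical. The import path is the one used by the Airflow 2 HTTP provider (Airflow 1.10 uses `airflow.operators.http_operator`), so check it against the version you run.

```python
import json


def http_task_kwargs(endpoint, payload, conn_id="lambda_http"):
    """Build keyword arguments for SimpleHttpOperator (all values are placeholders)."""
    return {
        "task_id": "call_lambda_over_http",
        "http_conn_id": conn_id,      # hypothetical connection holding the host
        "endpoint": endpoint,         # path appended to the connection's host
        "method": "POST",
        "data": json.dumps(payload),  # request body passed to the function
        "headers": {"Content-Type": "application/json"},
    }


def build_http_task(dag, endpoint, payload):
    """Create the operator; Airflow is imported lazily inside the function."""
    from airflow.providers.http.operators.http import SimpleHttpOperator

    return SimpleHttpOperator(dag=dag, **http_task_kwargs(endpoint, payload))
```

Keeping the arguments in a plain dictionary makes it easy to inspect what will be sent before the operator is ever scheduled.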

3. Trigger Lambda via an AWS API Gateway

You could also trigger Lambda functions via an AWS API Gateway.

In this case, you'd use the HTTP operator as a starting point, but you would also have to handle HTTP authorization.
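How the authorization is handled depends on how the gateway is configured. The sketch below covers two common cases under stated assumptions: an API key, which API Gateway reads from the `x-api-key` header, and a token sent in the `Authorization` header for a custom or Cognito authorizer. The `Bearer` prefix and the helper name `gateway_headers` are assumptions, and IAM-signed requests are not covered here.

```python
def gateway_headers(api_key=None, bearer_token=None):
    """Return request headers for an API Gateway call (illustrative helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # API Gateway API keys are passed in the x-api-key header.
        headers["x-api-key"] = api_key
    if bearer_token:
        # Assumption: the authorizer expects "Bearer <token>"; yours may differ.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers
```

The resulting dictionary can be passed as the `headers` argument of the HTTP operator. In practice the secret values would come from an Airflow connection or variable rather than being written into the DAG file.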

Use Cases

In the past, we've seen folks run Lambda functions in parallel for a token refresh.

Usually, this looks like an ELT structure with GA/Pardot/HubSpot/Marketo that uses Lambda to serve a token and refresh it as soon as it expires.

Caveats

There are a few caveats to invoking Lambda:

  • Execution time is limited (15 minutes maximum per invocation)
  • It's relatively difficult to debug
  • Light security concerns

Alternatives

If AWS Lambda doesn't fit your needs, you can likely run what you need using the following:

