Info
This page has not yet been updated for Airflow 3. The concepts shown are relevant, but some code may need to be updated. If you run any examples, take care to update import statements and watch for any other breaking changes.
OpenAI is an AI research and deployment company that provides an API for accessing state of the art models like GPT-4 and DALL·E 3. The OpenAI Airflow provider offers modules to easily integrate OpenAI with Airflow.
In this tutorial you’ll use Airflow and the OpenAI Airflow provider to ask a question to Star Trek captains, create embeddings of the answers from each captain, and plot them in two dimensions.
OpenAI offers a variety of powerful model endpoints for different tasks like text generation, vector embedding, and translation tasks. These models are used in both user-facing applications, such as chatbots, and internal applications, such as a smart search for internal knowledge base content.
Integrating OpenAI with Airflow into an end-to-end machine learning pipeline allows you to:
This tutorial takes approximately 15 minutes to complete.
To get the most out of this tutorial, make sure you have an understanding of:
Create a new Astro project:
Add the following lines to your requirements.txt file to install the OpenAI Airflow provider and other supporting packages:
To create an Airflow connection to OpenAI, add the following environment variables to your .env file. Make sure to replace <your-openai-api-key> with your own OpenAI API key.
In your dags folder, create a file called captains_dag.py.
Copy the following code into the file.
This DAG consists of four tasks to make a simple MLOps pipeline.
get_captains_list task fetches the list of Star Trek captains you want to ask your question to. You’ll provide the list of captains when you run the DAG with Airflow params.ask_a_captain task uses the OpenAIHook to connect to the OpenAI API. It then uses the chat completion endpoint to generate answers to the question you provide. This task is dynamically mapped over the list of captains to generate one dynamically mapped task instance per captain.get_embeddings task is defined using the OpenAIEmbeddingOperator to generate vector embeddings of the answers generated by the upstream ask_a_captain task. This task is dynamically mapped over the list of answers to retrieve one set of embeddings per answer. This pattern allows for efficient parallelization of the vector embedding generation.plot_embeddings task takes the embeddings created by the upstream task and performs dimensionality reduction using PCA to plot the embeddings in two dimensions.
Run astro dev start in your Astro project to start Airflow, then open the Airflow UI at localhost:8080.
In the Airflow UI, run the captains_dag DAG by clicking the play button. Then, provide Airflow params for:
Question to ask the captain: The question you want to ask the captains.captains_to_ask: A list of Star Trek captains you want to ask the question to. Make sure to create one line per captain and to provide at least two names.max_tokens_answer: The maximum number of tokens available for the answer.randomness_of_answer: The randomness of the answer. The value provided is divided by 10 and given to the temperature parameter of the chat completion endpoint. The scale for the param ranges from 0 to 20, with 0 being the most deterministic and 20 being the most random.
After the DAG run completed, go to the include folder to view the image file created by the plot_embeddings task. The image should look similar to the one below.

Congratulations! You used Airflow and OpenAI to get answers from your favorite Star Trek captains and compare them visually. You can now use Airflow to orchestrate OpenAI operations in your own machine learning pipelines. 🖖