Live Blogging Through a Migration

One of our core modules is our clickstream module. It helps marketers and product managers understand their target audience and how they’re interacting with a site or app.

Under the hood, a huge part of our clickstream module is powered by Apache Airflow. Analytics.js events get sent to S3 and loaded into Redshift by dynamically generated DAGs.

Our Airflow Clickstream powers our Redshift loader DAGs for clickstream events.
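To make "dynamically generated DAGs" a little more concrete, here is a minimal sketch in plain Python of how one loader definition per app and event type might be derived from configuration. It is not our actual code: the names `LoaderSpec`, `build_loader_specs`, `copy_statement`, the bucket name, and the `app.event` table layout are all assumptions made for illustration. In a real deployment each spec would be turned into an Airflow DAG object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderSpec:
    """Hypothetical description of one S3-to-Redshift loader."""
    dag_id: str
    s3_prefix: str
    table: str


def build_loader_specs(app_ids, event_types, bucket="clickstream-bucket"):
    """Create one loader spec per (app, event type) pair.

    The bucket name and key layout are assumed for this sketch.
    """
    specs = []
    for app_id in app_ids:
        for event in event_types:
            specs.append(
                LoaderSpec(
                    dag_id=f"clickstream_{app_id}_{event}",
                    s3_prefix=f"s3://{bucket}/{app_id}/{event}/",
                    table=f"{app_id}.{event}",
                )
            )
    return specs


def copy_statement(spec, iam_role):
    """Render a Redshift COPY statement that loads JSON events from S3."""
    return (
        f"COPY {spec.table} FROM '{spec.s3_prefix}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS JSON 'auto';"
    )


if __name__ == "__main__":
    for spec in build_loader_specs(["website", "app"], ["page", "track"]):
        print(spec.dag_id, "->", copy_statement(spec, "arn:aws:iam::000000000000:role/example"))
```

The point of the pattern is that adding a new app or event type only changes the input lists, not the loader code.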

Recently, we pushed a pretty big update to our Airflow Clickstream. Below are some screenshots of our engineering Slack channel from deployment night.

1518736661-9pm migration

First, some intros:

1518737823-intro

@kicksopenminds - our resident Pythonista

1518736731-taylor

@CJ - Astronomer’s dev team’s free safety

1518736763-cj

CEO @rywalker, noted lover of live blogs.

1518736786-ry

1518736812-joker here it goes

1518736820-cj isnt late

People forget they called him “Mr. Punctual” in high school.

1518736844-alert canary pray

We rolled out the new DAG version to our own Astronomer DAGs first: for our website, then for our app.

Bah followed by a prayer emoji…oh boy.

1518736879-bah elaborate

First obstacle (and ambiguous bah) averted!

Houston is our GraphQL API - it acts as ground control between all the different services that run our platform. You can read more about why we chose to write it in GraphQL.

1518736940-popcorn

Now the spectators start arriving!

20 minutes in, something is still off

1518737014-hefty-1

1518737020-fuck it hefty

Another API explorer never hurt anyone.

1518737049-uh oh

Airflow logs everything to a database, so remember to check your SQL!

The most common phrase in the Astronomer Slack: “wtf Airflow?”

1518737080-wtf is this

It’s all good because we love Airflow anyway, but it certainly has its quirks…

1518737122-front end date

10:40 - morale was high.

1518737154-channel change

1518737165-taylor dag caught

Those are our internal DAGs that control how we handle reporting and inbound marketing.

Over the next half hour, some of our other DAGs caught up successfully as we re-enabled them.

1518737232-cynical cj

Fun fact: They also called CJ “Mr. Cynical” in high school.

1518737274-move fast and break things

Startups, man.

5 minutes later…

1518737305-logs logs logs

A wild @andscooper appears!

1518737345-ceo confirms

Executive confirmation always helps.

1518737369-incoming

After some investigation, we figured out the issue. As the scheduler was catching up, it was hitting Houston with a higher-than-expected request volume.

We were DDoSing ourselves.

1518737834-weebay

We added some caching magic (server-side caching on the GraphQL API endpoint), bumped the Docker tag, and tried again.
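For readers curious what that kind of fix looks like, here is a minimal sketch of a time-based (TTL) cache in Python. It is not Houston's actual implementation; the `TTLCache` class, its `get_or_compute` method, and the 30-second TTL are hypothetical names and values chosen for illustration. The idea is that repeated identical requests within a short window are answered from memory instead of hitting the backend each time.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    backend_calls = []

    def expensive_query():
        # Stand-in for the real work behind an API endpoint.
        backend_calls.append(1)
        return {"dags": ["website", "app"]}

    cache = TTLCache(ttl_seconds=30)
    for _ in range(100):
        cache.get_or_compute("list_dags", expensive_query)
    print(len(backend_calls))  # the backend is hit once, not 100 times
```

A catching-up scheduler tends to ask the same questions many times in quick succession, which is exactly the access pattern a short TTL absorbs well.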

1518737865-dags caught up

Some last touch-ups…

1518737886-cj all the way

1518737901-its done

Andddddd we’re through the finish line. What a rollercoaster of emotions that was!
