When Should You Start to Warehouse Your Data?

These days, startups want to be data-driven, and web and mobile apps can generate quite a bit of data.

"Big Data" and "Analytics" have become buzzwords lately, but "Data Warehousing" still seems like something for large corporations, because it is perceived as costly and complex.

Having control of your customer data is becoming a must-have, and we see more startups that begin with the end in mind.

There are several cloud-based data warehouses that decrease the costs and complexity of setup and maintenance:

  • Amazon Redshift - Fast, managed, petabyte-scale data warehouse
  • Google BigQuery - Fully managed, NoOps, low-cost data analytics
  • IBM dashDB - Fully managed cloud data warehousing service
  • SAP HANA - In-memory, columnar, relational database system

Why?

Gary Marcus, a cognitive scientist at NYU, said that data has its place after you have formulated a hypothesis or theory about the problem you are trying to address. The data you collect must be relevant to the problem you are solving, so the user events you track need to be relevant to the hypothesis you are testing.

Luckily, it's pretty easy to determine the "one-metric-that-matters" for your early-stage company, and over the course of hitting that number, many more questions will emerge.
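As a rough illustration of tying events to a single metric, here is a minimal Python sketch. It assumes, purely for the example, that the one metric is the share of signed-up users who perform a key action within seven days. The event names ("signed_up", "created_project"), the sample data, and the activation_rate function are all hypothetical and not taken from any particular analytics tool.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (user_id, event_name, timestamp).
# The event names and dates are made up for illustration.
EVENTS = [
    ("u1", "signed_up", datetime(2016, 1, 1)),
    ("u1", "created_project", datetime(2016, 1, 3)),
    ("u2", "signed_up", datetime(2016, 1, 2)),
    ("u2", "created_project", datetime(2016, 1, 20)),
    ("u3", "signed_up", datetime(2016, 1, 5)),
]


def activation_rate(events, key_event="created_project",
                    window=timedelta(days=7)):
    """Share of signed-up users who did key_event within window of signup."""
    # Record each user's earliest signup time.
    signup_at = {}
    for user, name, ts in events:
        if name == "signed_up":
            signup_at[user] = min(ts, signup_at.get(user, ts))

    # A user counts as activated if the key event falls inside the window.
    activated = set()
    for user, name, ts in events:
        start = signup_at.get(user)
        if name == key_event and start is not None and start <= ts <= start + window:
            activated.add(user)

    return len(activated) / len(signup_at) if signup_at else 0.0


rate = activation_rate(EVENTS)
print(f"Activation rate: {rate:.0%}")
```

Only the two events that define the metric are needed here, which is the point of starting from a hypothesis: it tells you which events are worth collecting before you collect everything.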

When?

You don't need to sign up for Amazon Redshift when you're launching a landing page or acquiring your first few users.

As you become clearer on who your target customer is, questions will emerge.

Once you start scaling your product, it'll be hard to find the time to get analytics right, and the first step toward that is getting your data right. Amazon Redshift can cost as little as $1k/yr and provides a solid foundation.

This will scale with your business, and when the time comes, you can link your Redshift data to a number of cloud business intelligence tools:

  • Tableau - Fast analytics and visualizations
  • Yellowfin - Relational databases and multi-dimensional cubes
  • GoodData - SaaS Business Intelligence and Analytics
  • Sisense - Analyze and visualize complex datasets
  • Periscope - Unifies business data from multiple sources
  • Chartio - Analyze critical data through an intuitive interface
  • Looker - Make data accessible to the organization
  • Mode - Provides online services for analyzing data

In Summary

Businesses live, die and thrive by data. Not having complete ownership over this data restricts your ability to truly understand your business.

Plan for the future: own your data.

Bootstrap with early-stage cloud tools (like Mixpanel, Amplitude, etc.) until you’ve tested your hypotheses.

Then jump into cloud-based data warehouses (like Amazon Redshift, Google BigQuery, etc.) when you start to gain traction.

