ETL stands for "extract, transform, load". It is the process of replicating data siloed in a data source such as Shopify or Facebook ads, to a destination.
The most common example of a destination would be a data warehouse such as Google BigQuery or Amazon Redshift.
There is fundamentally two ways for a business to implement ETL capabilities.
The first is to set up the servers and write the code needed to ETL the business's data. This is a costly approach which requires having the right in-house human resources.
The more common approach is to use an out-of-the box ETL provider such as Fivetran, Daton, or any of the many solutions on the market.
The first major step a Shopify business needs to take in order to implementing a modern data stack is to implement ETL support for its high priority data sources.
ETL would be considered the first piece of data infrastructure that would need to be put into place.
In the image above you can see that ETL sits between the data sources of the business and the data warehouse.
Connections would be set up between the sources and the ETL solution, and between the ETL solution and the data warehouse. This would facilitate the flow of data from the sources into the raw data layer of the data warehouse.
Once the data is extracted, transformed and loaded into the warehouse, analysts and/or data engineers can move onto the next major step which is data modeling, the process of transforming the raw data into usable production-ready data sets.
This guide will cover 5 popular ETL providers which have great support for popular eCommerce-related data sources. The table below providers a quick comparison of the most important variables that you should consider when choosing your primary ETL solution.
The first ETL provider we will be covering is Daton by Saras Analytics.
Daton is one of projectBI's favorite ETL providers since it offers great coverage for a reasonable price.
Daton also positions itself as an ETL provider for eCommerce businesses specifically. Instead of trying to cover hundreds of popular data sources used by tech companies in general, Saras has taken the approach of focusing primarily on popular eCommerce data sources. It's this approach which has made Daton popular among high 8 and 9-figure Shopify brands that have chosen to implement a modern data stack.
Daton is used by some of the most well known direct-to-consumer brands in the world including True Classic, Hexclad, and Ridge.
Daton has support for over 200 data sources, offers a tiered usage-based pricing model, a 14 day free trial, annual subscriptions and custom pricing for larger brands.
In short, Daton is a highly recommended ETL provider for Shopify businesses that are serious about investing in their business intelligence capabilities and have a large number of data sources.
Fivetran is a world-class ETL solution used by some of the biggest tech companies in the world. Fivetran has been around the longest out of all the solutions covered in this guide and also has the highest number of connectors (over 700).
Fivetran does not position itself as eCom-focused but does cover a large number of data sources relevant to DTC businesses.
Fivetran is a highly reliable solution with great support. Whenever the APIs of data sources you have connected to your Fivetran account are changing, and new tables need to be loaded, Fivetran sends detailed emails with all the relevant information you need.
The biggest negative of Fivetran is its pricing. Fivetran is an expensive option which often disqualifies it for small and medium-sized DTC businesses who have a tight budget for their data stack.
I do however recommend Fivetran to 9-figure DTC brands that have a large number of data sources, are building an internal data team, and are comfortable spending mid-five figures a year on ETL.
Stitch is another ETL provider that has been in the market for a very long time.
I list Stitch in this guide because it is a well-known ETL provider and you'll definitely come across it while doing your research but unfortunately I can't recommend it to DTC businesses.
Stitch is a fantastic option for early-stage SaaS companies but it simply isn't good enough for DTC brands. It doesn't have enough coverage of the relevant eCom data sources and has poor depth of coverage.
Stitch is a very affordable option, has a great UI and user experience and is very reliable. It has a tiered usage-based model which is very generous.
Stitch may fit the needs of small DTC brands that are interested in centralizing their data in a data warehouse and don't have a large number of data sources. However, I'd still recommend Daton or Fivetran over Stitch in almost all cases.
Estuary is a newer member of the ETL club and the solution in this guide that projectBI is least familiar with.
What differentiates Estuary from the other solutions in this guide is its pricing model. Instead of charging users on the number of rows they replicate, Estuary charges a flat fee per connector and incrementally based on the size of data replicated. This is an intriguing model and worth exploring further since it may result in significantly lower usage rates than the other solutions covered in this guide. Their free plan is also very generous and may include enough to cover a number of small data sources for some DTC businesses.
Estuary unfortunately does not do a great job of covering popular eCom-related data sources since its targeted more towards the tech market. Estuary does have more coverage than Stitch but far less than Daton and Fivetran.
Unfortunately the projectBI team has limited experience with Estuary so we can't talk to reliability, depth of coverage and the overall experience of working with the solution.
Airbyte is a unique ETL solution in that it can be used as either a self-hosted, open-source implementation, or as a standard SaaS solution.
Most DTC businesses don't have the in-house capabilities to spin up servers and set up software so Airbyte's cloud-based offering is a lot more relevant.
Since Airbyte started as an open-source solution it has over 2,000 contributors to its code base which has resulted in a wide range of coverage of popular data sources. Airbyte's open-source solution has over 600 connectors while its cloud-based offering has +-550.
Airbyte cloud offers a pay-as-you go pricing model based on the number of rows that are replicated. The rate they offer is very generous when compared to Stitch, Daton and Fivetran.
Unfortunately the projectBI team has limited experience working with Airbyte so we can't talk to reliability, depth of coverage and the overall experience of working with Airbyte.
Portable is a very interesting ETL solution in that it takes a very different approach than the other ETL solutions on the market.
Portable is unique in that its pricing model is based entirely on the number of connectors you create in your account. They don't charge customers based on the volume of data they load. This is a very attractive option for fast growing DTC businesses that have very high volumes of data (very heavy users of Klaviyo, Facebook ads, and other "heavy" data sources).
Where Portable falls short however is that it only serves US-based businesses and has big jumps between its pricing tiers. For example, a business that has 15 data sources will need to be on their "Pro" plan which supports 25 connectors. The "Scale" plan supports 10 so you're kind of stuck and forced to be on a higher tier plan.
Another major benefit of Portable is they take a very aggressive approach to adding new connectors and boast support for over 1,500 sources!! If anyone covers that niche SaaS tool that you are using, its Portable, and even if they don't, they'll happily explore the possibily of adding it to their catalog.
Portable could be a great supplementary ETL provider for 9-figure DTC brands that want to cover some niche data sources which aren't covered by Daton or Fivetran, or explore reducing usage-based pricing for some of their very heavy data sources. Adopting Portable's "Scale" plan to cover a handful of heavy data sources may be a great way to lower your Daton or Fivetran annual bill.
Choosing the right ETL solution for your eCommerce business can be a daughting task. There are many options on the market, each of them having their own strengths and weaknesses.
My recommendation is to focus on the following areas:
You ideally want to have one ETL solution that covers all the data sources you'd like to replicate to your data warehouse.
You can use the source-to-ETL mapping table above to help you identify the ETL providers that cover your sources.
If you have more than 15 data sources you may not be able to find an ETL that covers all your sources. If that's the situation you find yourself, I'd suggest listing your highest priority sources and find an ETL that covers the vast majority of them. You can consult with projectBI or other data specialists on your options to cover the non-supported sources.
It's not the end of the world to have more than 1 ETL provider but this will most often raise your infrastructure costs and add additional complexity to your data stack.
Depth of coverage is an important aspect to consider when choosing an ETL provider. Depth of coverage can be measured in the number of entities / objects that an ETL allows you to load for a given source.
For example, when it comes to Shopify, some of the less known data sets that can be extracted are "users", "fulfillment events", and "discount codes". Not all ETL providers have support for these data sets. Stitch for example has stopped supporting new API end points for many of the sources you see listed on their website. Fivetran is on the opposite side of the spectrum, they are very vigilent in support all available entities / objects.
There are two ways you can work out the depth of coverage for the ETL providers you are assessing. The first way is by signing up to their free trial and seeing the tables they allow you to extract. The second is by going through the documentation for each source on the websites of the ETL providers. Not all ETL providers keep high quality documentation that clearly indicates the depth of coverage but this approach can still help in most cases.
An obvious consideration when choosing an ETL provider is cost.
Determining the exact cost for extracting, loading and transforming all your data into a data warehouse in a 12 month period is very challenging.
There are many variables that go into calculating the cost from the ETL provider you choose, the number of data sources that need to be connected, the desired loading frequency, size of your business, your growth rate, etc.
From our experience working with over a dozen 8 and 9-figure DTC businesses I can give you a rough range of between $15,000 and $50,000 a year for ETL.
To help you determine your estimated annual price, you can contact the ETL providers directly. They will help you come up with an estimate and in many cases provide discounts if you're willing to sign an annual contract. You can of course contact projectBI and we will help you with this process.