How Much Do Your Snowflake Optimizations Actually Save You?


Whether you spend $50K or $20M+, monitoring and reducing costs is a top priority. This includes:

  • Manually optimizing expensive queries
  • Writing large DBT models to summarize the data
  • Right-sizing warehouses and tweaking their parameters
  • Monitoring and setting alerts for spend triggers
  • Leveraging visualization tools to track spend
  • Using automated optimization to make adjustments in real-time

Regardless of your approach, you must answer one question: how much did you save? Without this, you cannot evaluate ROI.

This article explains how to answer that question so you can make the best strategic and tactical decisions around your use of Snowflake. 

Why calculating cost savings in Snowflake is easier said than done

With a stable workload, costs remain consistent week to week, which makes measurement straightforward: you optimize and see a clear cost reduction, as in the example below:

[Figure: weekly spend for a stable workload, before and after optimization]

Unfortunately, reality isn’t so simple. Consider the following screenshot from an actual Snowflake warehouse:

[Figure: daily spend for an actual Snowflake warehouse, varying unpredictably]

Snowflake workloads vary across queries, users, and usage patterns. When daily usage is unpredictable, you can aggregate the data weekly or monthly to identify trends. For example, aggregating the previous warehouse's spend by month reveals a more predictable pattern:

[Figure: the same warehouse's spend, aggregated by month]
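Aggregating like this is simple to script. Here is a minimal sketch in Python that rolls daily credit totals up by month; the function name and sample data are illustrative, not part of any Keebo or Snowflake API:

```python
from collections import defaultdict

def monthly_credit_totals(daily_credits):
    """Roll up (ISO date, credits) pairs into per-month totals.

    daily_credits: iterable of ("YYYY-MM-DD", float) tuples, e.g. pulled
    from Snowflake's WAREHOUSE_METERING_HISTORY view.
    """
    totals = defaultdict(float)
    for day, credits in daily_credits:
        totals[day[:7]] += credits  # "YYYY-MM" is the month key
    return dict(totals)

# Illustrative daily spend: noisy day to day, steadier month to month
daily = [("2024-01-03", 40.0), ("2024-01-17", 95.0),
         ("2024-02-05", 70.0), ("2024-02-21", 68.0)]
monthly_credit_totals(daily)  # {"2024-01": 135.0, "2024-02": 138.0}
```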

Zooming out buys predictability, but less granularity makes it harder to attribute cost changes. A drop in spend may come from your optimizations, or from external factors: shifts in user behavior, changes in the queries being written, or broader business changes that reduced overall load.

The reverse is also possible: your optimizations genuinely reduced costs by, say, 20%, but growing data volumes drove usage up at the same time. In that case, you'd see smaller savings on the bill even though the optimizations were effective.

These scenarios show how difficult it is to measure savings in a dynamic environment, and without reliable measurement, effective decision-making suffers.

The core challenge facing Snowflake savings calculations

Thankfully, there’s a solution. At Keebo, we’ve invested significantly in researching the best, most reliable way to calculate optimization-based savings. These calculations are reproducible, independently verifiable, and provable by both our own customers and by third parties. 

Keebo analyzes the impact of its optimizations to calculate how much each customer has saved as a result of them. But before getting into the solution, we need to talk about why this problem is non-trivial. I'll illustrate with an analogy.

Imagine leaving the office at 5pm. You can take either the freeway or the local streets. You look it up on Google Maps, which tells you the freeway will take 40 minutes and the local streets 60.

Google cannot guarantee the faster route; it provides an estimate based on limited data. The only way to know for sure would be if your clone (or evil twin) got into an identical car, left at the same time, and drove the same way, just on the other route. Then you'd compare arrival times to calculate exactly how much time Google Maps' recommended route saved you.

Similarly, to calculate with 100% accuracy how much your Snowflake optimizations save you, you would have to duplicate every warehouse at the exact same size and run every query twice, sending it simultaneously to your original warehouse and to the replica being optimized by Keebo. You would have to do that every day, then compare the bill for the optimized warehouses against the bill for their unoptimized replicas.

This creates an obvious problem: tracking savings this way doubles your costs, which defeats the purpose of optimizing in the first place.

The best outcome is an estimate. The goal is to make that estimate accurate and transparent.

How does Keebo provide an automated solution to this problem? 

Here’s how Keebo calculates savings from each optimization. The best way to illustrate is with a couple of examples. 

Example: a warehouse starts at 9:00am when a batch of queries arrives, and it stays active for twelve minutes. With a default auto-suspend of three minutes, you're billed for 15 minutes (12 + 3). If you're running Keebo, our ML algorithms may have calculated that after 1 minute and 20 seconds of idle time the probability of another query arriving was extremely low, and suspended the warehouse at exactly 9:13:20am. That saves you 1 minute and 40 seconds of idle warehouse time, which you can convert into a monetary value by multiplying by that warehouse's cost per minute.
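In code, that per-suspend calculation is simple arithmetic. A minimal sketch, where the cost rate is a hypothetical number rather than any real warehouse's rate:

```python
def early_suspend_savings(default_auto_suspend_min, actual_idle_min,
                          cost_per_minute):
    """Dollar savings from suspending a warehouse before the default
    auto-suspend timer would have fired."""
    saved_idle_min = max(default_auto_suspend_min - actual_idle_min, 0.0)
    return saved_idle_min * cost_per_minute

# The example above: 3:00 default auto-suspend, suspended after 1:20 idle,
# avoiding 1:40 of billed idle time. Assume a hypothetical $0.50/minute.
early_suspend_savings(3.0, 1 + 20 / 60, 0.50)  # ≈ $0.83
```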

As another example, let's say your warehouse has a default size of Large, but Keebo downsizes it to a Medium for 12 minutes based on the anticipated resources required for the incoming queries. Keebo saves you the cost difference for those 12 minutes. Of course, some queries take longer on a Medium than on a Large, so Keebo accounts for the slowdown by looking at historical data for similar queries and calculates a more accurate savings estimate.
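As a sketch, the downsizing math looks like the following. The credit rates are Snowflake's standard per-size rates (a Medium consumes 4 credits/hour, a Large 8), but the slowdown factor is a made-up illustration of the historical adjustment, not Keebo's actual model:

```python
# Snowflake's standard credit consumption rates per warehouse size
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def downsize_savings_credits(minutes, from_size, to_size,
                             slowdown_factor=1.0):
    """Credits saved by running on a smaller warehouse for `minutes`.

    slowdown_factor inflates the optimized cost to account for queries
    running longer on the smaller size (estimated from history).
    """
    hours = minutes / 60
    baseline = CREDITS_PER_HOUR[from_size] * hours
    optimized = CREDITS_PER_HOUR[to_size] * hours * slowdown_factor
    return baseline - optimized

# 12 minutes downsized from Large to Medium, assuming similar queries
# historically ran ~10% longer on a Medium
downsize_savings_credits(12, "L", "M", slowdown_factor=1.1)  # ≈ 0.72 credits
```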

Keebo calculates savings from each action, then aggregates them across warehouses to arrive at your total Snowflake savings attributable to Keebo, as distinct from savings that result from your own manual efforts or changes in your workload.
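Conceptually, that rollup is a straightforward sum over per-action savings records. The record shape and warehouse names here are illustrative:

```python
from collections import defaultdict

def aggregate_savings(actions):
    """Sum per-action savings into per-warehouse and grand totals.

    actions: iterable of (warehouse_name, credits_saved) tuples, one per
    optimization action (early suspend, downsize, ...).
    """
    per_warehouse = defaultdict(float)
    for warehouse, credits in actions:
        per_warehouse[warehouse] += credits
    return dict(per_warehouse), sum(per_warehouse.values())

per_wh, total = aggregate_savings([("ETL_WH", 0.72), ("BI_WH", 0.30),
                                   ("ETL_WH", 1.10)])
# per_wh ≈ {"ETL_WH": 1.82, "BI_WH": 0.30}; total ≈ 2.12 credits
```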

Keebo shares the savings-calculation code with its customers so they can independently verify those savings without having to trust or rely on Keebo. This degree of transparency and verifiability is the result of significant R&D investment on our side, but it is also what has helped Keebo become the leader in Snowflake optimization and what helps us gain and retain our customers' trust.

Final thoughts on calculating Snowflake cost savings

The level of granularity and detail described above is only possible with an automated tool that constantly makes these adjustments and calculates their impact. If you tried to do the calculations from the previous section manually, the time spent figuring them out would cost more than what you saved.

Not only that, but by the time you figured out the optimization needed, the moment would probably have passed. Remember, the first example saved 1 minute and 40 seconds of warehouse spend. And in the second, every minute the warehouse remained at a Large would have meant more spend.

These kinds of optimizations and adjustments require AI- and ML-powered algorithms to work at all, much less at scale. It's too much for a human to handle, and not the best use of their time. Instead, you need an automated solution.