How Much Do Your Snowflake Optimizations Actually Save You?
Whether you’re spending $50K or $20M+ on cloud data warehousing, monitoring those costs and trying to reduce them are top priorities. This can mean a variety of activities:
- Manually optimizing the expensive queries
- Writing large DBT models to summarize the data
- Right-sizing warehouses and tweaking their parameters
- Monitoring and setting alerts for spend triggers
- Leveraging visualization tools to track spend
- Using automated optimization to make adjustments in real-time
Regardless of your specific solution, at some point you have to answer this question: how much did all these efforts save us? If you don’t, you won’t know whether it’s worth the time, effort, and resources that you are putting in.
This article will walk you through how to answer that core question so you can make the best strategic and tactical decisions around your use of Snowflake.
Why calculating cost savings in Snowflake is easier said than done
Best case scenario: you have a Snowflake workload where you spend the same amount each week. It’s a straightforward situation. You do your optimizations, see a 20% reduction in costs, and that’s that. See the example below:
Unfortunately, reality isn’t so simple. Consider the following screenshot from an actual Snowflake warehouse:
Snowflake workloads vary significantly over time: query volumes, query types, and active users all fluctuate. If you have an unpredictable daily pattern, you can aggregate your data on a weekly or monthly basis to see a more predictable spend. For example, the previous warehouse, once aggregated on a monthly basis, reveals a more predictable pattern:
Here you’re able to zoom out and get more predictability. But there’s a problem: the less granularity you have, the harder it is to tell what caused the overall change in spend. Maybe it was your optimization efforts. But maybe it was factors outside your control: users were out of office, the mix of queries they ran shifted, or other changes in the business reduced the overall load.
On the other hand, it’s possible you actually did reduce costs by 20%, but your data volumes grew at the same time, pushing usage up. You might only see a 10% reduction in spend, even though your optimizations were twice as effective as they appear.
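To make that last scenario concrete, here is a minimal sketch of separating optimization savings from workload growth. All the numbers are illustrative, not real billing data; the idea is simply to compare observed spend against what the *grown* workload would have cost without optimization.

```python
# Illustrative sketch: separating optimization savings from workload growth.
# All figures are made up for the example.

baseline_cost = 100_000   # monthly spend before optimizations ($)
observed_cost = 90_000    # monthly spend after optimizations ($)
workload_growth = 1.125   # workload grew 12.5% over the same period

# Naive view: compare raw bills, and spend only looks 10% lower.
naive_savings_pct = 1 - observed_cost / baseline_cost

# Adjusted view: estimate what the larger workload would have cost
# unoptimized, then compare against that counterfactual.
expected_cost = baseline_cost * workload_growth      # $112,500
true_savings_pct = 1 - observed_cost / expected_cost # 20%

print(f"Naive savings:    {naive_savings_pct:.0%}")
print(f"Adjusted savings: {true_savings_pct:.0%}")
```

In this example, the raw bill suggests 10% savings, but once the workload growth is factored in, the optimizations actually cut costs by 20%, twice what the naive comparison shows.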
These are just a few scenarios showing how Snowflake’s dynamic nature makes it less than straightforward to track how much you save. This hinders you from making effective decisions to keep costs under control and, ultimately, help your bottom line.
The core challenge facing Snowflake savings calculations
Thankfully, there’s a solution. At Keebo, we’ve invested significantly in researching the best, most reliable way to calculate optimization-based savings. These calculations are reproducible, independently verifiable, and provable by both our own customers and by third parties.
Keebo analyzes the impact of its optimizations to calculate how much the customer has saved due to those optimizations. But before I can get into the solution, we first need to talk about why this problem is non-trivial. I’ll illustrate with an analogy.
Imagine you’re working from an office and leave at 5pm. You can either take the freeway or the local streets. So you look up Google Maps, and it tells you the freeway will take 40 minutes, but the local streets will take you 60 minutes.
But Google can’t verify that it will actually take you 20 minutes less to get home. It’s simply taking the (limited) data at its disposal and providing an estimate. The only way to know for sure which route is faster is if your clone (or evil twin) got in an identical car, left at the same time, and exhibited the same driving behavior, just on different streets. Then you’d compare what time each of you got home to calculate exactly how much you saved by taking Google Maps’ recommended route.
Similarly, if you truly want to calculate with 100% accuracy how much your Snowflake optimizations save you, you would have to duplicate every single warehouse with the exact same size, and run every single query twice by simultaneously sending it to your original warehouse and the replicated warehouse that is being optimized by Keebo. And you would have to do that on a daily basis in order to know exactly how much Keebo saved you by comparing the bill for your optimized warehouses to the bill for their unoptimized replica.
You can see the obvious problem here: in an effort to track how much you’re saving, you’re doubling your spend. Which, of course, defeats the purpose of saving in the first place.
The best you can expect is an estimate. The question, then, is how do you make that estimate as accurate as possible, and as verifiable and transparent as possible?
How does Keebo provide an automated solution to this problem?
Let’s now walk through how Keebo calculates the cost savings of each optimization we perform by isolating our impact. The best way to illustrate is with a couple of examples.
Let’s say you have a warehouse that’s currently suspended. At 9am, a bunch of queries come in and start the warehouse, which remains active for twelve minutes. With a default auto-suspend of three minutes, you’re billed for 15 (12+3) minutes. If you’re running Keebo, our ML algorithms may have calculated that the probability of another query arriving after 1 minute and 20 seconds of idle time was extremely low, and so suspended the warehouse at exactly that point (i.e., at 9:13:20am). That means Keebo saved you 1 minute and 40 seconds of idle warehouse time. You can turn this into a monetary value by multiplying it by the cost per minute of this particular warehouse.
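The arithmetic behind that example can be sketched in a few lines. This is a simplified illustration, not Keebo's actual implementation; the flat dollar-per-minute rate is a placeholder (real Snowflake billing is denominated in credits).

```python
# Illustrative sketch of the auto-suspend savings math.
# Assumes a flat cost per warehouse-minute; the rate below is made up.

def idle_savings_minutes(default_suspend_s: float, actual_suspend_s: float) -> float:
    """Idle minutes avoided by suspending earlier than the default timeout."""
    return max(default_suspend_s - actual_suspend_s, 0) / 60

# Default auto-suspend of 3:00, early suspend at 1:20 of idle time.
saved_min = idle_savings_minutes(default_suspend_s=180, actual_suspend_s=80)

cost_per_minute = 0.10  # hypothetical warehouse rate in $/minute
saved_usd = saved_min * cost_per_minute

print(f"Saved {saved_min:.2f} minutes of idle time (${saved_usd:.3f})")
```

Here 180 − 80 = 100 seconds, i.e. the 1 minute and 40 seconds of idle time from the example above, which is then priced at the warehouse's per-minute rate.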
As another example, let’s say your warehouse has a default size of Large, but Keebo downsizes it for 12 minutes from a Large to a Medium based on the anticipated resources required for the incoming queries. That means Keebo saved 12 minutes of the cost difference between a Medium and a Large warehouse. Of course, you have to account for the fact that some queries take longer on a Medium than a Large, so Keebo accounts for the slowdown by looking at historical data for similar queries and calculating a more accurate savings estimate.
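The resize-savings math can be sketched as follows. The credit rates follow Snowflake's published pattern (each warehouse size consumes twice the credits per hour of the size below it); the slowdown factor is an illustrative stand-in for the estimate Keebo derives from historical query data.

```python
# Illustrative sketch of warehouse-resize savings, in Snowflake credits.
# Credit rates match Snowflake's standard per-hour schedule.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}

def resize_savings_credits(minutes: float, slowdown_factor: float = 1.0) -> float:
    """Credits saved by running on Medium instead of Large for `minutes`,
    after charging back the extra runtime caused by the smaller size."""
    large_cost = CREDITS_PER_HOUR["Large"] * minutes / 60
    medium_cost = CREDITS_PER_HOUR["Medium"] * (minutes * slowdown_factor) / 60
    return large_cost - medium_cost

# 12 minutes downsized; assume queries run ~10% slower on Medium.
savings = resize_savings_credits(12, slowdown_factor=1.1)
print(f"Saved {savings:.2f} credits")
```

Note how the slowdown adjustment matters: with no slowdown the 12-minute downsize saves 0.8 credits, but charging back the 10% longer runtime trims that to 0.72 credits, a more honest estimate.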
The point is, Keebo can look at its own actions and calculate the cost savings of each down to the penny. Then, you can roll that up across all your various warehouses to calculate your total Snowflake savings as a result of using Keebo versus other savings that might be the result of your own manual efforts or changes in your own workload.
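The roll-up step above is straightforward once each action carries its own savings figure. Here is a minimal sketch, where the `actions` list is a hypothetical stand-in for a log of individual optimization actions and their computed savings:

```python
# Illustrative sketch: rolling per-action savings up to warehouse and
# account totals. The action log and dollar figures are made up.

actions = [
    {"warehouse": "ETL_WH",       "saved_usd": 0.42},
    {"warehouse": "ETL_WH",       "saved_usd": 1.10},
    {"warehouse": "ANALYTICS_WH", "saved_usd": 0.65},
]

totals: dict[str, float] = {}
for action in actions:
    wh = action["warehouse"]
    totals[wh] = totals.get(wh, 0.0) + action["saved_usd"]

grand_total = sum(totals.values())
print(totals)
print(f"Total savings: ${grand_total:.2f}")
```

Because every line item traces back to a specific optimization action, the total is attributable: it reflects savings from the optimizer's own actions, not unrelated workload or manual changes.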
Keebo shares the code that calculates these savings with its customers so they can independently verify them without having to trust or rely on Keebo. This degree of transparency and verifiability is the result of significant R&D investment on our side. It has also helped Keebo become the leader in Snowflake optimization, and it is what helps us gain and retain our customers’ trust.
Final thoughts on calculating Snowflake cost savings
The level of granularity and detail described above is only possible when you have an automated tool to constantly make these adjustments and calculate their impact. If you tried to do the calculations I mentioned in the previous section manually, the time you spend figuring all that out would cost more than what you saved.
Not only that, but by the time you figured out the optimization needed, the moment would probably have passed. Remember, the first example saved you 1 minute and 40 seconds of warehouse time. And in the second, every minute the warehouse remained at Large would’ve resulted in more spend.
These kinds of optimizations and adjustments require AI- and ML-powered algorithms to work at all, much less at scale. It’s too much for a human to handle, and not the best use of their time. Instead, you need an automated solution.
Learn more about Keebo and how we save Snowflake costs here.