Snowflake has good capability to optimize and reduce your bill–if you can find the time to learn and do it. Or you could offload this task to AI.
When it comes to any pay-as-you-go product like electricity, water, or a cloud platform, there is one surefire way to reduce your bill: stop using it. So there you go–my article on Snowflake cost optimization is complete!
Hopefully you know I’m kidding here, but as the saying goes, many a true thing is said in jest. It is true that the primary way to reduce your Snowflake bill is to reduce your use by being as efficient as possible with each query and balancing that with your response time SLAs. You’d think that Snowflake wouldn’t care about optimized usage. After all, they get paid more if you use it inefficiently. But to their credit, they provide a toolbox of utilities and best practices because they know that inefficient, expensive usage is not good for long-term business.
In this article, I will provide a brief overview of Snowflake’s native cost optimization capabilities. After that, I’d appreciate the opportunity to make the case that you should not depend on these capabilities alone, and that you should offload much of this work to AI.
Snowflake cost optimization categories
Snowflake provides three general areas of optimization capabilities. My source for this information is a presentation from Snowflake Summit 2023 which is now available for anyone to watch upon registration.
- Visibility: monitor, understand, and attribute spending
- Control: various settings that will limit usage
- Optimization: serverless technology that can more easily be tuned automatically
You can only manage what you measure. To this end, you can use several features to measure usage.
- Account and usage data. Snowflake has a robust set of usage data you can query. It can be an overwhelming list, but a good place to start might be metering_daily_history.
- Tagging. As of this writing, tagging is in private preview. When it is available for everyone, you’ll be able to put tags on queries that will identify the account and organization.
- Budgets. Another feature not publicly available as of this writing. Budgets are exactly like you’d imagine: you set a spending limit for a specific time interval on a set of Snowflake objects. When that budget is reached, you’ll be notified.
- Warehouse utilization. Another feature that will be available in the near future, this will give you another metric to consider, showing warehouse usage as a percentage so you can balance your workloads across warehouses.
- Per-job cost attribution. Yet another feature coming soon, this will allow you to understand spending better on a job basis.
To use these, you have to create your own reports with Snowsight or similar tools. Or you could check out Snowflake’s open-source Streamlit usage app to get started.
Once you have visibility on your spend, you can make adjustments to three key settings to limit spending.
- Warehouse size. If your warehouse is too small, your queries can be too slow for your SLAs. But if it is too big, you are paying for power you don’t need.
- Warehouse suspend. If a warehouse isn’t in use, Snowflake can shut it down for you after a time interval you set. But take care you don’t shut down too soon, since restarting a warehouse will slow down queries.
- Multi-cluster. While increasing warehouse size will help individual queries, increasing the number of clusters improves concurrency. But again, if you make this too big, you are paying for power you don’t need. Too small and you might not get the performance your users need.
Snowflake offers “serverless” services that, because they are serverless, are easier for them to automatically tune in three areas:
- Query acceleration via materialized views. Materialized views are a well-known, universal concept that can definitely improve query performance. But the decisions on what to build and when to build them are still left to admins. Requires Enterprise Edition.
- Storage optimization. Snowflake auto clustering can optimize storage to better answer queries by tuning how data is clustered.
- Search optimization. For specific types of queries, especially those from data science teams, this service can improve performance by essentially making search access paths persistent, allowing the pruning of micro partitions to be faster. Requires Enterprise Edition.
The reality of Snowflake optimization
With the exception of the serverless services, which have their own costs, the reality is that Snowflake does not offer any optimization. Instead, I would argue that what they provide are the abilities to observe and report your usage. This is also true of the growing collection of tools like Slingshot, SELECT.DEV, and BlueSky. While observations and reports are nice, true optimization takes a significant amount of effort.
The consensus on best practices for optimization basically comes down to 3 steps:
- Write efficient queries to being with
- Gather a lot of usage data
- Control warehouse usage to be as efficient as possible
The trouble with this is twofold. First, someone has to do all this. If you have a large team or an underutilized team, this isn’t a problem. But if you are reading this far, it isn’t likely you have such a team. Second, workloads are dynamic. The optimizations you make today may not work tomorrow, and then you are right back to square one. This is true even if you buy a product like Slingshot to automate step #2.
How Keebo’s AI automates Snowflake optimization
Keebo goes beyond “observe and report.” We actually do the work. Here’s what that means:
- Always watching. Even with Snowflake’s growing set of observability metrics, they are fundamentally no different than any other query that feeds a report or dashboard. It is valuable information to the extent that you can consume it, understand it, and then make decisions from it in time to make a difference. For warehouse optimization, Keebo monitors 76 Snowflake usage parameters 24×7 and then we apply machine learning to make the right optimization decisions. Is your team able to do that?
- Always optimizing. Remember those three key control parameters Snowflake urges you to optimize? They are warehouse size, suspension, and clustering. Keebo sets those for you, often making hundreds of these optimizations each week without you needing to do anything at all. The result is savings of 20% and more. For queries, Keebo can rewrite them on-the-fly and build smart models dynamically to accelerate them (similar to materialized views). No matter how often they use the word “automatic” or “optimization,” nobody else–including Slingshot or Snowflake themselves–can do this.
- Price based on savings. For Snowflake Warehouse Optimization, what you pay Keebo is based on what we save you, making it very low risk and certainly cheaper than trying to do it yourself.
Thanks for reading this all the way to the end. I hope you’ll take a look at Keebo. You can offload optimization work to our AI, have it set up in 30 minutes, and pay based on what you save. It’s why we are the data team’s best friend.