Over the years, Snowflake’s advice on optimizing and saving money has always called for a lot of human involvement. Machine learning and automation can make this easier, cheaper, and more effective.
I recently came across a 3-year-old blog post from Snowflake about the best practices every admin can follow to optimize resources. Since Snowflake optimization is what we do every day at Keebo, I was interested to compare it to current Snowflake advice. After all, 3 years is a long time in the software business. Like with dog years, 3 normal years seem more like 21 software years!
What has changed about Snowflake optimization?
Snowflake has published a lot of articles recently on optimization as well as many customer stories. I’ve read and watched nearly all of them. While the old advice was more like a list of tasks, the newer advice is more about creating a system or process for optimization. Here are 3 observations about what has and hasn’t changed.
Good queries have become the starting point
When you look at the old blog, you’ll notice that the immediate emphasis is on Snowflake’s game-changing architecture that separates storage and compute. You’ll see terms like dynamic, elastic, and unlimited. It’s almost as if they hadn’t really seen what users and their queries can do to an “unlimited” system! Sure, it is unlimited from Snowflake’s perspective, but it turns out customer budgets aren’t as elastic. There was some recognition of this even then, so Snowflake encouraged its customers to set resource monitors to cut off expensive users.
Now, decades later (in “software years” anyway), the advice is to write good queries to begin with. This is certainly a more mature approach to optimization, and there are many mistakes you can avoid in your queries. But it is also harder to achieve, since it requires more discipline and knowledge from the entire user base. Of course, it is better to begin with an optimized query than to depend on data cloud scalability to save you, which leads to overspending.
Optimization still requires a lot of monitoring and analyzing
One thing that hasn’t changed about optimization advice: you need to track a lot of stuff. In the old blog, the emphasis was on setting up resource monitors to alert you when users or processes were approaching limits (presumably so you could stop them before it was too late). This isn’t very granular. Today, in addition to high-level resource and budgeting metrics, Snowflake encourages you to get detailed in your analysis of queries. Specifically, it recommends tracking these 5 metrics:
| Metric | What it tells you |
| --- | --- |
| Bytes scanned | In general, the higher the bytes scanned to answer a query, the larger the warehouse needed. If most queries scan less than a GB, you probably only need a small or extra small warehouse. |
| Execution time | If this is high, it indicates expensive (long-running) queries. You need to either optimize these queries or scale up to a larger warehouse. |
| Query load percent | If this is less than 50 for a query, then it should be moved to another warehouse, or you could increase the warehouse size. |
| Bytes spilled to local and bytes spilled to remote | This indicates the warehouse is undersized. Bytes that can’t fit into memory “spill” to disk storage. This will be slow on local storage and even slower on remote storage. |
| Queued time | A query that is competing with others for resources will have a high queued time. Increase warehouse size or move queries to a less utilized warehouse. |
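To make these checks concrete, here is a minimal Python sketch that flags a single query record against the five metrics above. The field names and thresholds are illustrative assumptions of mine, not official Snowflake guidance; in practice you would pull the corresponding columns from Snowflake's QUERY_HISTORY view.

```python
# Illustrative sketch: flag a query record against the five metrics.
# Field names and thresholds are assumptions, not Snowflake's official names.

GB = 1024 ** 3

def flag_query(q: dict) -> list[str]:
    """Return a list of warnings for one query record."""
    flags = []
    if q["bytes_scanned"] > 20 * GB:
        flags.append("high bytes scanned: consider a larger warehouse")
    if q["execution_time_ms"] > 5 * 60 * 1000:
        flags.append("long-running query: optimize it or scale up")
    if q["query_load_percent"] < 50:
        flags.append("low load percent: move to another warehouse or resize")
    if q["bytes_spilled_local"] or q["bytes_spilled_remote"]:
        flags.append("spilling to storage: warehouse likely undersized")
    if q["queued_time_ms"] > 1000:
        flags.append("queueing: contention, resize or rebalance queries")
    return flags

example = {
    "bytes_scanned": 25 * GB,
    "execution_time_ms": 400_000,
    "query_load_percent": 30,
    "bytes_spilled_local": 0,
    "bytes_spilled_remote": 2 * GB,
    "queued_time_ms": 4_500,
}
for flag in flag_query(example):
    print(flag)
```

A real pipeline would run something like this over every row of recent query history and aggregate the flags per warehouse, which is exactly the kind of ongoing analysis the rest of this section describes.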
What is the point of tracking all these metrics? It is to choose the correct warehouse size (more on that in the next section). But the broader observation I’ve made here is that Snowflake optimization is still heavily dependent on gathering and analyzing usage data. I think all data teams should do this, but the challenge is that someone still has to take action on all this data. Unless your team can monitor and act 24×7, there will be opportunities to save money that you miss, and there will be optimizations made today that are actually not ideal for tomorrow’s workload.
Emphasis is now on warehouse size
In the most recent advice from Snowflake and their own customers, all optimization roads seem to lead to one key destination: choosing the right warehouse size. No other single factor has a bigger impact on how much you spend on Snowflake. If you look at the pricing for each warehouse size, it is easy to see why, as the cost doubles with each increase in warehouse size:
| Warehouse size | Credits per hour (cost) |
| --- | --- |
| X-Small | 1 |
| Small | 2 |
| Medium | 4 |
| Large | 8 |
| X-Large | 16 |
| 2X-Large | 32 |
| 3X-Large | 64 |
| 4X-Large | 128 |
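The doubling pattern is simple enough to express in a few lines of code. This sketch uses Snowflake's standard size tiers, where each step up doubles the credits billed per hour:

```python
# Credits per hour double with each warehouse size step (standard tiers).

SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large",
         "2X-Large", "3X-Large", "4X-Large"]

def credits_per_hour(size: str) -> int:
    """1 credit/hour for X-Small, doubling at each larger size."""
    return 2 ** SIZES.index(size)

for size in SIZES:
    print(f"{size:>8}: {credits_per_hour(size)} credits/hour")
```

The takeaway: running one size too large doesn't cost a little more, it costs twice as much per hour.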
If you recall from the previous section, one of the key parameters to track is bytes scanned. You can then translate that into recommendations for warehouse size. The following chart appears frequently in Snowflake presentations:
According to this chart, if your bytes scanned fall between 1 and 20 GB, the recommendation is either a small, medium, large, x-large, or 2x-large warehouse. OK, but I’d love to get more specific. Unfortunately, the exact warehouse size isn’t easy to determine, since query workloads vary wildly. Even in Snowflake’s own documentation, the advice is to experiment until you find the ideal size: “Experiment by running the same queries against warehouses of multiple sizes (e.g. X-Large, Large, Medium). The queries you experiment with should be of a size and complexity that you know will typically complete within 5 to 10 minutes (or less).”
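That experiment boils down to simple arithmetic: cost per query = runtime (in hours) × credits per hour. Here is a sketch of the comparison, using hypothetical runtimes I've made up for one test query; a bigger warehouse bills at a higher rate but may finish faster, so the cheapest size is not obvious in advance:

```python
# Compare the cost of one query across warehouse sizes.
# The runtimes below are hypothetical, for illustration only.

CREDITS_PER_HOUR = {"Medium": 4, "Large": 8, "X-Large": 16}

# Hypothetical measured runtimes (seconds) for the same test query.
runtimes = {"Medium": 600, "Large": 280, "X-Large": 150}

def query_cost(size: str) -> float:
    """Credits consumed: runtime in hours times the hourly credit rate."""
    return runtimes[size] / 3600 * CREDITS_PER_HOUR[size]

best = min(runtimes, key=query_cost)
for size in runtimes:
    print(f"{size}: {query_cost(size):.3f} credits")
print("Cheapest for this query:", best)
```

In this made-up example the Large warehouse wins: it runs the query fast enough to offset its higher rate, while X-Large doubles the rate without halving the runtime. That's exactly the kind of trade-off the manual experiment is meant to uncover.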
Where does this leave us? While Snowflake’s optimization advice over the years has gotten more specific and detailed, it ultimately depends on your data team gathering lots of information about usage and then experimenting. In today’s era of machine learning, we can do better than this.
Turning to automation to help
When we look at this article 21 software years from now, there will still be one enduring truth about optimizing Snowflake or any pay-as-you-go resource: you save money by using it as little as possible, and when you do use it, by using the smallest size that works. This was the essence of the advice years ago, and it still is today. But depending on humans to collect and absorb all this data and then make optimization decisions, however willing they are, has 2 big problems.
- Missed opportunities. Imagine gathering and analyzing the data about all your queries and settings across all warehouses for all time. Even motivated people can’t keep up with it. They’ll have to focus on the biggest and most problematic queries, which means they will miss many opportunities to optimize and save money.
- Dynamic environments. Even those motivated data engineers who can fix some of the biggest problems will still face the reality that the big problem today is not the big problem tomorrow. Further, the optimization made today might actually be harmful tomorrow. Is your data engineer available 24×7 to remove or redo optimizations?
The good news is that there is an automated solution that can do this more effectively and cheaply than humans, freeing you up for more important work. Keebo is the only fully automated Snowflake optimizer. There are lots of other products that can gather all the data for you, but they still leave you to do the actual optimizations yourself, which brings you right back to the 2 big problems listed above.
Keebo not only monitors and analyzes your system, it does the optimizations for you, using patented machine learning based on years of cutting-edge research at top universities. It even protects query performance in your dynamic environment by backing off changes that turn out to be sub-optimal in the future. In the screenshot below you can see the results from a real customer using Keebo to automate the optimization of auto-suspend (we also optimize warehouse size and multi-cluster settings). In one week for one warehouse alone, we made 407 optimizations! They did not need to do anything, but they still maintained full visibility and control.
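To see why auto-suspend alone is worth tuning, here is a toy model of idle billing: after each burst of queries, the warehouse keeps accruing credits until the auto-suspend timeout elapses. The idle gaps and timeout values below are hypothetical, and the model deliberately ignores real-world details like Snowflake's 60-second minimum charge on each resume:

```python
# Toy model: credits billed for idle time under different auto-suspend
# timeouts. Gap lengths and timeouts are hypothetical examples.

def idle_credits(gaps_s: list[int], timeout_s: int,
                 credits_per_hour: float = 2) -> float:
    """Credits billed while idle: each gap bills up to the timeout length."""
    idle_seconds = sum(min(gap, timeout_s) for gap in gaps_s)
    return idle_seconds / 3600 * credits_per_hour

gaps = [120, 900, 45, 3600, 300]   # idle gaps (seconds) between query bursts

print(idle_credits(gaps, timeout_s=600))  # 10-minute auto-suspend
print(idle_credits(gaps, timeout_s=60))   # aggressive 1-minute auto-suspend
```

Even in this tiny example the shorter timeout cuts idle spend severalfold, but the right value shifts as the workload's gap pattern shifts, which is why a static setting chosen once is rarely optimal for long.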
While the automatic optimization makes the data team happy, our pricing makes the finance team happy, too. What we charge is based on what we save you. This means our goals are aligned. If you’d like to know more, we have a detailed practical guide you could read next, or contact us and we’ll be happy to demonstrate and set up a free trial.