Snowflake’s Cost Optimization Capabilities


Snowflake provides strong cost optimization capabilities—if you have time to use them. You can also offload this work to AI.

For any pay-as-you-go system, the simplest way to reduce cost is to reduce usage.

The primary way to reduce Snowflake cost is to improve query efficiency while meeting response time SLAs. Snowflake itself benefits from efficient usage because it supports long-term customer success. That is why it provides a toolbox of utilities and best practices: it knows that inefficient, expensive usage is not good for long-term business.

This article reviews Snowflake’s native cost optimization capabilities. It also explains why these capabilities are not enough on their own and where AI helps.

Snowflake Cost Optimization Categories

Snowflake provides three general areas of optimization capabilities. These categories are based on Snowflake Summit 2023 presentations.

  • Visibility: monitor, understand, and attribute spending
  • Control: various settings that will limit usage
  • Optimization: serverless technology that can more easily be tuned automatically

Visibility

Effective cost management requires visibility. To this end, you can use several features to measure usage.

  • Account and usage data. Snowflake has a robust set of usage data you can query. It can be an overwhelming list, but a good place to start is the metering_daily_history view.
  • Tagging. Tagging is currently in private preview. When it is available for everyone, you’ll be able to put tags on queries that will identify the account and organization.
  • Budgets. Another feature not publicly available as of this writing. Budgets are exactly like you’d imagine: you set a spending limit for a specific time interval on a set of Snowflake objects. When that budget is reached, you’ll be notified.
  • Warehouse utilization. Another feature that will be available in the near future, this will give you another metric to consider, showing warehouse usage as a percentage so you can balance your workloads across warehouses. 
  • Per-job cost attribution. Yet another feature coming soon, this will allow you to understand spending better on a job basis.

To use these, you have to create your own reports with Snowsight or similar tools. Or you could check out Snowflake’s open-source Streamlit usage app to get started.
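As a starting point for such a report, here is a minimal Python sketch that totals billed credits per service type. It assumes you query the METERING_DAILY_HISTORY view in the SNOWFLAKE.ACCOUNT_USAGE schema and fetch the rows with whatever client you already use. The function name and the sample rows are hypothetical and only illustrate the shape of the data.

```python
from collections import defaultdict

# Query text for the usage view; run it with the Snowflake client of your choice.
# The 30-day window is an arbitrary assumption for illustration.
DAILY_CREDITS_SQL = """
SELECT usage_date, service_type, SUM(credits_billed) AS credits_billed
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY usage_date, service_type
ORDER BY usage_date
"""


def credits_by_service(rows):
    """Sum billed credits per service type.

    `rows` is an iterable of (usage_date, service_type, credits_billed)
    tuples, matching the column order of DAILY_CREDITS_SQL.
    """
    totals = defaultdict(float)
    for _usage_date, service_type, credits in rows:
        totals[service_type] += float(credits)
    return dict(totals)


# Hypothetical sample rows standing in for a real query result.
sample_rows = [
    ("2023-07-01", "WAREHOUSE_METERING", 12.5),
    ("2023-07-01", "AUTO_CLUSTERING", 0.5),
    ("2023-07-02", "WAREHOUSE_METERING", 10.0),
]
totals = credits_by_service(sample_rows)
for service, credits in sorted(totals.items()):
    print(f"{service}: {credits} credits")
```

Keeping the aggregation separate from the query makes it easy to test the report logic without a live connection.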

Control

Once you have visibility on your spend, you can make adjustments to three key settings to limit spending. 

  • Warehouse size. If your warehouse is too small, your queries can be too slow for your SLAs. But if it is too big, you are paying for power you don’t need.
  • Warehouse suspend. If a warehouse isn’t in use, Snowflake can suspend it for you after a time interval you set. But take care not to suspend too soon, since a resumed warehouse starts with a cold cache, which slows down queries.
  • Multi-cluster. While increasing warehouse size will help individual queries, increasing the number of clusters improves concurrency. But again, if you make this too big, you are paying for power you don’t need. Too small and you might not get the performance your users need.

These three control parameters in particular are excellent candidates for AI and automation, as I will describe later.
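To make the three settings concrete, the following Python sketch builds a single ALTER WAREHOUSE statement that sets all of them. It is an illustration under stated assumptions, not a recommended configuration: the helper name, the warehouse name, and the example values are hypothetical, and the size list is deliberately truncated. AUTO_SUSPEND is expressed in seconds, and the cluster counts only apply to multi-cluster warehouses.

```python
# A truncated list of sizes, kept short for illustration.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def alter_warehouse_sql(name, size, auto_suspend_seconds, min_clusters, max_clusters):
    """Build an ALTER WAREHOUSE statement for size, suspend, and multi-cluster."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported warehouse size: {size}")
    if auto_suspend_seconds < 0:
        raise ValueError("auto_suspend_seconds must not be negative")
    if min_clusters < 1 or max_clusters < min_clusters:
        raise ValueError("cluster counts must satisfy 1 <= min <= max")
    return (
        f"ALTER WAREHOUSE {name} SET "
        f"WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters}"
    )


# Hypothetical example: a medium warehouse that suspends after 60 idle seconds
# and scales between one and three clusters.
statement = alter_warehouse_sql("reporting_wh", "medium", 60, 1, 3)
print(statement)
```

Generating the statement rather than typing it by hand makes it easier to review and log each change, which matters when these settings are adjusted often.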

Optimization

Snowflake offers “serverless” services that, because they are serverless, are easier for Snowflake to tune automatically in three areas:

  • Query acceleration via materialized views. Materialized views are a well-known, universal concept that can definitely improve query performance. But the decisions on what to build and when to build them are still left to admins. Requires Enterprise Edition.
  • Storage optimization. Snowflake’s Automatic Clustering can optimize storage to better answer queries by tuning how data is clustered.
  • Search optimization. For specific types of queries, especially those from data science teams, this service can improve performance by essentially making search access paths persistent, which makes the pruning of micro-partitions faster. Requires Enterprise Edition.
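For reference, each of these services is enabled with a SQL statement, and the decision of where to apply it stays with the admin. The Python sketch below only assembles those statements as text. The table, view, and column names are hypothetical, and the materialized view query is a placeholder you would replace with your own.

```python
def optimization_statements(table, cluster_columns, view_name, view_query):
    """Return the SQL statements that turn on each service for one table."""
    columns = ", ".join(cluster_columns)
    return [
        # Materialized view: precomputes the result of view_query.
        f"CREATE MATERIALIZED VIEW {view_name} AS {view_query}",
        # Clustering key: Automatic Clustering maintains it in the background.
        f"ALTER TABLE {table} CLUSTER BY ({columns})",
        # Search optimization: builds a persistent search access path.
        f"ALTER TABLE {table} ADD SEARCH OPTIMIZATION",
    ]


# Hypothetical example on a table named "orders".
statements = optimization_statements(
    table="orders",
    cluster_columns=["order_date", "region"],
    view_name="daily_order_totals",
    view_query="SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date",
)
for sql in statements:
    print(sql)
```

Each statement starts a background service that is billed separately, so it is worth checking usage data before and after enabling one.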

The Reality Of Snowflake Optimization

With the exception of the serverless services, which have their own costs, the reality is that Snowflake does not offer any optimization. Instead, it primarily provides observability and control. This is also true of the growing collection of tools like Slingshot, SELECT.DEV, and BlueSky. While observations and reports are nice, true optimization takes a significant amount of effort.

The consensus on best practices for optimization basically comes down to 3 steps:

  • Write efficient queries to begin with
  • Gather a lot of usage data
  • Control warehouse usage to be as efficient as possible

The trouble with this is twofold. First, someone has to do all this. If you have a large team or an underutilized team, this isn’t a problem. But if you have read this far, it isn’t likely you have such a team. Second, workloads are dynamic. The optimizations you make today may not work tomorrow, and then you are right back to square one. This is true even if you buy a product like Slingshot to automate the second step, gathering usage data.

The good news is that both query optimization and warehouse optimization can be offloaded to AI and automation.

How Keebo’s AI Automates Snowflake Optimization

Keebo goes beyond “observe and report.” We actually do the work. Here’s what that means:

  • Always watching. Snowflake’s growing set of observability metrics is fundamentally no different from any other query that feeds a report or dashboard. It is valuable information only to the extent that you can consume it, understand it, and then make decisions from it in time to make a difference. For warehouse optimization, Keebo monitors 76 Snowflake usage parameters 24×7 and then applies machine learning to make the right optimization decisions. Is your team able to do that?
  • Always optimizing. Remember those three key control parameters Snowflake urges you to optimize? They are warehouse size, suspension, and clustering. Keebo sets those for you, often making hundreds of these optimizations each week without you needing to do anything at all. The result is savings of 20% and more. For queries, Keebo can rewrite them on the fly and build smart models dynamically to accelerate them (similar to materialized views). No matter how often they use the word “automatic” or “optimization,” nobody else, including Slingshot or Snowflake itself, can do this.
  • Price based on savings. For Snowflake Warehouse Optimization, what you pay Keebo is based on what we save you, making it very low risk and certainly cheaper than trying to do it yourself.

Thanks for reading this all the way to the end. I hope you’ll take a look at Keebo. You can offload optimization work to our AI, have it set up in 30 minutes, and pay based on what you save. It’s why we are the data team’s best friend.