
Snowflake Cost Optimization: The Pros & Cons of Various Approaches

When it comes to Snowflake cost optimization, you might wonder why it’s even a topic of conversation. Since you are in the cloud, you’re automatically controlling costs, right?

Hopefully it’s obvious that I’m kidding here. But you’d be astonished by how many companies are caught off guard by how quickly cloud costs add up. Without continuous monitoring and real-time action, it’s easy for things to get out of control.

In this article, we’ll peel back the layers of Snowflake cost optimization, look at the different approaches data engineers take, and weigh the pros and cons of each.

What cost optimization tools does Snowflake offer?

Let’s start with Snowflake’s own recommendations for cost optimization. In addition to pushing the consumption-based aspect of their pricing model, Snowflake advertises what it calls Built-In Cost Management Capabilities. 

At a surface level, these all seem sophisticated and helpful. But when you dig a little deeper, the value they offer depends on your time and effort. 

The majority of Snowflake’s recommendations require individual users to implement certain best practices. Although Snowflake bills these as “capabilities,” in reality the burden of optimization falls on the user, not the platform. Still, some of these recommendations are worth considering, so we’ll lay some of them out here: 

  • Query optimization, either by reducing the number of queries or improving the queries themselves
  • Moving metadata queries off cloud services and onto a virtual warehouse (e.g. switching from INFORMATION_SCHEMA to ACCOUNT_USAGE; see the sketch after this list)
  • Using batch commands instead of single updates or single-row inserts, and clustering tables on customer_ID (also sketched below)
  • Restructuring Amazon Simple Storage Service (S3) buckets so that only the necessary target files are listed
  • Deploying a task to periodically update the destination table
  • Reviewing cloning patterns for sufficient granularity and eliminating unnecessary executions
  • Simplifying SQL queries to reduce query size, joins/Cartesian products, or large lists within the IN operator
  • Avoiding DML operations and ensuring files are optimally sized before loading
  • Controlling access to restrict warehouse usage and, by extension, any modifications
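
To make a few of these concrete, here’s a minimal sketch in Snowflake SQL. The orders table, its columns, and the time windows are placeholders of my own, not from Snowflake’s docs; the patterns themselves (INFORMATION_SCHEMA table functions bill to cloud services, ACCOUNT_USAGE views run on a virtual warehouse, multi-row inserts batch your DML, CLUSTER BY sets a clustering key) are documented Snowflake behavior:

```sql
-- Metadata lookup via the INFORMATION_SCHEMA table function:
-- this work is billed to cloud services compute
SELECT query_id, total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 100));

-- The equivalent ACCOUNT_USAGE view runs on a virtual warehouse instead
SELECT query_id, total_elapsed_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -1, CURRENT_TIMESTAMP());

-- Batch DML: one multi-row insert instead of many single-row inserts
INSERT INTO orders (order_id, customer_id)
VALUES (1, 101), (2, 102), (3, 103);

-- Clustering a large table on the column queries filter by most
ALTER TABLE orders CLUSTER BY (customer_id);
```

None of this is exotic. But notice that every one of these changes is something you have to find, write, and ship yourself.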

In terms of monitoring and controlling platform usage, Snowflake uses the Snowflake Performance Index (SPI) to measure improvements in performance over time. The SPI is calculated by taking a group of stable and comparable customer workloads—both in query volume and amount of data processed over a given period—and measuring the user’s system against these benchmarks. 

Where do these optimizations miss the mark?

Although reportedly effective at reducing costs, each of these Snowflake cost optimization strategies falls short. Sure, they all do what they say. But the advice is not specific enough to meet data engineers’ needs. Here’s why. 

Difficulty finding & using the tools

The Snowflake cost optimization tools I mentioned above are hard to find and use. What’s more, there’s no recommended process for how to implement them. For example: first measure bytes scanned, then look at your warehouse size, etc. 

It’s more like: here’s a bag of tools, and good luck. 

Getting value out of these tools requires a lot of trial and error. This turns Snowflake cost optimization into a full-time job in itself, trading one resourcing problem for another. 
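
As one illustration of that trial and error: even the “measure bytes scanned” step is something you have to assemble yourself from the ACCOUNT_USAGE views. A minimal sketch (the seven-day window is an arbitrary choice of mine):

```sql
-- A first, do-it-yourself step: bytes scanned and average runtime
-- per warehouse over the last 7 days, straight from ACCOUNT_USAGE
SELECT warehouse_name,
       warehouse_size,
       COUNT(*)                       AS query_count,
       SUM(bytes_scanned)             AS total_bytes_scanned,
       AVG(total_elapsed_time) / 1000 AS avg_elapsed_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, warehouse_size
ORDER BY total_bytes_scanned DESC;
```

And even then, you still have to decide what a “bad” number looks like and which warehouse knob to turn. Which brings us to the next problem.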

Recommendations, not action

Virtually every Snowflake cost optimization tool, both first- and third-party, offers recommendations, not action. A human being has to follow through on whatever recommendations they provide, or nothing happens. 

In a data cloud environment, this can cause a number of problems:

  • Dynamic workloads change constantly, so a recommendation that’s right today may be irrelevant tomorrow
  • Missed opportunities: cloud usage fluctuates second by second, and there are dozens or even hundreds of small optimizations that compound over time
  • Human error in implementing a recommendation can make things worse, not better

Peer benchmarking

The Snowflake Performance Index is lauded as a tool that can help reduce costs. But there are serious problems with relying on the SPI as a performance benchmark:

  • Snowflake chooses which customer workloads are “comparable” to your own—and they’re incentivized to keep you at your current tier or upgrade you to a higher one
  • You’re effectively measuring your performance against other non-optimized workloads, which means you’re not actually optimizing your performance
  • There’s no objective standard of what “good” looks like—you’re just comparing yourself to what everyone else is doing

Benchmarking against peers is a nice data point if you have the time, but it isn’t the same thing as optimization. 

Adding to, not subtracting from, your workload

If you have a large or under-utilized team, you can probably handle this. Same if your team is highly technical and knows the ins and outs of the Snowflake schema. 

What if you don’t fall into these categories? You need an approach that’s fast, efficient, and handles dynamic workloads. The good news: there’s a simple way to handle this. 

A better approach to Snowflake cost optimization 

So what’s the alternative? How can you actually optimize Snowflake costs in a way that’s efficient and doesn’t involve trading one resource problem for another? The answer, believe it or not, is contained within Snowflake itself. 

Because here’s a secret: Snowflake stores the metadata for every interaction, engagement, and adjustment. All of it is just sitting there in the Snowflake schema. 

Problem is: that’s a heck of a lot of data just sitting there. For a human to go in and review every single line in real time, much less take action on it, is impossible. That’s why there’s an entire industry of third-party Snowflake tools that process this data, observe “heavy hitter” users and applications, and report back their recommendations for adjusting data cloud parameters, especially size and suspend settings.
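
Those “heavy hitter” reports boil down to queries over this same metadata. A representative sketch, not any specific vendor’s implementation (the warehouse name and settings below are invented):

```sql
-- Credits consumed per warehouse over the last 30 days: the raw
-- material behind most third-party "heavy hitter" reports
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30d
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time > DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30d DESC;

-- ...and the kind of change those reports recommend, which a human
-- still has to run (warehouse name and values are placeholders)
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60;
```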

This is a good first step. But again, it requires someone to actually implement those recommendations. Remember what I said about dynamic workloads? Within minutes, the recommendations could be out of date. So if you want them to actually help, you have to take action instantly. 

This requires more than just fielding reports. It requires automated action. 

How to start automating your Snowflake cost optimization

To be truly effective, Snowflake cost optimization requires automation. It’s not something that a human can do well. But if you take the leap and hand Snowflake cost optimization over to a “robot,” you’ll see outsized results.
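
What might that look like mechanically? As a toy illustration only (this is not Keebo’s method; the task, warehouse names, and schedule are all made up), even Snowflake’s own task scheduler can apply a setting change without a human in the loop:

```sql
-- Toy example: a scheduled task that tightens AUTO_SUSPEND each
-- evening. Real automation reacts to live workload signals rather
-- than a fixed clock, but the "act without a human" idea is the same.
-- Task name, warehouse names, and schedule are invented placeholders.
CREATE OR REPLACE TASK tighten_suspend_nightly
  WAREHOUSE = admin_wh
  SCHEDULE = 'USING CRON 0 20 * * * UTC'
AS
  ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60;

ALTER TASK tighten_suspend_nightly RESUME;  -- new tasks start suspended
```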

If you want to learn more about what this automation process looks like, we’ll show you. Set up a demo with our team and we’ll walk you through Keebo’s automation solution.

Author

Will Dagley