7 Common Pitfalls in Snowflake Configuration & How to Avoid Them
The choices you make when setting up Snowflake are more important than you’d think. Critical mistakes during Snowflake configuration can lead to a number of inefficiencies. These, in turn, drive up costs and degrade performance.
Let’s look at some common pitfalls in Snowflake configuration and some steps your team can take to correct them.
The challenge in identifying Snowflake configuration errors
When optimizing Snowflake cost and performance, often the first place data teams look is the platform’s usage and workloads. This makes sense: Snowflake’s biggest cost center is warehouse compute resources, which are directly correlated to workload size and complexity.
But while workload monitoring is a critical component of Snowflake optimization, it has a major blind spot. Most Snowflake workloads start with a baseline of “normal” performance or spend. Spikes or dips are measured against that baseline.
You can see the challenge here. If your initial Snowflake configuration is rife with inefficiencies, you’ll end up with a suboptimal baseline. Typical workload monitoring methods won’t catch them.
So while ongoing workload and usage monitoring is important to optimize Snowflake, equally important is benchmarking your configuration against best practices to maximize cost savings and performance.
7 common pitfalls in Snowflake configuration
Most data teams make one or more of these seven mistakes when configuring their Snowflake accounts. Addressing them can help your Snowflake environment run more efficiently, drastically reduce un- or under-utilized resources, and bring your baseline costs down.
1. Improper warehouse sizing
The biggest drag on Snowflake cost is improper warehouse sizing. Getting your warehouse configuration right is a bit like searching for the Goldilocks Zone. Too big, and the warehouse will burn through too many credits for even the simplest queries. Too small, and your performance will slow down drastically. The analogy we use at Keebo is watermelons, apples, and cherries, where watermelons represent large queries, apples medium ones, and cherries small ones. Sure, you can run an apple or a cherry in a large warehouse, but that would be overkill.
And to make things nice and complicated, the optimal size of a given warehouse isn’t static. As workloads increase and decrease, the resources needed to service them fluctuate.
As such, optimizing Snowflake warehouses requires a solution that can rightsize warehouses in real time. Because these changes happen 24/7, it’s not feasible to rely on a human engineer. An AI-powered automation tool is the best solution.
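As a starting point, warehouse size and idle behavior can be adjusted directly in SQL. A minimal sketch, assuming a hypothetical warehouse named `ANALYTICS_WH`:

```sql
-- Downsize an over-provisioned warehouse and tighten idle behavior.
-- ANALYTICS_WH is a hypothetical warehouse name; substitute your own.
ALTER WAREHOUSE ANALYTICS_WH SET
  WAREHOUSE_SIZE = 'SMALL'  -- start small and scale up only if queries need it
  AUTO_SUSPEND = 60         -- suspend after 60 idle seconds to stop credit burn
  AUTO_RESUME = TRUE;       -- resume automatically when queries arrive
```

Resizing takes effect for new queries immediately, so this kind of adjustment is safe to make (or automate) while workloads are running.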
2. Under- and over-provisioning Snowflake credits
Snowflake’s pricing plan offers two options for purchasing credits: Capacity and On-Demand. Capacity allows you to pre-purchase a set number of credits for a reduced rate. On-Demand is a true pay-as-you-go model, but has a higher per-credit charge.
For those using Capacity plans, getting that amount right is critical. But, again, Snowflake workloads are dynamic. Unless you have a crystal ball in your back pocket, it’s impossible to know exactly what you’ll need over the next year.
For this reason, under- or over-provisioning Snowflake resources is common. If you under-provision, you’ll exceed your Capacity agreement and be charged a higher per-credit rate. If you over-provision, you’ll end up spending on credits you don’t need.
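One guardrail is a resource monitor sized to your Capacity commitment, so you’re warned before spilling into On-Demand rates. A sketch, assuming a hypothetical 1,000-credit monthly budget:

```sql
-- Track monthly credit consumption against a hypothetical Capacity budget.
CREATE RESOURCE MONITOR capacity_guard
  WITH CREDIT_QUOTA = 1000     -- monthly credits under your Capacity plan
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY    -- early warning as you approach the commitment
    ON 100 PERCENT DO NOTIFY;  -- at risk of paying On-Demand per-credit rates

-- Attach the monitor at the account level.
ALTER ACCOUNT SET RESOURCE_MONITOR = capacity_guard;
```

A `DO SUSPEND` trigger is also available, but notification-only triggers are the safer default if production workloads must never be interrupted.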
3. Distributed business logic
Most organizations using Snowflake struggle with distributed business logic. Often data teams are forced to set up Snowflake warehouses to accommodate scenarios that don’t necessarily follow the optimal warehouse logic. These can include:
- Service-level agreements that require certain resource availability and partitioning
- Client-specific warehouses to simplify billing, security, data management, and more
- Tool-specific warehouses (e.g. a dedicated warehouse for Looker vs. Tableau)
Often this logic doesn’t align with the ideal warehouse structure: small warehouses running smaller queries and large warehouses running larger queries. As a result, queries run on inefficiently sized warehouses, leading to suboptimal spend or performance.
Managing distributed business logic manually is highly complex at scale. An automated query routing solution can route queries from overutilized warehouses to underutilized ones, helping you get the most out of your existing resources without increasing costs or degrading performance.
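To spot rerouting candidates yourself, queueing time per warehouse is a useful signal. A rough sketch against Snowflake’s `ACCOUNT_USAGE` share (view and column names are real; the 7-day window is an arbitrary choice):

```sql
-- Warehouses with high average queueing are overloaded; warehouses near zero
-- queueing (but with nontrivial query volume) may have headroom for rerouting.
SELECT
  warehouse_name,
  COUNT(*)                         AS queries,
  AVG(queued_overload_time) / 1000 AS avg_queued_seconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name
ORDER BY avg_queued_seconds DESC;
```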
4. Inadequate Snowflake monitoring
When configuring your Snowflake account, it’s critical to set up a system for monitoring metrics in real time. Without this, sticker shock when your Snowflake bill comes due is practically guaranteed.
Snowflake monitoring can help not only track your usage and workloads, but also alert you if your baseline performance degrades or costs increase over time. The sooner you set this up, the more visibility you’ll have into platform usage.
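A simple cost baseline can be built from the `WAREHOUSE_METERING_HISTORY` view, for example daily credits per warehouse over the last month (the 30-day window is an illustrative choice):

```sql
-- Daily credit consumption per warehouse: a baseline to watch for drift.
SELECT
  DATE_TRUNC('day', start_time) AS usage_day,
  warehouse_name,
  SUM(credits_used)             AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY usage_day, warehouse_name
ORDER BY usage_day, credits DESC;
```

Feeding a query like this into a dashboard or alerting tool gives you the per-warehouse baseline that spike detection depends on.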
5. Poor data management & hygiene
Although most Snowflake costs come from warehouse compute resources, Snowflake also charges a flat rate per terabyte (TB) for loading and storing database tables. If your data has been poorly managed in the past, the odds are high that you’re importing bad data into Snowflake.
Say, for instance, that 10% of your tables consist of duplicate data. If you don’t dedupe before uploading that data to Snowflake, your storage costs will be 10% higher than necessary. Good data management and hygiene before, during, and after Snowflake configuration can help you unlock additional savings.
What’s more, cleaning your data also reduces the number of rows that a SQL query has to scan. This can help your queries run more efficiently and reduce latency.
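Deduplication itself is straightforward in Snowflake SQL using `QUALIFY`. A sketch with hypothetical table and column names (`customers_raw`, `id`, `loaded_at`); substitute your own key and timestamp:

```sql
-- Keep only the most recent row per business key.
-- customers_raw, id, and loaded_at are hypothetical names.
CREATE OR REPLACE TABLE customers_clean AS
SELECT *
FROM customers_raw
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY id           -- the business key defining a "duplicate"
  ORDER BY loaded_at DESC   -- keep the newest copy
) = 1;
```

When every column must match for rows to count as duplicates, `SELECT DISTINCT *` is an even simpler alternative.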
6. Excessive data transfer
Snowflake does not charge to bring data into your account. However, transferring data from a Snowflake account to another cloud platform or a different Snowflake region does cost you. If your team is undisciplined about moving data around, you could end up with an unexpected charge on your bill.
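The `DATA_TRANSFER_HISTORY` view shows where egress is going, so surprises are easy to catch early (the 30-day window is an illustrative choice):

```sql
-- Bytes moved out of the account in the last 30 days, by destination.
SELECT
  target_cloud,
  target_region,
  SUM(bytes_transferred) / POWER(1024, 3) AS gb_transferred
FROM snowflake.account_usage.data_transfer_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY target_cloud, target_region
ORDER BY gb_transferred DESC;
```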
7. Non-warehouse compute resources
If you want to find some additional savings on the margins, consider implementing guardrails for engineers and analysts using serverless features and Cloud Services compute. For example, excessive use of Search Optimization or Snowpipe can accumulate unnecessary costs over time.
Specifically, Snowflake only bills for Cloud Services (authentication, metadata management, API calls, access control, etc.) when the resources consumed exceed 10% of daily warehouse compute usage. If you’re monitoring warehouse usage, it should be easy to calculate 10% of your baseline and implement restrictions accordingly.
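That 10% threshold can be checked directly against the `METERING_DAILY_HISTORY` view. A sketch of the calculation:

```sql
-- Compare daily Cloud Services credits against the 10% free allowance.
SELECT
  usage_date,
  SUM(credits_used_compute)        AS compute_credits,
  SUM(credits_used_cloud_services) AS cloud_services_credits,
  SUM(credits_used_compute) * 0.10 AS free_allowance,
  GREATEST(
    SUM(credits_used_cloud_services)
      - SUM(credits_used_compute) * 0.10,
    0)                             AS billable_cloud_services
FROM snowflake.account_usage.metering_daily_history
GROUP BY usage_date
ORDER BY usage_date DESC;
```

Days where `billable_cloud_services` is above zero are the ones actually adding Cloud Services charges to your bill.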
Final thoughts on optimizing Snowflake configuration
Optimizing your Snowflake configuration can present some low-hanging cost and performance opportunities. It’s easier if you handle this when you first set up Snowflake, but you can always make changes no matter how long you’ve been using the platform.
As we mentioned earlier, Snowflake’s optimal configuration isn’t static. It varies based on your query workload. To ensure the best performance and lowest costs, you should consider an AI-powered optimization tool to make the necessary changes in real time.
Learn more about Keebo’s automated Snowflake optimization tools here.