
Why is Cloud Over-Provisioning Such a Common Problem?

Virtually everyone in the cloud struggles with over-provisioning. Which, if you think about it, is odd, because the whole point of moving to the cloud was to control resource usage.

Turns out, there’s a shadow side to infinite scalability. Yes, you can scale down. But you can also scale up—and it’s easy for things to get out of control. 

For Snowflake users, this challenge comes up most often during pricing and contract conversations. Figuring out exactly how many Snowflake resources you need is a balancing act, especially if you buy discounted credits through Capacity pricing. You don’t want to overprovision and waste dollars on credits you never use, but you also don’t want to underprovision and incur extra expenses when you exceed your cap.

But the problem is even more complex. You may want to take advantage of a pricing incentive on credits you may or may not need in order to get the additional features and benefits of a higher Snowflake tier, such as the security functionality that comes with Business Critical.

Suffice it to say, there are a number of incentives for Snowflake users to overprovision. Yet overprovisioning means buying resources you don’t actually need, with dollars that could be better spent elsewhere. Whether we’re talking $10K or $1M, that’s a significant savings opportunity.

Let’s walk through why companies are prone to cloud over-provisioning, why it’s hurting your ability to grow and scale your applications, and what steps you can take to overcome this problem. 

Why is it so easy to overprovision in the cloud?

The whole point of the cloud is the resource flexibility it offers. Instead of buying and maintaining physical servers, you pay for just the resources you need. So costs should, theoretically, stay within your control.

In reality, however, cloud costs frequently get out of control, and that first bill often comes with serious sticker shock. 

So why the disconnect? How do you get from “pay as you go” to “we’ve blown our budget”? There are typically three reasons why. 

Cloud application pricing structures

As I mentioned earlier, nearly every cloud platform (whether Snowflake, AWS, Azure, GCP, or the like) has an intentionally intricate pricing structure. Even something as simple as a “pay as you go” model offers different rates depending on which tier you buy.

And that’s only at the base level. When the end of the month or quarter rolls around, it’s not uncommon for salespeople to get creative with incentives so they can meet their quotas, which makes these pricing scenarios even more complex.

I’ll give a few examples from Snowflake, but keep in mind that these are only scratching the surface of how complex it can get:

  • Pricing based on three types of resources: data storage, data transfer, and compute
  • Four pricing tiers based on desired features: Standard, Enterprise, Business Critical, VPS
  • On-Demand (true pay-as-you-go) vs. Capacity (a prepaid amount of credits, which translate into compute resources)

These variables combine into a wide range of permutations, each with its own pros and cons. For the average data engineer, comparing the various offerings is rarely apples to apples, which makes it hard to choose the most cost-effective option.
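To make that trade-off concrete, here’s a minimal sketch in Python of how the On-Demand vs. Capacity decision plays out under uncertain usage. The per-credit rates and credit volumes below are hypothetical, purely for illustration; real rates depend on your tier, region, cloud provider, and negotiated contract.

```python
# Hypothetical illustration of the Capacity vs. On-Demand trade-off.
# The per-credit rates below are made up for this example; real rates
# depend on tier, region, cloud provider, and negotiated contract.

ON_DEMAND_RATE = 3.00   # $ per credit, hypothetical on-demand rate
CAPACITY_RATE = 2.50    # $ per credit, hypothetical discounted capacity rate

def annual_cost(actual_credits: float, prepaid_credits: float) -> float:
    """Cost when you prepay `prepaid_credits` at the capacity rate and
    pay on-demand for anything above that. Unused prepaid credits are
    still paid for, which is the over-provisioning waste."""
    prepaid_cost = prepaid_credits * CAPACITY_RATE
    overage = max(0.0, actual_credits - prepaid_credits)
    return prepaid_cost + overage * ON_DEMAND_RATE

# Suppose you expect ~100,000 credits but actual usage could swing +/- 30%.
for actual in (70_000, 100_000, 130_000):
    print(f"actual={actual:>7}  "
          f"prepay 100k: ${annual_cost(actual, 100_000):>10,.0f}  "
          f"prepay 130k: ${annual_cost(actual, 130_000):>10,.0f}")
```

The pattern is typical: prepaying for your worst-case usage protects you from overage charges but guarantees waste in the average case, which is exactly the over-provisioning dynamic this article is about.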

Expectations vs. reality

Even if you can work out all the permutations of Snowflake pricing options, you almost certainly don’t have all the information you need to make the best decision. Expectations for the cloud rarely align with reality, simply because you can’t anticipate the number of users, peak demand, query structures, and other variables that play into a pay-as-you-go model.

This can go one of two ways: either you overestimate your usage and overprovision, or you underestimate it and jeopardize your system’s performance. Faced with that choice, most cloud buyers play it safe and overprovision.

Lack of cost optimization tools

Every cloud platform has high-end, heavy-hitting users who make excessive demands on the platform. There are usually only a handful of them, but they account for a significant share of the workload. Most cloud buyers anticipate this reality and buy more resources than they need.

Often these buyers are unaware that there are significant actions they can take to reduce the impact of these heavy hitters, and thus reduce their overall cloud resource needs. What’s more, deploying those same optimization tools and tactics to save here and there with average users also changes the picture of what you actually need.

Basically, cloud buyers rarely figure cost optimization into their pricing calculations, which means, in the end, they overprovision.

How to overcome the pain of cloud over-provisioning

The most obvious problem with cloud overprovisioning is cost: you pay for more than you actually need. That means dollars are going to the cloud that could be better spent elsewhere: building new data pipelines, managing your database, maintaining overall system security, and so on. In other words, there’s an opportunity cost to overprovisioning.

So how do you avoid this problem? Here are four critical steps.

1. Proactively monitor cloud usage

You can’t optimize what you aren’t measuring. Before you start thinking about how to reduce your cloud resource consumption, monitor your actual usage to establish a baseline. Then make sure you’re accounting for outliers and spikes.
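If you’re on Snowflake, one place to pull that baseline from is the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view. Below is a rough sketch using snowflake-connector-python; the connection parameters are placeholders, and the “more than twice the 30-day average” spike rule is an arbitrary starting point, not a recommendation.

```python
# Minimal sketch: baseline daily credit usage per warehouse and flag spikes.
# Assumes snowflake-connector-python and a role with ACCOUNT_USAGE access.
from collections import defaultdict
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",      # placeholder
    user="your_user",            # placeholder
    password="your_password",    # placeholder
    role="ACCOUNTADMIN",
)

DAILY_CREDITS_SQL = """
    SELECT warehouse_name,
           DATE_TRUNC('day', start_time) AS usage_day,
           SUM(credits_used)             AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY 1, 2
"""

rows = conn.cursor().execute(DAILY_CREDITS_SQL).fetchall()

# Group daily credit totals by warehouse.
by_wh = defaultdict(list)
for wh, day, credits in rows:
    by_wh[wh].append((day, float(credits)))

# Flag days that exceed twice a warehouse's 30-day average as spikes
# (an arbitrary rule of thumb for this sketch).
for wh, days in by_wh.items():
    avg = sum(c for _, c in days) / len(days)
    spikes = [(d, c) for d, c in days if c > 2 * avg]
    print(f"{wh}: baseline {avg:.1f} credits/day, {len(spikes)} spike day(s)")
```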

To start monitoring your cloud usage for free, download Snowflake Workload Intelligence from the Marketplace today. 

2. Identify heavy-hitting users and queries

The most obvious place to start with cost optimization is your heavy-hitting users and queries. These outliers offer some low-hanging fruit for reducing your compute needs (a sketch for finding them follows the list below):

  • Streamline query writing to use the fewest possible resources
  • Improve querying effectiveness to reduce failure rates
  • Route high-compute queries to underutilized warehouses
  • Temporarily increase warehouse size to reduce query execution time
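One way to surface those heavy hitters on Snowflake is to aggregate SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY by user and warehouse. Here’s a rough sketch, again assuming snowflake-connector-python with placeholder credentials; the 7-day window and top-10 cutoff are arbitrary choices.

```python
# Rough sketch: rank users/warehouses by total query elapsed time over 7 days.
# Assumes snowflake-connector-python and ACCOUNT_USAGE access.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    role="ACCOUNTADMIN",
)

TOP_USERS_SQL = """
    SELECT user_name,
           warehouse_name,
           COUNT(*)                                      AS query_count,
           SUM(total_elapsed_time) / 1000 / 3600         AS total_hours,
           SUM(IFF(execution_status <> 'SUCCESS', 1, 0)) AS failures
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY total_hours DESC
    LIMIT 10
"""

# Print the heaviest user/warehouse combinations with their failure counts.
for user, wh, count, hours, failures in conn.cursor().execute(TOP_USERS_SQL):
    print(f"{user} on {wh}: {count} queries, {hours:.1f} elapsed-hours, "
          f"{failures} failures")
```

Once you know who and what is driving the load, the tactics above become much easier to target.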

3. Optimize costs in real time

Then you can go to the next step: build automated systems to identify areas where you have cost saving potential and take action as quickly as possible. For example, if you have a warehouse experiencing a 22-minute spike in query volume, you can temporarily increase that warehouse size for those 22 minutes, then bring it back down once you go back to baseline—rather than keep the warehouse at a larger size just to handle a few short-term spikes. 
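Here’s a deliberately simplified sketch of that idea: poll recent queueing from Snowflake’s INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY table function and resize the warehouse when queries start backing up. The warehouse name, sizes, queueing threshold, and polling interval are all hypothetical, and a production system would need to handle errors, permissions, and edge cases far more carefully; this is only meant to illustrate the mechanic.

```python
# Simplified sketch: temporarily resize a warehouse during queueing spikes,
# then shrink it again once load returns to normal.
# Assumes snowflake-connector-python; all names and thresholds are hypothetical.
import time
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    role="ACCOUNTADMIN",
)
cur = conn.cursor()

WAREHOUSE = "ANALYTICS_WH"            # hypothetical warehouse name
BASE_SIZE, SPIKE_SIZE = "SMALL", "LARGE"

# Average queued load over the last 5 minutes for this warehouse.
QUEUE_SQL = f"""
    SELECT AVG(avg_queued_load)
    FROM TABLE(information_schema.warehouse_load_history(
        DATE_RANGE_START => DATEADD('minute', -5, CURRENT_TIMESTAMP()),
        WAREHOUSE_NAME   => '{WAREHOUSE}'))
"""

current = BASE_SIZE
while True:
    queued = cur.execute(QUEUE_SQL).fetchone()[0] or 0.0
    # Scale up while queries are queueing (0.5 is an arbitrary threshold);
    # scale back down once the spike passes, instead of leaving the
    # warehouse permanently oversized.
    target = SPIKE_SIZE if queued > 0.5 else BASE_SIZE
    if target != current:
        cur.execute(f"ALTER WAREHOUSE {WAREHOUSE} SET WAREHOUSE_SIZE = '{target}'")
        current = target
    time.sleep(60)
```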

4. Measure your cost savings

Then you’ll want to turn around and make sure you’re not spending more than you’re saving. This can especially be a problem if you’re deploying engineers to make adjustments manually, rather than using an automation tool.

But measuring the impact of your cost reduction efforts is easier said than done. Like I said, there are a bunch of variables to consider, many of which are outside your control. So you’ll need to carefully tease out which changes came from your efforts and which didn’t.
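As a first pass, you can at least compare average daily credit burn before and after a change using WAREHOUSE_METERING_HISTORY. The cutover date below is hypothetical, and a raw before/after comparison ignores workload growth and seasonality, so treat the output as a starting point rather than a final savings number.

```python
# First-pass savings check: average daily credits 30 days before vs. 30 days
# after an optimization change. Assumes snowflake-connector-python;
# the cutover date is hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    role="ACCOUNTADMIN",
)

CUTOVER = "2024-06-01"   # hypothetical date the optimization went live

BEFORE_AFTER_SQL = f"""
    SELECT IFF(start_time < '{CUTOVER}', 'before', 'after') AS period,
           SUM(credits_used) / COUNT(DISTINCT DATE_TRUNC('day', start_time))
               AS avg_credits_per_day
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, '{CUTOVER}'::timestamp)
      AND start_time <  DATEADD('day',  30, '{CUTOVER}'::timestamp)
    GROUP BY 1
"""

for period, avg_credits in conn.cursor().execute(BEFORE_AFTER_SQL):
    print(f"{period}: {avg_credits:.1f} credits/day on average")
```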

Why cost optimization should be handled by a robot, not a human

Given the ongoing and continuous nature of Snowflake optimization, the question I often ask organizations is: are constant adjustments the best use of your data engineers’ time? 

Because optimization isn’t something you can just hand off to an intern. The data is too complex and the tools require too much in-depth knowledge for an entry-level employee to handle. 

But you’re also not paying a data engineer to spend their time tweaking and adjusting Snowflake. Optimization, while important, can’t become their full-time focus. They’re too valuable for that. 

Read on to learn why Snowflake optimization is best handled by a robot, not an individual human engineer. 
