Keebo | Query Routing Reimagined: Keebo’s Smarter Approach to Warehouse Optimization

Query Routing Reimagined: Keebo’s Smarter Approach to Warehouse Optimization

The case for decoupling your queries from their target warehouses

“Well, where are we going to run this?”

data engineers ask themselves whenever a new application needs access to their Snowflake environment. After projecting performance and cost implications, the answer is either:

  • Aggregation: Run the workload using an existing warehouse
  • Separation: Spin up a dedicated warehouse of a given size

Then, they adjust their application accordingly by adjusting their Looker / Mode / JDBC connection settings or issuing a USE WAREHOUSE command. 

Each approach has its own set of advantages and disadvantages:

Aggregation has the potential of being a more cost-effective approach. Since the warehouse draws from a larger set of queries, it can execute more of them in parallel, potentially delivering better value for your Snowflake spend. This may come at the cost of higher latency if queries start queuing due to overutilization. Furthermore, adding a new workload may result in a performance penalty due to additional pressure on shared resources (e.g., caches).

Separation makes it easy to track the cost incurred by the application and prevents other workloads from affecting its performance through cache contention or queuing. However, it also means that the warehouse may not be fully utilized: Whenever the warehouse is running fewer queries than its capacity, you are not getting the full value for the cost. Furthermore, frequent suspends and resumes incur costs due to provisioning time and cold caches.

None of the approaches is a silver bullet. We at Keebo believe that you shouldn’t have to pick between the two. Instead, we propose a new, more fine-grained approach:

Decoupling your application’s queries from their original target warehouse and route individual queries to suitable target warehouses.

Why haven’t we been doing this all along?

Assigning tasks to the most suitable from a pool of workers is a common pattern:

  • Video streaming and content delivery heavily relies on routing requests to the closest resource
  • Your micro services and APIs have load balancers in front of them to reduce request latency or handle outages
  • Your operating system schedules processes on CPU cores based on cache affinity and utilization, while optimizing for energy efficiency, fairness, or latency.

Given that Snowflake warehouses are nothing but compute resources that processes arbitrary SQL queries, it’s surprising that we haven’t been applying the same pattern all along. It warrants the question:

What is so special about applications running SQL queries that they get to decide on the target warehouse?

The answer is: Nothing really. But it’s hard to make a fine-grained assignment from warehouses to queries in the same way given a lack of

  • Ability: Snowflake provides you with no help at scheduling queries to different warehouses. You can add multiple replicas of a cluster to a warehouse – if you are an enterprise customer. But if you want to schedule queries across differently sized warehouses, you still have to rely on the application to choose the warehouse. If you are using Looker, Mode, or any other application that doesn’t let you inject your own code for changing the warehouse per query, you are out of luck entirely. 
  • Observability: Deciding on how to route the query is a hard problem. You need to have a very good understanding of your queries and warehouse to choose wisely. And you need to make these decisions in real-time.

You may have already guessed that Keebo has the solution for these problems. But let’s make things a little bit more tangible first.

Say, we treat the warehouse a query was originally going to run on as a friendly suggestion, but allow ourselves to route it to a different warehouse if we want to. What kinds of problems can we solve with this kind of approach?

We highlight this by introducing you to common query patterns that justify switching the target warehouse.

The Offenders: Blinker, Piggy-Backers, and Cloggers

There are a couple of query patterns we frequently see at customers that occur in aggregated or separated warehouses, that are especially well-suited for rerouting. Among them are:

Blinkers are single or a short burst of queries that wake up a warehouse, followed by a long period of inactivity. This leaves you paying for the autosuspend or the minimum warehouse usage time of a minute. Ouch: You are paying the startup cost for this warehouse, just to throw it all away again moments later, and then also paying for the time until the warehouse suspends. Blinkers appear frequently with interactive workloads.

unnamed 4
Query q is a Blinker as it resumes a warehouse that suspends right after.

Piggy-Backers are short-running queries that appear before or after long and resource-intensive queries. Because they rarely make full use of a large warehouse, it’s more cost-effective to route them to a smaller warehouse—especially when the original one is idle. Piggy-backers are common in batch workloads.

unnamed 1 1
Query 1, 2, 5, and 6 are Piggy-Backers as are relatively short queries accompanying the larger queries 3 and 4

Cloggers are queries that need a lot of warehouse resources, run for a long time, and cause other queries to wait disproportionately long. This pattern can occur when interactive short running queries are running together with batch type queries, or if interactive queries vary substantially in resource demand and execution time.

unnamed 2 1
Query 1 is a clogger as it blocks queries 3-6 from running together with query 2

Wouldn’t it be great if we could move the Blinkers and Piggy-Backers to an appropriately-sized warehouse that is already running? 

Wouldn’t it be great if we could move the Cloggers to a separate warehouse, so they don’t degrade the performance of other queries?

Of course it would! And Keebo Query Routing is exactly the tool you need for the job!

Keebo Query Routing Sends Your Queries Where They Belong

Keebo Query Routing (KQR) allows you to deal with offending query patterns – and lets you do far more than that. It is a service that acts as a router between your application and Snowflake. It gives you fine-grained control over your queries’ target warehouse by acting on routing rules: You decide where your queries run based on the original warehouse of a session, the state of your warehouses, query text, and various other factors provided in real-time. In that sense it gives you both the ability and the observability required to decouple your applications.

The setup is straightforward: Instead of pointing your applications database connection to your Snowflake host, you point it to the Keebo Query Routing endpoint. After making the change, you can reroute your application’s queries without touching a single line of application code.

unnamed 3 1

KQR rules are really powerful and cover a multitude of use cases.

  • Move particular queries to any active warehouse (e.g., Blinkers)
  • Move particular queries to another warehouse with a configuration (e.g., Piggy-Backers, Cloggers)
  • Automatically move queries to a different warehouse based on past performance (e.g., spillage, execution time)
  • Load balance queries across multiple warehouses – much appreciated by customers without access to Snowflake’s multi-cluster enterprise feature
  • Consolidate workloads to maximize warehouse utilization
  • A/B testing – Test whether a particular workload would be faster or cheaper on a different warehouse size / warehouse type / warehouse configuration
  • Disaster recovery: Temporarily route new queries to a different warehouse, e.g., SLAs are missed due to abnormal queueing on one warehouse

Our team is always happy to assist you in setting up rules for one of these use cases – or to help you with a completely new use case.

Detecting Offending Patterns: Keebo Warehouse Intelligence

Now that we have established the right tool to separate your queries from the warehouse they run on, you may ask yourself: How do I identify Blinkers, Piggy-Backers, Cloggers and other problematic patterns? That sounds like a lot of work.

Luckily, it is not! Keebo Warehouse Intelligence provides you with insights on your warehouse and query health – including the offending query patterns introduced earlier. Our powerful AI algorithms for pattern detection can help you translate these findings into rules for KQR.

Summary

Keebo’s query routing product allows you to decouple your application from the Snowflake warehouse it uses. It enables you to adjust target warehouses in real-time for cost-efficiency and performance. Keebo’s Warehouse Intelligence helps you identify issues in your Snowflake environment and turn them into action.

You may already enjoy Keebo’s Warehouse Optimization for reducing Snowflake costs. Keebo Warehouse Intelligence and Keebo Query Routing are the perfect companions to further monitor and optimize your Snowflake environment.

Interested in taking a look for yourself? Get a personalized demo to see firsthand how Keebo’s platform can help you get the most out of Snowflake.

Author

Martin Kiefer
Martin Kiefer
Articles: 1