What Is Data Learning And Why Is It Important?

The top three technology disruptors of the past two decades are mobile devices, social media, and cloud computing. They have transformed how we communicate and interact with digital content. At the center of these trends is a massive and growing amount of data. Big data is changing how businesses analyze and interpret information.
Enterprises rushed to collect massive amounts of data across marketing, manufacturing, and customer interactions.
Organizations soon realized, however, that amassing Big Data and investing in Hadoop or Spark clusters wasn’t enough to stay ahead. Executives learned that data is only valuable if the insight it provides is actionable.
There are essentially two main obstacles here: humans and tools.
1. Why Humans Are A Bottleneck
Despite the hype around machine learning, the best decisions still come from domain experts, yet those experts often struggle with data and models. Data scientists, meanwhile, frequently lack domain expertise, and top talent of either kind is scarce and in high demand. Many companies rely on data engineering or BI teams to handle complex data tasks and deliver insights, but finding people who combine data skills with domain expertise remains a major business bottleneck.
2. Why Data Processing Tools Are A Bottleneck
Despite progress in Big Data tools, many organizations still struggle to analyze large datasets efficiently.
In conversations with executives and data scientists, I have noticed three ways data tools become bottlenecks: they hinder actionable insights when they are too slow, too manual, or too costly. More specifically:
Too Slow
The promise of real-time analytics is overstated. Answering complex questions often demands considerable patience: just as waiting for a slow website is frustrating, so is waiting on a data-heavy tool. Much of this comes down to physical and hardware limitations.
Even the fastest data warehouse can’t join hundreds of millions of rows instantly; it may take 20 seconds to 20 minutes.
Delays matter. Humans have short attention spans and tend to abandon a line of inquiry when an interaction takes more than a few seconds. Waiting minutes for a dashboard to load discourages deeper queries and exploration, and a reluctance to dig past high-level metrics leads to missed opportunities or superficial conclusions.
Too Manual
Most tools assume you already know the right question to ask, yet you often need to see the answer before you can formulate that question.
BI tools track KPIs, but important trends can emerge without immediately affecting those KPIs. Such trends may go unnoticed until they finally move a KPI, by which point significant revenue may already be lost. And even when a KPI does change, separating signal from noise requires manually inspecting many possible causes.
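To make this concrete, here is a minimal sketch (not any particular vendor’s implementation, and with invented names like `find_anomalies`) of how a model could scan many metric slices automatically and flag the ones that deviate from their own history, instead of waiting for a human to notice a KPI shift:

```python
import statistics

def find_anomalies(metrics, threshold=3.0):
    """Flag metric slices whose latest value deviates sharply from history.

    `metrics` maps a slice name (e.g. "signups:mobile:DE") to a list of
    daily values, oldest first; the last entry is "today". A slice is
    flagged when today's z-score against its own history exceeds the
    threshold.
    """
    flagged = []
    for name, history in metrics.items():
        past, latest = history[:-1], history[-1]
        mean = statistics.mean(past)
        stdev = statistics.stdev(past)
        if stdev == 0:  # a perfectly flat history gives no scale to compare against
            continue
        z = (latest - mean) / stdev
        if abs(z) >= threshold:
            flagged.append((name, round(z, 1)))
    return flagged

# A mobile signup collapse is caught even if the company-wide signup KPI
# (mobile + web combined) has not yet moved enough to trip a dashboard alert.
flagged = find_anomalies({
    "signups:mobile": [100, 102, 98, 101, 99, 40],
    "signups:web":    [200, 205, 198, 202, 201, 203],
})
```

Even this toy version scales to thousands of slices with no manual inspection; the point is that the machine does the exhaustive scanning, and the human only reviews what gets flagged.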
Too Costly
Teams are often forced to choose between speed and cost. Spending more money may seem like the easy way out, but it is rarely economical.
That’s where most hosted analytics vendors make their fortunes: by offering a variety of tiered solutions to help you burn through your cash. A cloud analytics vendor once told me that customers accept slow queries because they know paying more will make them faster, and the vendor helpfully provides budget alerts.
Once notified that they have exceeded their budget, teams often fall back on yesterday’s data instead of querying the latest. It is not that they don’t want fresh data; they simply don’t want to spend what is left of their budget on data warehousing.
The Shift To Data Learning
You can’t replace domain experts with fully automated machine learning. You need the experts to extract insights, and you need tools that help them do more with less.
Data learning adds a layer on top of the data stack that uses machine learning to optimize how data is accessed and analyzed. I call it “data learning” because it lets humans and computers each focus on what they do best.
The Goal Of Data Learning

Data learning improves three stages of analysis:
- Detection: Models can capture the underlying distribution of the data and come to understand it better than any single individual in the organization. They can then be the first to detect when something is out of the ordinary: identifying new trends, spotting data quality issues, or uncovering unexpected outliers, and surfacing these findings to a domain expert who decides whether further investigation is warranted.
- Analysis: When a new investigation begins, models accelerate analytics by reducing large-scale data scans, time, and compute costs. This idea applies to both approximate and exact query processing, and it is all about “recycling compute cycles”: once a computation is performed, it can be reused by future queries that share all or part of that computation. Doing this correctly, at scale, is far beyond what humans can manage manually, but a principled, automated inference process can determine when and how to reuse previous computations most efficiently.
- Explanation: Once the analysis is done, models help experts quickly narrow their focus to a few promising directions, so they can test concrete hypotheses instead of conducting a tedious, aimless search.
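As a toy illustration of “recycling compute cycles” (this is not Keebo’s implementation, and `ComputeCache` is an invented name), here is the simplest possible form of reuse: a cache that returns a prior result when a new query normalizes to the same computation. Real systems reason about partial overlap and incremental maintenance, far beyond this exact-match sketch:

```python
import hashlib

class ComputeCache:
    """Reuse a prior computation when a new query reduces to the same work.

    Exact-match reuse only: two queries hit the same entry when their
    normalized text is identical. Production systems also exploit partial
    overlap (shared joins, sub-aggregates), which this sketch does not.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, sql):
        # Normalize whitespace and case so trivially different query
        # texts map to the same cached computation.
        canonical = " ".join(sql.lower().split())
        return hashlib.sha256(canonical.encode()).hexdigest()

    def run(self, sql, execute):
        """Return a cached result, or call `execute(sql)` and cache it."""
        key = self._key(sql)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._store[key] = execute(sql)
        return result
```

Used with an expensive `execute` callback (a warehouse round trip, say), the second submission of an equivalent query returns instantly from the cache; the interesting engineering, which the bullet above alludes to, is deciding automatically when reuse is safe and worthwhile.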
Three Ways Data Learning Lets Data Engineers Innovate More and Maintain Less
- Reduce time spent optimizing performance: Data learning automates performance optimization, sparing engineers repeated tuning work as they build and deploy data products. The time freed up can go toward building additional innovative data assets.
- Reduce time spent maintaining data products: Data learning not only optimizes the engineering effort required for new data products, queries, and models; it also automatically upholds performance SLAs for data assets already in production. This “learn and refine” approach cuts the time spent meeting SLAs and leaves more room for innovation.
- Enable analysts to handle more fine-tuning: As a data engineer, your goal is to deliver data products that meet the majority of your organization’s needs. Business analysts, however, often need to fine-tune those assets for their own unique requirements. Self-service visualization tools help, but they lack an engineer’s ability to optimize performance. Data learning fills this gap by optimizing query performance for analysts automatically, eliminating yet another data engineering workload and freeing still more time for innovation.
This idea started as academic research but gained traction among startups and Fortune 100 firms, eventually evolving into Keebo.
Keebo Data Learning
At Keebo, our platform learns from data to deliver faster queries without code changes. Keebo is designed to integrate seamlessly with any BI tool or data warehouse platform.
In the past 18 months, we have seen the clear impact of data learning on our customers: faster analytics, lower costs, automated routine tasks, and better user experiences.
Data learning is still early, but its impact is already clear. Like machine learning before it, data learning will transform how enterprises access, visualize, and analyze data.
