Big Data is often defined as data in volume, velocity, and variety that exceeds the capabilities of traditional systems to store and process. That’s catchy, but it doesn’t really capture the watershed opportunity Big Data offers businesses, and it’s not very useful if you’re trying to understand what Big Data analytics means to your business and your IT organization. Big Data is enabling—and demanding—fundamental changes in business intelligence (BI), analytics, and the IT infrastructure to support them. So I’d like to talk for a bit about that change, what’s required to achieve it, and what Intel is doing to enable it.
Where are we now?
To highlight the watershed, let’s first consider BI and analytics as most organizations understand and practice them. The data we are used to dealing with is structured business data, organized into the neat rows and columns of relational database systems. Analytics is mostly used to answer questions about what happened in the past (descriptive analytics) and to find insight into why it happened (diagnostic analytics). We rely on business managers to apply that information to make better decisions about what to do in the future.
What’s different with Big Data?
While that’s useful, the question business managers really need us to answer is “What will happen in the future if I take a certain action?” (predictive analytics). Or, even more useful, “What action alternatives are available to me, and what would be the likely outcome of each one?” (prescriptive analytics). Predictive and prescriptive analytics can leverage new sources of data, much of it unstructured data like text, audio, or even images; new ways to store and analyze it; and new ways to deliver the resulting insight to the point where the decision is made. Further, prescriptive analytics incorporates a feedback loop that captures the action taken and correlates it with actual results to improve the analytics model itself.
To make prescriptive analytics most effective, we want to embed the analytic model directly into business applications, so it occurs in real time and delivers the results directly to the decision point at exactly the instant it’s needed. A Web site would offer a visitor products and promotions determined in real time based on analysis of their past buying habits and current market trends. And a customer service rep in the call center would be presented not only with the account information and history of a caller but also with an assessment of the current sentiment and recommended approaches for handling the call.
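To make the call-center scenario concrete, here is a minimal sketch of what embedding a prescriptive model at the decision point might look like. Everything here is hypothetical and simplified: the function names, the keyword-based sentiment stand-in, and the decision rules are illustrative only, not an actual product or API.

```python
# Hypothetical sketch: a pretrained prescriptive model embedded directly in a
# business application, scoring each interaction in real time.

def score_sentiment(transcript_so_far: str) -> float:
    """Toy stand-in for a real sentiment model: returns -1.0 (angry) to 1.0 (happy)."""
    text = transcript_so_far.lower()
    negative = sum(text.count(w) for w in ("cancel", "refund", "angry"))
    positive = sum(text.count(w) for w in ("thanks", "great", "happy"))
    total = negative + positive
    return 0.0 if total == 0 else (positive - negative) / total

def recommend_action(account_history: dict, transcript_so_far: str) -> dict:
    """Combine account context with live sentiment to prescribe a next action."""
    sentiment = score_sentiment(transcript_so_far)
    if sentiment < -0.5 and account_history.get("tenure_years", 0) > 2:
        action = "offer retention discount"
    elif sentiment < -0.5:
        action = "escalate to supervisor"
    else:
        action = "proceed with standard script"
    return {"sentiment": sentiment, "action": action}

# The call-center app would invoke this on each turn of the conversation:
result = recommend_action({"tenure_years": 5}, "I want to cancel, I'm angry about the refund")
print(result["action"])  # → offer retention discount
```

The point of the sketch is the placement, not the model: the scoring call sits inside the application’s request path, so the recommendation arrives at the exact instant the rep (or web page) needs it.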
What do we need to get there?
Achieving predictive and prescriptive analytics—integrated into business processes and applications—requires data and processing tasks to be distributed across clusters of systems to speed access and processing. And that is the approach taken by modern Big Data analytics systems like Apache Hadoop and Spark.
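To illustrate the distribution idea, here is a single-machine sketch of the map/shuffle/reduce pattern that systems like Hadoop and Spark run across whole clusters. The “partitions” below stand in for data slices that would live on different nodes; the function names are illustrative, not any framework’s real API.

```python
# Minimal sketch of the map/shuffle/reduce pattern behind Hadoop and Spark.
from collections import defaultdict

def map_phase(partition):
    """Each node emits (word, 1) pairs for its slice of the data."""
    return [(word, 1) for line in partition for word in line.split()]

def shuffle(mapped_pairs):
    """The framework groups pairs by key across all nodes."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Each reducer aggregates the values for its keys."""
    return {key: sum(values) for key, values in grouped.items()}

# Two "partitions" as they might live on two different cluster nodes:
partitions = [["big data big insight"], ["big value"]]
mapped = [pair for p in partitions for pair in map_phase(p)]
counts = reduce_phase(shuffle(mapped))
print(counts["big"])  # → 3
```

In a real deployment the map and reduce calls run in parallel on the nodes that already hold the data, which is what lets these systems scale access and processing together.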
Making the leap to predictive and prescriptive analytics requires three things:
- Advances in computing power that enable real-time, in-memory processing of vast amounts of data
- Tools that let data scientists develop analytic models using machine-learning techniques that can extract value from unstructured data
- Frameworks that enable software developers to easily incorporate data science models in the applications they develop
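As a toy example of the second point, extracting value from unstructured text, here is a deliberately tiny bag-of-words classifier in plain Python. It is not tied to any Intel tool; real data scientists would use a proper machine-learning library, but the shape is the same: train a model on labeled text, then hand the trained model to developers to embed.

```python
# Illustrative sketch: a bag-of-words text classifier built from the stdlib.
from collections import Counter

def train(labeled_texts):
    """Build one word-frequency profile per label from labeled examples."""
    centroids = {}
    for label, text in labeled_texts:
        centroids.setdefault(label, Counter()).update(text.lower().split())
    return centroids

def predict(centroids, text):
    """Score a new document by word overlap with each label's profile."""
    words = text.lower().split()
    scores = {label: sum(profile[w] for w in words)
              for label, profile in centroids.items()}
    return max(scores, key=scores.get)

model = train([
    ("complaint", "late delivery broken item refund"),
    ("praise", "fast delivery great service thanks"),
])
print(predict(model, "item arrived broken want a refund"))  # → complaint
```

The trained `model` is just data, which is exactly why the third requirement matters: a framework lets developers drop that artifact into an application and call `predict` at runtime without re-implementing the data science.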
How is Intel helping?
At Intel we’re working to enable predictive and prescriptive analytics on multiple fronts. First, we provide the computing performance needed to process massive amounts of unstructured data, in real time, in memory. Servers based on the Intel Xeon processor E7 v4 provide 24 cores and 48 threads per socket and support up to 24 TB of memory in an 8-socket system, allowing massive datasets to be held entirely in memory rather than on hard drives, accelerating time to insight and decision-making.
But we know it takes more than just horsepower, so we’re working to provide the tools and frameworks data scientists and software developers need to make prescriptive analytics a reality in their business. We developed the Trusted Analytics Platform (TAP), an open source software project that lets data scientists easily publish data sources, analytical pipelines, and applications, so they can focus more on the analytics and leave the coding to Java programmers. Most recently, we expanded TAP with Gearpump, a framework that adds key data ingestion capabilities to enable dynamic workflows with low latency and high availability.
Since TAP and Gearpump are open source, there’s a growing community of data scientists and developers building and enhancing the platform and sharing tools and solutions. (To learn more about TAP, check out my earlier blog.)
What does it mean for my organization?
As your organization crosses the Big Data divide, what should your business and your IT organization expect? Your enterprise data warehouse won’t go away; you still need to know what happened in the past. But your data center will grow in scope and capability as it expands into clusters of servers harboring a growing amount of data from many sources. Data scientists will develop prescriptive analytics models and collaborate with software developers to embed them in business applications, so insight occurs in real time and flows automatically to the point where it must be applied. Computing hardware will become increasingly powerful to support these new capabilities.
And to ensure that happens smoothly, we will continue to innovate, developing technology and enabling ecosystems and making them available via open source and through the vendors you choose. So even though we’re dealing with a disruptive change, some things aren’t changing at all.