Architecture trade-offs for enterprise analytics

Intel Corporation explores the questions decision-makers need to ask when thinking about an architecture for enterprise analytics.

Given the wealth of technology options now available, deciding which analytics ‘stack’ to adopt involves a series of architectural trade-offs. In our experience of building analytics systems and working with customers to help deliver on their ambitions, we have learned that the most important questions concern where to store and process data, what kinds of databases to use, and how to make sure the right people have access.

On the first question, we are all faced with the reality that data can come from literally anywhere. As we outline in our guide, Five Steps to Delivering the Data-Driven Business, a first step is to understand these information flows. Once this is done, infrastructure teams must decide whether to take and process feeds of data from the source, or whether to pull data into a managed repository such as a data lake, from where it can be processed.

Each option has pluses and minuses. Taking data directly from the source minimizes the need for storage infrastructure but increases the load on the network. Meanwhile, data lakes can result in storing information that will never be needed, potentially slowing down analysis and using up valuable storage capacity.

For larger enterprises, the solution often consists of multiple repositories, each designed to meet the needs of particular types and sources of data. For example, a public cloud repository may be the most appropriate option for collating large volumes of external data before it is accessed for analytics; some data may be more straightforward to access directly from the source; and other data (particularly transactional data) can be harvested into a central store accessible on the internal network.
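As a rough illustration of the multi-repository approach, the Python sketch below assigns each data source to a repository based on two attributes. The source names, the attributes and the repository labels are hypothetical, chosen only to mirror the examples in the paragraph above; real placement rules would depend on the organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    external: bool       # originates outside the organization
    transactional: bool  # produced by internal transactional systems


def choose_repository(source: DataSource) -> str:
    """Return a hypothetical repository label for a data source."""
    if source.external:
        # External data is collated in a public cloud store before analysis.
        return "public-cloud-repository"
    if source.transactional:
        # Transactional data is harvested into a central internal store.
        return "central-internal-store"
    # Anything else is read directly from where it is produced.
    return "direct-from-source"


# Illustrative sources only; none of these names come from a real system.
sources = [
    DataSource("social-media-feed", external=True, transactional=False),
    DataSource("order-system", external=False, transactional=True),
    DataSource("plant-sensor-gateway", external=False, transactional=False),
]

placement = {s.name: choose_repository(s) for s in sources}
```

Keeping the rule in one small function makes the placement policy explicit and easy to revisit as new sources appear.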

Having decided where the data should live, attention can turn to how it should be managed. The choice of database stack has implications, not just in terms of the type, quantity and formatting of data being stored but also how it will be used. For example:

  • In-memory analytics such as SAP HANA* or SAS solutions can reduce query times from hours to minutes, for data that needs to be analyzed quickly
  • Open source tools such as Hadoop* or NoSQL* approaches enable fast analysis of trends and hypothesis testing
  • Cloud-based streaming and data management architectures, including no-ops models such as AWS Lambda*, enable data-driven workflows, preprocessing and cleaning to be completed quickly

Deciding on the tool for the job requires balancing business criteria such as specificity, timeliness, value and accuracy of results against data-related criteria such as volumes, velocity and variability. Cost is also an inevitable factor, meaning that traditional database management tools and hardware will always have a place.
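One simple way to make this balancing act explicit is a weighted scoring matrix. The Python sketch below is an assumption-laden illustration, not a recommended method: the criteria names, the weights and the 1 to 5 scores are placeholder values invented for the example, not measurements of any product.

```python
# Hypothetical weights for each criterion; integers that sum to 10.
WEIGHTS = {
    "timeliness": 3,
    "accuracy": 3,
    "volume_fit": 2,
    "cost": 2,  # a higher score means lower cost
}

# Placeholder scores (1 = poor, 5 = excellent) for three generic options.
CANDIDATES = {
    "in-memory": {"timeliness": 5, "accuracy": 4, "volume_fit": 2, "cost": 1},
    "distributed-batch": {"timeliness": 2, "accuracy": 4, "volume_fit": 5, "cost": 3},
    "traditional-rdbms": {"timeliness": 3, "accuracy": 4, "volume_fit": 2, "cost": 5},
}


def weighted_score(scores: dict, weights: dict) -> int:
    """Sum each criterion score multiplied by its weight."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank(candidates: dict, weights: dict) -> list:
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


ranking = rank(CANDIDATES, WEIGHTS)
```

With these placeholder numbers the traditional option ranks first because cost carries weight; changing the weights to favor timeliness would reorder the list, which is the point of writing the trade-off down.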

While the goal is to use best-of-breed tools, it is important to standardize on a set of technologies that will meet most of your needs. The alternative, a free-for-all, leads to management overhead as well as risk due to increased complexity and skills requirements.

The third question is about ensuring data is directly accessible by those who need it, while being inaccessible to others. This is where data architecture and information security architecture meet, invoking questions such as whether and how to secure the perimeter, how to manage identities and roles, what data to encrypt, how to enable mobile data access and so on.

Some answers to these questions will depend on understanding the importance of the data to your organization. Some insights may be confidential, as they give an organization competitive advantage; whereas other data can be treated as ‘open’ and accessible to third parties and/or the public at large.
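The link between roles and data importance can be sketched as a check of a role's clearance against a dataset's classification. In the Python example below, the role names and the intermediate INTERNAL level are assumptions added for illustration; the text itself only distinguishes confidential from open data.

```python
from enum import IntEnum


class Classification(IntEnum):
    OPEN = 0          # may be shared with third parties or the public
    INTERNAL = 1      # assumed middle tier, not named in the article
    CONFIDENTIAL = 2  # gives the organization competitive advantage


# Hypothetical mapping from role to the highest classification it may read.
ROLE_CLEARANCE = {
    "public": Classification.OPEN,
    "analyst": Classification.INTERNAL,
    "strategy-lead": Classification.CONFIDENTIAL,
}


def can_read(role: str, classification: Classification) -> bool:
    """Return True if the role's clearance covers the data classification."""
    # Unknown roles are limited to OPEN data, so access is denied by default
    # for anything more sensitive.
    clearance = ROLE_CLEARANCE.get(role, Classification.OPEN)
    return classification <= clearance
```

For example, `can_read("analyst", Classification.CONFIDENTIAL)` returns `False`, while `can_read("strategy-lead", Classification.CONFIDENTIAL)` returns `True`. A production system would rely on its identity and access management platform rather than an in-code table like this one.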

Ultimately, deciding on the right analytics architecture requires a series of trade-offs in terms of where data should exist, how it should be stored and processed, and how it is secured. As each trade-off can have an impact on the business value the data can bring, it is important to involve business decision makers in these trade-offs.

However technical an analytics architecture decision may appear, it should always be tied back to delivering on a business goal. In this way, the architecture can be a source of increasing value, as the organization becomes attuned to how it can benefit from more advanced forms of analytics, such as deep learning, to drive its business goals forward.

Learn more about how advanced analytics can help you transform your business, and what you can do to make it happen, by reading the Five Steps to Delivering the Data-Driven Business white paper from Intel.

Find more information on data-driven insights and advanced analytics on our Turn Data Into Insight website.