A Platform that Brings Developers and Data Scientists Together

For developers working with big data, a common challenge is that data is not readily available in the form the application needs. This barrier can lead to laborious exchanges between data scientists and app developers and lengthy delays in the development process.

Let’s simplify this story with a metaphor. To make a first-class pasta dish from scratch, you first take your raw materials and make the pasta in the form you want (linguine, fettuccine, spaghetti), and then get down to the business of cooking the pasta and sauce and seasoning everything just right. A logical culinary process would have one expert turn the raw ingredients into pasta that a chef can then use to make the finished dish.

In the case of big data and analytics-based solutions, the raw material is data. A data scientist processes it, and it is then stored and served in a form that a developer can use to create an application that generates business value. In this endeavor, it helps greatly when data scientists and app developers can work in the same “kitchen”: a common data platform for application development.

That’s the Trusted Analytics Platform, or TAP. This open source project, initiated by Intel, invites data scientists and developers into the same workspace, so data is ingested, processed, and served up in the environment the developer will ultimately deploy to. TAP allows data scientists and developers to work more collaboratively, which is one of the keys to turning big data into business value in less time. It introduces a new, more mutually beneficial workflow that simplifies and accelerates the creation of secure, high-performance big data analytics applications in cloud environments.

TAP includes the necessary tools, algorithms, and engines to make it easier for developers to collaborate with data scientists in a shared environment to conduct advanced analytics. With this toolset, data scientists and developers have the capabilities they need to ingest data from multiple sources and in disparate formats, analyze that data, and design analytic models. They can then make this processed data easily consumable by business applications, as shown in this top-level view from trustedanalytics.org:

[Figure: top-level view of TAP, from trustedanalytics.org]

To drill down a little bit deeper, TAP includes open source software with hardware-enhanced performance and security features. The platform provides an end-to-end solution that spans four key layers of the IT stack:

  • An infrastructure support layer that allows TAP to deploy easily locally or to the cloud, with out-of-the-box container support and a marketplace of the most popular data storage and related services
  • A data layer that includes (or leverages an existing deployment of) Apache Hadoop, Spark, and other data components optimized for performance and security
  • An analytics layer that features data science tools to simplify model development and an extensible framework to add your own algorithms and services
  • An application layer that includes a managed runtime environment for cloud-native apps
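
To make the division of labor across the data, analytics, and application layers more concrete, here is a minimal sketch in plain Python. It does not use TAP or any of its APIs; the function names (ingest, build_model, serve), the sample data, and the "model" (a simple per-sensor average) are all hypothetical and exist only to illustrate the ingest, analyze, and serve flow described above.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical sample inputs standing in for two sources in different formats.
CSV_SOURCE = "sensor,reading\na,10.0\nb,14.0\n"
JSON_SOURCE = '[{"sensor": "a", "reading": 12.0}, {"sensor": "b", "reading": 18.0}]'


def ingest(csv_text, json_text):
    """Data layer (illustrative): normalize both sources into one record list."""
    records = [
        {"sensor": row["sensor"], "reading": float(row["reading"])}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    records.extend(
        {"sensor": item["sensor"], "reading": float(item["reading"])}
        for item in json.loads(json_text)
    )
    return records


def build_model(records):
    """Analytics layer (illustrative): a toy model, the mean reading per sensor."""
    by_sensor = {}
    for record in records:
        by_sensor.setdefault(record["sensor"], []).append(record["reading"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def serve(model):
    """Application layer (illustrative): return JSON that an app could consume."""
    return json.dumps(model, sort_keys=True)


if __name__ == "__main__":
    # prints {"a": 11.0, "b": 16.0}
    print(serve(build_model(ingest(CSV_SOURCE, JSON_SOURCE))))
```

In this sketch the data scientist would own ingest and build_model, while the developer only needs the output of serve. A shared platform aims to put both halves in one environment so that handoff does not require a separate exchange.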

As a longtime developer and platform evangelist turned developer advocate for big data solutions, I’m excited about TAP for many reasons. As I’ve begun engaging our growing community, the biggest eye-opener has been that TAP is an open source platform, complete with support for every popular language, framework, and service for creating cloud-native apps that build on the skills and value data scientists contribute to the solution.

Ultimately, TAP helps ensure that developers and data scientists can work together in a common kitchen to accelerate time to market for the analytics-based solutions that will increasingly be a key to business success.

For a closer look at the capabilities of TAP, visit trustedanalytics.org. To get started with TAP or explore its code and documentation, visit http://trustedanalytics.github.io

If you’re attending Strata+Hadoop World on March 29-31, Intel and TAP have planned many events around the conference where anyone interested in learning more about TAP can meet, network, and get hands-on experience. A schedule and summary of what’s happening in San Jose can be found at http://trustedanalytics.org/?p=316.