Visual Data: Pack Rat to Explorer

In recent years, I’ve had the opportunity to visit several Hollywood studios and production houses, and I’ve been very impressed by their ability to manage the thousands of media assets generated during the making of a film or TV show. Keeping track of all the valuable shots, takes, audio tracks, and edited clips distributed among directors, editors, and others across what can be a multiyear process is extremely complex, and the data piles up: under current studio digital cinema practices, a one-minute clip can use 5 GB or more of storage. Each asset can also contain valuable information about where the shot was filmed, who was in it, what they said and did, and what the lighting and coloring looked like. Modern asset management systems help the studios manage this overload.

But I was also struck to discover that many, if not most, of the assets are discarded when production is complete. The rationale is valid: storing these assets is costly, and studios lack the tools and processes to make use of them over the long term. In my mind, this housecleaning is a lost opportunity to use archived assets to create new content, to improve the quality of experience for existing content, or simply to learn from and share the techniques different directors have used and evolved over time.

While studios are an extreme case of this video data management challenge, other environments like video surveillance, corporate communications and marketing, and medical imaging face similar issues with archiving, storing, and efficiently accessing large volumes of video and image data (aka “visual data”) and the corresponding text information, such as video location, dialog transcription, or the actors in the video (aka “metadata”). This challenge exists at two levels: the asset/workflow management layer and the visual data management layer. The top layer deals with user tasks and the information that the user interacts with. The bottom layer deals with the actual visual data and metadata in the system.

The Visual Data Management System

Here at Intel Labs, working with the Intel Science and Technology Center for Visual Cloud Systems (VCS ISTC), we’re focused on the visual data management layer. The VCS ISTC was launched in 2016 as a collaborative effort between Intel, Carnegie Mellon University (CMU), and Stanford University to research systems issues in large-scale deployments of visual computing solutions. The Visual Data Management System (VDMS) project, led by Vishakha Gupta-Cledat (now at ApertureData*) and Luis Remis, Christina Strong, Phillip Lantz, Ragaad Altarawneh, and Pablo Munoz from Intel Labs, is applying modern data management approaches to create a system designed with visual data as a first principle.

Historically, the visual data management layer has been implemented using general-purpose data management technologies: visual data is stored in a file or object store, and metadata is loaded into an RDBMS or NoSQL database. This approach has been expedient, but it takes no advantage of the inherent characteristics of visual data.
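To make that conventional split concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in for the RDBMS. The schema, table names, and file paths are hypothetical and exist only for illustration. Metadata lives in relational tables, the pixels live in a separate file store that the tables reference by path, and even a simple question needs multi-table joins.

```python
import sqlite3

# Hypothetical schema: metadata in relational tables, pixels elsewhere
# (a file or object store), referenced only by path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient   (id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE treatment (patient_id INTEGER, drug TEXT);
CREATE TABLE scan      (id INTEGER PRIMARY KEY, patient_id INTEGER, path TEXT);
""")
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, 80), (2, 62), (3, 77)])
conn.executemany("INSERT INTO treatment VALUES (?, ?)",
                 [(1, "Temodar"), (2, "Temodar"), (3, "Other")])
conn.executemany("INSERT INTO scan VALUES (?, ?, ?)",
                 [(10, 1, "/store/scans/10.png"), (11, 1, "/store/scans/11.png"),
                  (12, 2, "/store/scans/12.png"), (13, 3, "/store/scans/13.png")])

# Answering "scans of patients over 75 treated with Temodar" needs a
# join across all three tables.
rows = conn.execute("""
SELECT s.path FROM scan s
JOIN patient p   ON p.id = s.patient_id
JOIN treatment t ON t.patient_id = p.id
WHERE p.age > 75 AND t.drug = 'Temodar'
ORDER BY s.id
""").fetchall()
paths = [r[0] for r in rows]

# The database returns only paths; the application must still fetch each
# file and apply any resizing or thresholding itself.
print(paths)
```

The database hands back only file references. Retrieving the pixels and transforming them remain separate steps that the application has to coordinate.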

VDMS is re-architecting this stack around the following premises:

  • Performance and maintainability can be improved by designing for the characteristics and uses of visual data.
    • Graph databases are better suited to the use patterns of visual metadata.
    • Array databases are well suited to manipulating visual data.
  • Unified, brokered access to visual data and metadata allows efficient loading, querying, and retrieval of related data.
  • Frequently used visual processing primitives (e.g., resizing, cropping, rotating, flipping) should be optimized operators in a visual data management system.
  • The high cost of running machine learning processing (e.g., object detection) on visual data means that intermediate feature vectors should be stored as another visual data type.
  • The availability of low-cost persistent memory such as Intel® Optane™ DC persistent memory makes it efficient to do substantial metadata management in memory.
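The premise about built-in primitives is easier to picture with the operations written out. The pure-Python functions below sketch threshold, resize, crop, rotate, and flip on a tiny grayscale image stored as a list of rows. They are not VDMS code. The function names are hypothetical, and the thresholding rule used here (pixels above the cutoff become 255, all others 0) is one common definition that may not match VDMS’s own operator.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of 0-255 pixel values


def threshold(img: Image, value: int) -> Image:
    """Binary threshold: pixels above `value` become 255, others 0."""
    return [[255 if p > value else 0 for p in row] for row in img]


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Nearest-neighbor resize to width x height."""
    src_h, src_w = len(img), len(img[0])
    return [[img[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def crop(img: Image, x: int, y: int, width: int, height: int) -> Image:
    """Keep the width x height region whose top-left corner is (x, y)."""
    return [row[x:x + width] for row in img[y:y + height]]


def rotate_cw(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


img: Image = [
    [10, 200, 30, 250],
    [220, 40, 180, 60],
    [90, 210, 20, 240],
    [230, 50, 170, 70],
]
print(threshold(img, 128))
print(resize_nearest(img, 2, 2))
print(flip_horizontal(crop(img, 1, 1, 2, 2)))
```

Because each operation takes an image and returns a smaller or equal-sized one, a system that applies them next to the data can return only the transformed result. It does not need to ship the full original to the client first.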

These principles are embodied in the VDMS high-level architecture shown at right. Let’s quickly look at its components.

The Request Server is the interface between VDMS and external systems, either asset and workflow systems or general data analytics platforms. It lets those systems launch integrated queries over both the visual data and the metadata. So, in a medical imaging example, the query might be:

“Retrieve all brain scans, plus a thresholded and resized copy of each scan, for people over 75 who received chemotherapy with the drug Temodar*. Retrieve bounding boxes for regions that match a reference tumor image.”

The Request Server receives the query via a simple JSON-based API, parses it, and coordinates retrieval and consolidation of the results from the Persistent Memory Graph Database and the Visual Compute Library.
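Below is a rough sketch of how part of that medical query could be expressed as JSON. It covers only the patient filter and the thresholded, resized copies, not the bounding-box match. The command layout is modeled on the style used in the VDMS documentation: a list of commands such as FindEntity and FindImage, with a reference that links one command to the next, and image operations listed on the retrieval. Treat every field name, the entity class, and the property names (age, drug) as assumptions to check against the release you are using.

```python
import json

# Hypothetical transaction: first find matching patients, then the images
# linked to them, with operations applied server-side at query time.
query = [
    {"FindEntity": {
        "_ref": 1,                      # label this result so later commands can link to it
        "class": "Patient",             # assumed entity class name
        "constraints": {
            "age": [">", 75],           # assumed property names
            "drug": ["==", "Temodar"],
        },
    }},
    {"FindImage": {
        "link": {"ref": 1},             # only images connected to the patients above
        "operations": [
            {"type": "threshold", "value": 128},
            {"type": "resize", "width": 256, "height": 256},
        ],
    }},
]

payload = json.dumps(query, indent=2)
print(payload)
```

The point of the design is that one request carries both the metadata filter and the pixel operations. The Request Server can then resolve the first command in PMGD and the second in VCL without a client round trip in between. The call that actually sends the payload depends on the client library, so it is not shown here.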

The Persistent Memory Graph Database (PMGD) handles the metadata, finding the specific metadata and image identifiers by patient, age, and treatment protocol. Because PMGD is a graph database, its schema is highly flexible and extensible. As new metadata properties become available from the data scientist (e.g., a new drug treatment protocol) or are derived from the visual data (e.g., the growth in tumor size since the last scan), they can easily be added to the PMGD schema. A graph database also makes retrieval more efficient for the way visual data is typically used. Visual metadata often has complex relationships that are efficient to query by graph traversal but slow on an RDBMS because of the need for complex joins across tables. PMGD is also designed from the ground up to take advantage of persistent memory like Intel® Optane™ DC persistent memory. This architecture allows the metadata to grow very large while retaining nearly the performance of an in-memory database.
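For contrast with a join-based approach, here is a minimal, hypothetical property graph built from plain Python dictionaries. It is not PMGD’s API. Nodes carry a class and free-form properties, and relationships are adjacency lists, so answering the patient/treatment/scan question is a matter of following edges outward from each patient. It also shows a new derived property being attached without any schema change.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical property graph: each node has a class plus free-form properties.
nodes: Dict[str, dict] = {
    "p1": {"class": "Patient", "age": 80},
    "p2": {"class": "Patient", "age": 62},
    "t1": {"class": "Treatment", "drug": "Temodar"},
    "t2": {"class": "Treatment", "drug": "Temodar"},
    "s1": {"class": "Scan", "path": "scan_001.png"},
    "s2": {"class": "Scan", "path": "scan_002.png"},
    "s3": {"class": "Scan", "path": "scan_003.png"},
}
# Adjacency lists: following a relationship is a direct lookup, not a join.
edges: Dict[str, List[str]] = defaultdict(list)


def connect(a: str, b: str) -> None:
    """Add an undirected edge between two node ids."""
    edges[a].append(b)
    edges[b].append(a)


for a, b in [("p1", "t1"), ("p2", "t2"), ("p1", "s1"), ("p1", "s2"), ("p2", "s3")]:
    connect(a, b)


def neighbors(node_id: str, cls: str) -> List[str]:
    """Ids of adjacent nodes belonging to the given class."""
    return [n for n in edges[node_id] if nodes[n]["class"] == cls]


def scans_for(min_age: int, drug: str) -> List[str]:
    """Scan paths for patients older than min_age who were treated with drug."""
    result: List[str] = []
    # A real graph database would use an index here rather than scanning all nodes.
    for pid, props in nodes.items():
        if props["class"] != "Patient" or props["age"] <= min_age:
            continue
        if any(nodes[t]["drug"] == drug for t in neighbors(pid, "Treatment")):
            result.extend(nodes[s]["path"] for s in neighbors(pid, "Scan"))
    return result


# New derived metadata (illustrative value) is attached with no schema migration.
nodes["s2"]["tumor_growth_mm"] = 3.2

print(scans_for(75, "Temodar"))
```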

The Visual Compute Library (VCL) is the interface to the visual data. Currently, visual data takes three forms: losslessly compressed image data and image feature vectors, both stored in TileDB*, a leading array database; and encoded video/image files stored in a traditional file or object store. TileDB is particularly effective for image and feature data because it supports very efficient compression and retrieval of matrix-oriented data. Uncompressed video, however, produces so much data that storing it is prohibitive in any storage system, so video files are stored in compressed form (e.g., H.264). Greater native support for compressed video and for more pixel data types, such as motion vectors and depth maps, is expected in the future. VCL also supports a number of common visual data processing operations that can be applied at query time. These include image/video processing, such as thresholding, resizing, and cropping, and analytics processing, such as k-nearest neighbor search on feature vectors. Feature vectors can also be stored and accessed using FAISS through VDMS. While VDMS is not intended to be a general-purpose visual data processing system, we do expect to add operations over time, and the platform is extensible so that users can add their own.
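To show what a k-nearest neighbor operator computes, the snippet below does exact, brute-force search by Euclidean (L2) distance over a few made-up feature vectors. It is not VDMS or FAISS code, and the vector ids and values are hypothetical. Libraries like FAISS exist to make this same search fast over millions of vectors, including with approximate indexes that trade a little accuracy for speed.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: Sequence[float], vectors: Dict[str, List[float]],
        k: int) -> List[Tuple[str, float]]:
    """Exact k-nearest neighbors: score every vector, keep the k closest."""
    scored = ((vid, l2(query, vec)) for vid, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


# Hypothetical 2-D feature vectors keyed by image id (real ones are much longer).
vectors = {
    "scan_a": [0.0, 0.0],
    "scan_b": [1.0, 1.0],
    "scan_c": [0.2, 0.1],
    "scan_d": [5.0, 5.0],
}
result = knn([0.1, 0.1], vectors, k=2)
print(result)
```

Brute force costs one distance computation per stored vector for every query. That cost is why storing feature vectors once and searching them with an indexed library is preferable to recomputing and scanning them on demand.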

The VDMS team has benchmarked VDMS against a baseline system built from MemSQL* Server, an Apache* web file server, and OpenCV for visual data operations, using queries similar to the medical imaging query above. Both systems ran on the same Intel® Xeon® Gold 6140 CPU-based platform. The team found that for complex queries, the current implementation of VDMS was more than 2X faster than the baseline system¹. Further testing of additional query types is underway. We are also in the middle of implementing a seamless integration with Scanner, another VCS ISTC system that I talked about in a previous blog. And ApertureData*, the first startup based on VDMS, has launched, led by CEO Vishakha Gupta-Cledat.

VDMS is available now on GitHub and is ready for use. It’s being used in a number of exploratory projects within Intel and elsewhere. Release 2.0.0 was announced at the beginning of February. It adds a number of fixes and enhancements, including support for video files, feature vectors, and classification operators. Docker* images for easy deployment are also available. Please visit us on GitHub and give us your feedback.


¹ Performance results are based on testing as of October 2018 and may not reflect all publicly available security updates. Results are summarized in "VDMS: Efficient Big-Visual-Data Access for Machine Learning Workloads."