Seeing Further Down the Visual Cloud Road

Almost three years ago, Carnegie Mellon University Prof. Dave Andersen and I announced the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS) at the 2016 NAB Show. Along with Prof. Kayvon Fatahalian at Stanford, Dave has led the center and collaborated closely with other academic and Intel Labs researchers to push the boundaries in visual cloud systems. We set out to study and find solutions for some of the key problems with gathering, storing and analyzing video data in large scale distributed environments. With the completion of the center now drawing near, it’s time to take stock of the results and to talk of work yet to be done.

The center set out to look at four primary visual cloud challenges:

  1. How can visual data processing and analytics be scaled out across many compute nodes?
  2. How can visual data be stored, managed and accessed in an integrated way?
  3. How can distributed sensors, edge nodes and cloud data centers efficiently collaborate on visual analytics workflows?
  4. How can visual information be queried to gain insights about the content itself?

The center’s approach has been to bring together systems researchers and computer vision, AI and graphics researchers to create prototype systems that allow investigation of these topics. The prototype systems have been integrated in an end-to-end reference architecture and exercised in an urban testbed in Pittsburgh where we’ve focused on the problems of smart city video applications. With this approach, we were able to understand more fully the challenges of real world applications and create, as a byproduct, code that others can use to continue the experiments. I’ve talked about many of these efforts in previous blogs. This blog is a summary of the accomplishments, the key learnings and some of the big challenges remaining.

What did we make?

First, the concrete stuff. We produced four open source systems platforms, one for each of the research vectors:

  • Scanner – to scale out video processing and analytics workloads across large numbers of compute nodes
  • The Visual Data Management System (VDMS) – to efficiently integrate the storage and retrieval of visual information and metadata associated with that visual data
  • The Streaming Analytics Framework (SAF) – to connect cameras, edge nodes and clouds together for distributed end-to-end video processing
  • Eureka – to enable the search for machine learning training data on data distributed at the edge

In addition, some other valuable systems were created, including Esper – an application framework for video analytics built on Scanner, OpenRTIST – a generative graphics application that shows the value of edge computing for user interactive visual experiences, and Rekall – a Python library for programmatically specifying complex events of interest in video.

We published many academic papers (listed below), including innovative publications in multitenant video processing at the edge and distributed neural network training. To make the work tangible, we demonstrated many use cases including 360 degree video creation, volumetric rendering, drone video analytics, video summarization, media dataset analysis, medical imaging, retail shopper tracking and traffic monitoring. For example, for CVPR 2019, Intel researcher Pablo Munoz led the creation of a 3D pose estimation demonstration using Scanner to improve visual quality for sports volumetric rendering. For IBC 2019, Intel researchers Chaunte Lacewell, Ragaad Altarawneh, Pablo Munoz and Luis Remis led the creation of a video summarization demonstration using Scanner, VDMS and GStreamer.

Two new startups have also been formed using the work of the center: ApertureData, focused on VDMS and founded by former Intel Labs researchers Vishakha Gupta and Luis Remis, and a new company focused on edge analytics, founded by CMU Prof. Dave Andersen and former Intel researcher Michael Kaminsky.

But, more importantly, what did we learn?

There have been many learnings from the center but here are my top seven:

  • Multitenancy and heterogeneity at the edge – Just as virtualized data centers and clouds needed to protect co-resident applications from noisy and nosy neighbors, shared edge networks will need to build in multitenancy. A blind pedestrian relying on real time intersection navigation mustn’t be impacted by a nearby group of augmented reality gamers sharing the same edge node. Video data stored at the edge for, say, law enforcement purposes must be inaccessible from applications that citizens use to check traffic at a specific intersection.

Similarly, with the higher costs of deploying resources to edges, it is an economic necessity to provide and share CPU, GPU, FPGA, network and storage across multiple edge tenants. An operator can’t afford to allocate, say, a full high-end GPU to a single user for the duration of an augmented reality game.

  • Bandwidth cost and capacity drive architecture – In the abstract, it’s tempting to say “just stream it all to the cloud”. In practice, real world connectivity limits that approach. Consumer and commercial Internet connections often have upload speeds much lower than download speeds, and cameras are mostly about upload. Even if massive upload is available, it comes at a price. Yet, distributed video analytics applications often require high resolution content to perform well. This means placing some analytics at the edge to intelligently determine what to send to the cloud. Balancing this edge compute cost against network cost and capacity will be a primary engineering challenge of edge infrastructure.
  • Video analytics aren’t the same as data analytics – The big data revolution includes the video analytics revolution but the tools, algorithms and skills needed to analyze more traditional data are not the same as those needed to work with visual data. Yes, both types of workloads make heavy use of multi-dimensional matrix multiplies and running a neural network on a pixel array bears a passing resemblance to, say, a security pricing algorithm. But, using Apache Spark or Pandas, designed for table, string and numeric data types, to build a computer vision application using video, image and pixel data types is the wrong approach for the job. A system like Scanner complements a system like Spark by treating visual data types as first class citizens.
  • Visual applications are multimodal – The flip side of the previous learning is that visual data rarely lives alone in an application. It is almost always joined with metadata associated with the visual data – metadata like date, time, duration, licenses or tags of objects in the video. It will often be joined with other time synchronized sensor data like audio, lidar, altitude, or GPS position. And, it may be analyzed in the context of non-synchronous but related information like traffic signal timings or previous 24 hour snowfall data at an intersection monitored by a traffic camera.
  • Edge to cloud workload and data distribution is an art, not a science – With the broad diversity of infrastructure capabilities and application needs, there are few easy methods and even fewer tools to guide application developers in dividing application execution and data between intelligent end devices, edge infrastructure and cloud data centers. Distribution is therefore highly dependent on developer knowledge, intuition and ease of development and that distribution tends to be static with little adjustment to varying conditions like network load or device capability.
  • Perpetual training with a human-in-the-loop is our near future – Much early artificial intelligence research has focused on algorithms and network design. However, as large scale system applications emerge, teaching them how to respond to novel and unanticipated situations – like construction of a new building or a new need to differentiate buses from trucks – becomes a key requirement. With today’s technologies, meeting these requirements often means retraining the AI solution for the new situation. And that usually requires a lengthy and labor intensive manual collection of an enhanced training data set. The cycle time from need to solution is gated by that manual effort.
  • There is no algorithm holy grail – The system designer should view the panoply of neural network, machine learning, computer vision, image processing and data analytics algorithms as a design palette. Algorithms are suited to different application requirements and available resources. An algorithm that is best suited for a power-constrained smartphone may be bettered by another algorithm when the application executes in the cloud. Algorithm innovation also continues unabated, so today’s best algorithm may be replaced by another tomorrow. And, using multiple algorithms together to, say, allow a low cost algorithm to triage content for a high cost algorithm can yield efficiency gains.
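
The triage idea in the last learning can be sketched as a two-stage cascade: a cheap check runs on every frame, and the expensive algorithm runs only on frames that pass it. This is a minimal illustration under stated assumptions, not code from Scanner or any other center project. The names cascade, cheap_score and expensive_label, the threshold value and the toy frame dictionaries are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Frame = Dict[str, object]  # hypothetical stand-in for a decoded video frame


def cascade(
    frames: Iterable[Frame],
    cheap_score: Callable[[Frame], float],
    expensive_label: Callable[[Frame], str],
    threshold: float,
) -> Tuple[List[Tuple[int, str]], int]:
    """Run expensive_label only on frames whose cheap score reaches threshold.

    Returns the (frame id, label) pairs and the number of expensive calls made.
    """
    results: List[Tuple[int, str]] = []
    expensive_calls = 0
    for frame in frames:
        # Stage 1: low cost filter applied to every frame.
        if cheap_score(frame) >= threshold:
            # Stage 2: high cost algorithm applied only to the survivors.
            expensive_calls += 1
            results.append((frame["id"], expensive_label(frame)))
    return results, expensive_calls


# Toy data: "motion" plays the role of a cheap frame-difference score and
# "object" plays the role of what a costly detector would report.
frames = [
    {"id": 0, "motion": 0.05, "object": "none"},
    {"id": 1, "motion": 0.80, "object": "bus"},
    {"id": 2, "motion": 0.10, "object": "none"},
    {"id": 3, "motion": 0.65, "object": "truck"},
]

results, calls = cascade(
    frames,
    cheap_score=lambda f: f["motion"],
    expensive_label=lambda f: f["object"],
    threshold=0.5,
)
print(results, calls)
```

In this toy run the expensive stage is invoked for two of the four frames. The saving in a real deployment depends on how selective the cheap stage is, and on how many relevant frames it wrongly discards.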

“The more you know, the more you know you don't know” ― Aristotle

After three years, we certainly know a lot more than we did at the beginning. But all the areas above are rich for further research. Look for a future blog that outlines Intel Labs’ ongoing research agenda for visual cloud systems. To foreshadow that agenda, you can be pretty sure it will concentrate on the above issues as well as on newer areas like applying AI techniques to media and graphics pipelines.

Conclusion

Intel Science and Technology Centers are intended to foster collaboration and the development of communities between Intel and academia, and the ISTC-VCS has been a model of that intent. We’d like to see that collaboration continue. Please reach out to any of the involved faculty for more information on their work. For more on Intel’s work in the center and our future research agenda in this domain, please contact Intel Fellow Ravishankar Iyer. See below for most of the public publications, presentations and code from the center.

A wrap up of the ISTC-VCS would not be complete without acknowledging our friend and colleague, Intel researcher Scott Hahn, who passed away suddenly in June 2018. Scott was instrumental in the formation of the center and led Intel’s involvement until his death. He is deeply missed and he would have been proud of all we accomplished.

References

ISTC-VCS Related Blogs

ISTC-VCS Related Publications, Presentations and Code

Scanner

Visual Data Management System

Edge to Cloud Video Analytics

Data Sets and Training for Video Analytics

Computer Vision and Machine Learning Algorithms
