At NABShow 2017, I had the pleasure of announcing the launch of the Intel Science and Technology Center for Visual Cloud Systems (VCS ISTC) with Carnegie Mellon University computer science professor Dave Andersen. In the last year and a half, much progress has been made in the center and I’ll highlight the work of this great team in my blog over the next few months.
Scanner: Large Scale Offline Video Data Processing
The ability to process different types of “Big Data” has led to many new data analytics applications and new analytics platforms like Apache Hadoop* and Spark*. While these platforms scale well for many applications, they were not designed for the requirements of video data.
To achieve throughput and efficiency, large video processing systems must scale across many compute nodes and enormous video datasets that are often stored in compressed format. These systems must support highly optimized video processing algorithms and enable execution on heterogeneous hardware platforms that accelerate those algorithms. Video datasets will soon reach exabyte sizes and running an image processing pipeline on that much data could take years on a single node system—even with the best hardware and algorithms. Addressing this scale of video processing will be critical to visual cloud applications like camera-based smart city traffic planning, virtual reality TV broadcasts, and video data mining in healthcare and sociology.
In the VCS ISTC, the “Scanner” project begins to address this challenge. Scanner's intent is to enable high productivity and high efficiency video processing on widely available cloud computing resources. Stanford professors Kayvon Fatahalian and Pat Hanrahan and their students, Alex Poms and Will Crichton, unveiled Scanner at SIGGRAPH 2018 and in the August 2018 ACM Transactions on Graphics.
Scanner is able to take a video analytics task—for example, analyzing the contents of a large archive of TV news broadcasts—and scale it across thousands of cloud compute nodes. By scaling out, the time to complete a task like this can be reduced from days to minutes, and, by using cloud-based preemptible instances, the costs can be kept within budget.
Scanner is already being used for projects at Stanford, University of Washington, Google, Facebook, and Intel. Facebook and Stanford, for example, are collaborating on using Scanner in the Surround 360 VR stitching pipeline. This application requires simultaneously accessing 14 input video streams and scheduling up to 44 computation graph operations. They ran a Scanner version of Surround 360 on a one minute sequence (28GB, 25k total frames) across 256 Intel® Xeon® processor cores on Google Cloud Platform.
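To give a rough feel for the frame-level parallelism involved, here is a back-of-the-envelope sketch (my own illustration, not code from the Scanner project) of splitting that one-minute sequence's 25,000 frames into contiguous ranges, one per core:

```python
def partition(n_frames, n_workers):
    # Split [0, n_frames) into near-equal contiguous ranges, one per worker.
    # Contiguous ranges matter for compressed video: each worker can decode
    # its own span independently, starting from a nearby keyframe.
    base, extra = divmod(n_frames, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# e.g., the 25,000-frame sequence across 256 worker cores:
# each core handles only 97 or 98 frames.
ranges = partition(25_000, 256)
```

A real scheduler also has to handle stragglers and preempted instances, but the basic win is this division of work: each node touches a tiny slice of the dataset.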
The Stanford team, the Brown Institute for Media Innovation, and the Stanford Journalism Program are using Scanner to analyze nearly a decade of 24/7 news video (200,000 hours) to understand the characteristics of the people shown on news broadcasts (for example, their gender, what they look like, and who they appear on screen with) and what contributors from different demographics are asked to talk about. They are asking questions such as: how much screen time do female panelists get compared with male panelists?
Researchers at the University of Washington are beginning to use Scanner to help scale the creation of content for their “Soccer on Your Tabletop” work. I have personally used Scanner to benchmark the costs and performance of video processing tasks on various types of cloud infrastructure.
One of the beauties of Scanner is its reuse of existing visual computing and cloud technologies. Its innovations are in scaling video workloads out across heterogeneous nodes and in providing a Python* API for building scalable video processing pipelines. The API lets users express a complete pipeline, for example one that draws bounding boxes around faces in a collection of videos, in a few lines of code.
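To illustrate the structure of such a pipeline, here is a minimal, self-contained Python sketch of the same decode-detect-draw shape over dummy frames. All names here (`decode`, `detect_faces`, `draw_boxes`, `BBox`) are hypothetical stand-ins, not the actual scannerpy API; in Scanner, each op would run on an optimized backend and the per-frame graph would be scheduled across many workers.

```python
# Illustrative sketch only: mimics the *shape* of a Scanner-style pipeline
# (a graph of ops mapped over video frames), not the real scannerpy API.
from dataclasses import dataclass
from typing import List

@dataclass
class BBox:
    x: int
    y: int
    w: int
    h: int

def decode(frame_id: int) -> dict:
    # Stand-in for hardware-accelerated decode of one compressed frame.
    return {"id": frame_id, "pixels": None}

def detect_faces(frame: dict) -> List[BBox]:
    # Stand-in for a DNN face detector (Scanner can call out to
    # frameworks like Caffe or OpenCV for this step).
    return [BBox(10, 10, 32, 32)]

def draw_boxes(frame: dict, boxes: List[BBox]) -> dict:
    # Stand-in for rasterizing the bounding boxes onto the frame.
    return {"id": frame["id"], "boxes": boxes}

def run_pipeline(frame_ids) -> list:
    # Scanner would schedule this per-frame graph across thousands of
    # workers and accelerators; here it simply runs serially.
    results = []
    for fid in frame_ids:
        frame = decode(fid)
        results.append(draw_boxes(frame, detect_faces(frame)))
    return results

results = run_pipeline(range(4))
```

The point of the sketch is the programming model: the user declares a small graph of per-frame operations, and the system, not the user, handles decode, data movement, and distribution.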
An Open Environment for Video Processing
To minimize reinvention, Scanner builds on open video processing and analytics frameworks like OpenCV*, TensorFlow*, Caffe*, and FFmpeg. With this approach, Scanner automatically picks up the software optimizations for Intel® Xeon® processors in these frameworks. For cloud infrastructure and orchestration, it uses Docker*, Kubernetes*, and gRPC*. Because Scanner is fully open source, users can extend it however they would like. For example, Intel is incorporating our open source OpenVINO™ toolkit into Scanner.
Scanner can be run locally on a single system or in a private cloud. It also runs in Amazon Web Services and Google Cloud Platform, and could be easily ported to other cloud platforms. It runs on both CPU and GPU instances.
Scanner is now in its second release, is quite stable, and is well documented. The Scanner team is ready to engage new collaborators, contributors, and users in the evolution of this exciting project. Check out the SIGGRAPH paper and the code and let us know what you think.
- Scanner Paper: Alex Poms, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian, “Scanner: Efficient Video Analysis at Scale,” SIGGRAPH 2018, Vancouver, BC.
- Scanner Code: https://github.com/scanner-research/scanner
- “Soccer on Your Tabletop” on YouTube: https://www.youtube.com/watch?v=eRGAB4QBS6U
- Intel® Distribution of OpenVINO™ toolkit and the open source Apache 2.0 version of OpenVINO™ toolkit