I remember in the heady ur-cloud days of 1999 and 2000 having discussions with executives at major hosting companies like Exodus Communications*, LoudCloud* and, yes, Intel® Online Services during which they asserted that the future of computing looked like massively centralized data centers operated by large service providers. After the 2000 dotcom crash, and especially after the 2006 launch of Amazon* Web Services, those claims proved prophetic. Economics were in their favor – the history of computing is a constant tension between the cost of network bandwidth and the economies of scale of centralization. The massive rollout of high-speed LTE and WiFi networks made cloud computing the lowest-cost alternative for many applications.
Over the last few years, many of us in the tech industry have begun to believe that the pendulum will swing somewhat back toward geographically distributed computing – generally referred to as edge computing. There have, of course, always been countervailing forces to centralization – the same decade that saw cloud take off also saw an explosion in the number of smart handheld devices and the rapid growth of content delivery networks (CDNs) to support video streaming consumption. If anything, those forces accelerated cloud growth as they enabled amazing new experiences. Like many centralization/decentralization shifts, edge computing is driven by two things – a major network technology discontinuity and a compelling set of user applications and experiences that aren’t practical in the existing model.
The impending rollout of 5G over the next 5 to 10 years will create that technology discontinuity. Suffice it to say that 5G greatly expands wireless network bandwidth and reduces the latency between user devices and the edge but doesn’t necessarily drive a corresponding growth in wide area network capability. This creates an opportunity for edge nodes to do things that can’t viably be done by far-away data centers. And the primary beneficiaries are – you guessed it – visual cloud applications like augmented, virtual and merged reality, cloud gaming and distributed video analytics. They have three characteristics that make them very friendly to edge computing:
- Users often have rich and time-sensitive interactions with the application – think Pokemon Go* – driving a need for low latency. Round-trip times from user to compute and back again usually need to be less than 50ms and sometimes lower than 15ms.
- The content – often relatively high quality encoded video – requires very high bandwidth and can often have many simultaneous users. CDNs were invented specifically to address this issue for video streamed to user devices. However, CDNs don’t typically support bi-directional video traffic.
- The content may also be subject to privacy rules that necessitate that it not be transmitted or stored far from its origin. For example, under the European Union General Data Protection Regulation (GDPR), video of identifiable individuals is considered personal information and must be protected and reported accordingly.
The “Cloudlet” project led by Professor Mahadev Satyanarayanan (Satya) here at Carnegie Mellon University (CMU) and partially funded by the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS) has been pioneering edge computing since 2009. Its groundbreaking research spans edge node infrastructure, network and latency studies, urban testbeds, and applications and uses. To make all of this work more tangible, Satya saw a need for a demonstration application that could easily and definitively show the experience benefits of running visual cloud applications at the edge rather than at a remote data center. From that need came OpenRTIST, a simple, real-time augmented reality application created by PhD student Shilpa George and Senior Project Scientist Tom Eiszler using the Gabriel platform from Zhuo Chen. Padmanabhan (Babu) Pillai advised the team and ported the server to work with the Intel® Distribution of OpenVINO™ toolkit.
One of the biggest challenges with augmented reality is that it requires a real-time fusion of a live real-world view with a synthetic overlay of interesting detail. In a simple case, some interesting information – say, the name and address of a building you’re looking at – is superimposed on a live view of the building through your phone or other display. While you’re watching, your head, your phone and the scene may all move, and you need the real world and the view on the phone to stay in sync with each other. The lag between the real world and what you see on your phone is what you experience as latency. Studies have shown that, for immersive applications like augmented reality to be acceptable to users, end-to-end latency – often referred to as round trip time (RTT) – should be less than 100ms. With highly immersive applications, the experience improves with RTT less than 50ms. With head-mounted displays and with fast-twitch gaming, RTT has to be less than about 15ms to be acceptable. Numerous studies show that jitter – i.e., variability in RTT – also has an impact on user experience.
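The thresholds above can be summarized in a small helper. This is purely an illustration of the cited figures – the function, its name and its tier labels are hypothetical and not part of OpenRTIST or the cited studies:

```python
# Illustrative helper (not part of OpenRTIST): classify an observed
# round-trip time against the experience thresholds cited above.

def experience_tier(rtt_ms: float) -> str:
    """Map an RTT in milliseconds to the strictest use case it can serve."""
    if rtt_ms < 15:
        return "head-mounted displays / fast-twitch gaming"
    if rtt_ms < 50:
        return "highly immersive AR"
    if rtt_ms < 100:
        return "acceptable immersive AR"
    return "too slow for immersive use"

print(experience_tier(12))   # head-mounted displays / fast-twitch gaming
print(experience_tier(75))   # acceptable immersive AR
```

Note that a real system would evaluate the distribution of RTTs, not a single value, since jitter degrades the experience even when the average RTT is within budget.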
Latency accumulates at every point in the round trip – in the phone, in the wireless and wide area networks and in all the processing nodes in the backend servers that identify the building and send the information back to the phone for display. In the simple building-lookup case, you may not notice even a one-second lag between looking at a new building and the display of its information. But for applications that are more immersive and interactive than this simple case, the RTT is critically important. OpenRTIST is an example of an application that demonstrates the importance of RTT to user experience.
OpenRTIST combines video-oriented edge computing with an algorithm called style transfer. Style transfer uses deep neural networks to learn a visual style from one image and render another image using that same style. Pix2Pix is a widely known implementation of this technology. So far, style transfer has mostly been applied to offline processing of images. OpenRTIST applies it to real-time video streamed from a mobile phone. My styled selfie below shows the kind of real-time view that OpenRTIST creates – in this case using the mosaic style from PyTorch, shown at right.
In OpenRTIST, the video captured by the phone is live-streamed to a remote server as a sequence of JPEG-encoded frames. The server decodes the video, applies style transfer to each frame and streams the result back for display on the phone. For this application, the RTT needs to be less than 100ms to keep the video displayed on the phone perceptually in sync with the camera position. As shown below, the phone, the LTE network and server processing add something like 75ms of latency regardless of where the server processing occurs – leaving about 25ms to transmit the video frames to the server and back. However, measurements have shown that RTT to distant data centers can be substantially higher than 25ms. These measurements show 2-15ms RTT for a simple ping test when going over LTE or WiFi to a local “cloudlet” but 75-125ms when going to transcontinental clouds. And a ping test doesn’t account for the additional time required to transmit an image frame over the many hops needed to traverse the internet. Of course, even the 75ms RTT of the cloudlet implementation is too high for many applications. 5G technology and advances in deep learning performance promise to reduce this latency significantly and make dramatic new experiences possible.
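The budget arithmetic above can be sketched in a few lines. The 100ms target and the roughly 75ms of fixed, non-network latency come from the text; the even three-way split of that 75ms into components is a hypothetical illustration, not a measurement:

```python
# Back-of-envelope latency budget for the OpenRTIST round trip.
# Only the 100 ms target and the ~75 ms aggregate are from the article;
# the per-component split below is an assumed, illustrative breakdown.

TARGET_RTT_MS = 100

fixed_latency_ms = {
    "phone capture + JPEG encode":     25,  # assumed split
    "LTE radio access":                25,  # assumed split
    "server decode + style + encode":  25,  # assumed split
}

network_budget_ms = TARGET_RTT_MS - sum(fixed_latency_ms.values())
print(f"Budget left for WAN transit: {network_budget_ms} ms")

# Measured ping RTT ranges cited in the text:
cloudlet_ping_ms = (2, 15)    # over LTE/WiFi to a local cloudlet
cloud_ping_ms = (75, 125)     # to transcontinental clouds

def within_budget(ping_range, budget_ms):
    """A deployment can only fit if even its best-case ping fits the budget."""
    return ping_range[0] <= budget_ms

print(within_budget(cloudlet_ping_ms, network_budget_ms))  # True
print(within_budget(cloud_ping_ms, network_budget_ms))     # False
```

Even this generous check (best-case ping, ignoring frame transmission time) rules out the transcontinental cloud, which is the point the measurements make.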
The OpenRTIST team deployed the server on local machines at CMU in Pittsburgh, in Amazon Web Services (AWS) US East in Virginia, and across the country in AWS US West in Oregon. The user experience with the transcontinental server was significantly degraded compared with the local server. It’s worth noting that AWS US East gave nearly the same experience as the local server, thanks to close network proximity and the high bandwidth of the CMU-to-AWS East connection. This difference in experience qualitatively demonstrates the value of relatively local resources.
OpenRTIST also provides an easy way to test this in your own environment. The OpenRTIST client app is available on Google Play* and the server code can be easily deployed on AWS or on an edge server of your choice. The edge server requires a GPU – for example, Intel® Iris® Pro graphics using the Intel® Distribution of OpenVINO™ toolkit. Cloud-hosted edge servers using Intel graphics are available from PhoenixNAP*. The code is available on GitHub and a Docker container for the server is available here. There’s also a demonstration video on YouTube*.
OpenRTIST has no illusions about being the killer app for edge computing – its goal is simply to provide a visually compelling but conceptually straightforward demonstration of the value of edge computing in immersive media. However, we hope that this simple application will help to motivate experience makers and infrastructure builders to move forward with edge deployment. If you’d like to feel the impact that edge computing can have on user experience – check out OpenRTIST.
- OpenRTIST: https://github.com/cmusatyalab/openrtist
- Wikipedia: https://en.wikipedia.org/wiki/Cloudlet
- Treasure Trove of CMU Cloudlet and Edge Computing Publications: http://elijah.cs.cmu.edu/
- Open Edge Computing Initiative: http://openedgecomputing.org/
- Open Edge Computing Initiative Living Edge Lab: http://openedgecomputing.org/lel.pdf
- Intel® Distribution of OpenVINO™ Toolkit
- “Quantifying the Impact of Edge Computing on Mobile Applications”
- “Latency – the sine qua non of AR and VR”
- “Latency Requirements for Head-Worn Display S/EVS Applications”
- “Perceptual sensitivity to head tracking latency in virtual environments with varying degrees of scene complexity”
- “The Perceptual and Attentive Impact of Delay and Jitter in Multimedia Delivery”
- “The Effects of Jitter on the Perceptual Quality of Video”
- “Assessing the Impact of Latency and Jitter on the Perceived Quality of Call of Duty Modern Warfare 2”