As part of a recent consulting engagement at the University of Southern California, I visited the World Building Media Lab at the USC School of Cinematic Arts. Artist, engineer and senior researcher Bradley Newman strapped me into a motion-tracking virtual reality rig to give me a live demo of Leviathan, their 2014 experiment in immersive storytelling. The gear consisted of a state-of-the-art VR headset (an Oculus Rift), two crude gloves to track the position of my hands, and some muffy headphones. All this gear was housed within a living room-sized space perimetered by motion-tracking cameras. This setup persistently monitored the relative position of my head and hands, and fed that information back to my headset through aural and visual content that reflected my actions in virtual space. Within a minute, my body completely trusted the system, and there I was in a virtual realm.
This technology (much of which will be released by Facebook-owned Oculus in early 2016) has gotten very, very good at imitating life. The VR goggles fully encompass the user’s visual field and create the illusion of boundless virtual distance. This digital world shares many key somatic geometries with the real. You locomote in real space, and your avatar traverses a proportional quantity of virtual turf. You turn your head and your virtual gaze shifts as you would expect. Virtual sounds grow louder as you close in on their virtual origin. The technology is so good at mapping virtual stimuli onto your senses that Leviathan effectively belies the actual and inordinate complexity of the engineering involved. I found myself imagining a world in which virtualized experiences (such as the L.A. Philharmonic’s recent experiments in VR) are as accessible as online content is today.
Still, there are limits. When I stuck my head outside the virtual “walls” of the narrative space, the motion-sensing cameras lost the ability to calculate my location. This caused my visual field to first discombobulate, then degrade, which brought on some rather serious nausea. The body responds violently when the eyes sense motion but the inner ear doesn’t. “VR sickness,” the polite term for barfing in VR, is the same inner ear, oculomotor catastrophe that causes seasickness.
VR systems situate the body within a mesh of integrated sensors. Not coincidentally, this is how the “Internet of Things” (IoT) will function in the future. In the rig — as in IoT — sensors constantly probe for difference, intention and meaning in all actions of the body, such as movement in space or hand gestures. Built into Leviathan were gestural controls that permitted my character to fly through the virtual environment. By placing my right hand next to my head, I could cue the system that I was about to execute a motion gesture. When I pushed my hand forward in space (after executing the hand-at-head move), my virtual avatar flew forward, moving my visual and sonic fields accordingly. Pushing down resulted in floating down; pushing backwards propelled the avatar backwards. This combination of inputs — one proportional to actual locomotion in a restricted field, one visually propelling me without my actually moving — was discombobulating and revved up the barf engine. The gestural controls were highly approximate, and I frequently moved forward when I meant to move up, or moved down when I meant to move back, etc. Amidst all the complexity of Leviathan, it was this tiny gestural boondoggle that prevented me from completely losing myself in VR.
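To make the mechanics concrete, here is a minimal sketch of the two-step gesture described above: raise a hand beside the head to arm the system, then push in a direction to fly. This is not Leviathan’s actual code, which I never saw. The class name, the axis convention (x right, y up, z forward), and both thresholds are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]  # assumed axes: x right, y up, z forward (meters)

ARM_RADIUS_M = 0.25      # hypothetical: hand counts as "next to the head" within this distance
PUSH_THRESHOLD_M = 0.15  # hypothetical: how far the hand must travel to count as a push


@dataclass
class FlightGesture:
    """Two-step gesture: arm with hand-at-head, then push to pick a direction."""
    armed_at: Optional[Vec3] = None  # hand position when the gesture was armed

    def update(self, head: Vec3, hand: Vec3) -> str:
        """Feed one tracking sample; returns 'idle', 'armed', or a flight direction."""
        if self.armed_at is None:
            # Step 1: wait for the hand to come close to the head.
            if math.dist(head, hand) <= ARM_RADIUS_M:
                self.armed_at = hand
                return "armed"
            return "idle"

        # Step 2: measure how far the hand has moved since arming.
        dy = hand[1] - self.armed_at[1]
        dz = hand[2] - self.armed_at[2]
        if math.hypot(dy, dz) < PUSH_THRESHOLD_M:
            return "armed"  # not enough movement yet

        # Classify by the dominant axis, then disarm until the next hand-at-head cue.
        self.armed_at = None
        if abs(dz) >= abs(dy):
            return "forward" if dz > 0 else "backward"
        return "up" if dy > 0 else "down"
```

Under these assumptions, a push is classified by whichever axis moved farther. A slightly diagonal push therefore flips from "forward" to "up" on a centimeter of difference, which is one plausible way a coarse tracker ends up sending you forward when you meant to go up.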
Gestures, in other words, are hard for computers to understand. In Leviathan (which, it should be noted, was built to explore immersive narrative, not fine motor interfaces) my hands were gloved with three motion-tracking balls, permitting the system to understand the relative pitch, yaw and roll of my gestures, but not their exertion, muscular squish or finger positioning. The system primarily provides interactions on the basis of cranial positioning, with gestural interfaces secondarily inserted, a hierarchy generally consistent with the state of the art. There is decent hand-gesture-sensing hardware on the market (the Leap Motion Controller and the Intel RealSense come to mind), but it suffers particularly from sensor occlusion, which is when the computer loses the ability to discern where a particular joint or digit is because the rest of the hand is in the way. A single sensor, typically mounted inside a laptop, can accurately perceive hand gestures made in a roughly horizontal plane. Were one to make a vertical slicing gesture (e.g., the “Tony Blair”) with any finger wiggling, the sensor would need to guess which finger was wagged.
On the one hand (pun marginally intended), this is a solvable engineering problem. The sensors baked into contemporary hardware just aren’t powerful or sophisticated enough yet to understand digital (as in finger) movement. Smaller sensors mean more of them can be baked into more things. The technology will evolve, and quickly. More difficult, however, is the choreography of a unified gestural language, something that would allow different (though related) gestural fonts, whether one taps on conductive Google pants or plays a video game in a Microsoft living room. In the absence of such gestural typography, Volkswagen will inevitably come up with a unique gestural vocabulary for its 2016 Golf, as will Samsung for its “Smart” televisions, as will Apple for its Apple Watch. The sensors that promise to make our interactions with technology more seamless (and beautiful) could very well be hampered by a lack of common choreography. As a choreographer myself, I now consider commissions a means to create repurposable choreographies that function both on stage and in choreographic interface design.
User-interface technologies like the mouse and keyboard, even performance technologies like the proscenium and music notation, became popularly accessible due to open and broadly accepted standards of interaction. There are no standard gestures for flying or descending in virtual reality, no standard operating procedure for rotating an object or zooming in. Gestural interfaces are therefore the oddball in the panoply of choreographic interfaces. Siri, Apple’s plain-speech “virtual assistant,” turns five this year, and has been joined in the market by Microsoft’s Cortana and Amazon’s Echo. Affective computing, in which sensors “read” the passive signals of micro-facial expressions to understand emotional intentionality, is still more nascent, but companies such as Affectiva already have major commercial contracts to bring “somatic semiotics” to a mass market. Technologies baked into Google’s Nest thermostat use the body’s movement through one’s home, including its presence and absence, to make algorithmically informed predictions about the desired temperature of a space. A decade ago, the aim of human-machine interfaces was to build better buttons. Today the apotheosis of engineering is to sense the body’s repeatable actions in space and time (that is, choreographies) to communicate intention to Web-connected computing devices. If VR and Leviathan are any indication, the future is here. It’s just not fully choreographed yet.
[Many, many thanks to Emily Wanerski and Bradley Newman, for introducing me to one another and to Leviathan, which is fully awesome.]