Research & Development

A New Interaction Language

Five years of research went into developing Soli, a miniature radar chip that detects motion and understands nonverbal cues. The team behind the project details what it took to design a truly intuitive interface, and why this is only the beginning.

People have always had to learn new behaviors in order to operate technology. Basic computer functions like dragging and dropping, or using a mouse and pointer, didn’t just come naturally. Even common gestures like touching, swiping, and pinching had to be re-mastered and recontextualized amid a landscape of smartphones and tablets. But as technology becomes ever more present in our lives, it’s fair to start asking technology to take a few more cues from us. This idea is central to the work of Advanced Technology & Projects (ATAP), a multidisciplinary team at Google where we’re creating a whole new interaction paradigm based on the nuances of human movement—and the promise of a miniature radar chip called Soli.

What began as Project Soli more than five years ago now ships standard in the Pixel 4. The chip itself measures a mere 5mm x 6.5mm—about the size of a pencil eraser—and contains a sensor that can track motion around the phone with high speed and accuracy. With this technology, you can skip a song or snooze an alarm simply by gesturing. And that’s only the beginning: Soli has the potential to give our devices a new kind of awareness—something akin to emotional intelligence.

Making your phone or tablet more socially aware is an achievement that requires a diverse set of experts. Our goal from the start was to get underneath the question of what exactly makes an interaction feel human, so that we could imbue those qualities back into the experience. We started by putting together a team organized by project rather than function. This meant that everyone—hardware and software engineers, research scientists, designers, creative technologists—worked together from the earliest stages of R&D onward (and typically didn’t work across other projects in ATAP’s portfolio). This structure is key to exploring a new technology like radar, when product requirements or target users aren’t even defined yet. We believe that designers, in particular, can bring both creativity and a humanistic perspective to the process by helping to frame a new technology and discover its intrinsic properties. In addition to running experiments, the design team studied nonverbal communication in depth by talking to experts in fields as diverse as proxemics and primatology. In doing so, we were able to create a design framework and define the foundational qualities of Soli.

An unspoken language

We set out to first understand basic human behavior. The team became fascinated with what are known as implicit behaviors, defined loosely as “the movements you don’t normally have to think about.” This could be something like motioning for someone to go in front of you in line, or waving to a friend. When humans communicate, they make unconscious movements with the body, often to reinforce verbal communication. Psychologist Katherine Nelson has a lovely way of describing the movement of our hands in particular as “a mode of unconscious meaning, consciously expressed.” So where do these moves come from? Are they learned or taught? An early conversation with primatologist Frans de Waal gave us insight into the origins of our instinctive behaviors. Primates have two fundamental gestures: a movement of the hand from the body toward the outside (push) to refuse food or send others away, and a movement of the hand from the outside toward the body (pull) to ask for food or other things. Because these movements are so deeply ingrained, they’re also full of shared meaning—you can easily spot them in yourself or a loved one the next time you’re out to dinner.

Building on implicit behaviors, we were then able to establish Soli’s gesture interaction patterns. Each contains a movement-and-intent pairing that makes sense intuitively and can be understood without any explanation. Let’s look at swipe, reach, and tap. Swipe is a movement away from the body, which maps to dismissing something. We see similar movements when someone pushes away their plate when full (remember, we’re also primates), or makes a “shoo” motion with their hand if they don’t want something. The swipe to snooze, then, is a sweeping motion over the screen that mirrors the innate behavior and intent. Reach is the opposite, and signals the start of an interaction. We see similar movements when someone reaches to grab an item off a grocery store shelf, or reaches to shake someone’s hand. It follows, then, that reach is used when you want to start an interaction with your phone. The tap gesture is slightly different and is used to start and stop actions—like playing music. We see tap-like movements when someone raises their hand to “pause” someone speaking, or when you tap someone on the shoulder to get their attention. In each case, what may seem like a simple interaction actually contains many layers (and months) of research and development.
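Conceptually, each pattern boils down to a small, stable mapping from a recognized movement to an intent. Here is a minimal sketch of that idea in Kotlin; the names (SoliGesture, UserIntent, intentFor) and the mapping itself are illustrative assumptions, not the actual Soli API.

```kotlin
// A minimal sketch of the movement-to-intent pairings described above. The names
// (SoliGesture, UserIntent, intentFor) and the mapping itself are illustrative
// assumptions, not the actual Soli API.

enum class SoliGesture { SWIPE, REACH, TAP }

enum class UserIntent { DISMISS, BEGIN_INTERACTION, TOGGLE_ACTION }

fun intentFor(gesture: SoliGesture): UserIntent = when (gesture) {
    // A movement away from the body maps to dismissal, e.g. swipe to snooze.
    SoliGesture.SWIPE -> UserIntent.DISMISS
    // Reaching toward the device signals the start of an interaction.
    SoliGesture.REACH -> UserIntent.BEGIN_INTERACTION
    // A tap-like motion starts and stops an ongoing action, like music playback.
    SoliGesture.TAP -> UserIntent.TOGGLE_ACTION
}
```

In practice a recognizer also has to handle confidence, timing, and context; the pairing above only captures the intuitive meaning that makes each gesture legible without explanation.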

The initial idea was to create a new sensor capable of capturing the subtleties of the human hand, but the vision for the project expanded early on to include nonverbal communication more broadly and embrace a diversity of human movement.

Taking cues from daily life

Gesture recognition is only a small subset of what Soli can do. Our team’s broader goal is to create a new interaction model built on the understanding of body language and gesture. That’s not to say that using gestures—particularly those without tactile affordances—as a remote control for your devices isn’t novel; rather, it’s that a larger shift is underway.

Soli is capable of reading much more than gestures or explicit interactions, because it can also detect implicit signals like presence and body language. The two are related because they both deal with movements of the human body, and when combined they allow for a framework that’s inspired by the way people normally organize their social interactions in daily life. How you behave in a social context, like at a bar or a party, is different from how you behave when you’re home alone, relaxing on the couch. Soli, by design, understands what’s happening around the device and therefore has the ability to interpret human intent by moving through three different states: aware, engaged, and active.

During our research, we also turned to dance theorist Rudolf von Laban to better understand the notion of body attitudes. We learned that the way we hold our body, in terms of posture and movement, reflects inner intention—and by understanding and leveraging each subtle nuance, it’s possible to better determine user intention. For example, just as you register the people around you when entering a room, Soli begins in an awareness state, first understanding what’s happening around the device. If you want to start talking to someone in a crowded room, you’ll first need to get their attention. Soli makes the same distinction, waiting for behavioral cues like a reach or a lean toward the phone before it engages and anticipates what you’re going to want to do next.
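One way to picture the aware, engaged, and active progression is as a small state machine driven by behavioral cues. The sketch below is a simplified assumption of how such transitions might be modeled; the state and cue names are hypothetical, and the real system is far richer than this.

```kotlin
// An illustrative sketch of the aware -> engaged -> active progression described
// above, modeled as a tiny state machine driven by behavioral cues. The state
// and cue names here are hypothetical, not part of any real Soli API.

enum class SoliState { AWARE, ENGAGED, ACTIVE }

enum class Cue { REACH_OR_LEAN, GESTURE, WITHDRAW }

fun nextState(current: SoliState, cue: Cue): SoliState = when (current) {
    // Aware: the device only registers what is happening around it.
    SoliState.AWARE -> if (cue == Cue.REACH_OR_LEAN) SoliState.ENGAGED else SoliState.AWARE

    // Engaged: a reach or lean suggested the user is about to interact;
    // an explicit gesture activates, withdrawing drops back to aware.
    SoliState.ENGAGED -> when (cue) {
        Cue.GESTURE -> SoliState.ACTIVE
        Cue.WITHDRAW -> SoliState.AWARE
        else -> SoliState.ENGAGED
    }

    // Active: an interaction is underway; withdrawing returns to aware.
    SoliState.ACTIVE -> if (cue == Cue.WITHDRAW) SoliState.AWARE else SoliState.ACTIVE
}
```

For instance, nextState(SoliState.AWARE, Cue.REACH_OR_LEAN) yields ENGAGED, mirroring the reach-to-engage cue described above.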

Even still, technology doesn’t really read us as living, breathing humans—at least not yet. There’s a famous drawing in Dan O’Sullivan's Physical Computing that depicts how computers see us: essentially a finger and an eye. Though it was done in 2004, the image is still pretty applicable. Our devices understand our digital context well but less so our physical context. With Soli, there’s potential for that to change; for our technology to gain the capability to understand and “see” us as people.

Reading the room

Personal distance may be top of mind now, but the interplay between body language and closeness is well studied. ATAP’s assumption is that proxemics—the branch of study concerned with the amount of space people feel it necessary to set between themselves and others—can be leveraged in interaction design to mediate our relationship with technology. Typically, people expect increasing engagement and intimacy as they approach others, so it’s natural that they expect increasing connectivity and interaction with their devices, too.

At the core of proxemic theory, cultural anthropologist Edward T. Hall identifies four different spaces: intimate distance for embracing, touching, or whispering; personal distance for interactions among good friends or family; social distance for interactions among acquaintances; and public distance used for public speaking. A good example of social distance in action is when you’re in an elevator—there’s this constant shifting in space when someone new steps in or out, as people move to accommodate one another.
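Hall’s four zones can be thought of as a simple classification over distance. The sketch below uses boundary values (in meters) that are commonly cited approximations of Hall’s figures; they are not taken from this article and are not values Soli uses.

```kotlin
// A rough sketch of Hall's four proxemic zones as a classification over distance.
// The boundaries below (in meters) are commonly cited approximations of Hall's
// figures; they are not taken from this article and are not values Soli uses.

enum class ProxemicZone { INTIMATE, PERSONAL, SOCIAL, PUBLIC }

fun zoneFor(distanceMeters: Double): ProxemicZone = when {
    distanceMeters < 0.45 -> ProxemicZone.INTIMATE // embracing, touching, whispering
    distanceMeters < 1.2 -> ProxemicZone.PERSONAL  // good friends or family
    distanceMeters < 3.6 -> ProxemicZone.SOCIAL    // acquaintances
    else -> ProxemicZone.PUBLIC                    // public speaking
}
```

A device with even a coarse distance estimate could use a classification like this to decide how much engagement is appropriate at a given moment.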

Everyone possesses an inherent sense of how to negotiate space, without even thinking about it. You don’t have to learn any of this, because it’s what we normally do as people. At ATAP, we’re trying to understand how we can bring that intuitive spatial understanding into our interactions with devices.

Can a device learn to behave, so to speak? With Soli, it appears that it’s possible to make technology not just smarter, but eventually more polite as well. A device that knows to operate at a lower volume because you’re nearby or speaking in a whisper is already more considerate. And with an understanding of body language, a device can discriminate between when you’re talking to it directly and when you’re talking to other people in the room. When you have a conversation with another person, it’s very easy to understand when to start and stop talking based on their body orientation and spatial cues. Right now, this type of interaction is really hard to do with technology; it’s why you have to start every assistant conversation with “Okay, Google.” Imagine a device with a full understanding of nonverbal human communication, and the potential of this technology comes into focus.
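As a toy illustration of that kind of politeness, consider choosing an output volume from how close the user is and whether they are whispering. The inputs, thresholds, and levels below are hypothetical assumptions; nothing in this sketch reflects how Pixel or Soli actually behaves.

```kotlin
// A toy sketch of the "polite device" idea: pick an output volume from how close
// the user is and whether they are whispering. The thresholds and levels are
// hypothetical; nothing here reflects how Pixel or Soli actually behaves.

fun outputVolume(distanceMeters: Double, userIsWhispering: Boolean): Double = when {
    userIsWhispering -> 0.2          // answer a whisper quietly
    distanceMeters < 1.2 -> 0.4      // user is close by: keep it low
    distanceMeters < 3.6 -> 0.7      // conversational, social range
    else -> 1.0                      // across the room: speak up
}
```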

Adapting to us

A new interaction language that considers personal, social, and spatial awareness is just the beginning. As technology progresses, the ultimate goal is devices that truly understand us, so we don’t need to spend precious moments trying to explain ourselves. This means less time managing the technology in your life and the distractions it brings. Take something as simple as silencing an alarm. As soon as you wake up, you have to reach for the phone, pick it up, bring it to your face, and find a little button to press. Once you’re there, you see a push notification with news you can’t ignore, or maybe you hop on Twitter, and soon you’re down the rabbit hole of your digital life—before you’ve even gotten out of bed. Now imagine a different scenario: you’re far away from your device and interface elements appear larger automatically, shrinking as you approach; your voice assistant provides more information without prompting, because it understands you can’t see the screen. Patterns like these could mean an end to the small but time-consuming microtasks, like switching your phone on and off, that keep you in the digital world longer than you intended. And they would free us up to spend more time connecting with other people and being present in the physical world.
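For the distance-adaptive interface imagined above, one simple formulation is to interpolate a UI scale factor between a near and a far distance. The thresholds and scale range in this sketch are made-up values, assumed only for illustration.

```kotlin
// An illustrative sketch of a distance-adaptive interface: UI elements render
// larger when you are far from the device and shrink back to normal size as
// you approach. The near/far thresholds and the maximum scale are made-up
// values, assumed only for this example.

fun uiScaleFor(distanceMeters: Double): Double {
    val near = 0.5     // at roughly arm's length or closer, use the normal scale
    val far = 3.0      // beyond this distance, cap the enlargement
    val maxScale = 2.0 // largest factor applied to interface elements

    // Interpolate linearly between 1.0 (near) and maxScale (far).
    val t = (distanceMeters.coerceIn(near, far) - near) / (far - near)
    return 1.0 + t * (maxScale - 1.0)
}
```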

Eventually, there might even be real-time learning for gestures, so the machine can adapt and relate to you, as a person, specifically. What once felt robotic will take on new meaning. More importantly, the next generation of Soli could embrace the beauty and diversity of natural human movement, so each of us could create our own way of moving through the world with technology—just as we do in everyday life.