Three stylized figures enjoy leisure activities. One plays basketball, another climbs stairs with a magnifying glass, and the third uses a telescope. Thought bubbles show a slice of pizza, a coffee cup, and a rainbow.

ML and the Evolution of Web-Based Experiences: Fast, Real-Time, and Fully Interactive

Lessons from designing “Emoji Scavenger Hunt”

The advent of Machine Learning (ML) is clearly a groundbreaking moment in modern computer science. As designers—and as users—we’ve already seen tangible impacts: ML can help to transform medical diagnoses, improve energy efficiency in data centers, and even identify bowls of ramen by shop.

ML has also enabled the development of new, cutting-edge products and user experiences, creating exciting opportunities for web designers. In March 2018, Google announced TensorFlow.js, TensorFlow’s open-source framework for ML with JavaScript. TensorFlow.js lets web developers train and deploy ML models right in web browsers like Google Chrome. In other words, ML is now publicly available and accessible to anyone with an Internet connection. But what does this really mean for web designers?

Emoji Scavenger Hunt. The game shows you an emoji, and you have to find its real-world version before time expires. While you search, the neural network tries to guess what it’s seeing.

Google Brand Studio recently released Emoji Scavenger Hunt, a fun mobile web game powered by TensorFlow.js. The game is pretty simple: it shows you an emoji, and you use your phone’s camera to find the object in the real world before the clock runs out. Find it in time and you advance to the next emoji.

Players have hunted more than 2 million emoji around the world; to date, they’ve found 85k different types of 💡 and 66k pairs of 👖. Finding ✋ seems pretty easy (2.91 seconds on average) while hunting 📭 was a little harder (21.2 seconds). But how does the game accurately identify images? For instance, how does it know the timekeeping device on your wrist is a watch? This is where ML comes into play.

Browser-based machine learning is a game-changer for web designers

Kyle McDonald’s tweet and retweet of Takashi Kawashima announcing the Emoji Scavenger Hunt. The game uses mobile browsers, cameras, and machine learning to find emojis.

Media artist Kyle McDonald expects that combining real-time ML with sensor-equipped mobile browsers will open up many possibilities to explore.

ML has revealed ways to enhance product experiences; similarly, ML in browsers brings many new, previously unseen interaction design opportunities for web designers. In the case of Emoji Scavenger Hunt, we wanted to create a fast-paced, fun, and straightforward experience, much like communicating with emojis themselves, and web-based ML helped us accomplish that.

Enabling superfast real-time interactions

When playing Emoji Scavenger Hunt, you point your phone or laptop’s camera at an object, but the distance, light, and angle can all vary. It’s impossible to predict all the different ways you can capture an object on your phone. Yet even I was surprised to see how quickly our ML model identified objects; on my Pixel 2 phone, the image prediction algorithm ran 15 times per second, and even faster on my laptop (60 times per second). The game’s algorithm runs so swiftly that it’s constantly predicting matches as you move your phone, significantly improving the likelihood of a correct guess. This results in a superfast real-time interaction experience, making the game smooth and enjoyable to play.
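To make the benefit of frequent predictions concrete, here is a minimal sketch of how a stream of per-frame guesses could be checked against the target emoji. It is not the game’s actual code: the `Prediction` shape, the function names, the top-2 cutoff, and the 0.2 confidence threshold are all assumptions chosen for illustration. The idea is that every additional frame is another chance for the target to appear among the model’s best guesses.

```typescript
// One classifier output: a label and the model's confidence in it (0..1).
// This shape is hypothetical, not the game's real data structure.
interface Prediction {
  label: string;
  score: number;
}

// Return the k highest-scoring predictions, best first.
function topK(predictions: Prediction[], k: number): Prediction[] {
  return predictions
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// True when the target is among the top-k guesses for this frame with at
// least `minScore` confidence. Both defaults are assumed values.
function frameMatches(
  predictions: Prediction[],
  target: string,
  k: number = 2,
  minScore: number = 0.2
): boolean {
  return topK(predictions, k).some(
    (p) => p.label === target && p.score >= minScore
  );
}

// Index of the first frame whose predictions match the target, or -1.
function firstMatchingFrame(frames: Prediction[][], target: string): number {
  for (let i = 0; i < frames.length; i++) {
    if (frameMatches(frames[i], target)) {
      return i;
    }
  }
  return -1;
}

// Usage: the watch is a weak guess in frame 0 but the top guess in frame 1.
const exampleFrames: Prediction[][] = [
  [
    { label: "watch", score: 0.1 },
    { label: "hand", score: 0.6 },
    { label: "cup", score: 0.3 },
  ],
  [
    { label: "watch", score: 0.55 },
    { label: "hand", score: 0.3 },
    { label: "cup", score: 0.15 },
  ],
];
const matchedAt = firstMatchingFrame(exampleFrames, "watch"); // 1
```

At 15 predictions per second, a miss on one frame costs only a fraction of a second before the next attempt, which is why the match tends to land as soon as the camera angle or lighting improves.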

One of the main reasons TensorFlow.js is fast is that it uses WebGL, a JavaScript API that lets you render graphics in browsers using the device’s Graphics Processing Unit (GPU). This speeds up the execution of neural networks and lets you run ML models locally on individual devices, without round trips to a backend server. With the ML model running this quickly, nearly 500 everyday objects (from 👖 to 🐱 and 🍔 to 🍲) can be identified almost instantly.

A tweet showcasing an augmented reality game on a phone, using TensorFlow.js. The game uses the phone's camera to identify objects in real time with impressive accuracy.

Real-time, ML-based image classification in Chrome on a Pixel 2 XL. The debug window shows the ML model updating a list of detected objects and confidence scores about 15 times a second. Try it for yourself by accessing this URL.

Cacheable files and client-side computations mean quick load times

If you’ve ever spent time waiting for a website to load, you know that speed is critical to a good web experience. Even if your ML model is brilliant, if it takes too long to load, users won’t engage with your experience. This is where the TensorFlow.js converter can be helpful. It converts existing TensorFlow models into cacheable files that are up to 75% smaller and can be run directly in the browser.
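One way a reduction of up to 75% can arise is by storing each 32-bit floating-point weight in a single byte, since one byte is a quarter of four. The sketch below illustrates that arithmetic with simple linear 8-bit quantization. Whether the converter used exactly this scheme for the game’s model is an assumption, and the `QuantizedWeights` type and function names are hypothetical rather than part of the TensorFlow.js API.

```typescript
// Hypothetical container for 8-bit quantized weights.
interface QuantizedWeights {
  data: Uint8Array; // one byte per weight
  min: number; // smallest original weight
  scale: number; // step size between adjacent byte values
}

// Map each float32 weight linearly onto the integer range 0..255.
function quantize(weights: Float32Array): QuantizedWeights {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < weights.length; i++) {
    if (weights[i] < min) min = weights[i];
    if (weights[i] > max) max = weights[i];
  }
  // Guard against an empty or constant tensor, where max is not above min.
  const scale = max > min ? (max - min) / 255 : 1;
  const data = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    data[i] = Math.round((weights[i] - min) / scale);
  }
  return { data, min, scale };
}

// Recover approximate float32 weights from the quantized bytes.
function dequantize(q: QuantizedWeights): Float32Array {
  const out = new Float32Array(q.data.length);
  for (let i = 0; i < q.data.length; i++) {
    out[i] = q.data[i] * q.scale + q.min;
  }
  return out;
}

// Fraction of weight storage saved by quantization (0.75 means 75% smaller).
function sizeReduction(original: Float32Array, q: QuantizedWeights): number {
  return 1 - q.data.byteLength / original.byteLength;
}

// Usage: four float32 weights take 16 bytes; quantized, they take 4 bytes.
const exampleWeights = new Float32Array([-1, 0, 0.5, 1]);
const exampleQuantized = quantize(exampleWeights);
const exampleSaving = sizeReduction(exampleWeights, exampleQuantized); // 0.75
```

The trade-off is precision: each restored weight can differ from the original by up to about half of `scale`, which is often acceptable for image classifiers but is worth checking against the model’s accuracy.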

For example, the prediction model we used for Emoji Scavenger Hunt is only a couple of megabytes, about the size of a single image on your phone. Once the model has loaded, its files are saved locally on the device, so the game starts even faster on subsequent loads.

Another benefit of browser-based ML is that all the ML computation (in this case, image recognition) happens on the client side, within the user’s browser, whereas a conventional ML experience normally requires a lot of processing power on the server side. For Emoji Scavenger Hunt, the server only has to serve website assets like graphics and the HTML files during gameplay. This makes scaling the backend relatively easy and cost-effective.

Power of the web meets power of ML

Although many designers and developers today focus heavily on building apps, the web is still an incredibly powerful medium. It’s cross-platform and works on all kinds of devices, from mobile and tablet to desktop, and across operating systems (Android, iOS, macOS, Windows, and so on), with just one URL. Unlike apps, there’s nothing to download and install, and no complex configuration is required. With the web, users are just one tap away from diving into your experience. And, of course, web-based content and experiences are really easy to share.

Today, people crave quick, fun experiences; combining the power of the web with ML allows for powerful new interactions utilizing a device’s own sensors.

Using a device’s camera for image recognition is just one example. With web APIs such as the Generic Sensor API (for motion sensors like the accelerometer), Media Capture and Streams (for the microphone), and the Geolocation API (for GPS), web developers can now access a range of device sensors. By combining device sensors with in-browser ML, you can imagine and design any number of new interactions.
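Because sensor availability varies by browser and device, a sensor-driven experience usually starts by checking what is actually present. The following is a minimal sketch of such a check. `SensorSupport` and `detectSensorSupport` are hypothetical names; the sketch assumes the accelerometer is exposed as a global `Accelerometer` constructor, the microphone through `navigator.mediaDevices.getUserMedia`, and location through `navigator.geolocation`. The global object is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// Which sensors this sketch checks for (a hypothetical result shape).
interface SensorSupport {
  accelerometer: boolean;
  microphone: boolean;
  geolocation: boolean;
}

// `scope` stands in for the browser's global object (for example, `window`).
function detectSensorSupport(scope?: Record<string, any>): SensorSupport {
  const g: Record<string, any> = scope || {};
  const nav: Record<string, any> = g.navigator || {};
  return {
    // Accelerometer is a constructor on the global object when supported.
    accelerometer: typeof g.Accelerometer === "function",
    // Microphone access goes through navigator.mediaDevices.getUserMedia.
    microphone:
      !!nav.mediaDevices &&
      typeof nav.mediaDevices.getUserMedia === "function",
    // Location goes through navigator.geolocation.
    geolocation: !!nav.geolocation,
  };
}

// Usage with a stand-in global that has a microphone and GPS
// but no accelerometer.
const exampleSupport = detectSensorSupport({
  navigator: {
    mediaDevices: { getUserMedia: function () {} },
    geolocation: {},
  },
});
```

In a real page you would pass `window` and then fall back gracefully, for instance by hiding a tilt-based interaction when `accelerometer` is false. Detection only confirms the interface exists; the user still has to grant permission before any readings arrive.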

ML and the future

We’re just starting to see the many ways ML can transform web development. There’s a whole world of potential applications just waiting to be explored, and I can’t wait to see all the new interactive experiences people will design. Working on a new project? Share it with us by using the #tensorflowjs hashtag, or submit your project to AI Experiments. And if you’re interested in the technical side of this project, all the code is available on GitHub.

—

This work was made possible through a collaboration between Brand Studio and the TensorFlow.js team at Google. I’d also like to thank Jacques Bruwer, Jason Kafalas, Shuhei Iitsuka, Cathy Cheng, Kyle Gray, Blake Davidoff, Kyle Conerty, Daniel Smilkov, Nikhil Thorat, Ping Yu, and Sarah Sirajuddin.

Takashi Kawashima is a designer and creative lead at Google Brand Studio. Prior to joining the team, he spent three years as an art director for the Google Data Arts Team where he worked on Chrome Experiments.