ML and the Evolution of Web-Based Experiences: Fast, Real-Time, and Fully Interactive
Lessons from designing “Emoji Scavenger Hunt”
The advent of Machine Learning (ML) is clearly a groundbreaking moment in modern computer science. As designers—and as users—we’ve already seen tangible impacts: ML can help transform medical diagnoses, improve energy efficiency in data centers, and even identify which shop a bowl of ramen came from.
Google Brand Studio recently released Emoji Scavenger Hunt, a fun mobile web game powered by TensorFlow.js. The game is pretty simple: it shows you an emoji, and you use your phone’s camera to find the object in the real world before the clock runs out. Find it in time and you advance to the next emoji.
Players have hunted more than 2 million emoji around the world; to date, they’ve found 85k 💡and 66k pairs of 👖. Finding ✋ seems pretty easy (2.91 seconds on average), while hunting 📭 was a little harder (21.2 seconds). But how does the game accurately identify objects? For instance, how does it know the timekeeping device on your wrist is a watch? This is where ML comes into play.
Browser-based machine learning is a game-changer for web designers
ML has revealed new ways to enhance product experiences; similarly, ML in the browser opens up interaction design opportunities for web designers that weren’t possible before. In the case of Emoji Scavenger Hunt, we wanted to create a fast-paced, fun, and straightforward experience, much like communicating with emoji, and web-based ML helped us accomplish that.
Enabling superfast real-time interactions
When playing Emoji Scavenger Hunt, you point your phone or laptop’s camera at an object, but the distance, light, and angle can all vary. It’s impossible to predict all the different ways you might capture an object on your phone. Even so, I was surprised by how quickly our ML model identified objects: on my Pixel 2 phone, the image prediction algorithm ran 15 times per second, and even faster on my laptop (60 times per second). Because the algorithm runs so swiftly, it’s constantly predicting matches as you move your phone, significantly improving the likelihood of a correct guess. This results in a superfast real-time interaction, making the game smooth and enjoyable to play.
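As a rough illustration, here’s a minimal sketch of that kind of throttled prediction loop. The function names and the targetFps parameter are invented for this example (not the game’s actual code); tf.browser.fromPixels, model.predict, and tf.tidy are real TensorFlow.js calls.

```javascript
// Sketch of a throttled real-time prediction loop (illustrative names).
// requestAnimationFrame fires up to ~60x per second; shouldPredict() gates
// how often the model actually runs, e.g. ~15x per second on a phone.
function shouldPredict(lastRunMs, nowMs, targetFps) {
  return nowMs - lastRunMs >= 1000 / targetFps;
}

function startPredictionLoop(model, videoEl, targetFps = 15) {
  let lastRun = 0;
  function frame(nowMs) {
    if (shouldPredict(lastRun, nowMs, targetFps)) {
      lastRun = nowMs;
      // tf.tidy disposes the intermediate tensors created on each frame,
      // so memory stays flat while the loop runs.
      tf.tidy(() => {
        const input = tf.browser.fromPixels(videoEl).expandDims(0);
        model.predict(input);
      });
    }
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);
}
```

Keeping the model’s frame rate decoupled from the browser’s render rate is what lets slower devices stay responsive: the camera preview stays at 60fps even when predictions run less often.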
Cacheable files and client-side computations mean quick load times
If you’ve ever spent time waiting for a website to load, you know that speed is critical to a good web experience. Even if your ML model is brilliant, if it takes too long to load, users won’t engage with your experience. This is where the TensorFlow.js converter can be helpful. It converts existing TensorFlow models into cacheable files, up to 75% smaller, that can be run directly in the browser.
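Converting a model is a single command. The paths below are illustrative, and this assumes the tensorflowjs Python package is installed and you have a TensorFlow SavedModel on disk.

```shell
# Convert a TensorFlow SavedModel into browser-friendly shards
# (./saved_model and ./web_model are placeholder paths).
tensorflowjs_converter \
    --input_format=tf_saved_model \
    ./saved_model \
    ./web_model
```

The output directory contains a small JSON manifest plus binary weight shards, which the browser can fetch and cache like any other static asset.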
For example, the prediction model we used for Emoji Scavenger Hunt is only a couple of megabytes—about the size of a single image on your phone. Once it’s loaded, the files are saved locally on the device so the game runs even faster on subsequent loads.
Another benefit of browser-based ML is that all the ML computations—in this case, image recognition—happen on the client side (i.e., within the user’s browser), whereas a conventional ML experience normally requires a lot of processing power on the server side. For Emoji Scavenger Hunt, the server only has to serve website assets like graphics and the actual HTML files during gameplay. This makes backend scaling relatively easy and cost-effective.
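To make the client-side idea concrete, here’s a hypothetical post-processing step: if a model outputs one score per label, picking the best matches is a few lines of plain JavaScript that run entirely in the browser, so no camera pixels ever need to leave the device. The function name and label set are invented for illustration.

```javascript
// Hypothetical client-side post-processing: given the model's raw scores
// (one per label), return the k highest-scoring labels. Everything here
// runs in the user's browser; the server never sees the image.
function topMatches(scores, labels, k = 3) {
  return scores
    .map((score, i) => ({ label: labels[i], score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

For example, topMatches([0.1, 0.7, 0.2], ['✋', '⌚', '👖'], 2) would rank ⌚ first, letting the game check whether the top match equals the emoji it asked you to find.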
Power of the web meets power of ML
Although most designers and developers today focus heavily on apps, the web is still an incredibly powerful medium. It’s cross-platform and works on all kinds of devices, from mobile and tablet to desktop, and across operating systems (Android, iOS, macOS, Windows, etc.), with just one URL. Unlike apps, there’s no download or install step, and no complex configuration. With the web, users are just one tap away from diving into your experience. And, of course, web-based content and experiences are really easy to share.
Today, people crave quick, fun experiences; combining the power of the web with ML allows for powerful new interactions utilizing a device’s own sensors.
Using a device’s camera for image recognition is just one example. With the Generic Sensor API, web developers can now access motion sensors such as the accelerometer and gyroscope, and other web APIs expose the microphone and geolocation. By combining device sensors with in-browser ML, you can imagine and design any number of new interactions.
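As one sketch of what that could look like, the snippet below wires accelerometer readings into a simple shake check. The Accelerometer class, its reading event, and the frequency option are part of the Generic Sensor API; isShake and its threshold are invented for illustration, and a real detector would look at changes over time rather than a single reading.

```javascript
// Illustrative shake check: acceleration magnitude beyond gravity (~9.8 m/s^2).
// The threshold is an arbitrary value chosen for this sketch.
function isShake(x, y, z, threshold = 25) {
  const magnitude = Math.sqrt(x * x + y * y + z * z);
  return magnitude - 9.8 > threshold;
}

// Generic Sensor API wiring (browser-only; guarded for environments
// where the API isn't available).
if (typeof Accelerometer !== 'undefined') {
  const sensor = new Accelerometer({ frequency: 30 });
  sensor.addEventListener('reading', () => {
    if (isShake(sensor.x, sensor.y, sensor.z)) {
      // A shake could trigger an ML-driven response, e.g. a new prediction.
      console.log('Shake detected');
    }
  });
  sensor.start();
}
```

The same pattern—sample a sensor, run a lightweight check or model on the reading, react in the page—applies to any sensor the browser exposes.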
ML and the future
We’re just starting to see the many ways ML can transform web development. There’s a whole world of potential applications just waiting to be explored, and I can’t wait to see all the new interactive experiences people will design. Working on a new project? Share it with us by using the #tensorflowjs hashtag, or submit your project to AI Experiments. And if you’re interested in the technical side of this project, all the code is available on GitHub.
This work was made possible through a collaboration between Brand Studio and the TensorFlow.js team at Google. I’d also like to thank Jacques Bruwer, Jason Kafalas, Shuhei Iitsuka, Cathy Cheng, Kyle Gray, Blake Davidoff, Kyle Conerty, Daniel Smilkov, Nikhil Thorat, Ping Yu, and Sarah Sirajuddin.
Takashi Kawashima is a designer and creative lead at Google Brand Studio. Prior to joining the team, he spent three years as an art director for the Google Data Arts Team, where he worked on Chrome Experiments.