In addition to working towards the principles above, we wrestled with a few key decisions while prototyping.
Training versus inference
One big question was if “training” and “inference” should be separate interaction modes. Traditionally, the training of a machine learning model is done by teaching it with a large dataset of examples. You might train the model on a ton of pictures of oranges (as shown before) and switch to inference mode—asking the model to “infer,” or make a guess, about a new picture. You could show it a lemon, and see how confident it is that it’s similar to an orange.
Early in the design process, we considered displaying these two modes in the interface. The model would start in “training” mode and learn from the inputs before users switched it to “inference” mode to test the model.
While this clearly communicated what training and inference mean when creating a model, it took longer to get a model up and running. Plus, the words “training” and “inference”—while useful to know if you work in the field—aren’t crucial for someone just getting a feel for machine learning. The final version of Teachable Machine is always in “inference” mode unless it’s being actively trained. You can hold the button to train for a second, release it to see how the model is working, and then start training it again. This quick feedback loop, and the ability to get a model up and running fast, were higher priorities than explicitly delineating the two modes in the interface.
Thresholds versus gradients
We also debated about how to demonstrate the model’s effect on the output. Should an output be triggered by the confidence level reaching a certain threshold (say, play a GIF when the confidence of a class reaches 90%?) Or does it interpolate between different states based on the confidence of each class (e.g. shifting between red and blue so that if the model is 50% confident in both classes, it shows purple?) One could imagine an interface that interpolates between two different sounds based on how high your hand is, or one that just switches between the two sounds when your hand crosses the confidence barrier.
The outputs we found to be familiar and easy to get started with were discrete pieces of media—GIFs, sound bytes, and lines of text. Because switching between those pieces of media is clearer than blending between them, we ended up switching the outputs discretely. When the model switches which class has the highest confidence level, we switch to the output associated with that class.
Neutral class versus no neutral class
For many of the outputs we originally designed, having a “neutral” or “do-nothing” class made it easier to create games. For example, if you were playing Pong, you might want one class to make the paddle move up, one class to make it move down, and a “neutral” state where it does nothing. We considered having a dedicated “neutral” state in our interface that always acted as a “don’t do anything” catch-all class.
Having a dedicated neutral state might help you build a mental model for how to control a specific game, but it’s not quite an honest representation of how the model works. The model doesn’t care which class is neutral—from its perspective, “neutral” is a class just like any other, just with a different name and design on the surface. We didn’t want people thinking that a “neutral” state was intrinsically part of training an ML model. To that end, we eliminated some outputs that were frustrating or confusing without a neutral state. Our final outputs, gifs, sound, and text, work just by switching the output when you switch classes—no need for a neutral state. Just an honest one.