Interpreting and translating live speech is much trickier than processing written text. Unlike the human brain, a machine typically has to go through three separate phases to convert speech from one language to another. First, the speech is recognized and transcribed into text; that text is then translated into the target language; finally, the translation is fed into a text-to-speech engine to be spoken aloud. Although this cascaded process is transparent to the user and relatively fast, Google is working on a more direct speech-to-speech method it calls Translatotron, which translates speech without an intermediate text representation.
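
To make the three-phase cascade concrete, here is a minimal sketch in Python. It is only an illustration of how the stages chain together, not how Google or any real product implements them: the function names (`cascade`, `toy_recognize`, `toy_translate`, `toy_synthesize`) and the tiny lexicon are hypothetical stand-ins, and the "audio" is simply text encoded as bytes.

```python
from typing import Callable

# Hypothetical stage types: a real system would wrap trained models here.
Recognizer = Callable[[bytes], str]    # speech audio -> source-language text
Translator = Callable[[str], str]      # source-language text -> target-language text
Synthesizer = Callable[[str], bytes]   # target-language text -> speech audio


def cascade(audio: bytes,
            recognize: Recognizer,
            translate: Translator,
            synthesize: Synthesizer) -> bytes:
    """Run the three phases in order: recognition, translation, synthesis."""
    source_text = recognize(audio)
    target_text = translate(source_text)
    return synthesize(target_text)


# Toy stand-ins (assumptions for illustration only).
TOY_LEXICON = {"hello": "hola", "world": "mundo"}


def toy_recognize(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    # Pretend that encoding the text produces audio.
    return text.encode("utf-8")


spoken = cascade(b"Hello world", toy_recognize, toy_translate, toy_synthesize)
print(spoken)
```

Each stage only sees the output of the previous one, so any error made during recognition carries through to the translation and the synthesized speech. A direct speech-to-speech approach would replace all three stages with a single model.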

