The presentation discusses the use of text-to-speech technology to bypass voice authentication systems and the implications of this technology in cybersecurity.
- Text-to-speech technology can be used to bypass voice authentication systems by generating audio that sounds like the target's voice.
- Voice models are trained on a single person's voice and require at least 24 hours of high-quality audio data.
- Open-source datasets such as LJ Speech and Blizzard can be used for training voice models.
- The technology has implications in cybersecurity as it can be used for impersonation and social engineering attacks.
- The presentation uses a scene from the movie Sneakers to illustrate how social engineering can be used to obtain voice data.
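The 24-hour data requirement in the list above is easy to check before any training starts. The following is a minimal sketch, assuming the corpus is a directory of uncompressed PCM WAV files; the function names and the `REQUIRED_HOURS` constant are illustrative and do not come from the presentation.

```python
import wave
from pathlib import Path

# Threshold quoted in the summary; treat it as a rule of thumb, not a hard limit.
REQUIRED_HOURS = 24.0


def wav_duration_seconds(path: Path) -> float:
    """Return the playing time of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def corpus_hours(corpus_dir: Path) -> float:
    """Sum the duration of every .wav file under corpus_dir, in hours."""
    total_seconds = sum(wav_duration_seconds(p) for p in corpus_dir.rglob("*.wav"))
    return total_seconds / 3600.0


def has_enough_audio(corpus_dir: Path, required_hours: float = REQUIRED_HOURS) -> bool:
    """Report whether the corpus meets the assumed minimum amount of audio."""
    return corpus_hours(corpus_dir) >= required_hours
```

Calling `has_enough_audio(Path("my_corpus"))` on a hypothetical `my_corpus` directory would report whether the recordings reach the 24-hour mark. Duration alone says nothing about audio quality, which the summary lists as a separate requirement.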
The presentation recounts a scene from the movie Sneakers in which the heroes bypass a voice authentication system by social engineering their target into saying the specific words of the passphrase. This is difficult to do in practice, as the people you want to impersonate are busy and may not want to sit down with you. In addition, authentication prompts can change randomly, which makes it hard to collect the exact words in advance. Text-to-speech technology sidesteps this problem: it can generate audio of arbitrary phrases in the target's voice, so the attacker no longer needs a recording of the target saying the passphrase itself.
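To make the "randomly changing prompt" idea concrete, here is a minimal sketch of how a voice authentication system might issue a fresh challenge phrase on each attempt. The word list and the `make_challenge` function are hypothetical and only illustrate the concept; the presentation does not describe any particular implementation.

```python
import secrets

# Hypothetical vocabulary; a real system would presumably use a far larger one.
WORDS = ["amber", "falcon", "river", "seven", "marble", "orbit", "lantern", "copper"]


def make_challenge(num_words: int = 4) -> str:
    """Pick a fresh random phrase for the caller to speak on this attempt."""
    return " ".join(secrets.choice(WORDS) for _ in range(num_words))
```

A randomized prompt like this defeats a replayed recording of a fixed passphrase, but, as the summary points out, it offers little protection against a voice model that can synthesize any text in the target's voice.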