
Your Voice is My Passport

Conference:  BlackHat USA 2018

2018-08-09

Summary

The presentation discusses the process of impersonating a target's voice using machine learning and the broader context of machine learning for offense in cybersecurity.
  • Impersonating a target's voice requires scraping data from a publicly available source, selecting high-quality samples, transcribing and chunking the audio, and using data augmentation techniques such as shifting pitch.
  • The quality and quantity of data are important, but scraping data from a public source limits the amount of high-quality data available.
  • Data augmentation techniques can multiply the size of the training set and reduce the amount of manual transcription required, but they also risk overfitting.
  • The presentation also discusses the broader context of machine learning for offense in cybersecurity, including adversarial attacks, poisoning the well, and attacks using machine learning systems.
  • As an example of adversarial attacks, the speaker describes a stop sign that has been intelligently perturbed so that a self-driving system's classifier misreads it as a yield sign, highlighting the real-world security implications of such attacks.
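The chunking and pitch-shift augmentation steps described above can be sketched in a few lines. This is a minimal illustration, not the presenters' actual pipeline: the function names, the 5-second chunk length, and the naive resampling-based pitch shift are all assumptions for the sake of example (production augmentation would use a phase vocoder, e.g. librosa's `pitch_shift`, to preserve duration).

```python
import numpy as np

def chunk_audio(samples: np.ndarray, sample_rate: int,
                chunk_seconds: float = 5.0) -> list:
    """Split a mono waveform into fixed-length chunks, dropping any
    short remainder. Real pipelines often chunk on silence boundaries
    instead of fixed lengths."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len]
            for i in range(n_chunks)]

def pitch_shift_naive(samples: np.ndarray, semitones: float) -> np.ndarray:
    """Crude pitch shift by linear-interpolation resampling.

    Note: this also changes the clip's duration; it is only meant to
    show how one augmented copy of a training sample can be derived
    from an original recording."""
    rate = 2.0 ** (semitones / 12.0)          # frequency ratio per semitone
    idx = np.arange(0, len(samples), rate)    # resample positions
    return np.interp(idx, np.arange(len(samples)), samples)
```

Each original clip can then yield several augmented variants (e.g. shifts of ±1 and ±2 semitones), multiplying the training set without additional transcription work.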

Abstract

Financial institutions, home automation products, and hi-tech offices have increasingly used voice fingerprinting as a method for authentication. Recent advances in machine learning have shown that text-to-speech systems can generate synthetic, high-quality audio of subjects using audio recordings of their speech. Are current techniques for audio generation enough to spoof voice authentication algorithms? We demonstrate, using freely available machine learning models and limited budget, that standard speaker recognition and voice authentication systems are indeed fooled by targeted text-to-speech attacks. We further show a method which reduces data required to perform such an attack, demonstrating that more people are at risk for voice impersonation than previously thought.

