
Human or Not: Can You Really Detect the Fake Voices?

Conference:  Black Hat USA 2022

2022-08-11

Summary

The presentation introduces the concept of speaker-irrelative features (SiF), which can be used to bypass existing fake-voice detection approaches. The proposed method merges these extracted features with AI-synthesized speech to build a voice clone attack framework called SiF-DeepVC, which evades existing detectors.
  • Existing voice detection approaches have problems and are evaluated only on in-lab datasets
  • Speaker-irrelative features can be used to bypass existing detection approaches
  • The method merges the extracted features with AI-synthesized voices to build the SiF-DeepVC framework
  • The resulting voices bypass existing detection approaches
  • Speaker evaluation remains important when applying SiF-DeepVC
  • The steps to make Elon Musk appear to really speak are successfully demonstrated
The presentation includes a demo applying SiF-DeepVC to a recording of Elon Musk. The original recording is played, followed by a manipulated recording in which Elon Musk appears to offer everyone 10 million dollars. The demo illustrates how the proposed method bypasses existing voice detection approaches.
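The core idea of bypassing detectors with speaker-irrelative features can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes the attack amounts to mixing a speaker-irrelative signal (here, generic background noise) into an AI-synthesized waveform at a chosen signal-to-noise ratio, so that detectors trained on clean in-lab audio see a distribution they were never trained on. The function name and SNR parameter are hypothetical.

```python
import numpy as np

def embed_speaker_irrelative_features(synth, noise, snr_db=20.0):
    """Mix a speaker-irrelative signal (e.g. background noise) into an
    AI-synthesized waveform at a target SNR. Illustrative sketch only."""
    noise = np.resize(noise, synth.shape)        # loop/trim noise to match length
    sig_pow = np.mean(synth ** 2)
    noise_pow = np.mean(noise ** 2)
    # Scale the noise so the mix hits the requested signal-to-noise ratio.
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return synth + scale * noise

# Toy example: one second of stand-in "synthesized speech" plus white noise.
sr = 16000
synth = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
noise = np.random.default_rng(0).normal(size=sr).astype(np.float32)
adversarial = embed_speaker_irrelative_features(synth, noise, snr_db=20.0)
```

At a 20 dB SNR the added noise is barely audible to a human listener, yet it shifts the low-level statistics that lab-trained detectors rely on, which is the asymmetry the talk exploits.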

Abstract

Voice is an essential medium for humans to transfer information and build trust, and the trustworthiness of voice is of great importance to humans. With the development of deep learning technologies, attackers have started to use AI techniques to synthesize and even clone human voices. To combat the misuse of such techniques, researchers have proposed a series of AI-synthesized speech detection approaches and achieved very promising detection results in laboratory environments. Can these approaches really be as effective in the real world as they claim to be? This study provides an in-depth analysis of these works, identifies a set of potential problems, and designs a novel voice clone attack framework, SiF-DeepVC, based on these problems. This study first proposes the idea "bypass fake voice detection using speaker-irrelative features" and proves that detecting AI-synthesized speeches is still highly challenging, and existing approaches are not applicable in the real world. In a word, the Red is still far ahead of the Blue.

