Listening to Sounds of Silence for Audio Replay Attack Detection

Published in International Conference on Signal Processing and Intelligent Systems (ICSPIS), 2021

Authors

Mohammad Hajipour, Mohammad Ali Akhaee, Ramin Toosi

Abstract

Automatic speaker verification (ASV) is a biometric authentication technique that identifies a person from the voice presented to the system. As these systems have become widespread, they have become the target of various attacks. These attacks take four forms: impersonation, speech synthesis, voice conversion, and replay. Because of its simplicity, the replay attack is one of the most commonly used. The purpose of this study is to provide a countermeasure against replay attacks. We found that the noise introduced into spoofed samples by recording and playback devices can be used as a criterion for attack detection. This study therefore analyzes the silent parts of the speech signal, which contain the noise of the various recording and playback devices. Because deep convolutional neural networks perform well in classification tasks, we propose an ensemble classifier based on end-to-end neural network architectures and residual structures to accurately distinguish spoofed utterances from genuine ones. On the ASVspoof2019 database, we reduce the tandem detection cost function (t-DCF) metric by almost 16% compared to similar models that process the full speech signal.
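The idea of keeping only the silent parts of an utterance can be illustrated with a minimal sketch. The abstract does not say how silence is detected, so the energy-threshold rule below is an assumption chosen for illustration, not the authors' method. The function name `extract_silence`, the 400-sample frame length (25 ms at an assumed 16 kHz sampling rate), and the -40 dB threshold are all hypothetical.

```python
import numpy as np


def extract_silence(signal, frame_len=400, threshold_db=-40.0):
    """Return the concatenated low-energy frames of a 1-D signal.

    Hypothetical helper: a frame counts as "silent" when its mean energy
    is at least |threshold_db| dB below the loudest frame. This rule is
    an assumption for illustration, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = signal.size // frame_len
    if n_frames == 0:
        return np.empty(0, dtype=np.float64)

    # Split into non-overlapping frames, dropping any trailing remainder.
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Mean energy per frame, compared against the loudest frame.
    energy = np.mean(frames ** 2, axis=1)
    limit = energy.max() * 10.0 ** (threshold_db / 10.0)
    silent = energy <= limit

    # Keep only the silent frames and flatten them back to 1-D.
    return frames[silent].reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000  # assumed sampling rate
    noise = 1e-4 * rng.standard_normal(sr // 2)  # stand-in for device noise
    tone = np.sin(2 * np.pi * 200 * np.arange(sr // 2) / sr)  # stand-in for speech
    silence = extract_silence(np.concatenate([noise, tone]))
    print(silence.size)
```

In a pipeline of the kind the abstract describes, the returned samples, rather than the full utterance, would be passed to the classifier. A practical system would likely use a more robust voice activity detector than a fixed threshold.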