Tom Dörr

Towards Resistant Audio Adversarial Examples

Tom Dörr^*, Karla Markert^*, Nicolas M. Müller, Konstantin Böttinger

Fraunhofer AISEC

tom.doerr@tum.de; karla.markert@aisec.fraunhofer.de; nicolas.mueller@aisec.fraunhofer.de; konstantin.boettinger@aisec.fraunhofer.de

^*Both authors contributed equally to this research.

Adversarial examples tremendously threaten the availability and integrity of machine learning-based systems. While the feasibility of such attacks was first observed in the domain of image processing, recent research shows that speech recognition is also susceptible to adversarial attacks. However, reliably bridging the air gap (i.e., making the adversarial examples work when recorded via a microphone) has so far eluded researchers. We find that flaws in the generation process lead state-of-the-art adversarial example generation methods to overfit because of the binning operation in the target speech recognition system (e.g., Mozilla DeepSpeech). We devise an approach to mitigate this flaw and find that our method improves the generation of adversarial examples with varying offsets. We confirm the significant improvement achieved by our approach through an empirical comparison of the edit distance in a realistic over-the-air setting. Our approach constitutes a significant step towards over-the-air attacks. We publish the code and an applicable implementation of our approach.
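To illustrate why an offset interacts with the binning operation, the following minimal sketch cuts a signal into fixed-length bins and shows that prepending a single sample changes the content of every bin. The helper names `add_offset` and `split_into_bins`, the zero padding, and the toy bin length and step are assumptions made for illustration only; they are not taken from the paper's code or from the target system.

```python
from typing import List


def add_offset(samples: List[float], offset: int) -> List[float]:
    """Prepend `offset` zero samples, shifting the signal later in time.

    Zero padding is an assumption; any other filler would shift the bins too.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return [0.0] * offset + list(samples)


def split_into_bins(samples: List[float], bin_len: int, bin_step: int) -> List[List[float]]:
    """Cut the signal into fixed-length, possibly overlapping bins."""
    bins = []
    start = 0
    while start + bin_len <= len(samples):
        bins.append(samples[start:start + bin_len])
        start += bin_step
    return bins


# Toy signal and toy bin sizes (hypothetical values, chosen for readability).
signal = [float(i) for i in range(12)]
original_bins = split_into_bins(signal, bin_len=4, bin_step=2)
shifted_bins = split_into_bins(add_offset(signal, 1), bin_len=4, bin_step=2)

print(original_bins[0])  # [0.0, 1.0, 2.0, 3.0]
print(shifted_bins[0])   # [0.0, 0.0, 1.0, 2.0]
```

Because each bin of the shifted signal holds different samples, a perturbation tuned to one fixed alignment may no longer line up with the bins the recognizer actually sees.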

DOI: 10.1145/3385003.3410921

Examples

Without Offset Training

**Figure 1: The plot shows the edit distance between prediction and targeted adversarial label for settings 1 to 4 from Table 1 in the paper** (y-axis: edit distance; x-axis: added offset in samples).

With Offset Training

**Figure 2: Offset analysis for four adversarial audio files** (same original-target combination as in Figure 1) **that were generated with the offset training** (y-axis: edit distance; x-axis: added offset in samples). Axis limits are chosen as in Figure 1 to improve comparability.
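The edit distance on the y-axis of both figures can be computed along the following lines. This is a minimal sketch of the standard Levenshtein distance, assuming a character-level comparison between the transcription and the target label; the function name `edit_distance` and the example strings are hypothetical, and the paper's evaluation code may differ.

```python
def edit_distance(prediction: str, target: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `prediction` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `prediction` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, p_char in enumerate(prediction, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution_cost = 0 if p_char == t_char else 1
            current.append(min(
                previous[j] + 1,                      # delete p_char
                current[j - 1] + 1,                   # insert t_char
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Hypothetical transcription and target label.
print(edit_distance("open the dor", "open the door"))  # 1
```

An edit distance of 0 means the recognizer transcribed the adversarial example exactly as the targeted label; larger values indicate that the attack degraded at that offset.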