Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

2019-08-03  winddy_akoky

1. Introduction

Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016a.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb. Accepted as poster.
Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security & Privacy, 2017c.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pp. 506–519, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4944-4. doi: 10.1145/3052973.3053009. URL http://doi.acm.org/10.1145/3052973.3053009.

  1. Shattered gradients: nonexistent or incorrect gradients, caused either intentionally through non-differentiable operations or unintentionally through numerical instability (see the sketch after this list).
  2. Stochastic gradients: gradients that depend on test-time randomness.
  3. Vanishing/exploding gradients: in very deep computations, vanishing or exploding gradients leave the attacker with an unusable gradient.
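
As an illustration of the first case (my own sketch, not code from the paper), a hypothetical preprocessing defense that quantizes pixel values is piecewise constant, so naive backpropagation through it returns zero gradients everywhere:

```python
import torch

def quantize(x, levels=8):
    # Round each pixel to one of `levels` values: piecewise-constant,
    # so the derivative is 0 almost everywhere (undefined at the steps).
    return torch.round(x * (levels - 1)) / (levels - 1)

x = torch.rand(1, 3, 32, 32, requires_grad=True)
quantize(x).sum().backward()
print(x.grad.abs().max())  # tensor(0.) -- the gradient is "shattered"
```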

Athalye, A., Engstrom, L., Ilyas, A., and Kwok, K. Syn- thesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.

2. Preliminaries

Notation

Adversarial Examples

Given an image x and a classifier f(\cdot), an adversarial example x^{\prime} satisfies two properties: the distance \mathcal{D}\left(x, x^{\prime}\right) is small under some quantitative metric, and c\left(x^{\prime}\right) \neq c^{*}(x), i.e. the label c\left(x^{\prime}\right) assigned by the classifier differs from the true label c^{*}(x) of the original image.

Datasets and Models

Threat Model

Attack Methods
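
The paper evaluates defenses with iterative optimization attacks. A minimal l∞ projected gradient descent (PGD) sketch, assuming a PyTorch classifier `model`, cross-entropy loss, and inputs in [0, 1] (the hyperparameter values here are illustrative, not the paper's):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, label, eps=8/255, alpha=2/255, steps=40):
    # l_inf PGD: repeatedly step in the sign of the loss gradient,
    # then project back into the eps-ball around the original image.
    x = x.clone().detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # stay in the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # stay a valid image
    return x_adv
```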

3. Obfuscated Gradients

Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. Ensemble adversarial training: Attacks and defenses. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkZvSe-RZ. Accepted as poster.

Shattered Gradients

Stochastic Gradients

Exploding & Vanishing Gradients

3.1 Identifying Obfuscated and Masked Gradients

4. Attack Techniques

4.1 Backward Pass Differentiable Approximation

4.1.1 A Special Case: The Straight-Through Estimator
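
For a non-differentiable layer q(x) that is approximately the identity (e.g. quantization), the straight-through estimator uses q(x) on the forward pass but lets gradients flow through as if q were the identity. A minimal sketch (my own, assuming PyTorch) using the standard detach trick:

```python
import torch

def quantize(x, levels=8):
    # Non-differentiable preprocessing: gradient is 0 almost everywhere.
    return torch.round(x * (levels - 1)) / (levels - 1)

def quantize_ste(x):
    # Forward value is quantize(x); backward gradient is that of x
    # (identity), because the detached difference contributes no gradient.
    return x + (quantize(x) - x).detach()

x = torch.rand(4, requires_grad=True)
quantize_ste(x).sum().backward()
print(x.grad)  # all ones: gradients pass "straight through"
```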

4.1.2 The Generalized Attack: BPDA
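
More generally, suppose the defended model is f(g(x)) where the preprocessor g is non-differentiable but admits a differentiable approximation g_approx (the common choice in the paper is g(x) ≈ x). BPDA runs g on the forward pass and takes gradients through g_approx on the backward pass. A sketch with hypothetical g and g_approx, assuming PyTorch:

```python
import torch

def bpda_wrap(x, g, g_approx):
    # Forward: the exact (possibly non-differentiable) preprocessor g.
    # Backward: the gradient of the differentiable approximation g_approx.
    return g_approx(x) + (g(x) - g_approx(x)).detach()

# Example with the common choice g_approx(x) = x (identity):
g = lambda x: torch.round(x * 7) / 7   # non-differentiable defense layer
g_approx = lambda x: x                 # differentiable stand-in
x = torch.rand(4, requires_grad=True)
bpda_wrap(x, g, g_approx).sum().backward()
print(x.grad)  # all ones: usable gradients despite the shattered g
```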

4.2 Attacking Randomized Classifiers
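
For a defense that applies a random transformation t ~ T at test time, the paper uses Expectation over Transformation (EOT, Athalye et al., 2017): optimize the expected loss E_t[loss(f(t(x)))], estimated by averaging over sampled transformations. A minimal sketch, assuming a PyTorch `model` and a hypothetical `sample_transform()` that returns a differentiable random transform:

```python
import torch
import torch.nn.functional as F

def eot_gradient(model, x, label, sample_transform, n_samples=30):
    # Monte Carlo estimate of grad E_t[loss(f(t(x)))]: average the loss
    # over sampled transformations, then backpropagate once.
    x = x.clone().detach().requires_grad_(True)
    loss = 0.0
    for _ in range(n_samples):
        t = sample_transform()  # e.g. random crop/noise, differentiable in x
        loss = loss + F.cross_entropy(model(t(x)), label)
    (loss / n_samples).backward()
    return x.grad
```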

4.3 Reparameterization
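
Against defenses that project inputs onto a learned manifold (e.g. via a generator), the paper reparameterizes the search: instead of optimizing x' directly, optimize a latent z and set x' = h(z), where h is differentiable and its outputs already satisfy the defense's projection. A hedged sketch with a hypothetical pretrained decoder `h` and PyTorch `model` (untargeted, maximizing loss on the true label):

```python
import torch
import torch.nn.functional as F

def reparam_attack(model, h, label, z_dim=128, steps=200, lr=0.05):
    # Optimize in latent space: every candidate h(z) lies on the
    # generator's manifold, so the defense's projection is a no-op.
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -F.cross_entropy(model(h(z)), label)
        loss.backward()
        opt.step()
    return h(z).detach()
```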

5. Case Study: ICLR 2018 Defenses

Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Bys4ob-Rb.

Sinha, A., Namkoong, H., and Duchi, J. Certifiable distributional robustness with principled adversarial training. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Hk6kPgZA-.

Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Schoenebeck, G., Houle, M. E., Song, D., and Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=B1gJ1L2aW. Accepted as oral presentation.
