Rereading the Fast Gradient Sign Method (FGSM)

2022-02-04  莫底凯

Sec. 3: THE LINEAR EXPLANATION OF ADVERSARIAL EXAMPLES

Consider the dot product between a weight vector w and an adversarial example \tilde{x}:

w^T\tilde{x} = w^T x + w^T\eta

The adversarial perturbation causes the activation to grow by w^T\eta. We can maximize this increase subject to the max norm constraint on \eta by assigning \eta=\text{sign}(w).

  1. Here, if \eta=\text{sign}(w), how can \Vert\eta\Vert_{\infty}<\epsilon be satisfied? Judging from the text that follows, it should be \eta=\epsilon \cdot \text{sign}(w).
    If w has n dimensions and the average magnitude of an element of the weight vector is m, then the activation will grow by \epsilon m n.
  2. The optimization objective here should be: \max_{\eta}{w^T\eta} \ \ \text{s.t.} \ \ \Vert\eta\Vert_{\infty}<\epsilon. Since \Vert\eta\Vert_{\infty}<\epsilon must hold, each element of \eta can be at most \epsilon in magnitude; at the same time, to maximize w^T\eta, the sign of \eta must agree with that of w. Therefore, \eta=\epsilon \cdot \text{sign}(w).
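The argument in point 2 can be checked numerically. A minimal NumPy sketch (all variable names are my own, not from the paper): within the budget \Vert\eta\Vert_{\infty}\le\epsilon, the perturbation \epsilon\cdot\text{sign}(w) attains the maximum \epsilon\Vert w\Vert_1 of w^T\eta, and no random perturbation inside the budget beats it.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(8)   # weight vector of the linear model
eps = 0.1                    # max-norm budget on the perturbation eta

# Claimed optimum under ||eta||_inf <= eps: match the sign of w.
eta_opt = eps * np.sign(w)
print(np.isclose(w @ eta_opt, eps * np.abs(w).sum()))  # True: equals eps * ||w||_1

# No random perturbation drawn inside the same budget does better.
best_random = max(
    w @ (eps * rng.uniform(-1.0, 1.0, size=w.shape)) for _ in range(1000)
)
print(w @ eta_opt >= best_random)  # True
```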

Note: what is a max norm constraint? Below is the answer from the CS231n course notes:
Max norm constraints. Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use projected gradient descent to enforce the constraint. In practice, this corresponds to performing the parameter update as normal, and then enforcing the constraint by clamping the weight vector \vec{w} of every neuron to satisfy \Vert\vec{w}\Vert_{2} < c. Typical values of c are on orders of 3 or 4. Some people report improvements when using this form of regularization. One of its appealing properties is that the network cannot “explode” even when the learning rates are set too high because the updates are always bounded.
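The clamping step described above can be sketched in a few lines of NumPy (the function name clamp_max_norm and the row-per-neuron layout are my own assumptions, not CS231n code):

```python
import numpy as np

def clamp_max_norm(W, c):
    """Project each neuron's weight vector (a row of W) back onto
    the L2 ball of radius c; rows already within the budget are untouched."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[3.0, 4.0],    # norm 5.0, exceeds c -> rescaled down to norm c
              [0.3, 0.4]])   # norm 0.5, within budget -> unchanged
W_clamped = clamp_max_norm(W, c=3.0)
print(np.linalg.norm(W_clamped, axis=1))  # [3.0, 0.5]
```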

Sec 4: LINEAR PERTURBATION OF NON-LINEAR MODELS

Let \theta be the parameters of a model, x the input to the model, y the targets associated with x (for machine learning tasks that have targets) and J(\theta; x; y) be the cost used to train the neural network. We can linearize the cost function around the current value of \theta, obtaining an optimal max-norm constrained perturbation of
\eta=\epsilon \cdot \text{sign} \left( \nabla_{x} J(\theta; x; y) \right)

A few questions arise here:

  1. Why differentiate with respect to x? Because x is what we want to perturb.
  2. In Sec. 3 we took \text{sign}(w); why take \text{sign}(\nabla_{x} J(\theta; x; y)) here? A simple way to see it: for a linear classifier, \nabla_{x} J(\theta; x; y) works out to (a multiple of) the parameter vector w.
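Both points can be illustrated with logistic regression, where \nabla_{x} J is available in closed form as (p-y)\,w. A minimal NumPy sketch (variable names are my own; logistic regression stands in for the generic "linear classifier"):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.standard_normal(5)   # linear classifier weights
x = rng.standard_normal(5)   # the input we perturb
y = 0.0                      # true label
eps = 0.25

# Cross-entropy loss J(w; x, y) has gradient w.r.t. x: (p - y) * w
p = sigmoid(w @ x)
grad_x = (p - y) * w

# FGSM perturbation: differentiate w.r.t. x, since x is what we perturb.
eta = eps * np.sign(grad_x)
x_adv = x + eta

# For a linear classifier, sign(grad_x) = sign(w) up to the sign of (p - y),
# which matches the Sec. 3 analysis. Here p - y = p > 0, so they agree exactly.
print(np.array_equal(np.sign(grad_x), np.sign(w)))  # True

# The loss increases after the perturbation (y = 0 branch of cross-entropy).
loss = lambda q: -np.log(1.0 - q)
p_adv = sigmoid(w @ x_adv)
print(loss(p_adv) > loss(p))  # True
```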