CVNN

2022-03-16  咚咚董dyh

Complex-Valued Neural Network

Some complex-number concepts:

The value and role of these concepts:

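Complex derivative

Consider a complex function f(z=x+yj) = F(x,y) = u(x,y) + v(x,y)j, where x=Re(z) and y=Im(z) are real, and u, v are real-valued functions giving the real and imaginary parts of f. u, v are to f what x, y are to z. The complex derivative is defined by the limit: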

f'(z) = \lim_{Δz \to 0, Δz \in C} \frac{f(z+Δz) - f(z)}{Δz}

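Cauchy–Riemann equations

The core idea of the Cauchy–Riemann (CR) equations is that, in the limit definition of the derivative, approaching along the real axis (real part Δx \to 0) or along the imaginary axis (imaginary part Δyj \to 0j) must yield the same derivative for f to be complex differentiable.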

f'(z) = \lim_{Δx \to 0, Δx \in R} \frac{f(z+Δx) - f(z)}{Δx} = \frac{\partial F}{\partial x}(z) \\ f'(z) = \lim_{Δy \to 0, Δyj \in Rj} \frac{f(z+Δyj) - f(z)}{Δyj} = -j * \frac{\partial F}{\partial y}(z) \\ \frac{\partial F}{\partial x}(z) = -j * \frac{\partial F}{\partial y}(z) \\ \frac{\partial F}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}j \\ -j * \frac{\partial F}{\partial y} = -j*(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}j) = \frac{\partial v}{\partial y} - \frac{\partial u}{\partial y}j
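The two directional limits above can be checked numerically. The following is a minimal sketch, assuming a small finite step h in place of the limit; the helper name `quotient` and the test point `z0` are illustrative choices, not part of the original text.

```python
def quotient(f, z, dz):
    """Difference quotient (f(z + dz) - f(z)) / dz for a small complex step dz."""
    return (f(z + dz) - f(z)) / dz


def square(z):
    return z * z


def conj(z):
    return z.conjugate()


h = 1e-6           # finite step standing in for the limit (assumption)
z0 = 1.0 + 2.0j    # arbitrary test point

# Approach along the real axis (dz = h) and along the imaginary axis (dz = h * 1j).
square_real = quotient(square, z0, h)
square_imag = quotient(square, z0, h * 1j)
conj_real = quotient(conj, z0, h)
conj_imag = quotient(conj, z0, h * 1j)

print(square_real, square_imag)  # both close to 2 * z0
print(conj_real, conj_imag)      # close to 1 and -1: the two directions disagree
```

For z^2 both directions agree, so the complex derivative exists at z0; for the conjugate they disagree, so it does not.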

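Assume that at the point z = x + yj, u and v are real-differentiable (not necessarily continuously differentiable), so that their partial derivatives exist (this is the precondition for all the conclusions that follow). Then f (equivalently F) is complex differentiable at z if and only if the partial derivatives of u and v satisfy the CR equations below.

Real form: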

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \\ \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Complex form:

j*\frac{\partial F}{\partial x} = \frac{\partial F}{\partial y}

Combining the complex form with the Wirtinger derivative \frac{\partial f}{\partial z^*} = 1/2 * (\frac{\partial F}{\partial x} + 1j * \frac{\partial F}{\partial y}) gives the following form, i.e. f is independent of the variable z^*=x-yj (the conjugate of z):

\frac{\partial f}{\partial z^*} = 0
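The real form of the CR equations can be tested numerically. The sketch below assumes central finite differences for the partial derivatives; the helper names `partials` and `cr_residuals` and the sample point are hypothetical. It compares the holomorphic function exp(z) with the non-holomorphic function Re(z).

```python
import cmath


def partials(f, x, y, h=1e-6):
    """Central-difference estimates of dF/dx and dF/dy, where F(x, y) = f(x + yj)."""
    dF_dx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    dF_dy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return dF_dx, dF_dy


def cr_residuals(f, x, y):
    """Return (u_x - v_y, u_y + v_x); both are ~0 where the CR equations hold."""
    dF_dx, dF_dy = partials(f, x, y)
    u_x, v_x = dF_dx.real, dF_dx.imag
    u_y, v_y = dF_dy.real, dF_dy.imag
    return u_x - v_y, u_y + v_x


# exp(z) is holomorphic: both residuals should be near zero.
exp_res = cr_residuals(cmath.exp, 0.3, -0.7)

# Re(z) = x + 0j is not holomorphic: u_x = 1 while v_y = 0.
re_res = cr_residuals(lambda z: complex(z.real, 0.0), 0.3, -0.7)

print(exp_res, re_res)
```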

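Holomorphic functions

If the complex function f(z=x+yj) = F(x,y) = u(x,y) + v(x,y)j is complex differentiable (satisfies the CR equations) everywhere on its domain (a connected open subset of \mathbb{C}), then f (equivalently F) is holomorphic.

If a complex function f(z) depends on z^*, then f(z) cannot be holomorphic. For example, for the non-constant real-valued function f(z) = \frac{z+z^*}{2} = x+0j we have v=0, so \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 0, while \frac{\partial u}{\partial x} = 1, so the CR equations are not satisfied.

Several equivalent statements: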

Wirtinger Calculus

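Any f(z), z = x + yj (not necessarily holomorphic) can be rewritten as G(z,z^*) (note that these are two different functions, but they are equivalent). The substitution is: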

\begin{aligned} x &= \frac {z + z^*}{2} \\ y &= \frac {z - z^*}{2j} \end{aligned}

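If one of z, z^* is treated as a constant, G(z,z^*) becomes formally holomorphic in the other, so the partial derivative \frac{\partial G}{\partial z} exists. "Formally", because when one of z, z^* is held constant the other cannot actually vary. "Holomorphic", because with one of them held constant G(z,z^*) formally becomes a complex-valued function of a single complex variable. Another way to read it is that \frac{\partial G}{\partial z} only needs "\mathbb{R}-differentiability", which amounts to switching from the real coordinates (x,y) to the complex coordinates (z,z^*). For example: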

f(z) = Re(z) = F(x,y) = x = \frac{z+z^*}{2} = G(z,z^*) \\ f(z) = |z|^2 = F(x,y) = x^2 + y^2 = zz^* = G(z,z^*)

Applying the chain rule with respect to x and y separately gives \frac{\partial F}{\partial x} and \frac{\partial F}{\partial y}:
\begin{aligned} \frac{\partial F}{\partial x} &= \frac{\partial G}{\partial z}\frac{\partial z}{\partial x} + \frac{\partial G}{\partial z^*}\frac{\partial z^*}{\partial x} \\ &= \frac{\partial G}{\partial z} + \frac{\partial G}{\partial z^*} \\ \\ \frac{\partial F}{\partial y} &= \frac{\partial G}{\partial z}\frac{\partial z}{\partial y} + \frac{\partial G}{\partial z^*}\frac{\partial z^*}{\partial y} \\ &= 1j * (\frac{\partial G}{\partial z} - \frac{\partial G}{\partial z^*}) \end{aligned}

Solving the two equations above gives the Wirtinger derivatives (Wirtinger Calculus):

\begin{aligned} \frac{\partial G}{\partial z} &= 1/2 * (\frac{\partial F}{\partial x} - 1j * \frac{\partial F}{\partial y}) \\ \frac{\partial G}{\partial z^*} &= 1/2 * (\frac{\partial F}{\partial x} + 1j * \frac{\partial F}{\partial y}) \end{aligned}
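The two Wirtinger derivatives can be estimated directly from their definition. The following sketch assumes central finite differences for \frac{\partial F}{\partial x} and \frac{\partial F}{\partial y}; the helper name `wirtinger` and the test point are illustrative. It uses the two non-holomorphic examples given earlier, |z|^2 and Re(z).

```python
def wirtinger(f, z, h=1e-6):
    """Numerically estimate (df/dz, df/dz*) from central differences in x and y."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)
    dF_dy = (f(z + h * 1j) - f(z - h * 1j)) / (2 * h)
    d_dz = 0.5 * (dF_dx - 1j * dF_dy)
    d_dzconj = 0.5 * (dF_dx + 1j * dF_dy)
    return d_dz, d_dzconj


z0 = 1.5 - 0.5j  # arbitrary test point

# f(z) = |z|^2 = z z*: expect df/dz = z* and df/dz* = z.
abs2_dz, abs2_dzc = wirtinger(lambda z: complex(abs(z) ** 2), z0)

# f(z) = Re(z) = (z + z*)/2: expect both derivatives to equal 1/2.
re_dz, re_dzc = wirtinger(lambda z: complex(z.real), z0)

print(abs2_dz, abs2_dzc)
print(re_dz, re_dzc)
```

Neither function has a complex derivative in the limit sense, yet both Wirtinger derivatives are well defined because only the real partial derivatives are needed.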

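The Wirtinger derivatives can also be obtained via the chain rule \frac{\partial G}{\partial x}\frac{\partial x}{\partial z}... Note when each of these derivatives exists (existence means differentiable):

The Wirtinger derivatives give the relations below, which show that z and z^* can be treated as independent variables: when differentiating with respect to one, the other is regarded as a constant.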

\begin{aligned} \frac{\partial z^*}{\partial z} &= 1/2 * (1 - 1j * (-1j)) = 0 \\ \frac{\partial z}{\partial z^*} &= 1/2 * (1 + 1j * 1j) = 0 \end{aligned}

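\frac{\partial G}{\partial z^*} has the form of gradient that a CVNN needs. Dropping the factor 1/2 (absorbing it into the learning rate), it can be used to update complex-valued weights.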
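As an illustration of such an update, the sketch below runs plain gradient descent on a single complex weight using the conjugate Wirtinger derivative. The loss |w - target|^2, the target value, the learning rate and the iteration count are all assumptions made up for this example.

```python
def loss(w, target):
    """Real-valued loss L(w, w*) = |w - target|^2 for a single complex weight."""
    return abs(w - target) ** 2


def grad_conj(w, target):
    """Analytic dL/dw*: L = (w - t)(w - t)*, so dL/dw* = w - t."""
    return w - target


target = 2.0 - 1.0j   # hypothetical optimum
w = 0.0 + 0.0j        # initial weight
lr = 0.1              # learning rate; constant factors such as 1/2 are absorbed here

for _ in range(200):
    # Step against the conjugate Wirtinger derivative.
    w = w - lr * grad_conj(w, target)

final_loss = loss(w, target)
print(w, final_loss)
```

Each step scales the error w - target by (1 - lr), so the weight moves toward the target in both its real and imaginary parts at once.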

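When f (equivalently F) is holomorphic, the CR equations reduce the Wirtinger derivatives to the following, consistent with the complex derivative defined above: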

\begin{aligned} \frac{\partial f}{\partial z} &= \frac{\partial G}{\partial z} = \frac{\partial F}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}j \\ \frac{\partial f}{\partial z^*} &= \frac{\partial G}{\partial z^*} = 0 \end{aligned}

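Properties of complex derivatives under conjugation:

Chain rule

For holomorphic functions, the chain rule is the same as in the real case. For a CVNN whose loss l(z)=L(z,z^*) is real-valued, the loss is not holomorphic (unless it is constant), so the complex derivative \frac{\partial l}{\partial z} does not exist. The chain rule therefore has to be carried out with Wirtinger derivatives (\frac{\partial L}{\partial z} exists formally).

Consider a CVNN with real-valued loss L(s,s^*), where s = f(z) = G(z,z^*) is the forward output and z is the forward input. Given the backward input grad_output = \frac{\partial L}{\partial s^*}, the backward output (the gradient) \frac{\partial L}{\partial z^*} follows from the chain rule: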

\begin{aligned} \frac{\partial L}{\partial z^*} &= \frac{\partial L}{\partial s} * \frac{\partial s}{\partial z^*} + \frac{\partial L}{\partial s^*} * \frac{\partial s^*}{\partial z^*} \\ &= (\frac{\partial L}{\partial s^*})^* * \frac{\partial s}{\partial z^*} + \frac{\partial L}{\partial s^*} * (\frac{\partial s}{\partial z})^* \\ &= \boxed{ (grad\_output)^* * \frac{\partial s}{\partial z^*} + grad\_output * {(\frac{\partial s}{\partial z})}^* } \\ \end{aligned}

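When s is real-valued (f: \mathbb{C} \to \mathbb{R}), grad_output is real and \frac{\partial s}{\partial z^*} = (\frac{\partial s}{\partial z})^*, so the boxed expression reduces to:

\frac{\partial L}{\partial z^*} = 2 * grad\_output * \frac{\partial s}{\partial z^*}

When z is real (f: \mathbb{R} \to \mathbb{C}), the two terms of the boxed expression are complex conjugates of each other, so:

\frac{\partial L}{\partial z^*} = 2 * Re(grad\_output^* * \frac{\partial s}{\partial z^*})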
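The boxed chain rule can be checked against a direct numerical derivative. The sketch below is an illustration only: the function names `backward` and `wirtinger_conj`, the loss L = |s|^2 and the two forward functions s = z^2 and s = z^* are assumptions chosen for the example.

```python
def backward(grad_output, ds_dz, ds_dzconj):
    """Boxed chain rule: dL/dz* from grad_output = dL/ds* and the Wirtinger derivatives of s."""
    return grad_output.conjugate() * ds_dzconj + grad_output * ds_dz.conjugate()


def wirtinger_conj(f, z, h=1e-6):
    """Numerical df/dz* = 0.5 * (dF/dx + 1j * dF/dy) via central differences."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)
    dF_dy = (f(z + h * 1j) - f(z - h * 1j)) / (2 * h)
    return 0.5 * (dF_dx + 1j * dF_dy)


z0 = 0.8 + 0.6j  # arbitrary test point

# Case 1: s = z^2 (holomorphic), L = |s|^2, so grad_output = dL/ds* = s.
s1 = z0 * z0
analytic_sq = backward(s1, 2 * z0, 0j)
numeric_sq = wirtinger_conj(lambda z: complex(abs(z * z) ** 2), z0)

# Case 2: s = z* (not holomorphic), L = |s|^2, ds/dz = 0 and ds/dz* = 1.
s2 = z0.conjugate()
analytic_conj = backward(s2, 0j, 1 + 0j)
numeric_conj = wirtinger_conj(lambda z: complex(abs(z.conjugate()) ** 2), z0)

print(analytic_sq, numeric_sq)
print(analytic_conj, numeric_conj)
```

In the first case only the second term of the boxed expression contributes, in the second case only the first term does, so the two cases together exercise both halves of the formula.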

Examples

Holomorphic function f(z) = cz, F(x,y) = cx + cyj, c \in \mathbb{R}:

  1. Direct differentiation: f'(z) = c
  2. Via the Wirtinger derivative: f'(z) = 0.5(\frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}j) = 0.5[c - (cj)j] = c
  3. Via the CR equations: f'(z) = \frac{\partial F}{\partial x} = -\frac{\partial F}{\partial y}j = c

Holomorphic function f(z) = cz, F(x,y) = (ax-by) + (bx+ay)j, c=a+bj \in \mathbb{C}, a,b \in \mathbb{R}:

  1. Direct differentiation: f'(z) = c
  2. Via the Wirtinger derivative: f'(z) = 0.5(\frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}j) = 0.5[(a+bj) -(-b+aj)j] = c
  3. Via the CR equations: f'(z) = \frac{\partial F}{\partial x} = -\frac{\partial F}{\partial y}j = a+bj = c

Holomorphic function f(z) = z^2, F(x,y) = x^2 - y^2 + 2xyj:

  1. Direct differentiation: f'(z) = 2z = 2x+2yj
  2. Via the Wirtinger derivative: f'(z) = 0.5(\frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}j) = 0.5[(2x+2yj) - (-2y+2xj)j] = 2x+2yj
  3. Via the CR equations: f'(z) = \frac{\partial F}{\partial x} = -\frac{\partial F}{\partial y}j = 2x+2yj
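The three routes in each example can be compared numerically. This is a rough sketch assuming central finite differences; the helper names `partials` and `three_routes` and the concrete values of c and z0 are made up for illustration.

```python
def partials(f, z, h=1e-6):
    """Central-difference estimates of dF/dx and dF/dy at z = x + yj."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)
    dF_dy = (f(z + h * 1j) - f(z - h * 1j)) / (2 * h)
    return dF_dx, dF_dy


def three_routes(f, direct, z):
    """Return f'(z) given directly, via Wirtinger, and via the two CR expressions."""
    dF_dx, dF_dy = partials(f, z)
    via_wirtinger = 0.5 * (dF_dx - 1j * dF_dy)
    via_cr_x = dF_dx
    via_cr_y = -1j * dF_dy
    return direct, via_wirtinger, via_cr_x, via_cr_y


z0 = 0.5 + 1.5j          # arbitrary test point
c_real = 3.0             # example real constant
c_complex = 2.0 - 1.0j   # example complex constant

results = {
    "real c": three_routes(lambda z: c_real * z, c_real, z0),
    "complex c": three_routes(lambda z: c_complex * z, c_complex, z0),
    "z^2": three_routes(lambda z: z * z, 2 * z0, z0),
}

for name, values in results.items():
    print(name, values)
```

For each function the four printed values should agree up to finite-difference error, which is what holomorphy predicts.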
