Improved Training of Wasserstein GANs (translation)
4 Gradient penalty
We now propose an alternative way to enforce the Lipschitz constraint. A differentiable function is 1-Lipschitz if and only if it has gradients with norm at most 1 everywhere, so we consider directly constraining the gradient norm of the critic's output with respect to its input. To circumvent tractability issues, we enforce a soft version of the constraint with a penalty on the gradient norm for random samples $\hat{x} \sim \mathbb{P}_{\hat{x}}$. Our new objective is
$$
L = \underset{\tilde{x} \sim \mathbb{P}_g}{\mathbb{E}}\big[D(\tilde{x})\big] - \underset{x \sim \mathbb{P}_r}{\mathbb{E}}\big[D(x)\big] + \lambda\, \underset{\hat{x} \sim \mathbb{P}_{\hat{x}}}{\mathbb{E}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
$$

where $\mathbb{P}_{\hat{x}}$ is defined by sampling uniformly along straight lines between pairs of points sampled from the data distribution $\mathbb{P}_r$ and the generator distribution $\mathbb{P}_g$, and $\lambda$ is the penalty coefficient.
**Penalty coefficient.** All experiments in this paper use $\lambda = 10$, which we found to work well across a variety of architectures and datasets ranging from toy tasks to large ImageNet CNNs.
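To make the penalty term concrete, here is a minimal PyTorch-style sketch (our illustration, not the authors' reference implementation); `critic`, `real`, and `fake` are placeholder names for a critic network and matching batches of real and generated samples, and the interpolation follows the straight-line sampling of $\mathbb{P}_{\hat{x}}$ described above:

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Gradient penalty term: lam * E[(||grad_xhat D(xhat)||_2 - 1)^2]."""
    # epsilon ~ U[0, 1], one value per example, broadcast over the remaining dims
    eps = torch.rand((real.size(0),) + (1,) * (real.dim() - 1), device=real.device)
    xhat = eps * real + (1.0 - eps) * fake.detach()   # points on straight lines between real and fake
    xhat.requires_grad_(True)
    scores = critic(xhat)
    grads = torch.autograd.grad(
        outputs=scores.sum(),   # summing gives every example's input gradient in one backward pass
        inputs=xhat,
        create_graph=True,      # keep the graph so the penalty itself can be backpropagated
    )[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
```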
**No critic batch normalization.** Most prior GAN implementations [22, 23, 2] use batch normalization in both the generator and the discriminator to help stabilize training, but batch normalization changes the form of the discriminator's problem from mapping a single input to a single output to mapping from an entire batch of inputs to a batch of outputs [23]. Our penalized training objective is no longer valid in this setting, since we penalize the norm of the critic's gradient with respect to each input independently, and not the entire batch. To resolve this, we simply omit batch normalization in the critic in our models, finding that they perform well without it. Our method works with normalization schemes which don't introduce correlations between examples. In particular, we recommend layer normalization [3] as a drop-in replacement for batch normalization.
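As a hedged illustration of such a critic block (our sketch, not the paper's exact architecture): `GroupNorm` with a single group normalizes each example over all of its channels and spatial positions, so it behaves like layer normalization for convolutional features and introduces no cross-example statistics.

```python
import torch.nn as nn

def critic_block(in_ch, out_ch, use_norm=True):
    """One downsampling block of a batch-norm-free critic (illustrative layer sizes)."""
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if use_norm:
        # per-example normalization only; no statistics are shared across the batch
        layers.append(nn.GroupNorm(1, out_ch))
    layers.append(nn.LeakyReLU(0.2))
    return nn.Sequential(*layers)
```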
**Two-sided penalty.** We encourage the norm of the gradient to go towards 1 (two-sided penalty) instead of just staying below 1 (one-sided penalty). Empirically this seems not to constrain the critic too much, likely because the optimal WGAN critic anyway has gradients with norm 1 almost everywhere under $\mathbb{P}_r$ and $\mathbb{P}_g$.
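For reference, a self-contained sketch of how the two variants differ (our illustration; `grad_norm` stands for the per-example gradient norms computed as in the earlier `gradient_penalty` sketch):

```python
import torch

grad_norm = torch.tensor([0.7, 1.2, 1.0])  # example per-sample gradient norms
lam = 10.0

two_sided = lam * ((grad_norm - 1.0) ** 2).mean()                      # paper default: push norms toward 1
one_sided = lam * (torch.clamp(grad_norm - 1.0, min=0.0) ** 2).mean()  # penalize only norms above 1
```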
5 Experiments
5.1 Training random architectures within a set
We experimentally demonstrate our model's ability to train a large number of architectures which we think are useful to be able to train. Starting from the DCGAN architecture, we define a set of architecture variants by changing model settings to random corresponding values in Table 1. We believe that reliable training of many of the architectures in this set is a useful goal, but we do not claim that our set is an unbiased or representative sample of the whole space of useful architectures: it is designed to demonstrate a successful regime of our method, and readers should evaluate whether it contains architectures similar to their intended application.
Table 1: We evaluate WGAN-GP's ability to train the architectures in this set.
Table 2: Outcomes of training 200 random architectures, for different success thresholds. For comparison, our standard DCGAN scored 7.24.
5.2 Training varied architectures on LSUN bedrooms

To demonstrate our model's ability to train many architectures with its default settings, we train six different GAN architectures on the LSUN bedrooms dataset [31]. In addition to the baseline DCGAN architecture from [22], we choose six architectures whose successful training we demonstrate: (1) no BN and a constant number of filters in the generator, as in [2], (2) a 4-layer 512-dim ReLU MLP generator, as in [2], (3) no normalization in either the discriminator or generator, (4) gated multiplicative nonlinearities, as in [24], (5) tanh nonlinearities, and (6) a 101-layer ResNet generator and discriminator.
Figure 2: Different GAN architectures trained with different methods. We only succeeded in training every architecture with a shared set of hyperparameters using WGAN-GP.
Although we do not claim it is impossible without our method, to the best of our knowledge this is the first time very deep residual networks were successfully trained in a GAN setting. For each architecture, we train models using four different GAN methods: WGAN-GP, WGAN with weight clipping, DCGAN [22], and Least-Squares GAN [18]. For each objective, we used the default set of optimizer hyperparameters recommended in that work (except LSGAN, where we searched over learning rates).
For WGAN-GP, we replace any batch normalization in the discriminator with layer normalization (see section 4). We train each model for 200K iterations and present samples in Figure 2. We only succeeded in training every architecture with a shared set of hyperparameters using WGAN-GP. For every other training method, some of these architectures were unstable or suffered from mode collapse.
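For readers who want to reproduce this setup, here is a hedged sketch of a WGAN-GP training loop using commonly cited defaults (five critic updates per generator update, $\lambda = 10$, Adam with learning rate 1e-4 and betas (0, 0.9)). It is our illustration, not the authors' code; `generator`, `critic`, `data_loader`, and `latent_dim` are placeholders, and `gradient_penalty` is the earlier sketch.

```python
import torch

def train_wgan_gp(generator, critic, data_loader, latent_dim,
                  n_critic=5, lam=10.0, device="cpu"):
    """Sketch of one WGAN-GP training epoch (illustrative defaults, not the authors' code)."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
    opt_d = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.0, 0.9))

    for real in data_loader:                     # assumes the loader yields image tensors
        real = real.to(device)

        # Critic: n_critic updates per generator update.
        # (The paper samples a fresh real batch for each critic step; reusing one is a simplification.)
        for _ in range(n_critic):
            z = torch.randn(real.size(0), latent_dim, device=device)
            fake = generator(z).detach()
            d_loss = (critic(fake).mean() - critic(real).mean()
                      + gradient_penalty(critic, real, fake, lam))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()

        # Generator: maximize the critic's score on generated samples.
        z = torch.randn(real.size(0), latent_dim, device=device)
        g_loss = -critic(generator(z)).mean()
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
```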
5.3 Improved performance over weight clipping
One advantage of our method over weight clipping is improved training speed and sample quality. To demonstrate this, we train WGANs with weight clipping and with our gradient penalty on CIFAR-10 [13] and plot Inception scores [23] over the course of training in Figure 3. For WGAN-GP, we train one model with the same optimizer (RMSProp) and learning rate as WGAN with weight clipping, and another model with Adam and a higher learning rate. Even with the same optimizer, our method converges faster and to a better score than weight clipping. Using Adam further improves performance. We also plot the performance of DCGAN [22] and find that our method converges more slowly (in wall-clock time) than DCGAN, but its score is more stable at convergence.
Figure 3: CIFAR-10 Inception score over generator iterations (left) or wall-clock time (right) for four models: WGAN with weight clipping, WGAN-GP with RMSProp and Adam (to control for the optimizer), and DCGAN. WGAN-GP significantly outperforms weight clipping and performs comparably to DCGAN.
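Since the Inception score is the metric reported throughout this section, here is a small sketch of its definition, $\exp(\mathbb{E}_x[\mathrm{KL}(p(y|x)\,\|\,p(y))])$, computed from classifier softmax outputs on generated samples. This is a simplified illustration; the published protocol additionally averages the score over several sample splits.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, num_classes) softmax outputs p(y|x) of the Inception network on N samples."""
    probs = np.asarray(probs, dtype=np.float64)
    p_y = probs.mean(axis=0, keepdims=True)                                  # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)   # KL(p(y|x) || p(y))
    return float(np.exp(kl.mean()))
```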
5.4 Sample quality on CIFAR-10 and LSUN bedrooms
For equivalent architectures, our method achieves comparable sample quality to the standard GAN objective. However, the increased stability allows us to improve sample quality by exploring a wider range of architectures. To demonstrate this, we find an architecture which establishes a new state-of-the-art Inception score on unsupervised CIFAR-10 (Table 3). When we add label information (using the method in [20]), the same architecture outperforms all other published models except for SGAN.
Table 3: Inception scores on CIFAR-10. Our unsupervised model achieves state-of-the-art performance, and our conditional model outperforms all others except SGAN.
We also train a deep ResNet on 128 × 128 LSUN bedrooms and show samples in Figure 4. We believe these samples are at least competitive with the best reported so far on any resolution for this dataset.

Figure 4: Samples of 128 × 128 LSUN bedrooms. We believe these samples are at least comparable to the best published results so far.
5.5 Modeling discrete data with a continuous generator
To demonstrate our method's ability to model degenerate distributions, we consider the problem of modeling a complex discrete distribution with a GAN whose generator is defined over a continuous space. As an instance of this problem, we train a character-level GAN language model on the Google Billion Word dataset [6]. Our generator is a simple 1D CNN which deterministically transforms a latent vector into a sequence of 32 one-hot character vectors through 1D convolutions. We apply a softmax nonlinearity at the output, but use no sampling step: during training, the softmax output is passed directly into the critic (which, likewise, is a simple 1D CNN). When decoding samples, we just take the argmax of each output vector.
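A minimal sketch of such a generator follows (our illustration; the layer sizes and the plain convolutional stack are assumptions, not the paper's exact architecture). The output is a (batch, vocab_size, 32) tensor of per-position character distributions that is fed directly to the critic during training.

```python
import torch
import torch.nn as nn

class CharGenerator(nn.Module):
    """Maps a latent vector to 32 per-position character distributions (soft one-hot vectors)."""
    def __init__(self, latent_dim=128, hidden=512, vocab_size=96, seq_len=32):
        super().__init__()
        self.hidden, self.seq_len = hidden, seq_len
        self.fc = nn.Linear(latent_dim, hidden * seq_len)
        self.convs = nn.Sequential(
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, vocab_size, kernel_size=1),
        )

    def forward(self, z):
        h = self.fc(z).view(z.size(0), self.hidden, self.seq_len)
        return torch.softmax(self.convs(h), dim=1)   # soft one-hot characters, no sampling step

# Decoding a sample: take the argmax character at each of the 32 positions.
# char_ids = CharGenerator()(torch.randn(1, 128)).argmax(dim=1)   # shape (1, 32)
```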
We present samples from the model in Table 4. Our model makes frequent spelling errors (likely because it has to output each character independently) but nonetheless manages to learn quite a lot about the statistics of language. We were unable to produce comparable results with the standard GAN objective, though we do not claim that doing so is impossible.
Table 4: Samples from a WGAN-GP character-level language model trained on sentences from the Billion Word dataset, truncated to 32 characters. The model learns to directly output one-hot character embeddings from a latent vector without any discrete sampling step. We were unable to achieve comparable results with the standard GAN objective and a continuous generator.
Figure 5: (a) The negative critic loss of our model on LSUN bedrooms converges toward a minimum as the network trains. (b) WGAN training and validation losses on a random 1000-digit subset of MNIST show overfitting when using either our method (left) or weight clipping (right). In particular, with our method, the critic overfits faster than the generator, causing the training loss to increase gradually over time even as the validation loss drops.
Other attempts at language modeling with GANs [32, 14, 30, 5, 15, 10] typically use discrete models and gradient estimators [28, 12, 17]. Our approach is simpler to implement, though whether it scales beyond a toy language model is unclear.
5.6 Meaningful loss curves and detecting overfitting
An important benefit of weight-clipped WGANs is that their loss correlates with sample quality and converges toward a minimum. To show that our method preserves this property, we train a WGAN-GP on the LSUN bedrooms dataset [31] and plot the negative of the critic's loss in Figure 5a. We see that the loss converges as the generator minimizes $W(\mathbb{P}_r, \mathbb{P}_g)$.
Given enough capacity and too little training data, GANs will overfit. To explore the loss curve's behavior when the network overfits, we train large unregularized WGANs on a random 1000-image subset of MNIST and plot the negative critic loss on both the training and validation sets in Figure 5b. In both WGAN and WGAN-GP, the two losses diverge, suggesting that the critic overfits and provides an inaccurate estimate of $W(\mathbb{P}_r, \mathbb{P}_g)$, at which point all bets are off regarding correlation with sample quality. However, in WGAN-GP, the training loss gradually increases even while the validation loss drops.
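A small sketch of this overfitting check (our illustration; `critic`, `generator`, and the batches are placeholders): the same negative critic loss is evaluated on training and held-out real data, and a growing gap between the two estimates indicates that the critic is overfitting.

```python
import torch

@torch.no_grad()
def negative_critic_loss(critic, real_batch, fake_batch):
    """E[D(real)] - E[D(fake)]: the negated critic loss (without the penalty term)."""
    return (critic(real_batch).mean() - critic(fake_batch).mean()).item()

# fake = generator(torch.randn(batch_size, latent_dim))
# train_estimate = negative_critic_loss(critic, real_train_batch, fake)
# valid_estimate = negative_critic_loss(critic, real_valid_batch, fake)
```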
[29] also measure overfitting in GANs by estimating the generator's log-likelihood. Compared to that work, our method detects overfitting in the critic (rather than the generator) and measures overfitting against the same loss that the network minimizes.
6 Conclusion

In this work, we demonstrated problems with weight clipping in WGAN and introduced an alternative in the form of a penalty term in the critic loss which does not exhibit the same problems. Using our method, we demonstrated strong modeling performance and stability across a variety of architectures. Now that we have a more stable algorithm for training GANs, we hope our work opens the path for stronger modeling performance on large-scale image datasets and language. Another interesting direction is adapting our penalty term to the standard GAN objective function, where it might stabilize training by encouraging the discriminator to learn smoother decision boundaries.
Acknowledgements

We would like to thank Mohamed Ishmael Belghazi, Léon Bottou, Zihang Dai, Stefan Doerr, Ian Goodfellow, Kyle Kastner, Kundan Kumar, Luke Metz, Alec Radford, Colin Raffel, Sai Rajeshwar, Aditya Ramesh, Tom Sercu, Zain Shah and Jake Zhao for insightful comments.
Original paper: http://tongtianta.site/paper/3418
Edited by Lornatang
Proofread by Lornatang