PapeRman #3
The Evolved Transformer
Authors: David R. So, Chen Liang, Quoc V. Le
Institute: Google Brain
Abstract: Recent works have highlighted the strengths of the Transformer architecture for dealing with sequence tasks. At the same time, neural architecture search has advanced to the point where it can outperform human-designed models. The goal of this work is to use architecture search to find a better Transformer architecture. We first construct a large search space inspired by recent advances in feed-forward sequential models and then run evolutionary architecture search, seeding our initial population with the Transformer. To effectively run this search on the computationally expensive WMT 2014 English-German translation task, we develop the progressive dynamic hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments, the Evolved Transformer, demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech, and LM1B. At big model sizes, the Evolved Transformer is twice as efficient as the Transformer in terms of FLOPs without loss in quality. At a much smaller, mobile-friendly model size of ~7M parameters, the Evolved Transformer outperforms the Transformer by 0.7 BLEU on WMT'14 English-German.
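To make the search procedure described above more concrete, below is a minimal sketch of tournament-style evolution combined with progressive dynamic hurdles, written under our own assumptions rather than taken from the paper's code: the step budgets, the hurdle schedule, the fitness metric, and the `mutate()` / `train_and_evaluate()` helpers are hypothetical placeholders used only to illustrate the idea of giving extra training compute to candidates that already look promising at small budgets.

```python
"""Sketch of evolutionary architecture search with progressive dynamic hurdles.

All numeric values and helper functions here are illustrative assumptions,
not the paper's actual configuration or implementation.
"""
import random
from statistics import mean

# Additional training steps granted at each hurdle stage (assumed values).
STEP_BUDGETS = [10_000, 30_000, 100_000]


def train_and_evaluate(model, num_steps):
    """Placeholder: train `model` for `num_steps` more steps and return a
    fitness score (e.g. negative validation log-perplexity)."""
    raise NotImplementedError


def mutate(model):
    """Placeholder: return a child architecture derived from `model`."""
    raise NotImplementedError


def evaluate_with_hurdles(model, hurdles):
    """Train in stages. A candidate only earns the next, larger step budget by
    clearing the hurdle for its current stage; otherwise it keeps the fitness
    it has and the search spends no more compute on it."""
    fitness = float("-inf")
    for stage in range(len(hurdles) + 1):  # only budgets unlocked so far
        fitness = train_and_evaluate(model, STEP_BUDGETS[stage])
        if stage < len(hurdles) and fitness < hurdles[stage]:
            break  # not promising enough to justify more training
    return fitness


def evolve(seed_model, population_size=100, num_children=1000, tournament_size=10):
    """Tournament-selection evolution, seeding the population with the Transformer."""
    hurdles = []  # grown progressively as the search proceeds
    population = [(seed_model, evaluate_with_hurdles(seed_model, hurdles))
                  for _ in range(population_size)]
    for i in range(num_children):
        tournament = random.sample(population, tournament_size)
        parent = max(tournament, key=lambda m: m[1])[0]
        child = mutate(parent)
        population.append((child, evaluate_with_hurdles(child, hurdles)))
        population.pop(0)  # age out the oldest individual
        # Periodically add a hurdle at the mean fitness of the current
        # population, gating access to the next (larger) step budget.
        if (i + 1) % 200 == 0 and len(hurdles) < len(STEP_BUDGETS) - 1:
            hurdles.append(mean(f for _, f in population))
    return max(population, key=lambda m: m[1])[0]
```

In this reading, weak candidates are only ever trained on the smallest budget, so most of the expensive WMT'14 English-German training steps go to architectures that have already cleared one or more hurdles.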