
A Method for Generating Fluent Text

2020-07-28  人工智能遇见磐创

Author | Aaron Abrahamson
Translator | VK
Source | Towards Data Science

Training a text generation model on Dune

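Dune is the story of a feudal society in the far future. It follows a Duke and his family, who are forced to become stewards of the desert planet Arrakis. Frank Herbert published this classic in 1965. Almost any modern science fiction can be traced back to some element of Dune.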

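I recently finished the sequel to Dune, Dune Messiah, and have just started the third book in the series, Children of Dune. Herbert originally wrote six of these stories, and his son later wrote a whole pile more. I have not read those.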

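I have been exploring text generation models, and I thought it would be fun to try one on Dune. Many "classic" machine learning models are used for prediction and clustering. Generative modeling instead lets a model create new data that resembles the training data it learned from. A recent example of what generative modeling can do is StyleGAN; see this video (https://www.youtube.com/watch?v=kSLJriaOumA).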

Here is a link to the Colab notebook I used for this project (https://drive.google.com/file/d/15Z7SNBnBL12acmUGvvMLQ-OoMspb-B5k/view?usp=sharing).

Process

Chapter 1: The Baron

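I wanted to test the model after a short period of training to see what it would produce. The seed word was "Baron", a despicable antagonist in the book.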

‘Baron The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron Of The Baron’

It goes on like this forever. Not good at all.
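To see how a seed word can collapse into a loop like this, here is a minimal sketch. It is not the model from the Colab notebook: it assumes a toy bigram word counter and greedy decoding (always taking the most frequent next word), and the names `train_bigram`, `generate_greedy`, and the tiny `corpus` are made up for illustration. The article does not say which decoding strategy the notebook uses.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate_greedy(follows, seed, n_words):
    """Extend the seed by always choosing the most frequent next word."""
    out = [seed.lower()]
    for _ in range(n_words):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical toy corpus, not text from the book.
corpus = (
    "the baron of the house spoke . "
    "the baron of the desert smiled . "
    "the duke saw the baron of the house ."
)
model = train_bigram(corpus)
print(generate_greedy(model, "baron", 8))
# prints "baron of the baron of the baron of the"
```

Because the choice at each step depends only on the previous word and is deterministic, the output cycles as soon as a word repeats. A neural model trained for a short time can behave in a similar way if it is decoded greedily, though that is only a guess about what happened here.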

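After 33 epochs the model does much better, but it still falls into loops and just keeps emitting strings of nouns. Here is the output for the seed word "Spice":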

The Spice Itself Stood Out To The Left Wall The Fremen Seeker Followed The Chains The Troop Was A Likely Shadow And The Natural Place Of The Great Places That Was A Subtle City Of The Room'S Features That The Man Was A Master Of The Cavern The Growing The Bronze The Sliding Hand

Here is the output for "Paul" (the protagonist):

Paul Stood Unable To The Duke And The Reverend Mother Ramallo To The Guard Captain And The Man Looked At Him And The Child Was A Relief One Of The Fremen Had Been In The Doorway And The Fedaykin Control Them To Be Like The Spice Diet Out Of The Wind And The Duke Said I Am The Fremen To Get The Banker Said When The Emperor Asked His Fingers Nefud I Know You Can Take The Duchy Of Government The Sist The Duke Said He Turned To The Hand Beside The Table The Baron Asked The Emperor Will Hold

Here is the output for "She looked":

'She Looked At The Transparent End Of The Table Saw A Small Board In The Room And The Way Of The Old Woman He Had Been Sent By The Wind Of The Duke And The Worms They Had Seen The Waters Of The Desert And The Sandworms The Troop Had Been Subtly Prepared By The Wind Of The Worm Had Been Subtly Always In The Deep Sinks Of The Women And The Duke Had Been Given Last Of Course But The Others Had Been In The Fremen Had Been Shaped On The Light Of The Light Of The Hall Had Had Seen'

Thoughts and next steps

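I think this definitely shows progress. I would like to train it for at least 100 epochs, but it is slow going. At about 11 minutes per epoch, that comes to 1,100 minutes, or just over 18 hours in total. I need a better computer.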

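Finally, I should add that the irony of doing this is not lost on me. In the Dune universe, at some point in the distant past, "thinking computers" rebelled against humanity and nearly wiped it out. By the time of the book, computers have been replaced by "mentats": humans bred and trained to mimic a computer's computational abilities.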

Original article: https://towardsdatascience.com/the-text-must-flow-3bb4edff7b5b
