Recent Views on Acoustic Models (AM) in the Research Community

2016-09-14 MeGaSong

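Dong Yu (俞栋)

CSDN interview
On acoustic models, he said his focus is on deep CNNs and LF-MMI (i.e. Povey's chain model).
He noted that LF-MMI borrows the main strength of CTC (no forced alignment required) while remaining an improvement on the conventional HMM-DNN hybrid system. Its accuracy is no worse than CTC's, and above all its training is stable. CTC needs heavy hyperparameter tuning; so far only Google and Baidu claim to have applied it successfully, and even where it works, a method that needs extensive tuning for every task is not a mature one.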

Povey:

Forum topic link
Firstly, CTC was never in the master branch of Kaldi. It's dropped permanently, because the 'chain' models were always better than CTC. And I removed the branch because I don't want to answer questions about it (and because it's a waste of their time too). BTW, a presentation by Google here at Interspeech is saying something similar, that a conventional model, discriminatively trained, with 1/3 the normal frame rate, beats CTC.
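As a rough illustration of what "1/3 the normal frame rate" means, here is a minimal Python sketch that keeps every third frame of a feature sequence. It assumes a typical 10 ms input frame shift, which the post does not state, and the names `subsample_frames` and `output_frame_shift_ms` are hypothetical. This is not the Kaldi or Google implementation; real systems usually also splice neighbouring frames or subsample inside the network.

```python
def subsample_frames(frames, factor=3, offset=0):
    """Keep every `factor`-th frame, starting at index `offset`."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not 0 <= offset < factor:
        raise ValueError("offset must satisfy 0 <= offset < factor")
    return frames[offset::factor]


def output_frame_shift_ms(input_shift_ms=10.0, factor=3):
    """Frame shift seen by the model after subsampling (assumed 10 ms input)."""
    return input_shift_ms * factor


# Toy input: 10 frames of 2-dimensional features (values are made up).
frames = [[float(t)] * 2 for t in range(10)]
reduced = subsample_frames(frames)          # frames 0, 3, 6, 9
shift = output_frame_shift_ms()             # 10 ms * 3
```

With a factor of 3 the model is evaluated on a third as many frames, which is where the decoding speed-up comes from.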

Google

Povey mentioned a view that Google presented at Interspeech; the Interspeech proceedings should contain a Google paper on this.

Baidu

Working on deep CNNs (reportedly 6 layers) and deep LSTM networks.

Facebook

A paper on end-to-end recognition with CNNs (wav2letter).

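Mobvoi (出门问问)

Reportedly very keen on applying CTC on embedded devices (watches, VR). I think this may be where CTC has an advantage (model size, decoding complexity).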

Interspeech 2016

Proceedings link: http://pan.baidu.com/s/1pLB3w2v Password: fww7