Recent academic views on acoustic models (AM)

2016-09-14  MeGaSong

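Dong Yu (俞栋)

CSDN interview
On acoustic models, he said his focus is on Deep CNN and LFMMI (lattice-free MMI, i.e. Povey's chain model).
He noted that LFMMI borrows the main advantage of CTC (no force-alignment needed) while remaining an improvement built on the traditional HMM-DNN hybrid system. Its performance is no worse than CTC, and most importantly its training is stable. CTC requires extensive hyperparameter tuning; so far only Google and Baidu claim to have applied it successfully, and even where it works, a method that needs heavy tuning for every task is not a mature one.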

Povey:

Forum topic link
Firstly, CTC was never in the master branch of Kaldi. It's dropped permanently, because the 'chain' models were always better than CTC. And I removed the branch because I don't want to answer questions about it (and because it's a waste of their time too). BTW, a presentation by Google here at Interspeech is saying something similar, that a conventional model, discriminatively trained, with 1/3 the normal frame rate, beats CTC.

Google

Povey mentioned a point Google made at Interspeech; the Interspeech proceedings should contain a Google paper on this.

Baidu

Working on deep CNNs (reportedly 6 layers) and deep LSTM networks.

Facebook

A paper on end-to-end recognition with CNNs (wav2letter).

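Mobvoi (出门问问)

Reportedly very interested in applying CTC on embedded devices (watches, VR). I think this may be where CTC has an advantage (model size, decoding complexity).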
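
To make the decoding-complexity point concrete, here is a minimal sketch of greedy (best-path) CTC decoding, the simplest way to read labels off a CTC model's output. It is only an illustration, not a description of what Mobvoi or any other group actually runs: the blank index of 0, the function name, and the toy posteriors are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_posteriors, blank=BLANK):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely label index for each frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__)
        for frame in frame_posteriors
    ]
    # Merge runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy posteriors over {blank, label 1, label 2} for six frames (made up).
posteriors = [
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.60, 0.20, 0.20],
]
decoded = ctc_greedy_decode(posteriors)
print(decoded)
```

The whole decoder is one pass over the frames with no HMM state graph or lattice, which is the kind of simplicity that could matter on a watch-class device. A practical system would more likely use beam search with a language model, at a higher cost.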

Interspeech 2016

Conference proceedings link: http://pan.baidu.com/s/1pLB3w2v password: fww7
