Notes 0418 / Quick take: Multimodal Few-Shot L
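Multimodal interaction: whether in robots or in the smart cockpit (still a concept) of intelligent cars, this technology is involved.
This paper is from DeepMind. Overall I have not fully understood the technical details yet (haha), but the gist seems to be: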
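- Interaction within a single modality has been the dominant form so far
- NLP is the core of all "learning", the Holy Grail of artificial intelligence
- Other models, such as CV and speech, are plug-ins
- Converting the inputs from those plug-ins into something the language model can consume may be a reasonable path for machines to form "understanding"
- Interaction then follows from that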
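
To make the "plug-in" idea in the list above a bit more concrete, here is a toy sketch of my reading of it, not the paper's actual implementation. An image feature vector is projected into a few vectors with the same width as the language model's token embeddings, and those vectors are placed in front of the text embeddings. All names and sizes (`EMBED_DIM`, `PREFIX_LEN`, `make_projection`, `image_to_prefix`, `build_lm_input`) are hypothetical, and the weights are random, so this only illustrates the shapes involved.

```python
import random

EMBED_DIM = 8    # hypothetical width of the language model's token embeddings
PREFIX_LEN = 2   # hypothetical number of "visual tokens" produced per image


def make_projection(in_dim, out_dim, seed=0):
    """Random weight matrix standing in for a learned vision-to-text mapping."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def image_to_prefix(image_features, projection):
    """Project image features, then split the result into PREFIX_LEN embedding vectors."""
    flat = [sum(w * x for w, x in zip(row, image_features)) for row in projection]
    return [flat[i * EMBED_DIM:(i + 1) * EMBED_DIM] for i in range(PREFIX_LEN)]


def embed_text(token_ids, embedding_table):
    """Look up token embeddings; the table plays the role of the fixed language model side."""
    return [embedding_table[t] for t in token_ids]


def build_lm_input(image_features, token_ids, projection, embedding_table):
    """Visual prefix followed by text embeddings: one sequence for the language model."""
    return image_to_prefix(image_features, projection) + embed_text(token_ids, embedding_table)


if __name__ == "__main__":
    rng = random.Random(1)
    table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(10)]
    proj = make_projection(in_dim=4, out_dim=PREFIX_LEN * EMBED_DIM)
    sequence = build_lm_input([0.5, -0.2, 0.1, 0.9], [1, 5, 3], proj, table)
    print(len(sequence), "vectors of width", len(sequence[0]))
```

The point of the sketch is only that, once the image has been turned into vectors of the same shape as word embeddings, the language model side sees one uniform sequence and does not need to know which positions came from the vision plug-in.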
We believe this work is an important proof-of-concept for a desired, much more powerful system capable of open-ended multimodal few-shot learning. Frozen achieves the necessary capacities to some degree, but a key limitation is that it achieves far from state-of-the-art performance on the specific tasks that it learns in a few shots, compared to systems that use the full training set for those tasks. As such, the main contribution of this work should be seen as a starting point or baseline for this exciting area of research of multimodal few-shot learning.