Paper Writing - Introduction
![](https://img.haomeiwen.com/i4905462/103bb389a7f9dccb.png)
1.《A Stack-Propagation Framework with Token-Level Intent Detection for Spoken Language Understanding》
Spoken language understanding (SLU) is a critical component in task-oriented dialogue systems. It usually consists of intent detection, which identifies the user's intent, and slot filling, which extracts semantic constituents from the natural language utterance (Tur and De Mori, 2011). As shown in Table 1, given a movie-related utterance "watch action movie", there is a slot label for each token and an intent for the whole utterance.
Usually, intent detection and slot filling are implemented separately. Intuitively, however, these two tasks are not independent, and the slots often depend strongly on the intent (Goo et al., 2018). For example, if the intent of an utterance is WatchMovie, it is more likely to contain the slot movie name than the slot music name. Hence, it is promising to incorporate the intent information to guide slot filling.
Considering this strong correlation between the two tasks, some joint models based on the multi-task learning framework have been proposed (Zhang and Wang, 2016; Hakkani-Tür et al., 2016; Liu and Lane, 2016), and all of them outperform pipeline models via mutual enhancement between the two tasks. However, these models only capture the relationship between intent and slots by sharing parameters. Recently, some work has begun to model the intent information for slot filling explicitly in a joint model. Goo et al. (2018) and Li et al. (2018) proposed gate mechanisms to incorporate the intent information into slot filling. Despite their promising performance, these models still suffer from two issues: (1) They all adopt a gate vector to incorporate the intent information. In this paper, we argue that it is risky to rely solely on a gate function to summarize or memorize the intent information. Besides, the interpretability of how the intent information guides the slot filling procedure is weak, because the two tasks interact only through hidden vectors. (2) The utterance-level intent information they use for slot filling may mislead the prediction of all slots in an utterance if the predicted utterance-level intent is incorrect.
In this paper, we propose a novel framework to address both issues above. For the first issue, inspired by Stack-Propagation, which was proposed by Zhang and Weiss (2016) to leverage POS tagging features for parsing and achieved good performance, we propose a joint model with Stack-Propagation for SLU tasks. Our framework directly uses the output of intent detection as input to slot filling to better guide the slot prediction process. In addition, the framework makes it easy to design an oracle intent experiment that intuitively shows how intent information enhances the slot filling task. For the second issue, we perform token-level intent prediction in our framework, which provides token-level intent information for slot filling. If some token-level intents in the utterance are predicted incorrectly, the other correct token-level intents remain useful for the corresponding slot predictions. In practice, we use a self-attentive encoder for intent detection to capture contextual information at each token and hence predict an intent label at each token. The intent of an utterance is computed by voting over the predictions at each token of the utterance. This token-level prediction, like ensembling neural networks (Lee et al., 2016), reduces prediction variance and improves intent detection performance. It also fits better in our Stack-Propagation framework, where intent detection can provide token-level intent features and retain more useful intent information for slot filling.
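To make this description concrete, here is a minimal PyTorch-style sketch of token-level intent detection with voting and Stack-Propagation into the slot decoder. It is our own illustration, not the authors' released code: the BiLSTM encoder, dimensions, and module names are assumptions, and the paper itself uses a self-attentive encoder and a more elaborate decoder.

```python
import torch
import torch.nn as nn

class StackPropagationSLU(nn.Module):
    """Sketch: token-level intent predictions are re-embedded and stacked onto
    the encoder states before slot filling; the utterance intent is a majority vote."""

    def __init__(self, vocab_size, n_intents, n_slots, emb=64, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * hidden, n_intents)   # per-token intent scores
        self.intent_embed = nn.Embedding(n_intents, emb)      # feed predicted intents back in
        self.slot_decoder = nn.LSTM(2 * hidden + emb, hidden, batch_first=True)
        self.slot_head = nn.Linear(hidden, n_slots)

    def forward(self, tokens):                                 # tokens: (B, T) word ids
        h, _ = self.encoder(self.embed(tokens))                # (B, T, 2H)
        intent_logits = self.intent_head(h)                    # (B, T, n_intents)
        token_intents = intent_logits.argmax(dim=-1)           # token-level intent predictions
        # Stack-Propagation: the slot decoder sees the predicted intents directly.
        slot_input = torch.cat([h, self.intent_embed(token_intents)], dim=-1)
        s, _ = self.slot_decoder(slot_input)
        slot_logits = self.slot_head(s)                        # (B, T, n_slots)
        # Utterance-level intent by voting over the token-level predictions.
        utterance_intent = token_intents.mode(dim=1).values    # (B,)
        return intent_logits, slot_logits, utterance_intent

# Training would add separate cross-entropy losses on intent_logits (per token) and
# slot_logits; the argmax-and-embed step here is one simple way to feed intent
# predictions into slot filling, not necessarily the paper's exact formulation.
```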
We conduct experiments on two benchmark datasets, SNIPS (Coucke et al., 2018) and ATIS (Goo et al., 2018). The results on both datasets show the effectiveness of our framework, which outperforms the current state-of-the-art methods by a large margin. Finally, Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) is used as a pre-trained model to further boost the performance of our model.
To summarize, the contributions of this work are as follows:
- We propose a Stack-Propagation framework for the SLU task, which better incorporates the intent semantic knowledge to guide slot filling and makes our joint model more interpretable.
- We perform token-level intent detection in the Stack-Propagation framework, which improves intent detection performance and further alleviates error propagation.
- We present extensive experiments demonstrating the benefit of our proposed framework. Our experiments on two publicly available datasets show substantial improvements, and our framework achieves state-of-the-art performance.
- We explore and analyze the effect of incorporating BERT in SLU tasks.
For reproducibility, our code for this paper is publicly available at https://github.com/LeePleased/StackPropagation-SLU.
2. 《A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling》
Spoken language understanding plays an important role in spoken dialogue systems. SLU aims to extract the semantics from user utterances. Concretely, it identifies the intent and captures semantic constituents. These two tasks are known as intent detection and slot filling (Tur and De Mori, 2011), respectively. For instance, the sentence 'what flights leave from phoenix', sampled from the ATIS corpus, is shown in Table 1. Each word in the sentence corresponds to one slot label, and a specific intent is assigned to the whole sentence.
Traditional pipeline approaches handle the two tasks separately. Intent detection is treated as a semantic classification problem that predicts the intent label; general approaches such as support vector machines (SVMs) (Haffner et al., 2003) and recurrent neural networks (RNNs) (Lai et al., 2015) can be applied. Slot filling is regarded as a sequence labeling task; popular approaches include conditional random fields (CRFs) (Raymond and Riccardi, 2007) and long short-term memory (LSTM) networks (Yao et al., 2014).
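As a rough illustration of this pipeline setup (a sketch under our own assumptions; the feature sets, toolkits, and the toy labels below are not those of the cited papers), intent detection can be trained as an ordinary utterance classifier and slot filling as an independent sequence labeler:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
import sklearn_crfsuite  # pip install sklearn-crfsuite

# Toy training data (labels here are illustrative, not the real ATIS annotations).
utterances = ["what flights leave from phoenix", "play some jazz music"]
intents = ["atis_flight", "play_music"]

# Intent detection: utterance-level classification with an SVM over n-gram features.
intent_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
intent_clf.fit(utterances, intents)

# Slot filling: token-level sequence labeling with a CRF over simple word features.
def word_features(tokens):
    return [{"lower": w.lower(), "is_first": i == 0} for i, w in enumerate(tokens)]

X = [word_features(u.split()) for u in utterances]
y = [["O", "O", "O", "O", "B-fromloc.city_name"],
     ["O", "O", "B-music_genre", "O"]]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

print(intent_clf.predict(["what flights leave from phoenix"]))
print(crf.predict(X[:1]))
```

The two models never exchange information, which is exactly the limitation that motivates the joint approaches discussed next.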
Considering the unsatisfactory performance of pipeline approaches caused by error propagation, the recent tendency is to develop joint models for the intent detection and slot filling tasks (Chen et al., 2016a; Zhang and Wang, 2016). Liu and Lane (2016) proposed an attention-based RNN model, but it only applies a joint loss function to link the two tasks implicitly. Hakkani-Tür et al. (2016) introduced an RNN-LSTM model in which explicit relationships between the slots and the intent are not established. Goo et al. (2018) proposed a slot-gated model that applies the intent information to the slot filling task and achieved superior performance, but the slot information is not used in the intent detection task, so bi-directional direct connections are still not established. In fact, the slots and the intent are correlated, and the two tasks can mutually reinforce each other. This paper proposes an SF-ID network consisting of an SF subnet and an ID subnet. The SF subnet applies intent information to the slot filling task, while the ID subnet uses slot information in the intent detection task. In this way, bi-directional interrelated connections between the two tasks are established. Our contributions are summarized as follows:
1) We propose an SF-ID network to establish an interrelated mechanism for the slot filling and intent detection tasks. Specifically, a novel ID subnet is proposed to apply the slot information to the intent detection task.
2) We establish a novel iteration mechanism inside the SF-ID network to enhance the connections between the intent and slots.
3) Experiments on two benchmark datasets show the effectiveness and superiority of the proposed model.
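A minimal sketch of how such a bi-directional, iterated interaction could be wired up is shown below. This is our own simplification with assumed tensor shapes and simple gating; the actual SF and ID subnets in the paper are more elaborate.

```python
import torch
import torch.nn as nn

class SFIDBlock(nn.Module):
    """Sketch of an SF-ID style interaction: the SF path injects intent context
    into the slot features, the ID path injects slot context back into the
    intent feature, repeated for a fixed number of iterations."""

    def __init__(self, hidden=64):
        super().__init__()
        self.sf_gate = nn.Linear(2 * hidden, hidden)   # SF subnet: intent -> slots
        self.id_gate = nn.Linear(2 * hidden, hidden)   # ID subnet: slots -> intent

    def forward(self, slot_feats, intent_feat, n_iter=3):
        # slot_feats: (B, T, H) per-token slot features; intent_feat: (B, H) utterance feature.
        for _ in range(n_iter):                         # iteration mechanism
            intent_exp = intent_feat.unsqueeze(1).expand_as(slot_feats)
            slot_feats = slot_feats * torch.sigmoid(
                self.sf_gate(torch.cat([slot_feats, intent_exp], dim=-1)))
            slot_ctx = slot_feats.mean(dim=1)           # pool slot information for the intent
            intent_feat = intent_feat * torch.sigmoid(
                self.id_gate(torch.cat([intent_feat, slot_ctx], dim=-1)))
        return slot_feats, intent_feat
```

The refined slot and intent features would then be fed to a CRF/softmax slot tagger and an intent classifier, respectively.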
3. 《Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks》
Aspect level sentiment classification aims to identify the sentiment polarity (e.g., positive, negative, neutral) of an aspect target in its context sentence. Compared to sentence-level sentiment classification, which tries to detect the overall sentiment of a sentence, it is a more fine-grained task. Aspect level sentiment classification can distinguish the sentiment polarities of multiple aspects in a sentence, whereas sentence-level sentiment classification often fails in such cases (Jiang et al., 2011). For example, in the sentence "great food but the service was dreadful", the sentiment polarities of the aspects "food" and "service" are positive and negative, respectively. In this case, however, it is hard to determine the overall sentiment since the sentence mixes positive and negative expressions.
Typically, researchers use machine learning algorithms to classify the sentiment of given aspects in sentences.
Some early work manually designed features, e.g., sentiment lexicons and linguistic features, to train classifiers for aspect level sentiment classification (Jiang et al., 2011; Wagner et al., 2014). Later, various neural network-based methods became popular for this task (Tang et al., 2016b; Wang et al., 2016), as they do not require manual feature engineering. Most of them are based on long short-term memory (LSTM) networks (Tang et al., 2016a; Huang et al., 2018), and a few use convolutional neural networks (CNNs) (Huang and Carley, 2018; Xue and Li, 2018).
Most of these neural network-based methods treat a sentence as a word sequence and embed aspect information into the sentence representation via various mechanisms, e.g., attention (Wang et al., 2016) and gating (Huang and Carley, 2018). These methods largely ignore the syntactic structure of the sentence, which would be beneficial for identifying sentiment features directly related to the aspect target. When an aspect term is far from its sentiment phrase, it is hard to find the associated sentiment words in the word sequence. For example, in the sentence "The food, though served with bad service, is actually great", the word "great" is much closer to the aspect "food" in the dependency graph than in the word sequence. Using dependency relationships is also helpful for resolving potential ambiguity in a word sequence. In the simple sentence "Good food bad service", "good" and "bad" could be used interchangeably, and an attention-based method has difficulty deciding which of them is associated with "food" and which with "service". However, a human reader with good grammar knowledge can easily recognize that "good" is an adjectival modifier of "food" while "bad" is the modifier of "service".
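To illustrate how such a dependency graph can be obtained in practice, here is a minimal sketch. The use of spaCy is our assumption (the paper does not prescribe a parser), and the exact parse and labels depend on the parser and model.

```python
# Sketch: extracting the dependency edges that a syntax-aware model would consume.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Good food bad service")

# Each token points to its syntactic head; the root points to itself.
edges = [(tok.text, tok.dep_, tok.head.text) for tok in doc]
print(edges)
# A typical parse attaches "Good" to "food" and "bad" to "service" as
# adjectival modifiers (amod), so each aspect is directly linked to its
# own sentiment word regardless of surface word order.
```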
In this paper, we propose a novel neural network framework named target-dependent graph attention network (TD-GAT), which leverages the syntactic structure of a sentence for aspect level sentiment classification. Unlike the previous methods above, our approach represents a sentence as a dependency graph instead of a word sequence. In the dependency graph, the aspect target and related words are connected directly. We employ a multi-layer graph attention network to propagate sentiment features from important syntactic neighbourhood words to the aspect target. We further incorporate an LSTM unit in TD-GAT to explicitly capture aspect-related information across layers during recursive neighbourhood expansion. Though some prior work tries to incorporate syntax knowledge using recursive neural networks (Dong et al., 2014), it has to convert the original dependency tree into a binary tree, which may move syntax-related words away from the aspect term. Compared to Dong et al. (2014), one advantage of our approach is that it keeps the original syntactic order unchanged.
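As a rough sketch of this idea (not the released TD-GAT code; the single-head masked attention, shapes, and names below are our assumptions), one layer attends over dependency-graph neighbours and then updates each node with an LSTM cell so aspect-related information is carried across layers:

```python
import torch
import torch.nn as nn

class GraphAttnLSTMLayer(nn.Module):
    """Sketch: attend over dependency-graph neighbours, then update each node
    state with an LSTM cell so aspect information is carried across layers."""

    def __init__(self, hidden=64):
        super().__init__()
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.cell = nn.LSTMCell(hidden, hidden)

    def forward(self, h, c, adj):
        # h, c: (N, H) node hidden/cell states; adj: (N, N) adjacency of the
        # dependency graph with self-loops (1 = syntactic neighbour).
        scores = self.q(h) @ self.k(h).t() / h.size(-1) ** 0.5   # (N, N)
        scores = scores.masked_fill(adj == 0, float("-inf"))     # attend only to neighbours
        attn = torch.softmax(scores, dim=-1)
        msg = attn @ self.v(h)                                   # aggregate neighbour features
        h, c = self.cell(msg, (h, c))                            # recurrent update across layers
        return h, c

# Stacking several such layers lets sentiment words a few dependency hops away
# reach the aspect node, whose final state feeds a sentiment classifier (not shown).
```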
We apply the proposed method to the laptop and restaurant datasets from SemEval 2014 (Pontiki et al., 2014). Our experiments show that our approach outperforms multiple baselines with GloVe embeddings (Pennington et al., 2014). We further demonstrate that using BERT representations (Devlin et al., 2018) substantially boosts performance. In our analysis, we show that our model is lightweight in terms of model size: it achieves better performance while requiring fewer computational resources and less running time than fine-tuning the original BERT model.