Topically Driven Neural Language Model
Jey Han Lau¹,²  Timothy Baldwin²  Trevor Cohn²
¹ IBM Research
² School of Computing and Information Systems, The University of Melbourne
jeyhan.lau@gmail.com, tb@ldwin.net, t.cohn@unimelb.edu.au
Abstract
Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics.
1 Introduction
Topic models provide a powerful tool for extracting the macro-level content structure of a document collection in the form of latent topics (usually multinomial distributions over terms), with a plethora of applications in NLP (Hall et al., 2008; Newman et al., 2010a; Wang and McCallum, 2006). A myriad of variants of the classical LDA method (Blei et al., 2003) have been proposed, including recent work on neural topic models (Cao et al., 2015; Wan et al., 2012; Larochelle and Lauly, 2012; Hinton and Salakhutdinov, 2009).
Separately, language models have long been a foundational component of any NLP task involving generation or textual normalisation of a noisy input (including speech, OCR, and the processing of social media text). The primary purpose of a language model is to predict the probability of a span of text, traditionally at the sentence level, under the assumption that sentences are independent of one another, although recent work has started using broader local context such as the preceding sentences (Wang and Cho, 2016; Ji et al., 2016).
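Concretely, using standard notation (not reproduced from any particular model), a sentence-level language model factorises the probability of a sentence s = (w_1, ..., w_n) over its own words only, whereas a context-sensitive variant additionally conditions on material c from outside the sentence, such as the preceding sentences or, in our model, document topics:

P(s) = \prod_{t=1}^{n} P(w_t \mid w_1, \ldots, w_{t-1})

P(s \mid c) = \prod_{t=1}^{n} P(w_t \mid w_1, \ldots, w_{t-1}, c)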
In this paper, we combine the benefits of a topic model and a language model in proposing a topically-driven language model, whereby we jointly learn topics and word sequence information. This allows us both to sensitise the predictions of the language model to the larger document narrative using topics, and to generate topics which are better sensitised to local context and are hence more coherent and interpretable.
Our model has two components: a language model and a topic model. We implement both components using neural networks, and train them jointly by treating each component as a sub-task in a multi-task learning setting. We show that our model is superior to other language models that leverage additional context, and that the generated topics are potentially more coherent than LDA topics. The architecture of the model provides an extra dimension of topic interpretability, in supporting the generation of sentences from a topic (or mix of topics). It is also highly flexible, in its ability to be supervised and incorporate side information, which we show further improves language model performance. An open-source implementation of our model is available at: https://github.com/jhlau/topically-driven-language-model.
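To give a flavour of the multi-task setup, the sketch below conditions an LSTM language model on a document vector built as a mixture over learned topic vectors, and sums the losses of the two sub-tasks. It is a simplified illustration rather than the released implementation: the module names, the bag-of-words document encoder, and all dimensions are placeholder choices.

```python
# Simplified sketch of the joint topic + language model idea (illustrative
# placeholder architecture, not the released implementation). The topic
# component maps a document bag-of-words to a mixture over topic vectors;
# the language model is an LSTM conditioned on the resulting document vector.
import torch
import torch.nn as nn


class TopicallyDrivenLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=600, num_topics=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Topic component (placeholder): document bag-of-words -> topic mixture.
        self.doc_to_topic = nn.Linear(vocab_size, num_topics)
        self.topic_vectors = nn.Parameter(torch.randn(num_topics, hidden_dim))
        self.topic_out = nn.Linear(hidden_dim, vocab_size)  # topic sub-task: predict document words
        # Language model component: LSTM over the sentence, conditioned on the document vector.
        self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.lm_out = nn.Linear(hidden_dim, vocab_size)     # LM sub-task: predict the next word

    def forward(self, doc_bow, sent_in):
        # doc_bow:  (batch, vocab_size) bag-of-words counts for the document context
        # sent_in:  (batch, seq_len) word ids of the current sentence
        topic_mix = torch.softmax(self.doc_to_topic(doc_bow), dim=-1)  # (batch, num_topics)
        doc_vec = topic_mix @ self.topic_vectors                       # (batch, hidden_dim)
        topic_logits = self.topic_out(doc_vec)                         # (batch, vocab_size)
        emb = self.embed(sent_in)                                      # (batch, seq_len, embed_dim)
        doc_rep = doc_vec.unsqueeze(1).expand(-1, emb.size(1), -1)     # broadcast over time steps
        hidden, _ = self.lstm(torch.cat([emb, doc_rep], dim=-1))
        lm_logits = self.lm_out(hidden)                                # (batch, seq_len, vocab_size)
        return lm_logits, topic_logits

# Multi-task training sums the two cross-entropy losses, e.g.:
#   lm_logits, topic_logits = model(doc_bow, sent_in)
#   loss = F.cross_entropy(lm_logits.transpose(1, 2), sent_target) \
#        + F.cross_entropy(topic_logits, doc_word_target)
```

The released implementation linked above should be consulted for the actual architecture and training details.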
2 Related Work
Griffiths et al. (2004) propose a model that learns
topics and word dependencies using a Bayesian
framework. Word generation is driven by either
LDA or an HMM. For LDA, a word is generated
based on a sampled topic in the document. For the