Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints

Yufan Guo, Roi Reichart and Anna Korhonen

Inferring the information structure of scientific documents is useful for many downstream applications. Existing feature-based machine learning approaches to this task require substantial training data and suffer from limited performance. We propose guiding feature-based models with declarative domain knowledge encoded as posterior distribution constraints. We explore a rich set of discourse and lexical constraints, which we incorporate through the Generalized Expectation (GE) criterion. Our constrained model improves on the performance of existing fully and lightly supervised models. Even a fully unsupervised version of this model outperforms lightly supervised feature-based models, showing that our approach is useful even when no labeled data is available.
