Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions

Marta Recasens, Matthew Can and Dan Jurafsky

Coreference resolution systems rely heavily on string overlap (e.g., "Google Inc." and "Google") and perform poorly on mentions with very different words ("opaque" mentions), such as "Google" and "the search giant". Yet prior attempts to resolve opaque pairs using ontologies or distributional semantics hurt precision more than they improved recall. We present a new unsupervised method for mining opaque pairs. Our intuition is to "restrict" distributional semantics to articles about the same event, thus promoting referential match. Using an English comparable corpus of tech news, we built a dictionary of opaque coreferent mentions (only 3% of which are in WordNet). Our dictionary can be integrated into any coreference system (it increases the performance of a state-of-the-art system by 1% F1 on all measures) and can easily be extended using news aggregators.

