Measuring the Structural Importance through Rhetorical Structure Index

Narine Kokhlikyan, Yuqi Zhang, Alex Waibel and Joy Zhang

In this paper, we propose a novel Rhetorical Structure Index (RSI) to measure the structural importance of a word or phrase. Unlike TF-IDF and other content-driven measures, RSI identifies words or phrases that serve as structural cues in an unstructured document. We show that structurally motivated features with high RSI values are more useful than content-driven features for applications such as segmenting unstructured lecture transcriptions into meaningful segments. Experiments show that using RSI significantly improves segmentation accuracy compared to traditional content-based feature weighting schemes such as TF-IDF.

