Discriminative Training of 150 Million Translation Parameters and Its Application to Pruning

Hendra Setiawan and Bowen Zhou

Until recently, the application of discriminative training to log-linear-based statistical machine translation has been limited to tuning the weights of a handful of features or to training features with a limited number of parameters. In this paper, we propose to scale up discriminative training to train features with 150 million parameters, which is one order of magnitude more than previously published systems, and to apply discriminative training to redistribute the probability mass that is lost due to model pruning. The experimental results confirm the effectiveness of our proposals on the NIST MT06 test set over a strong hierarchical phrase-based baseline.
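To make the setting concrete, here is a minimal sketch (not the authors' implementation) of discriminative training for a log-linear model: hypotheses are scored by a dot product of weights and sparse features, and a perceptron-style update moves the weights toward the reference translation. The feature names and toy data are illustrative assumptions.

```python
# Hedged sketch of discriminative training for a log-linear model.
# Feature names, data, and the update rule (simple perceptron) are
# illustrative assumptions, not the paper's exact method.

def score(weights, features):
    """Log-linear model score: dot product of weights and sparse features."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

def perceptron_update(weights, gold_features, pred_features, lr=1.0):
    """Move weights toward the gold hypothesis, away from the wrong one."""
    for k, v in gold_features.items():
        weights[k] = weights.get(k, 0.0) + lr * v
    for k, v in pred_features.items():
        weights[k] = weights.get(k, 0.0) - lr * v
    return weights

# Toy example: two hypotheses for one source sentence, each represented
# as a sparse feature map (hypothetical feature names).
gold = {"phrase:guten_tag->good_day": 1.0, "lm": -2.0}
pred = {"phrase:guten_tag->good_tag": 1.0, "lm": -5.0}

w = {}
# If the model currently prefers the wrong hypothesis, update the weights.
if score(w, pred) >= score(w, gold):
    perceptron_update(w, gold, pred)
```

At the paper's scale, each sparse feature (e.g., every translation rule) carries its own weight, which is how the parameter count reaches the hundreds of millions.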
