Prashant Mathur

Master's Thesis

Saturday, October 01, 2011

This work attempts to build a system for automatically translating noun compounds (NCs), also called nominal compounds, from English to Hindi. A noun compound is a sequence of nouns that acts as a single noun, e.g., colon cancer, suppressor protein, or colon cancer tumor suppressor protein. Noun compounds comprise 3.9% and 2.6% of all tokens in the Reuters corpus and the British National Corpus (BNC), respectively. As of today, no good system exists for translating multi-word expressions from English to any Indian language. We evaluated two state-of-the-art systems, Moses and Google's translation system, on noun compound translation from English to Hindi. On a test set of 300 noun compounds, Google's translation system achieves an accuracy of 57%, while Moses, a statistical machine translation system, achieves 48%. Given how frequent noun compounds are and how often these systems mistranslate them, automatic NC translation from English to Hindi is an important subtask of machine translation. We build a Noun Compound Translation (NCT) system that achieves an accuracy of 64% on the same test set.

You can find the thesis here