Anusaaraka – IIIT-H’s dream of overcoming the language barrier

The International Institute of Information Technology, Hyderabad (IIIT-H) has come up with an innovative project to aid translation from English to Indian languages and vice versa. Sanskrit grammar is considered to be the most accurate and detailed description of the language. Panini, between the 6th and 4th centuries BCE, set down the rules of the language based on the prevalent dialects. Languages of the Indo-Aryan group such as Hindi, Marathi, Bangla, Gujarati and Bhojpuri are regional variants of Prakrit, and Prakrit is considered the daughter of Sanskrit. The grammars of this group of languages follow Sanskrit grammar to a great extent. A computer, for its part, demands precise instructions. The basic idea of this venture, then, seems to be to combine the precision of Sanskrit grammar, which underpins these other languages, with the capabilities of the computer in order to produce suitable translations.

The translation process goes through eight layers, one following the other. Ms Bhagyashri Kulkarni says, “The translation is based on a rule-based approach and not a statistical approach, the one used by Google Translate. The rules are in CLIP format. The conditions are given to the machine as per the requirements of the languages.” This ensures better accuracy in the Indian context. “The rules are given to disambiguate the meaning of the words. These rules are called WSD (word sense disambiguation) rules. For example, take the sentences ‘The book is very old’ and ‘My granny is very old’. The word ‘old’ is used in a different sense in each sentence, and Hindi has two different words for the two senses: 1. पुरानी 2. बूढ़ी. What was required here was a rule to disambiguate the meaning of the word ‘old’,” explains Bhagyashri. Prof. Vineet Chaitanya and Prof. Rajeev Sangal came up with this idea. The Akshar Bharati team has developed the main architecture of the project.
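The ‘old’ example above can be sketched in a few lines of Python. This is only a toy illustration of how an ordered rule might pick a sense from context; it is not the project's actual rule format, and the names used here (`ANIMATE_NOUNS`, `WSD_RULES`, `disambiguate`) and the tiny noun list are assumptions made up for the sketch. Both Hindi forms are the feminine ones quoted above, which happen to fit ‘book’ and ‘granny’.

```python
# Hypothetical toy lexicon: nouns this sketch treats as animate (all feminine,
# so that the feminine Hindi forms from the article agree with them).
ANIMATE_NOUNS = {"granny", "grandmother", "woman", "girl"}

# Each rule is (target word, condition on the sentence tokens, Hindi sense).
# Rules are tried in order and the first one whose condition holds wins,
# so the last rule acts as the default sense.
WSD_RULES = [
    ("old", lambda tokens: any(t in ANIMATE_NOUNS for t in tokens), "बूढ़ी"),
    ("old", lambda tokens: True, "पुरानी"),
]


def tokenize(sentence):
    """Lowercase the sentence and strip simple punctuation from each word."""
    return [w.strip(".,!?'\"").lower() for w in sentence.split()]


def disambiguate(word, sentence):
    """Return the Hindi sense chosen for `word`, or None if no rule applies."""
    tokens = tokenize(sentence)
    if word not in tokens:
        return None
    for target, condition, sense in WSD_RULES:
        if target == word and condition(tokens):
            return sense
    return None


if __name__ == "__main__":
    for s in ("The book is very old.", "My granny is very old."):
        print(s, "->", disambiguate("old", s))
```

A real rule would look at the grammatical relation between ‘old’ and the noun it describes rather than at any noun in the sentence; the flat token check here is a simplification.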

The main purpose of the project is to give accurate translations even when little data is available for a language. Bhagyashri further elaborates, “Google works with the statistical approach because it has a huge amount of data, and still we know how accurate its output is. Such an approach picks up the natural style of the language, but the translation is not always faithful. Take, for example, ‘Reading is mandatory.’ The output of a system based on the statistical approach may be पढ़ना जरुरी है। whereas the rule-based approach may give the output पढ़ना आवश्यक है। So there are minute differences between the two senses, and the rule-based approach tries to catch them instead of giving you a translation which sounds natural but is not faithful to the original message.”

The underlying purpose of this project is to give the masses access to government and other official documents, which are predominantly in English. This will surely go a long way in ensuring better governance. Students of computer science could be of great help in developing the tools and analysing the rules. On the language side, the project needs linguists, especially those who have learned Sanskrit grammar (not necessarily experts, but people who at least know how the language conveys meaning). It is a great opportunity to fuse modern facilities with ancient knowledge systems and make the result available to the common man. In an attempt to popularise the project, Dr. Soma Paul, who currently handles major parts of it, is taking the project to nearby schools and receiving an immense response from the children.

Talking about her own experience, Bhagyashri tells us, “Any language lover would enjoy the work to the core. The brainstorming sessions make you think about your own language patterns, and the deeper you go, the more you discover of the miracles a language can perform. It is surely a great learning experience.”

Special thanks to Ms Bhagyashri Kulkarni, who was a part of this project for some time and has shared detailed insights into it.
