Title of Invention

A METHOD AND SYSTEM TO CREATE LANGUAGE INTEROPERABILITY ENVIRONMENT

Abstract
A method and system for run time translation of input, independent of its language and format, wherein the language background knowledge is used to convey context of the input using internet based communication protocol to create a Language Interoperability Environment (LIE), said method comprising steps of: identifying and tagging the input received, for document format and format details, analyzing the tagged input using Source Language Analyzer (SLA) to obtain broken-down words and word groups along with their grammatical features, replacing the analyzed input to its target language(s) using Multi Language Mapper (MLM) wherein the MLM is a database with source language equivalent elements in other languages, generating words taking root, its lexical category and grammatical features using Target Language Generator (TLG) to give translated output in target language, and applying the identified and tagged information to the translated output before sending it to an intended destination.
Field of the Present Invention
The present invention addresses the problem of communication between people speaking different languages, which is of enormous proportions. An equally enormous endeavor is to build a Language Interoperability Environment (LIE) which can translate content (text or speech) from a source language to target language(s) at run time and help people of the world to communicate with each other and to access the wealth of information that is available the world over.
Background of the Present invention:
Communication between people speaking different languages has always troubled mankind. Especially in a country like India with so many languages the problem is more pronounced and magnified.
There are a few possible solutions to this problem:
1. Either one of them or both of them know the other person's language, or both of them know a common language like English.
2. Employ a translator to help communicate with each other.
3. Use sign language
Offices have become truly cosmopolitan with people from different parts of the country speaking different languages. With the advent of computers / internet, globalization and the need to communicate with people speaking different languages, English has become the middleman.
English is the language of modern day research and scholarship, and has emerged as the lingua-franca of the modern world and the globalized society. All important works are translated into English from the different languages of the world resulting in the creation of an immense knowledgebase.

The number of people who know English is minuscule, yet the Internet is dominated by English. Advanced countries have made enough investments in systems to ensure that their languages are adapted to the digital age. It is in the poor and the developing countries where the problem is acute - in Asia and Africa. Unfortunately it is here that the maximum diversity exists.
These immense groups of people who are left out are forced to catch up at the cost of neglecting their languages. With little investment and a lack of focus, the usage of these languages has been coming down drastically. This is evident in the new generation of people.
Experts say in 2-3 centuries there will be only 4-5 languages in use on planet Earth; a repeat of the Rosetta script - only larger in scale.
Languages have evolved over centuries and have literature, knowledgebase and a world of their own. Mankind will lose something so great and immense, and the worst part is - it is irreversible. There is an urgent need to evolve a system to keep them alive. Languages survive only through usage and this can happen only if people are able to transact in their own languages. Computers and Internet have become modern day necessities and people should be able to use them and access the world's information in their own languages and feel empowered.
Prior Art of the present Invention
To reach people speaking different languages, translations have been popular for centuries, and the Bible is the most translated book.
There have been developments in the Machine Translation (MT) arena even though it is in its infancy.
The current state is: though there are multilingual sites and translation software available, they operate independently and are inaccurate.
Many web sites like Alta Vista and Google have a translation facility where a user can type in text and ask the system to translate it to other languages. Unfortunately, the translation is 'word to word'. It is not even repeatable: if a user types in text in English and asks the system to translate it to, say, French, then copies the output, pastes it into the input box and asks the system to translate it back to English, the output will be different from the original input.
To address this issue of enormous proportions, an equally enormous endeavor is to build a Language Interoperability Environment (LIE) which can translate content from a source language to target language(s) at run time and help people of the world to communicate with each other and to access the wealth of information that is available the world over.
Even though computers and processing power have increased multifold in the last decade, a major weakness of the machine remains - that it has little or no common sense or world knowledge.
Fully-automatic general purpose high quality machine translation systems (FGH-MT) are extremely difficult to build. In fact, there is no system in the world that qualifies as a FGH-MT. The reasons are very evident and are not difficult to locate. Translation is a creative process that involves interpretation of the given text by a translator and also varies depending on the audience and the purpose. This explains the difficulty of building a machine translation system.
The major difficulty the machine faces in interpreting a given text is the lack of general world knowledge or common knowledge, subject specific knowledge, knowledge of the context, etc. which can be collectively called as 'background knowledge'. The difficulty the machine faces at the first level pertains to information coded in a text.
To overcome the complexities of such a large scale MT system, the most common approach has been to delimit the subject domain so that machine works in a narrow subject area, such as, weather reports, computer manuals, etc. It has been hoped that by delimiting MT in a narrow area, one stands a better chance of using context, domain knowledge, etc. The system would perform badly when given a text outside the domain but that is a limitation one would have to live with. The real difficulty is in identifying a domain that is narrow enough that the system works well, and wide enough that enough real texts qualify to be in it, so that it is practically useful.

When some information is transferred from one language to another, there is no way to express it exactly. There will be losses/imperfections to some extent, as in any other case where you see transmission or interpretation losses whenever something transforms from one medium to another.
LIE addresses these issues. Another important aspect is: LIE is NOT aimed at translating literary material like poetry but at everyday content. The kind of language used in everyday life is fairly simple, and LIE is meant to help people with it as much as possible.
Brief description of the Accompanying drawings:
Figure 1: represents mapping of all the languages in Multi Language Mapper (MLM)
Figure 2: represents a summarized LIE
Figure 3: represents LIE specific to speech
Figure 4: represents an elaborate LIE
Objects of the Present Invention
The main object of the present invention is to develop a method for run time translation of input, independent of its language and format.
Yet another object of the present invention is to develop a method wherein the language background knowledge is used to convey context of the text.
Still another object of the present invention is to develop said method using internet based protocol.
Still another object of the present invention is to develop said method in order to create a Language Interoperability Environment (LIE).
Another main object of the present invention is to develop a system for run time translation of input, independent of its language and format.

Yet another object of the present invention is to develop a system wherein the language background knowledge is used to convey context of the text.
Still another object of the present invention is to develop said system using internet based protocol.
Still another object of the present invention is to develop said system in order to create a Language Interoperability Environment (LIE).
Statement of the present Invention
The present invention relates to a method for run time translation of input, independent of its language and format, wherein the language background knowledge is used to convey context of the input using internet based protocol in order to create a Language Interoperability Environment (LIE), said method comprising steps of sending the input in source language to Source Language Analyzer (SLA), analyzing the input using SLA to obtain broken-down word groups along with their grammatical features, replacing the analyzed input to its target language(s) using Multi Language Mapper (MLM), generating words taking root, its lexical category and grammatical features using Target Language Generator (TLG), and receiving the output in target language(s) in identical format at an intended destination. The present invention also relates to a system for run time translation of input, independent of its language and format, wherein the language background knowledge is used to convey context of the input using internet based protocol in order to create a Language Interoperability Environment (LIE), said system comprising: means for sending the input in a source language to Source Language Analyzer (SLA); means for analyzing the input using SLA to obtain broken-down word groups along with their grammatical features, thereafter replacing the analyzed text to its target language(s) using Multi Language Mapper (MLM), and thereby generating words taking root, its lexical category and grammatical features using Target Language Generator (TLG); and means for receiving the output in target language(s) in identical format at an intended destination.
Detailed description of present invention:

Accordingly, the present invention relates to a method for run time translation of input, independent of its language and format, wherein the language background knowledge is used to convey context of the input using internet based protocol in order to create a Language Interoperability Environment (LIE), said method comprising steps of
a) sending the input in source language to Source Language Analyzer (SLA),
b) analyzing the input using SLA to obtain broken-down word groups along
with its grammatical features,
c) replacing the analyzed input to its target language(s) using Multi
Language Mapper (MLM),
d) generating words taking root, its lexical category and grammatical features
using Target Language Generator (TLG), and
e) receiving the output in target language(s) in identical format at an intended
destination.
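Steps (a) to (e) can be pictured as a chain of three stages. The following is only a minimal sketch under stated assumptions: the function names, the `Token` structure, the tiny English-to-French word list and the plural handling are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class Token:
    root: str        # dictionary form of the word
    category: str    # lexical category, e.g. "noun"
    features: dict   # grammatical features, e.g. {"number": "plural"}

# Hypothetical toy data; a real MLM would be a large multilingual database.
SOURCE_ROOTS = {"book": "noun", "cat": "noun"}
MLM = {("book", "fr"): "livre", ("cat", "fr"): "chat"}

def analyze(text):
    """SLA stand-in: split into words and strip a plural 's' to find the root."""
    tokens = []
    for word in text.lower().split():
        if word in SOURCE_ROOTS:
            tokens.append(Token(word, SOURCE_ROOTS[word], {"number": "singular"}))
        elif word.endswith("s") and word[:-1] in SOURCE_ROOTS:
            tokens.append(Token(word[:-1], SOURCE_ROOTS[word[:-1]], {"number": "plural"}))
        else:
            tokens.append(Token(word, "unknown", {}))  # unknown words pass through
    return tokens

def map_tokens(tokens, target):
    """MLM stand-in: replace each source root by its target-language equivalent."""
    return [Token(MLM.get((t.root, target), t.root), t.category, t.features)
            for t in tokens]

def generate(tokens):
    """TLG stand-in: rebuild surface words from root and grammatical features."""
    words = [t.root + "s" if t.features.get("number") == "plural" else t.root
             for t in tokens]
    return " ".join(words)

def translate(text, target):
    return generate(map_tokens(analyze(text), target))
```

Calling `translate("books cat", "fr")` runs the three stages in order. Each stage only sees the output of the previous one, which is what allows a separate engine to be built per language.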
In an embodiment of the present invention, wherein the method further comprises editing the text at steps (a) and/or (e) using pre- and post-editor respectively.
In yet another embodiment of the present invention the input and output are text or speech (Figure 3).
In still another embodiment of the present invention, wherein the pre-editor provides for editing, identifying non-standard forms, seeking corrections and offering alternatives in order to choose correct form.
In still another embodiment of the present invention the sent text is tagged to characterize the format.
In still another embodiment of the present invention SLA comprises Word Splitter (WS), Morphological Analyzer (MA) and Language Rules Engine (LRE).

In still another embodiment of the present invention the WS analyzes and separates words and word groups.
In still another embodiment of the present invention, wherein the MA analyzes each word and produces its root and grammatical features.
In still another embodiment of the present invention, wherein the MA breaks up each word into a root and a suffix at different points to look-up the proposed root in dictionary and the proposed suffix in a suffix table.
In still another embodiment of the present invention, characters are added and/or deleted during breakup of words.
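The root and suffix breakup described for the MA can be sketched as follows. This is an illustrative sketch only, using English words for readability; the dictionary, the suffix table and the two repair rules (adding a vowel, deleting a doubled letter) are hypothetical examples and not the rule set of any actual LIE engine.

```python
# Hypothetical dictionary of roots and table of suffixes.
ROOTS = {"make": "verb", "run": "verb", "walk": "verb"}
SUFFIXES = {
    "ing": {"aspect": "progressive"},
    "ed": {"tense": "past"},
    "s": {"person": "3sg"},
}

def candidate_roots(stem):
    """Yield the stem as is, with a vowel added, and with a doubled final letter deleted."""
    yield stem
    yield stem + "e"                          # mak -> make
    if len(stem) > 2 and stem[-1] == stem[-2]:
        yield stem[:-1]                       # runn -> run

def analyze_word(word):
    """Try every breakup point; accept the first one where both lookups succeed."""
    if word in ROOTS:
        return word, ROOTS[word], {}
    for cut in range(len(word) - 1, 0, -1):
        stem, suffix = word[:cut], word[cut:]
        if suffix not in SUFFIXES:
            continue                          # suffix table lookup failed
        for root in candidate_roots(stem):
            if root in ROOTS:                 # dictionary lookup succeeded
                return root, ROOTS[root], SUFFIXES[suffix]
    return None                               # no valid breakup found
```

For example, `analyze_word("making")` tries the breakups `makin+g` and `maki+ng` before `mak+ing` succeeds once the vowel `e` is added to the proposed root.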
In still another embodiment of the present invention, wherein the MLM replaces elements of source language with elements of target language(s) using database having equivalent elements of the source language in all other languages.
In still another embodiment of the present invention, wherein the TLG comprises Word Grouper (WG), Morphological Synthesizer (MS) and Language Rules Engine (LRE).
In still another embodiment of the present invention the WG analyzes and separates and/or combines words and word groups.
In still another embodiment of the present invention, wherein the MS synthesizes words taking root, its lexical category, grammatical rules and features.
In still another embodiment of the present invention, wherein the LRE helps check lexical category, exceptions, grammatical rules and features.
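The interplay between the MS and the LRE can be illustrated with a small sketch in which exceptions are consulted before the regular rules. The rule and exception tables below are hypothetical English examples chosen for readability; the specification does not define their contents.

```python
# Hypothetical exception list and regular rules for English as the target language.
EXCEPTIONS = {("child", "plural"): "children", ("go", "past"): "went"}
RULES = {
    ("noun", "plural"): lambda root: root + ("es" if root.endswith(("s", "x", "ch", "sh")) else "s"),
    ("verb", "past"): lambda root: root + ("d" if root.endswith("e") else "ed"),
}

def synthesize(root, category, feature):
    """Generate a word from a root, its lexical category and one grammatical feature."""
    if (root, feature) in EXCEPTIONS:         # exceptions take priority over rules
        return EXCEPTIONS[(root, feature)]
    rule = RULES.get((category, feature))
    return rule(root) if rule else root       # no applicable rule: return the root unchanged
```

Checking exceptions first keeps the regular rules simple, since irregular forms never reach them.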
In still another embodiment of the present invention, wherein the received text format reflects characteristics of the tagged sent text.

In still another embodiment of the present invention the editing provides for background knowledge to convey context of the text.
In still another embodiment of the present invention the method maintains meaning, information, context and concordance of the source language in the target language(s).
In another main embodiment of the present invention, wherein a system for run time translation of input, independent of its language and format, wherein the language background knowledge is used to convey context of the input using internet based protocol in order to create a Language Interoperability Environment (LIE), said system comprises:
1. means for sending the input in a source language to Source Language Analyzer
(SLA);
2. means for analyzing the input using SLA to obtain broken-down word groups
along with their grammatical features, thereafter replacing the analyzed text to its
target language(s) using Multi Language Mapper (MLM), and thereby generating
words taking root, its lexical category and grammatical features using Target
Language Generator (TLG); and
3. means for receiving the output in target language(s) in identical format at an
intended destination.
In still another embodiment of the present invention, the system further comprises means for pre-editing/post-editing.
In still another embodiment of the present invention the input and output are text or
speech.
In still another embodiment of the present invention the SLA comprises Word Splitter (WS), Morphological Analyzer (MA) and Language Rules Engine (LRE).

In still another embodiment of the present invention the MA breaks up each word into a root and a suffix at different points and looks up the proposed suffix in a suffix table.
In still another embodiment of the present invention the MLM is a database having the equivalent elements of the source language in all the other languages.
In still another embodiment of the present invention the TLG comprises Word Grouper (WG), Morphological Synthesizer (MS) and Language Rules Engine (LRE).
In still another embodiment of the present invention the LRE has entire grammar rules and exceptions of the language.
In still another embodiment of the present invention the system maintains meaning, information, context and concordance of the source language in the target language(s).
Language Interoperability Environment (LIE) shown in Figure-1 is aimed at creating a:
1. Run time Machine Translation environment.
2. The reference language used is English to create MLM as all the languages of the
world have built dictionaries available between the respective languages and English.
For example: Kannada-English, French-English, Hindi-English, etc. This is done for
grammatical and morphological purposes also. Hence while designing LIE, English
will be used as the gold standard.
3. The LIE engines will translate from the source language to the target language(s) and
vice versa. There are 3 components as shown in Figure 2: Source Language Analyzer
(SLA), Multi Language Mapper (MLM) and Target Language Generator (TLG) and
have to be built in each language adhering to the overall architecture.
4. The Multi Language Mapper (MLM) is a huge database that has the equivalent
elements of the source language in all the other languages under consideration and
will be expanded to include many more languages when resources permit.

5. The Language Interoperability is achieved through creating standard interfaces and formats between the different LIE engines. For example:
a. A person can write a document in Kannada. Now the recipients can read the
document in Kannada.
b. If a recipient wants to read it in English/German/French/Tamil/Mandarin, he/she
can get the Kannada document translated using LIE-English/LIE-German/LIE-French/LIE-Tamil/LIE-Mandarin, etc.
This way we can achieve language interoperability. Thus LIE unites the entire world and its people together by empowering them to transact in their own languages with all others with the help of advanced technology, computers and connectivity. The result is that the entire world, its people and the immense knowledgebase opens in one's own language.
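The use of English as the reference language can be sketched as a two-step lookup: source language to English, then English to target language. The tables and the function name below are hypothetical and hold only a few illustrative French and German entries.

```python
# Hypothetical bilingual tables, each linking one language to the English reference entry.
TO_ENGLISH = {
    "fr": {"livre": "book", "eau": "water"},
    "de": {"buch": "book", "wasser": "water"},
}
# Reverse tables derived from the same data, so each language is entered only once.
FROM_ENGLISH = {
    lang: {english: word for word, english in table.items()}
    for lang, table in TO_ENGLISH.items()
}

def map_word(word, source, target):
    """Map a word between any two languages through the English reference entry."""
    english = word if source == "en" else TO_ENGLISH[source].get(word)
    if english is None:
        return None                           # word not present in the source table
    if target == "en":
        return english
    return FROM_ENGLISH[target].get(english)
```

With this layout, supporting N languages needs N bilingual tables against English rather than a separate table for each of the N(N-1) directed language pairs.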
LIE is a very large and very complex software system hosted on a powerful farm of servers. The system is made available in several flavors like:
1. Freely available on the web for writing and translating - email, chat, browsing
and searching the internet in many languages. This supports many concurrent users but limits the input to a few pages at a time.
2. A product that can be set up on powerful servers at local installations like large
corporations. This supports a few users but can take large inputs running into several pages.
3. Licensed and secure usage by companies for their corporate communication.
A basic LIE MT System consists of an analyzer of the source language, i.e. the Source Language Analyzer (SLA), whose output is fed to the Target Language Generator (TLG) for the generation of the target language. Between the analyzer and the generator there is a Multi Language Mapper (MLM) which uses multilingual dictionaries and grammar rules/exceptions with support from LRE to map the source language elements to target language(s) elements.

To make the system more usable, User Interface Editors are also provided for human pre-editing of the input and post-editing of the output. These are also part of the overall system.
The important components are described in Figure-2:
At a basic level, the Machine Translation (MT) is perceived as a sequence of independent steps/processes executed by the different modules of the overall software system. The Engines are different for different languages, hence for each language a separate system needs to be built which adheres to the overall system needs and architecture.
The input to the system is either formatted text (email, html, Microsoft Word document, Excel spread sheet, pdf... file) or voice.
The way in which LIE system works is described and represented in Figure-4:
1. A Listener software module receives the formatted input text - identifies and tags
them for characteristics such as:
• Original format: html, Microsoft Word document, Excel spread sheet, pdf
...file.
• Format details like paragraphs, fonts, bold/italic, etc.
• Source Language.
• Target Language(s).

2. If the input is speech, the voice modulations are analyzed by the Speech Analyzer
(SA), corresponding values are fetched from the MLM-speech database and the
output will be given using the Speech Generator (SG) in the target language.
Figure-4 explains this scenario.
3. Depending on the source language and the target language(s), the respective
software engines are invoked and the input is passed to them. Now onwards the
processing steps refer to the specific language engines.
4. The input text is passed to the pre-editor which is a user interface that allows the
user to edit and correct the input: words spelt with non-standard spellings are
changed to their standard spellings. It also points out the non-standard forms and
seeks corrections. It can also present alternatives out of which the user can choose
the correct form. The user can avoid this step if he/she wishes to do so.
5. The input text in a source language is passed through Source Language Analyzer
(SLA) which has components like: local Word Splitter and Morphological
Analyzer.
6. The local Word Splitter (WS) analyzes and separates words and word groups like
idioms and phrases.
7. The output is passed to Morphological Analyzer (MA) which is designed to
handle inflectional and derivational morphology. It analyzes each word and
produces its root and grammatical features using the elaborate Language Rules
Engine (LRE) which has the entire grammar rules and exceptions. For a given
word, it checks for the lexical category (such as pronoun, post-position, noun,
verb, etc.) and other grammatical features. It also tries to see whether the word
can be broken up into a root and a suffix. At the breakup point, some characters
such as vowels may be added or deleted. It may have to try several times to break
the word at different points. For each breakup it looks up the proposed root in the
dictionary and the proposed suffix in a suffix table. Whenever both lookups are
successful, that breakup is taken as valid. This is the output of the source system.
8. The Multi Language Mapper (MLM) is a huge database that has the equivalent
elements of the source language in all the other languages. MLM takes the output
produced so far to replace the elements of the source language with elements of
the target language(s) and kick starts the Target Language Generator (TLG)
processes of the respective languages.
9. The output of the MLM is fed into the respective Target Language Generator
(TLG) which has components like: local Word Grouper (WG) and Morphological
Synthesizer (MS). These are the reverse of Word Splitter and Morphological
Analyzer.
10. The Word Grouper (WG) analyzes and separates and/or combines words and word
groups like idioms and phrases.
11. Morphological Synthesizer (MS) takes a root, its lexical category and
grammatical features and generates words.

12. The output is fed into a Language Packager (LP) to package in the target
language. It applies the formats of the original text to the output text such as:
a) Original format: html, Microsoft Word document, Excel spread sheet, pdf,
...file
b) Format details like paragraphs, fonts, bold/italic, etc.
13. The output produced is the LIE system output. The post-editing user interface
allows the user to do post-editing rapidly. The user can avoid this step if he/she
wishes to do so. There are three levels of post-editing:
a. First level seeks to make the output grammatically correct.
b. In the second level, the raw output is corrected not only grammatically but also stylistically.
c. In the third level, the post-editor might change the setting and the events in the story to convey the same meaning to the reader who has a different cultural and social background. This is really trans-creation, and a creative post-editor can go all the way up to this level.
LIE takes the information in the source language text and presents it in the target language. Thus, at the prefix/suffix level, a prefix/suffix in the source language is replaced by a suitable element in the target language and at the word level, the source words are replaced by equivalent words in the target language. Similarly, the word groups are also replaced by equivalent groups in the target language.
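Replacement at the word-group level before the word level can be sketched as a longest-match lookup. This is a minimal sketch under stated assumptions: the English-to-French entries and the function name are hypothetical, and a real engine would also handle the prefix/suffix level.

```python
# Hypothetical English-to-French entries; word groups are keyed by tuples of words.
GROUPS = {("kick", "the", "bucket"): "casser sa pipe", ("thank", "you"): "merci"}
WORDS = {"the": "le", "cat": "chat", "you": "vous"}
MAX_GROUP = max(len(key) for key in GROUPS)

def replace_elements(words):
    """Replace the longest matching word group first, then fall back to single words."""
    out, i = [], 0
    while i < len(words):
        for size in range(min(MAX_GROUP, len(words) - i), 1, -1):
            key = tuple(words[i:i + size])
            if key in GROUPS:
                out.append(GROUPS[key])
                i += size
                break
        else:
            # No group starts here: replace the single word, or keep it if unknown.
            out.append(WORDS.get(words[i], words[i]))
            i += 1
    return out
```

Trying groups before single words prevents an idiom from being translated word by word.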
The LIE system is to be designed so that the combination of man and machine together can perform translations and the output is as close to the target language as possible.
If LIE enters into mainstream and common use, it has major implications for global communication and integration as a person can access documents in his/her language which will be a big asset.

The LIE answer to the world's communication problem is that it envisages building a massive IT backbone which can take input in the languages for which the LIE systems are built and give output in other languages and vice versa. The architecture and standards are defined in such a way that all the LIE engines adhere to a standard architecture and talk to each other based on defined document interchange standards which are based on open standards like Unicode, XML and web services. A person from Japan can transact in his own language - Japanese with a person from Germany who is transacting in his own language - German.
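A document interchange message based on Unicode and XML could look like the following sketch. The element and attribute names (`lie-message`, `target`, `content`, `source`, `format`) are hypothetical, since the specification names the open standards but does not define a schema.

```python
import xml.etree.ElementTree as ET

def build_message(text, source, targets, doc_format):
    """Wrap the input and its tags in a UTF-8 encoded XML envelope (hypothetical schema)."""
    root = ET.Element("lie-message", {"source": source, "format": doc_format})
    for lang in targets:
        ET.SubElement(root, "target").text = lang
    ET.SubElement(root, "content").text = text
    return ET.tostring(root, encoding="utf-8")

def read_message(data):
    """Parse the envelope back into its tagged parts."""
    root = ET.fromstring(data)
    return {
        "source": root.get("source"),
        "format": root.get("format"),
        "targets": [t.text for t in root.findall("target")],
        "content": root.find("content").text,
    }
```

A Japanese engine could call `build_message("こんにちは", "ja", ["de"], "html")` and a German engine could recover the same fields with `read_message`, without either needing to know the other's internals.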
The task of building a LIE machine translation system for each language is subdivided into two parts:
1. The first module, the core LIE, does language analysis based on language
knowledge: It takes all the information in the source text and presents it in its
output which is quite close to the target language.
2. The second module does domain specific knowledge based processing, statistical
processing, etc. It utilizes world knowledge, frequency information, concordances,
etc. to produce output in the target language.
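One way to picture the second module is as a selector that uses frequency information to choose among the candidates produced by the first module, and that falls back to the first module's own choice when it has no information. The frequency table, the domain names and the function below are hypothetical illustrations, not data from the specification.

```python
# Hypothetical per-domain frequency counts for candidate French words for "bank".
FREQUENCY = {
    "finance": {"banque": 120, "rive": 2},
    "geography": {"banque": 3, "rive": 80},
}

def choose(candidates, domain):
    """Pick the most frequent candidate; fall back to the first (module-one) candidate."""
    counts = FREQUENCY.get(domain)
    if not counts:
        return candidates[0]                  # no domain knowledge: keep core output
    best = max(candidates, key=lambda word: counts.get(word, 0))
    return best if counts.get(best, 0) > 0 else candidates[0]
```

Because the fallback always returns the core module's first candidate, the output remains usable even when the statistical layer has nothing to contribute.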
The advantages of said modular approach are as given below:
1. The first module can be made available for use at an earlier date since it requires less
effort and is easier to build. However, the user needs a certain amount of training to read
the output and make sense of it.
2. The early feedback guides the refinement and building of the system. Since the
system can be used at an early date, not only does it serve a useful purpose, it also
becomes easier to build the second module.

3. The system provides a robust layer in the first module which can be used even if the
second module fails to an extent in any specific context. The second module by its
very nature is fragile. The first module is made much more robust.
4. The segregation of said modules is critical to appreciate the boundaries of various
activities and accordingly co-ordinate in a better manner. It also facilitates due
recognition of language knowledge and also thereby the knowledge background to
ultimately achieve LIE.
5. LIE will be made available in only a few languages in the first phase of implementation,
because the software is very complex and needs teams from the respective language groups
and a lot of money to build and operate. The people speaking those languages will
validate the translations and their uses, and will help in refining the system. After
several such iterations a more robust environment can be developed and subsequently
enhanced to involve more languages. The ultimate aim is to develop the Language
Interoperability Environment (LIE) in most of the languages of the world and bring
the entire planet under one interoperable umbrella.
6. It is also envisaged that the knowledgebase is made available in the many languages
of the world at runtime. The philosophy is to provide access to "all the world's
information" through mechanized translation with interoperability mechanisms
inbuilt.
The invention is further elaborated with the help of the following examples. However, such examples should not be construed to limit the scope of the invention.
Example 1
A sample MLM table
A sample Multi Language Mapper (MLM) table which is part of the MLM database is given below:




The format of the input is maintained in the output with the $10000 in bold.
Similarly input formats like html, Microsoft Word, Excel, etc., are packaged accordingly
with the formats like paragraphs, bold, italics, punctuation marks, etc.
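The way format tags can survive translation is sketched below for the single case of bold text in simple HTML. The function names are hypothetical, only `<b>...</b>` is handled, and the `translate` argument stands in for the SLA, MLM and TLG stages.

```python
import re

def listen(html):
    """Listener stand-in: split simple HTML into (text, is_bold) runs."""
    runs = []
    for part in re.split(r"(<b>.*?</b>)", html):
        if not part:
            continue
        if part.startswith("<b>"):
            runs.append((part[3:-4], True))   # strip the <b> and </b> tags
        else:
            runs.append((part, False))
    return runs

def package(runs):
    """Packager stand-in: reapply the recorded bold tags to the output runs."""
    return "".join(f"<b>{text}</b>" if bold else text for text, bold in runs)

def translate_preserving_format(html, translate):
    """Translate each run separately so that its format tag stays attached to it."""
    return package([(translate(text), bold) for text, bold in listen(html)])
```

For instance, with `str.upper` as a placeholder translator, `translate_preserving_format("Pay <b>$10000</b> now", str.upper)` keeps the amount in bold while the surrounding text changes.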
Example 3
As shown in Figure 4, Person A sends an email with a Microsoft Word document attachment in the source language to Person B. This email goes to the LIE for transformation to the target language before it reaches Person B. The Listener initially tags the formats, such as the document format (html, Excel, Word, etc.) and the format characteristics (paragraphs, bold, italics, etc.), and optionally passes the input to the User Interface (UI) for pre-editing, where the user can edit non-standard forms/spellings. Further, the SLA takes the processed input, analyzes it and produces a broken-down structure and its grammatical features. MLM replaces each element of the source language with an element of the target language. In TLG a combination of man and machine together can perform translations and the output is as close to the target language as possible. The Packager applies the original formats with the help of tags produced by the Listener. The user can now edit the output from the Packager for non-standard forms/spellings. Person B now receives the email with the Microsoft Word attachment in the target language.


Patent Number 234635
Indian Patent Application Number 1227/CHE/2006
PG Journal Number 29/2009
Publication Date 17-Jul-2009
Grant Date 10-Jun-2009
Date of Filing 14-Jul-2006
Name of Patentee CHANDRASHEKAR RUDRAPPA KORANAHALLY
Applicant Address 45, 3rd Main, 1st phase, Basaveshwara Layout, Vijayanagar, Bangalore 560 040. KARNATAKA, INDIA
Inventors:
# Inventor's Name Inventor's Address
1 CHANDRASHEKAR RUDRAPPA KORANAHALLY 45, 3rd Main, 1st phase, Basaveshwara Layout, Vijayanagar, Bangalore 560 040. KARNATAKA, INDIA
PCT International Classification Number G06F9/44,G06F9/44
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 NA