Machine Translation History

During the 1950s, enthusiasts made extraordinary claims for the new machine translation technology, which had lofty goals and promised quick and cheap translation. DARPA funded a computer program to translate Soviet documents into English. The difficulties of machine translation became clear when the Russian term for "hydraulic ram" was translated as "water goat." The disastrous failure of the 1950s machine translation effort was followed by a backlash of skepticism.

One hallmark of the Air Force Foreign Technology Division (FTD) was (and continues to be for HQ NASIC) its machine translation (MT) capabilities. In 1955, the Rome Air Development Center at Griffiss AFB, New York, was tasked to develop an MT system for the center. The IBM Mark I Translating Device produced its first automated translation in 1959, and, in October 1963, FTD installed the Mark II, which provided word-for-word Russian language translations at the rate of about 5,000 words per hour.

The National Air and Space Intelligence Center (NASIC) has been developing, operating, and maintaining Systran [MT] systems since 1969. In July 1970, FTD upgraded to an IBM 360 Systran system. Translation speed increased 20-fold and the system analyzed the Russian text sentence-by-sentence to provide improved grammar and syntax. In October 1982, an optical character reader was added to the system to more fully automate text translation.

In September 1971, the Air Force's Rome Air Development Center (RADC) developed an English-to-Vietnamese automated translator. Designed to operate on the IBM 360/67 computer, the translation system had an output rate of 80,000 to 100,000 words per hour. As part of the overall "Vietnamization Program," RADC produced in May an automated translation from English to Vietnamese of AF Manual 51-37, Instrument Flying. The translation was accomplished using the LOGOS I System for English-to-Vietnamese machine translation.

By the late 1970s, machine translation projects fell into three types: those relying on "brute force" methods involving larger and faster computers; those based on a linguistic tradition which asserts that the knowledge required for machine translation can be assimilated to the structure of a grammar-based system with a semantic component; and those stemming from artificial intelligence research, with an emphasis on knowledge structures. At that time the artificial intelligence approach seemed to have the best chance of simulating the communicative abilities necessary for realistic machine translation, and it offered an account of how knowledge structures might cope with one of the classic problems of machine translation: that of metaphor, or "semantic boundary breaking."

Machine translation efforts at RADC concluded on 27 October 1980 upon completion of a German/English translation system, dubbed METAL. Developed in conjunction with the University of Texas at Austin, the third-generation machine translated with an accuracy rate of 83 percent. Since the program's beginnings 25 years earlier as an in-house research and development project, the Center had designed translation machines for the Russian, Chinese, and Vietnamese languages.

Today's MT capabilities provide translation "on-the-fly." Within seconds of receiving text, the computer begins providing the translation. In addition, almost all HQ NASIC personnel have access to the interactive machine translation system. Russian is the most "robust" language, with built-in Russian translation dictionaries containing more than 350,000 words and expressions.

The Systran MT systems are the only known MT systems that cover the wide range of systems of interest to NASIC and which employ the context-sensitive language analysis that is compatible with NASIC's systems. In addition, Systran MT systems have been identified as the only Department of Defense Intelligence Information System (DODIIS) migration MT System by the DODIIS Migration Board. Existing Systran MT systems include Russian-English, French-English, German-English, Chinese-English, Spanish-English, Korean-English, Slovak-English, Albanian-English, Ukrainian-English, Serbo-Croatian-English, Japanese-English, Polish-English, English-Chinese, English-Japanese, English-Korean, Czech-English, Arabic-English, Urdu-English, and Farsi-English.

Over the past few years there has been a significant research program funded by ARPA, NSA and other government agencies to develop and test automatic machine translation algorithms. While this research program has been constrained to a limited source of documents and a limited set of languages, results so far have been very promising. However, a follow-on program is needed to transfer the results of this research into operational use. NSA sponsored work to extend the applicability of the best language translation algorithms to more languages and more general domains; to improve the computational efficiency of those algorithms; to port those algorithms to networked workstations; and to develop good human-machine interfaces to allow easy control and operation of the system.

For textual information, there are ongoing research programs for document retrieval by topic, for data extraction and for machine translation. For several years, ARPA, NSA and other agencies conducted and sponsored research programs to develop algorithms for large vocabulary, continuous speech recognition. A follow-on to this research program was needed to further improve the recognition algorithms and to build a prototype speech recognition system and a system capable of processing continuous speech dictation of arbitrary text.

NSA sponsored work to extend the applicability of the best large vocabulary continuous speech recognition systems to vocabularies with sizes up to 50,000 words and to languages other than English; to improve the computational efficiency of those algorithms; to port those algorithms to networked workstations; and to develop effective human-machine interfaces to allow easy training, testing and general use of the system. The goal of the program is to deliver a usable prototype system for taking dictation on arbitrary topics using continuous speech input.

A major effort was initiated for development of efficient and reliable text summarization technology. Text summarization will combine existing text generation systems with a new understanding of how to identify key points of information in a text to reduce the volume of text an analyst needs to review. Prototype development for text summarization and relevance feedback from users is a near-term goal of the program.

By 2000 many projects to develop and use technology, including machine translation tools, for foreign language training and processing were under way in the Intelligence Community with funding from the National Foreign Intelligence Program, Joint Military Intelligence Program, and the Tactical Intelligence and Related Activities budget. A number of pilot projects were under way that could eventually help IC analysts and information processors deal with the increasing volume of foreign language material.

But humans remained a key part of this equation. The trend was toward development of tools that are intended to assist rather than replace the human language specialist and the instructor. Still, though this capability was not intended to replace humans, it was increasingly useful in niche areas, such as technical publications.

By 2003 the performance of machine translation technology on Arabic news feeds had vastly improved from essentially garbled output to nearly edit-worthy text, often understandable down to the level of individual sentences. This work pointed the way to unprecedented capabilities for exploiting huge volumes of speech and text in multiple languages.


