


Language Translation IT-04

This DARPA project will develop and test powerful new technology for processing human languages that will provide critical capabilities for a wide range of national security needs. This technology will enable systems to (a) automatically exploit large volumes of speech and text in multiple languages; (b) revolutionize human-computer interaction via spoken and written English and foreign languages; (c) perform computing and decision-making tasks in stressful, time-sensitive situations; and (d) autonomously collate, filter, synthesize and present relevant information in timely and relevant forms.

This program element and project were created in accordance with congressional intent in the FY 2005 DoD appropriations bill. Prior year funding was budgeted in PE 0602301E, Project ST-29.

During 2003, progress was made on all fronts of the DARPA computerized speech and text translation programs. In tests administered by the National Institute of Standards and Technology, the Text-to-Text translation program was declared the world's best algorithm for translating Arabic-language news reports into English. Speech-to-Text efforts showed similar progress, reducing word error rates from the 50 percent level (where they had hovered for over a decade) to 13 percent for broadcast news and 18 percent for telephone conversations. The ongoing speech-to-speech program has been successfully deployed to Iraq, where the "Phraselator" has been used to translate the medical needs of Iraqi prisoners and for interrogation purposes.
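For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the NIST scoring tool, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared
    exactly (no case folding or text normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition, a 13 percent word error rate corresponds to roughly 13 word-level errors (substitutions, deletions, or insertions) per 100 reference words.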

The Compact Aids for Speech Translation (CAST) program is providing the tactical warfighter with real-time, face-to-face speech translation during combat and humanitarian operations in foreign territories. The program addresses domain-specific translation accuracy and response time. Early CAST prototypes relied on simple dictionaries and phrases. The program's main result was the rapid fielding of one-way translation systems (from English to multiple foreign languages) to warfighters. The DARPA Phraselator is the key prototype system in use today. The system was deployed in Operation Iraqi Freedom and Operation Enduring Freedom. Future versions will offer a more sophisticated, flexible and fluid translation and paraphrasing capability that is robust and conducive to normal human conversations.
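To make the "simple dictionaries and phrases" approach concrete, the sketch below shows one way a one-way phrase lookup could work: an English utterance is matched against a small table of known phrases, and the matching entry points to a prepared foreign-language output. This is an illustration under stated assumptions, not the Phraselator's actual design. The phrase table, the clip identifiers, the function name `lookup_phrase`, and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib
from typing import Optional

# Hypothetical phrase table: English phrase -> identifier of a
# prepared foreign-language output (placeholder IDs, not real data).
PHRASES = {
    "where does it hurt": "clip_001",
    "do you need water": "clip_002",
    "show me your identification": "clip_003",
}


def lookup_phrase(utterance: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the clip ID of the closest known phrase, or None.

    Assumption: the utterance has already been transcribed to English
    text; matching uses difflib's character-level similarity ratio.
    """
    normalized = " ".join(utterance.lower().split())
    matches = difflib.get_close_matches(
        normalized, list(PHRASES), n=1, cutoff=cutoff
    )
    return PHRASES[matches[0]] if matches else None


if __name__ == "__main__":
    print(lookup_phrase("Where does it hurt"))
    print(lookup_phrase("zzzz"))
```

A table of this kind can only translate in one direction and only for phrases it already contains, which is the limitation that two-way, free-form translation aims to remove.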

The Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program will develop technologies that enable robust, spontaneous, two-way tactical speech communications between American warfighters and native speakers. The program addresses the issues surrounding the rapid deployment of new languages, especially low-resource languages and dialects. TRANSTAC will build on existing speech translation platforms developed in CAST to create a rapidly deployable language tool that will meet the military's language translation needs. For example, the program will add a two-way translation capability and will include Arabic dialects spoken in Iraq (the initial Phraselator used only Modern Standard Arabic).

Translingual Information Detection, Extraction and Summarization (TIDES) is revolutionizing the way time-critical intelligence is obtained from speech and text. The program has been developing technology to enable English-speaking operators and analysts to exploit the huge amounts of foreign speech and text (broadcast and newswire) that currently go unanalyzed due to shortages of skilled foreign language analysts. TIDES is creating new capabilities for Translation (converting foreign language material to English), Detection (finding or discovering needed information, e.g. topics), Extraction (pulling out key information including entities and relations), and Summarization (substantially shortening what a user must read). TIDES technology will dramatically increase the quantity, quality, and timeliness of analysis and reporting, thereby providing vital information to senior decision-makers and enabling commanders to carry out critical missions more swiftly, safely, and effectively.

Effective, Affordable, Reusable Speech-To-Text (EARS) is creating new automatic transcription (speech-to-text) technology whose output is substantially richer and more accurate than previously possible. Fast, accurate, automatic transcription of broadcasts, telephone conversations and multiparty speech will make rapid search and analysis of speech possible. EARS also provides text versions of spoken language for input to systems developed in TIDES, thereby extending the scope of what is possible with automatic translation, detection, extraction, and summarization.

Global Autonomous Language Exploitation (GALE) will revolutionize the exploitation of both speech and text in multiple languages (which is currently slow, labor-intensive, and limited) by developing core enabling technologies and end-to-end systems for insertion into a series of high-impact military and intelligence operational settings. GALE will substantially improve upon and exploit capabilities developed under TIDES, build on the successes of both TIDES and EARS, and emphasize the creation of a systems framework that integrates the component language processing technologies and evaluates them by their utility in various end-user tasks. GALE technology will enable machines to convert and distill enormous volumes of streaming speech and text in many languages to provide critical intelligence. Captured documents will be converted into readable, searchable English text.


