Biometrics Overview
Biometrics Programs

History of Biometrics
Fingerprint Identification
Face Recognition
Retina and Iris Identification
Hand Geometry and Handwriting
Voice Verification
Emerging Biometrics
CBEFF

Biometrics References

Voice Verification

Voice biometrics works by digitizing a profile of a person's speech to produce a stored model voice print, or template. Biometric technology reduces each spoken word to segments composed of several dominant frequencies called formants. Each segment has several tones that can be captured in a digital format. The tones collectively identify the speaker's unique voice print. Voice prints are stored in databases in a manner similar to the storing of fingerprints or other biometric data.

To ensure a good-quality voice sample, a person usually recites some sort of text or pass phrase, which can be either a verbal phrase or a series of numbers. The phrase may be repeated several times before the sample is analyzed and accepted as a template in the database. When a person speaks the assigned pass phrase, certain words are extracted and compared with the stored template for that individual. When a user attempts to gain access to the system, his or her pass phrase is compared with the previously stored voice model. Some voice recognition systems do not rely on a fixed set of enrolled pass phrases to verify a person's identity. Instead, these systems are trained to recognize similarities between the voice patterns of individuals when the persons speak unfamiliar phrases and the stored templates.

A person's speech is subject to change depending on health and emotional state. Matching a voice print requires that the person speak in the normal voice that was used when the template was created at enrollment. If the person suffers from a physical ailment, such as a cold, or is unusually excited or depressed, the voice sample submitted may be different from the template and will not match. Other factors also affect voice recognition results. Background noise and the quality of the input device (the microphone) can create additional challenges for voice recognition systems. If authentication is being attempted remotely over the telephone, the use of a cell phone instead of a landline can affect the accuracy of the results. Voice recognition systems may be vulnerable to replay attacks: if someone records the authorized user's phrase and replays it, that person may acquire the user's privileges. More sophisticated systems may use liveness testing to determine that a recording is not being used.

Consumer voice recognition systems are typically inexpensive and user-friendly. Most computer systems are equipped to support a microphone used to develop a voice template and later to collect the authentication request. Voice recognition is more often used in an environment in which voice is the only available biometric identifier, such as in telephony and call-center applications. Voice recognition systems have a high user acceptance rate because they are perceived as less intrusive and are one of the easiest biometric systems to use.

Voice verification technology uses the different characteristics of a person's voice to discriminate between speakers. These characteristics are based on both physiological and behavioral components. The physical shape of the vocal tract is the primary physiological component. The vocal tract is made up the oral and nasal air passages that work with the movement of the mouth, jaw, tongue, pharynx and larynx to articulate and control speech production. "The physical characteristics of these airways impart measurable acoustic patters on the speech that is produced," as one expert explained.91 The behavioral component is made up of movement, manner, and pronunciation.

The combination of the unique physiology and behavioral aspects of speaking enable verification of the identity of the person who is speaking. Voice verification technology works by converting a spoken phrase from analog to digital format and extracting the distinctive vocal characteristics, such as pitch, cadence, and tone, to establish a speaker model or voiceprint. A template is then generated and stored for future comparisons.

Voice verification systems can be text dependent, text independent, or a combination of the two. Text dependent systems require a person to speak a predetermined word or phrase. This information, known as a "pass phrase," can be a piece of information such as a name, birth city, favorite color or a sequence of numbers. The pass phrase is then compared to a sample captured during enrollment. Text independent systems recognize a speaker without requiring a predefined pass phrase. It operates on speech inputs of longer duration so that it has a greater opportunity to identify the distinctive vocal characteristics (i.e., pitch, cadence, tone).

Voice verification systems can be used to verify a person's claimed identity or to identify a particular person. It is often used where voice is the only available biometric identifier, such as over the telephone. Voice verification systems may require minimal hardware investment as most personal computers already contain a microphone. The downside to the technology is that, although advances have been made in recognizing the human voice, ambient temperature, stress, disease, medications, and other physical changes can negatively impact automated recognition.

Voice verification systems are different from voice recognition systems although the two are often confused. Voice recognition is used to translate the spoken word into a specific response, while voice verification verifies the vocal characteristics against those associated with the enrolled user. The goal of voice recognition systems is simply to understand the spoken word, not to establish the identity of the speaker. A familiar example of voice recognition systems is that of an automated call center asking a user to "press the number one on his phone keypad or say the word 'one'." In this case, the system is not verifying the identity of the person who says the word "one"; it is merely checking that the word "one" was said instead of another option.

The US PORTPASS Program, deployed at remote locations along the U.S.-Canadian border, recognizes voices of enrolled local residents speaking into a handset. This system enables enrollees to cross the border when the port is unstaffed.

NEWSLETTER

Join the GlobalSecurity.org mailing list

Homeland Security

Voice Verification