Modem Voice Commands
From: https://en.wikipedia.org/wiki/Voice_modem_command_set
Voice modem command set
A voice modem is an analog telephone data modem with a built-in capability
of transmitting and receiving voice recordings over the phone line. Voice
modems are used for telephony and answering machine applications. Similar
to the Hayes command set used for data modems, in which the host PC
commands the modem via a series of commands known as AT commands, there
exists a well-defined set of common voice AT commands that are somewhat
consistent throughout the industry.
Implementation problems
Because voice mode is not the typical use for a modem, many modems on the
market have poor or buggy support for their voice modes. Characteristics
of a good voice modem depend greatly upon the intended application, and
include:
- Reliable operation. Many modems simply "lock up" or crash the host PC,
though this is more common with Winmodems. Others have flow control bugs
and other implementation bugs, possibly causing calls to hang, audio to
skip, or audio to keep playing after an attempted abort.
- Good audio characteristics. Some modems have an uncorrectably low signal
volume or produce audio noise. Some modems recognize only the cleanest
DTMF signals. Some modems do a poor job of recording, or of detecting and
reporting silence or the end-of-call voltage reversal, which some
applications need.
- Support for caller ID, if needed. "Type-1 caller ID" as used in North
America is missing from the vast majority of modems. Nearly all modem
chipsets support caller ID, but because the typical dial-up Internet user
doesn't need caller ID, the extra components needed to support caller ID
are often omitted for cost reasons.
- Support for multiple instances. The drivers for many internal modems
(typically Winmodems) cannot tolerate more than one of the same device
inside a single computer. Symptoms of incompatibility include crashes,
blue screens of death, or simple inoperability of all but a single modem.
External RS-232-based (serial) modems do not have this limitation because
each modem contains its own microprocessor and is unaware of other modems
on the same host. USB modems may or may not have this problem, because
some USB modems are simply serial modems with a "USB-to-serial" converter
chipset (in which case there should be no problem), and other USB modems
are "host-controlled" and are essentially externally attached Winmodems
(in which case the problem may persist).
Plus versus Hash
Each voice modem platform tends to support one of two sets of voice
commands: one flavor of the command set contains a plus (+) sign, and the
other contains a hash (#) sign.
Detecting voice mode
Support for voice mode can be detected on a modem by issuing the following
command: AT+FCLASS=?
The plus-sign form of this command usually works whether a modem supports
the "plus" or the "hash" command set, because the command (which stands
for "fax class") is part of the industry-standard fax commands, which
always use the plus.
A modem supporting voice will respond with a comma-delimited list of
numbers that includes the number 8. A modem not supporting voice will
respond with ERROR, or with a list of numbers not including 8. (Many
modems will report 0,1,2 indicating support for data (0), and class 1 and
2 faxes—this is an indication that voice support is not present.)
Modems supporting the "hash" command set usually respond to AT#CLS=? as
well.
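The check described above can be sketched in a few lines of Python. This is
a minimal illustration, not a reference implementation: the function name
supports_voice is hypothetical, and the tolerance for a command echo, an
optional "+FCLASS:" prefix, and surrounding parentheses is an assumption
about how individual modems format the reply.

```python
def supports_voice(response: str) -> bool:
    """Return True if a reply to AT+FCLASS=? lists class 8 (voice)."""
    for raw in response.splitlines():
        line = raw.strip()
        upper = line.upper()
        # Skip blank lines, the command echo, and final result codes.
        if not line or upper in ("OK", "ERROR") or upper.startswith("AT"):
            continue
        # Assumption: some modems add a "+FCLASS:" prefix or parentheses.
        body = line.split(":", 1)[-1].strip().strip("()")
        if "8" in [token.strip() for token in body.split(",")]:
            return True
    return False
```

Tokens are compared as whole strings, so a class such as 80 is not mistaken
for class 8.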
Entering voice mode
The command AT+FCLASS=8 or AT#CLS=8 will put the modem in voice mode. Most
modems still remain on-hook and respond with OK. Once this command has
been accepted, most modems will respond with Data Link Escape (DLE)
messages instead of or in addition to normal modem responses. For example,
instead of reporting a phone line ringing with the RING message, many
modems will instead send the DLE ASCII character, followed by the letter
R. The specific set of DLE events reported by each modem is specific to
its chipset and documented in its reference guide.
Querying the modem's capabilities
The command AT+VLS=? or AT#VLS=? usually returns a list of operating modes
that are specific to each modem. Each of these numbered modes determines
the telephone line's on-hook or off-hook status, as well as sound routing
between each of the following:
- Recording/playback
- Telephone handset
- Speakerphone jack (which could simply be hard-wired as an audio input on
the PC's sound card instead of being a discrete jack)
- Microphone jack (available on some voice modems)
Many chipsets offer a listing of all the possible combinations of modes
even if the specific modem board doesn't support them all. That's because
the board manufacturer is almost always different from the chipset maker,
and the chipset comes pre-configured to support all possible hardware,
even if not implemented on the circuit board.
Example of response to AT+VLS=?
from a modem on the market in 2006:
AT+VLS=?
0,"",0000000000,0000000000,B084008000
1,"T",0B8418E000,0FE418E000,0B8419E000
2,"L",0884008000,0CE4008000,0884018000
3,"LT",0B8418E000,0FE418E000,0B8419E000
4,"S",0084008000,0484008000,3084018000
5,"ST",0B8418E000,0FE418E000,0B8419E000
6,"M",0084008000,04E4008000,3084008000
7,"MST",0B8418E000,0FE418E000,0B8419E000
8,"S1",0084008000,0484008000,3084018000
9,"S1T",0B8418E000,0FE418E000,0B8419E000
10,"MS1T",0B8418E000,0FE418E000,0B8419E000
11,"M1",0084008000,04E4008000,3084008000
13,"M1S1T",0B8418E000,0FE418E000,0B8419E000
14,"H",0084008000,04E4008000,3084018000
15,"HT",0B8418E000,0FE418E000,0B8419E000
16,"MS",0084008000,04E4008000,3084018000
17,"MS1",0084008000,04E4008000,3084018000
19,"M1S1",0084008000,04E4008000,3084018000
20,"t",0B8418E000,0FE418E000,BB8419E000
While every modem is different, usually mode 0 means on-hook (hung up) and
mode 1 is sufficient to pick up the phone, record/playback audio, and
detect DTMF (touch tones).
The command AT+VSM=? or AT#VSM=? usually returns a list of audio data
formats supported by the modem. Each format includes a name (such as PCM,
ADPCM, μ-law, A-law), a number of bits per sample (usually 2, 3, 4, 8, or
16) and an audio sampling rate (usually 7,200, 8,000, or 11,025 Hertz).
These are industry-standard audio codecs whose implementations are well
published. The ADPCM standard is an exception. Modems claiming to support
ADPCM almost always support Dialogic ADPCM, also known as "VOX", which is
similar to, but not compatible with, other ADPCM implementations,
including Interactive Multimedia Association (IMA) ADPCM as well as MS
ADPCM (a Microsoft implementation used in WAV files). Modems may support
these as well, if a qualifier is listed; otherwise, by default, ADPCM
means Dialogic.
Example response to AT+VSM=?
from a modem on the market in 2006:
AT+VSM=?
1,"UNSIGNED PCM",8,0,8000,0,0
129,"IMA ADPCM",4,0,8000,0,0
130,"UNSIGNED PCM",8,0,8000,0,0
140,"2 Bit ADPCM",2,0,8000,
141,"4 Bit ADPCM",4,0,8000,0,0
The desired audio data format is selected using the same command but with a
number instead of a question mark. It is used for both sending and
receiving.
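A host program usually needs to turn a listing like the one above into
structured data before choosing a format. The following Python sketch
parses such a reply. The names AudioFormat and parse_vsm are hypothetical,
and the column positions (format number, name, bits per sample in the third
field, sampling rate in the fifth) are inferred from the example response
only; other modems may lay the fields out differently.

```python
import csv
from typing import List, NamedTuple


class AudioFormat(NamedTuple):
    number: int
    name: str
    bits_per_sample: int
    sample_rate_hz: int


def parse_vsm(response: str) -> List[AudioFormat]:
    """Parse the lines of an AT+VSM=? reply into AudioFormat records."""
    formats = []
    for row in csv.reader(response.splitlines()):
        # Skip the command echo, blank lines, and result codes such as OK.
        if len(row) < 5 or not row[0].strip().isdigit():
            continue
        formats.append(
            AudioFormat(int(row[0]), row[1], int(row[2]), int(row[4]))
        )
    return formats
```

Fields after the sampling rate are ignored, so a line that is cut short
after the rate still parses.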
Answering calls
Answering calls is usually done with either the AT+VLS=n or AT#VLS=n
commands, where n is a number representing the modem's mode. For the vast
majority of modems, this number will be 1 to answer a telephone call, and
0 to hang up; other numbers activate other functionality when present,
such as speakerphone. Some modems answer in response to ATA—the standard
data-mode answer command—but other modems will interpret this as a
command to actually answer in data and not voice mode.
Transmitting audio data
To begin transmitting audio data, the host sends the command AT+VTX or
AT#VTX. This results in a response from the modem of CONNECT or VCON.
(Modems using the "plus" command set usually respond CONNECT, while those
using the "hash" set respond VCON, which stands for voice connect.)
From then on, the modem interprets any data sent from the computer as wave
audio data, using the codec selected by the AT+VSM or AT#VSM command.
The audio data is sent to the modem slightly faster than the modem can
play it, so that the modem can buffer a small portion of it and play it
smoothly, with no clicks or pops caused by delays in the computer's
operating system. For example, during playback of an 8 kHz audio file at
8-bit resolution (which creates 8,000 bytes, or 80,000 bits when including
start/stop bits, per second), the data must travel over the serial port at
a minimum of 115,200 bits per second. (115,200 bit/s is the first setting
of a typical computer serial port that's greater than 80,000.) In
addition, doubling DLE bytes in the stream (described below) adds some
overhead, so a small amount of bandwidth beyond 80,000 bit/s is required.
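The arithmetic in this example generalizes to other formats. The sketch
below assumes 10 bits on the wire per byte (8 data bits plus start and stop
bits, as in the text) and a hypothetical list of common serial port
settings; the function name min_serial_rate is also an assumption.

```python
STANDARD_RATES = (9600, 19200, 38400, 57600, 115200, 230400)


def min_serial_rate(sample_rate_hz: int, bits_per_sample: int,
                    framing_bits: int = 10) -> int:
    """Return the first standard serial rate above the audio bit rate."""
    # Audio bytes per second, rounded up for sample sizes below 8 bits.
    bytes_per_second = (sample_rate_hz * bits_per_sample + 7) // 8
    # Each byte costs framing_bits on the wire (start + 8 data + stop).
    wire_bits_per_second = bytes_per_second * framing_bits
    for rate in STANDARD_RATES:
        # Strictly greater, to leave headroom for doubled DLE bytes.
        if rate > wire_bits_per_second:
            return rate
    raise ValueError("no listed serial rate is fast enough")
```

For 8-bit audio at 8,000 Hz this gives 80,000 bit/s on the wire, so
min_serial_rate(8000, 8) returns 115200, matching the figure in the text.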
When the modem wants the computer to temporarily pause so the playback can
catch up, it temporarily lowers the CTS (Clear to Send) signal on the
RS-232 serial port. The modem re-raises the signal in time for the
computer to resume sending audio data before the playback buffer becomes
completely empty.
When the computer wants to signal the end of audio data, most modems expect
to see an ASCII DLE character (0x10), followed by the ! character.
Because the DLE byte can and often does occur in normal audio data, it must
be sent twice to the modem when it is to be interpreted as a byte of audio
data.
Most modems also accept a sequence of DLE + CAN (cancel) as a signal to
cancel audio playback. The distinction is that the modem is to understand
that it is to immediately abort playback now, rather than let remaining
data in the playback buffer run to completion.
When the modem has finished playback, it responds OK.
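The DLE handling described in this section can be sketched as follows. The
function names are hypothetical. The end marker defaults to the "!"
character named above, but it is left as a parameter because the text only
says that most modems expect it; the right value should be checked against
the modem's reference guide.

```python
DLE = b"\x10"  # ASCII Data Link Escape
CAN = b"\x18"  # ASCII Cancel


def encode_for_playback(audio: bytes, terminator: bytes = b"!") -> bytes:
    """Double every DLE byte in the audio, then append the end marker."""
    return audio.replace(DLE, DLE + DLE) + DLE + terminator


def abort_sequence() -> bytes:
    """Return the DLE + CAN pair that asks the modem to stop immediately."""
    return DLE + CAN
```

For example, the three audio bytes 01 10 02 (hex) become 01 10 10 02
followed by 10 21, where 21 is the "!" character.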
Throttling playback
During playback, it is necessary to send the audio data at a rate that
keeps the audio playing smoothly, but without sending it faster than the
modem can handle it. It is also desirable to make sure the modem can
always abort playback and discard any buffered audio in case a message is
to be canceled. Message cancellation is expected by callers who already
know the answers to voice prompts and provide their answer early (and who
would become irritated at being forced to listen to a prompt they've
already responded to).
There are several ways to keep the computer sending audio data to the modem
at a rate to keep up with playback without overrunning the audio buffer.
The most straightforward is to use CTS flow control. The following caveats
exist.
- Some voice modems have bugs in their implementation of flow control. In
particular, a large number of Conexant chipsets will sometimes drop their
CTS line and never bring it back up during playback. Conexant chipsets are
very common in voice modems and otherwise implement voice commands well,
making it worthwhile to consider working around this bug. Some Conexant
chipsets will also not bring CTS back up if the "playback abort" command
is sent or processed by the modem while CTS is down.
- Some voice modems offer a very large transmit buffer (for example, 4
seconds' worth of audio) coupled with a bug that prevents the host from
requesting an "abort playback". If a caller presses a touch-tone that's
supposed to interrupt a message, and the host is providing unlimited audio
data mediated by CTS alone, the message can't be interrupted for at least
4 seconds.
A second way to throttle playback involves polling a "tick" timer provided
by the host computer's operating system and based on a hardware clock
that's independent of the host's CPU load. This may or may not be
available, and it depends entirely on the host operating system. However,
when available, it is extremely reliable. It is reasonable to assume that
the PC needs to stay ahead of the playback by a couple of hundred bytes
and that the modem will buffer this. (The commands AT+VBQ or AT#VBQ on
voice modems will often reveal the size of the buffer in bytes, and 1 to 2
kilobytes is a typical response.)
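A timer-based throttle of this kind might look like the Python sketch
below. It is an illustration under stated assumptions: paced_send is a
hypothetical name, write stands for whatever callable delivers a chunk to
the modem, the chunks are counted in audio bytes (before any DLE doubling),
and the default lead of 400 bytes is simply one reading of "a couple of
hundred bytes". Python's time.monotonic is used as the load-independent
clock.

```python
import time
from typing import Callable, Iterable


def paced_send(chunks: Iterable[bytes],
               write: Callable[[bytes], object],
               bytes_per_second: int = 8000,
               lead_bytes: int = 400,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Send chunks so the host stays at most lead_bytes ahead of playback."""
    start = clock()
    sent = 0
    for chunk in chunks:
        # Bytes the modem should have played so far, according to the clock.
        played = (clock() - start) * bytes_per_second
        # How far past the allowed lead this chunk would put the host.
        excess = sent + len(chunk) - played - lead_bytes
        if excess > 0:
            sleep(excess / bytes_per_second)
        write(chunk)
        sent += len(chunk)
```

Because the host never runs more than lead_bytes ahead, an abort request
leaves at most that much audio sitting in the modem's buffer. The clock and
sleep parameters exist only so the pacing can be exercised without real
delays.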
A third way to throttle playback involves inserting dummy DLE messages into
the output stream such that the audio data takes a known amount of time to
transmit through the serial port, and the playback is essentially clocked
by the UART in the serial port.
To use dummy DLE stuffing, a few things must first be noted. In a typical
scenario, one second of audio might be 8,000 one-byte samples; because a
small percentage of the samples equal the DLE byte and must be doubled, a
typical second of audio might be 8,050 bytes. The trick is to insert
enough meaningless DLE messages that the modem will discard (that is, a
DLE followed by a byte without any specific meaning) so that each second
of audio occupies exactly 11,520 bytes (assuming a serial port locked at
115,200 bit/s), which take exactly 1 second to transmit through the serial
port. Although interrupt latency on the host PC may cause slightly fewer
than 11,520 bytes to be sent per second, most voice modems buffer enough
bytes before actually starting playback to permit a small skew here. The
PC can also be programmed to convert a second of audio into slightly fewer
than 11,520 bytes (all voice modems will buffer a small overrun without
the need for flow control as long as it is no more than a few hundred
bytes).
Dummy DLE stuffing is unlikely to work with "Winmodems" that have no
physical UART. It makes sense only with external serial modems that are
physically clocked to a specific bit rate by a clock generator behind the
external serial port.
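The padding step can be sketched in Python as follows. The function name
stuff_block is hypothetical, and so is the filler byte: the value 0x00 is
only a placeholder, and a byte that a given modem really ignores after DLE
has to be taken from that modem's reference guide.

```python
DLE = 0x10


def stuff_block(audio: bytes, wire_bytes: int, filler: int = 0x00) -> bytes:
    """Escape a block of audio and pad it with dummy DLE messages."""
    # Genuine DLE samples must be doubled before padding is counted.
    escaped = audio.replace(bytes([DLE]), bytes([DLE, DLE]))
    spare = wire_bytes - len(escaped)
    if spare < 0:
        raise ValueError("block does not fit in its serial time slot")
    # Each dummy message is two bytes, so an odd remainder leaves the
    # block one byte short, which the modem's buffer absorbs.
    return escaped + bytes([DLE, filler]) * (spare // 2)
```

One second of 8,000 samples padded with stuff_block(audio, 11520) comes out
at 11,520 bytes when no doubling is needed. Padding shorter blocks, for
example 800 samples into 1,152 bytes, spreads the dummy messages more
evenly, so the modem receives real audio at a steadier rate.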
Recording audio data
The method for recording audio data is the same, except that the command is
AT+VRX, or AT#VRX, and the modem transmits audio data while the computer
receives it. RTS/CTS flow control is not used here (the computer must
accept all the audio data it receives, and the modem automatically paces
its transmission to match the audio sampling rate).
The modem never stops transmitting until the computer tells it to stop,
which is usually with CTRL-C. The data is always terminated with DLE+!,
and all DLE bytes naturally occurring in the stream are sent twice to
differentiate them from normal DLE messages.
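On the receiving side, the host has to undo the DLE doubling and pick out
the DLE messages embedded in the audio. The Python sketch below shows one
way to do this. The name decode_recording is hypothetical, the terminator
defaults to the "!" named above, and every other DLE pair is treated as a
one-byte event report of the kind described in the rest of this section.

```python
DLE = 0x10


def decode_recording(stream: bytes, terminator: int = ord("!")):
    """Split a received stream into (audio, events, ended)."""
    audio = bytearray()
    events = []
    ended = False
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != DLE:
            audio.append(byte)
            i += 1
            continue
        if i + 1 >= len(stream):
            break  # lone DLE at the end; the caller should wait for more
        code = stream[i + 1]
        i += 2
        if code == DLE:
            audio.append(DLE)         # doubled DLE is one audio byte
        elif code == terminator:
            ended = True              # modem has finished sending audio
            break
        else:
            events.append(chr(code))  # e.g. "5" for a key, "s" for silence
    return bytes(audio), events, ended
```

A real receiver would call this on data as it arrives and carry a trailing
lone DLE over to the next read; the sketch simply stops there.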
Before, during, and after recording, the modem may notify the computer host
of specific events including, but not limited to, the following:
- Touch-tone keypresses detected
- Silence detected
- Line polarity reversal detected (often meaning caller hang-up)
- Dial tone detected
- Fax tone detected
When the modem wants to tell the host about these, it sends a DLE byte,
plus a (usually) 1-byte message describing the event. The list of
supported events varies by modem, but usually a digit (as well as * and #)
mean touch-tones pressed, and the letter "s" means silence detected. Some
modems report only one event for each touch-tone keypress, while others
report a keypress repeatedly until the key is released, and then a special
"key released" event.
Terminating a voice call
Any of the following commands usually cause the modem to hang up and
terminate a voice call: AT+VLS=0, AT#VLS=0, ATH, ATZ. Dropping the RS-232
DTR (data terminal ready) signal often accomplishes this as well. The
modem remains in voice mode (except in the case of ATZ).
Voice modems do not automatically hang up even when the caller on the other
end does. They may report the hangup, dial tone, or silence events, but it
is up to the computer to act upon them. If the caller hangs up while the
modem is recording and the computer doesn't react, the modem will continue
recording everything else heard on the line, such as dial tones, telephone
company error messages, and so forth.
See also
Hayes command set
List of ITU-T V-series recommendations
Telephony
References
- AT command reference manual for Rockwell, Conexant, and Lucent chipsets.
(Each chipset manufacturer produces a manual with this same title,
followed by the name of the product to which it applies)
- Zoom Tech Support Documentation, AT Command References
- International Telecommunication Union (February 1998), Control of
voice-related functions in a DCE by an asynchronous DTE, Series V: Data
communication over the telephone network: Control procedures,
International Telecommunication Union, ITU-T Recommendation V.253
- Mirho, Charles (August 1996), "To Learn About the Voice Modem Extensions
for Windows 95, Press 1 Now!", Microsoft Systems Journal, "The Hayes AT
standard helped promote widespread acceptance of data modems because
programs could just send the appropriate AT-mumble-this and
AT-mumble-that, and any modem that speaks the AT standard will know what
to do. A similar standard, AT+V, appears to have emerged for voice modems
as well. The AT+V command set consists of Hayes AT-prefixed commands and
+V-prefixed voice commands. AT+V is documented as ANSI/TIA/EIA standard
IS-101 entitled "Facsimile Digital Interfaces-Voice Control Interim
Standard for Asynchronous DCE." A follow-up to this specification is
PN-3131 by TIA Technical Subcommittee TR-29.2."