There are several Dialogic telephony file formats, enumerated below. The usual file extension is ".vox", but this is a habit, not a rule: some developers of Dialogic-based voice processing systems use other file extensions.
Dialogic telephony cards use sampling rates of 6.0, 6.053, 8.0, 8.117 or 11.025 kHz. On some cards the sampling rate is programmable; on others it is fixed. The sampling rate to use therefore depends on your card: consult your telephony hardware supplier for the exact sample rate your voice processing hardware requires.
One of the annoying characteristics of native Dialogic telephony file formats is that they contain only raw data. Apart from the newer Dialogic ".wav" formats, they have no header carrying additional information such as the coding algorithm, sampling rate or resolution. Therefore, when referring to such a file in Vox Studio, or in any other program, it is imperative to specify the exact file coding and sampling rate. This may puzzle you at first, but you will soon learn how Vox Studio behaves differently with files that have headers and files that do not. When Vox Studio reads ".wav" files there is no need to tell it what the file contains; this information is found in the file header itself. When Vox Studio reads native Dialogic ".vox" files you must tell it the exact file type, as this information is NOT in the file. Finally, whenever Vox Studio writes a file in any format, with or without a header, you must always tell it which file type to generate, as it has no way to guess what you want.
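Because a raw ".vox" file carries no header, nothing in its bytes tells a program how to interpret it. The Python sketch below (the function and table names are hypothetical, not part of Vox Studio) checks whether data starts with a RIFF/WAVE header and, for headerless data, computes a duration only from a coding and sampling rate supplied by the caller.

```python
# Bits per sample for the raw Dialogic codings (illustrative table).
RAW_BITS_PER_SAMPLE = {"adpcm": 4, "alaw": 8, "mulaw": 8}


def has_riff_wave_header(data: bytes) -> bool:
    """True if the data starts with a RIFF/WAVE header (self-describing file)."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def raw_duration_seconds(num_bytes: int, coding: str, sample_rate_hz: int) -> float:
    """Duration of headerless data; only correct if coding and rate are correct.

    Nothing in a raw file can confirm these two values, so the caller
    must supply them.
    """
    bits = RAW_BITS_PER_SAMPLE[coding]
    num_samples = num_bytes * 8 // bits
    return num_samples / sample_rate_hz
```

The same 8,000 bytes read as 8 kHz ADPCM last 2 seconds, but read as 8 kHz Mu-law only 1 second, which is exactly why the file type must be stated for raw files.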
In addition to the formats enumerated below, Vox Studio can concatenate ".vox" files into the ".vap" format (an operation called "grouping" in the program). Conversely, it can ungroup a ".vap" file into a number of ".vox" files.
Here are the Dialogic telephony sound file formats known to Vox Studio today:
ADPCM (OKI variant) and ADPCM Wav
ADPCM stands for Adaptive Differential Pulse Code Modulation. There are various flavors of ADPCM. The algorithm we have implemented in this version is the original algorithm used by Dialogic voice processing hardware. Future versions of Vox Studio will support more flavors of ADPCM required for other telephony hardware.
OKI ADPCM, as used by Dialogic, compresses data recorded at 6.0, 6.053, 8.0 or 8.117 kHz sampling rates. Sound is encoded as a succession of 4-bit nibbles packed in pairs into an 8-bit stream of data. Each nibble encodes the difference between the current sample value and the previous reconstructed value, quantized relative to a step size that adapts to the signal. The compression ratio obtained is relatively modest: 12-bit samples are encoded as 4-bit differentials, a 3:1 ratio.
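To make the nibble scheme concrete, here is a minimal Python sketch of a Dialogic-style OKI ADPCM encoder and decoder. It assumes the 49-entry step-size table and index-adjustment rule commonly published for Dialogic ADPCM, the high-order nibble of each byte holding the earlier sample, 12-bit signed samples, and a predictor reset to zero at the start of each file. It is an illustration, not Vox Studio's implementation; verify it against files from your own hardware.

```python
# Step sizes and step-index adjustments assumed for Dialogic/OKI ADPCM.
STEP_SIZES = [
    16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411,
    1552,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3 magnitude bits


class OkiAdpcmState:
    """Predictor state; encoder and decoder must evolve it identically."""

    def __init__(self):
        self.predicted = 0  # last reconstructed 12-bit sample
        self.index = 0      # current position in STEP_SIZES

    def decode_nibble(self, code):
        step = STEP_SIZES[self.index]
        diff = step >> 3
        if code & 4:
            diff += step
        if code & 2:
            diff += step >> 1
        if code & 1:
            diff += step >> 2
        if code & 8:  # bit 3 is the sign
            diff = -diff
        # Clamp to the 12-bit signed range.
        self.predicted = max(-2048, min(2047, self.predicted + diff))
        self.index = max(0, min(48, self.index + INDEX_ADJUST[code & 7]))
        return self.predicted

    def encode_sample(self, sample):
        step = STEP_SIZES[self.index]
        d = sample - self.predicted
        code = 0
        if d < 0:
            code = 8
            d = -d
        if d >= step:
            code |= 4
            d -= step
        if d >= step >> 1:
            code |= 2
            d -= step >> 1
        if d >= step >> 2:
            code |= 1
        # Apply the decoder's update so both sides stay in sync.
        self.decode_nibble(code)
        return code


def adpcm_decode(data):
    """Decode packed nibbles (high nibble first) into 12-bit samples."""
    state = OkiAdpcmState()
    out = []
    for byte in data:
        out.append(state.decode_nibble(byte >> 4))
        out.append(state.decode_nibble(byte & 0x0F))
    return out


def adpcm_encode(samples):
    """Encode 12-bit samples into packed nibbles (high nibble first)."""
    state = OkiAdpcmState()
    codes = [state.encode_sample(s) for s in samples]
    if len(codes) % 2:
        codes.append(0)  # pad an odd sample count with a zero nibble
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))
```

Because the encoder runs the decoder's update on every nibble it emits, both sides track the same predicted value and step size. Even silence shows the coding error mentioned below: four zero samples decode as 2, 0, 2, 0.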
ADPCM coding introduces signal errors, so sound quality is slightly affected, but it remains sufficient for many telephony applications. Naturally, 8 kHz ADPCM sounds MUCH better than 6 kHz ADPCM.
Traditionally, 6 kHz ADPCM is also called 24 kbps (6 kHz x 4 bits) and 8 kHz ADPCM is called 32 kbps (8 kHz x 4 bits). This is a very confusing way of identifying the coding algorithm since, for instance, some other ADPCM algorithms also produce 24 kbps, which is in fact 3-bit data sampled at 8 kHz!
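The arithmetic behind these names is simply sampling rate times bits per sample. The short Python sketch below (helper names are illustrative) shows that 6 kHz x 4 bits and 8 kHz x 3 bits both come out at 24 kbps, and converts a format into the disk space one minute of a single channel needs.

```python
def bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Raw data rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample / 1000


def bytes_per_minute(sample_rate_hz, bits_per_sample):
    """Storage for one minute of one channel, in bytes."""
    return sample_rate_hz * bits_per_sample * 60 // 8


# Two different codings share the same "24 kbps" label.
print(bit_rate_kbps(6000, 4), bit_rate_kbps(8000, 3))
# One minute of 8 kHz ADPCM versus 8 kHz A-law or Mu-law PCM.
print(bytes_per_minute(8000, 4), bytes_per_minute(8000, 8))
```

A bit rate alone therefore cannot identify a file's coding; you still need both the sampling rate and the bits per sample.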
Not many people know that some cards use 6.0 and 8.0 kHz sampling rates while others (very old cards) use 6.053 and 8.117 kHz. Beware when playing back files recorded on one card type on another. If the files contain voice, chances are nobody will ever notice the slight difference in pitch. However, if the files contain frequency-sensitive material, such as DTMF tones, the rate difference (about 1.5% between 8.117 and 8.0 kHz, about 0.9% between 6.053 and 6.0 kHz) may cause very severe problems.
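To see what a rate mismatch does to tones, the Python sketch below scales a frequency by the ratio of playback rate to recording rate. The DTMF frequencies are the standard ITU-T Q.23 set; the function names are illustrative.

```python
# DTMF row and column frequencies in Hz (ITU-T Q.23).
DTMF_FREQS_HZ = [697, 770, 852, 941, 1209, 1336, 1477, 1633]


def played_frequency(freq_hz, recorded_rate_hz, playback_rate_hz):
    """Frequency heard when audio recorded at one rate is played at another."""
    return freq_hz * playback_rate_hz / recorded_rate_hz


def percent_shift(recorded_rate_hz, playback_rate_hz):
    """Relative pitch change, in percent, caused by the rate mismatch."""
    return (playback_rate_hz / recorded_rate_hz - 1) * 100


# A file recorded at 8.0 kHz played on an 8.117 kHz card.
for f in DTMF_FREQS_HZ:
    print(f, "Hz ->", round(played_frequency(f, 8000, 8117), 1), "Hz")
```

Every tone moves by the same percentage, so a DTMF digit stays recognizable to the ear while each of its two frequencies drifts away from the nominal value a DTMF detector is looking for.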
Vox Studio can also convert to and from indexed ADPCM files (".vap" files). These files contain more than one voice message per physical file, with a header at the beginning of the file that contains pointers to the start of each voice message. This technique was introduced mainly to work around the problems good old DOS had when an application opened too many files simultaneously.
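The exact ".vap" header layout is not described here, so the Python sketch below invents a HYPOTHETICAL layout purely to illustrate the idea of an index of start pointers: a 32-bit little-endian message count followed by one 32-bit offset per message. Do not use it to read or write real ".vap" files.

```python
import struct

# HYPOTHETICAL layout, for illustration only (not the real .vap format):
#   uint32 LE: number of messages N
#   N x uint32 LE: byte offset of each message from the start of the file
#   message data, concatenated in order


def group_messages(messages):
    """Concatenate messages behind an index of start offsets."""
    pos = 4 + 4 * len(messages)  # data starts right after the header
    offsets = []
    for m in messages:
        offsets.append(pos)
        pos += len(m)
    header = struct.pack("<I", len(messages)) + struct.pack(f"<{len(messages)}I", *offsets)
    return header + b"".join(messages)


def ungroup_messages(blob):
    """Split a grouped file back into its messages using the index."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offsets = list(struct.unpack_from(f"<{count}I", blob, 4))
    # Each message ends where the next begins; the last runs to end of file.
    ends = offsets[1:] + [len(blob)]
    return [blob[start:end] for start, end in zip(offsets, ends)]
```

With one file and a table of pointers, an application needs only a single open file handle to reach any of its messages.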
The Dialogic ADPCM ".wav" format uses the same ADPCM coding as the raw Dialogic files, but with a RIFF-standard file header instead of just raw data. (Dialogic ".wav" files can likewise carry A-law or Mu-law data, as described below.) These ".wav" files also support one more sampling rate: 11.025 kHz.
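Unlike the raw files, a ".wav" file describes itself. The Python sketch below walks the RIFF chunks to the "fmt " chunk and returns the format tag, channel count, sampling rate and bits per sample. The tag names in the table are assumed to match the registered Microsoft WAVE format tags (mmreg.h); check them against files your own Dialogic system produces.

```python
import struct

# Assumed WAVE format tags (Microsoft mmreg.h registrations).
WAVE_FORMAT_NAMES = {
    0x0001: "PCM",
    0x0006: "A-law",
    0x0007: "Mu-law",
    0x0017: "Dialogic OKI ADPCM",
}


def read_wav_format(data):
    """Return (format_tag, channels, sample_rate, bits_per_sample)."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8)
            return tag, channels, rate, bits
        # Chunks are word-aligned: an odd size is followed by a pad byte.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no fmt chunk found")
```

Usage: pass the first few hundred bytes of a file (or the whole file) and look up the returned tag in WAVE_FORMAT_NAMES; no user input about coding or rate is needed.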
A-law and A-law Wav
The European digital telephone network uses the A-law digital coding standard, a companding algorithm based on a segmented straight-line approximation to a logarithmic curve.
A-law companders produce 8 bits of companded data per 16-bit sample (of which A-law uses the 13 most significant bits) at a sample rate of 8 kHz. This is also called 64 kbps A-law PCM.
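A minimal Python sketch of G.711 A-law conversion, modeled on the freely usable Sun Microsystems reference code (g711.c), shows the segment idea: the 13-bit magnitude is assigned to one of 8 segments, and 4 more bits select a step within that segment. Function names are illustrative.

```python
# Upper bound of each A-law segment for the 13-bit magnitude.
_ALAW_SEG_END = [0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF]


def linear_to_alaw(sample):
    """Compress one 16-bit signed sample into an 8-bit A-law code."""
    pcm = sample >> 3  # keep the 13 most significant bits
    if pcm >= 0:
        mask = 0xD5  # sign bit (0x80) plus G.711 even-bit inversion (0x55)
    else:
        mask = 0x55  # even-bit inversion only
        pcm = -pcm - 1
    seg = next((i for i, end in enumerate(_ALAW_SEG_END) if pcm <= end), 8)
    if seg >= 8:
        return 0x7F ^ mask  # clip to the largest magnitude
    aval = seg << 4
    aval |= ((pcm >> 1) if seg < 2 else (pcm >> seg)) & 0x0F
    return aval ^ mask


def alaw_to_linear(code):
    """Expand an 8-bit A-law code back into a 16-bit signed sample."""
    code ^= 0x55
    t = (code & 0x0F) << 4
    seg = (code & 0x70) >> 4
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if code & 0x80 else -t
```

For example, a sample of 1000 encodes to 0xFA and decodes back to 1008. Note that 0 decodes to 8: the A-law quantizer has no exact zero output level.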
This is the coding algorithm used by the PTTs (national telephone companies) throughout Europe. In the USA a similar algorithm called Mu-law is used.
Telephony cards capable of recording and playing 64 kbps data produce very good quality voice. In fact you cannot get any better on the current analog telephone network. Of course, 64 kbps PCM data requires more hard disk space than 24 kbps or 32 kbps ADPCM data, but the voice quality is better. A-law companding produces a better signal-to-noise ratio at low voice amplitudes than Mu-law, but Mu-law has lower idle channel noise.
Although the "normal" sample rate for A-law telephony is 8 kHz, most Dialogic cards also allow A-law at 6 kHz, and Vox Studio supports this.
The Dialogic A-law ".wav" format uses the same coding as above but has a RIFF-standard file header instead of just raw data. It also supports one more sampling rate: 11.025 kHz.
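Wrapping raw A-law bytes in a RIFF header is mostly bookkeeping. The Python sketch below assumes the standard WAVE_FORMAT_ALAW tag (6), 8 bits per sample, and a "fact" chunk as is customary for non-PCM WAVE data. The function name is hypothetical, and the output is not guaranteed to match byte-for-byte the files Dialogic tools write. Passing format_tag=7 (WAVE_FORMAT_MULAW) produces the Mu-law variant.

```python
import struct


def g711_wav_bytes(data, sample_rate_hz=8000, channels=1, format_tag=0x0006):
    """Wrap raw 8-bit G.711 bytes in a RIFF/WAVE header.

    format_tag 0x0006 is WAVE_FORMAT_ALAW; 0x0007 is WAVE_FORMAT_MULAW.
    """
    fmt = struct.pack(
        "<HHIIHHH",
        format_tag,
        channels,
        sample_rate_hz,
        sample_rate_hz * channels,  # bytes per second (1 byte per sample)
        channels,                   # block align
        8,                          # bits per sample
        0,                          # cbSize: no extra format bytes
    )
    fact = struct.pack("<I", len(data) // channels)  # samples per channel
    body = (
        b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"fact" + struct.pack("<I", len(fact)) + fact
        + b"data" + struct.pack("<I", len(data)) + data
    )
    if len(data) % 2:
        body += b"\x00"  # pad byte: RIFF chunks end on an even boundary
    return b"RIFF" + struct.pack("<I", len(body)) + body
```

For an 11.025 kHz Dialogic file you would pass sample_rate_hz=11025; the coded bytes themselves are unchanged.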
Mu-law and Mu-law Wav
The US and Japanese digital telephone networks use the Mu-law digital coding standard, a companding algorithm based on a segmented straight-line approximation to a logarithmic curve.
Per channel, Mu-law companders produce 8 bits of companded data per 16-bit sample (of which Mu-law uses the 14 most significant bits) at a sample rate of 8 kHz. This is also called 64 kbps Mu-law PCM.
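The Mu-law counterpart, again a sketch modeled on the Sun Microsystems g711.c reference code with illustrative names: a bias of 132 (0x84) is added to the magnitude before the segment search, and the code word is inverted before transmission.

```python
# Upper bound of each Mu-law segment (after the bias is added).
_ULAW_SEG_END = [0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF]
_BIAS = 0x84   # 132 on the 16-bit scale
_CLIP = 8159   # largest 14-bit magnitude before the bias is added


def linear_to_ulaw(sample):
    """Compress one 16-bit signed sample into an 8-bit Mu-law code."""
    pcm = sample >> 2  # keep the 14 most significant bits
    if pcm < 0:
        pcm = -pcm
        mask = 0x7F  # sign bit cleared after inversion
    else:
        mask = 0xFF
    pcm = min(pcm, _CLIP) + (_BIAS >> 2)
    seg = next((i for i, end in enumerate(_ULAW_SEG_END) if pcm <= end), 8)
    if seg >= 8:
        return 0x7F ^ mask  # clip to the largest magnitude
    uval = (seg << 4) | ((pcm >> (seg + 1)) & 0x0F)
    return uval ^ mask  # code words are transmitted inverted


def ulaw_to_linear(code):
    """Expand an 8-bit Mu-law code back into a 16-bit signed sample."""
    code = ~code & 0xFF
    t = (((code & 0x0F) << 3) + _BIAS) << ((code & 0x70) >> 4)
    return (_BIAS - t) if code & 0x80 else (t - _BIAS)
```

Note that 0 encodes to 0xFF and decodes back to exactly 0, whereas the A-law quantizer has no zero output level. This mid-tread versus mid-riser difference is related to Mu-law's lower idle channel noise.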
This is the coding algorithm used in the Bell System throughout the US. In Europe a similar algorithm called A-law is used.
Telephony cards capable of recording and playing 64 kbps data produce very good quality voice. In fact you cannot get any better on the current analog telephone network. Of course, 64 kbps PCM data requires more hard disk space than 24 kbps or 32 kbps ADPCM data, but the voice quality is better. Mu-law companding produces lower idle channel noise than A-law, but A-law has a better signal-to-noise ratio at low voice amplitudes.
Although the "normal" sample rate for Mu-law telephony is 8 kHz, most Dialogic cards also allow Mu-law at 6 kHz, and Vox Studio supports this.
The Dialogic Mu-law ".wav" format uses the same coding as above but has a RIFF-standard file header instead of just raw data. It also supports one more sampling rate: 11.025 kHz.