There are two main types of digital circuits: Common Channel Signaling (CCS) and Channel Associated Signaling (CAS). CAS circuits are available in two speeds: T1 at 1.544 Mbps supports 24 calls, and E1 at 2.048 Mbps supports 30 calls. (For these values, we are assuming the calls are not compressed; more on this later.) CCS circuits are designated as PRI T1, PRI E1, and BRI. A PRI T1 can support 23 calls, a PRI E1 30, and a BRI only 2.
The use of a digital circuit by definition implies that the voice signal must be digitized; the conversion from analog to digital is performed by a codec. The following sections discuss the conversion of analog to digital.
Digitizing Analog Signals
There are four steps in the process of digitizing analog sound:
1. Sample the analog sound at regular intervals
2. Quantize the sample
3. Encode the value into a binary expression
4. Optionally compress the sample
Sampling could be done any number of times per second; the more samples taken per second, the higher the audio quality, but the more digital data is produced. Nyquist's theorem states that the sampling rate must be at least twice the highest frequency in the signal to reproduce it accurately during playback. Because the highest frequency in human speech that we want to reproduce in telephony is around 4000 Hz, the sampling rate for standard toll-quality digital voice is 8000 samples per second. By contrast, CD music audio, which must encode both much higher and much lower frequencies, samples 44,100 times per second.
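The Nyquist arithmetic above can be checked in a few lines. This is only a sketch; the function names are illustrative and do not come from any library.

```python
def min_sample_rate_hz(highest_freq_hz):
    """Nyquist: sample at least twice as fast as the highest frequency."""
    return 2 * highest_freq_hz


def sample_interval_us(sample_rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz


rate = min_sample_rate_hz(4000)   # telephony speech tops out near 4000 Hz
print(rate)                       # 8000
print(sample_interval_us(rate))   # 125.0
```

At 8000 samples per second, one sample is taken every 125 microseconds.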
Quantizing refers to making a digital approximation of an analog waveform. Imagine drawing an arc on a chessboard; if you had to define the arc using only the square it was in for each row (segment) and column (interval), you would end up with a stepped pattern that was close to the original arc but not exact. This is exactly what happens with quantization: the codec chooses a segment value that is as close as possible to the analog value at the moment it was sampled, but it cannot be exact. To make the quantization more accurate, each segment is divided into 16 intervals that are adjusted to more closely match the sampled wave. Furthermore, the segments are more fine-grained near the origin than at the high and low ranges. This is because most of the human speech we are trying to capture accurately is in this center range of the scale; there are fewer sounds at the very highest and lowest values.
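The idea of segments that are finer near the origin can be sketched as follows. The segment edges here are an assumption chosen for illustration (each segment roughly doubles in width moving away from zero, on a magnitude scale normalized to 0..1); they are not taken from the text or from a codec standard, and `quantize` is a hypothetical helper.

```python
# Hypothetical segment edges on a normalized 0..1 magnitude scale.
# Segments are narrow near zero and wide near full scale.
SEGMENT_EDGES = [0.0] + [2.0 ** (k - 8) for k in range(1, 9)]  # 0, 1/128, 1/64, ..., 1/2, 1
INTERVALS_PER_SEGMENT = 16


def quantize(sample):
    """Map a sample in [-1.0, 1.0] to (sign, segment, interval)."""
    sign = 1 if sample >= 0 else 0
    magnitude = min(abs(sample), 1.0 - 1e-9)  # keep full-scale input inside the top segment
    segment = 0
    while magnitude >= SEGMENT_EDGES[segment + 1]:
        segment += 1
    low, high = SEGMENT_EDGES[segment], SEGMENT_EDGES[segment + 1]
    interval = int((magnitude - low) / (high - low) * INTERVALS_PER_SEGMENT)
    return sign, segment, interval


print(quantize(0.01))   # (1, 1, 4): a quiet sample lands in a narrow segment
print(quantize(0.75))   # (1, 7, 8): a loud sample lands in the widest segment
print(quantize(-0.75))  # (0, 7, 8): same magnitude, negative polarity
```

With these assumed edges, an interval in the lowest segment is 1/2048 of full scale wide, while an interval in the top segment is 1/32 wide, so quiet samples are approximated far more precisely than loud ones.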
Encoding the signal is a simple process. We have a single 8-bit code word to identify whether the analog signal was a positive or negative voltage, what value the signal was quantized to (which segment), and finally, which interval is represented by the code word. The first bit identifies either positive voltage (1) or negative (0). The next three bits represent the segment. There are eight segments in the positive range and eight segments on the negative range, so three bits provide the necessary encoding for the quantization. The last four bits identify the interval. A code word example is shown next:
1 0 0 1 1 1 0 0
In this case, the first 1 indicates a positive voltage; the next digits of 001 indicate this is the first segment (on the positive side), and 1100 indicates the twelfth interval.
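The field layout just described (1 polarity bit, 3 segment bits, 4 interval bits) can be unpacked mechanically. The sketch below follows only the layout given in the text; `decode_code_word` is a hypothetical helper name.

```python
def decode_code_word(bits):
    """Split an 8-bit code word into polarity, segment, and interval fields."""
    bits = bits.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    return {
        "polarity": "positive" if bits[0] == "1" else "negative",
        "segment": int(bits[1:4], 2),   # next three bits
        "interval": int(bits[4:], 2),   # last four bits
    }


print(decode_code_word("1 0 0 1 1 1 0 0"))
# {'polarity': 'positive', 'segment': 1, 'interval': 12}
```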
The code word is 8 bits, and we generate a code word 8000 times per second (the sample rate). This gives us a bitrate output of 8 x 8000 = 64,000 bps (64 kbps). The process we just described is known as Pulse Code Modulation (PCM) and is the standard for uncompressed digital voice in telephony. One voice stream thus requires 64 kbps of bandwidth for transport.
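The same arithmetic as a small sketch, with the two inputs as parameters so other sample sizes or rates can be tried. The function name is illustrative.

```python
def pcm_bitrate_bps(bits_per_sample=8, samples_per_second=8000):
    """Bitrate of an uncompressed PCM stream in bits per second."""
    return bits_per_sample * samples_per_second


print(pcm_bitrate_bps())          # 64000
print(pcm_bitrate_bps() // 1000)  # 64 (kbps)
```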
Compression is not a required step, but it is often done to save bandwidth in VoIP environments. The two main types of compression we are concerned with are the following:
- Adaptive Differential PCM (ADPCM): This method does not send entire code words, but instead sends a smaller code that represents the difference between this word and the last one sent. This is not commonly used today, because it produces lower voice quality and, even at its most aggressive setting, compresses down only to about 16 kbps (32 kbps is the more typical rate).
- Conjugate Structure Algebraic Code Excited Linear Prediction (CS-ACELP): As the name suggests, this is more complex compression. Based on a dictionary or codebook of known sounds made by a standardized American male voice, the digital sample is analyzed and compared to the dictionary. The dictionary code that is the closest to the sample is sent. The codebook is constantly learning. The output of this compression is typically 8 kbps, with very little degradation of voice quality. This compression is widely used in VoIP.
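The difference-coding idea behind ADPCM can be illustrated with a deliberately simplified sketch. This is plain (non-adaptive, lossless) delta coding, not a real ADPCM codec: actual ADPCM also predicts the next sample and adapts its step size so each difference fits in fewer bits. The function names are hypothetical.

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous one."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    current = 0
    samples = []
    for delta in deltas:
        current += delta
        samples.append(current)
    return samples


samples = [100, 102, 105, 104, 101]
deltas = delta_encode(samples)
print(deltas)                # [100, 2, 3, -1, -3]
print(delta_decode(deltas))  # [100, 102, 105, 104, 101]
```

Because neighboring voice samples tend to be close in value, the differences after the first are small numbers, which is what lets a differential codec spend fewer bits per sample.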
Time Division Multiplexing (TDM)
TDM is the primary technology used in traditional digital voice; it is also extensively used in data circuits. The basic premise is to take pieces of multiple streams of digital data and interleave them on a single transmission medium.
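The interleaving premise can be sketched with lists standing in for the digital streams. The helper names are illustrative, and the sketch assumes every stream supplies the same number of units (`zip` stops at the shortest stream).

```python
def interleave(streams):
    """Take one unit from each stream in turn and place them on one medium."""
    return [unit for time_slice in zip(*streams) for unit in time_slice]


def deinterleave(medium, stream_count):
    """Recover each stream by picking every stream_count-th unit."""
    return [medium[i::stream_count] for i in range(stream_count)]


streams = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
medium = interleave(streams)
print(medium)                   # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2']
print(deinterleave(medium, 3))  # [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']]
```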
T1 Circuits
On a T1 circuit, there are up to 24 channels available for voice. The 64 kbps stream from conversation 1 is loaded into the first T1 channel, then the 64 kbps stream from conversation 2 is loaded into the second channel, and so on. If not enough conversations exist to fill the available channels, the unused channels are padded with null values. The 24 channels are grouped together as a frame. Depending on the implementation, either 12 frames are grouped together as a larger frame (called SuperFrame or SF), or 24 frames are grouped together (called Extended SuperFrame or ESF). T1s are typically full duplex, with two wires sending and the other two wires receiving.
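A sketch of how one T1 frame is filled and why the line rate comes out to 1.544 Mbps. Two things here are not stated in the text above: the single framing bit added to each frame is standard T1 behavior, and the `0x00` idle value is only a placeholder for whatever null pattern a real implementation uses. The names are hypothetical.

```python
T1_CHANNELS = 24
BITS_PER_CHANNEL = 8
FRAMING_BITS_PER_FRAME = 1   # not mentioned in the text; one framing bit per T1 frame
FRAMES_PER_SECOND = 8000     # one frame per sampling interval


def build_t1_frame(active_samples, idle=0x00):
    """Place one 8-bit sample per conversation; pad unused channels."""
    if len(active_samples) > T1_CHANNELS:
        raise ValueError("a T1 frame carries at most 24 channels")
    return list(active_samples) + [idle] * (T1_CHANNELS - len(active_samples))


def t1_line_rate_bps():
    """Bits per frame times frames per second."""
    bits_per_frame = T1_CHANNELS * BITS_PER_CHANNEL + FRAMING_BITS_PER_FRAME
    return bits_per_frame * FRAMES_PER_SECOND


frame = build_t1_frame([0x9C, 0x1F, 0x80])  # three active conversations
print(len(frame))          # 24
print(t1_line_rate_bps())  # 1544000
```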
E1 Circuits
An E1 is very similar to a T1. There are 32 channels, of which 30 can be used for voice. (The other two are used for framing and signaling, respectively.) The 32 channels are grouped together as a frame, and 16 frames are grouped together as a multiframe. E1 circuits are common in Europe and Mexico, with some E1 services becoming available in the United States.
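The E1 figures follow from the same 8-bit, 8000-frames-per-second arithmetic. This sketch assumes 64 kbps per channel, as in the PCM discussion earlier; the constant and function names are illustrative.

```python
E1_CHANNELS = 32
E1_VOICE_CHANNELS = 30
BITS_PER_CHANNEL = 8
FRAMES_PER_SECOND = 8000
FRAMES_PER_MULTIFRAME = 16


def e1_line_rate_bps():
    """All 32 channels, including framing and signaling."""
    return E1_CHANNELS * BITS_PER_CHANNEL * FRAMES_PER_SECOND


def e1_voice_capacity_bps():
    """Only the 30 channels available for voice."""
    return E1_VOICE_CHANNELS * BITS_PER_CHANNEL * FRAMES_PER_SECOND


def e1_multiframe_duration_ms():
    """Duration of a 16-frame multiframe in milliseconds."""
    return FRAMES_PER_MULTIFRAME * 1000 / FRAMES_PER_SECOND


print(e1_line_rate_bps())           # 2048000
print(e1_voice_capacity_bps())      # 1920000
print(e1_multiframe_duration_ms())  # 2.0
```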
Channel Associated Signaling (CAS)—T1
Although the 64 kbps channels of a T1 are intended to carry digitized voice, we must also be able to transmit signaling information, such as on-hook and off-hook, addressing, and so forth. In CAS circuits, the least significant bit of each channel in every sixth frame is "stolen" to carry signaling bits. The SF implementation groups 12 frames into a SuperFrame. Robbing one bit per channel in the 6th and 12th frames gives each channel two signaling bits (known as A and B) per SuperFrame. The A and B bits are used to signal basic status, addressing, and supervisory messages. In ESF, 24 frames are in an Extended SuperFrame, so bits are robbed in the 6th, 12th, 18th, and 24th frames, which gives each channel A, B, C, and D signaling bits. These can be used to signal more advanced supervisory functions.
Because CAS takes one bit from each channel in every sixth frame, it is known as Robbed Bit Signaling (RBS). Using RBS means that a slight degradation occurs in voice quality because every sixth frame has only 7 instead of 8 bits to represent the sample; however, this is not generally a perceptible degradation.
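The bit robbing itself is a one-line bit operation. The sketch below assumes frames are numbered from 1 within the SuperFrame and that each sample is an 8-bit integer; `apply_rbs` is a hypothetical helper, not part of any telephony library.

```python
def apply_rbs(frame_number, samples, signal_bits):
    """In every sixth frame, overwrite each sample's least significant bit
    with that channel's signaling bit; leave other frames untouched."""
    if frame_number % 6 != 0:
        return list(samples)
    return [(sample & 0xFE) | (bit & 1) for sample, bit in zip(samples, signal_bits)]


samples = [0xFF, 0x80]   # two channels shown for brevity
signal_bits = [0, 1]     # one signaling bit per channel

print(apply_rbs(5, samples, signal_bits))  # [255, 128] (unchanged)
print(apply_rbs(6, samples, signal_bits))  # [254, 129] (LSB replaced)
```

In the robbed frame only the lowest-order bit of each sample changes, which is why the effect on the reconstructed audio is so small.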
Channel Associated Signaling (CAS)—E1
E1 signaling is slightly different. In an E1 CAS circuit, the first channel (channel 0 or timeslot 1) is reserved for framing information. The 17th channel (channel 16 or timeslot 17) contains signaling information, so no bits are robbed from the individual channels. Timeslots 2-16 and 18-32 carry the voice data. Each channel has specific bits in timeslot 17 for signaling. This means that although E1 CAS does not use RBS, it is still considered CAS; the signaling is simply carried out-of-band in its own timeslot.
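The timeslot assignments can be written as a small lookup. This sketch uses the 1-32 timeslot numbering from the paragraph above; `e1_timeslot_role` is an illustrative name.

```python
def e1_timeslot_role(timeslot):
    """Return the role of an E1 CAS timeslot, numbered 1-32."""
    if not 1 <= timeslot <= 32:
        raise ValueError("timeslot must be between 1 and 32")
    if timeslot == 1:
        return "framing"
    if timeslot == 17:
        return "signaling"
    return "voice"


voice_slots = [ts for ts in range(1, 33) if e1_timeslot_role(ts) == "voice"]
print(len(voice_slots))                      # 30
print(voice_slots[0], voice_slots[-1])       # 2 32
print(e1_timeslot_role(17))                  # signaling
```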
Common Channel Signaling (CCS)
CCS provides for a completely out-of-band signaling channel. This is the function of the D channel in an ISDN PRI or BRI implementation. The full 64 kbps of bandwidth per channel is available for voice; instead of generating ABCD bits, a protocol known as Q.931 is used out-of-band in a separate channel for signaling. An ISDN PRI T1 provides 23 voice channels of 64 kbps each (called Bearer or B channels) and one 64 kbps D (for Data) channel (timeslot 24) for signaling. An ISDN PRI E1 provides 30 B channels and 1 D channel (timeslot 17); an ISDN BRI circuit provides two 64 kbps B channels and one D channel of 16 kbps.
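The channel counts above translate directly into bearer bandwidth. The table below restates the figures from the text; the 64 kbps D channel for the PRI E1 is an assumption carried over from the PRI T1 case, and the dictionary layout and function name are illustrative.

```python
B_CHANNEL_KBPS = 64

# B-channel count and D-channel rate for each CCS circuit type.
CIRCUITS = {
    "PRI T1": {"b_channels": 23, "d_kbps": 64},
    "PRI E1": {"b_channels": 30, "d_kbps": 64},  # D rate assumed equal to the T1 case
    "BRI": {"b_channels": 2, "d_kbps": 16},
}


def bearer_kbps(circuit):
    """Total bandwidth available for voice on the B channels."""
    return CIRCUITS[circuit]["b_channels"] * B_CHANNEL_KBPS


for name, info in CIRCUITS.items():
    print(name, bearer_kbps(name), info["d_kbps"])
# PRI T1 1472 64
# PRI E1 1920 64
# BRI 128 16
```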