Digitizing voice
The steps involved are as follows.
Sampling and quantizing
The first step in converting analog voice signals into digital form is called sampling. The voice signal is sampled 8,000 times per second, and each sample is encoded in 8 bits. This produces a bit stream of 64,000 bits per second. This sampling rate is sufficient to reproduce the original sound accurately. The process of converting one sample into 8 bits is called “quantizing” because the infinite possible values of a voice sample must fit into one of the 256 discrete values available in one digital byte (2^8 = 256). The combined process of sampling and quantizing is called Pulse Code Modulation (PCM). The device that produces a digital signal from an analog one is called a codec, which is an abbreviation of coder/decoder. Normally a codec is embedded in a microchip called a digital signal processor (DSP).
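The sampling and quantizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: it uses evenly spaced quantizing steps for simplicity, whereas real G.711 codecs use logarithmic (mu-law or A-law) steps. The function names and the 440 Hz test tone are assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 8000            # samples per second, as stated in the text
BITS_PER_SAMPLE = 8              # bits per sample, as stated in the text
LEVELS = 2 ** BITS_PER_SAMPLE    # 256 discrete values


def quantize(sample: float) -> int:
    """Map an analog value in [-1.0, 1.0] to an integer code 0..255.

    Assumption: uniform steps, not the mu-law/A-law companding of G.711.
    """
    clipped = max(-1.0, min(1.0, sample))
    return min(LEVELS - 1, int((clipped + 1.0) / 2.0 * LEVELS))


def sample_tone(freq_hz: float, duration_s: float) -> bytes:
    """Sample a sine tone 8,000 times per second, one byte per sample."""
    count = int(SAMPLE_RATE_HZ * duration_s)
    return bytes(
        quantize(math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ))
        for i in range(count)
    )


bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 64,000 bits per second
one_second = sample_tone(440.0, 1.0)          # 8,000 bytes for one second
print(bit_rate, len(one_second))
```

One second of sound becomes 8,000 bytes, which is the 64 kbps stream described above.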
PCM produces a 64 kbps stream of data with excellent voice quality. This process allowed long distance calls to be placed on the T1 lines of the telephone company for transmission. One voice call takes up one full channel, which is not a very efficient scheme. With VoIP, we want to cram as much voice data into as little digital signal as possible. And instead of placing our digital voice signal directly onto a T1 line, we need to packetize it and send it over an IP network.
Silence suppression and compression
It has been estimated that as much as 60% of a voice conversation is silence. Deleting these empty bits decreases the amount of data needed for the voice transmission. However, taking all of these empty bits out of the transmission gives the conversation an eerie, otherworldly quality. In practice, voice engineers compensate by putting some background “comfort noise” back into the conversation.
In addition to silence suppression, the digital data that represents the voice can be compressed with modern compression techniques, similar to those used for computer data.
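The silence suppression step can be illustrated with a very simple level detector. This is a rough sketch, not how a production voice activity detector works: the `suppress_silence` function, the threshold of 4.0 and the synthetic frames are all assumptions chosen for the example, and the samples are assumed to be unsigned 8-bit values centred on 128.

```python
def suppress_silence(frames, threshold=4.0):
    """Keep only frames whose average level exceeds the threshold.

    frames: list of bytes objects holding unsigned 8-bit samples
    centred on 128. Returns (index, frame) pairs so the receiver
    knows where to re-insert comfort noise.
    """
    kept = []
    for index, frame in enumerate(frames):
        if not frame:
            continue  # nothing to measure in an empty frame
        level = sum(abs(s - 128) for s in frame) / len(frame)
        if level > threshold:
            kept.append((index, frame))
    return kept


# Hypothetical 20ms frames (160 samples each): loud speech and near silence.
speech = bytes([128 + 40, 128 - 40] * 80)
silence = bytes([128, 129] * 80)
frames = [speech, silence, silence, speech, silence]

kept = suppress_silence(frames)
saved = (len(frames) - len(kept)) / len(frames)
print([index for index, _ in kept], saved)
```

In this made-up conversation three of the five frames are silent, so 60% of the frames never need to be sent.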
The net effect of these techniques is to reduce the bandwidth required for a voice conversation from 64 kbps to 32 kbps, 16 kbps, 8 kbps or even less. Eight voice conversations at 8 kbps can take place over the same circuit as a single conversation at 64 kbps.
Encoding and compression techniques are published as standards by the International Telecommunication Union (ITU). Expect to see these standards in the specifications for VoIP equipment. The original PCM at 64 kbps is G.711 and is always supported by VoIP equipment. Other important encoding and compression standards are as follows.
| Codec | G.711 PCM | G.726 ADPCM | G.728 LD-CELP | G.729A CS-ACELP | G.723.1 ACELP/MP-MLQ |
| kbps | 64 | 32 | 16 | 8 | 5.3/6.4 |
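To see what the codec rates in the table mean for capacity, the following sketch counts how many conversations fit into one 64 kbps circuit. It uses the rates from the table, expressed in whole bits per second to avoid rounding problems, and it deliberately ignores packet header overhead. The names `CODEC_BPS` and `calls_per_circuit` are made up for the example.

```python
# Codec rates from the table above, in bits per second.
CODEC_BPS = {
    "G.711": 64000,
    "G.726": 32000,
    "G.728": 16000,
    "G.729A": 8000,
    "G.723.1 (6.4)": 6400,
    "G.723.1 (5.3)": 5300,
}


def calls_per_circuit(codec_bps: int, circuit_bps: int = 64000) -> int:
    """Whole conversations that fit in one circuit, ignoring header overhead."""
    return circuit_bps // codec_bps


for name, rate in CODEC_BPS.items():
    print(f"{name}: {calls_per_circuit(rate)} call(s) per 64 kbps circuit")
```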
Packetizing voice
Once the voice data has been digitized, compressed and the silence suppressed, it has to be divided into sections for placing into IP packets.
VoIP is inefficient for small voice packets, while large voice packets lead to long delays. Every VoIP packet carries overhead in the form of headers. The headers for IP, UDP and RTP add up to 40 bytes. If the voice data were only 40 bytes, the packet would be only 50% efficient. The largest IP packet that a standard Ethernet frame can carry is 1500 bytes. Take away the 40 bytes for the headers and you still have 1460 bytes available. That translates into 1460 samples of uncompressed voice, or just under one fifth of a second (182ms). If the voice is compressed at a ratio of 8 to 1, that represents about 1.5 seconds. If a packet with this much voice is lost or arrives out of order, the conversation will be severely disrupted.
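The efficiency and delay figures above can be checked with a short calculation. This sketch assumes the usual header sizes of 20 bytes for IPv4 without options, 8 bytes for UDP and 12 bytes for RTP, which add up to the 40 bytes quoted in the text. The function names are illustrative only.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP = 40 bytes


def efficiency(payload_bytes: int) -> float:
    """Fraction of the packet that is voice rather than headers."""
    return payload_bytes / (payload_bytes + HEADER_BYTES)


def voice_ms(payload_bytes: int, codec_bps: int = 64000) -> float:
    """Milliseconds of speech carried by a payload at a given codec rate."""
    return payload_bytes * 8 * 1000 / codec_bps


print(efficiency(40))          # 40 bytes of voice, 40 bytes of header
print(efficiency(1460))        # a full Ethernet-sized packet
print(voice_ms(1460))          # uncompressed 64 kbps voice
print(voice_ms(1460, 8000))    # voice compressed 8 to 1 (8 kbps)
```

A 1460-byte payload is over 97% efficient, but it holds 182.5ms of uncompressed voice, or 1460ms at 8 kbps, which is far too much speech to lose in a single packet.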
Typically, 10ms to 30ms (average 20ms) of voice is placed inside one packet. 20ms of uncompressed voice takes 160 bytes. Compressed at 4 to 1, 20ms would take 40 bytes. The amount of voice carried inside one packet is a trade-off between the need for efficiency and the need to smooth out a conversation if a packet is lost in transit.
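The trade-off can be made concrete by working out the payload size and the resulting bandwidth on the network for the typical packet intervals. This is a sketch under stated assumptions: it counts only the 40 bytes of IP, UDP and RTP headers and leaves out link-layer framing such as Ethernet headers. The function names are hypothetical.

```python
HEADER_BYTES = 40   # IP + UDP + RTP headers, as given in the text


def payload_bytes(interval_ms: int, codec_bps: int) -> int:
    """Bytes of voice produced by the codec during one packet interval."""
    return codec_bps * interval_ms // 8000   # bits per ms, divided by 8


def wire_bps(interval_ms: int, codec_bps: int) -> float:
    """Bandwidth including headers, in bits per second (no link-layer overhead)."""
    packet_bits = (payload_bytes(interval_ms, codec_bps) + HEADER_BYTES) * 8
    return packet_bits * 1000 / interval_ms


for interval in (10, 20, 30):
    size = payload_bytes(interval, 64000)
    print(f"{interval}ms: {size} bytes of voice, {wire_bps(interval, 64000):.0f} bps on the wire")
```

With 20ms packets, uncompressed voice needs 160 bytes per packet and about 80 kbps once headers are included; voice compressed 4 to 1 needs 40 bytes per packet and about 32 kbps. Shorter intervals lose less speech per dropped packet but spend a larger share of the bandwidth on headers.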
Transmission of voice by IP
The three protocols of the TCP/IP protocol suite used by the voice data are Real-time Transport Protocol (RTP), User Datagram Protocol (UDP) and the Internet Protocol (IP).
Why UDP and not TCP?
TCP is used for call control and setup but UDP is used for the voice transmission itself. TCP is the protocol used when guaranteed delivery of a packet is required. If a packet is lost, TCP provides a re-transmission mechanism that keeps sending the same data until its receipt is acknowledged. This same mechanism makes TCP unsuitable for voice transmission. While the receiver waits for the re-transmitted packet, all the voice that follows it is held back, which introduces a gap in the conversation. It is better that one packet, which typically represents 20ms of conversation, stays lost than that the conversation is interrupted. UDP does not provide a re-transmission facility and is therefore the protocol of choice for voice transmission.
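The layering of RTP over UDP can be sketched with Python's standard `socket` and `struct` modules. This is a minimal illustration, not a working VoIP sender: it builds the 12-byte fixed RTP header defined in RFC 3550, uses payload type 0 (G.711 mu-law), and sends each frame once with no re-transmission. The function names, the SSRC value and the destination address in the usage note are assumptions made for the example.

```python
import socket
import struct


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=0):
    """Prepend a 12-byte RTP fixed header to one frame of voice.

    payload_type 0 is G.711 mu-law (PCMU); other values are not handled here.
    """
    first_byte = 2 << 6                 # version 2, no padding/extension/CSRC
    second_byte = payload_type & 0x7F   # marker bit left at 0
    header = struct.pack(
        "!BBHII",
        first_byte,
        second_byte,
        seq & 0xFFFF,                   # 16-bit sequence number
        timestamp & 0xFFFFFFFF,         # 32-bit timestamp in sample units
        ssrc & 0xFFFFFFFF,              # 32-bit stream identifier
    )
    return header + payload


def send_voice(frames, address, ssrc=0x1234ABCD):
    """Send each 20ms frame as one UDP datagram; lost packets stay lost."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for seq, frame in enumerate(frames):
            # 20ms at 8,000 samples per second is 160 samples per frame.
            packet = build_rtp_packet(seq, seq * 160, ssrc, frame)
            sock.sendto(packet, address)
    finally:
        sock.close()


# One 20ms frame of uncompressed voice: 160 payload bytes plus 12 header bytes.
packet = build_rtp_packet(seq=7, timestamp=7 * 160, ssrc=0x1234ABCD, payload=bytes(160))
print(len(packet))
```

A call such as `send_voice(frames, ("192.0.2.10", 5004))` would transmit the frames to a hypothetical receiver. The sequence number and timestamp let the receiver detect a missing packet and play on without waiting for it.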