Taken from An Introduction to Steganography
By Duncan Stellars
Steganography in Audio
Because of the range of the human auditory system (HAS), data hiding in audio signals is especially challenging. The HAS perceives over a range of power greater than one billion to one and a range of frequencies greater than one thousand to one. The auditory system is also very sensitive to additive random noise. Any disturbances in a sound file can be detected as low as one part in ten million (80 dB below ambient level) [1]. However, while the HAS has a large dynamic range, it has a fairly small differential range: loud sounds tend to drown out quiet sounds. When performing data hiding on audio, one must exploit these weaknesses of the HAS while remaining aware of its extreme sensitivity.
8.1 Audio Environments
When working with transmitted audio signals, one should bear in mind two main considerations: first, the means of audio storage, or digital representation of the audio, and second, the transmission medium the signal might pass through.
8.1.1 Digital representation
Digital audio files generally have two primary characteristics:
Another digital representation that should be considered is the ISO MPEG-Audio format, a perceptual encoding standard. This format drastically changes the statistics of the signal by encoding only the parts the listener perceives, thus maintaining the sound, but changing the signal.
8.1.2 Transmission medium
The transmission medium, or transmission environment, of an audio signal refers to the environments the signal might pass through on its way from encoder to decoder. Bender identifies four possible transmission environments:
The signal representation and transmission environment both need to be considered when choosing a data-hiding method.
8.2 Methods of Audio Data Hiding
We now consider some methods of audio data hiding.
8.2.1 Low-bit encoding
Just as data can be stored in the least-significant bit of images, binary data can be stored in the least-significant bit of audio samples. Ideally the channel capacity is 1 kb per second per kilohertz of sampling rate, so, for example, the channel capacity would be 44 kbps for a sequence sampled at 44 kHz. Unfortunately, this introduces audible noise. The primary disadvantage of this method, however, is its poor immunity to manipulation. Factors such as channel noise and resampling can easily destroy the hidden signal. A particularly robust implementation of such a method is described by Bassia and Pitas in [8]. The result is a slight amplitude modification of each sample in a way that does not produce any perceptual difference. Their implementation offers high robustness to MPEG compression and to other forms of signal manipulation, such as filtering, resampling and requantization.
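The basic low-bit idea can be illustrated with a minimal sketch. It assumes the audio is already available as a list of signed 16-bit integer samples and writes one message bit per sample; the function names `embed_lsb` and `extract_lsb` are hypothetical, and this is plain least-significant-bit substitution, not the Bassia and Pitas scheme of [8].

```python
def embed_lsb(samples, bits):
    """Return a copy of `samples` with `bits` written into the low bit
    of the first len(bits) samples (one bit per sample)."""
    if len(bits) > len(samples):
        raise ValueError("message is longer than the cover signal")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the least-significant bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples, n_bits):
    """Read back the low bit of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]


# Illustrative 16-bit sample values (assumed, not real audio).
cover = [1000, -1001, 32767, -32768, 12, 13, 0, -1]
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(cover, message)
recovered = extract_lsb(stego, len(message))
```

Each sample changes by at most one quantization step, which is why the capacity is one bit per sample. It is also why any resampling or requantization of the stego signal scrambles the low bits and loses the message.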
8.2.2 Phase coding
The phase coding method works by substituting the phase of an initial audio segment with a reference phase that represents the data. The procedure for phase coding is as follows:
For the decoding process, the sequence must be synchronised before decoding begins. The length of the segment, the DFT points, and the data interval must be known at the receiver. The value of the underlying phase of the first segment is detected as 0 or 1, which represents the coded binary string.
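The phase substitution on the first segment, and its detection at the receiver, can be sketched as follows. This is a simplified illustration under stated assumptions: it handles only the initial segment (the adjustment of later segments is omitted), it maps bit 0 to a phase of +pi/2 and bit 1 to -pi/2, and it uses a naive pure-Python DFT. The helper names and the synthetic host segment are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short segments)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def encode_first_segment(segment, bits):
    """Keep each bin's magnitude but replace the phase of bins
    1..len(bits) with +pi/2 (bit 0) or -pi/2 (bit 1)."""
    n = len(segment)
    if len(bits) >= n // 2:
        raise ValueError("too many bits for this segment length")
    spectrum = dft(segment)
    for i, bit in enumerate(bits):
        k = i + 1  # skip the DC bin
        phase = -math.pi / 2 if bit else math.pi / 2
        spectrum[k] = cmath.rect(abs(spectrum[k]), phase)
        # Mirror bin must be the conjugate so the time signal stays real.
        spectrum[n - k] = spectrum[k].conjugate()
    return idft(spectrum)


def decode_first_segment(segment, n_bits):
    """Recover bits from the sign of the phase in bins 1..n_bits."""
    spectrum = dft(segment)
    return [1 if cmath.phase(spectrum[i + 1]) < 0 else 0 for i in range(n_bits)]


# Synthetic host segment with energy in bins 1..8 (an assumption for the demo).
n = 64
segment = [sum(math.cos(2 * math.pi * k * t / n + 0.3 * k) for k in range(1, 9))
           for t in range(n)]
bits = [1, 0, 0, 1, 1, 0, 1, 0]
stego = encode_first_segment(segment, bits)
decoded = decode_first_segment(stego, len(bits))
```

The decoder needs the same segment length and bin positions as the encoder, which matches the requirement that the segment length, DFT points and data interval be known at the receiver. Bins with near-zero magnitude carry no reliable phase, so a practical encoder would have to avoid them.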
8.2.3 Spread spectrum
Most communication channels try to concentrate audio data in as narrow a region of the frequency spectrum as possible in order to conserve bandwidth and power. When using a spread spectrum technique, however, the encoded data is spread across as much of the frequency spectrum as possible. One particular method discussed in [1], Direct Sequence Spread Spectrum (DSSS) encoding, spreads the signal by multiplying it by a certain maximal length pseudorandom sequence, known as a chip. The sampling rate of the host signal is used as the chip rate for coding. The calculation of the start and end quanta for phase locking purposes is taken care of by the discrete, sampled nature of the host signal. As a result, a higher chip rate, and therefore a higher associated data rate, is possible. However, unlike phase coding, DSSS does introduce additive random noise to the sound.
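A minimal sketch of the spreading and despreading steps is given below. It makes several assumptions that are not in the text: one chip per host sample, a seeded `random.Random` generator standing in for a true maximal length sequence, a sine-wave host, and illustrative values for the chip count and the embedding strength `alpha`. All function names are hypothetical.

```python
import math
import random


def make_chips(n, seed):
    """Pseudorandom +/-1 chip sequence shared by encoder and decoder.
    A seeded PRNG stands in here for a maximal length sequence."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def dsss_embed(host, bits, chips, alpha):
    """Add alpha * (+/-1 data symbol) * chip to the host, one chip per sample."""
    n = len(chips)
    if len(host) < n * len(bits):
        raise ValueError("host signal is too short for the message")
    stego = list(host)
    for b, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for i in range(n):
            stego[b * n + i] += alpha * sign * chips[i]
    return stego


def dsss_extract(stego, n_bits, chips):
    """Correlate each bit interval with the chip sequence; the sign of
    the correlation gives the bit."""
    n = len(chips)
    bits = []
    for b in range(n_bits):
        corr = sum(stego[b * n + i] * chips[i] for i in range(n))
        bits.append(1 if corr > 0 else 0)
    return bits


# Demo with assumed parameters: 4096 chips per bit, 440 Hz host at 44.1 kHz.
chips = make_chips(4096, seed=1)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
host = [math.sin(2 * math.pi * 440 * i / 44100) for i in range(len(bits) * len(chips))]
stego = dsss_embed(host, bits, chips, alpha=0.1)
decoded = dsss_extract(stego, len(bits), chips)
```

The embedded term is exactly the additive noise the paragraph mentions. The correlation works because the host is nearly uncorrelated with the chips, while the embedded term adds up coherently over the bit interval, so longer chip sequences allow a smaller `alpha` at the cost of a lower data rate.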
8.2.4 Echo data hiding
Echo data hiding embeds data into a host signal by introducing an echo. The data are hidden by varying three parameters of the echo: initial amplitude, decay rate, and offset, or delay. As the offset between the original and the echo decreases, the two signals blend. At a certain point, the human ear cannot distinguish between the two signals, and the echo is merely heard as added resonance. This point depends on factors such as the quality of the original recording, the type of sound, and the listener. By using two different delay times, both below the human ear's perceptual threshold, we can encode a binary one or zero. The decay rate and initial amplitude can also be adjusted below the audible threshold of the ear, to ensure that the information is not perceivable. To encode more than one bit, the original signal is divided into smaller portions, each of which can be echoed to encode the desired bit. The final encoded signal is then just the recombination of all independently encoded signal portions. As a binary one is represented by a certain delay y, and a binary zero is represented by a certain delay x, detection of the embedded signal then just involves detecting the spacing between the echoes. A process for doing this is described in the work of Gruhl et al. [13]. Echo hiding was found to work exceptionally well on sound files where there is no additional degradation, such as from line noise or lossy encoding, and where there are no gaps of silence. Work to eliminate these drawbacks is ongoing.
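The segment-by-segment encoding and the delay detection can be sketched as follows. This is an illustration under assumptions, not the detection process of [13]: it adds a single undecayed echo per segment, uses a noise-like synthetic host, and decides each bit by comparing the segment's autocorrelation at the two candidate delays. The delays, echo amplitude and function names are hypothetical values chosen for the demo.

```python
import random


def echo_embed(host, bits, seg_len, delay0, delay1, amp):
    """Echo each segment independently: delay0 encodes 0, delay1 encodes 1."""
    if len(host) < seg_len * len(bits):
        raise ValueError("host signal is too short for the message")
    stego = []
    for b, bit in enumerate(bits):
        seg = host[b * seg_len:(b + 1) * seg_len]
        d = delay1 if bit else delay0
        # y[i] = x[i] + amp * x[i - d], with no echo before the segment start.
        stego.extend(seg[i] + (amp * seg[i - d] if i >= d else 0.0)
                     for i in range(seg_len))
    # Any samples beyond the message are passed through unchanged.
    stego.extend(host[seg_len * len(bits):])
    return stego


def autocorr(seg, lag):
    """Unnormalised autocorrelation of one segment at a single lag."""
    return sum(seg[i] * seg[i - lag] for i in range(lag, len(seg)))


def echo_extract(stego, n_bits, seg_len, delay0, delay1):
    """Pick, per segment, the candidate delay with the stronger autocorrelation."""
    bits = []
    for b in range(n_bits):
        seg = stego[b * seg_len:(b + 1) * seg_len]
        bits.append(1 if autocorr(seg, delay1) > autocorr(seg, delay0) else 0)
    return bits


# Demo with assumed parameters and a noise-like host.
rng = random.Random(0)
bits = [0, 1, 1, 0, 1, 0, 0, 1]
seg_len, delay0, delay1, amp = 4096, 50, 80, 0.4
host = [rng.uniform(-1.0, 1.0) for _ in range(seg_len * len(bits))]
stego = echo_embed(host, bits, seg_len, delay0, delay1, amp)
decoded = echo_extract(stego, len(bits), seg_len, delay0, delay1)
```

An echo at delay d raises the autocorrelation of the segment at lag d, which is what the detector looks for. The comparison relies on the host having little correlation of its own at the two delays, and a silent segment produces no echo at all, which matches the gaps-of-silence limitation noted above.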