Synthesis Technology: FM and Wavetable
 
There are a number of different technologies or algorithms used to create sounds 
in music synthesizers. Two widely used techniques are Frequency Modulation (FM) 
synthesis and Wavetable synthesis.
 
 FM synthesis techniques generally use one periodic signal (the modulator) to 
modulate the frequency of another signal (the carrier). If the modulating signal 
is in the audible range, then the result will be a significant change in the 
timbre of the carrier signal. Each FM voice requires a minimum of two signal 
generators. These generators are commonly referred to as "operators", and 
different FM synthesis implementations have varying degrees of control over the 
operator parameters.  Sophisticated FM systems may use 4 or 6 operators per voice, and the 
operators may have adjustable envelopes which allow adjustment of the attack and 
decay rates of the signal. Although FM systems were implemented in the analog 
domain on early synthesizer keyboards, modern FM synthesis implementations are 
done digitally.

FM synthesis techniques are very useful for creating expressive new
synthesized sounds. However, if the goal of the synthesis system is to recreate 
the sound of some existing instrument, this can generally be done more 
accurately with digital sample-based techniques.

Digital sampling systems store high quality sound samples digitally, and then
replay these sounds on demand. Digital sample-based synthesis systems may employ 
a variety of special techniques, such as sample looping, pitch shifting, 
mathematical interpolation, and digital filtering, in order to reduce the amount 
of memory required to store the sound samples (or to get more types of sounds 
from a given amount of memory). These sample-based synthesis systems are often 
called "wavetable" synthesizers (the sample memory in these systems contains a 
large number of sampled sound segments, and can be thought of as a "table" of 
sound waveforms which may be looked up and utilized when needed).

Wavetable Synthesis Techniques

The majority of professional synthesizers available today use some form of
sampled-sound or Wavetable synthesis. The trend for multimedia sound products is 
also towards wavetable synthesis. To help prospective MIDI developers, a number 
of the techniques employed in this type of synthesis are discussed in the 
following paragraphs.

Looping and Envelope Generation

One of the primary techniques used in wavetable synthesizers to conserve
sample memory space is the looping of sampled sound segments. For many 
instrument sounds, the sound can be modeled as consisting of two major sections: 
the attack section and the sustain section. The attack section is the initial 
part of the sound, where the amplitude and the spectral characteristics of the 
sound may be changing very rapidly. The sustain section of the sound is that 
part of the sound following the attack, where the characteristics of the sound 
are changing less dynamically.

Figure 4 shows a waveform with portions which could be considered the attack
and the sustain sections indicated. In this example, the spectral 
characteristics of the waveform remain constant throughout the sustain section, 
while the amplitude is decreasing at a fairly constant rate. This is an 
exaggerated example; in most natural instrument sounds, both the spectral
characteristics and the amplitude continue to change through the duration of the 
sound. The sustain section, if one can be identified, is that section for which 
the characteristics of the sound are relatively constant.

Figure 4: Attack and Sustain Portions of a Waveform

Figure 5: Looping of a Sample Segment

A great deal of memory can be saved in wavetable synthesis systems by storing
only a short segment of the sustain section of the waveform, and then looping 
this segment during playback. Figure 5 shows a two period segment of the sustain 
section from the waveform in Figure 4, which has been looped to create a steady 
state signal. If the original sound had a fairly constant spectral content and 
amplitude during the sustained section, then the sound resulting from this 
looping operation should be a good approximation of the sustained section of the 
original.

For many acoustic string instruments, the spectral characteristics of the
sound remain fairly constant during the sustain section, while the amplitude of 
the signal decays. This can be simulated with a looped segment by multiplying 
the looped samples by a decreasing gain factor during playback to get the 
desired shape or envelope. The amplitude envelope of a sound is commonly modeled 
as consisting of some number of linear segments. An example is the commonly used 
four part piecewise-linear Attack-Decay-Sustain-Release (ADSR) envelope model. 
Figure 6 depicts a typical ADSR envelope shape, and Figure 7 shows the result of 
applying this envelope to the looped waveform from Figure 5.     
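The looping-plus-envelope scheme just described can be sketched in Python. This is a minimal illustration: the piecewise-linear ADSR gain function, the flat sustain level, and all parameter names and values are illustrative assumptions, not taken from any particular synthesizer.

```python
def adsr_gain(t, attack, decay, sustain_level, release, note_off_time):
    """Piecewise-linear ADSR gain at time t (seconds).

    Attack ramps 0 -> 1, decay ramps 1 -> sustain_level, sustain holds
    (flat here; a slow decay is also common), and release ramps to zero
    starting when the Note Off arrives at note_off_time."""
    if t >= note_off_time:
        # Gain at the moment of Note Off, then a linear ramp down to zero.
        start = adsr_gain(note_off_time, attack, decay, sustain_level,
                          release, float("inf"))
        return max(0.0, start * (1.0 - (t - note_off_time) / release))
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 - ((t - attack) / decay) * (1.0 - sustain_level)
    return sustain_level

def render_note(initial, loop, n_samples, sample_rate, **env):
    """Play the initial (attack) segment once, then repeat the loop
    segment, applying the envelope to each output sample."""
    out = []
    for i in range(n_samples):
        if i < len(initial):
            s = initial[i]
        else:
            s = loop[(i - len(initial)) % len(loop)]
        out.append(s * adsr_gain(i / sample_rate, **env))
    return out
```
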
Figure 6: A Typical ADSR Amplitude Envelope

Figure 7: ADSR Envelope Applied to a Looped Sample Segment

A typical wavetable synthesis system would store sample data for the attack
section and the looped section of an instrument sound. These sample segments 
might be referred to as the initial sound and the loop sound. The initial sound 
is played once through, and then the loop sound is played repetitively until the 
note ends. An envelope generator function is used to create an envelope which is 
appropriate for the particular instrument, and this envelope is applied to the 
output samples during playback.

Playback of the initial wave (with the attack portion of the envelope
applied) begins when a Note On message is received. The length of the initial 
sound segment is fixed by the number of samples in the segment, and the length 
of the attack and decay sections of the envelope are generally also fixed for a 
given instrument sound.

The sustain section will continue to repeat the loop samples while applying
the sustain envelope slope (which decays slowly in our examples), until a Note
Off message is received. The Note Off message triggers the beginning of the
release portion of the envelope.

Loop Length

The loop length is measured as a number of samples, and the length of the
loop should be equal to an integral number of periods of the fundamental pitch 
of the sound being played (if this is not true, then an undesirable "pitch 
shift" will occur during playback when the looping begins). In practice, the 
length of the loop segment for an acoustic instrument sample may be many periods 
with respect to the fundamental pitch of the sound. If the sound has a natural 
vibrato or chorus effect, then it is generally desirable to have the loop 
segment length be an integral multiple of the period of the vibrato or chorus.
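The loop-length constraint can be made concrete with a small sketch. The helper names and values below are illustrative; the point is simply that the loop length is chosen as an integral number of fundamental periods, and that the playback index wraps within the loop segment once the attack section has played through.

```python
def loop_length_samples(sample_rate, fundamental_hz, periods):
    """Choose a loop length equal to an integral number of periods of the
    fundamental pitch; otherwise an undesirable pitch shift is heard
    when looping begins."""
    return round(periods * sample_rate / fundamental_hz)

def looped_index(i, attack_len, loop_len):
    """Map a playback sample index to a storage index: the attack section
    plays once, then the loop section repeats indefinitely."""
    if i < attack_len:
        return i
    return attack_len + (i - attack_len) % loop_len
```
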
One-Shot Sounds

The previous paragraphs discussed dividing a sampled sound into an attack
section and a sustain section, and then using looping techniques to minimize the 
storage requirements for the sustain portion. However, some sounds, particularly 
sounds of short duration or sounds whose characteristics change dynamically 
throughout their duration, are not suitable for looped playback techniques. 
Short drum sounds often fit this description. These sounds are stored as a 
single sample segment which is played once through with no looping. This class 
of sounds is referred to as "one-shot" sounds.

Sample Editing and Processing

There are a number of sample editing and processing steps involved in
preparing sampled sounds for use in a wavetable synthesis system. The 
requirements for editing the original sample data to identify and extract the 
initial and loop segments have already been mentioned.

Editing may also be required to make the endpoints of the loop segment
compatible. If the amplitude and the slope of the waveform at the beginning of 
the loop segment do not match those at the end of the loop, then a repetitive 
"glitch" will be heard during playback of the looped section. Additional 
processing may be performed to "compress" the dynamic range of the sound to 
improve the signal/quantizing noise ratio or to conserve sample memory. This 
topic is addressed next.

When all of the sample processing has been completed, the resulting sampled
sound segments for the various instruments are tabulated to form the sample 
memory for the synthesizer.

Sample Data Compression

The signal-to-quantizing noise ratio for a digitally sampled signal is
limited by sample word size (the number of bits per sample), and by the 
amplitude of the digitized signal. Most acoustic instrument sounds reach their 
peak amplitude very quickly, and the amplitude then slowly decays from this 
peak. The ear's sensitivity dynamically adjusts to signal level. Even in systems 
utilizing a relatively small sample word size, the quantizing noise level is 
generally not perceptible when the signal is near maximum amplitude. However, as 
the signal level decays, the ear becomes more sensitive, and the noise level 
will appear to increase. Of course, using a larger word size will reduce the 
quantizing noise, but there is a considerable price penalty paid if the number 
of samples is large.

Compression techniques may be used to improve the signal-to-quantizing noise
ratio for some sampled sounds. These techniques reduce the dynamic range of the 
sound samples stored in the sample memory. The sample data is decompressed 
during playback to restore the dynamic range of the signal. This allows the use 
of sample memory with a smaller word size (smaller dynamic range) than is 
utilized in the rest of the system. There are a number of different compression 
techniques which may be used to compress the dynamic range of a signal.

Note that there is some compression effect inherent in the looping techniques
described earlier. If the loop segment is stored at an amplitude level which 
makes full use of the dynamic range available in the sample memory, and the 
processor and D/A converters used for playback have a wider dynamic range than 
the sample memory, then the application of a decay envelope during playback will 
have a decompression effect similar to that described in the previous paragraph.
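As an illustration of the kind of companding that might be used, the sketch below applies µ-law-style compression before storage and the inverse expansion during playback. µ-law is borrowed from telephony here purely as an example; the text above does not specify any particular compression algorithm, and the constant below is an assumption.

```python
import math

MU = 255.0  # mu-law constant from 8-bit telephony, used here for illustration

def compress(x):
    """Compress a sample in [-1, 1], reducing its dynamic range so it can
    be stored in a memory with a smaller word size."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse companding, applied during playback to restore the
    original dynamic range."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Low-level samples are boosted before quantization (and attenuated again on playback), which is what improves the signal-to-quantizing-noise ratio as the signal decays.
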
Pitch Shifting

In order to minimize sample memory requirements, wavetable synthesis systems
utilize pitch shifting, or pitch transposition techniques, to generate a number 
of different notes from a single sound sample of a given instrument. For 
example, if the sample memory contains a sample of a middle C note on the 
acoustic piano, then this same sample data could be used to generate the C# note 
or D note above middle C using pitch shifting.

Pitch shifting is accomplished by accessing the stored sample data at
different rates during playback. For example, if a pointer is used to address 
the sample memory for a sound, and the pointer is incremented by one after each 
access, then the samples for this sound would be accessed sequentially, 
resulting in some particular pitch. If the pointer increment was two rather than 
one, then only every second sample would be played, and the resulting pitch 
would be shifted up by one octave (the frequency would be doubled).

In the previous example, the sample memory address pointer was incremented by
an integer number of samples. This allows only a limited set of pitch shifts. In 
a more general case, the memory pointer would consist of an integer part and a 
fractional part, and the increment value could be a fractional number of 
samples. The memory pointer is often referred to as a "phase accumulator" and 
the increment value is then the "phase increment". The integer part of the phase 
accumulator is used to address the sample memory, the fractional part is used to 
maintain frequency accuracy.

For example, if the phase increment value was equivalent to 1/2, then the
pitch would be shifted down by one octave (the frequency would be halved). A 
phase increment value of 1.05946 (the twelfth root of two) would create a pitch 
shift of one musical half-step (i.e. from C to C#) compared with an increment of 
1. When non-integer increment values are utilized, the frequency resolution for 
playback is determined by the number of bits used to represent the fractional 
part of the address pointer and the address increment parameter.

Interpolation

When the fractional part of the address pointer is non-zero, then the
"desired value" falls between available data samples. Figure 8 depicts a 
simplified addressing scheme wherein the Address Pointer and the increment 
parameter each have a 4-bit integer part and a 4-bit fractional part. In this 
case, the increment value is equal to 1 1/2 samples. Very simple systems might 
simply ignore the fractional part of the address when determining the sample 
value to be sent to the D/A converter. The data values sent to the D/A converter 
when using this approach are indicated in Figure 8, case I.
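The phase-accumulator addressing scheme can be sketched in Python. This is a minimal illustration assuming the 4-bit fractional format of Figure 8 and an illustrative sample table; it shows both simple truncation of the fractional part (case I) and linear interpolation between the two neighbouring samples (case II, discussed below).

```python
FRAC_BITS = 4
ONE = 1 << FRAC_BITS  # fixed-point 1.0 (4-bit fractional part, as in Figure 8)

def resample(table, phase_inc, n, interpolate=False):
    """Play back `table` using a fixed-point phase accumulator.

    phase_inc is in 4.4 fixed point: 0x18 corresponds to an increment of
    1.5 samples per output sample. Truncation (case I) simply drops the
    fractional bits; linear interpolation (case II) uses them to weight
    the two nearest samples."""
    out, acc = [], 0
    for _ in range(n):
        idx = acc >> FRAC_BITS          # integer part addresses sample memory
        frac = (acc & (ONE - 1)) / ONE  # fractional part, 0 <= frac < 1
        if interpolate:
            s = (1 - frac) * table[idx] + frac * table[idx + 1]
        else:
            s = table[idx]              # case I: ignore the fractional part
        out.append(s)
        acc += phase_inc
    return out
```
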
Figure 8: Sample Memory Addressing and Interpolation

A slightly better approach would be to use the nearest available sample
value. More sophisticated systems would perform some type of mathematical 
interpolation between available data points in order to get a value to be used 
for playback. Values which might be sent to the D/A when interpolation is 
employed are shown as case II. Note that the overall frequency accuracy would be 
the same for both cases indicated, but the output is severely distorted in the 
case where interpolation is not used.

There are a number of different algorithms used for interpolation between
sample values. The simplest is linear interpolation. With linear interpolation,
the interpolated value is simply the weighted average of the two nearest
samples, with the fractional address used as a weighting constant. For example,
if the address pointer indicated an address of (n+K), where n is the integer
part of the address and K is the fractional part, then the interpolated value can be
calculated as s(n+K) = (1-K)s(n) + (K)s(n+1), where s(n) is the sample data 
value at address n. More sophisticated interpolation techniques can be utilized 
to further reduce distortion, but these techniques are computationally 
expensive.

Oversampling

Oversampling of the sound samples may also be used to improve distortion in
wavetable synthesis systems. For example, if 4X oversampling were utilized for a 
particular instrument sound sample, then an address increment value of 4 would 
be used for playback with no pitch shift. The data points chosen during playback 
will be closer to the "desired values", on the average, than they would be if no 
oversampling were utilized because of the increased number of data points used 
to represent the waveform. Of course, oversampling has a high cost in terms of 
sample memory requirements.

In many cases, the best approach may be to utilize linear interpolation
combined with varying degrees of oversampling where needed. The linear 
interpolation technique provides reasonable accuracy for many sounds, without 
the high penalty in terms of processing power required for more sophisticated 
interpolation methods. For those sounds which need better accuracy, oversampling 
is employed. With this approach, the additional memory required for oversampling 
is only utilized where it is most needed. The combined effect of linear 
interpolation and selective oversampling can produce excellent results.

Splits

When the pitch of a sampled sound is changed during playback, the timbre of
the sound is changed somewhat also. The effect is less noticeable for small 
changes in pitch (up to a few semitones) than it is for a large pitch shift. To
retain a natural sound, a particular sample of an instrument sound will only be 
useful for recreating a limited range of notes. To get coverage of the entire 
instrument range, a number of different samples, each with a limited range of 
notes, are used. The resulting instrument implementation is often referred to as 
a "multisampled" instrument. This technique can be thought of as splitting a 
musical instrument keyboard into a number of ranges of notes, with a different 
sound sample used for each range. Each of these ranges is referred to as a 
split, or key split.

Velocity splits refer to the use of different samples for different note
velocities. Using velocity splits, one sample might be utilized if a particular 
note is played softly, while a different sample would be utilized for the same
note of the same instrument when played with a higher velocity. This technique 
is not commonly used to produce basic sound samples because of the added memory 
expense, but both key splitting and velocity splitting techniques can be 
utilized as a performance enhancement. For instance, a key split might allow a 
fretless bass sound on the lower octaves of a keyboard, while the upper octaves 
play a vibraphone. Similarly, a velocity split might "layer" strings on top of 
an acoustic piano sound when the keys are hit with higher velocity.

Aliasing Noise

Earlier paragraphs discussed the timbre changes which result from pitch
shifting. The resampling techniques used to shift the pitch of a stored sound 
sample can also result in the introduction of aliasing noise into an instrument 
sound. The generation of aliasing noise can also limit the amount of pitch 
shifting which may be effectively applied to a sound sample. Sounds which are 
rich in upper harmonic content will generally have more of a problem with 
aliasing noise. Low-pass filtering applied after interpolation can help 
eliminate the undesirable effect of aliasing noise. The use of oversampling also 
helps eliminate aliasing noise.

LFOs for Vibrato and Tremolo

Vibrato and tremolo are effects which are often produced by musicians playing
acoustic instruments. Vibrato is basically a low-frequency modulation of the 
pitch of a note, while tremolo is modulation of the amplitude of the sound. 
These effects are simulated in synthesizers by implementing low-frequency 
oscillators (LFOs) which are used to modulate the pitch or amplitude of the 
synthesized sound being produced.

Natural vibrato and tremolo effects tend to increase in strength as a note is
sustained. This is accomplished in synthesizers by applying an envelope 
generator to the LFO. For example, a flute sound might have a tremolo effect 
which begins at some point after the note has sounded, and the tremolo effect 
gradually increases to some maximum level, where it remains until the note stops 
sounding.

Layering

Layering refers to a technique in which multiple sounds are utilized for each
note played. This technique can be used to generate very rich sounds, and may 
also be useful for increasing the number of instrument patches which can be 
created from a limited sample set. Note that layered sounds generally utilize 
more than one voice of polyphony for each note played, and thus the number of 
voices available is effectively reduced when these sounds are being used.

Digital Filtering

It was mentioned earlier that low-pass filtering may be used to help
eliminate noise which may be generated during the pitch shifting process. There 
are also a number of ways in which digital filtering is used in the timbre 
generation process to improve the resulting instrument sound. In these 
applications, the digital filter implementation is polyphonic, meaning that a 
separate filter is implemented for each voice being generated, and the filter 
implementation should have dynamically adjustable cutoff frequency and/or Q.

For many acoustic instruments, the character of the tone which is produced
changes dramatically as a function of the amplitude level at which the 
instrument is played. For example, the tone of an acoustic piano may be very 
bright when the instrument is played forcefully, but much more mellow when it is 
played softly. Velocity splits, which utilize different sample segments for 
different note velocities, can be implemented to simulate this phenomenon.

Another very powerful technique is to implement a digital low-pass filter for
each note with a cutoff frequency which varies as a function of the note 
velocity. This polyphonic digital filter dynamically adjusts the output 
frequency spectrum of the synthesized sound as a function of note velocity, 
allowing a very effective recreation of the acoustic instrument timbre.

Another important application of digital filtering is in smoothing out the
transitions between samples in key-based splits. At the border between two 
splits, there will be two adjacent notes which are based on different samples. 
Normally, one of these samples will have been pitch shifted up to create the 
required note, while the other will have been shifted down in pitch. As a 
result, the timbre of these two adjacent notes may be significantly different, 
making the split obvious. This problem may be alleviated by employing a digital 
filter which uses the note number to control the filter characteristics. A table 
may be constructed containing the filter characteristics for each note number of 
a given instrument. The filter characteristics are chosen to compensate for the 
pitch shifting associated with the key splits used for that instrument.

It is also common to control the characteristics of the digital filter using
an envelope generator or an LFO. The result is an instrument timbre which has a 
spectrum which changes as a function of time. An envelope generator might be 
used to control the filter cutoff frequency to generate a timbre which is very
bright at the onset, but which gradually becomes more mellow as the note decays. 
Sweeping the cutoff frequency of a filter with a high Q setting using an 
envelope generator or LFO can help when trying to simulate the sounds of analog 
synthesizers.
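As a rough sketch of the velocity-controlled filtering described above, the following Python fragment maps note velocity to a cutoff frequency and applies a simple one-pole low-pass filter. The mapping curve, the frequency range, and the one-pole design are all illustrative assumptions; as the text notes, practical implementations typically use resonant filters with adjustable Q, applied per voice.

```python
import math

def cutoff_for_velocity(velocity, f_min=500.0, f_max=8000.0):
    """Map MIDI note velocity (1..127) to a filter cutoff frequency:
    harder playing opens the filter, giving a brighter timbre.
    The squared curve and frequency range are arbitrary choices."""
    return f_min + (f_max - f_min) * (velocity / 127.0) ** 2

def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100.0):
    """One-pole low-pass filter: each output moves a fraction `a` of the
    way toward the current input, where `a` is set by the cutoff."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out
```

A per-voice instance of such a filter, with its cutoff driven by velocity, note number, an envelope generator, or an LFO, covers the applications discussed in this section.
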