Everything2

audio time stretching

created by yerricde

(idea) by yerricde (8.6 mon) Fri Jun 08 2001 at 17:16:48

OK, I have a song stored as 2-channel, 16-bit linear PCM on my reasonably fast computer. I want to slow down the tempo because I'm trying to remix with another song.

"Re-perform it!" No, I don't have the source score or samples, and I don't have the vocal training; all I have is this wav file I extracted from a CD.

"Resample it!" No, resampling digital audio has an effect analogous to that of slowing down the turntable: it transposes the song to a lower key and makes the singer sound like an ogre (no, not Shrek).
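To put a number on the ogre effect, here is a minimal sketch. It assumes plain resampling, where pitch moves by exactly the same ratio as playback speed; the function name is made up for illustration.

```python
import math

def resample_pitch_change(speed_ratio):
    """Pitch change in semitones when playback speed is multiplied by
    speed_ratio through plain resampling (no time stretching)."""
    return 12.0 * math.log2(speed_ratio)

# Slowing a song to 80% of its tempo by resampling drops it almost four semitones.
print(round(resample_pitch_change(0.8), 2))  # -3.86
```

Halving the speed gives -12 semitones, a full octave down, which is why resampling alone is no good for tempo matching.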

I guess it's time for my old friends Fourier and Wigner to come help. We'll build a phase vocoder after Flanagan, Golden, and Portnoff. Basic steps: compute the frequency/time relationship of the signal by taking the FFT of each windowed block of 2,048 samples (assuming 44.1 kHz input), do some processing of the frequencies' amplitudes and phases, and perform the inverse FFT. A good algorithm will give good results at compression/expansion ratios of up to ±25%; beyond that, the pre-echo and other smearing artifacts of frequency domain interpolation on transient ("beat") waveforms, which are not localized at all in the frequency domain, begin to take a toll on perceived audio quality.
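The basic steps above can be sketched roughly as follows. This is a bare-bones textbook phase vocoder in pure Python, not the algorithm of any particular product: the Hann window, the 75% overlap (`hop_syn=512`), and all function names are my own assumptions, and it does nothing special for transients, so it will smear beats just as described.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spec):
    """Inverse FFT built on fft() via the conjugate trick."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]

def phase_vocoder(samples, stretch, n_fft=2048, hop_syn=512):
    """Time-stretch mono samples by `stretch` (>1 = slower), keeping pitch."""
    hop_ana = max(1, round(hop_syn / stretch))  # analysis hop, in samples
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    omega = [2 * math.pi * k / n_fft for k in range(n_fft)]  # bin centre, rad/sample
    n_frames = max(0, (len(samples) - n_fft) // hop_ana + 1)
    out = [0.0] * ((n_frames - 1) * hop_syn + n_fft) if n_frames else []
    norm = [0.0] * len(out)
    prev_phase = [0.0] * n_fft
    syn_phase = [0.0] * n_fft
    for m in range(n_frames):
        start = m * hop_ana
        spec = fft([samples[start + i] * window[i] for i in range(n_fft)])
        for k in range(n_fft):
            mag, phase = abs(spec[k]), cmath.phase(spec[k])
            if m == 0:
                syn_phase[k] = phase
            else:
                # Deviation from the phase advance a bin-centred sinusoid would show.
                dev = phase - prev_phase[k] - omega[k] * hop_ana
                dev -= 2 * math.pi * round(dev / (2 * math.pi))  # wrap to [-pi, pi]
                # Advance the output phase at the estimated true frequency.
                syn_phase[k] += (omega[k] + dev / hop_ana) * hop_syn
            prev_phase[k] = phase
            spec[k] = cmath.rect(mag, syn_phase[k])
        frame = ifft(spec)
        pos = m * hop_syn
        for i in range(n_fft):  # windowed overlap-add at the synthesis hop
            out[pos + i] += frame[i].real * window[i]
            norm[pos + i] += window[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Calling `phase_vocoder(left_channel, 1.25)` on one channel of floats should return a signal about 25% longer at the same pitch. A pure Python FFT is painfully slow on a whole song, so treat this as a way to read the algorithm rather than a tool.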

Rabiner and Schafer in 1978 put forth an alternate solution: work in the time domain, attempt to find the period of a given section of the fundamental wave with the autocorrelation function, and crossfade one period into another. This is called time domain harmonic scaling or the synchronized overlap-add method. It performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation misunderestimates the period of a signal with complicated harmonics (such as orchestral pieces). Cool Edit Pro seems to solve this by looking for the period closest to a center period that the user specifies; the corresponding frequency should be an integer multiple of the tempo and lie between 30 Hz and the lowest bass frequency. For a 120 bpm tune, use 48 Hz because 48 Hz = 2,880 cycles/minute = 24 cycles/beat * 120 bpm.
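Here is a minimal sketch of the period-finding step only, assuming a normalized autocorrelation searched within a band around the user's center period. I don't know what Cool Edit Pro actually does internally; the `search` fraction, the comparison span, and the function names are hypothetical, and the crossfade itself is left out.

```python
import math

def center_frequency(bpm, cycles_per_beat):
    """Frequency in Hz that fits a whole number of cycles into each beat."""
    return bpm * cycles_per_beat / 60.0

def estimate_period(samples, start, center_period, search=0.25, span=None):
    """Lag in samples, within +/- `search` of center_period, at which the
    block beginning at `start` best matches a delayed copy of itself."""
    span = int(span or 2 * center_period)
    lo = max(1, int(center_period * (1 - search)))
    hi = int(center_period * (1 + search))
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        a = samples[start:start + span]
        b = samples[start + lag:start + lag + span]
        n = min(len(a), len(b))
        num = sum(a[i] * b[i] for i in range(n))
        den = math.sqrt(sum(v * v for v in a[:n]) * sum(v * v for v in b[:n])) or 1.0
        score = num / den  # normalized autocorrelation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

print(center_frequency(120, 24))  # 48.0
```

At a 44.1 kHz sample rate, 48 Hz corresponds to a center period of 44100 / 48 = 918.75 samples, which is the value you would pass as `center_period`.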

High-end commercial audio processing packages combine the two techniques, using wavelet techniques to separate the signal into sinusoid and transient waveforms, applying the phase vocoder to the sinusoids, and processing transients in the time domain, producing the highest quality time stretching.

These techniques can also be used to scale the pitch of an audio sample while holding time constant. (Note that I said pitch scaling, not "shifting," as pitch shifting by amplitude modulation with a complex exponential does not preserve the ratios of the partial frequencies that determine the sound's timbre.) Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. To preserve the formants and character of the voice, you can use a "regular" channel vocoder keyed to the signal's fundamental frequency. (Fundamental following is straightforward; send me a /msg if you want me to write about it.)
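The scaling versus shifting distinction is easy to check with arithmetic. The sketch below uses three made-up harmonic partials; multiplying by a factor models pitch scaling, and adding a constant offset models what modulation by a complex exponential does to each partial.

```python
def scale_partials(partials_hz, factor):
    """Pitch scaling: every partial is multiplied by the same factor."""
    return [f * factor for f in partials_hz]

def shift_partials(partials_hz, offset_hz):
    """Pitch shifting: every partial moves by the same number of Hz."""
    return [f + offset_hz for f in partials_hz]

def ratios_to_fundamental(partials_hz):
    """Each partial expressed as a multiple of the lowest one."""
    return [f / partials_hz[0] for f in partials_hz]

harmonic = [220.0, 440.0, 660.0]  # illustrative partials in a 1 : 2 : 3 relationship

# Scaling by 1.5 gives 330, 660, 990 Hz: ratios stay 1 : 2 : 3.
print(ratios_to_fundamental(scale_partials(harmonic, 1.5)))

# Shifting by 110 Hz gives 330, 550, 770 Hz: ratios become roughly 1 : 1.67 : 2.33.
print(ratios_to_fundamental(shift_partials(harmonic, 110.0)))
```

Both versions land the fundamental on 330 Hz, but only the scaled one is still a harmonic series, which is why the shifted one sounds like a different (and usually clangy) instrument.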

Sources: http://www.dspdimension.com/html/timepitch.html
Further reading: comp.dsp FAQ
Application of this technique can be found in Eminenya 2.0.


Copyright © 2002 Damian Yerrick.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the writeup entitled "GNU Free Documentation License".

(idea) by conform (1.6 wk) Fri Jun 08 2001 at 18:04:42

Note that while this may sound like a lot of complicated mathematical mumbo jumbo to you, the aspiring Britney Spears remix artist, the reality is that you just import your sample into your multitrack audio application of choice (Pro Tools, Sonic Foundry's Vegas Audio, Cubase, Logic Audio, Cakewalk...), route the channel with your sample to an effects plugin, select the time stretch/pitch shift plugin of your choice, tell it how much to stretch, and you're good to go. It's remarkably easy.

In fact, one application, Sonic Foundry's ACID, does automatic real time pitch- and time-scaling for every sample you use, according to a tempo and key you set, with remarkable results. It's not the slickest algorithm I've heard, but it is computationally lightweight, and sounds good enough that for moderate amounts of shifting most people will never notice a difference. Plus I believe they use a better algorithm when you render your track to an audio file.

Time- and pitch-scaling software and hardware are incredibly useful, and the market today contains a very clear spectrum from inexpensive consumer products to high end, professional products with high end, professional prices. The current king of software plugins is Serato's Pitch N Time 2.0.1, which will run you USD$800 and comes only in AudioSuite format, but produces breathtakingly clear results. Roland also has a newish sample playback synth, the VP-9000 Variphrase Processor, which does DSP-based realtime tempo- and pitch-matching, as well as bends and other, more synth-y, manipulations. To take one home, however, will set you back about USD$2500...

